Introduction

The Global Burden of Disease study has highlighted that low back pain (LBP) is the leading global contributor to years lived with disability and the sixth global contributor to disability-adjusted life years [1, 2]. The global prevalence of activity-limiting LBP was recently estimated to be approximately 39 % for lifetime prevalence and 18 % for point-prevalence [3]. Only a small proportion of people experiencing LBP seek health care, but these account for high costs that represent an important burden to society [4, 5]. The large majority of patients with LBP are labelled as having non-specific LBP (NSLBP) because no underlying pathology or cause can be found [6–8]. A wide range of health interventions exists for patients with NSLBP and related clinical trials are often summarized in systematic reviews [9, 10]. However, authors of these reviews report that outcomes are inconsistently measured and reported across trials [11–13]. This inconsistency may limit the comparison of findings among trials and hinder statistical pooling [14]. In addition, inconsistent reporting can be due to selective reporting bias (e.g. reporting only favourable outcomes in a publication), which may strongly affect the conclusions of systematic reviews [15].

The development and use of core outcome sets (COS) for specific health conditions has been suggested to reduce inconsistency in outcomes measured and reported across clinical trials [14]. A COS represents an agreed set of outcomes that should be measured and reported, as a minimum, in all clinical trials for specific health conditions [16]. Such a set does not restrict measurement or the choice of the primary outcome, but mandates collection and reporting of the COS alongside the outcomes of interest [16]. A COS thus creates a minimum standard of outcomes reported, reducing the risk of selective reporting bias and increasing the validity and statistical power of meta-analyses [17].

The recently launched Core Outcome Measures in Effectiveness Trials (COMET) initiative fosters methodological research and provides methodological guidance on the development of a COS [16]. The expertise accumulated by the Outcome Measures in Rheumatology (OMERACT) initiative also provides fundamental guidance for COS development [18]. A stepwise approach is suggested by both initiatives: first, the core outcome domains should be selected (i.e. ‘what’ to measure), and then the measurement instruments for each domain (i.e. ‘how’ to measure) [16, 19].

In the field of LBP, recommendations for standardized reporting of outcome measurement instruments in clinical studies were formulated at an expert panel discussion held at the 1997 International Forum on LBP in Primary Care (The Hague, The Netherlands) [20]. Specific recommendations were made for five outcome domains (i.e. ‘pain symptoms’, ‘back-related function’, ‘generic well-being’, ‘disability social role’ and ‘satisfaction with care’) [20, 21]. A workshop discussion among LBP researchers during the 2012 International LBP Forum (Odense, Denmark) agreed on the need to update the existing recommendations [22]. This was motivated by recent advances in understanding of construct development and measurement properties that stress the need to explore whether relevant domains are missing and to critically appraise recommended instruments [22]. Deyo et al. [20] also proposed a parsimonious set of six questions covering the five domains suggested for measurement in LBP clinical research. These questions were extracted from existing questionnaires and were proposed as the minimum to be used in a wide variety of settings, including routine clinical care [20]. This brief set was labelled the ‘Core Outcome Measures Index’ (COMI) by other investigators who assessed its measurement properties and feasibility of implementation [23, 24]. However, updating the set of questions included in the COMI for LBP is outside the scope of this study.

The aim of this study is to update the existing standardized set of outcome domains and measurement instruments recommended for LBP [20, 21], through the development of a COS. This COS is intended for the measurement of efficacy or effectiveness of health interventions assessed in all clinical trials for patients with NSLBP. We defined NSLBP as “low back pain not attributable to a recognizable, known specific pathology (e.g. infection, tumour, fracture, axial spondyloarthritis)” [25]. The first step in the development of this COS and focus of this manuscript was to perform a Delphi study to reach international consensus on core outcome domains.

Methods

A detailed description of the methods of this Delphi study is presented elsewhere [26]. An International Steering Committee with members from four continents, including researchers, care providers and patients’ representatives, worked on the development of this COS. The day-to-day conduct of the study was the responsibility of a project team of four people (AC, CT, MB, RO) working at the same institution (VU University/VU Medical Center, Amsterdam), who designed the study and addressed its key aspects. The other members of the Committee were regularly consulted by e-mail regarding critical decisions.

The Steering Committee decided to involve four groups of stakeholders in the Delphi study: health care researchers, health care providers, professionals working both as researchers and providers, and patients with NSLBP. Professionals from many fields of clinical research relevant for NSLBP (e.g. orthopaedics, physiotherapy, epidemiology, psychology, rheumatology, rehabilitation medicine) were involved. Patients are judged to be essential in developing COSs as they can bring the perspective of those living with a health condition [16, 18]. Previous COS efforts involving patients or the public identified core outcome domains that were not previously identified by other stakeholders [27–29].

The main advantages of a Delphi method include the involvement of informed individuals, anonymity of responses that reduces influence of prominent personalities, and the possibility for Delphi panellists to reconsider their views based on feedback reports of previous rounds [30, 31]. As this project did not involve experiments with patients or study subjects, according to the Dutch Medical Research in Human Subjects Act (WMO), it was exempt from ethical approval. All patients involved were asked for their consent prior to participation and all procedures were conducted according to the Declaration of Helsinki.

Selection of panellists

A list of health care researchers who had extensively published on LBP over the last 10 years (2003–2013) was made by one reviewer (AC) through a structured search in Web of Science (accessed October 7, 2013) and PubMed [26]. Other researchers and health care providers were added to this list through convenience sampling. Patients were recruited through the Steering Committee, seeking people who sought care for a present or past episode of NSLBP and had a fluent understanding of written English. When patients willing to participate were identified, they were contacted by email, given further information on the study and asked for consent to participate. Patients agreeing to participate were sent an information document giving simplified explanations of the terminology used in the study. Members of the Committee were also selected to participate in the Delphi so that they could express their vote on core domains. The final list of potential panellists was managed by the project team and names in the list remained blinded to all those selected for participation.

Generation of a list of potential core domains

The Steering Committee took responsibility for drawing up the list of potential core domains that was used in the Delphi study. This list resulted from a search of outcome domains measured in clinical trials included in five recent systematic reviews [12, 13, 32, 33] (one of which was not yet published), with the addition of the (sub)domains included in the comprehensive International Classification of Functioning (ICF) core set for LBP [34], and in a conceptual model developed to characterize the burden of LBP [35]. This conceptual model and the ICF core set were adopted to account for the patients’ perspective in this early phase. The model on the burden of LBP was developed by asking different stakeholders (including patients) which aspects of health were the most relevant to them [35]; the comprehensive ICF core set was shown to cover all health issues identified by patients with LBP [36]. The OMERACT Filter 2.0 framework was used to structure the list of potential core domains, subdividing it into four core areas that encompass the complete content of what is potentially measurable in a clinical trial (“Appendix I”) [19]. To determine wording and definitions of the potential core domains, the terminology used in existing health frameworks or COSs was consulted: ICF [37], Patient Reported Outcomes Measurement Information System (PROMIS) [38], Wilson and Cleary Model [39] and IMMPACT [40, 41].

Delphi procedure

Three Delphi rounds, including open- and closed-ended questions, were used to reach consensus on core outcome domains. Individuals who did not participate in one round, and who did not explicitly ask to opt out, were invited to each subsequent round. The Delphi study was conducted using SurveyMonkey software and invitations to participate were sent by email.

In the first round, panellists were asked to judge whether each potential core domain was important enough to be included in this COS with possible answers ‘yes’, ‘no’ and ‘unsure/not my expertise’. Panellists were given the opportunity to propose changes of wording and definitions of domains, to indicate if some domains had major conceptual overlap or had to be aggregated, and to suggest the inclusion of missing potential core domains. A question was asked about the ideal number of domains for this COS and another about reporting of adverse events (AEs). Panellists were always encouraged to provide a rationale for their answers. A priori cut-off criteria were established for excluding domains that were rejected by more than 60 % and favoured by less than 20 % of respondents.

In the second round, a proposal was made for exclusion of domains that did not have at least 67 % of the first round respondents answering ‘yes’ or ‘unsure/not my expertise’. Other proposals were made for excluding or retaining domains suggested as having large conceptual overlap. Consensus for the second round was a priori set at 67 % of respondents agreeing with a proposal. Panellists were also asked to judge whether the potential core domains suggested as missing were important enough to be included in the COS, as done for the other domains in the first round.
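The screening rules applied to the first-round votes can be summarized as a small decision function. The following is only an illustrative sketch, not the analysis actually run for this study: the function name `screen_domain`, the returned labels and the example vote counts are hypothetical, and percentages are assumed to be computed over all respondents who answered the item.

```python
def screen_domain(yes: int, no: int, unsure: int) -> str:
    """Classify one potential core domain from its first-round votes.

    Assumption: percentages use all respondents to the item as denominator.
    """
    total = yes + no + unsure
    if total == 0:
        raise ValueError("at least one response is required")

    pct_yes = 100 * yes / total
    pct_no = 100 * no / total
    pct_yes_or_unsure = 100 * (yes + unsure) / total

    # A priori exclusion: rejected by more than 60 % and favoured by less than 20 %.
    if pct_no > 60 and pct_yes < 20:
        return "excluded a priori"
    # Kept in the list when at least 67 % answered 'yes' or 'unsure/not my expertise'.
    if pct_yes_or_unsure >= 67:
        return "retained"
    # Otherwise exclusion is proposed to the panel in the second round.
    return "exclusion proposed in round 2"


# Hypothetical vote counts, for illustration only.
print(screen_domain(yes=10, no=80, unsure=10))
print(screen_domain(yes=60, no=20, unsure=20))
print(screen_domain(yes=40, no=50, unsure=10))
```

A proposal produced by the last branch would still need 67 % agreement among second-round respondents before the domain is actually dropped.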

The remaining potential core domains were presented in the third round to ask the panellists if each was indeed core. A priori consensus was set at 67 % of the panel agreeing that a domain is core. In each round, descriptive statistics were used to summarize all the questions. All rationales provided by panellists were checked against the quantitative results to evaluate whether substantial inconsistencies emerged. Responses of the patients’ group were always analysed separately to assess whether discrepancies were emerging with the rest of the panel. In the third round, frequencies of responses for each domain were calculated for the whole panel and separately for each of the stakeholder groups.
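The third-round summary described above (the percentage of respondents rating a domain as core, for the whole panel and per stakeholder group, compared with the 67 % threshold) can be sketched as follows. The names `core_ratings` and `reaches_consensus` and the vote data are hypothetical, and treating the threshold as "at least 67 %" is an assumption.

```python
from collections import defaultdict

CONSENSUS_THRESHOLD = 67  # per cent, set a priori


def core_ratings(votes):
    """Return the percentage of 'core' ratings per group and for the whole panel.

    votes: iterable of (stakeholder_group, rated_core) pairs for one domain.
    """
    counts = defaultdict(lambda: [0, 0])  # group -> [core votes, total votes]
    for group, rated_core in votes:
        for key in (group, "whole panel"):
            counts[key][0] += int(rated_core)
            counts[key][1] += 1
    return {g: 100 * core / total for g, (core, total) in counts.items()}


def reaches_consensus(pct):
    """Assumption: consensus means at least 67 % rated the domain as core."""
    return pct >= CONSENSUS_THRESHOLD


# Hypothetical votes for a single domain, for illustration only.
example_votes = (
    [("patients", True)] * 6 + [("patients", False)] * 5
    + [("providers", True)] * 19 + [("providers", False)] * 6
)
ratings = core_ratings(example_votes)
for group, pct in ratings.items():
    print(f"{group}: {pct:.0f} % core, consensus={reaches_consensus(pct)}")
```

Keeping the per-group percentages alongside the whole-panel figure makes it easy to spot a domain that passes overall while one stakeholder group stays below the threshold.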

Final decisions

The project team made some proposals to the Steering Committee regarding the interpretation of the final results of the Delphi. Committee members expressed their opinion on each proposal and the opinion supported by more than 50 % of members was followed. Some proposals concerned the inclusion of a ‘death’ and a ‘pathophysiological manifestations’ domain in the COS (as recommended by the OMERACT initiative for all COSs [19]), and what would be an appropriate approach for the reporting of adverse events (AEs).

Results

Panellists

We selected a sample of 280 experts to participate: 139 researchers, 108 care providers, 15 patients, and 18 members of the Steering Committee. A flowchart of the response rate in each round is presented in Fig. 1; 79 of the selected panellists (29 %) participated in all three rounds. People from five continents participated, with the United States, The Netherlands, Australia and the United Kingdom being the most represented countries (Table 1). Socio-demographic characteristics, panellists’ disciplines of expertise and experience with NSLBP clinical research were not substantially different between rounds (Table 1). Fourteen patients (seven men and seven women) participated in the first round: three had current and past episodes of NSLBP, six had only a current episode, and three had NSLBP only in the past. Among the nine with current NSLBP: seven sought care for their back problem, three were off-work due to their LBP, two had acute NSLBP (i.e. pain for less than a month), three chronic NSLBP from three months to a year, four chronic NSLBP for more than a year. None of the patients underwent a surgical operation for current and/or past episodes of LBP. In total, forty-six panellists of the first round (32 %) sought care for a present or past episode of LBP but only those specifically invited as patients were considered part of this stakeholder group.

Fig. 1 Flowchart of participation rates per round

Table 1 Characteristics of participants of the Delphi study

List of potential core domains

The list of potential core domains generated by the Steering Committee included 41 outcome domains, subdivided as follows: 1 in the core area ‘death’, 21 in ‘life impact’, 6 in ‘resource use/economical impact’ and 13 in ‘pathophysiological manifestations’. The list with all definitions used in the Delphi study is presented in Table 2.

Table 2 Definitions of potential core domains considered for NSLBP clinical trials

Delphi round 1

The first round ran from February 18 to March 24, 2014. The results on inclusion of the 41 domains are presented in Fig. 2. Six domains met a priori criteria for exclusion: ‘legal services’, ‘body structures’, ‘muscle tone’, ‘structural stability’, ‘proprioception’ and ‘urination’. For 12 of the other domains, at least 67 % of respondents indicated that they should be included in the COS or were unsure about it (Fig. 2). The remaining 23 domains did not reach this threshold and their exclusion was proposed in the second round. No clear discrepancies between the patients’ perspective and overall panel responses were identified.

Fig. 2 Ratings of 41 potential core domains in the first Delphi round

One hundred and thirty-one panellists answered the question on the ideal number of domains and 106 (81 %) indicated a specific number; the suggested median number of domains was 7 (interquartile range 5–10) and the majority of the comments were in favour of a small COS. The majority of respondents to the question on AEs (72 %) agreed that only AEs occurring outside of core outcome domains should be reported as AEs.

Several panellists emphasized the overlap of ‘health-related quality of life’ with other more specific domains (e.g. ‘physical functioning’, ‘psychological functioning’) (see “Appendix II”). To address this, a proposal was formulated for the second round to exclude ‘health-related quality of life’ from the list. Panellists also remarked that ‘work ability’ and ‘work productivity’ should not be included in all trials because they are not applicable to non-working populations, and because they overlap (“Appendix II”). These comments had to be balanced against favourable comments for inclusion and prompted a proposal for the second round to retain these two domains in the list with an adapted definition that includes also non-paid workers (e.g. students, housewives). Several panellists commented about the overlap of ‘pain interference’ with other domains (“Appendix II”) and these comments were addressed in a proposal to retain it in the list despite the overlap. Despite disagreements on inclusion of ‘non-health care services’ (Fig. 2), substantial arguments were put forward in its favour. Two patients emphasized that these services (e.g. alternative health care) can be very important, others highlighted that what constitutes ‘non-health care services’ can differ between countries and that they can be relevant cost-drivers (“Appendix II”). Based on these comments, a proposal for the second round was made to incorporate the content of this domain into ‘health care services’. In total, 16 new potential core domains were suggested by panellists for inclusion in the list. Appropriate definitions were searched for these domains and they were presented in the second round for rating (“Appendix III”).

Delphi round 2

The second round ran from April 27 to May 26, 2014. Consensus was reached for the exclusion of all but one domain (i.e. ‘social functioning’, 64 % consensus) that did not have at least 67 % support from the first round. No substantial arguments favoured the retention of these domains.

Consensus was not obtained for excluding the domain ‘health-related quality of life’ (55 % of the panel recommended its exclusion). Some substantial arguments (e.g. “Construct overlap can only be answered empirically. It is just as likely that the entire question set loads on a single factor, or that there are a few higher order factors. Pain, pain interference, physical functioning, QOL, work, sleep, self-rated health have all been showing to share variance in previous studies. […].”) explained the lack of consensus. Consensus was obtained (i.e. 85 %) for incorporating “non-health care services” into “health care services”, for retaining ‘work ability’ and ‘work productivity’ as independent domains (72 %), and for retaining ‘pain interference’ in the list (68 %).

However, relevant arguments were made against including ‘health care services’ and ‘work productivity’ in the list of potential core domains. These arguments outlined that, given the scope of the COS, it might not be appropriate to include these domains in efficacy trials (e.g. “[…] Often in trials patients are requested not to undertake/receive any other treatments during the intervention period, which means differences in use depend on things other than the patient’s health state […]”). Several panellists also questioned whether there are valid and reliable methods to assess these domains in all clinical trials (e.g. “[…] during follow-up the acquisition of accurate and reliable health care services data is questionable”, or “[…] both are difficult to assess, may be influenced by factors other than the presence of LBP, and I am not sure of the reliability of the assessment methods”). These domains were kept in the list but these arguments were highlighted in the third round.

None of the new potential core domains suggested in the first round reached consensus for inclusion. Votes for inclusion ranged from 60 % for ‘satisfaction with the outcome of treatment’ to 13 % for ‘travel and transportation’. No substantial differences between patients’ responses and the rest of the panel emerged. A total of 13 domains were retained in the list of potential core domains and presented in the last round.

Delphi round 3

The third round ran from June 23 to July 17, 2014. Three domains exceeded the a priori threshold for inclusion in the COS: ‘physical functioning’ (96 % of respondents indicating it as core), ‘pain intensity’ (90 %) and ‘health-related quality of life’ (73 %) (Fig. 3). These ratings were consistent across stakeholder subgroups, with the only exception that the patients’ group did not reach agreement (55 %) on ‘health-related quality of life’ (Fig. 3). ‘Work ability’ was rated as a core domain by 76 % of health care providers but only by 64 % of the whole panel and 36 % of the patients (Fig. 3). ‘Psychological functioning’ was considered a core domain by 76 % of care providers and 91 % of patients but not by the whole panel (Fig. 3). While providers and patients provided ten comments in favour of its inclusion, half of these supported its inclusion as a confounder or moderator, which is not an appropriate argument for inclusion as an outcome domain. The other eight potential core domains did not reach consensus for inclusion in the COS in any stakeholder group, except that 82 % of the patients rated ‘self-rated health’ as a core domain (Fig. 3).

Fig. 3 Ratings of 13 potential core domains in the third Delphi round

Final decisions

Based on the Delphi results, the majority of the Steering Committee members agreed on including ‘physical functioning’, ‘pain intensity’ and ‘health-related quality of life’ in this COS. The Steering Committee considered the inclusion of ‘health-related quality of life’ because there were strong arguments in its favour: overall consensus was reached, three groups of stakeholders were in favour, and its definition (Table 2) incorporated the excluded domains ‘psychological functioning’ and ‘self-rated health’ that were rated as core by some groups of stakeholders (Fig. 3). The Steering Committee also agreed on the exclusion of ‘work ability’ as overall agreement for inclusion was not reached, as several arguments for inclusion were weak and as it was not considered core by three groups of stakeholders (Fig. 3). These decisions were also taken with the intention of keeping this COS as short as possible to facilitate its implementation.

The majority of Steering Committee members agreed on including the domain ‘number of deaths’ in the COS as this emphasizes the need to report on the occurrence of deaths in every clinical trial. The Steering Committee acknowledges that death is a rare event for NSLBP clinical trials but a short statement, such as “no deaths occurred in this clinical trial”, would suffice to cover this outcome domain. The Steering Committee did not agree with the inclusion of a generic pathophysiological manifestation domain in this COS, as recommended by OMERACT [19]. The main rationale for this decision was that not all interventions for NSLBP are targeting a pathophysiological manifestation, as this disorder is characterized by the absence of a known pathophysiology [6–8, 25]. Furthermore, its inclusion could create unnecessary increases in research costs and impact upon the brevity of the COS. This recommendation does not imply that measuring pathophysiological manifestations is unimportant in relevant NSLBP clinical trials and researchers are encouraged to include them when appropriate for their individual studies.

In the first round of the Delphi, consensus was reached on the reporting of AEs only for those domains not already included in the COS. This approach ensures that, where appropriate, AEs that occur within a core outcome domain (e.g. an increase in ‘pain intensity’ or a decrease in ‘health-related quality of life’) are included in the statistical analysis. However, taking into account some comments by Delphi panellists, the Steering Committee decided to adopt a flexible approach to the reporting of AEs. This approach leaves trialists the option of also reporting, as separate AEs, those negative outcomes occurring within core domains.

Discussion

Using the methodological guidance of initiatives like COMET and OMERACT [16, 19], we performed a Delphi study to provide an international, multidisciplinary and multistakeholder consensus-based update of an earlier standardized set of outcome domains for LBP research [20, 21]. Sufficient agreement was reached on core outcome domains that are part of a COS intended for clinical trials assessing efficacy or effectiveness of health interventions in patients with NSLBP. The domains included in this COS are ‘physical functioning’, ‘pain intensity’, ‘health-related quality of life’ and ‘number of deaths’ (see definitions in Table 2).

The domain ‘physical functioning’ reached the highest level of consensus in this study and the definition focuses on ability to engage in daily physical activities (Table 2). Our definition of ‘physical functioning’ will be fundamental to determine which measurement instrument would best measure this domain. IMMPACT recommendations for chronic pain clinical trials also suggest measuring ‘physical functioning’ as a core outcome domain [40, 41], and this convergence strengthens its inclusion.

‘Pain intensity’ also reached a very high level of consensus for inclusion in this COS. The inclusion of a pain domain is in line with the original core set [20, 21] and IMMPACT recommendations [40, 41]. ‘Pain intensity’ for this COS refers to the magnitude of the pain experience, whereas other pain (sub)domains were suggested for consideration by the previous core set and/or IMMPACT (e.g. ‘bothersomeness of pain’, ‘pain quality’, ‘temporal aspects of domains’, ‘pain medications’) [20, 21, 40, 41]. Some of those pain domains and others (i.e. ‘pain behaviour’, ‘pain interference’) were presented as potential core domains in this Delphi, but sufficient agreement was not reached to consider them core (Figs. 2, 3).

‘Health-related quality of life’ included in this COS could be considered as the ‘successor’ of ‘generic well-being’ included in the previous set [20, 21]. However, a definition of ‘generic well-being’ was not given for the previous set and this makes a clear comparison of the two constructs challenging. Taking into account the widely accepted bio-psycho-social model for LBP [42, 43], it may be appropriate to have a domain like ‘health-related quality of life’ in this COS as its definition includes all components of the model (Table 2). The inclusion of all components of the bio-psycho-social model is also in line with the domains included in a conceptual framework developed to characterize the burden of LBP [35] and with the results of a review that attempted to summarize qualitative research conducted on the impact of LBP on people’s lives [44]. However, it will be clear only when choosing measurement instruments for this COS if the different components of ‘health-related quality of life’ can be treated as separate domains or as one multidimensional domain. The choice of instruments will also be guided by the intention of minimizing redundancy of measurement, to avoid large overlap of instruments and promote brevity of the COS.

Another key aspect in the development of a COS is the definition of contextual factors (i.e. potential confounders and effect modifiers) that should be measured alongside core outcome domains [19]. However, it was beyond the scope of this study to address contextual factors and for the measurement of these factors a reference is made to the prominent work of the National Institutes of Health (NIH) Task Force [45]. This Task Force recently published a report on minimum baseline standards that should be collected in clinical studies for chronic LBP, to standardize their assessment [45].

This COS includes refined versions of three domains included in the previous standardized set but does not incorporate the other two: ‘disability social role’ and ‘satisfaction with care’ [20, 21]. ‘Disability social role’ referred to work absenteeism and could be replaced by the domain ‘work productivity’ used in this study, while ‘satisfaction with care’ was formulated as ‘satisfaction with treatment services’ in this study, but neither was supported by the Delphi panel (Figs. 2, 3). ‘Work productivity’ refers to indirect non-medical costs, which are the main cost drivers for LBP [5], and it is undoubtedly an important outcome for clinical trials that include an economic evaluation. However, this domain is difficult to measure in clinical trials aimed at assessing efficacy of interventions, in which an economic evaluation might be outside the scope of the trial. To support the exclusion of ‘satisfaction with treatment services’, several panellists underlined that it could be highly influenced by factors unrelated to an intervention (e.g. waiting list, amiability of providers, unfriendly receptionist, parking difficulty) and, consequently, that it could say relatively little about efficacy or effectiveness of that intervention.

This is the first Delphi study conducted to explore international, multistakeholder, and multidisciplinary consensus on core outcome domains to be reported in NSLBP clinical trials. This study highlighted diverging opinions on the importance of some domains and reinforced the value of a comprehensive exercise to determine which outcome domains are felt by the majority to be core. The strengths of this study include methods that followed guidance of initiatives like COMET and OMERACT [16, 19], a large expert panel of varied stakeholders representing various disciplines and countries, the opportunity for Delphi panellists to comment on each choice and to reconsider their views after considering other panellists’ reasoning, the attempt to address strong arguments emerging from the Delphi panel, and rigorous reporting of methods [26] and results. One limitation of this study could be the relatively small number of patients involved in the Delphi rounds, which could have led to under- or overestimation of the importance of certain domains from their perspective. However, the goal of this study was not to develop a comprehensive range of outcome domains important to all stakeholders, but rather a core set for inclusion in all clinical trials. Patients can also be involved in trial management teams where they can shape the range of outcome measures collected in individual trials, and this should represent good practice. Finally, the definition of COSs places emphasis on the concept of a minimum set [16, 19] and the four domains included in this COS are consistent with this definition. The existence of a small COS for NSLBP should facilitate its inclusion in clinical trials, alongside trial-specific outcomes.

The development of a COS is a stepwise approach [16, 19] and this study determined core outcome domains for clinical trials on NSLBP. The next step will be to reach consensus on which measurement instruments should be used to measure these outcome domains. The selection of instruments will be focused on those that have demonstrated adequate measurement properties for these domains with the least participant burden. Recently published methodological guidance on this topic [46, 47] will help to conduct the next step for this COS in NSLBP.

Conclusions

A consensus-based COS for NSLBP was developed and included the domains ‘physical functioning’, ‘pain intensity’, ‘health-related quality of life’ and ‘number of deaths’. This COS represents the update of the standardized set proposed by Deyo et al. in 1998 [20, 21]. The brevity of this COS should facilitate its implementation in clinical trials assessing efficacy or effectiveness of health interventions for NSLBP. Future research should establish which measurement instruments are the most appropriate to measure these core outcome domains.