Original article
Diagnostic accuracy of inflammatory back pain for axial spondyloarthritis in rheumatological care
  1. Denis Poddubnyy1,
  2. Johanna Callhoff2,
  3. Inge Spiller1,
  4. Joachim Listing2,
  5. Juergen Braun3,
  6. Joachim Sieper1 and
  7. Martin Rudwaleit4
  1. 1 Department of Gastroenterology, Infectious Diseases and Rheumatology, Campus Benjamin Franklin, Charité Universitätsmedizin Berlin, Berlin, Germany
  2. 2 Epidemiology Unit, German Rheumatism Research Centre, Berlin, Germany
  3. 3 Department of Rheumatology, Rheumazentrum Ruhrgebiet, Herne, Germany
  4. 4 Department of Medicine and Rheumatology, Klinikum Bielefeld Rosenhöhe, Bielefeld, Germany, Charité Universitätsmedizin Berlin, Berlin, Germany, and Gent University, Gent, Belgium
  1. Correspondence to Dr Martin Rudwaleit; martin.rudwaleit{at}


Abstract

Objective Inflammatory back pain (IBP), the key symptom of axial spondyloarthritis (axSpA), including ankylosing spondylitis, has been proposed as a screening test for patients presenting with chronic back pain in primary care. The diagnostic accuracy of IBP in the rheumatology setting is unknown.

Methods Six rheumatology centres, representing secondary and tertiary rheumatology care, consecutively included routinely referred patients with chronic back pain and suspicion of axSpA. IBP (diagnostic test) was assessed in each centre by an independent (blinded) rheumatologist; a second (unblinded) rheumatologist made the diagnosis (axSpA or no-axSpA), which served as reference standard.

Results Of 461 routinely referred patients, 403 received a final diagnosis. IBP was present in 67.3%, and 44.6% (180/403) were diagnosed as axSpA. The sensitivity of IBP according to various definitions (global judgement, Calin, Berlin, Assessment of SpondyloArthritis international Society criteria for IBP) was 74.4%–81.1% and comparable to published figures, whereas the specificity was unexpectedly low (25.1%–43.9%). The resulting positive likelihood ratios (LR+) were 1.1–1.4 and without major differences between sets of IBP criteria. The presence of IBP according to various definitions increased the probability of axSpA by 2.5%–8.4% only (from 44.6% to 47.1%–53.0%).

Conclusions The diagnostic utility of IBP in the rheumatology setting was smaller than expected. However, this was counterbalanced by a high prevalence of IBP among referred patients, demonstrating the effective usage of IBP in primary care as selection parameter for referral to rheumatology. Notably, this study illustrates potential shifts in specificity and LR+ of diagnostic tests if these tests are used to select patients for referral.

  • spondyloarthritis
  • low back pain
  • ankylosing spondylitis
  • epidemiology

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:

Key messages

What is already known about this subject?

  • Inflammatory back pain (IBP), the key symptom in axial SpA, is used for diagnostic purposes.

  • Despite this, its diagnostic accuracy in the rheumatology setting according to high-level standards (e.g. STARD, Standards for Reporting of Diagnostic Accuracy) is unknown.

What does this study add?

  • In the rheumatology setting IBP was a common finding among referred patients suggesting the effective use of IBP as selection parameter for referral in primary care.

  • As a result, the specificity of IBP was lower and the diagnostic gain of IBP was smaller than expected.

  • In the rheumatology setting no striking differences among defined IBP criteria (Calin, Berlin, ASAS) were found, yet the Calin criteria were least specific.

How might this impact on clinical practice?

  • The study shows that global judgement on the presence or absence of IBP, as done in routine rheumatological care, is influenced by knowledge of other SpA features.

  • Rheumatologists should realize that diagnostic test characteristics such as specificity and likelihood ratios vary if the diagnostic test is used as selection parameter in primary care.


Introduction

Inflammatory back pain (IBP) is the key clinical symptom of axial spondyloarthritis (axSpA), including non-radiographic axSpA (nr-axSpA) and ankylosing spondylitis (AS, r-axSpA), and is often present at disease onset. IBP describes a spectrum of symptoms including (1) insidious onset of back pain, (2) morning stiffness in the lower back, (3) improvement of back pain with exercise, (4) no improvement with rest, (5) awakening at night or early morning because of back pain and (6) alternating buttock pain. Defined sets of IBP criteria are the Calin criteria,1 Berlin criteria2 and the Assessment of SpondyloArthritis international Society (ASAS) IBP experts criteria3; the latter was applied in the ASAS classification criteria for axSpA study.4 5 The published sensitivities (70.1%–95%) and specificities (72.5%–81.3%) for various sets of IBP criteria yield calculated positive likelihood ratios (LR+), a measure of diagnostic utility, of 2.9–3.9. Accordingly, an LR+ of 3–4 for IBP has been proposed as the best estimate for diagnostic purposes in daily practice.6 Assuming a background prevalence of 5% of axSpA among patients with chronic back pain in primary care, it has further been estimated that the presence of IBP increases the likelihood of axSpA by 9%–11% (from 5% to 14%–16%).6 IBP has also been proposed7 and has been proven effective as a parameter for selecting patients with chronic back pain in primary care for referral to rheumatology.7–13 No diagnostic accuracy study on IBP in the rheumatology setting is available to date.


Methods

Study centres

The Diagnostic Accuracy of Inflammatory Back pain study (DIVERS) was designed according to recommendations in the field of diagnostic studies including the Standards for Reporting of Diagnostic Accuracy recommendations.14 15 DIVERS was conducted in six rheumatology centres across Germany: four general rheumatology practices (secondary care) distributed across the country, one large hospital for rheumatic diseases and one university hospital (tertiary care).


Eligible patients were routinely referred to rheumatology and had to have chronic back pain of ≥3 months' duration, age at onset of ≤45 years and no clear diagnosis. Patients with a known diagnosis of axSpA were excluded. To avoid selection bias, participating centres were strongly encouraged to include consecutively all newly referred patients with chronic back pain with suspicion of axSpA. No specific referral strategies were set up for this study.

Study procedures

In each centre, two rheumatologists were involved: R-care and R-blind. R-care took routine care of the patient, ordered diagnostic tests as needed and made the final diagnosis of axSpA or no-axSpA. In contrast, R-blind took the clinical history regarding IBP features only and was blinded to all other disease features. Both R-care and R-blind documented their findings independently of each other on a prespecified case report form. The clinical diagnosis (axSpA or no-axSpA) made by R-care served as reference standard. Since the study was conducted in routine rheumatology care, the sequence of consultation by either R-care or R-blind was left to the discretion of the participating centres and was driven primarily by feasibility aspects (R-care first 76%, R-blind first 24%).

In addition, patients were asked to complete a short self-reported questionnaire on IBP features in the rheumatologist’s waiting room prior to the consultation, with answering modalities of ‘yes’ or ‘no’ to each IBP question.

The time period between first presentation to the rheumatology centre including the assessment of IBP by R-blind and final diagnosis (axSpA or no-axSpA) was in the range of 2–8 weeks. The first patient was included in March 2009 and the last patient in June 2010.

Study end points and data analysis

The diagnostic test of interest (presence/absence of IBP) was assessed by R-blind according to four definitions: (1) IBP by global judgement by the rheumatologist (ie, judgement on IBP independent of formal IBP criteria; yes/no), (2) Calin criteria, (3) Berlin criteria and (4) ASAS criteria for IBP. The global judgement (yes/no) on IBP was further categorised into ‘uncertain’, ‘moderately confident’, ‘confident’ and ‘very confident’. Sensitivity, specificity, positive and negative LRs (LR+ and LR−), positive and negative predictive values (PPV and NPV) with corresponding 95% CI of IBP and the net increase (%) in disease probability of axSpA were calculated. The agreement on the presence of IBP between R-blind and R-care was assessed by percentage agreement and Cohen’s kappa; the latter interpreted according to the method of Landis and Koch.16
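The accuracy measures listed above all derive from a 2×2 table of test result against reference standard. The following is a minimal sketch, not the study's analysis code. The cell counts are an assumption: they are back-calculated from the percentages reported for global judgement by R-blind (180 axSpA, 223 no-axSpA, sensitivity 81.1%, specificity 43.9%) and are not taken from the study tables. The names `TwoByTwo` and `accuracy_measures` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Counts of test result (IBP yes/no) against the reference diagnosis."""
    tp: int  # IBP present, axSpA
    fn: int  # IBP absent, axSpA
    fp: int  # IBP present, no-axSpA
    tn: int  # IBP absent, no-axSpA


def accuracy_measures(t: TwoByTwo) -> dict:
    """Return sensitivity, specificity, likelihood ratios and predictive values."""
    sens = t.tp / (t.tp + t.fn)
    spec = t.tn / (t.tn + t.fp)
    return {
        "sensitivity": sens,
        "specificity": spec,
        "lr_pos": sens / (1 - spec),
        "lr_neg": (1 - sens) / spec,
        "ppv": t.tp / (t.tp + t.fp),
        "npv": t.tn / (t.tn + t.fn),
    }


# ASSUMPTION: counts reconstructed from the reported percentages, for illustration only.
table = TwoByTwo(tp=146, fn=34, fp=125, tn=98)
measures = accuracy_measures(table)
for name, value in measures.items():
    print(f"{name}: {value:.3f}")
```

With these assumed counts, the sensitivity and specificity round to the reported 81.1% and 43.9%, and the LR+ rounds to 1.4. Confidence intervals are omitted here.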


Results

Of 476 eligible patients, 13 patients did not fulfil the inclusion criteria and were excluded. R-blind decided on the presence/absence of IBP in 461/463 patients. IBP by global judgement (R-blind) was present in 306/461 patients (66.4%). The level of confidence on IBP by R-blind in these 461 patients was as follows: ‘very confident’ in 17.1% of patients, ‘confident’ in 62.5%, ‘moderately confident’ in 18.9% and ‘uncertain’ in 1.5%. The final clinical diagnosis by R-care (reference standard) was missing in four and considered uncertain in 54 patients. Thus, 403 patients with an assessment of IBP by R-blind (67.3% had IBP) and a definite diagnosis by R-care (180 with definite axSpA, comprising 88 with AS and 92 with nr-axSpA, and 223 with no-axSpA) were included in the final analysis (patient flow chart shown in figure 1). The clinical characteristics of the patients are presented in table 1. The prevalence of IBP was generally higher in patients with axSpA (with no difference between AS and nr-axSpA) as compared with patients without axSpA (table 2).

Figure 1

Flow chart of patients with chronic back pain included in diagnostic accuracy of inflammatory back pain study. IBP, inflammatory back pain; SpA, spondyloarthritis.

Table 1

Clinical, laboratory and imaging characteristics of patients with chronic back pain who had judgement on IBP and who received a final diagnosis

Table 2

The prevalence of IBP (%) according to different criteria in patients referred because of chronic back pain

The formal agreement on the global judgement on the presence of IBP between R-blind and R-care was moderate (kappa 0.45; 95% CI 0.36 to 0.54) with percentage agreement 74.9%. Similar rates of agreement between R-blind and R-care were obtained for the various defined IBP criteria: kappa 0.43 (95% CI 0.32 to 0.53; percentage agreement 80.2%) for the Calin criteria; kappa 0.52 (95% CI 0.43 to 0.61; percentage agreement 80.0%) for Berlin criteria and kappa 0.46 (95% CI 0.37 to 0.56; percentage agreement 77.3%) for the ASAS criteria.
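Percentage agreement and Cohen's kappa for two raters giving a yes/no judgement can be computed as sketched below, with kappa labelled according to the Landis and Koch bands. The counts in the example are hypothetical and are not the DIVERS data; the function names are illustrative.

```python
def cohen_kappa(both_yes: int, a_yes_b_no: int, a_no_b_yes: int, both_no: int):
    """Return (observed agreement, Cohen's kappa) for two raters and a yes/no rating."""
    n = both_yes + a_yes_b_no + a_no_b_yes + both_no
    observed = (both_yes + both_no) / n
    a_yes = (both_yes + a_yes_b_no) / n  # proportion rated 'yes' by rater A
    b_yes = (both_yes + a_no_b_yes) / n  # proportion rated 'yes' by rater B
    # Agreement expected by chance if the two raters were independent.
    expected = a_yes * b_yes + (1 - a_yes) * (1 - b_yes)
    return observed, (observed - expected) / (1 - expected)


def landis_koch(kappa: float) -> str:
    """Verbal label for a kappa value according to Landis and Koch."""
    if kappa < 0:
        return "poor"
    for upper, label in [(0.20, "slight"), (0.40, "fair"), (0.60, "moderate"),
                         (0.80, "substantial"), (1.00, "almost perfect")]:
        if kappa <= upper:
            return label
    raise ValueError("kappa cannot exceed 1")


# HYPOTHETICAL counts for 100 patients rated by two rheumatologists.
agreement, kappa = cohen_kappa(both_yes=55, a_yes_b_no=12, a_no_b_yes=13, both_no=20)
print(f"agreement {agreement:.1%}, kappa {kappa:.2f} ({landis_koch(kappa)})")
```

The example shows why a percentage agreement of about 75% can correspond to only moderate kappa: when both raters call IBP present in roughly two thirds of patients, much of the raw agreement is expected by chance.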

Diagnostic accuracy of IBP

For diagnostic accuracy analyses of IBP, data from AS and nr-axSpA were pooled (table 3). IBP by R-blind had a sensitivity of 74%–81% and a specificity of 25%–44% for the diagnosis of axSpA, depending on the IBP definition applied. Interestingly, global judgement on IBP by R-blind numerically exceeded the three sets of defined IBP criteria in terms of sensitivity and specificity; yet, the resulting positive LRs overall were low, ranging from 1.1 (Calin criteria) to 1.4 (global judgement of IBP) (table 3). The results were similar when we stratified for single study sites or for type of centre: for example, the LR+ for IBP according to ASAS criteria by R-blind was 1.3 (hospital based) and 1.1 (private practices), respectively. Overall, the LR+ as a measure of diagnostic utility of the symptom of IBP for the diagnosis of axSpA was small and substantially smaller than expected from published data (LR+ 3–4) due to low specificity, independently of the definition of IBP being applied.

Table 3

Sensitivity, specificity, PPV and NPV of IBP for the diagnosis of axSpA

The ASAS defined IBP criteria had the lowest sensitivity (74.4%), while the Berlin criteria and global judgement of IBP had the highest sensitivity (both 81.1%). The specificity of IBP according to R-blind varied from 24.9% (Calin criteria) to 43.9% (global judgement) (table 3). There were no striking differences between the three sets of IBP criteria with regard to PPV and NPV: the Calin criteria performed slightly less well than the Berlin and ASAS criteria.

In the original publication of the Berlin criteria, it was speculated that the presence of ≥3 out of 4 items (instead of ≥2 out of 4) would yield a high diagnostic gain (estimated LR+ 12.4; specificity 97.3%, sensitivity 33.6%).2 In the settings of DIVERS, however, the sensitivity and specificity of ≥3 out of 4 items of the Berlin criteria were 59.8% and 59.9%, respectively, and resulted in a small increase of the LR+ from 1.2 to 1.5 only.
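The two LR+ values quoted for the ≥3 out of 4 cut-off follow directly from the sensitivity and specificity, since LR+ = sensitivity / (1 − specificity). A short worked check using only the figures given in the text (the function name is illustrative):

```python
def positive_lr(sensitivity: float, specificity: float) -> float:
    """Positive likelihood ratio: true-positive rate over false-positive rate."""
    return sensitivity / (1.0 - specificity)


# Figures quoted for >=3 of 4 Berlin items.
published = positive_lr(sensitivity=0.336, specificity=0.973)  # original estimate
divers = positive_lr(sensitivity=0.598, specificity=0.599)     # DIVERS setting

print(f"published estimate: LR+ {published:.1f}")
print(f"DIVERS:             LR+ {divers:.1f}")
```

The calculation shows that the drop from 12.4 to 1.5 is driven almost entirely by the false-positive rate, which rises from 2.7% to 40.1%.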

Single IBP features

Analysing single IBP parameters, the highest sensitivity for the diagnosis of axSpA was observed for ‘improvement of back pain with exercise’ (88.9%), which, however, had a low specificity (22.7%) (table 4). ‘Alternating buttock pain’ had the lowest sensitivity (60.0%) but the highest specificity (58.2%), and the highest LR+ of 1.4 for a single IBP feature according to R-blind.

Table 4

Sensitivity and specificity of single IBP parameters as assessed by the diagnosing rheumatologist, the blinded rheumatologist and patient for the diagnosis of axSpA

Blinded as compared to unblinded assessment of IBP

It can be assumed that the judgement on a diagnostic test like IBP, which is subject to interpretation, will be influenced by knowledge of other diagnostic test results (‘diagnostic bias’). Indeed, both sensitivity (90.0% vs 81.1%) and specificity (58.2% vs 44.0%) of IBP according to global judgement were higher for R-care (unblinded) than for R-blind (table 3). This demonstrates a moderate diagnostic bias of R-care in the assessment of IBP by knowledge of other SpA features. In contrast, the comparison between R-blind and R-care according to formal sets of IBP criteria (Calin, Berlin, ASAS) revealed no consistent differences in specificity but lower sensitivities for all three sets of criteria when assessed by R-blind (table 3). This suggests that ‘global judgement’ on IBP is more susceptible to diagnostic bias than defined sets of IBP criteria.

IBP self-assessment by the patient

With regard to fulfilment of defined sets of IBP criteria, little difference was found in the self-assessed prevalence of IBP between patients with axSpA and patients without axSpA (table 2): the specificities of defined IBP criteria, and of single IBP features, were even lower if self-assessed by the patient, while the sensitivities were comparable to those as assessed by R-blind (tables 3 and 4), resulting in even slightly lower LR+ of between 0.9 and 1.1 overall.

Diagnostic gain of IBP for the diagnosis of axSpA

In DIVERS, axSpA was diagnosed in 44.6% (pretest probability of axSpA). According to global judgement on IBP by R-blind, the presence of IBP (LR+ 1.4) resulted in a post-test probability of axSpA of 53%. Thus, a moderate diagnostic gain of IBP of 8.4% remained despite the low LR+. For comparison, the presence of IBP according to R-care (unblinded to other patient findings including human leucocyte antigen (HLA)-B27 and imaging, and therefore somewhat biassed) resulted in an increase in the probability of axSpA by as much as 19.5% (from 44.6% to 63.9%).
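The step from pretest to post-test probability goes through odds: the pretest probability is converted to odds, multiplied by the likelihood ratio, and converted back. A minimal sketch using the rounded LR+ of 1.4 reported for global judgement by R-blind (the function name is illustrative):

```python
def post_test_probability(pretest: float, likelihood_ratio: float) -> float:
    """Apply a likelihood ratio to a pretest probability via odds."""
    pretest_odds = pretest / (1.0 - pretest)
    post_odds = pretest_odds * likelihood_ratio
    return post_odds / (1.0 + post_odds)


pretest = 0.446  # prevalence of axSpA in DIVERS
post = post_test_probability(pretest, likelihood_ratio=1.4)
gain = post - pretest

print(f"post-test probability: {post:.1%}")
print(f"net gain: {gain * 100:.1f} percentage points")
```

This reproduces the post-test probability of 53% and the gain of 8.4 percentage points. The same function applies to any other pretest probability or likelihood ratio.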


Discussion

DIVERS is the first real-life diagnostic accuracy study on IBP as a diagnostic test for axSpA in the rheumatology setting, that is, in secondary and tertiary care. We found a net diagnostic gain of only 2.5%–8.4%, if IBP is present, for the likelihood of a diagnosis of axSpA. Thus, one important finding at first glance is that IBP in the rheumatology setting contributes little to establishing the diagnosis of axSpA. On the other hand, the majority of referred patients had IBP, suggesting an effective selection in primary care of patients presenting with chronic back pain for referral to rheumatology. Moreover, our study shows that in the rheumatology setting, none of the defined sets of IBP criteria (Calin, Berlin, ASAS) clearly outperformed the others; yet the Calin criteria tended to be the least specific set.

The sensitivity of IBP according to various defined IBP criteria (74.4%–81.1%) or to global judgement on IBP (81.1%) among patients with axSpA in DIVERS was very similar to published figures of 70.1%–95% in patients with AS/axSpA.1–3 However, we found substantially lower specificities for all defined sets of IBP criteria, ranging from 25.2% to 39.5%, as compared with the original publications on IBP (72.5%–81.3%).1–3 As a result, the calculated LR+ were low (1.1–1.2) as compared with published LR+ for these IBP definitions of 2.9–3.9.1–3 A plausible explanation for the differences in the specificities of IBP criteria could be the fact that in two of the three earlier studies,1 2 well-selected patients with either a clear diagnosis of axSpA or of mechanical back pain (no-axSpA) were included (convenience sample), whereas in our study undiagnosed and newly referred patients were included, thereby better reflecting daily rheumatology practice. Since the prevalence of IBP was high among referred patients, one must assume that IBP has operated as a selection parameter in primary care that triggered referral to the rheumatologist. This unmeasured channelling process led to a population of referred patients with chronic back pain who were enriched for the presence of IBP. In fact, it was proposed in 2005, and has subsequently proven effective, to select for referral from primary care to rheumatology those patients with chronic back pain who have an age at onset of ≤45 years and at least one additional SpA feature such as IBP or a positive HLA-B27 test, both of which increase the likelihood of having axSpA.13 In epidemiological studies on unselected back pain, prevalence figures for IBP of 5%–15% for acute and of 28%–35% for chronic back pain have been reported.17 18 The high prevalence of IBP of 67.3% in our study indeed supports the notion that IBP had been used by primary care physicians to select patients for referral, although this was not specifically intended.
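The channelling effect described above can be illustrated with a deliberately simple toy model. Everything in it is an assumption made for illustration: the primary care sensitivity and specificity (80% and 75%), the referral probabilities (30% if IBP is present, 5% if absent) and the premise that referral depends on the IBP result alone. None of these values are estimates from DIVERS.

```python
def referred_accuracy(sens: float, spec: float,
                      refer_if_pos: float, refer_if_neg: float):
    """Sensitivity and specificity of a test among referred patients only,
    when the probability of referral depends on the test result."""
    sens_ref = (sens * refer_if_pos) / (
        sens * refer_if_pos + (1 - sens) * refer_if_neg)
    spec_ref = (spec * refer_if_neg) / (
        spec * refer_if_neg + (1 - spec) * refer_if_pos)
    return sens_ref, spec_ref


# HYPOTHETICAL primary care values and referral probabilities.
sens_pc, spec_pc = 0.80, 0.75
sens_ref, spec_ref = referred_accuracy(sens_pc, spec_pc,
                                       refer_if_pos=0.30, refer_if_neg=0.05)

lr_pc = sens_pc / (1 - spec_pc)
lr_ref = sens_ref / (1 - spec_ref)
print(f"primary care: specificity {spec_pc:.0%}, LR+ {lr_pc:.1f}")
print(f"after referral: specificity {spec_ref:.0%}, LR+ {lr_ref:.1f}")
```

In this sketch, preferential referral of IBP-positive patients removes most true negatives from the referred population, so specificity and LR+ fall even though the test itself is unchanged. The model is a simplification: in practice referral also depends on other SpA features, and it does not reproduce every aspect of the observed data.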

The selection of patients with certain SpA features for referral is also illustrated in our study by the high prevalence of other SpA features including HLA-B27 (57.6%), a positive family history for SpA (21.3%) or uveitis (10.4%); all of them leading to enrichment of patients with a higher likelihood of having axSpA. In fact, the selected referral of patients with back pain more likely to have axSpA is eventually reflected by the high rate of a final diagnosis of axSpA (44.6%) in DIVERS, which is much higher than reported prevalences of axSpA of around 5%–12% in unselected patients with chronic back pain.19 20 Interestingly, studies with a study design similar to DIVERS also revealed high prevalence rates of axSpA among referred patients: in the international ASAS classification criteria for axSpA study, the prevalence of axSpA was 66%,5 and in the SPondyloArthritis Caught Early cohort, axSpA was diagnosed in 41%.21 Although no structured referral protocol was recommended in either of these studies, an unmeasured preselection process had undoubtedly taken place in both,22 suggesting that ‘unmeasured’ selection of patients for referral is likely to occur in other countries and settings as well.

The low LR+ of 1.1–1.2 for defined sets of IBP criteria suggests at first glance a minor, if any, diagnostic utility of IBP in the rheumatology setting. It seems that the diagnostic utility of IBP has been already ‘used up’ at the time when the patient is referred to the rheumatologist. Yet, a small diagnostic gain is indeed retained, partially because the pretest probability (prevalence of axSpA) was substantially higher in the rheumatology setting than in primary care: the presence of IBP according to global judgement by the blinded assessor (LR+ 1.4) resulted in an increase of the probability of axSpA from 44.6% to 53%, implying a net diagnostic gain of 8.4%, whereas the increase with fulfilment of Calin criteria (2.4%), Berlin (4.5%) and ASAS criteria (4.5%) was lower than the estimated net diagnostic gain of 9%–11% for IBP among unselected patients with chronic back pain in primary care (probability of axSpA increases in primary care from 5% to 14%–16%).6 19 23

In DIVERS, we also analysed single IBP features, among which ‘alternating buttock pain’ had the highest specificity and the highest LR+ of 1.4, followed by ‘no improvement with rest’ (LR+ 1.3), suggesting that these items may provide some diagnostic information in the rheumatology setting. We also addressed the self-assessment of IBP by the patient. The resulting sensitivities were comparable to those by the blinded rheumatologist. However, the specificities were similar or even lower. The specificity of self-assessment of IBP symptoms in unselected patients in primary care is expected to be higher but cannot be properly addressed in our study.7 13 24

The knowledge of SpA features other than IBP that drive towards or away from a diagnosis of axSpA is likely to influence the global judgement on the presence or absence of IBP. Indeed, this potential bias is illustrated by both a higher sensitivity and higher specificity (with a resulting higher LR+ of 2.2 vs 1.4) for the diagnosing (unblinded) as compared with the blinded rheumatologist in DIVERS and underscores the necessity in diagnostic accuracy studies for a diagnostic test like IBP, which is subject to interpretation, to be assessed in a blinded fashion.14 15

The strengths of our study are the high standards for the conduct and reporting of diagnostic studies including the prospective study design, the independent assessment of the diagnostic test of interest (IBP) without knowledge of other test results,14 15 the multicentre study design with secondary care (four private practices) as well as tertiary care centres (one university hospital and one large community hospital specialised in rheumatology) and the large number of consecutive patients. A potential weakness of our study conducted in routine rheumatology care is the fact that not all patients diagnosed as no-axSpA underwent MRI. However, only 22 of the patients without MRI (9.8% of all patients without axSpA) had a clinical context (IBP plus positive HLA-B27) strongly suggesting axSpA: if all of these patients had shown sacroiliitis on MRI and had been diagnosed with axSpA, a scenario that is unlikely, the specificity of IBP (global judgement) would have increased from 43.9% to 48.8% and the LR+ from 1.4 to 1.56.

Of interest, our study strikingly shows how test characteristics and the resulting LR+ vary, depending on the setting (primary vs secondary/tertiary care) where these tests are applied and depending on whether parameters have operated already as selection parameters for referral. The understanding of these potential shifts in specificity and LR+ of diagnostic tests is of general importance when interpreting data from any study on diagnostic test characteristics in medicine. The results of this study also confirm that a diagnosis of axSpA cannot be made by the presence or absence of single parameters (in this case of IBP) but only by assessment of all available clinical, laboratory and imaging parameters, interpreted by an experienced physician and after careful exclusion of other diagnoses.23

In summary, rheumatologists must be aware that their global judgement on IBP might be influenced by knowledge of other SpA parameters. Rheumatologists must also be aware that many patients referred to them for a diagnostic workup of axSpA will have IBP because IBP effectively operates in primary care as a selection criterion for referral. Although the specificity of IBP (and the resulting LR+) is low in the rheumatology setting, a small diagnostic gain remains.


Acknowledgments

We are grateful to Beate Buß for monitoring the data and Janis Vahldiek for data management support. We would like to thank the following rheumatologists for participation in this project: Jan Brandt, Kirsten Karberg, Klaus Krüger, Dorothea Pick, Florian Schuch and Jörg Wendler. We would further like to thank all colleagues who referred their patients with chronic back pain within this study.




  • Collaborators Jan Brandt, Klaus Krüger, Kirsten Karberg, Dorothea Pick, Florian Schuch, Jörg Wendler.

  • Contributors All authors contributed to acquisition, analysis and interpretation of the data and drafting the manuscript.

  • Funding This work was supported by a research grant from the German Research Foundation (Deutsche Forschungsgemeinschaft (DFG)), GZ RU 681/5-1.

  • Competing interests None declared.

  • Patient consent Obtained.

  • Ethical approval The study was approved by the ethics committee of the Charité Universitätsmedizin Berlin, and thereafter in the respective institutions of all participating centres.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Data sharing statement No additional data are available.