Accuracy of musculoskeletal imaging for the diagnosis of polymyalgia rheumatica: systematic review

Objectives To review the evidence for accuracy of imaging for diagnosis of polymyalgia rheumatica (PMR). Methods Searches included MEDLINE, EMBASE and PubMed. Evaluations of diagnostic accuracy of imaging tests for PMR were eligible, excluding reports with <10 PMR cases. Two authors independently extracted study data and three authors assessed methodological quality using modified QUADAS-2 criteria. Results 26 studies of 2370 patients were evaluated: 10 ultrasound scanning studies; 6 MRI studies; 1 USS and MRI study; 7 18-fluorodeoxyglucose-positron emission tomography (PET) studies; 1 plain radiography and 1 technetium scintigraphy study. In four ultrasound studies, subacromial-subdeltoid bursitis had sensitivity 80% (95% CI 55% to 93%) and specificity 68% (95% CI 60% to 75%), whereas bilateral subacromial-subdeltoid bursitis had sensitivity 66% (95% CI 43% to 87%) and specificity 89% (95% CI 66% to 97%). Sensitivity for ultrasound detection of trochanteric bursitis ranged from 21% to 100%. In four ultrasound studies reporting both subacromial-subdeltoid bursitis and glenohumeral synovitis, detection of subacromial-subdeltoid bursitis was more accurate than that of glenohumeral synovitis (p=0.004). MRI and PET/CT revealed additional areas of inflammation in the spine and pelvis, including focal areas between the vertebrae and anterior to the hip joint, but the number of controls with inflammatory disease was inadequate for precise specificity estimates. Conclusions Subacromial-subdeltoid bursitis appears to be the most helpful ultrasound feature for PMR diagnosis, but interpretation is limited by study heterogeneity and methodological issues, including variability in blinding and potential bias due to case–control study designs. Recent MRI and PET/CT case–control studies, with blinded readers, yielded promising data requiring validation within a diagnostic cohort study.

What does this study add?
▸ Subacromial-subdeltoid bursitis is significantly more discriminatory for PMR than glenohumeral synovitis, in four studies with ultrasound data on both features.
▸ Data mostly come from diagnostic case-control study designs, which can overestimate values for sensitivity and specificity.
How might this impact on clinical practice?
▸ When evaluating patients with suspected PMR, clinicians may consider extra-articular locations of inflammation such as bursitis as supportive, but must bear in mind that there may be biases in current estimates of sensitivity and specificity of these findings.

Polymyalgia rheumatica (PMR) is an age-associated, inflammatory musculoskeletal disease with a lifetime risk of 2.4% for women and 1.7% for men, 1 and affects 0.7% of the population over the age of 50 years. 2 Patients report pain and stiffness of the shoulder and/or hip girdles, usually with elevation of inflammatory markers such as C reactive protein and erythrocyte sedimentation rate. 3 Accurate diagnosis of PMR is essential, given the impact of PMR on quality of life unless it is treated with systemic glucocorticoids, usually for a year or more. 4 Long-term glucocorticoids produce a significant risk of adverse events. [5][6][7][8] However, PMR can be mimicked by many other conditions, 9 many of which also respond initially to glucocorticoids. None of the various sets of classification criteria for PMR has yet been fully validated for clinical diagnostic use. There remains a need for additional tests providing diagnostic information, especially where the diagnosis is not clear-cut.
In PMR, there is inflammation in and around the shoulders and hips; 3 this can often be visualised using imaging. 10 Based on small, single-centre studies, it has been hypothesised that PMR, compared to RA, has predominantly extra-articular rather than intra-articular imaging abnormalities. [11][12][13][14] However, the latest, data-driven provisional international classification criteria for PMR give equal weighting to extra-articular and intra-articular ultrasound features. 15 Since extra-articular features such as subacromial-subdeltoid bursitis (SAB) and trochanteric bursitis are commonly seen with normal ageing, 16 17 it is important to compare any imaging findings with those from non-PMR controls of similar ages.
The objective of this study was to review the evidence regarding the accuracy of musculoskeletal imaging for the diagnosis of PMR.

Data sources and searches
The systematic review protocol was uploaded to the PROSPERO database before running searches (registration number CRD42013005734). The reference standard was defined as a rheumatologist's diagnosis of PMR, without any better explanation of the presenting symptoms found during follow-up. Potential sources of heterogeneity, including study setting, eligibility criteria, technical aspects of the imaging and glucocorticoid therapy, were pre-defined. A PICO-structured search was conducted to identify relevant studies in PubMed, Ovid MEDLINE (1966−) and EMBASE (including EMBASE Classic) (table 1).

Study selection
A study was eligible if it included humans with either suspected PMR (diagnostic cohort design), or both a PMR group and a comparator non-PMR group (diagnostic case-control design), with systematic application of imaging test(s). Expert (rheumatologist) diagnosis was the minimum acceptable reference standard. Diagnostic accuracy data had to be extractable in 2×2 format (true positives, true negatives, false positives, false negatives). Non-systematic review articles, case reports and case series of fewer than 10 patients were excluded. No language restrictions were made. Case reports were excluded by the reviewers manually, rather than by using filters. Meeting abstracts (previous 2 years of British Society for Rheumatology (BSR), European League against Rheumatism (EULAR) and American College of Rheumatology (ACR) conferences) were also screened, and experts in the field were contacted, to identify studies potentially in press or not fully published. Citations were exported to EndNote, duplicates removed in EndNote and results exported to Microsoft Excel.
Search terms (table 1): Polymyalg* AND (imaging OR ultrasono* OR sonograph* OR echogr* OR "computed tomography" OR "computer assisted tomography" OR "bone scan" OR "nuclear medicine" OR "scintigraph*" OR "PET" OR "positron" OR "MRI" OR "magnetic")

Data extraction and quality assessment
A study quality assessment tool, based on QUADAS-2, 18 and encompassing internal validity (risk of bias: test reliability, blinding to index test/clinical information, incorporation bias, diagnostic review bias) and external validity (relevance to our review question: participant selection, spectrum of disease and comparator condition, timing of test in relation to glucocorticoid treatment), was agreed in advance. Two reviewers (SLM and GK) independently extracted study characteristics (design, clinical spectrum, reference standard) and diagnostic accuracy data for the index test(s) of each study. 
Corresponding authors were contacted by email where queries arose. Assessment of methodological limitations and between-study clinical heterogeneity was guided by the study quality assessment tool. Data were entered into Review Manager V.5.2 (RevMan) and exported to Excel.

Data synthesis and analysis
For each imaging feature, where 4 or more studies were available, meta-analysis was performed in Stata SE V.12 (StataCorp, Texas, USA) with calculation of overall sensitivity, specificity and likelihood ratios (LRs) using the bivariate model, 19 and graphed using RevMan, allowing visualisation of between-study statistical heterogeneity.
Influential studies were identified by plotting Cook's distance for each study. Where fewer than four studies were available, 95% CIs for sensitivity, specificity and LRs for each study were calculated using a spreadsheet. 20 If a cell in the 2×2 table for a study contained a 0, 0.5 was added to each cell to avoid division-by-zero.
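The single-study calculation described above can be sketched in a few lines of Python. This is an illustration only, not the spreadsheet cited in the text: the function names are hypothetical, and the choice of Wilson score intervals for sensitivity and specificity and of the log method for likelihood-ratio intervals is an assumption, since the text does not state which interval formulas the spreadsheet uses. The 0.5 continuity correction follows the rule given above.

```python
import math


def wilson_ci(successes, total, z=1.96):
    """Wilson score interval for a proportion (assumed method, not from the source)."""
    p = successes / total
    denom = 1 + z ** 2 / total
    centre = (p + z ** 2 / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z ** 2 / (4 * total ** 2)) / denom
    return centre - half, centre + half


def lr_with_ci(num_events, num_total, den_events, den_total, z=1.96):
    """Likelihood ratio with a CI computed on the log scale (assumed method)."""
    lr = (num_events / num_total) / (den_events / den_total)
    se = math.sqrt(1 / num_events - 1 / num_total + 1 / den_events - 1 / den_total)
    return lr, lr * math.exp(-z * se), lr * math.exp(z * se)


def accuracy_from_2x2(tp, fp, fn, tn):
    """Return (estimate, lower, upper) for sensitivity, specificity, LR+ and LR-."""
    # If any cell is 0, add 0.5 to every cell to avoid division by zero.
    if 0 in (tp, fp, fn, tn):
        tp, fp, fn, tn = (x + 0.5 for x in (tp, fp, fn, tn))
    cases = tp + fn      # participants with PMR by the reference standard
    controls = fp + tn   # participants without PMR
    return {
        "sensitivity": (tp / cases, *wilson_ci(tp, cases)),
        "specificity": (tn / controls, *wilson_ci(tn, controls)),
        "lr_positive": lr_with_ci(tp, cases, fp, controls),
        "lr_negative": lr_with_ci(fn, cases, tn, controls),
    }


if __name__ == "__main__":
    # Hypothetical counts, not taken from any study in this review.
    for name, values in accuracy_from_2x2(tp=40, fp=10, fn=10, tn=40).items():
        print(name, tuple(round(v, 3) for v in values))
```

The positive likelihood ratio is sensitivity divided by (1 − specificity), and the negative likelihood ratio is (1 − sensitivity) divided by specificity; both are computed here directly from the cell counts so that the correction applies consistently.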
To directly compare the accuracy of specific pairs of tests, we used Hierarchical Summary Receiver Operating Characteristic (HsROC) modelling, 21 with test type as a covariate. We did so with paired data only (data from studies where both tests were evaluated together), to control for study-based biases. We first assessed whether the sROCs for the two tests had similar shapes (the β parameter), because sROCs with different shapes cross, so that which test is better becomes threshold-dependent. 22 Where the two sROCs had similar shapes, we were able to compare overall accuracy using the α parameter (indicating proximity to the top left hand corner of the ROC space). Analysis was performed using PROC NLMIXED in SAS V.9.3 (SAS Institute, North Carolina, USA).
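For orientation, the roles of the shape and accuracy parameters can be written out as follows. This is a sketch that assumes the commonly used hierarchical summary ROC parameterisation with a binary test-type covariate; the text does not spell out the fitted model, so the symbols $X_{ij}$, $Z_i$, $\gamma$, $\lambda$ and $\delta$ are introduced here for illustration only.

```latex
\begin{align}
y_{ij} &\sim \mathrm{Binomial}(n_{ij}, \pi_{ij}), \\
\operatorname{logit}(\pi_{ij}) &= \left(\theta_i + \alpha_i X_{ij}\right)
  \exp\!\left(-(\beta + \delta Z_i)\, X_{ij}\right), \\
\theta_i &\sim N\!\left(\Theta + \gamma Z_i,\ \sigma_\theta^2\right), \qquad
\alpha_i \sim N\!\left(\Lambda + \lambda Z_i,\ \sigma_\alpha^2\right).
\end{align}
```

Here $i$ indexes a study-by-test record and $j$ the disease group, $y_{ij}$ is the number of test-positive participants out of $n_{ij}$, $X_{ij} = +0.5$ for PMR cases and $-0.5$ for controls, and $Z_i$ is 0 or 1 according to which of the two tests the record refers to. Under these assumptions, comparing shapes corresponds to asking whether $\delta = 0$, and, if the shapes are similar, comparing overall accuracy corresponds to asking whether $\lambda = 0$.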

RESULTS
Literature searches were completed on 2 October 2013, yielding 1764 citations (figure 1). We identified 87 articles for full text review, of which 23 studies from the original searches were chosen for full evaluation, with three further added on updating searches (January 2015): 10 ultrasound scanning studies (including one published in full text on the updated search 23 ); 6 MRI studies; 1 USS and MRI study; 7 18-fluorodeoxyglucose-positron emission tomography (PET) studies (including two published in full text when the search was updated 24 25 ); 1 plain radiography 26 and 1 technetium scintigraphy 27 study. These last two studies did not meet our review inclusion criteria, one because of a lack of clear distinction between PMR and non-PMR 26 and the other because it was published in 1976 and we could not exclude the possibility that changes in definition of the diagnostic reference standard may have occurred since then. 27 Additionally, we reviewed four longitudinal studies. [28][29][30][31] Vascular imaging studies in patients with a diagnosis of PMR were also initially reviewed (six ultrasound and two PET), but subsequently excluded as the primary purpose of these studies was to diagnose giant cell arteritis in patients presenting with PMR symptoms.
Study characteristics and results of quality assessment are shown in table 2. All but one of the studies we identified used a diagnostic case-control design, which is associated with inflation of sensitivity and specificity estimates because the 'grey cases' seen in real-life clinical practice are omitted from the study. 32 Other common sources of bias in this analysis included incomplete blinding of the person(s) performing the imaging test, diagnostic review bias (incomplete blinding of the diagnostician acting as reference standard) and spectrum bias (studies were generally conducted in academic rheumatology centres) (table 2). Table 3 summarises the accuracy of each imaging feature in PMR, using meta-analysis where appropriate. Original data used to create this table and further details regarding comparator subpopulations are found in the online supplementary material. Many different abnormalities were reported by the studies, reflecting the widespread localisation of inflammation in PMR.
Table 1 Search strategy. Closing steps of the numbered database search: x-ray$.mp; 35 or/11-34; 36 10 and 35. The search was performed by combining the following search terms: polymyalgia/polymyalgic and (ultrasound or radiograph or X-ray or imaging or CT or MRI or PET or CT or isotope bone scan or positron emission tomography or MR). No language restrictions were made, in case the abstract reveals useful information.

Accuracy of bursitis imaging (extracapsular inflammation)
Meta-analysis of four USS studies gave a sensitivity of 80% (95% CI 55% to 93%) and specificity of 68% (60% to 75%) for SAB; the same studies showed a sensitivity of 66% (36% to 87%) and specificity 89% (66% to 97%) for bilateral SAB. Examination of the HsROC plot indicates substantial heterogeneity of discrimination, with an early study 33 showing much higher diagnostic accuracy than subsequent studies (figure 2). Data on trochanteric bursitis were variable; very high sensitivity of ultrasound in an early single-centre study 34 was not replicated in a later multicentre study. 15 Pelvic-girdle symptoms were required for inclusion in the earlier study, whereas the later study required shoulder symptoms but did not require pelvic-girdle symptoms.
Other bursal sites around the hip/pelvic region (ischiogluteal, iliopsoas), while reportedly more specific for PMR than trochanteric bursitis, are technically difficult to detect using ultrasound compared to MRI. 34 Although a PET/CT study suggested inflammation around the ischial tuberosity may be informative for PMR diagnosis, the sample size was small, and thus CIs for sensitivity and specificity are wide; 35 sensitivity in an earlier MRI study was only 25%. 34 Similarly, inflammation (bursitis) between posterior vertebral elements, detectable by PET/CT 25 35 or MRI, 36 37 appeared to be highly specific compared to age-matched controls without inflammatory rheumatic disease, but may also be observed in RA; 25 most of the RA comparator patients were taking prednisolone (D Camellino, personal communication, January 2015). PET/CT can also identify iliopsoas (iliopectineal) bursitis, sometimes seen in RA as well. 24

Accuracy of imaging intracapsular inflammation and fluid around long head of biceps tendon
Synovitis at shoulder (glenohumeral) or hip (coxofemoral) joints, and fluid around the long head of biceps tendon (which is related to synovial inflammation, since this space is synovium-lined and also communicates with the glenohumeral joint itself), were reported by several studies. Combining the ultrasound studies, glenohumeral synovitis had a sensitivity of 62% (95% CI 46% to 76%) and specificity of 58% (45% to 69%), and hip synovitis had a sensitivity of 33% (24% to 43%) and specificity of 78% (66% to 87%). MRI and PET/CT were much more sensitive for detecting hip synovitis in PMR, but with a loss of specificity.

Comparison with RA
Comparison with RA may identify imaging features specific to PMR and not seen in other inflammatory joint diseases. Two ultrasound studies recruited only patients with RA as controls. In both studies, to minimise the risk of misclassification, cases and controls were selected on the basis of already having an established diagnosis of (treated) PMR or RA. Methodological quality was difficult to assess in the earlier study, 38 but the later study 39 recruited only relapsing patients with new-onset bilateral shoulder pain; the authors reported in correspondence with us that low-dose prednisolone treatment did not seem to affect the ultrasound findings. It is difficult to recruit large numbers of patients with untreated RA and elevated inflammatory markers. One PET/CT study recruited 10 untreated RA patients 24 but in another, the RA patients were on treatment. 40

Combined features (defined by the provisional ACR/EULAR classification criteria for PMR)
In the provisional ACR/EULAR classification criteria for PMR, 15 ultrasound features of inflammation were […] 15 This use of OR had the effect of increasing sensitivity of the criteria. Second, based on the regression modelling used to define the final classification criteria set, one point was allocated for bilateral shoulder region inflammation, and one point for shoulder region inflammation plus hip region inflammation. This requirement for two regions involved had the effect of increasing specificity of the criteria. Bilateral shoulder region involvement had a sensitivity of 59% (50% to 68%) and specificity of 57% (49% to 65%), whereas having one shoulder and one hip involved had a sensitivity of 33% (26% to 42%) and specificity of 84% (77% to 89%). 15 This may reflect the requirement for shoulder symptoms for inclusion of both patients and controls, whereas PMR characteristically causes symptoms at shoulders as well as hips. 
A later study produced similar sensitivity/specificity data, with the caveat that its control population was patients with early RA, and the completeness of sonographer and diagnostician blinding to each other's findings was unclear. 23

Change with treatment
Comparison of before-treatment and after-treatment findings was reported for musculoskeletal USS, [28][29][30] MRI 31 and FDG-PET. 41 After 4 weeks of glucocorticoid treatment, shoulder USS normalised in half of the patients who had had bilateral USS abnormalities before treatment, and this persisted to 6 months. 28 In a second study, 11/24 patients had power Doppler signal (indicating microvascular hyperaemia, and suggesting chronicity of inflammation) in at least one shoulder structure; this was present in only 1/24 patients at 6 months. Patients with PMR still had abnormalities in shoulder ultrasound at 6 months compared to a group of 21 'normal' patients, but this was not seen for hip ultrasound findings. 30

Prognosis
In 57 patients with PMR, the presence of power Doppler signal prior to treatment in articular/periarticular shoulder structures significantly predicted PMR relapse/recurrence after 6 months. 29

Direct comparison of test accuracy using paired data
Only two pairs of tests had sufficient paired data (≥4 studies) for our analysis. The paired comparisons were only made where the relevant tests were carried out on cases and controls in all studies, so the same cases and controls had both tests. Ultrasound detection of bilateral SAB was compared to ultrasound detection of hip synovitis, but the two sROCs had different shapes and comparison of overall accuracy was not possible. Ultrasound detection of subacromial-subdeltoid bursitis was compared to ultrasound detection of glenohumeral synovitis. The two sROCs had similar shapes, and we found ultrasound detection of SAB to be significantly more accurate than ultrasound detection of glenohumeral synovitis, for the diagnosis of PMR (p=0.004).

DISCUSSION
Our objective was to determine whether musculoskeletal imaging is accurate enough to be useful to support clinical diagnosis of PMR. Although MRI and PET/CT revealed potentially characteristic features of focal inflammation between vertebral processes and within the pelvis, only the USS studies had enough control patients with inflammatory diseases for precise estimates of diagnostic accuracy. The most informative single USS feature appeared to be bilateral SAB, with a specificity of 89% (95% CI 66% to 97%) and sensitivity 66% (43% to 87%); however, the earliest study reported much higher diagnostic accuracy than subsequent studies. In general, substantial clinical and statistical between-study heterogeneity was noted, including important biases, and therefore the absolute sensitivity/specificity estimates given here must be interpreted with great caution. The effects of between-study heterogeneity can be minimised where each study reported the same two tests; this pairwise comparison across four studies showed that SAB was significantly more accurate than glenohumeral synovitis for PMR diagnosis. This suggests that it might not be appropriate to weight these two features equally in diagnosing PMR, as has been suggested by the latest criteria set. 15
Several potential biases were identified during the quality assessment. First, all studies, except one, had a case-control study design. This would introduce spectrum bias and produce heterogeneity in the specificity estimates depending on how the controls were recruited. The only study with a diagnostic cohort design suffered from incorporation bias because the ultrasound was used to help make the diagnosis. Second, most of the reports contained little detail on how blinding was achieved and maintained. This is particularly difficult for USS, which requires close patient contact. 
Recruitment of two controls following each case could have compromised blinding and was associated with much higher estimates of diagnostic accuracy. 33 34 Blinding of the treating rheumatologist and the patient until after the final adjudication of reference-standard clinical diagnosis (which may be 1 year later) would require explicit patient consent and may not always have been possible. There was often insufficient detail on whether and how the patients themselves were blinded to their imaging findings, and on how frequently the treating clinician had to be unblinded or patients excluded from analysis because of unexpected findings on the scan (particularly relevant for MRI and PET studies). Lastly, intra/inter-rater reliability of the imaging tests was rarely fully reported, although this was arguably unlikely to introduce a systematic bias.
Some limitations of this analysis could have made imaging appear less accurate than it really is. First, the use of binary scores (present/absent) rather than grades of intensity of inflammation or number of sites involved is a limitation of diagnostic accuracy meta-analysis methods. Second, rheumatologist diagnosis had to be used as an (imperfect) reference standard; for future studies, adding a 'test of treatment' 42 might improve the reference standard, since ultrasound abnormalities were associated with complete response to glucocorticoid therapy. 30 Third, it is not known whether adding power Doppler to the ultrasound might offer superior diagnostic accuracy for PMR compared to grey-scale ultrasound alone.
Overall, the accuracy of musculoskeletal imaging tests cannot currently be accurately quantified for clinical diagnosis of PMR, primarily due to the limited amount of published data and biases in the studies. The reference standard is still rheumatologist diagnosis, which may use clinical intuition rather than formal criteria; 43 we might expect that tests adding additional information, including imaging tests, might help in 'grey cases' where the clinical diagnosis is not clear-cut, but there are no studies recruiting these 'grey cases' and evaluating them without incorporation bias. Finally, if the prognostic value of imaging were known, this might also have value for clinical practice and perhaps even for patient classification. This type of evidence would help determine the optimal place of imaging tests in diagnostic care pathways for patients with suspected PMR.