Original research
Imaging in diagnosis, monitoring and outcome prediction of large vessel vasculitis: a systematic literature review and meta-analysis informing the 2023 update of the EULAR recommendations
  1. Philipp Bosch1,
  2. Milena Bond2,
  3. Christian Dejaco1,2,
  4. Cristina Ponte3,4,
  5. Sarah Louise Mackie5,6,
  6. Louise Falzon7,
  7. Wolfgang A Schmidt8 and
  8. Sofia Ramiro9,10
  1. 1Department of Rheumatology and Immunology, Medical University of Graz, Graz, Austria
  2. 2Department of Rheumatology, Hospital of Bruneck (ASAA-SABES), Teaching Hospital of the Paracelsius Medical University, Brunico, Italy
  3. 3Rheumatology Research Unit, Instituto de Medicina Molecular, Faculdade de Medicina, Universidade de Lisboa, Lisbon, Portugal
  4. 4Rheumatology Department, Hospital de Santa Maria, Centro Hospitalar de Lisboa Norte, EPE, Lisbon, Portugal
  5. 5Leeds Institute of Rheumatic and Musculoskeletal Medicine, University of Leeds, Leeds, UK
  6. 6Leeds Biomedical Research Centre, Leeds Teaching Hospitals NHS Trust, Leeds, UK
  7. 7Health Economics and Decision Science, The University of Sheffield, Sheffield, UK
  8. 8Department of Rheumatology, Immanuel Krankenhaus Berlin, Medical Centre for Rheumatology Berlin-Buch, Berlin, Germany
  9. 9Department of Rheumatology, Leiden University Medical Center, Leiden, The Netherlands
  10. 10Department of Rheumatology, Zuyderland Medical Center, Heerlen, The Netherlands
  1. Correspondence to Dr Sofia Ramiro; sofiaramiro{at}


Objectives To update the evidence on imaging for diagnosis, monitoring and outcome prediction in large vessel vasculitis (LVV) to inform the 2023 update of the European Alliance of Associations for Rheumatology recommendations on imaging in LVV.

Methods Systematic literature review (SLR) (2017–2022) including prospective cohort and cross-sectional studies (>20 participants) on diagnostic, monitoring, outcome prediction and technical aspects of LVV imaging. Diagnostic accuracy data were meta-analysed in combination with data from an earlier (2017) SLR.

Results The update retrieved 38 studies, giving a total of 81 studies when combined with the 2017 SLR. For giant cell arteritis (GCA), and taking clinical diagnosis as a reference standard, low risk of bias (RoB) studies yielded pooled sensitivities and specificities (95% CI) of 88% (82% to 92%) and 96% (95% CI 86% to 99%) for ultrasound (n=8 studies), 81% (95% CI 71% to 89%) and 98% (95% CI 89% to 100%) for MRI (n=3) and 76% (95% CI 67% to 83%) and 95% (95% CI 71% to 99%) for fluorodeoxyglucose positron emission tomography (FDG-PET, n=4), respectively. Compared with studies assessing cranial arteries only, low RoB studies with ultrasound assessing both cranial and extracranial arteries revealed a higher sensitivity (93% (95% CI 88% to 96%) vs 80% (95% CI 71% to 87%)) with comparable specificity (94% (95% CI 83% to 98%) vs 97% (95% CI 71% to 100%)). No new studies on diagnostic imaging for Takayasu arteritis (TAK) were found. Some monitoring studies in GCA or TAK reported associations of imaging with clinical signs of inflammation. No evidence was found to determine whether imaging severity might predict worse clinical outcomes.

Conclusion Ultrasound, MRI and FDG-PET revealed a good performance for the diagnosis of GCA. Cranial and extracranial vascular ultrasound had a higher pooled sensitivity with similar specificity compared with limited cranial ultrasound.

  • giant cell arteritis
  • systemic vasculitis
  • ultrasonography
  • Magnetic Resonance Imaging

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:


  • Ultrasound and MRI have good diagnostic accuracies for cranial giant cell arteritis (GCA).

  • Ultrasound and fluorodeoxyglucose positron emission tomography (FDG-PET) have been used to detect extracranial artery inflammation.


  • FDG-PET for cranial and extracranial artery inflammation has a good diagnostic performance in GCA with a pooled sensitivity of 76% (95% CI 67% to 83%) and pooled specificity of 95% (95% CI 71% to 99%).

  • Ultrasound of extracranial (mostly including axillary artery) as well as cranial arteries has higher sensitivity and comparable specificity to only scanning cranial arteries.


  • The results of this systematic literature review will inform an international task force formulating the 2023 update of the European Alliance of Associations for Rheumatology recommendations on the use of imaging in large vessel vasculitis (LVV), and can help clinicians in choosing the most adequate imaging technique for diagnostic purposes in LVV.

  • Vascular ultrasound for GCA will accurately detect more cases if the axillary arteries are included in the scan.


Primary large vessel vasculitis (LVV) includes giant cell arteritis (GCA), Takayasu arteritis (TAK) and idiopathic aortitis. Previously, our group published a systematic literature review (SLR) and European Alliance of Associations for Rheumatology (EULAR) endorsed recommendations for the use of imaging techniques in LVV, including ultrasound, MRI, fluorodeoxyglucose positron emission tomography (FDG-PET) and CT.1 2

That SLR, containing studies until 2017, concluded that ultrasound and MRI had a good diagnostic accuracy for cranial GCA. FDG-PET studies were scarce and only assessed extracranial arteries, preventing conclusions on its diagnostic accuracy on cranial vessels.2 Furthermore, there was little data on the role of imaging for monitoring disease activity and outcome prediction.

Since the publication of the original EULAR recommendations, the use of large vessel imaging has become more widespread, leading to a noticeable increase in the number of publications reporting the use of imaging in diagnosis, monitoring disease activity and outcome prediction in LVV. Furthermore, new data on FDG-PET has been published,3 4 as well as novel studies evaluating the potential of ocular imaging techniques for diagnosing GCA,5 6 altogether adding to the existing literature. Lastly, more research has been done on the diagnostic value of extended ultrasound examinations in GCA, also including large vessels such as the axillary artery.7 8 Whether these studies yield higher sensitivities and specificities than studies only assessing cranial arteries warrants further exploration.

Our aim was to update the previous SLR and inform the EULAR task force on new evidence on the use of imaging for diagnosis, monitoring and outcome prediction in LVV.9


Research questions

The protocol of this SLR has been published on PROSPERO (ID: CRD42022360545). The four research questions, depicted in the Population, Intervention, Control and Outcome format (PICO),10 were identical to those of the 2017 SLR. They comprised the use of different imaging techniques for LVV in (1) diagnostics, (2) monitoring, (3) outcome prediction and (4) technical aspects that should be considered when using them (online supplemental tables S1a–d). Apart from the studies on imaging techniques included in the previous SLR (ultrasound, MRI, FDG-PET and CT),2 studies on optical coherence tomography (OCT) and fluorescein angiography (FA) were also included in this review, as studies reporting the diagnostic value of these imaging techniques had recently been published.

The population of interest was adults (≥18 years) with suspected primary LVV (for diagnostic accuracy studies) or with established primary LVV (for monitoring, prediction and technical aspects studies), particularly GCA, TAK and isolated aortitis. For diagnostic studies, the intervention was the imaging findings (=index test) and the comparator was either the clinical diagnosis or temporal artery biopsy (TAB) results (=reference standard). The outcome was test performance, including sensitivity, specificity, positive likelihood ratio (LR+) and negative LR (LR−). For monitoring studies, intervention and comparator were ‘performing imaging’ and ‘not performing imaging’, respectively, and the outcome was disease activity. For prediction studies, intervention and comparator were ‘positive imaging’ and ‘negative imaging’, respectively, with a variety of outcomes such as disease complications, cumulative glucocorticoid dose and disease activity. Lastly, studies on technical aspects compared different imaging settings regarding ‘optimal results’ (not prespecified).

Only full research articles (excluding commentaries and letters) were eligible. These included prospective cohort studies involving >20 patients, with <50% lost to follow-up. Cross-sectional studies were only included for studies of diagnostic accuracy and technical aspects. Case–control and retrospective studies were excluded.

Literature search, data extraction, risk of bias assessment

The search strategy was developed and run by an experienced librarian (LF). For the 2017 SLR, Medline, Embase, the Cochrane database and Epistemonikos were searched from inception to 10 March 2017.2 As some databases do not allow searches from an exact date, and manuscripts may have been published online ahead of print or accepted by a journal without being fully available at that time, the start of the updated search interval was set back to 1 January 2016 to be sure not to miss any relevant papers. Consequently, for this SLR, the same databases as mentioned above were searched from 1 January 2016 to 16 November 2022 with the same search string used in the 2017 SLR. A new string concerning the new imaging techniques was additionally run from inception to 16 November 2022 (online supplemental table S2). Overlaps between the 2017 and updated SLR were checked manually and excluded.

The good sensitivity of the search strategy was confirmed by testing for key publications proposed by the steering committee. Two reviewers (PB and MB) independently screened all titles and abstracts using the online literature programme Rayyan.11 Subsequently, 20% of full texts of potentially eligible studies were read by both reviewers. As the agreement to include studies was higher than the prespecified minimal cut-off value (kappa>0.7), the remaining articles were split between the two reviewers for inclusion and exclusion. A similar interrater reliability exercise was performed for data extraction and risk of bias (RoB) assessment and the remaining articles were again allocated between the reviewers. Disagreement during the reliability exercise was discussed for consensus and resolved by a methodologist (SR) if necessary.
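The agreement check described above can be illustrated with a minimal sketch. It assumes that the reported kappa is Cohen's kappa for two raters, which the text does not state explicitly; the function name and the include/exclude decisions below are hypothetical and serve only to show the calculation against the prespecified cut-off of 0.7.

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters judging the same items (assumed statistic)."""
    if len(ratings_a) != len(ratings_b) or not ratings_a:
        raise ValueError("need two equally long, non-empty rating lists")
    n = len(ratings_a)
    # Observed proportion of items on which both raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance, from each rater's marginal frequencies.
    freq_a, freq_b = Counter(ratings_a), Counter(ratings_b)
    expected = sum(freq_a[c] * freq_b[c] for c in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both raters used one identical category throughout
    return (observed - expected) / (1 - expected)


# Hypothetical decisions of two reviewers on ten full texts.
reviewer_1 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "exclude", "include", "exclude", "exclude"]
reviewer_2 = ["include", "include", "exclude", "exclude", "include",
              "exclude", "include", "include", "exclude", "exclude"]

kappa = cohens_kappa(reviewer_1, reviewer_2)
# Prespecified minimal cut-off from the text: kappa > 0.7.
proceed_with_split = kappa > 0.7
```

In this made-up example the reviewers agree on nine of ten articles, so the remaining articles would be split between them.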

Data were extracted using a prespecified data extraction sheet as previously described.2 Briefly, this included general study characteristics, study design and details on the PICO of interest. For diagnostic accuracy studies, the numbers of true positives, true negatives, false positives and false negatives (FN) were collected to calculate diagnostic test properties. An LR+ of >4 and an LR− of <0.4 were prespecified as minimum thresholds for clinically useful imaging tests. Rules of thumb for interpreting LR+ and LR− at the bedside have been described elsewhere.12
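As an illustration of how the extracted 2×2 counts translate into test properties, the following sketch computes sensitivity, specificity, LR+ and LR− for a single study and checks them against the prespecified thresholds (LR+ >4, LR− <0.4). The function names and the counts are hypothetical and do not correspond to any included study.

```python
def diagnostic_properties(tp, fp, fn, tn):
    """Sensitivity, specificity and likelihood ratios from a 2x2 table."""
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("need at least one diseased and one non-diseased participant")
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    # LR+ = sens / (1 - spec); undefined (infinite) when there are no false positives.
    lr_pos = sens / (1 - spec) if spec < 1 else float("inf")
    # LR- = (1 - sens) / spec; undefined (infinite) when there are no true negatives.
    lr_neg = (1 - sens) / spec if spec > 0 else float("inf")
    return {"sensitivity": sens, "specificity": spec,
            "lr_pos": lr_pos, "lr_neg": lr_neg}


def meets_prespecified_thresholds(lr_pos, lr_neg):
    """Thresholds from the text: LR+ > 4 and LR- < 0.4."""
    return lr_pos > 4 and lr_neg < 0.4


# Hypothetical study: 50 patients with GCA, 50 without.
props = diagnostic_properties(tp=44, fp=2, fn=6, tn=48)
useful = meets_prespecified_thresholds(props["lr_pos"], props["lr_neg"])
```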

RoB assessment was performed using the Quality Assessment of Diagnostic Accuracy Studies-2 (QUADAS-2) tool for diagnostic accuracy studies and the Quality In Prognosis Studies (QUIPS) tool for monitoring and prediction studies.13 14 No RoB assessment was performed for studies on technical aspects. For monitoring studies, the domain ‘prognostic factor measurement’ of the QUIPS was omitted because these studies lack such a factor by design. Overall RoB is depicted as ‘low’, ‘unclear’ and ‘high’ for QUADAS-2 and ‘low’, ‘moderate’ and ‘high’ for QUIPS (online supplemental tables S3a,b).

Data analysis

Meta-analyses were performed for diagnostic accuracy studies using the imaging findings as index test, and either clinical diagnosis or TAB as two separate reference standards. Ultrasound findings of different aspects of arterial inflammation, including halo sign, compression sign, stenosis and occlusion were pooled, and reported here as ‘halo sign’, because they represent different aspects of the same pathological process and because the field has converged on halo sign as the central imaging feature of GCA. If studies reported halo sign and other ultrasound features of GCA separately, the data relating specifically to the halo sign were used in the meta-analysis.

Studies were only included in the meta-analysis if they reported all findings that would allow the calculation of sensitivity and specificity. For all other diagnostic accuracy studies, as well as studies on monitoring, prediction and technical aspects, individual results are reported separately.

To conduct meta-analyses, data from the 2017 SLR2 and the current update were pooled. Random-effects bivariate generalised binomial mixed models were applied when four or more studies were available for a certain index test and reference standard combination. These models are appropriate because they account for the correlation between sensitivity and specificity and therefore estimate both together. If only three studies were available for a specific meta-analysis, univariate random-effects models were used instead. Meta-analysis was not performed if fewer than three studies were available. LR+/LR− was obtained using the delta method.15
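The delta-method step can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: it assumes the pooled model returns logit-scale estimates of sensitivity and specificity together with their variances and covariance, and it builds the CI on the log scale. The function name and all input values are hypothetical.

```python
import math


def expit(x):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-x))


def likelihood_ratios_delta(mu_sens, mu_spec, var_sens, var_spec, cov, z=1.96):
    """LR+ and LR- with delta-method CIs from logit-scale pooled estimates.

    Returns {"LR+": (estimate, lower, upper), "LR-": (estimate, lower, upper)}.
    """
    sens, spec = expit(mu_sens), expit(mu_spec)
    results = {}
    # Gradients of log(LR) with respect to (logit sens, logit spec):
    #   log LR+ = log(sens) - log(1 - spec)  ->  (1 - sens, spec)
    #   log LR- = log(1 - sens) - log(spec)  ->  (-sens, -(1 - spec))
    for name, log_lr, (g1, g2) in (
        ("LR+", math.log(sens) - math.log(1 - spec), (1 - sens, spec)),
        ("LR-", math.log(1 - sens) - math.log(spec), (-sens, -(1 - spec))),
    ):
        variance = g1 * g1 * var_sens + g2 * g2 * var_spec + 2 * g1 * g2 * cov
        se = math.sqrt(max(variance, 0.0))
        results[name] = (math.exp(log_lr),
                         math.exp(log_lr - z * se),
                         math.exp(log_lr + z * se))
    return results


# Hypothetical logit-scale inputs corresponding to sens = 0.88 and spec = 0.96.
lrs = likelihood_ratios_delta(
    mu_sens=math.log(0.88 / 0.12), mu_spec=math.log(0.96 / 0.04),
    var_sens=0.05, var_spec=0.5, cov=0.0,
)
```

Because the variance inputs here are invented, the resulting intervals are not expected to reproduce the pooled LR estimates reported in the Results.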

We report the results of the meta-analysis both for all studies and separately for studies with low RoB. Further subanalyses were performed comparing diagnostic properties between studies assessing cranial arteries (ie, temporal, facial, occipital and maxillary arteries) only and studies assessing both cranial and extracranial arteries using imaging.

Sensitivity analyses were conducted considering studies, in which: (1) all patients were glucocorticoid naïve before imaging; (2) for ultrasound, ≥15 MHz probes were used; (3) for MRI, a 3 Tesla machine was used; (4) the clinical diagnosis was confirmed at a follow-up visit; (5) the index test and the reference standard were performed independently from one another (ie, without the assessors of one test being aware of the results of the other test). Sensitivity analyses were performed for all studies fulfilling the mentioned criteria, as well as restricted to the subset of these studies with low RoB whenever possible.

All analyses were performed in R V.4.2.1 using the ‘lme4’, ‘msm’ and ‘forestploter’ packages.


From a total of 6578 screened references, 38 eligible studies were found through the update. Together with the 43 studies identified by the published 2017 SLR, this gave 81 studies in total (online supplemental figure S1). Some of the 38 studies addressed multiple index tests, reference standards, research questions and mixed populations of GCA and TAK patients.

For GCA, 22 studies on diagnostic accuracy,3–5 7 16–33 8 on monitoring disease activity,34–41 5 on outcome prediction4 34 42–44 and 5 on technical aspects8 30 43 45 46 were found in the update. For TAK, there were no studies on diagnostic accuracy, four on monitoring38 41 47 48 and one each on prediction44 and technical aspects.46 No studies on isolated aortitis satisfied the inclusion criteria.

Diagnostic accuracy studies

Regarding diagnostic accuracy studies in GCA, 17 were on ultrasound,3 5 7 16–29 3 on MRI,5 30 31 6 on FDG-PET3 4 23 25 32 33 and 1 on FA in the update.5 No studies were found for CT. Study characteristics and RoB assessment are depicted in tables 1–3 and online supplemental table S4a,b respectively.

Table 1

Characteristics of diagnostic accuracy studies on ultrasound in GCA

Table 2

Characteristics of diagnostic accuracy studies on MRI in GCA

Table 3

Characteristics of diagnostic accuracy studies on FDG-PET in GCA


For the meta-analysis of ultrasound, with the clinical diagnosis as reference standard, and considering both the current update and the previous SLR, there were 12 (4 with low,3 7 21 22 7 with unclear5 16 18–20 23 24 and 1 with high RoB17) and 11 studies (4 with low,49–52 2 with unclear53 54 and 5 with high RoB),55–59 respectively, summing up to a total of 1981 patients included. Pooled sensitivities and specificities were 75% (95% CI 66% to 83%) and 91% (95% CI 86% to 94%) when considering all studies (online supplemental figure S2) and 88% (95% CI 82% to 92%) and 96% (95% CI 86% to 99%) when focusing on low RoB studies (figure 1). For the latter group, pooled LR+ was 20.07 (95% CI 6.23 to 64.64) and pooled LR− was 0.13 (95% CI 0.09 to 0.18).
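To make the pooled likelihood ratios more tangible, the sketch below converts a pretest probability into a post-test probability using the standard odds form of Bayes' theorem. The LR values are the pooled low RoB ultrasound estimates reported above; the pretest probability of 50% is purely hypothetical and chosen only for illustration.

```python
def post_test_probability(pre_test_probability, likelihood_ratio):
    """Post-test probability via odds: post-test odds = pretest odds x LR."""
    if not 0 < pre_test_probability < 1:
        raise ValueError("pretest probability must lie strictly between 0 and 1")
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


PRETEST = 0.50  # hypothetical pretest probability of GCA

# Pooled estimates for ultrasound from low RoB studies (reported above).
after_positive_scan = post_test_probability(PRETEST, 20.07)
after_negative_scan = post_test_probability(PRETEST, 0.13)
```

Under this assumed pretest probability, a positive ultrasound would raise the probability of GCA to roughly 95% and a negative one would lower it to roughly 12%; other pretest probabilities give different values.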

Figure 1

Diagnostic performance of ultrasound, MRI and FDG-PET in comparison with clinical diagnosis as reference standard for the diagnosis of GCA according to low RoB studies. Diagnostic performance according to all studies is depicted in online supplemental figure S3–S5. This plot only contains studies with low risk of bias. FN, false negative; FP, false positive; FDG-PET, fluorodeoxyglucose positron emission tomography; GCA, giant cell arteritis; N, number of participants; RoB, Risk of bias; TN, true negative; TP, true positive.

Pooling estimates from all studies assessing both cranial and extracranial arteries resulted in a higher sensitivity and similar specificity as compared with the analysis of only cranial arteries (see table 4). This result was also seen when filtering for low RoB studies (Sens: 93% (95% CI 88% to 96%) vs 80% (95% CI 71% to 87%) and Spec: 94% (95% CI 83% to 98%) vs 97% (95% CI 71% to 100%)) as shown in online supplemental table S5. The most frequently assessed extracranial arterial territories were the axillary (7/7 studies)3 7 21 25 26 51 54 and the common carotid arteries (4/7 studies).3 7 25 51 Sensitivity analyses revealed similar results, either when all studies were considered or when only low RoB studies were analysed (table 4 and online supplemental table S5).

Table 4

Subgroup and sensitivity analyses of all studies on diagnostic properties for giant cell arteritis using imaging as index test and clinical diagnosis as reference standard

Two studies (one with low26 and one with high RoB25) were not included in the meta-analysis due to missing data. Imfeld et al reported a sensitivity of 57% and a specificity of 97%,25 while van der Geest et al reported diagnostic test properties for the halo count (Sens: 78%, Spec: 55%) and the halo score (Sens: 78% and Spec: 61%) separately. Halo count and halo score include bilaterally the common temporal artery, the parietal and frontal branch as well as the axillary artery and score each segment with 0/1 or 0–3, respectively.26

Similar results for ultrasound were also found when using TAB as the reference standard. Five studies from the update (1 low,29 4 moderate RoB18 24 27 28) and 10 studies from the previous SLR (1 low,60 1 moderate,54 8 high RoB),55–59 61–63 including 1126 participants, were analysed. These revealed pooled sensitivities of 73% (95% CI 61% to 83%) and specificities of 84% (95% CI 74% to 90%).


For MRI, with the reference standard defined as clinical diagnosis, there were three new studies (two low,30 31 one moderate RoB5) and five studies from the previous SLR (one low,64 three moderate65–67 and one high RoB68) included in the meta-analysis (492 participants). All these studies assessed exclusively cranial vessels. Pooled sensitivity was 82% (95% CI 76% to 86%) and specificity was 92% (95% CI 84% to 97%) for all studies and 81% (95% CI 71% to 89%) and 98% (95% CI 89% to 100%) for studies with low RoB, respectively (figure 1 and online supplemental figure S3). For the group of studies with low RoB, pooled LR+ was 49.67 (95% CI 7.09 to 348.1) and pooled LR− was 0.19 (95% CI 0.12 to 0.31). Sensitivity analyses assessing only studies using 3 Tesla MRI, or studies in which the index test and reference standard were performed independently, confirmed the primary result (table 4, online supplemental table S5). No new studies on diagnostic properties of MRI with the reference standard TAB were found.

Fluorodeoxyglucose positron emission tomography

Four new studies (three with low,3 4 32 one with unclear23 RoB) and one study with low RoB69 from the previous SLR on the value of FDG-PET in regard to the clinical diagnosis were included in the meta-analysis (n=259 participants). Pooled sensitivities and specificities were 80% (95% CI 70% to 97%) and 91% (95% CI 67% to 98%) for all studies, respectively (online supplemental figure S4), as well as 76% (95% CI 67% to 83%) and 95% (95% CI 71% to 99%) for studies with low RoB, respectively (figure 1), with pooled LR+ of 14.5 (95% CI 2.21 to 94.96) and pooled LR− of 0.26 (95% CI 0.18 to 0.37). Sensitivity analyses revealed similar results (table 4, online supplemental table S5). Two studies with high RoB that were not included in the meta-analysis due to missing data reported sensitivities of 72%25 and 89%,33 as well as specificities of 85% and 73%.

Only two studies with low RoB reported diagnostic properties for FDG-PET with TAB as reference standard (sensitivity: 92% (95% CI 62% to 100%) and specificity: 85% (95% CI 71% to 94%);4 sensitivity: 70% (95% CI 42% to 98%) and specificity: 86% (95% CI 72% to 100%)).32

FA and indocyanine green angiography

One study with high RoB evaluated the value of FA and indocyanine green angiography (ICG).5 The delay in choroidal vessel filling (FA) or the presence of non-vascularised choroidal areas (ICG) were considered as positive imaging tests. Using the clinical diagnosis as reference standard, the authors found a sensitivity of 88% (95% CI 69% to 97%) and a specificity of 74% (95% CI 49% to 91%).

Direct comparisons of imaging techniques

Four studies directly compared ≥2 imaging techniques for GCA diagnosis (online supplemental table S6a,b).

Three studies (one with low,3 two with high23 25 RoB) used both ultrasound and FDG-PET as index test. Imfeld et al25 and Rottenburger et al23 reported a lower sensitivity (57% vs 94% and 53% vs 94%) and higher specificity (97% vs 59% and 94% vs 59%) for ultrasound compared with FDG-PET, respectively. While Imfeld et al assessed only cranial arteries with ultrasound and FDG-PET, Rottenburger et al investigated only extracranial arteries using both imaging modalities. Nielsen et al3 found a higher sensitivity (91% vs 79%) but a lower specificity (79% vs 100%) for ultrasound compared with FDG-PET, assessing both cranial and extracranial arteries.

Lecler et al5 (high RoB) compared ultrasound (Sens: 86%, Spec: 94%), MRI (100%, 84%) and FA/ICG (88%, 74%) at cranial arteries, reporting the highest sensitivity for MRI and the highest specificity for ultrasound.

Monitoring disease activity

Ten prospective studies assessing ultrasound or contrast enhanced ultrasound (CEUS),34–37 47 48 FDG-PET,38–40 MR-angiography (MRA)41 and CT-angiography (CTA)41 were retrieved. Overall, all imaging techniques were found to be capable of detecting vascular changes during follow-up, indicating their potential usefulness as tools for monitoring disease activity. A summary of the evidence for each imaging modality is provided below, while detailed information of each study is reported in online supplemental table S7.


Six studies (three with low,35–37 two with moderate34 48 and one with high RoB47) investigated the role of ultrasound in GCA34–37 and of ultrasound with or without contrast enhancement (CE) in TAK.47 48

With regard to GCA, the new studies confirmed the findings of the previous SLR that, after treatment initiation, the improvement34 36 37 and normalisation34 of wall thickening tended to occur earlier and more consistently in temporal arteries as compared with large vessels. Moreover, some new evidence emerged: throughout follow-up, a weak to moderate correlation was observed between the halo sign (number of halos and intima–media thickness (IMT) of vessels affected by the halo) of temporal and axillary arteries and disease activity markers, assessed through the Birmingham vasculitis activity score (BVAS), or acute phase reactants (erythrocyte sedimentation rate, C reactive protein (CRP)).35 36 Furthermore, a new ultrasound score to monitor disease activity (the OMERACT GCA Ultrasonography Score) was presented and tested prospectively, showing a large to very large magnitude of change (standardised mean difference between −1.19 and −2.16) over a 24-week period of follow-up after treatment initiation.35 This score is calculated as the sum of the IMT measured in every segment (bilateral common temporal arteries, frontal and parietal branches as well as axillary arteries), each divided by the rounded cut-off value of the IMT for that segment, and subsequently divided by the number of segments available.
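The score calculation described above amounts to the mean of the IMT-to-cut-off ratios over all assessable segments, which the following sketch illustrates. The cut-off values in `ASSUMED_CUTOFFS_MM` are assumptions introduced for this example, since the text does not list them, and the IMT measurements are hypothetical; the published cut-offs should be substituted before any real use.

```python
# Assumed rounded IMT cut-offs in mm per segment type (not given in the text).
ASSUMED_CUTOFFS_MM = {
    "common_temporal": 0.4,
    "frontal": 0.3,
    "parietal": 0.3,
    "axillary": 1.0,
}


def ultrasonography_score(imt_mm):
    """Mean of IMT / cut-off over the segments that could be measured.

    imt_mm maps (side, segment) to the IMT in mm, or None if not assessable.
    """
    ratios = [
        imt / ASSUMED_CUTOFFS_MM[segment]
        for (_side, segment), imt in imt_mm.items()
        if imt is not None
    ]
    if not ratios:
        raise ValueError("no assessable segments")
    # Sum of ratios divided by the number of segments available.
    return sum(ratios) / len(ratios)


# Hypothetical examination: eight segments, one of them not assessable.
example_imt = {
    ("left", "common_temporal"): 0.6,
    ("right", "common_temporal"): 0.4,
    ("left", "frontal"): 0.45,
    ("right", "frontal"): 0.3,
    ("left", "parietal"): 0.3,
    ("right", "parietal"): None,
    ("left", "axillary"): 1.5,
    ("right", "axillary"): 1.0,
}
score = ultrasonography_score(example_imt)
```

Dividing by the number of available segments keeps the score comparable between examinations in which some segments could not be visualised.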

For TAK, two prospective studies showed the association of ultrasound and CEUS47 48 findings with disease activity; specifically, a reduction of the mean IMT was observed from baseline to 3 months follow-up (2.24 mm vs 1.85 mm, p=0.041). Moreover, CE in the carotid wall decreased more frequently in patients who were clinically active at baseline as compared with those who were inactive (overall, patients having a grade of CE>2 decreased from 53% to 34%, p=0.036).47 48

Positron emission tomography

Two studies with low RoB in GCA39 40 and one study with moderate RoB in both GCA and TAK38 evaluated the role of FDG-PET for monitoring disease activity.

Sammel et al40 reported a decrease of the Total Vascular Score (TVS, considering 18 different vascular segments; scale: 0–54) in both cranial and extracranial arteries 6 months after treatment initiation in newly diagnosed GCA patients compared with baseline. Moreover, Quinn et al39 demonstrated that in tocilizumab-treated patients, the PET Vascular Activity Score (PETVAS, evaluating 4 segments of the aorta and 11 branch arteries; scale: 0–27) decreased over time. Six months after treatment discontinuation, PETVAS increased in a small proportion of patients. The third study, which included both GCA and TAK patients, reported a decrease in arterial standardised FDG uptake values (SUV) among patients with an increase in immunosuppressive treatment, whereas those without a change in medication showed stable target to background ratio (obtained by the normalisation of the vascular FDG uptake to liver SUV) values.38


One study with moderate RoB, including both GCA and TAK patients, presented and validated a new quantitative composite score for clinical trials using CTA or MRA. This score evaluated the presence of dilation and stenosis in 17 arterial territories.41 The authors reported a larger change of the score in patients who were clinically active at baseline as compared with those who were inactive according to the Indian Takayasu Clinical Activity Score (Δ between active and inactive patients in the imaging change score: 1.56 (0.66), p=0.020), the change of patient global assessment (3.09 (1.02), p=0.003), but not the change of National Institutes of Health score (1.19 (1.0), p=0.26). Moreover, using the radiologist’s assessment as reference standard, the score showed excellent discrimination (area under the receiver operating characteristic curve=0.996, sensitivity 77%, specificity 96%) for detecting disease progression.

Outcome prediction

Five prospective studies providing information on the role of ultrasound in GCA patients34 42 43 and FDG-PET in GCA40 and in both GCA and TAK patients44 for outcome prediction were included. No data were retrieved for the remaining imaging techniques. Overall, imaging did not predict response to treatment, occurrence of relapses or ischaemic complications. The presence of FDG uptake at baseline, however, was linked with the subsequent development of angiographic changes (stenosis, occlusion or aneurysm) in both GCA and TAK (online supplemental table S8).


Three studies (two with low,42 43 one with moderate RoB34), including GCA patients only, provided information on this aspect. Over a 2-year period, no differences were found between patients with and without a reduction of IMT during follow-up concerning subsequent relapses, cumulative glucocorticoid doses and need for glucocorticoid sparing agents.34 Moreover, no differences in terms of new ischaemic complications were reported between patients with or without a halo sign at baseline.43 Lastly, a composite model, taking into account the IMT of temporal and axillary arteries as well as the bilateral presence of the halo in the same vascular territories, showed no predictive value for any of these ultrasound parameters concerning the clinical outcomes evaluated (ie, visual loss, vascular damage index, glucocorticoid doses >10 mg/day of prednisone equivalent at 6 months and/or the need for adjunctive immunosuppressants).42

Positron emission tomography

One study with low RoB, including only GCA patients, reported that a baseline TVS>10 could not predict relapses, glucocorticoid cumulative doses and CRP levels over a 12-month follow-up.40 Another study with low RoB and a mixed population of both GCA and TAK patients revealed that territories with abnormal FDG uptake at baseline were at a higher risk of developing subsequent angiographic changes (OR 19.5, 95% CI 2.44 to 156).44

Technical aspects of imaging techniques

Five studies providing information on technical aspects of ultrasound,8 43 FDG-PET45 46 and MR-A30 were included. The time points after treatment initiation and examination of vascular territories were found to be important factors determining the sensitivity of ultrasound and FDG-PET. Details of each study are shown in online supplemental table S9. No information was retrieved for the remaining imaging techniques.


Two cross-sectional studies were eligible.8 43 A subanalysis of the Temporal Artery Biopsy versus ULtrasound study evaluated the temporal artery IMT during the first week after glucocorticoid treatment initiation.43 The results showed that the IMT was greater in patients undergoing ultrasound evaluation at the day of treatment start compared with those having the ultrasound exam 1 week after glucocorticoids were initiated (r=0.30, p=0.001). The second study compared the ultrasound examination of temporal arteries and axillary arteries (visualised from the axilla) with an extended protocol additionally including the proximal axillaries and brachial, the subclavian and the common carotid arteries.8 The extended approach was more sensitive in detecting large vessel inflammation than the limited one (p<0.001).

Positron emission tomography

Two prospective studies were retrieved.45 46 In the article by Nielsen et al, 24 glucocorticoid-naïve GCA patients with a positive FDG-PET exam at baseline were included and divided into two groups undergoing a second scan either at day 3 or day 10 after starting glucocorticoids.45 While 100% of patients in the first group still had a positive scan after 3 days, only 36% of patients in the second group showed tracer uptake after 10 days. The second study included both GCA and TAK patients. More scans were considered positive with an acquisition time of 2 hours (77%) as compared with 1 hour (56%, p<0.01).46


A single cross-sectional study on GCA patients, comparing different reformatting techniques and using TAB or the clinical diagnosis as reference standard, was included.30 The multiplanar reformatting 3D contrast enhanced vessel wall (CEVW) MR had a specificity of 100% to show inflammatory changes of extracranial arteries and was more sensitive than the axial only 3D (80% vs 73%, p=0.046) and 2D CEVW MRI (80% vs 70%, p=0.03).


This SLR confirms the good diagnostic accuracy of ultrasound and MRI, and furthermore shows a comparable performance of FDG-PET with these techniques for diagnosing GCA, taking clinical diagnosis as the reference standard. Ultrasound and FDG-PET studies also included extracranial arteries in their assessment for GCA, while for MRI, data are only available on cranial arteries. While studies reporting direct comparisons of imaging techniques were few and in part contradictory, indirect comparisons suggested that ultrasound might have a higher sensitivity than MRI and FDG-PET with a comparable specificity.

Since our first SLR, containing studies until 2017,2 several meta-analyses have reported diagnostic properties of ultrasound in GCA patients. While Nakajima et al70 found pooled estimates (Sens: 86%, Spec: 95%) similar to ours (Sens: 88%, Spec: 96%), Sebastian et al71 reported a lower sensitivity (Sens: 67%, Spec: 95%) and Rinagel et al72 a lower sensitivity and specificity (Sens: 68%, Spec: 81%).

One possible explanation for these differences is that the other reviews included retrospective as well as prospective studies, generally leading to a more heterogeneous sample with higher RoB. Rinagel et al only included studies assessing temporal arteries, which may have reduced the sensitivity to detect pathology in patients with LV-GCA compared with studies also assessing extracranial arteries.
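To make the comparison of pooled estimates more tangible, the sensitivity and specificity pairs quoted above can be converted into likelihood ratios using the standard definitions LR+ = sensitivity / (1 − specificity) and LR− = (1 − sensitivity) / specificity. The following Python sketch is illustrative only: it works from the rounded point estimates given in the text, so the results are not the LR+/LR− values estimated in any of the meta-analyses, and the 50% pre-test probability is a hypothetical value chosen for the example. The function and variable names are likewise assumptions made for this sketch.

```python
# Illustrative only: likelihood ratios derived from the rounded pooled
# sensitivity/specificity values quoted in the text. These are NOT the
# LR+/LR- values estimated by the meta-analyses themselves.

def likelihood_ratios(sensitivity, specificity):
    """Return (LR+, LR-) for a test with the given sensitivity and specificity."""
    if not (0 < sensitivity < 1 and 0 < specificity < 1):
        raise ValueError("sensitivity and specificity must lie strictly between 0 and 1")
    lr_pos = sensitivity / (1 - specificity)
    lr_neg = (1 - sensitivity) / specificity
    return lr_pos, lr_neg


def post_test_probability(pre_test_probability, likelihood_ratio):
    """Convert a pre-test probability into a post-test probability via odds."""
    pre_odds = pre_test_probability / (1 - pre_test_probability)
    post_odds = pre_odds * likelihood_ratio
    return post_odds / (1 + post_odds)


# Pooled ultrasound estimates as quoted in the text: (sensitivity, specificity).
pooled_estimates = {
    "This SLR": (0.88, 0.96),
    "Nakajima et al": (0.86, 0.95),
    "Sebastian et al": (0.67, 0.95),
    "Rinagel et al": (0.68, 0.81),
}

# Hypothetical pre-test probability, assumed for illustration (not from the paper).
HYPOTHETICAL_PRE_TEST = 0.50

for label, (sens, spec) in pooled_estimates.items():
    lr_pos, lr_neg = likelihood_ratios(sens, spec)
    p_if_positive = post_test_probability(HYPOTHETICAL_PRE_TEST, lr_pos)
    p_if_negative = post_test_probability(HYPOTHETICAL_PRE_TEST, lr_neg)
    print(
        f"{label}: LR+ = {lr_pos:.1f}, LR- = {lr_neg:.2f}, "
        f"post-test probability {p_if_positive:.0%} if positive, "
        f"{p_if_negative:.0%} if negative"
    )
```

With these rounded inputs, a sensitivity of 88% and a specificity of 96% correspond to an LR+ of about 22 and an LR− of about 0.125, whereas a specificity of 81% pulls the LR+ below 4. This shows why a moderate drop in specificity matters more for ruling in the disease than a similar drop in sensitivity.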

An important result of our SLR was the observation that the ultrasound assessment of cranial plus extracranial arteries had a higher sensitivity than the investigation of cranial arteries alone, while specificity was comparable. The increased sensitivity for a GCA diagnosis can be explained by the higher number of scanned arteries and therefore higher chance of finding imaging signs of vasculitis. To improve sensitivity, it may be sufficient to assess axillary arteries in addition to temporal arteries rather than applying a more extended scanning protocol. The axillary arteries were the most frequently scanned extracranial vessels in these studies and the data suggest that there is no incremental benefit of including additional extracranial arteries.7 51 73 A separate analysis of the diagnostic accuracy of FDG-PET and MRI studies for cranial and extracranial arteries was not possible because the number of studies reporting sensitivity and specificity for cranial or extracranial arteries separately was insufficient.

We only included prospective and cross-sectional studies in our SLR to increase precision, but also to reduce the RoB, which is often high in retrospective studies. Selection bias may occur when imaging is only performed in certain patient groups (eg, dubious cases), rather than in all patients with suspected disease. Expectation bias may lead to an overestimation of diagnostic properties when the imaging assessor is also aware of the clinical symptoms of a patient (which is common in retrospective studies). Lastly, the selection bias inherent in case–control studies can lead to an overestimation of the value of the imaging technique, as controls are usually not patients with suspected GCA, but rather healthy controls or patients with other diseases, leading to an unrealistically large contrast between cases and controls. We have, therefore, opted for the stricter approach of excluding both retrospective and case–control studies. We believe that further meta-analyses in the field should take a similar approach, which is now possible owing to the large number of high-quality studies.

For ultrasound studies, we pooled the results of different imaging abnormalities (ie, halo sign, compression sign, occlusion and stenosis) as they not only describe the same underlying inflammatory process, that is, inflammatory thickening of the arterial wall, but also have similar diagnostic properties.2 49 50 This reflects new insights in the field and the clarity gained over recent years of GCA research.

SLRs including meta-analyses on FDG-PET have also been published before, reporting similar results to ours. These reviews, however, used the 1990 ACR criteria and TAB rather than the clinical diagnosis as reference standard.74 75 TAB has a considerable risk of FN results, especially in GCA patients with large vessel involvement, while the 1990 ACR criteria, also relying on TAB, largely focus on cranial symptoms that are not necessarily present in LV-GCA.76–79 Moreover, when assessing the diagnostic performance of an imaging test, we are interested in its performance compared with the clinical diagnosis and not with classification criteria that are intended to select a homogeneous population for clinical trials. Classification criteria should not be used for clinical diagnosis.

Using the clinical diagnosis as reference standard has the disadvantage of potential circular reasoning as imaging data are often considered to support the diagnosis. Nevertheless, when performing sensitivity analyses focusing on studies with independent index tests and reference standards, we did not observe different results concerning diagnostic properties, which confirms the robustness of our findings.

Only a single study on ocular imaging techniques was found by the SLR, reporting lower diagnostic yields than ultrasound and MRI.5 The role of this technique, as well as that of other ocular imaging modalities such as OCT, ultra-wide-field colour fundus photography or transocular sonography in LVV, remains unclear.6 80 81

Evidence on the role of imaging for monitoring disease activity or outcome prediction in LVV is still limited but, compared with the 2017 SLR,2 some new data emerged for ultrasound and FDG-PET reporting a change of imaging signs of inflammation after treatment initiation.

In previous studies, FDG-PET did not seem to discriminate between clinically active and inactive disease, while newer studies revealed an association between FDG uptake and change in treatment in both GCA and TAK.

There is growing evidence that the reduction of ultrasound-measured IMT occurs earlier in temporal than in axillary arteries. We can only speculate whether the different ultrasonographic behaviour of temporal arteries and large vessels is due to distinct histopathological features of the arteries or to technical limitations of ultrasound assessment (ie, some persistent thickening may also be present in the temporal arteries but remain below the technical detection limit). The latter hypothesis is supported by histology studies that report a persisting T helper 1-driven tissue inflammation in TABs despite clinical remission.82 83

Weak to moderate associations between ultrasound signs of activity and markers of disease activity were found. New data on the sensitivity to change of different imaging techniques have emerged and imaging scores for ultrasound and MRI reported good results for construct validity and reliability.35 41 However, studies using imaging as a monitoring tool, especially to assess its value in guiding treatment decisions compared with a clinical approach alone, are still missing.

The number of studies on the predictive value of imaging was low. In line with previous results,2 no association was found between imaging and relapses, cumulative glucocorticoid doses or new ischaemic complications in LVV.34 40 43 Retrospective studies have reported partially contradictory results for the predictive value of ultrasound with regard to cumulative glucocorticoid doses and relapses; these findings need further investigation in prospective cohorts.76 79 84

Some studies have reported on the technical aspects of LVV imaging, highlighting the importance of timely imaging after treatment initiation. They have shown that the interpretation of activity based on FDG-PET and MR-A also depends on acquisition times and reformatting methods, respectively. Timing remains a crucial factor in performing imaging: the sooner it is performed, the more sensitive the test. However, it is unclear what the time limit is for obtaining positive results, and whether this differs across imaging modalities.

A strength of our SLR is the relatively high number of studies with a low RoB on diagnostic properties for ultrasound, MRI and FDG-PET, allowing us to present sensitivities and specificities, together with LR+/LR− of these studies as the main results of this SLR. The results of sensitivity analyses were in line with the primary results, confirming the robustness of our data. Having excluded retrospective and case–control studies increased the quality and precision of our data. The main limitations are the lack of new literature on the diagnostic properties of imaging in TAK and the heterogeneity and low number of prospective monitoring and prediction studies.

In summary, we report good diagnostic performances of ultrasound, MRI and FDG-PET for the clinical diagnosis of GCA and a better diagnostic sensitivity when adding the axillary arteries to the ultrasound examination. Prospective cohort studies on monitoring disease activity and on the prediction of relevant outcomes are needed, as well as studies on the value of imaging for the diagnosis of TAK. This SLR will inform the task force updating the EULAR recommendations on the use of imaging in LVV.9

Data availability statement

All data relevant to the study are included in the article or uploaded as online supplemental information.

Ethics statements

Patient consent for publication



  • Twitter @philipp_j_bosch, @milena_bond, @cristinadbponte, @sofiaramiro82

  • PB and MB contributed equally.

  • Contributors All authors helped design the study. LF performed the literature search. PB and MB performed literature screening, data extraction, risk of bias assessment and meta-analysis. All authors participated in the interpretation of the data. PB and MB prepared the manuscript. PB and SR are responsible for the overall content as the guarantors. All authors critically appraised the manuscript for important intellectual content and approved the final manuscript.

  • Funding Conduct of this review was financially supported by the European Alliance of Associations for Rheumatology.

  • Disclaimer The views expressed in this article are those of the authors and not necessarily those of the NIHR or the Department of Health and Social Care.

  • Competing interests PB has received speaker fees from Janssen and project grants from Pfizer. MB has received research grants from AbbVie. CD has received consulting/speaker’s fees from AbbVie, Eli Lilly, Janssen, Novartis, Pfizer, Roche, Galapagos and Sanofi, all unrelated to this manuscript. He is an editorial board member of ARD. WAS has received speaker honoraria from AbbVie, Amgen, Bristol Myers Squibb, Chugai, Lilly, Johnson & Johnson, Medac, Novartis, Pfizer, Roche, Sanofi, and UCB; consultancy fees from AbbVie, Amgen, Bristol Myers Squibb, Chugai, GlaxoSmithKline, Novartis, Roche, and Sanofi. He is principal investigator of phase 2 and phase 3 trials sponsored by AbbVie, GlaxoSmithKline, Novartis, and Sanofi. CP is or has been the principal investigator of studies by AbbVie, Sanofi and Novartis and has received consulting/speaker’s fees from CSL Vifor, AbbVie, AstraZeneca, GlaxoSmithKline and Roche, all unrelated to this manuscript. SLM reports: Consultancy on behalf of her institution for Roche/Chugai, Sanofi, AbbVie, AstraZeneca, Pfizer; Investigator on clinical trials for Sanofi, GSK, Sparrow; speaking/lecturing on behalf of her institution for Roche/Chugai, Vifor, Pfizer, UCB, Novartis and AbbVie; chief investigator on STERLING-PMR trial, funded by NIHR; patron of the charity PMRGCAuk. No personal remuneration was received for any of the above activities. Support from Roche/Chugai to attend EULAR2019 in person and from Pfizer to attend ACR Convergence 2021 virtually. SLM is supported in part by the NIHR Leeds Biomedical Research Centre. The views expressed in this article are those of the authors and not necessarily those of the NIHR, the NIHR Leeds Biomedical Research Centre, the National Health Service or the UK Department of Health and Social Care. SR has received research grants and/or consultancy fees from AbbVie, Eli Lilly, Galapagos, MSD, Novartis, Pfizer, UCB.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.