Objective Axial spondyloarthritis (axSpA) comprises both radiographic and non-radiographic disease. However, the paucity of specific objective measures for the disease and current classification criteria showing suboptimal specificity contribute to disease heterogeneity observed in clinical practice and research. We used a historical cohort of patients with axSpA to assess sources of heterogeneity.
Methods The study involved 363 axSpA probands recruited from membership of the Swiss Ankylosing Spondylitis Patient Society. Participants underwent examination by a rheumatologist, completed questionnaires and provided blood samples for HLA typing. Patients underwent radiography of sacroiliac joints and were categorised according to the New York (NY) criteria (ankylosing spondylitis (AS) or non-radiographic axSpA (nr-axSpA)) and HLA-B27 status. Genetic characterisation by single nucleotide polymorphism microarray was performed and AS polygenic risk scores (PRS) were calculated.
Results Considerable heterogeneity was observed. The male to female ratio for AS (NY+) was 3:1, but 1:1 for nr-axSpA. For HLA-B27(+) AS, the ratio was 2.5:1, but nearly 1:1 for HLA-B27(−) disease. Women with nr-axSpA had strikingly lower mean PRS and lower HLA-B27 prevalence than men with nr-axSpA or NY(+) male and female patients with AS. PRS was able to distinguish male but not female patients with nr-axSpA from related healthy first-degree relatives. Radiographic sacroiliitis was strongly associated with HLA-B27, especially in men.
Conclusion Women clinically diagnosed with axSpA but without radiographic sacroiliitis as a group have a disease that is distinct from AS by the modified New York criteria overall and from nr-axSpA in men. Given the high degree of heterogeneity, stratified or adjusted analysis of effectiveness studies is indicated, taking genetics, sex and radiographic damage (sacroiliitis) into account.
- Ankylosing Spondylitis
- Polymorphism, Genetic
Data availability statement
Data are available upon reasonable request.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known about this subject?
Ankylosing spondylitis (AS), now called axial spondyloarthritis (axSpA), comprises both radiographic and non-radiographic disease (nr-axSpA).
The new broader axSpA concept encompasses a heterogeneous group of conditions.
What does this study add?
Women with nr-axSpA as a group have a disease that is distinct from AS by the modified New York criteria and from men with nr-axSpA.
How might this impact on clinical practice or further developments?
To be meaningful, clinical axSpA studies should be analysed taking genetics, sex and structural damage into consideration.
Findings are also relevant for patient education and counselling.
Ankylosing spondylitis (AS) is quite a common and frequently familial inflammatory rheumatic disorder that is largely genetically determined and strongly associated with the HLA-B27 allele. The estimates of the prevalence of ‘radiographic AS’, as defined by the modified New York (mNY) criteria, range between 0.1% and 0.4%, mostly influenced by the frequency of this genetic factor in the population.1–8
During the last few decades, the concept of AS has widened. It has been realised that radiographic sacroiliitis is by no means an early or obligatory manifestation of the disease.9 10 This has led to the concept of axial spondyloarthritis (axSpA), comprising both radiographic AS and non-radiographic axSpA (nr-axSpA). Nowadays, the disease is often called axSpA, a term we use to refer to the full spectrum of the disease. The Assessment of SpondyloArthritis international Society (ASAS) has developed classification criteria for axSpA as a single disorder.11–14 These criteria are primarily intended for classification purposes and were not developed or validated for use in the diagnosis of individual patients. However, in daily practice the distinction might be less clear. Current criteria lack sensitivity and, in particular, specificity, although the CLASSIC study, which aims to improve the performance of the ASAS criteria, is underway.13 In the absence of a gold standard for a disease, look-alike conditions may give rise to false-positive diagnoses and induce or increase heterogeneity among patients diagnosed with conditions such as axSpA. Clinical heterogeneity might cause unwanted disparities in aetiopathogenesis, prognosis, outcome and response to treatment with conventional medications or biologicals.15 16 In this study we aim to assess potential sources of heterogeneity by using data from a family study that originated in 1985, that is, long before the concept of axSpA was coined. This implies that our data cannot be confounded by the introduction of the more recent classification criteria.
The study presented here is based on the findings of the 1985 baseline study and the 2019 follow-up study.
The baseline study required five steps. First, in 1985, all members of Schweizerische Vereinigung Morbus Bechterew, the nationwide Swiss Ankylosing Spondylitis Patient Society, and their first-degree relatives (FDR) were invited to participate in a family study that was performed in centres spread all over Switzerland. Informed consent of participants was obtained. For all 363 probands, the diagnosis AS (now axSpA) had been established by a Swiss rheumatologist before they became members of the patient society. A total of 1178 persons consented to participate and completed questionnaires on disease manifestations. Second, the clinical diagnosis was established based on the clinical history and examination by the rheumatologist at the study centre, who was blinded to any radiographic findings. All participants underwent physical examination of their axial and peripheral joints. Third, blood samples were drawn for HLA-A, HLA-B and HLA-C typing and peripheral blood nucleated cells (PBNCs) were stored in liquid nitrogen. Fourth, pelvic radiographs were taken to assess the presence of sacroiliitis. Consenting non-pregnant participants aged ≥18 years underwent pelvic radiography unless a recent radiograph was available. Fifth, sacroiliac (SI) joints were scored according to the mNY criteria. All 1081 pelvic radiographs of 360 probands, 713 FDR and 8 spouses were assessed twice by each of four experienced readers, that is, a total of eight (sometimes nine) blinded readings for each SI joint. Reading could be performed only once for 46% of the 360 radiographs of the probands and 3% of the 713 radiographs of the FDR because these radiographs were only available on-site for a few hours at the time of participants’ physical examination at the local hospital. All the readers were unaware of participants’ clinical findings and HLA-B27 status. Overall, 17.2% of 1081 radiographs were read once, 0.4% two to four times, 3.2% five to seven times and 79.2% eight to nine times.
The sacroiliitis score ranged from 0 (normal) to 4 (ankylosis) for each SI joint assessment by a reader as per the mNY scoring system.17 Scores for a single SI joint were added and divided by the number of assessments (range 1–9). Probands whose mean scores reached neither grade 2.0 bilaterally nor grade 3.0 unilaterally were considered not to fulfil the radiographic mNY criteria. Probands were therefore categorised as AS if the mNY criteria were met and as nr-axSpA if the radiographic mNY criteria were not fulfilled.17 Please note that this classification was done long before the introduction of the ASAS criteria or the availability of MRI scanning.11 12 Furthermore, at the group level, mean SI grades were calculated by sex, HLA-B27 and radiographic status. Interobserver and intraobserver reliability were assessed for five observers by evaluating a subset of 243 pelvic films. Observers read the films twice in sets of 40–50 radiographs. The interval between both readings was ≥7 days.18
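The scoring rule described above can be sketched in a few lines of Python. This is only an illustration of the averaging and thresholding as we read them from the text, not the code used in the study; the function names and the example readings are hypothetical, and only the radiographic part of the mNY criteria is covered.

```python
from statistics import mean


def mean_si_score(readings):
    """Average the 0-4 grades given to one SI joint across 1-9 readings."""
    if not 1 <= len(readings) <= 9:
        raise ValueError("expected between 1 and 9 readings per joint")
    if any(not 0 <= grade <= 4 for grade in readings):
        raise ValueError("grades must lie between 0 and 4")
    return mean(readings)


def meets_radiographic_mny(left_readings, right_readings):
    """True if mean grades reach >=2.0 on both sides or >=3.0 on one side."""
    left = mean_si_score(left_readings)
    right = mean_si_score(right_readings)
    bilateral = left >= 2.0 and right >= 2.0
    unilateral = left >= 3.0 or right >= 3.0
    return bilateral or unilateral


def categorise(left_readings, right_readings):
    """Label a clinically diagnosed proband as 'AS' or 'nr-axSpA'."""
    if meets_radiographic_mny(left_readings, right_readings):
        return "AS"
    return "nr-axSpA"


# Hypothetical readings for one proband (eight readings per joint).
print(categorise([2, 2, 3, 2, 2, 2, 3, 2], [2, 2, 2, 2, 3, 2, 2, 2]))
```

Working with the mean rather than a consensus grade keeps the score quantitative, so it remains possible to see how far below the threshold a proband classified as nr-axSpA lies.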
The 2019 follow-up study comprised three additional steps. First, in January 2018, the ethics committee approved the follow-up study. Second, participants (n=485, including 125 probands) who had provided written informed consent to use their PBNCs for genetic analysis were mailed a postal questionnaire on their health status. Third, to calculate polygenic risk scores (PRS), DNA of 679 participants (226 probands and 453 healthy FDR) of the baseline study was extracted from PBNCs (stored since 1985) and genotyped using the Illumina CoreExome single nucleotide polymorphism (SNP) microarray, as previously reported.19 20 The SNP genotype data were then used to calculate individual PRS, using a model developed in European-descent AS case and healthy control cohorts.20 In brief, the PRS is a numeric score that reflects an individual’s estimated genetic predisposition for a given trait and can be used as a predictor or diagnostic biomarker of the disease of interest (here AS/axSpA). SNP data from 14 337 unrelated European-descent healthy controls, also genotyped using Illumina CoreExome arrays as previously reported, were used as unrelated healthy control data,20 and unaffected FDR (after 35 years of follow-up) were used as related healthy controls. SPSS was used to perform area under the curve (AUC) calculation and statistical analysis.
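For readers unfamiliar with PRS, the general idea can be illustrated with a generic additive score: a weighted sum of risk-allele counts. The sketch below is an assumption for illustration only. The actual AS-PRS model is described in reference 20 and may be constructed differently, and the SNP identifiers and weights shown here are hypothetical.

```python
def polygenic_risk_score(dosages, weights):
    """Generic additive score: sum of weight x risk-allele dosage (0, 1 or 2).

    Illustrative assumption only; not the AS-PRS model used in the study.
    """
    missing = set(weights) - set(dosages)
    if missing:
        raise KeyError(f"no genotype for: {sorted(missing)}")
    return sum(weights[snp] * dosages[snp] for snp in weights)


# Hypothetical per-SNP weights (for example log odds ratios) and one genotype.
weights = {"snp_a": 0.5, "snp_b": -0.25, "snp_c": 0.125}
dosages = {"snp_a": 2, "snp_b": 1, "snp_c": 0}
print(polygenic_risk_score(dosages, weights))
```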
Patient and public involvement
Two patients/coauthors were fully involved in the study.
Altogether 363 axSpA probands participated in the family study (table 1 and figure 1). We report on heterogeneity among probands due to genetics (HLA-B27 status and PRS), sex and severity, defined as structural damage to the SI joints (by the mNY criteria). Occurrence of axSpA among FDR is reported elsewhere.21 The mean age of the 249 male and 114 female probands was 44.19±11.1 years. Table 1 shows separately for men and women the number of probands by presence or absence of radiographic damage of the SI joints (by the mNY criteria) and by HLA-B27 status. The table also shows for each group the prevalence of chronic inflammatory back pain (by the Calin criteria22) and the mean grade of sacroiliitis (mNY grading). Considering reading of the pelvic radiographs, the interobserver and intraobserver reliability coefficients were 0.865 and 0.903, respectively.
There are remarkable differences in the male to female ratios (figures 1 and 2 and table 1). Among those with AS, the sex ratio is about 3:1, but for probands categorised as nr-axSpA it is about 1:1 (AS 204 men/69 women vs nr-axSpA 43 men/44 women; p=0.00001, OR=3.03, 95% CI 1.83 to 4.99) (table 1). Therefore, there is a significant association between male sex and presence of sacroiliitis by the mNY criteria. Within the mNY(+) group there is no significant association between sex and HLA-B27 status (HLA-B27(+) 185 men/62 women vs HLA-B27(−) 16 men/6 women; p=0.82, OR=1.12, 95% CI 0.42 to 2.99). The same holds for the nr-axSpA group (HLA-B27(+) 32 men/26 women vs HLA-B27(−) 10 men/18 women; p=0.09, OR=2.22, 95% CI 0.87 to 5.62) (table 1). In the nr-axSpA group the sex ratio for HLA-B27(+) men and women is 1.2:1 (32 men and 26 women), whereas it is 0.6:1 (10 men and 18 women) among HLA-B27(−) patients (table 1 and figures 1 and 2). Furthermore, among the probands with nr-axSpA, the prevalence of the HLA-B27 allele is non-significantly lower among women compared with men (26/44 or 59.1% vs 32/43 or 74.4%, p=0.13).
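The odds ratios in this section can be checked from the 2×2 counts given in the text. The sketch below assumes a Wald (Woolf) confidence interval on the log odds ratio; the source does not state which method was used, so this is an assumption, and the function name is hypothetical.

```python
import math


def odds_ratio_ci(a, b, c, d, z=1.959964):
    """Odds ratio and Wald 95% CI for the 2x2 table [[a, b], [c, d]]."""
    odds_ratio = (a * d) / (b * c)
    # Standard error of the log odds ratio (Woolf method).
    se = math.sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = math.exp(math.log(odds_ratio) - z * se)
    upper = math.exp(math.log(odds_ratio) + z * se)
    return odds_ratio, lower, upper


# Counts from the text: AS 204 men/69 women, nr-axSpA 43 men/44 women.
odds_ratio, lower, upper = odds_ratio_ci(204, 69, 43, 44)
print(f"OR={odds_ratio:.2f}, 95% CI {lower:.2f} to {upper:.2f}")
```

With these counts the sketch gives an OR of about 3.03 with a 95% CI of about 1.83 to 4.99, in agreement with the values reported above.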
Irrespective of the radiographic status, there is a significant association between the presence of HLA-B27 and male sex, as 217 of 243 (89.3%) male probands are HLA-B27(+) compared with 88 of 112 (78.6%) female probands (p=0.007, OR=2.28, 95% CI 1.24 to 4.18). The sex ratio is 2.5:1 for all 308 HLA-B27(+) probands (men/women 219/89) and about equal (1.1:1) for all 50 HLA-B27(−) axSpA probands (men/women 26/24) (table 1). Among all 355 probands with known HLA-B27 and radiographic status, women relatively more often have nr-axSpA than men (44/112 or 39.3% vs 42/243 or 17.3%; p=0.00001, OR=3.10, 95% CI 1.87 to 5.13).
Table 2 provides the HLA-B27 carriage status and the mean PRS for axSpA, by sex and radiographic mNY status. Table 3, figures 3 and 4, and online supplemental figures 1 and 2 provide a comparison of HLA-B27 status and PRS values and discriminatory performance in terms of AUC in receiver operating characteristic analyses. Considering initially the clinical diagnosis of axSpA, the PRS was significantly lower for women than for men (0.293 vs 0.372, p=0.043). In contrast, no difference was observed among AS cases (0.418 vs 0.377, p=0.28; table 2). In patients with nr-axSpA, however, a higher proportion of women were HLA-B27(−) (women 16/32 vs men 5/24, p=0.030 Fisher’s exact test), and women had a lower PRS (0.104 vs 0.322, p=0.029) (table 2). Overall, PRS was lower in patients with nr-axSpA than those with AS (0.198 vs 0.387, p=0.000004), but this difference was restricted to women with nr-axSpA (0.104 vs 0.418 in women with AS, p=0.00021), with the PRS of male patients with nr-axSpA being no different from those with AS (0.322 vs 0.377 in those with AS, p=0.29).
Consistent with this, the PRS was able to distinguish female AS from female nr-axSpA cases (AUC=0.711, p=0.00085), but not male AS from male nr-axSpA cases (AUC=0.477, p=0.75). The PRS was not able to distinguish female nr-axSpA cases from female healthy FDR (AUC=0.498, p=0.98), whereas the PRS had good discriminatory capacity considering male nr-axSpA cases and male healthy FDR (AUC=0.717, p=0.00053). The AUC for female nr-axSpA compared with either related or unrelated female healthy controls was significantly lower than for male nr-axSpA (p=0.013 and p=0.049, respectively), whereas for AS the scores were not significantly different. Among patients with AS, the PRS had good discriminatory capacity in relation to healthy FDR (AUC of 0.772 overall, 0.774 in women and 0.786 in men) and excellent discrimination compared with unrelated healthy controls (AUC of 0.960 overall, 0.969 in women and 0.958 in men). This indicates that women clinically diagnosed with axSpA but who do not have radiographic sacroiliitis as a group have a disease that is distinct from AS by the mNY criteria overall and from nr-axSpA in men. In comparisons with unrelated healthy controls, the PRS performed significantly better than HLA-B27 for axSpA and AS (axSpA PRS AUC=0.869, HLA-B27 AUC=0.833, p=0.028; AS-PRS AUC 0.969, HLA-B27 AUC=0.923, p=0.0022), but not for nr-axSpA (PRS AUC=0.734, HLA-B27 AUC=0.709, p=0.63) (table 3). This indicates that the non-HLA-B27 component of the PRS was greater in patients with AS than in patients with nr-axSpA.
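As a reminder of what the AUC values express, the AUC equals the probability that a randomly chosen case has a higher score than a randomly chosen control, with ties counted as one half. The sketch below computes it directly from that definition. The study used SPSS, so this is not the authors' code, and the scores shown are hypothetical.

```python
def auc(case_scores, control_scores):
    """Probability that a random case outscores a random control (ties = 0.5)."""
    if not case_scores or not control_scores:
        raise ValueError("both groups must contain at least one score")
    wins = 0.0
    for case in case_scores:
        for control in control_scores:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_scores) * len(control_scores))


# Hypothetical PRS values for three cases and three controls.
print(auc([0.9, 0.4, 0.3], [0.5, 0.2, 0.1]))
```

An AUC near 0.5, as reported for female nr-axSpA cases versus female healthy FDR, means that the score ranks cases above controls no more often than chance would.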
The new and wider concept of axSpA, comprising both AS and nr-axSpA, is important and clinically relevant. Radiographic sacroiliitis is by no means an early or obligatory manifestation of the disease. A drawback, however, is that the wider concept increases the heterogeneity of what we might consider a single disease. Radiographic axSpA and nr-axSpA reportedly do not differ in health status, disease activity or physical function, but they do differ in signs of inflammation, all of which were found to be higher in patients with AS.23
Our study, starting long before the new concept was coined, shows a sizeable (~25%) proportion of probands (index patients with the disease) who have what we now call nr-axSpA (table 1 and figure 1). Of note, at baseline the diagnosis axSpA (in 1985 ‘AS’) was made clinically by the rheumatologist at the study centre. The findings on the pelvic radiograph of the participants were used for classification according to the mNY criteria, not for diagnosis. Although the concept axSpA was not yet known in 1985, the notion of ‘AS without radiographic sacroiliitis’ already existed.9 Establishing the diagnosis has always been quite possible without definite radiographic findings, for example, according to the Rome criteria (if four of the five clinical criteria are met).24 We strongly feel that the low mean SI scores in the several nr-axSpA groups (table 1) truly represent non-radiographic disease.
Our findings clearly show heterogeneity regarding the sex ratio, with a male to female ratio of 3:1 for AS by the mNY criteria, but a 1:1 ratio for nr-axSpA. We also noted considerable differences in PRS and HLA-B27 association between radiographic and nr-axSpA. The observed heterogeneity has potentially important clinical consequences. For example, due to heterogeneity it might be difficult, if not impossible, to develop appropriately performing classification criteria for the whole group with sufficient sensitivity and high specificity. Our findings suggest that including genetic analyses, such as PRS, provides a potential solution to this issue.
The observed genetic differences (PRS and association with HLA-B27) are also certainly relevant regarding differences in heredity. Recurrence of the disease is high among the offspring of HLA-B27(+) parents with AS, but rare in HLA-B27(−) families.8 21 25 In particular, disease recurrence among the offspring of female HLA-B27(+) probands is substantial.25 According to the threshold model of polygenic inheritance, the mechanism by which a continuous distribution of genetic risk leads to dichotomous trait or disease states,26 the genetic threshold for women to get the disease is increased relative to men, and women would be predicted to have higher PRS. In the current study women with AS have higher PRS than men (0.418 vs 0.377), although this does not reach statistical significance. This may in turn translate to a higher proportion of affected children of mothers with AS.
In addition to demonstrating genetic heterogeneity, particularly involving nr-axSpA, this study shows for the first time the high discriminatory capacity of an AS-PRS for men with nr-axSpA (AUC=0.881 compared with unrelated healthy controls, p<10⁻¹⁰⁰). While this AUC was lower than that observed for men with AS also compared with healthy controls, it indicates that some of the nr-axSpA probands who have increased PRS, which was developed using genotype data from patients with AS, will proceed to meet the radiographic mNY criteria. The study also provides further confirmation of the high discriminatory performance of AS-PRS in European-descent populations and confirms that it performs better than HLA-B27 testing alone (for AS overall, AS-PRS AUC=0.96, HLA-B27 AUC=0.913).
Heterogeneity of cohorts defined by classification criteria can significantly affect both basic and clinical research projects and, when those criteria are also employed as diagnostic criteria, adversely influence the accuracy of prediction of prognosis and treatment responses. There is considerable evidence that genetics and sex influence the clinical features of the disease. Male patients with AS have more extensive radiographic change,27–29 whereas female patients with AS have higher self-reported disease activity, similar functional incapacity and lower C-reactive protein (CRP) levels (reviewed in Rusman et al 30). In cohorts with nr-axSpA, women have been shown to have lower prevalence of objective MRI evidence of SI inflammation.31 HLA-B27(+) patients have earlier disease onset and are more likely to develop acute anterior uveitis (reviewed in Akkoç et al 32 and Brown et al 33). Evidence from the literature indicates that the heterogeneity of what we call one disease affects the efficacy of response to treatment with biologicals. 
In real-world observational studies of treatment of nr-axSpA with biologicals, significantly lower response rates are found among women than among men34 and shorter retention on biological treatment consistent with lower efficacy.35–37 The results are in line with randomised controlled trials of adalimumab, golimumab and certolizumab in nr-axSpA, showing lower response rates in women compared with men.38–40 Our data here, demonstrating genetic differences between axSpA patient subsets, suggest that genetics might also be used as a tool to assess and possibly predict the efficacy of treatment of axSpA, in particular of nr-axSpA, consistent with the known lower response rates of tumour necrosis factor inhibitor therapy in HLA-B27(−) AS.41 Among women with clinical features of axSpA but negative SI radiographs, the ability of the AS-PRS to distinguish between women with nr-axSpA and AS suggests that PRS may be particularly helpful in distinguishing those with true inflammatory back pain from those with other non-axSpA conditions causing similar symptoms.
Our study design has certain limitations. We address (1) the validity and confidence of clinical diagnosis, (2) the possible consequences of time frame, and (3) the accuracy of distinguishing AS (radiographic axSpA) from nr-axSpA.
Validity and confidence of the diagnosis AS (now axSpA). The proband’s diagnosis was confirmed twice, initially by their local rheumatologist and subsequently again at the study centre, where the rheumatologist confirmed the diagnosis blinded to radiographic and HLA-B27 status. Based on the SI scores, probands were then categorised as AS or nr-axSpA. Objective documentation of inflammation of the SI joints is lacking for the nr-axSpA cases (MRI was not yet available in 1985). This might cause lower diagnostic confidence for nr-axSpA probands compared with AS cases. Indeed, the likelihood of a false-positive diagnosis of nr-axSpA seems higher than of AS. This translates to increased heterogeneity among nr-axSpA. However, in our view, it is unlikely that MRI of SI joints would have eliminated this source of heterogeneity. As we and others have shown, this technique as yet lacks sensitivity and specificity, certainly outside expert centres. A substantial proportion of healthy individuals without current or past back pain have an MRI positive for sacroiliitis according to the ASAS definition.42 43 Further, in recreational and elite athletes, MRI revealed bone marrow oedema changes meeting the ASAS definition of active sacroiliitis in 30%–41% of subjects.44 45
Possible consequences of time frame. The current spectrum of axSpA might differ from that in 1985. Although AS without radiographic sacroiliitis (now nr-axSpA) was already known9 at that time, awareness of the condition has risen considerably. This might have widened the spectrum of the disease and increased the likelihood of inclusion of look-alike conditions. This again tends to increase heterogeneity. Therefore, we may conclude that both lower diagnostic confidence for nr-axSpA cases and increased awareness of the disease might contribute to the observed heterogeneity. This underpins the need to adjust the analysis of axSpA outcomes.
Accuracy of distinguishing AS (radiographic axSpA) from nr-axSpA. The radiographic heterogeneity of axSpA is associated with genetic differences and differences in heredity (discussed above). Therefore, accurate distinction between radiographic disease and nr-axSpA is important. However, there is as yet no standardised method to assess pelvic radiographs for the presence of sacroiliitis. For example, it is unclear how many readings by how many readers are needed to obtain high-quality results, or how to reach consensus while avoiding His Master’s Voice bias. Intraobserver and interobserver reliability do not guarantee sufficient sensitivity and specificity. In the past, we have reported moderate sensitivity and specificity in reading pelvic radiographs for sacroiliitis: sensitivity (84.3%/79.8%) and specificity (70.6%/74.7%) for 23 radiologists and 100 rheumatologists, respectively. Training did not improve overall performance.46 In the current study, pelvic films were assessed up to nine times by up to four readers. The mean score was calculated for each SI joint. According to the central limit theorem, the mean represents the true value of a certain measure better than single observations. Our quantitative scoring allowed assessing how ‘free of sacroiliitis’ patients with nr-axSpA really are (table 1). That is, the nr-axSpA group truly comprises patients with nr-axSpA (they do not have borderline sacroiliitis). Consensus judgement would not have enabled quantitative assessment. This finding supports good performance regarding sensitivity. Furthermore, we used the absence of sacroiliitis among all HLA-B27(−) FDR (of HLA-B27(+) probands) as the gold standard for sufficient specificity. Therefore, this post-hoc evaluation is in our view reassuring. However, for future studies we strongly suggest standardised blinded central reading by qualified readers with known sensitivity and specificity in assessment of sacroiliitis.
Evaluation should include randomly inserted control radiographs of patients with AS and unaffected persons and assessment of observer reliability.
In conclusion, given the demonstrated heterogeneity of axSpA, clinical studies such as clinical trials should take genetics, sex and structural damage (radiographic sacroiliitis) into consideration, either in study design or retrospectively, for example by applying stratified or otherwise adjusted analysis. This heterogeneity also needs to be considered in clinical assessment and management of patients with axSpA. One size does not fit all.
Patient consent for publication
This study involves human participants and was approved by the University Hospital of Bern and the Ethical Committee of the Kanton Bern, Switzerland (#2017-00536). Participants gave informed consent to participate in the study before taking part.
We would like to thank all patients, spouses and relatives for their kind cooperation and are grateful to the Swiss city and village administrations for retrieving current addresses. We also thank Hans-Ueli Rentsch* MD, Hans Valkenburg* MD, Arnold Cats* MD, Herman Kroon MD and Niklaus Gerber MD for their contributions to performing the study. Caroline Kaegi provided helpful secretarial assistance. *deceased.
Contributors All authors had full access to all of the data in the study and take responsibility for the data and the accuracy of the data analysis. Concept and design: SMvdL, MAK, HB, HvZ and MAB conceived the study. Designing the questionnaire: SMvdL, HB and HvZ. Drafting of the manuscript: SMvdL, MAK and MAB. Genetic analysis: ZL and MAB. Statistical analysis: SMvdL, ZL, MKK and MAB. Obtained funding: SMvdL and MAB. Administrative, technical or material support: HvZ and PMV. Acquisition of data: SMvdL, HvZ and PMV. Critical revision of the manuscript for important intellectual content: all authors. MAB acts in the role of guarantor.
Funding The 1985 baseline study was funded by the Swiss National Fund, Schweizer Rück Insurance and Ciba-Geigy, Switzerland. The 2018 follow-up study was funded/supported by the National Institute for Health and Care Research (NIHR) Biomedical Research Centre based at Guy’s and St Thomas' NHS Foundation Trust and King’s College London and/or the NIHR Clinical Research Facility.
Competing interests None declared.
Provenance and peer review Not commissioned; externally peer reviewed.
Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.