Increased primary care use for musculoskeletal symptoms, infections and comorbidities in the years before the diagnosis of inflammatory arthritis

Objectives Little is known about relevant events in the at-risk phase of rheumatoid arthritis before the development of clinically apparent inflammatory arthritis (IA). The present study assessed musculoskeletal symptoms, infections and comorbidity in future IA patients. Methods In a nested case–control study using electronic health records of general practitioners, the frequency and timing of 192 symptoms or diseases were evaluated before a diagnosis of IA, using the International Classification of Primary Care coding system. Cases were 2314 adults with a new diagnosis IA between 2012 and 2016; controls were matched 1:2. The frequency of primary care visits was compared using logistic regression. Results The frequency of visits for musculoskeletal symptoms (mostly of shoulders, wrists, fingers and knees) and carpal tunnel syndrome was significantly higher in IA patients vs controls within the final 1.5 years before diagnosis, with ORs of 3.2 (95% CI 2.8 to 3.5), 2.8 (95% CI 2.5 to 3.1) and 2.5 (95% CI 2.2 to 2.8) at 6, 12 and 18 months before diagnosis, respectively. Also, infections (notably of the genital and urinary tracts), IA-comorbidities and chronic diseases were more prevalent in cases than controls, but more evenly spread out over the whole 6-year period before IA. A decision tree was created including all symptoms and diseases. Conclusion There was an increased frequency of primary care visits for musculoskeletal symptoms, infections and comorbidities prior to the diagnosis of IA. This diverging trend is present for 4–6 years, but becomes statistically significant 1.5 years before the diagnosis. Validation of these results is warranted.


INTRODUCTION
Rheumatoid arthritis (RA) is usually diagnosed shortly after the appearance of clinically apparent inflammatory arthritis (IA). The time between the onset of persistent joint symptoms and the diagnosis RA by the rheumatologist varies 1 ; in the Netherlands, the median duration is 4 months. 2 3 Early recognition and start of treatment improve the outcome. 3 General practitioners (GPs) can play an essential role in earlier detection of IA as they are the first professional to be consulted for health problems and all Dutch inhabitants are listed with a GP. Furthermore, the GP has a gatekeeper role and therefore refers a patient with suspected IA to the rheumatologist. GPs have a complete overview of all health problems in their electronic health records (EHRs). The unique healthcare system in the Netherlands makes it possible to study symptom and morbidity patterns before the diagnosis.
It appears that GPs mostly use classical signs of inflammation such as pain and swelling to identify those with a high probability of having IA, and that those signs are the triggers for referral to secondary care. 4 However, additional symptoms or conditions may occur ► General practitioners (GPs) can play an essential role in earlier detection of inflammatory arthritis (IA) as they are the first professional to be consulted for health problems.

What this study add?
► This is the first study that gives full details of GPs electronic records in relation to the risk of developing IA. ► Musculoskeletal symptoms, infections and comorbidities were more frequent in future IA patients than controls in the years preceding diagnosis.
How might this impact on clinical practice? ► The use of specific ICPC codes, for example carpel tunnel syndrome, may help GPs to consider referring patients at risk for IA earlier to facilitate early diagnosis and treatment.
before the diagnosis that are at that time not attributed to emerging RA, but do lead to increased ambulatory care utilisation. 5 This is underscored by a higher rate of sick leave already 8 months before the first prescription of antirheumatic drugs. 6 Also, the number of comorbid diseases at the onset of IA is higher than in a control group, however, it is not clear whether these diseases were already present before the onset of IA. 7 In the phase before clinical RA, subclinical autoimmunity and inflammation often occur for several years. 8 9 This may be related to the influence of environmental factors, such as infections or lifestyle factors. 10 However, little is known about symptoms, pathogenetic events, other diseases and their timing during this phase. 11 Also, available clues mostly come from case-control studies and studies of at-risk populations. These studies have the limitation that only selected individuals are studied, usually after referral to secondary care because of more severe symptoms. 11 12 This limitation can be obviated by studying the at-risk phase of RA in the unselected primary care setting.
The present study focuses on pre-existing symptoms and diseases that are possibly related to RA, with the goals to improve early identification of future IA patients and to identify possible pathogenetic clues. Data from EHRs of GPs from a large Dutch national database were used to answer the following research questions: (1) To what extent are musculoskeletal symptoms, infections and/or RA-related comorbidities more prevalent before the diagnosis IA compared to control patients?, (2) What is the lead time between these early symptoms or disorders and the diagnosis IA?, (3) Is it possible to identify a combination of symptoms and diseases that can be used to predict IA development?

PATIENTS AND METHODS Study population
Data were used from Nivel Primary Care Database (Nivel-PCD). 13 Nivel-PCD collects data from routine EHR systems from a representative sample of approximately 500 general practices with a total of more than 1.5 million registered patients, including information about consultations, morbidity, prescriptions and diagnostic tests. In the Netherlands, EHR systems have been used for many years, and among other things, a guideline exists to help GPs to uniformly record complete qualitative data. 14 15 Diagnoses were recorded using the International Classification of Primary Care (ICPC-1) coding system. 16 Only data with sufficient quality were used: GPs had to have recorded data at least 46 weeks of the year with at least 70% ICPC coded visits. Adult patients (≥18 years) were selected based on having a new diagnostic code of IA (ICPC code L88) in the years 2012 to 2016, where identifying only incident cases with at least 1 year (with a maximum of 6 years) retrospective follow-up. L88 includes RA, psoriatic arthritis and ankylosing spondylitis, 17 which in the ICPC coding system cannot be coded individually. In case, the start date of IA was preceded by the prescription of a disease-modifying antirheumatic drug (DMARD) and/or biological, we assumed that documentation of the L88 code might have been delayed and the date of diagnosis was set on the start date of the first DMARD or biological. Selection included: methotrexate, leflunomide, sulfasalazine, abatacept, rituximab, etanercept, infliximab, adalimumab, certolizumab, golimumab, tocilizumab, anakinra and ustekinumab. Use of hydroxychloroquine was allowed before the diagnosis of IA since this is prescribed occasionally in the at-risk phase in patients not having arthritis. Each case was matched with two controls (without IA in the past) in the same general practice based on age (±3 years), gender and duration of follow-up (depending on the registration date of the patient in a general practice, and registration of that particular general practice in Nivel-PCD).

Procedures
We used data from EHRs containing information on consultations and prescriptions before the IA-date or matched end date of the control patients in the period 2006 to 2016. Consultations are mostly physical visits of patients to the GP, but can also be consultations by telephone or a debrief from a secondary care specialist. Throughout the rest of the manuscript, the term primary care visits is used. Prescriptions are those started by the GP as well as repeat prescriptions of medication started in secondary care. We preselected a list of 192 ICPC codes (online supplemen tary table 1) deemed relevant to RA development, which included musculoskeletal symptoms, infectious diseases and RA-related comorbidities. This selection was based on biological plausibility, literature research 5 11 18 and expert opinion. On the one hand, we included codes described earlier to be associated with RA (mostly RArelated comorbidities) or hypothesised to be related (such as cardiac problems related to other autoimmune diseases like ankylosing spondylitis) and, on the other hand, we included codes that may trigger GPs to think of the diagnosis of arthritis and be more aware of its presence (such as carpal tunnel syndrome, peripheral neuritis and musculoskeletal symptoms in general). Since it is assumed that infections may trigger the development of autoimmune phenomena, we included all ICPC codes addressing specific infections in different body parts. In Nivel-PCD comorbidities and chronic diseases are coded separately, as comorbidities can be diagnosed more than once and chronic diseases only once.
The study was approved according to the governance code of Nivel-PCD, under number NZR-00314.045. Dutch law allows the use of EHRs for research purposes under certain conditions. According to this legislation, obtaining informed consent nor approval by a medical ethics committee are obligatory for this type of observational studies containing no directly identifiable data (Dutch Civil Law, Article 7:458).

Statistical analysis
We first describe the presence of ICPC codes from the four predetermined groups (musculoskeletal symptoms, infections, RA-related comorbidities and chronic diseases) in the individuals with and without a diagnosis of IA. We therefore marked per quartile of the year whether a person was given an ICPC code from a particular group or not, and then summed all the cases which were coded (one or more times) into percentages of the total number of individuals that had retrospective follow-up in that quartile. Per group, based on these numbers we calculated ORs (with 95% CI, and p values) of developing IA using univariable logistic regression analysis within the time periods 6, 12 and 18 months prior to the diagnosis (or the matched end date in case of the control individuals).
Next, we performed two different approaches to predict the development of IA based on the ICPC codes within the 12-month period preceding IA. (1) Using univariable and multivariable logistic regression analyses. The univariable analysis was corrected for multiple testing using false-positive rate control. 19 Because of their low individual frequency, the codes from the group of infections were combined into 11 groups (see online supple mentary table 1). A backwards stepwise approach was used for the multivariable analysis, ultimately leaving only those ICPC codes with a p value <0.05. This led to one multivariable prediction model containing the ICPC codes from all groups and the diagnostic performance was described using the area under the curve (AUC) of the receiver operating curve. Age and gender were included irrespective of their significance level.
(2) Using Classification and Regression Tree (CART) analysis. 20 This nonparametric statistical procedure uses hierarchical variable selection to create a decision tree, and thereby creates the best and most simple combination of variables to predict a certain outcome. In short, it examines all splitting variables (ICPC codes) and first selects the best predictor for the outcome (IA diagnosis). This process is repeated and the next steps will include the prior steps, that is, step 2 is the best predictor given the fact that the answer in the first step was taken into account, and so on. We used this approach, because it resembles the way that a GP evaluates a certain patient.

Patient characteristics
In total, 2314 IA cases with a retrospective follow-up of at least 1 year could be matched to 4541 controls (see flowchart in figure 1) from 262 practices. For 23 cases, no controls could be matched. In 53 cases, the date of diagnosis was set on the start date of the first DMARD or biological (which in all cases was the date of DMARD start, 6 cases later on used a biological), with a mean time lag of 19 months (range 1-58). In the control group, none of the included DMARDs or biologicals were used. The mean age for cases was 57.6 years (IQR 24), compared to 56.6 years (IQR 23) in the control group. Both groups contained more women than men (60%). The number of individuals with retrospective follow-up decreased further from the IA diagnosis or end point in the controls. For cases/controls, these numbers were 2314/4541, 1749/3439, 730/1430 and 172/341, respectively, at 1, 2, 4 and 6 years.
Frequency of primary care visits prior to IA diagnosis In patients receiving the diagnosis of IA, the GP more frequently coded symptoms or diseases related to the musculoskeletal system than in control patients (figure 2). A diverging trend is already visible 4-6 years before the diagnosis, but becomes more pronounced in the final 1.5 years. Unadjusted ORs for the development of IA were 3.2 (95% CI 2.8 to 3.5, p<0.05), 2.8 (95% CI 2.5 to 3.1, p<0.01) and 2.5 (95% CI 2.2 to 2.8, p<0.01) at 6, 12 and 18 months prior to the diagnosis, respectively. The differences between cases and controls remain present over the entire study period.
Individual ICPC-1 codes and their relation with IA development Univariable logistic regression analyses showed an abundance of ICPC codes across all four groups that were statistically significantly related to the development of IA. Table 1 shows the most predominant relations (ORs ≥2.4) (for a complete overview, see online supplementary table 2). As expected from the results shown in figure 2, most of these ICPC codes came from the musculoskeletal system. The most frequent symptomatic joints were the shoulders, wrists, fingers and knees. Also, carpal tunnel syndrome was more frequently present in IA cases. Notably, specific infections were not found to be increased in future IA patients. The main associated recorded chronic diseases in future IA patients were psoriasis, inflammatory bowel disease and gout (the former two as expected due to the definition of IA).
We then used all ICPC codes to build a multivariable prediction model for IA development using data within 12 months prior to this diagnosis. The AUC of this model was 0.69. Table 2 shows the top 10 ICPC codes (for the complete prediction model containing 32 ICPC codes and age/gender, see online supplementary table 3). The top ten includes both joint symptoms (general, wrist, hand and shoulder) as well as more specific diagnoses such as psoriasis accompanying psoriatic arthritis  and chronic enteritis/ulcerative colitis accompanying ankylosing spondylitis.

Classification and regression trees (CART)
A CART analysis was performed to find the best and most simple combination of ICPC codes to predict IA. The tree is shown in figure 3. The AUC was 0.64. The classification tree starts with an a priori probability of 34% of developing IA in this dataset (predefined based on the matching process). Thereafter, all nodes residing to the right indicate the symptom mentioned in the node above is present and all nodes going to the left indicate the symptom is not present. For example, the chance of developing IA would be raised to 82% if a person has both 'joint pain not otherwise specified' and 'asthma'. On the other hand, the absence of a certain variable can also lower the chance of developing IA. Figure 3 CART analysis. A CART analysis was performed to find the best and most simple combination of ICPC codes to predict IA. The a priori probability of developing IA is 34% in this dataset; thereafter, all nodes residing to the left indicate the symptom in the node above is not present and all nodes going to the right indicate the symptom is present (with coinciding percentages of developing IA). Please note that everywhere a part of the musculoskeletal system is mentioned one should read "symptom/ complaint" behind it (i.e. joint NOS, hand/finger, shoulder, leg/thigh, knee, wrist, chest, back, foot/toe and neck). CART, Classification and regression tree; IA, inflammatory arthritis; ICPC, International Classification of Primary Care; NOS, not otherwise specified.

DISCUSSION
This study shows an increased frequency of musculoskeletal symptoms preceding the assumed IA-diagnosis date, mainly in the final 1.5 years. Infections, RA-related comorbidities and chronic diseases also were more prevalent in cases than in controls; however, this trend was less clear and more evenly spread out over the whole study period of 6 years. All recorded symptoms and diseases were assembled in a classification tree resembling the way a GP would detect patients to refer to the secondary healthcare system. However, the classification tree needs to be validated (AUC 64%). The present results are in line with those of another study in which ambulatory medical care utilisation was highest in the 2 years preceding RA. 5 As in our study, this was mainly attributed to diseases of the musculoskeletal system and connective tissue, although not further specified. We found high associations of the following symptoms/locations: knee, wrist, hand/finger, shoulder and carpal tunnel syndrome. The observation of IA starting with symptoms in hands, feet or shoulders was found before, 21 but the present data suggest that GPs should also consider emerging IA in patients with chronic problems of the knee or carpal tunnel syndrome.
One of the early events in RA pathogenesis appears to be inflammation or infection of mucous membranes, such as in the gums, lung or gut. [22][23][24][25][26][27][28] Rather than a onetime initiating event, the present data support a longerterm exposure, as infections as a total group were more prevalent in cases than controls during the complete follow-up. This contradicts the finding that recent infections would have a protective effect, 29 but complements data that simultaneous development of autoimmunity and an acute phase reaction appears 4-5 years before the diagnosis of RA. 8 30 31 Infections were combined into 11 groups, of which only genital infections, urinary tract infections, and general viral/bacterial infections were significantly related to IA in multivariable analysis (to our knowledge not linked to RA before), with low ORs of 1.4-1.5.
Comorbidities of IA have been studied extensively. 32-36 Seventy percent of patients were found to have at least one chronic disease at onset of IA, which was 10% more than in control patients. 7 We also found more RA-related comorbidities and chronic diseases in cases (ORs of 1.3-1.7). Main contributors were psoriasis, chronic enteritis/ulcerative colitis, gout, iron deficiency anaemia, vitamin B12/folate deficiency anaemia, asthma and diabetes mellitus. Gout hypothetically showed a higher association due to ICPC misclassification, as gout and IA have many similarities. 37 To our knowledge, the other contributing factors have not been described before in the pre-disease phases, but only in the phase of established RA, psoriatic arthritis and ankylosing spondylitis. 33 36 38-40 On the other hand, we did not find an (expected) association with osteoarthritis 41 and cardiovascular disease, 42 and it thus remains unclear when the excess risk of osteoarthritis and cardiovascular disease starts. 43 Several musculoskeletal symptoms, infections and comorbidities that were more frequently found in IA cases have not been previously described in the at-risk phase. Although for many of these variables we still have to find a scientific rationale, we have shown that with certain ICPC combinations a high percentage, up to 100%, of individuals will develop IA. This information can help GPs to earlier select individuals at higher risk for developing IA and thus aid in earlier referral. At present, the results are not robust enough to support the implementation of a prediction rule for IA in the EHRs of the GPs without further validation studies.
Our study has some limitations. First, validity of the results for the outcome IA may be lower than compared with studies in which the diagnosis of RA is supported by fulfilment of classification criteria. By definition, the present results are partly generated by patients with psoriatic arthritis or ankylosing spondylitis, the other two constituents of IA, because no individual ICPC codes exist. However, the mean age of 57 years and preponderance of females strongly suggest that the IA group mainly consisted of RA patients. Further, the diagnosis of IA is difficult for GPs to make, since it has a relatively low frequency (estimated 6 out of 400 patients with joint symptoms). 44 This is exemplified by the fact that the IA diagnosis in a prior study has been proven to be about 71% accurate after chart review. 17 However, this is not entirely a bad thing, since it merely reflects the GP's way of evaluating patients. It is their job to differentiate patients that need referral to secondary care from those that do not, and all IA patients benefit from early detection. Second, besides the fact that the diagnosis of IA is difficult for GPs, possible ICPC misclassification between IA, gout and other forms of arthritis may have occurred. We cannot estimate the frequency of ICPC misclassification, as chart review is not feasible, but we know that multiple types of arthritis may coexist which makes things even more complicated. 45 46 Also, information is lacking about other forms of arthritis, since they have not been given their own entity in the ICPC-coding. Misclassification could have led to both over-and underestimation of the found associations in the multivariable prediction model. Third, a time lag could exist between the diagnosis IA by the GP and by the rheumatologist, and in part of the IA patients, we used the first date of DMARD use as the date of diagnosis. In this large cohort, it was not feasible to perform a full chart review including free-text fields in the EHRs to correct this. Fourth, because of the limitations of our data source, no radiographic reports, autoantibody data, or personal habits such as smoking were available. Finally, the a priori chance of developing IA in this case-control study was 34%, in contrast to a prevalence of 0.5-1% for RA in the general population. 47 Therefore, it is warranted to perform an external validation of the study results in an unselected primary care setting. In future, further classification of IA may help to unravel more details on the specific diseases that form subclassifications of the L88 ICPC code. Also, future development of the coding systems in EHRs, including for instance certain algorithms, may make diagnoses more certain and prevent a delay in recording. 17 48 49 In conclusion, musculoskeletal symptoms, infections and comorbidities were more frequent in future IA patients than controls in the years preceding diagnosis. Primary care data, mainly on specific ICPC codes recording 'new' musculoskeletal symptoms such as shoulder pain, chronic pain in the knee and carpal tunnel syndrome, may help GPs to be more aware of IA development because according to our research patients are more likely to develop IA within 1.5 years. Consequently, they can consider referring these patients which may facilitate early diagnosis and treatment. Also, a higher frequency of iron deficiency anaemia, vitamin B12/folate deficiency anaemia, asthma, diabetes mellitus, genital infections, urinary tract infections and general viral/bacterial infections have not been described before to proceed the development of IA. Future validation of the ICPC codes most associated with IA development is warranted.
Contributors Substantial contributions to the conception or design of the work; or the acquisition, analysis, or interpretation of data for the work: MvB-T, MMJN, JWRT, JK, DvS, all authors contributed to the acquisition of the data. Drafting the work or revising it critically for important intellectual content: MvB-T, MMJN, JK, DvS. Final approval of the version to be published: all authors. Agreement to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved: all author.
Funding This manuscript was included in a PhD thesis which can be found in UvA repository (URL: https://pure.uva.nl/ws/files/38672232/Thesis.pdf, chapter 6).

Competing interests None declared.
Patient consent for publication Not required.
Ethical approval information The study was approved according to the governance code of Nivel-PCD, under number NZR-00314.045. Dutch law allows the use of EHRs for research purposes under certain conditions. According to this legislation, obtaining informed consent nor approval by a medical ethics committee are obligatory for this type of observational studies containing no directly identifiable data (Dutch Civil Law, Article 7:458).
Data sharing statement Data can be shared by the authors on reasonable request.
Provenance and peer review Not commissioned; externally peer reviewed. Open access This is an open access article distributed in accordance with the Creative Commons Attribution 4.0 Unported (CC BY 4.0) license, which permits others to copy, redistribute, remix, transform and build upon this work for any purpose, provided the original work is properly cited, a link to the licence is given, and indication of whether changes were made. See: https://creativecommons.org/ licenses/by/4.0/.