There is an increasing need for detailed real-world data on rheumatic diseases and their treatments. Clinical register data are essential sources of information that can be enriched through linkage to additional data sources such as national health data registers. Detailed analyses call for international collaborative observational research to increase the number of patients and the statistical power. Such linkages and collaborations come with legal, logistic and methodological challenges. In a collaboration between registers of inflammatory arthritides in Sweden, Denmark, Norway, Finland and Iceland, we plan to enrich, harmonise and standardise individual data repositories in order to investigate analytical approaches to multisource data, to assess the viability of different logistical approaches to data protection and sharing, and to perform collaborative studies on treatment effectiveness, safety and health-economic outcomes. This narrative review summarises the needs, the potential and the remaining challenges that must be overcome to enable large-scale international collaborative research based on clinical and other types of data.
- dmards (biologic)
- autoimmune diseases
This is an Open Access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/
There is an increasing need for detailed real-world data on rheumatic diseases and their treatments.
This need goes beyond clinical data and calls for enrichment of data in the clinical registers through linkages to other data sources and for collaborative observational research across national borders.
Such enrichment and collaboration come with legal, logistic and methodological challenges.
Through collaboration between rheumatology registers on chronic inflammatory arthritides in Sweden, Denmark, Norway, Finland and Iceland, we hope to address these challenges and to study treatment effectiveness, safety and health-economic outcomes in rheumatoid arthritis, axial spondyloarthritis and psoriatic arthritis.
The need for real-world data from patients with inflammatory joint diseases
With a prevalence of 1%–2% in the general population and lifetime risks of at least 1 in 20, chronic inflammatory joint diseases, including rheumatoid arthritis (RA), ankylosing spondylitis (AS) and other spondyloarthritides (SpA) such as psoriatic arthritis (PsA), represent a significant burden for affected individuals, for healthcare and for society at large, whether measured as pain, functional impairment, healthcare resource utilisation or costs.
The therapeutic approaches to chronic inflammatory joint diseases have changed substantially over the last two decades. A growing number of treatment options have enabled increasingly ambitious treatment goals, but have also led to complex treatment patterns and concerns about their costs. To determine the optimal treatment in clinical practice, including its value for the individual and for society, studies assessing the effectiveness, safety and long-term outcomes of different treatment options in different treatment contexts are necessary. A better understanding of the heterogeneity, comorbidities and societal outcomes of the treated diseases themselves is also required to enable individualised treatment.
Randomised controlled trials (RCTs), while still the gold standard for efficacy studies, often provide insufficient evidence to inform clinical practice, as their size, strict inclusion and exclusion criteria, restricted treatment options and limited follow-up typically preclude inferences about long-term safety and effectiveness. Furthermore, RCTs have limited power to provide safety evidence on rare events and provide little evidence on how treatments perform in patients who do not fulfil the trial entry criteria. Thus, some of the clinically most relevant questions are virtually impossible to address in a randomised controlled setting.
For all of the above reasons, we increasingly face situations where large-scale observational studies based on real-world data are needed. To this end, clinical rheumatology registers have been established in many countries, either as disease registers (such as an early RA register), as registers specifically monitoring treatment (such as a biologics register), or as both.1–5 Rheumatology has been at the forefront in establishing population-based regional or national longitudinal clinical disease registers (in the Nordic countries: SRQ/ARTIS, DANBIO, NOR-DMARD, ROB-FIN and ICEBIO). Detailed information about each of these clinical rheumatology registers has been published previously1 6–12 and is summarised in table 1.
The need for enriching clinical register data through linkage to other registers
Clinical registers offer the potential for large-scale data at a level of clinical detail (eg, specific clinical metrics such as the Disease Activity Score 28) that is often much higher than that found in administrative or claims data. Similarly, clinical registers may provide a unique source of patient-reported outcome measures (PROMs). To be sustainable in the long run, clinical data collection for a register needs to be slim enough to fit (and benefit) clinical practice. Accordingly, there is a limit to the amount and nature of information that lends itself to collection in the clinical setting. While clinical registers are ideal for collecting information that is relevant and available at the clinical visit, they are often not the ideal vehicle for capturing rare and unexpected events, factors that ‘occur’ outside of rheumatology (ie, that may not be known to the rheumatologist, such as cost data) or events subject to considerable recall bias. For these latter types of events, obtaining data from other sources can both reduce the burden of data collection in the clinical register and offer ‘objective’ measurements free of subjectivity (eg, costs or work ability) or of recall bias (eg, drug prescription data), with a high level of completeness.
In settings where other national registers are available, such as the Nordic countries, the clinical rheumatology registers may be enriched via linkage to these population-based and ‘complete’ registers. Examples include registers on cancer incidence, mortality or workforce participation. The existence of personal identifiers enables deterministic record linkage. Table 2 outlines the types of external data sources in each of the Nordic countries that can be linked to the clinical rheumatology registers. As evident from the table, such linkages entail considerable administrative preparation, with approval from multiple authorities and waiting times ranging from several months to more than a year. Importantly, linkage to external register holders often imposes additional restrictions on what the linked data may be used for, how they can be accessed and whether (if physically accessible) they may be exported. All of these processes and restrictions vary across countries, to some extent across public register holders within each country, and over time. Currently, there is no ‘push-the-button’ mechanism to link all clinical data to all of the national data sources listed in table 2. Rather, work in this field still relies on the practical experience of each register and on each register holder’s modus operandi.
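To make deterministic record linkage concrete, the following minimal Python sketch pairs clinical register records with cancer register records by exact match on a shared personal identifier. The record layouts and field names (person_id, icd10, diagnosis_year and so on) are hypothetical, and the sketch deliberately ignores the approvals, pseudonymisation and access restrictions described above, which in practice determine who may run such a linkage and where.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

# Hypothetical record layouts; each real register uses its own variables.
@dataclass(frozen=True)
class ClinicalRecord:
    person_id: str          # (pseudonymised) national personal identifier
    diagnosis: str          # eg "RA", "AS", "PsA"
    das28: Optional[float]  # may be missing

@dataclass(frozen=True)
class CancerRecord:
    person_id: str
    icd10: str
    diagnosis_year: int

def link_deterministic(
    clinical: List[ClinicalRecord], cancer: List[CancerRecord]
) -> List[Tuple[ClinicalRecord, List[CancerRecord]]]:
    """Left join on an exact identifier match: every clinical record is kept,
    paired with all cancer records carrying the identical person_id
    (an empty list if there are none)."""
    by_id: Dict[str, List[CancerRecord]] = {}
    for rec in cancer:
        by_id.setdefault(rec.person_id, []).append(rec)
    return [(c, by_id.get(c.person_id, [])) for c in clinical]
```

Because the match is exact, linkage quality depends entirely on the identifier being recorded correctly in both sources; no probabilistic matching on names or dates is attempted.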
Linkage to external data sources offers an additional benefit, namely the possibility to assemble general population or disease-specific comparator cohorts. For instance, for every individual in the clinical register, 10 (or 100 or 1000) general population comparator subjects may be sampled and then subjected to additional register linkages. Such a general population cohort makes it possible to contextualise any difference in risk among patients with a certain disease (say, treated with X instead of Y) against the risk associated with merely having (vs not having) the disease at all. Compare, for instance, the large increase in tuberculosis risk in biologics-treated versus biologics-naive RA with the only moderate increase in biologics-naive RA versus the general population; or the marginal, if any, increase in malignant lymphoma risk in tumour necrosis factor inhibitor (TNFi)-treated versus TNFi-naive RA with the clear increase in lymphoma risk in biologics-naive RA versus the general population.13–15
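Sampling of general population comparators can be sketched as follows. This is a minimal illustration, assuming matching on sex and birth year only (a study could also match on, for example, region of residence and use the patient's index date), and the dictionary keys are hypothetical. Comparators are drawn without replacement for each patient, but the same general population subject may be drawn for more than one patient.

```python
import random
from collections import defaultdict
from typing import Dict, List

def sample_comparators(
    patients: List[dict], population: List[dict], k: int = 10, seed: int = 1
) -> Dict[str, List[str]]:
    """Draw up to k general population comparators per patient, matched on
    sex and birth year, excluding anyone who is in the patient cohort.
    Each dict is assumed to have the hypothetical keys 'id', 'sex', 'birth_year'."""
    rng = random.Random(seed)  # fixed seed gives a reproducible draw
    patient_ids = {p["id"] for p in patients}
    pools: Dict[tuple, List[str]] = defaultdict(list)
    for person in population:
        if person["id"] not in patient_ids:
            pools[(person["sex"], person["birth_year"])].append(person["id"])
    matched: Dict[str, List[str]] = {}
    for p in patients:
        candidates = pools[(p["sex"], p["birth_year"])]
        # Without replacement within one patient's set; take all if fewer than k.
        matched[p["id"]] = rng.sample(candidates, min(k, len(candidates)))
    return matched
```

The sampled comparator identifiers would then go through the same external linkages as the patients, so that outcomes are ascertained identically in both cohorts.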
The need for collaborative observational studies and the need for data harmonisation
In a number of situations, collaboration across registers is necessary. For instance:
Studies of rare treatment exposures.
Studies of rare outcomes.
Studies at maximal phenotypic resolution: individualised treatment, as opposed to treatment on average, must take into account not only the treatment but also the characteristics of the treated disease, and each additional characteristic rapidly reduces the statistical precision in the subset of individuals who share it.
While collaborative studies are increasingly needed, they come with specific challenges at several levels. A proactive approach to these challenges is vital for the success of any collaborative study and for the interpretability of its results.
First, the clinical registers are heterogeneous in their primary data collection, both in what is collected and in how it is defined and collected.16 Harmonisation at this level can result in changes to the primary data collection or to its subsequent coding or categorisation. Examples of such harmonisation efforts include the European League Against Rheumatism (EULAR) Task Force on RA data collection in clinical practice.17 It should be pointed out that harmonisation does not mean that all registers need to collect the same and only the same variables, only that core elements of the data collection should be defined in a way that ensures comparability or translatability across registers. Heterogeneity in population background risks between countries must also be taken into account. Good examples are the recent collaborative analyses of malignant melanoma and lymphoma under the umbrella of EULAR.18
Second, the enrichment of raw clinical register data through linkage to external registers should be comparable across registers. Since such external data sources (eg, a national cancer register) are seldom amenable to changes in their primary data collection, harmonisation at this level will largely be about harmonising the algorithms with which these data are curated. For instance, in a multicountry drug safety study of myocardial infarction using linkage of clinical RA treatment data to hospital data on myocardial infarction, harmonisation may be about defining what is meant by a ‘myocardial infarction’ in each of these hospital registers. ‘Myocardial infarction’ may, for instance, comprise various combinations of unstable angina, ST-segment elevation and non-ST-segment elevation infarctions, and may include or exclude sudden cardiac death.
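One way to harmonise such an outcome algorithm is to write the shared definition once, as code, and have each site apply it to its own hospital and cause-of-death registers. The minimal sketch below assumes ICD-10 coding and uses the prefixes I21/I22 (acute and subsequent myocardial infarction), I20.0 (unstable angina) and I46.1 (sudden cardiac death) purely as illustrations; the actual code lists, and whether the optional components are included, would be agreed in the study protocol. MIDefinition, normalise_icd10 and is_mi are hypothetical names.

```python
from dataclasses import dataclass
from typing import Tuple

# Illustrative ICD-10 code prefixes; the real lists belong in the protocol.
ACUTE_MI: Tuple[str, ...] = ("I21", "I22")  # acute and subsequent MI
UNSTABLE_ANGINA: Tuple[str, ...] = ("I20.0",)
SUDDEN_CARDIAC_DEATH: Tuple[str, ...] = ("I46.1",)

@dataclass(frozen=True)
class MIDefinition:
    include_unstable_angina: bool = False
    include_sudden_cardiac_death: bool = False

    def codes(self) -> Tuple[str, ...]:
        codes = ACUTE_MI
        if self.include_unstable_angina:
            codes += UNSTABLE_ANGINA
        if self.include_sudden_cardiac_death:
            codes += SUDDEN_CARDIAC_DEATH
        return codes

def normalise_icd10(code: str) -> str:
    """Upper-case and insert the dot after the third character, so that
    'i214' and 'I21.4' are treated identically."""
    code = code.strip().upper()
    if "." not in code and len(code) > 3:
        code = code[:3] + "." + code[3:]
    return code

def is_mi(code: str, definition: MIDefinition) -> bool:
    """True if a register code meets the shared myocardial infarction definition."""
    return normalise_icd10(code).startswith(definition.codes())
```

Sensitivity analyses under broader or narrower definitions then amount to rerunning the same script with a different MIDefinition at every site.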
Third, the analytical protocols also need to be harmonised. In the above example of myocardial infarction, such harmonisation will ensure, for instance, that the risk windows during which each study subject is considered to be at risk for a myocardial infarction following a specific antirheumatic treatment are the same across all participating sites or countries, or that adjustment for demographics and comorbidities is performed in a comparable manner across sites or countries. Harmonisation at this level requires a reasonably detailed understanding of the data to be included and must therefore be a joint effort across all collaborators.
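A risk-window rule can likewise be written down once and shared across sites. In this minimal sketch the hypothetical protocol rule is that a subject is at risk from treatment start until lag_days after treatment stop (90 days by default, an arbitrary illustrative choice), censored at end of follow-up or at the first event, whichever comes first; the function returns the person-years at risk and whether the event falls inside the window.

```python
from datetime import date, timedelta
from typing import Optional, Tuple

def risk_window_contribution(
    start: date,
    stop: Optional[date],
    end_of_followup: date,
    event_date: Optional[date],
    lag_days: int = 90,
) -> Tuple[float, bool]:
    """Person-years at risk and event indicator for one treatment episode.

    Hypothetical shared rule: at risk from `start` until `lag_days` after
    `stop` (or until end of follow-up if treatment is ongoing, stop=None),
    censored at end of follow-up or at the event, whichever comes first."""
    if stop is None:
        window_end = end_of_followup
    else:
        window_end = min(stop + timedelta(days=lag_days), end_of_followup)
    event = event_date is not None and start <= event_date <= window_end
    exit_date = event_date if event else window_end
    days = max((exit_date - start).days, 0)
    return days / 365.25, event
```

For example, a treatment episode from 1 January to 1 July 2020 with no event contributes 272 days (about 0.74 person-years), since the window closes 90 days after the stop date, on 29 September 2020.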
Even with perfect harmonisation, not all data sources may provide information on all the desired variables, which may effectively preclude identical analyses. For instance, say that Register I holds information on covariates A, B and C, Register II holds information on covariates A, B and D but not C, and Register III holds information on A, C and E but neither B nor D (figure 1). Running one and the same model across these three registers would mean a model containing only variable A. Within each register, however, more elaborate models (each including three covariates) can be run. The trade-off is whether it is preferable to let each register fit its own ‘best’ model and use meta-analytic techniques to combine these ‘best’ estimates, even though the pooled relative risks will then no longer come from identical models, or whether a joint analysis based on fewer but identical covariates is the better choice. In situations where A, B, C, D and E all represent aspects of the same item (say, treatment response, with A=EULAR DAS28 response, B=ACR response and so on), one way forward may be to create a new variable (‘response’) and have each register classify individuals as responders or non-responders according to the response metric captured in that register. Another analytical challenge arises when the relative importance of a covariate, such as obesity, for an outcome, such as cardiovascular risk, varies across registers.
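When each register fits its own ‘best’ model, the register-specific estimates are commonly combined on the log scale with inverse-variance weights. The sketch below shows a fixed-effect pool, assuming each register reports a ratio estimate (eg, a hazard ratio) with a 95% CI that is symmetric on the log scale, from which the standard error is back-calculated. It also returns Cochran's Q and I² as rough indicators of between-register heterogeneity; under substantial heterogeneity, a random-effects model (eg, DerSimonian-Laird) would usually be preferred. The function name is hypothetical.

```python
import math
from typing import List, Tuple

Z_975 = 1.959963984540054  # 97.5th percentile of the standard normal

def pool_fixed_effect(
    estimates: List[Tuple[float, float, float]]
) -> Tuple[float, float, float, float, float]:
    """Inverse-variance fixed-effect pooling of (ratio, lower95, upper95) tuples.
    Returns (pooled ratio, lower95, upper95, Cochran's Q, I^2)."""
    logs, weights = [], []
    for ratio, lower, upper in estimates:
        se = (math.log(upper) - math.log(lower)) / (2 * Z_975)  # SE of log ratio
        logs.append(math.log(ratio))
        weights.append(1.0 / se ** 2)
    total_w = sum(weights)
    pooled = sum(w * x for w, x in zip(weights, logs)) / total_w
    pooled_se = math.sqrt(1.0 / total_w)
    # Heterogeneity: weighted squared deviations from the pooled log ratio.
    q = sum(w * (x - pooled) ** 2 for w, x in zip(weights, logs))
    df = len(estimates) - 1
    i2 = max(0.0, (q - df) / q) if q > 0 else 0.0
    return (math.exp(pooled), math.exp(pooled - Z_975 * pooled_se),
            math.exp(pooled + Z_975 * pooled_se), q, i2)
```

Note that the pooled value is only as interpretable as the comparability of the inputs: combining estimates from differently adjusted models is exactly the trade-off described above.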
Finally, treatment channelling, or confounding by indication, is an important aspect of all observational comparative effectiveness or safety research, collaborative or not. It reflects the fact that treatment allocation in clinical practice is not random but is determined by known and unknown factors related to the patient, to his or her rheumatic disease and other medical history, to the treating physician and to the treatment context. While no single method effectively quantifies and eliminates confounding by indication, judicious analyses can help demonstrate the extent to which channelling is present, particularly when informed by hands-on experience of the clinical practice that gave rise to the data (at country level or at regional level within each country) and by access to individual-level data beyond clinical data (such as socioeconomic data). Different analytical techniques can be used to reduce its impact. When confounding by indication is likely to differ in magnitude (or even direction) across countries, harmonised but parallel analyses provide an opportunity to assess its importance, while also adding an element to the analysis plan, since adjustment for some variables may be essential in some countries but of less importance in others. Epidemiologists need to consider differential selection bias by country when analysing multiregional data. The best method to combine data from various registers will therefore depend on the research question, the outcome, the presence of effect modification by country and the need for higher statistical power.
Technical, logistic and legal challenges in collaborative studies: can and should data travel?
After agreeing on a specific research question, collaboration between different registers can conceptually include different approaches (figure 2):
Analyses based on exports of harmonised, anonymised or de-identified, individual-patient level data from each register to a central database, where they are collated and analysed jointly as one data set.
Fully federated analyses, in which the curated individual-level data are analysed from a central unit and as one virtual data set, yet do not leave the local servers where they are stored.
Separate but harmonised analyses of curated data, conducted in parallel at each register. The curated data sets are analysed individually in each country based on a harmonised analysis protocol, with the results presented both individually per register and pooled through meta-analytic techniques.
There are advantages and drawbacks to each of the above alternatives. Currently, and as outlined above, cross-border transfer of data is often accompanied by uncertainties or bureaucratic bottlenecks regarding the legal and logistic aspects of data handling and export. Often, these questions are beyond the control of the individual researcher. Still, a series of examples demonstrate the feasibility of this approach, at least for collaborations built exclusively around clinical register data.19–22 It should be pointed out that even anonymised or de-identified data may, by virtue of their richness, constitute personal data and have to be treated accordingly. The European Union (EU) General Data Protection Regulation does not seem to substantially alter the underlying premises for performing research based on register data or register linkages, or for the movement of data within EU countries, at least not from a northern European point of view, but this remains an important issue to monitor closely.23
While intuitively appealing, fully federated analyses (option 2) raise the question of whether, from a legal point of view, providing external access to the data is any different from exporting the very same data to the analysing party. If the analyses are run ‘in the cloud’, then some form of data export has de facto occurred. There are, however, promising technical solutions in operation that rely on the transfer of scripts and interim results (aggregate-level data or parameter estimates) only, and thereby circumvent the challenge of actual or virtual data access and transfer.24 25 Fitting models iteratively across several data sets may, however, be time-consuming and limited by the information-transfer capacity of the network used.
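To illustrate the idea of transferring scripts and aggregate results rather than individual-level data, the sketch below splits a stratified analysis into a function run locally at each register (site_summary), which collapses individual rows into event counts and person-years per stratum and exposure group, and a function run centrally (mantel_haenszel_rate_ratio), which combines these aggregates into a Mantel-Haenszel rate ratio stratified by register and stratum. This single round of aggregate exchange is a simplification that does not implement the iterative model fitting mentioned above; the function names and input layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

Summary = Dict[Tuple[str, bool], Tuple[int, float]]

def site_summary(records: Iterable[Tuple[str, bool, bool, float]]) -> Summary:
    """Run locally: collapse (stratum, exposed, event, person_years) rows into
    {(stratum, exposed): (events, person_years)}. Only this leaves the site."""
    agg = defaultdict(lambda: [0, 0.0])
    for stratum, exposed, event, person_years in records:
        cell = agg[(stratum, exposed)]
        cell[0] += int(event)
        cell[1] += person_years
    return {key: (cell[0], cell[1]) for key, cell in agg.items()}

def mantel_haenszel_rate_ratio(summaries: Dict[str, Summary]) -> float:
    """Run centrally: Mantel-Haenszel rate ratio (exposed vs unexposed),
    stratified on every (site, stratum) combination."""
    num = den = 0.0
    for summary in summaries.values():
        for stratum in {s for s, _ in summary}:
            a, t1 = summary.get((stratum, True), (0, 0.0))   # exposed
            b, t0 = summary.get((stratum, False), (0, 0.0))  # unexposed
            if t1 == 0 or t0 == 0:
                continue  # a stratum without both groups carries no information
            t = t1 + t0
            num += a * t0 / t
            den += b * t1 / t
    return num / den  # raises ZeroDivisionError if no unexposed events at all
```

Only the dictionaries returned by site_summary would need to leave each site; whether even such aggregates may be shared (eg, when cells contain very few events) would still depend on the local rules discussed above.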
By contrast, separate but harmonised analyses (option 3) place particular demands on harmonisation, not only of the raw data but also of the statistical analysis plan to be executed at each register. Since individual-level data are not pooled, adjusted analyses across data sets are not possible, and other means of accommodating important risk determinants across data sets must be implemented (eg, stratification and standardisation, as exemplified in a recently published study26). The analytical ‘output’ to be pooled through meta-analytic approaches may range from rates to actual relative risk estimates. Since all programming and analysis would need to be run separately at each collaborating site, this option may also require more total work hours than the other options, in which the central analysis only needs to be run by one designated statistician.
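As a minimal sketch of standardisation within separate but harmonised analyses, each register could compute event counts and person-years per stratum (eg, age group) and then a directly standardised rate using a common set of standard-population weights agreed in the protocol. The stratum labels, weights and function name below are hypothetical; the point is that rates standardised to the same standard population can be compared, or their ratios pooled, without sharing individual-level data.

```python
from typing import Dict, Tuple

def directly_standardised_rate(
    stratum_counts: Dict[str, Tuple[int, float]],
    standard_weights: Dict[str, float],
) -> float:
    """Weighted average of stratum-specific rates (events per person-year),
    using weights from a shared standard population. Every stratum in
    standard_weights must be present, with person-years > 0, in stratum_counts."""
    total_w = sum(standard_weights.values())
    rate = 0.0
    for stratum, w in standard_weights.items():
        events, person_years = stratum_counts[stratum]
        rate += (w / total_w) * (events / person_years)
    return rate

# Shared (hypothetical) standard population: 60% aged <50, 40% aged 50+.
STANDARD = {"<50": 0.6, "50+": 0.4}
```

For example, a register with 2 events over 1000 person-years in the younger stratum and 10 events over 500 person-years in the older stratum would obtain 0.6 × 0.002 + 0.4 × 0.02 = 0.0092 events per person-year, ie, 9.2 per 1000 person-years.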
A Nordic initiative to facilitate collaboration across enriched rheumatology registers
With largely similar healthcare structures, national registers on medical and societal outcomes (cancer, hospitalisations, pregnancies, sick leave, other cost reimbursements and so on) and the possibility to link information across registers, the Nordic countries offer good premises for collaborative large-scale register-based research including enriched clinical data, though not without challenges such as those outlined above.
We have initiated a collaboration between the Nordic rheumatology registers that aims to establish a standing network across Sweden, Denmark, Finland, Norway and Iceland for register research on patients with RA, AS, SpA and PsA. Taking a pragmatic and research-question–based approach, we will explore various approaches to harmonising (1) primary data collection, (2) data management and (3) analytical protocols, and to addressing the legal, logistic and technical challenges involved in data handling and data sharing, with regard both to the clinical register data and to linked data from other data sources (figure 2).
The research questions that can be addressed through a collaboration such as ours fall largely into two categories: (1) questions that can be addressed using the clinical registers only (eg, a clinical effectiveness study of response to treatment X vs response to treatment Y) and (2) questions that typically can only be addressed using clinical data enriched with data from other sources (eg, long-term malignancy risks with treatment Z). A logical first step before embarking on specific comparative effectiveness/safety projects is to characterise and compare the patient populations across countries27 and to assess the relative uptake of different therapies in each country.27 28 Even given the similarities across the Nordic countries, heterogeneity is to be expected. Harmonisation of the clinical input data, regarding both definitions and collection, and of the study protocol is therefore a prerequisite. For each specific subproject launched within our collaboration, a prespecified statistical analysis protocol must be agreed on, and exposure, outcome and covariates need to be clearly defined. Besides study-specific definitions and analysis plans, this work will eventually result in a library of ‘standard’ and generic definitions for data harmonisation that can be used regardless of the specific research question.
So far, our collaboration has begun to demonstrate similarities and differences across the Nordic countries with regard to the biological therapies used in AS/SpA28 and PsA29 and the choice of biological therapies in patients with a history of cancer.30 Ongoing projects include studies of infection risks with newer types of biologics in RA, of risks of demyelinating events with TNFi31 and of birth outcomes.32
There is an increasing need for detailed real-world data on rheumatic diseases and their treatments in large patient populations. Collaboration across large, population-based clinical rheumatology registers in settings that allow for enrichment through linkage to additional sources of information represents a powerful next step in the generation of real-world evidence and can be of great value to patients, clinicians, regulators, pharmaceutical companies and healthcare providers. In this regard, the premises for register-based collaborative clinical research on chronic inflammatory diseases in the Nordic countries are particularly promising. Such collaboration comes, however, with legal, logistic and methodological challenges. In a collaboration between rheumatology registers on chronic inflammatory arthritides in the Nordic countries, we hope to enrich, harmonise and standardise the resulting data repositories, to investigate analytical approaches to data coming from different sources and to assess the merits of different logistical approaches to data protection and sharing by performing collaborative studies on treatment effectiveness, safety and health-economic outcomes.
Contributors All authors contributed significantly in the preparation of the review and are part of this international collaboration.
Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.
Competing interests None declared.
Patient consent Not required.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.
Correction notice This article has been corrected since it was first published. The 13th author’s name has been corrected to ‘Bjorn Gudbjornsson.’