Original research
Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset
  1. Pedro Alves1,
  2. Jigar Bandaria1,
  3. Michelle B Leavy2,
  4. Benjamin Gliklich3,
  5. Costas Boussios1,
  6. Zhaohui Su4 and
  7. Gary Curhan2
  1. 1Data Science, OM1 Inc, Boston, Massachusetts, USA
  2. 2Research, OM1 Inc, Boston, Massachusetts, USA
  3. 3Research, Noble and Greenough School, Dedham, Massachusetts, USA
  4. 4Biostatistics, OM1 Inc, Boston, Massachusetts, USA
  1. Correspondence to Ms Michelle B Leavy; mleavy{at}om1.com

Abstract

Objective Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. This study aimed to validate a machine learning model to estimate SLEDAI score categories using clinical notes and to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies.

Methods A machine learning model was developed to estimate an individual patient’s SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC), calculated using a binarised version of the outcome that sets the positive class to be those records with clinician-recorded SLEDAI scores >5 and the negative class to be records with scores ≤5. Model performance was evaluated by categorising the scores into the four disease activity categories and by calculating the Spearman’s R value and Pearson’s R value.

Results The AUC for the binarised outcome (scores >5 vs ≤5) was 0.93 for the development cohort and 0.91 for the validation cohort. The model had a Spearman’s R value of 0.7 and a Pearson’s R value of 0.7 when calculated using the four disease activity categories.

Conclusion The model performs well when estimating SLEDAI score categories using unstructured clinical notes.

  • lupus erythematosus, systemic
  • outcome assessment, healthcare
  • epidemiology

Data availability statement

No data are available. This study was conducted using deidentified participant data compiled from multiple sources. Restrictions apply to the availability of these data, which were used under certain permissions for this study. Please contact the corresponding author (Michelle Leavy, ORCID ID 0000-0003-1927-7248) with any questions.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.

Key messages

What is already known about this subject?

  • Machine learning models may be used to estimate disease activity scores in rheumatological conditions, such as systemic lupus erythematosus and rheumatoid arthritis.

What does this study add?

  • A machine learning model performs well when estimating Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) score categories for patients with systemic lupus erythematosus (SLE) using unstructured clinical notes data.

  • This study appears to be one of the first uses of machine learning to estimate SLE disease activity score categories from unstructured clinical notes.

How might this impact on clinical practice or future developments?

  • The model could be used to estimate SLEDAI score categories in real-world data sources, making these data more useful for research.

  • The approach used to develop this model could be applied to disease activity measures for other rheumatological conditions.

Introduction

Composite disease activity measures, such as the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI), are key outcome measures for assessing systemic lupus erythematosus (SLE) clinical status and response to treatment.1 The SLEDAI has been used extensively in clinical trials and, to some extent, in clinical practice. It has strong psychometric properties, is simple to score and can be used to identify patients in remission both on and off therapy.2 High disease activity as assessed by the SLEDAI is associated with more severe disease and damage.3 Yet, despite its relative advantages, use of the SLEDAI in clinical practice is inconsistent,4 and availability of clinician-recorded SLEDAI scores in real-world datasets is limited.

Increasing the availability of SLEDAI scores in real-world datasets would yield new opportunities for comparisons to trials data and research on patient treatment patterns and outcomes in a real-world setting. To date, statistical methods of imputation have been used to address missing disease activity scores in research studies,5 but patients in real-world datasets may have no recorded disease activity scores, limiting the utility of this approach. Even for patients with some recorded disease activity level data, adding additional estimates at different timepoints would provide a more complete view of the patient’s response to treatment and outcomes over time.

In recent years, significant progress has been made in classifying medical text and applying machine learning to the features produced. These techniques could be applied to the rich clinical data that exist in rheumatology clinical notes to estimate disease activity scores. This approach has been used in rheumatoid arthritis, where disease activity scores have been estimated from electronic medical records (EMR) data using machine learning, text features and laboratory values with good performance.6 Because SLE disease activity is also measured using clinical and laboratory variables, it is reasonable to consider using the information contained in the clinical record to estimate disease activity at specific timepoints for patients with SLE to increase the value of real-world data for clinical and research purposes.

The objectives of this study were to validate a machine learning model to estimate SLEDAI score categories for patients with SLE using clinical notes and to apply the model to a large, real-world dataset to generate estimated SLEDAI score categories for use in future research studies.

Methods

Participants

We used the OM1 SLE Registry, which includes over 41 000 patients with SLE from over 800 rheumatologists.7 Practices are geographically distributed across the USA. Clinical data are drawn from EMRs and include medication history and prescription information, laboratory results and diagnoses as documented by a physician, primarily from outpatient or emergency room settings. Disease activity scores such as the SLEDAI are captured in the registry when they are documented in the patient record in structured form. Unstructured, physician-documented notes are available as well. The registry includes data from 2013 to 2020. To be eligible for inclusion in the registry, patients must be followed by a rheumatologist and have either multiple diagnosis codes for SLE or prescriptions for SLE treatments plus recorded SLE-specific patient-reported outcome measures. All data in the registry are deidentified. This study was submitted for Institutional Review Board approval and determined to be exempt.

Patients in the SLE Registry with at least one clinician-recorded SLEDAI and at least one text-based clinical note associated with the date of the clinician-recorded SLEDAI score were identified and randomly assigned to either the model training cohort (69%) or the model validation cohort (31%).
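The random assignment described above can be sketched as follows. This is a minimal illustration, not the authors' code: the function name `split_patients`, the seed and the rounding rule are assumptions. The one property taken from the text is that assignment happens at the patient level, so all encounters for a patient fall in the same cohort.

```python
import random


def split_patients(patient_ids, train_fraction=0.69, seed=0):
    """Randomly assign each distinct patient to 'train' or 'validation'.

    Hypothetical helper: the 69%/31% split comes from the text, while the
    seed and the rounding rule are illustrative assumptions.
    """
    unique_ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(unique_ids)
    n_train = round(len(unique_ids) * train_fraction)
    train_ids = set(unique_ids[:n_train])
    return {
        pid: ("train" if pid in train_ids else "validation")
        for pid in unique_ids
    }


# Made-up encounters as (patient_id, encounter_date) pairs.
encounters = [("p1", "2019-01-04"), ("p1", "2019-06-11"), ("p2", "2018-03-02")]
cohort_by_patient = split_patients(pid for pid, _ in encounters)
# Each encounter inherits the cohort of its patient.
cohort_by_encounter = [(pid, day, cohort_by_patient[pid]) for pid, day in encounters]
```

Splitting by patient rather than by encounter keeps notes from the same person out of both cohorts at once, which would otherwise tend to inflate validation performance.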

Modelling strategy

Dependent variable

The model is trained to estimate four SLEDAI score categories: no activity (SLEDAI score of 0), mild activity (SLEDAI scores of 1–5), moderate activity (SLEDAI scores of 6–10) and a combined high/very high activity (SLEDAI scores of ≥11). These categories align with published SLEDAI cutpoints for disease activity.8 The trained model generates an estimated SLEDAI (eSLEDAI) score category for a specific encounter on a specific date. Because scores are estimated for a specific encounter on a specific date, an individual patient may have multiple timepoints for which they have eSLEDAI score categories and additional timepoints for which they have clinician-recorded SLEDAI scores.
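The cutpoints above map a clinician-recorded score to one of the four categories. A minimal sketch of that mapping is shown here; the function name and the category labels as Python strings are illustrative choices.

```python
def sledai_category(score):
    """Map a SLEDAI score to the four categories used in this study."""
    if score < 0:
        raise ValueError("SLEDAI scores cannot be negative")
    if score == 0:
        return "no activity"
    if score <= 5:
        return "mild activity"
    if score <= 10:
        return "moderate activity"
    return "high/very high activity"


categories = [sledai_category(s) for s in (0, 4, 8, 14)]
```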

Explanatory variables

We derived the explanatory features from clinical notes. In order to ensure that there was sufficient information to estimate the score category, analysed clinical notes were restricted to those that have components of medical history and physical examination. In the event that more than one clinical note exists on a single date, clinical notes from the same day were appended together as a single note for purposes of score category estimation. Explanatory variables that were used in the development of the estimation model were related to references of signs, symptoms, treatment and medications from the nine systems assessed by the SLEDAI (central nervous system, vascular, renal, musculoskeletal, serosal, dermal, immunologic, constitutional and haematologic).
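The same-day appending step could look like the sketch below. The record layout `(patient_id, date, text)`, the function name and the newline separator are assumptions made for illustration; the text only states that same-day notes were appended together.

```python
from collections import defaultdict


def merge_same_day_notes(notes):
    """Combine all notes for one patient on one date into a single string.

    `notes` is an iterable of (patient_id, date, text) tuples; notes are
    joined in the order they are supplied.
    """
    grouped = defaultdict(list)
    for patient_id, note_date, text in notes:
        grouped[(patient_id, note_date)].append(text.strip())
    return {key: "\n".join(texts) for key, texts in grouped.items()}


# Made-up example notes.
example_notes = [
    ("p1", "2019-01-04", "HPI: joint pain."),
    ("p1", "2019-01-04", "Exam: malar rash."),
    ("p1", "2019-02-01", "Follow-up."),
]
merged_notes = merge_same_day_notes(example_notes)
```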

Modelling strategy

The SLEDAI estimation model used in this study was a multivariable ordinal regression model with the explanatory features generated from the clinical notes. Our modelling approach involved multiple steps: (1) processing of the body of clinical notes in order to de-noise and standardise their content; (2) identification of terms and phrases that are used by the physicians to indicate SLEDAI-related signs and symptoms to form the set of explanatory variables; and (3) model development and validation.

First, standard text processing steps were followed: word tokenisation (a process of separating a piece of text into smaller units), text lemmatisation (a process that reduces various inflectional and related forms of a word to a common root word) and removal of stop-words (extremely common words that have little value for modelling).9 10 The text processing was done in Python using the NLTK package. N-gram features were generated from the processed text, and the features were TF-IDF (term frequency-inverse document frequency) transformed to generate the input for model training. Feature generation from the processed text and model development were performed using the scikit-learn package in Python. The entire process of text processing, feature generation and model development was performed in a Python environment in which package versions were fixed for reproducibility. An initial estimate of the feature coefficients was obtained by fitting a logistic regression model to the data.
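To make the feature-generation step concrete, the sketch below computes n-gram TF-IDF weights using only the Python standard library. It is not the study pipeline, which used NLTK and scikit-learn. The tiny stop-word list, the regular-expression tokeniser, the unigram-plus-bigram range and the smoothed IDF with L2 normalisation are all assumptions, and lemmatisation is omitted for brevity.

```python
import math
import re
from collections import Counter

# Illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "with", "is", "in", "on", "to"}


def tokenize(text):
    """Lower-case, split into alphabetic tokens and drop stop-words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOP_WORDS]


def ngrams(tokens, max_n=2):
    """Return all n-grams of length 1..max_n as space-joined strings."""
    return [
        " ".join(tokens[i:i + n])
        for n in range(1, max_n + 1)
        for i in range(len(tokens) - n + 1)
    ]


def tfidf(documents, max_n=2):
    """Return one {term: weight} dict per document, L2-normalised.

    Uses the smoothed inverse document frequency
    idf(t) = ln((1 + n_docs) / (1 + df(t))) + 1.
    """
    counts = [Counter(ngrams(tokenize(doc), max_n)) for doc in documents]
    n_docs = len(counts)
    doc_freq = Counter(term for c in counts for term in c)
    vectors = []
    for c in counts:
        weights = {
            term: tf * (math.log((1 + n_docs) / (1 + doc_freq[term])) + 1)
            for term, tf in c.items()
        }
        norm = math.sqrt(sum(w * w for w in weights.values())) or 1.0
        vectors.append({term: w / norm for term, w in weights.items()})
    return vectors


# Made-up note fragments.
notes = ["Patient reports joint pain and rash", "No rash today, joint pain improved"]
vectors = tfidf(notes)
```

Terms that appear in fewer notes receive larger weights than terms shared across notes, which is why TF-IDF is a common input representation for linear models such as logistic regression.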

Next, we evaluated the clinical notes to identify terms and phrases that correlate with the SLEDAI score and are clinically important for its calculation. The terms and phrases that we identified were based on the 24 variables used in calculating the SLEDAI score (eg, arthritis, rash, vasculitis, fever) as well as terms that indicate worsening, improvement or change in these variables. Terms and phrases that correlated with SLEDAI scores were used as predictive features in the model. This set of predictive features was used to train the model on the training cohort. In the final step, the model was validated on the validation cohort.

Model performance was assessed using the area under the receiver operating characteristic curve (AUC). The AUC was calculated using a binarised version of the outcome that sets the positive class to be those records with clinician-recorded SLEDAI scores greater than 5 (the threshold at which SLEDAI scores are considered to correspond to ‘mild’ vs ‘moderate’ disease activity), and the negative class to be records with scores less than or equal to 5. Model performance was also evaluated by categorising the scores into the four disease activity categories and calculating the Spearman’s R value and Pearson’s R value. Correlation coefficients were used to evaluate performance because there is ordinality between the four eSLEDAI score categories (ie, it is more accurate to categorise ‘no activity’ as ‘mild activity’ than to categorise ‘no activity’ as ‘high/very high activity’). Sensitivity and specificity for the four eSLEDAI categories were also calculated. The predictors were reviewed for clinical suitability for estimating a SLEDAI score category, and the distribution of eSLEDAI score categories was compared with the distribution of clinician-recorded SLEDAI scores.
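The evaluation metrics described above can be illustrated with a short, dependency-free sketch. The function names and example values are hypothetical. The continuous `predicted` values stand in for whatever score the model outputs for the binarised outcome, which the text does not specify. The AUC is computed from its pairwise-ranking definition, and Spearman's coefficient as Pearson's coefficient on average ranks.

```python
import math


def binarise(scores, threshold=5):
    """Positive class: clinician-recorded SLEDAI score above the threshold."""
    return [1 if s > threshold else 0 for s in scores]


def auc(labels, predicted):
    """Probability that a random positive outranks a random negative (ties count 0.5)."""
    pos = [p for lab, p in zip(labels, predicted) if lab == 1]
    neg = [p for lab, p in zip(labels, predicted) if lab == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks, with tied values sharing the mean of their positions."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


# Made-up values for illustration only.
recorded_scores = [0, 3, 4, 7, 12, 9]
predicted = [0.1, 0.2, 0.6, 0.5, 0.9, 0.8]
example_auc = auc(binarise(recorded_scores), predicted)

# Categories coded 0-3 (no, mild, moderate, high/very high activity).
recorded_cat = [0, 1, 2, 3]
estimated_cat = [0, 1, 3, 2]
example_rho = spearman(recorded_cat, estimated_cat)
```

Because the category codes are ordered, a near miss (moderate estimated as high) lowers the correlation less than a distant miss (no activity estimated as high), which matches the rationale given for using correlation coefficients.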

Application of the model to the SLE Registry

The trained model was applied to all SLE Registry patients who did not meet the eligibility criteria for the development and validation cohorts but for whom encounter notes with sufficient detail existed. Descriptive analyses were conducted to examine the demographic and clinical characteristics, including age, sex, race, medical history and treatment history, of the SLE Registry patients with clinician-recorded SLEDAI scores, with eSLEDAI score categories, and with both clinician-recorded scores and eSLEDAI score categories. Analyses were also conducted to examine the relation of clinician-recorded SLEDAI scores and eSLEDAI score categories to healthcare resource utilisation, pain medication prescriptions and corticosteroid prescriptions, as previous studies have shown that resource utilisation including medication usage increases with increased disease activity.11 12 While a comparison of the eSLEDAI score category to a clinician-recorded SLEDAI score cannot be done for these SLE Registry encounters, the performance of the model can still be assessed based on other types of medical data. In these analyses, we compared trends in healthcare resource utilisation, pain medication prescriptions and corticosteroid prescriptions across the four estimated SLEDAI categories to trends across clinician-recorded SLEDAI scores as independent measures of the model’s performance in a real-world dataset.

Results

Participants and characteristics

The model training cohort consisted of 3504 encounters from 1130 patients with SLE, while the validation cohort consisted of 1576 encounters from 500 distinct patients. Demographics and clinical characteristics of patients in the training set are presented in table 1.

Table 1

Demographic and clinical characteristics of SLE Registry patients

Model performance

The AUC was 0.93 for the development cohort and 0.91 for the validation cohort for the binary outcome (figure 1). Sensitivity and specificity for the four eSLEDAI score categories were calculated for the validation cohort and are presented in table 2.

Table 2

Sensitivity and specificity for the four eSLEDAI categories
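For readers unfamiliar with per-category sensitivity and specificity in a multi-class setting, the sketch below derives them from a confusion matrix by treating each category in turn as the positive class (one-vs-rest). The one-vs-rest convention is an assumption about how such values are typically obtained, and the counts are made up for illustration. They are not the values in table 2 or table 3.

```python
def per_category_sens_spec(confusion):
    """Return [(sensitivity, specificity), ...], one pair per category.

    confusion[i][j] is the number of encounters whose recorded category
    is i and whose estimated category is j.
    """
    total = sum(sum(row) for row in confusion)
    results = []
    for k in range(len(confusion)):
        tp = confusion[k][k]
        fn = sum(confusion[k]) - tp
        fp = sum(row[k] for row in confusion) - tp
        tn = total - tp - fn - fp
        results.append((tp / (tp + fn), tn / (tn + fp)))
    return results


# Illustrative counts only; rows are recorded, columns are estimated.
toy_confusion = [
    [8, 2, 0, 0],
    [1, 7, 2, 0],
    [0, 2, 6, 2],
    [0, 0, 1, 9],
]
toy_results = per_category_sens_spec(toy_confusion)
```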

Figure 1

AUC of SLEDAI prediction model. The AUC was calculated using a binarised version of the outcome that sets the positive class to be those records with clinician-recorded SLEDAI scores greater than 5 (the threshold at which SLEDAI scores are considered to correspond to ‘mild’ vs ‘moderate’ disease activity), and the negative class to be records with scores less than or equal to 5. AUC, area under the receiver operating characteristic curve; ROC, receiver operating characteristic; SLEDAI, Systemic Lupus Erythematosus Disease Activity Index.

The model had a Spearman’s R value of 0.7 and a Pearson’s R value of 0.7 when evaluating the model performance by the four estimated SLEDAI score categories. The confusion matrix for the Spearman data is presented in table 3.

Table 3

Confusion matrix for Spearman’s rank correlation coefficient

Features that were used in the development of the model were related to signs and symptoms from the nine systems assessed by the SLEDAI. The top predictors were signs/symptoms that are specifically used in determining SLEDAI, plus additional terms that relate to the severity and/or frequency of these signs/symptoms.

Estimation of SLEDAI score categories in the SLE Registry

The model was used to generate eSLEDAI score categories for 62 263 encounters from 21 393 SLE Registry patients that do not have clinician-recorded SLEDAI scores, but have clinical notes with sufficient information. The distribution of score categories for encounters with clinician-recorded SLEDAI scores in the validation cohort was similar to the distribution of score categories for encounters with eSLEDAI score categories in the SLE Registry (figure 2). Specifically, for clinician-recorded SLEDAI scores, 31.7% of clinical notes were categorised as no activity, 31.8% of notes as mild activity, 21.5% of notes as moderate activity and 15.1% of notes as high/very high activity. For eSLEDAI score categories, 32.1% of notes were classified as no activity, 26.6% of notes as mild activity, 23.1% of notes as moderate activity and 18.2% of notes as high/very high activity.

Figure 2

Comparison of clinician-recorded SLEDAI score categories versus eSLEDAI score categories. The distribution of SLEDAI score categories for encounters with clinician-recorded SLEDAI scores in the validation cohort and the distribution of SLEDAI score categories for encounters with eSLEDAI score categories are shown here. eSLEDAI, estimated SLEDAI; SLEDAI, Systemic Lupus Erythematosus Disease Activity Index.

The SLE Registry contains 7017 encounters from 2286 patients (of 41 000 total patients or 5.6%) that have clinician-recorded SLEDAI scores from routine clinical rheumatology practice. Of the 2286 patients with encounters with clinician-recorded scores, 1762 had eSLEDAI score categories estimated for other encounters. Demographics and clinical characteristics of patients with clinician-recorded SLEDAI scores, patients with eSLEDAI score categories, patients with both clinician-recorded scores and eSLEDAI score categories and patients in the model training cohort are presented in table 1. The majority (93.6% with clinician-recorded scores, 92.0% with eSLEDAI score categories, 93.6% with both scores and 93.2% in the model training cohort) were female, and the mean age as of the score was 49.9 years for patients with clinician-recorded scores, 51.2 years for patients with eSLEDAI score categories, 48.3 years for patients with both scores and 50.4 years for patients in the training cohort.

Healthcare resource utilisation, pain medication prescriptions and corticosteroid prescriptions were also examined for encounters with clinician-recorded scores and eSLEDAI score categories. Inpatient visits and emergency room (ER) visits that occurred within a 12-month window (±6 months) of the SLEDAI score date are presented in table 4. Encounters with clinician-recorded scores and those with estimated scores have similar trends in terms of healthcare resource utilisation. Pain medication prescriptions that occurred within a 90-day window (±45 days) of the SLEDAI score date are presented in table 4. Pain medication prescriptions increased with higher score categories, and similar trends are observed among encounters with clinician-recorded scores and those with eSLEDAI score categories (30.2% of high/very high activity clinician-recorded scores and 23.8% of high/very high activity eSLEDAI score categories). Corticosteroid prescriptions that occurred within a 90-day window (±45 days) of the SLEDAI score date are presented in figures 3 and 4. Corticosteroid prescriptions increased with higher score categories, and similar trends are observed among encounters with clinician-recorded scores and those with eSLEDAI score categories (36.8% of high/very high activity clinician-recorded scores and 29.6% of high/very high activity eSLEDAI score categories).
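The symmetric windows used in these analyses can be expressed as a simple date comparison. The sketch below is illustrative: the function names are hypothetical, treating the window boundaries as inclusive is an assumption, and a ±6-month window would need a day count chosen by the analyst (for example, 183 days).

```python
from datetime import date


def in_window(event_date, score_date, half_width_days=45):
    """True if the event falls within +/- half_width_days of the score date."""
    return abs((event_date - score_date).days) <= half_width_days


def has_event_in_window(event_dates, score_date, half_width_days=45):
    """True if any event (eg, a prescription) falls inside the window."""
    return any(in_window(d, score_date, half_width_days) for d in event_dates)


# Made-up dates for illustration.
score_date = date(2019, 1, 15)
prescriptions = [date(2018, 10, 1), date(2019, 3, 1)]
flag = has_event_in_window(prescriptions, score_date)
```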

Table 4

Healthcare resource utilisation and pain medication prescriptions by SLEDAI score category for clinician-recorded scores and eSLEDAI score categories

Figure 3

Corticosteroid prescriptions by SLEDAI score category for clinician-recorded SLEDAI scores. Corticosteroid prescriptions that occurred within a 90-day window (±45 days) of the SLEDAI score date are presented here by SLEDAI score category for clinician-recorded scores. SLEDAI, Systemic Lupus Erythematosus Disease Activity Index.

Figure 4

Corticosteroid prescriptions by SLEDAI score category for eSLEDAI score categories. Corticosteroid prescriptions that occurred within a 90-day window (±45 days) of the estimated score category date are presented here by eSLEDAI score categories. eSLEDAI, estimated SLEDAI; SLEDAI, Systemic Lupus Erythematosus Disease Activity Index.

Discussion

While the SLEDAI is widely used in clinical trials, its use in routine clinical practice is inconsistent, and availability of clinician-recorded SLEDAI scores in real-world datasets is limited. The lack of SLEDAI scores in real-world datasets reduces the utility of these datasets for addressing questions about real-world treatment patterns and outcomes and for supporting new research, including identification of eligible patients for clinical trials. This study demonstrated that a machine learning model can be used to estimate eSLEDAI score categories. This machine learning model performs well when estimating the four SLEDAI score categories using unstructured clinical notes from a real-world dataset. The results align with objective data, such as corticosteroid prescriptions, pain medication prescriptions and healthcare resource utilisation, providing further confidence in the estimated score categories. The approach used to develop this model could be applied to other rheumatology disease activity scores, and future models may build on this effort by combining structured clinical data with unstructured notes to further improve performance.

This study appears to be one of the first uses of machine learning to estimate four SLE disease activity score categories from unstructured text. Other efforts have developed models to predict high disease activity in SLE,13 to predict future disease activity scores in rheumatoid arthritis6 and to estimate disease severity in rheumatoid arthritis using administrative claims data.14 15 This effort is unique in that it focuses on SLE, estimates four disease activity score categories at a specific point in time as opposed to predicting future scores and uses clinical narrative data to assess disease activity. Assessing disease activity is a routine part of patient care for SLE, as disease activity is an essential factor in determining the appropriate intervention strategies and therapies to achieve disease remission. These automated estimates also provide a means to assist in tracking disease progression, improvement and remission over time and can generate insights into the effectiveness of interventions. Importantly, objective measures of disease activity allow for standardisation of patient outcome measures across clinicians and patients and facilitate care management, population health management and research.

Yet, calculation and recording of SLEDAI scores is challenging in routine clinical practice settings, leading to the need for new strategies to estimate these scores at the individual patient and population level. As shown in table 1, only 5.6% of patients in a real-world setting had SLEDAI scores recorded in their EMR. While a clinician-recorded SLEDAI score will remain the gold standard, generating eSLEDAI score categories based on the clinical narrative dramatically increases the number of available endpoints for tracking and understanding patient outcomes when clinician-recorded SLEDAI scores are not available. For patients for whom there are multiple SLEDAI values available, statistical imputation could be considered as an alternative approach, but this was the case for a small minority of patients. Statistical methods for imputing missing data, such as multiple imputation, usually assume that data are missing at random, which may not hold because patients with and without clinician-recorded SLEDAI values have different characteristics. In addition, the multiple imputation method is applicable only when the percentage of missing data is less than 40%.16

Multiple applications exist for a machine learning model to estimate SLEDAI score categories. First, use of the model within real-world datasets would expand the population of patients that could be included in research studies that require disease activity scores. This would be particularly useful for building large cohorts of patients who receive treatment outside specialised SLE clinics where the SLEDAI score is rarely recorded.4 Estimating score categories for these patients and including these patients in real-world studies could generate useful information to better correlate trial outcomes with real-world patient outcomes. Beyond research uses, generation of an estimated score category at the individual patient level could help clinicians track a patient’s response to treatment over time and support care management. Consideration of the performance characteristics of the model, particularly AUC, as well as positive predictive value and negative predictive value in appropriate use cases, will be critical.

This approach has some limitations. First, the model relies on data recorded in the clinical notes, and notes without sufficient text and clinical details are excluded. Estimation of the SLEDAI score categories requires that adequate time was spent with the patient during the visit and that the visit was documented with sufficient thoroughness. In addition, the model was trained and validated using EMR data from multiple rheumatology practices in the USA. Prior to use in other data sources, the model will need to be modified to address variations in other care settings and documentation practices. Validity should be confirmed before applying the model in other data sources.

By developing a model to estimate SLEDAI score categories, this study addressed the lack of SLE disease activity scores in real-world data sources. Use of the model to estimate SLEDAI score categories could make real-world data sources more useful for research and potentially support patient care management in clinical practice.

Footnotes

  • Contributors PA—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; JB—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; ML—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; BG—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; CB—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; ZS—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article; GC—study concepts and design, manuscript and figure preparation, manuscript editing, final approval of the article.

  • Funding The authors have not declared a specific grant for this research from any funding agency in the public, commercial or not-for-profit sectors.

  • Competing interests The authors indicated are employees of OM1, which is involved in issues related to the topic of this manuscript.

  • Provenance and peer review Not commissioned; externally peer reviewed.
