PT - JOURNAL ARTICLE
AU - Pedro Alves
AU - Jigar Bandaria
AU - Michelle B Leavy
AU - Benjamin Gliklich
AU - Costas Boussios
AU - Zhaohui Su
AU - Gary Curhan
TI - Validation of a machine learning approach to estimate Systemic Lupus Erythematosus Disease Activity Index score categories and application in a real-world dataset
AID - 10.1136/rmdopen-2021-001586
DP - 2021 May 01
TA - RMD Open
PG - e001586
VI - 7
IP - 2
4099 - http://rmdopen.bmj.com/content/7/2/e001586.short
4100 - http://rmdopen.bmj.com/content/7/2/e001586.full
SO - RMD Open 2021 May 01; 7
AB - Objective: Use of the Systemic Lupus Erythematosus Disease Activity Index (SLEDAI) in routine clinical practice is inconsistent, and clinician-recorded SLEDAI scores are rarely available in real-world datasets. This study aimed to validate a machine learning model that estimates SLEDAI score categories from clinical notes. It also aimed to apply the model to a large, real-world dataset to generate estimated score categories for use in future research studies. Methods: A machine learning model was developed to estimate an individual patient's SLEDAI score category (no activity, mild activity, moderate activity or high/very high activity) for a specific encounter date using clinical notes. A training (development) cohort of 3504 encounters and a separate validation cohort of 1576 encounters were created from the OM1 SLE Registry. Model performance was assessed using the area under the receiver operating characteristic curve (AUC). The AUC was calculated on a binarised version of the outcome: the positive class was records with clinician-recorded SLEDAI scores >5, and the negative class was records with scores ≤5. Model performance was also evaluated by grouping the scores into the four disease activity categories and calculating Spearman's R and Pearson's R values. Results: The AUC for the binarised (two-category) outcome was 0.93 for the development cohort and 0.91 for the validation cohort. When calculated using the four disease activity categories, the model had a Spearman's R value of 0.7 and a Pearson's R value of 0.7. Conclusion: The model performs well when estimating SLEDAI score categories from unstructured clinical notes. Data availability statement: No data are available. This study was conducted using deidentified participant data compiled from multiple sources. Restrictions apply to the availability of these data, which were used under certain permissions for this study. Please contact the corresponding author (Michelle Leavy, ORCID ID 0000-0003-1927-7248) with any questions.
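The evaluation described in the Methods can be sketched as follows. This is a minimal, hypothetical illustration and not the authors' code. It assumes that the model outputs a continuous estimated SLEDAI score per encounter. It also assumes the commonly used category cut-offs (0 = no activity, 1-5 = mild, 6-10 = moderate, ≥11 = high/very high), which the abstract does not state. The function names `sledai_category`, `binarised_auc`, `pearson`, `average_ranks` and `spearman` are invented for this sketch. The AUC is computed in its rank (Mann-Whitney) form on the >5 versus ≤5 split. Spearman's R is Pearson's R applied to tie-averaged ranks of the ordinal category codes.

```python
"""Hypothetical sketch of the evaluation metrics named in the abstract.

Assumptions (not from the paper): the model emits a continuous estimated
SLEDAI score per encounter, and categories use cut-offs 0 / 1-5 / 6-10 / >=11.
"""
from math import sqrt


def sledai_category(score):
    """Map a SLEDAI score to an ordinal category code (assumed cut-offs)."""
    if score == 0:
        return 0  # no activity
    if score <= 5:
        return 1  # mild activity
    if score <= 10:
        return 2  # moderate activity
    return 3      # high/very high activity


def binarised_auc(recorded, predicted, threshold=5):
    """AUC with positives = recorded score > threshold, negatives = <= threshold.

    Equals the probability that a randomly chosen positive record gets a higher
    predicted score than a randomly chosen negative one (ties count 0.5).
    """
    pos = [p for r, p in zip(recorded, predicted) if r > threshold]
    neg = [p for r, p in zip(recorded, predicted) if r <= threshold]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative record")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def pearson(x, y):
    """Pearson's R; undefined (ZeroDivisionError) if either input is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def average_ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman's R: Pearson's R computed on tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


if __name__ == "__main__":
    # Made-up example values, not study data.
    recorded = [0, 2, 4, 6, 8, 12, 3, 15]
    predicted = [0.5, 1.5, 5.5, 7.0, 4.0, 11.0, 2.0, 14.0]
    print(binarised_auc(recorded, predicted))  # prints 0.9375
    # Bin both recorded and estimated scores into the four categories.
    rec_cat = [sledai_category(s) for s in recorded]
    pred_cat = [sledai_category(s) for s in predicted]
    print(spearman(rec_cat, pred_cat), pearson(rec_cat, pred_cat))
```

Computing the AUC by pairwise comparison keeps the sketch dependency-free. With real data, a library routine such as scikit-learn's `roc_auc_score` would give the same value and scale better than the quadratic loop.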