Objective To evaluate the level of agreement on ultrasonographic (US) lesions among highly experienced sonographers as well as the intraobserver and interobserver reliability of inflammatory and structural US lesions in patients with osteoarthritis (OA) of the foot.
Methods After a systematic literature review, a Delphi survey was performed to test definitions of US lesions in OA of the foot, including inflammatory lesions (ie, synovial hypertrophy [SH], joint effusion [JE], power Doppler signal [PD]), and structural abnormalities (ie, cartilage damage [CD] and osteophytes). Subsequently, the reliability of US in assessing the aforementioned lesions was tested on static images as well as during a live exercise. Reliability was assessed by kappa analyses and prevalence-adjusted bias-adjusted kappa (PABAK) on a dichotomous and an ordinal scale.
Results Intraobserver and interobserver reliability for SH and JE evaluated by binary scoring was good for both components, while the intraobserver reliability for semiquantitative scoring of SH ranged from moderate in the web-based exercise (PABAK 0.49) to good in the live exercise (PABAK 0.8). Reliability for CD and PD assessments was good and excellent, respectively, in all exercises (PABAK ranging from 0.61 to 0.79 for CD and from 0.88 to 0.95 for PD). The interobserver reliability for the semiquantitative scoring of osteophytes was fair in the live exercise (PABAK 0.36) and moderate in the static exercise (PABAK 0.60).
Conclusions Consensual US definitions were found to be reliable for assessing inflammatory lesions in OA of the foot, while the use of US to assess structural damage requires further studies.
This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See: http://creativecommons.org/licenses/by-nc/4.0/.
What is already known about this subject?
The foot is a target area in osteoarthritis (OA), and its involvement may significantly affect patients’ quality of life.
European League Against Rheumatism recommendations on the use of imaging in OA have highlighted that imaging studies of the foot are scarce.
What does this study add?
The study demonstrated that ultrasonography may be a reliable tool for assessing inflammatory lesions in OA of the foot.
Ultrasonography seems to be a promising tool to be further tested in diagnostic, prognostic and follow-up studies.
Osteoarthritis (OA) is a degenerative joint disease characterised by cartilage breakdown, growth of osteophytes and subsequent low-grade inflammation of the synovial membrane.1 OA is common in the middle-aged to elderly population and may lead to significant disability and pain. The cornerstones of the therapy of OA are symptomatic treatment as well as measures aimed at preserving physical function. However, recent therapeutic developments which address specific molecular pathways may change the way OA is treated in the future.2 3 For this reason, a renewed interest in valid tools for assessing disease activity and damage in OA has emerged, with several imaging techniques identified as potential candidates to monitor the impact of new treatments.4 In this context, there is a growing interest in the use of ultrasonography (US) for the assessment of OA, as US findings are in good agreement with conventional radiography in detecting typical elementary lesions of OA (eg, central joint erosions, osteophytes).5 6 The foot is recognised as a target region for OA, and this involvement could significantly affect patients’ quality of life.7 Despite extensive literature on the use of imaging in OA, it has been shown that only a minority of studies focused on the foot, and this also applies to US.5 Therefore, foot and ankle imaging studies should be prioritised to support patient management.5 The validation of US as an outcome measure for evaluating foot OA is an area of interest for the Outcome Measures in Rheumatology (OMERACT) Ultrasound Group. Exploring the reliability of inflammatory elementary lesions (eg, effusion, synovial hypertrophy) and of structural changes (eg, cartilage abnormalities and osteophytes) is an essential step towards including US in trials and clinical practice.
For this purpose, the OA task force of the OMERACT US group decided to evaluate the level of agreement among highly experienced sonographers as well as the intraobserver and interobserver reliability of US on inflammatory and structural US lesions in patients with OA of the foot.
Materials and methods
Design of the study
Following the OMERACT methodology,8 a systematic literature review (SLR) on US in OA of the foot was performed. Based on these results, a Delphi survey on the definition and characteristics of US lesions in patients with OA was circulated among a group of experts in the field of US and OA selected from the OMERACT special interest group on US. Subsequently, a web-based as well as a patient-based exercise was performed, with the aim of testing the reliability of US in the detection of inflammatory and structural US lesions. Before starting the exercise on patients, a training session was performed on US images of OA abnormalities in the foot and discussions among the experts participating in the meeting took place. The methods and results in our manuscript follow previously published guidelines.9 The study was reported to the local ethics committee and no further approval was deemed necessary.
Systematic literature review
A systematic literature review was performed by one of the authors (GS). MEDLINE via PubMed and Embase were searched from inception to 31 January 2016. Eligible studies had to involve patients with foot (midfoot or metatarsophalangeal [MTP] joints) OA undergoing US; possible comparators were other imaging techniques or histology. The outcome of interest was the definition of pathology in both greyscale (GS) and power Doppler (PD). All study types excluding narrative reviews were eligible. Search strategies including terms addressing OA and US were applied in both databases (Table S1); prespecified forms were used for data extraction. After screening the title and abstract of 83 studies and the full text of 11 studies (online supplementary figure S1), 4 studies were finally included (online supplementary figure S1 and table S2). The hand search of the references of the included studies did not lead to further inclusions.
A preliminary questionnaire was circulated to present the results of the SLR to all participants and to collect their comments and suggestions on the items to be included in the Delphi survey. In the first round, the Delphi survey consisted of 15 statements, and 19 participants rated their level of agreement with each on a Likert scale (1=strongly disagree to 5=strongly agree) and gave their comments. Based on the results and comments obtained, the survey was modified and proposed again to the participants until agreement was reached. Group agreement was considered achieved with a total cumulative agreement of 75% or more (a score of 4 or 5 on the Likert scale). Statements that did not reach this cut-off were eliminated from subsequent rounds, while statements that achieved agreement were proposed again for voting only when new statements had been formulated according to the panel’s suggestions. If no statement achieved 75% agreement, those that reached 60% or more, plus new statements, were proposed again for voting to avoid gaps in the definitions.
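The consensus rule described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the example ratings are hypothetical, and the survey itself was run on a web-based platform rather than with this code.

```python
from typing import Sequence

AGREEMENT_CUTOFF = 0.75  # cumulative agreement required for consensus (from the text)
RESUBMIT_CUTOFF = 0.60   # fallback level used only if no statement reaches consensus


def cumulative_agreement(ratings: Sequence[int]) -> float:
    """Share of panellists who scored a statement 4 or 5 on the 1-5 Likert scale."""
    if not ratings:
        raise ValueError("at least one rating is required")
    if any(r < 1 or r > 5 for r in ratings):
        raise ValueError("Likert ratings must lie between 1 and 5")
    return sum(1 for r in ratings if r >= 4) / len(ratings)


def classify_statement(ratings: Sequence[int]) -> str:
    """Label a statement according to the two cut-offs (labels are hypothetical)."""
    share = cumulative_agreement(ratings)
    if share >= AGREEMENT_CUTOFF:
        return "agreed"
    if share >= RESUBMIT_CUTOFF:
        return "revote_if_no_consensus"
    return "eliminated"


# Hypothetical ratings from a 19-member panel, not data from the study.
example = [5] * 9 + [4] * 6 + [3] * 2 + [2] * 2
print(round(cumulative_agreement(example), 3), classify_statement(example))
```

With 15 of 19 panellists scoring 4 or 5, the hypothetical statement exceeds the 75% cut-off and would count as agreed.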
A pool of 110 US images (from 83 patients with OA) of the anatomical sites under examination were collected from a personal database of three collaborators (FF, IR, CS) who did not participate in the exercise. Images from patients with foot OA and healthy controls were chosen in order to have both images of normal and abnormal joints. A total of 20 experts were invited to participate in the exercise and each of them rated the images according to the definitions approved in the Delphi survey. The whole Delphi process and the web-based agreement exercise were carried out on a web-based platform (RedCap).10 Only the facilitator and the epidemiologists of the study had access to the online data and were responsible for the upload and preparation of the Delphi rounds and the web-based exercise.
Prior to the patient-based reliability exercise, the US methodology was clarified among sonographers and a consensus was obtained on both the scanning protocol and on image interpretation of normal and pathological US findings.
The patient-based exercise was performed on 12 patients (10 female and 2 male, mean age of 67.75 years) recruited if they reported foot pain on weight-bearing and had a diagnosis of OA of the foot based on clinical examination and on the presence of radiographic criteria of OA in at least one foot joint.11 Patients were located in a comfortable examining room and they were lying on an examination bed. The single seats were placed at a distance that permitted a blinded and separate evaluation by the sonographers, each of whom was seated in front of a single patient. The time frame between the two rounds was 3 hours (first round in the morning and second round in the afternoon of the same day). Twelve high-level US units (six Esaote MyLab ClassC; six General Electric Logiq e9) were used, all equipped with multifrequency linear probes operating at a frequency of 18 MHz (Esaote) and 15 MHz (General Electric). The same settings (GS frequency 18 MHz Esaote and 15 MHz General Electric; GS gain 50% Esaote and 48% General Electric; PD frequency 8.3 MHz Esaote and 7.7 MHz General Electric; pulse repetition frequency (PRF) 0.5 Hz; PD gain 50% Esaote and 30% General Electric) were used on all units and each sonographer was allowed to modify only one basic function (depth).
Intraobserver and interobserver reliability were calculated using the kappa coefficient. Intraobserver reliability was assessed by Cohen’s kappa. Interobserver reliability was studied by calculating the mean kappa on all pairs (ie, Light’s kappa). Kappa coefficients were interpreted according to Landis and Koch. Kappa values of 0–0.20 were considered poor, 0.20–0.40 fair, 0.40–0.60 moderate, 0.60–0.80 good and 0.80–1.00 excellent.12 13 The percentage of observed agreement (ie, percentage of observations that obtained the same score), prevalence of the observed lesions and prevalence-adjusted bias-adjusted kappa (PABAK) were also calculated. Analyses were performed using R Statistical Software (Foundation for Statistical Computing).
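As an illustration of the statistics named above, the following Python sketch computes observed agreement, Cohen's kappa, Light's kappa (mean of pairwise Cohen's kappa) and PABAK, and maps a coefficient onto the interpretation bands given in the text. It is a minimal sketch, not the authors' R code: the function names and the example scores are hypothetical, the kappa is unweighted, and the form PABAK = (k·p_o − 1)/(k − 1) for k categories is an assumption, since the source does not state which variant was used for ordinal scores.

```python
from collections import Counter
from itertools import combinations
from typing import Sequence


def observed_agreement(a: Sequence[int], b: Sequence[int]) -> float:
    """Proportion of items given the same score by two readings."""
    if len(a) != len(b) or not a:
        raise ValueError("readings must be non-empty and of equal length")
    return sum(x == y for x, y in zip(a, b)) / len(a)


def cohen_kappa(a: Sequence[int], b: Sequence[int]) -> float:
    """Unweighted Cohen's kappa: (p_o - p_e) / (1 - p_e)."""
    n = len(a)
    p_o = observed_agreement(a, b)
    count_a, count_b = Counter(a), Counter(b)
    # Chance agreement from the product of each reader's marginal proportions.
    p_e = sum(count_a[c] * count_b[c] for c in set(a) | set(b)) / (n * n)
    if p_e == 1.0:
        return float("nan")  # undefined when both readers use a single category
    return (p_o - p_e) / (1 - p_e)


def light_kappa(readings: Sequence[Sequence[int]]) -> float:
    """Light's kappa: mean Cohen's kappa over all pairs of readers."""
    pairs = list(combinations(readings, 2))
    if not pairs:
        raise ValueError("at least two readers are required")
    return sum(cohen_kappa(a, b) for a, b in pairs) / len(pairs)


def pabak(a: Sequence[int], b: Sequence[int], n_categories: int = 2) -> float:
    """Prevalence-adjusted bias-adjusted kappa (assumed k-category form)."""
    p_o = observed_agreement(a, b)
    return (n_categories * p_o - 1) / (n_categories - 1)


def interpret(value: float) -> str:
    """Bands from the text; upper bounds are assumed to be inclusive."""
    for upper, label in [(0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")]:
        if value <= upper:
            return label
    return "excellent"


# Hypothetical binary scores (0 = absent, 1 = present) for 10 joints.
reader_1 = [0, 0, 1, 1, 0, 1, 0, 0, 1, 1]
reader_2 = [0, 0, 1, 0, 0, 1, 0, 1, 1, 1]
reader_3 = [0, 0, 1, 1, 0, 1, 0, 0, 1, 0]
print(cohen_kappa(reader_1, reader_2), pabak(reader_1, reader_2))
print(light_kappa([reader_1, reader_2, reader_3]))
```

For intraobserver reliability, the two sequences passed to `cohen_kappa` would be the same reader's scores from the first and second rounds.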
All 19 participants responded to all rounds of the Delphi survey. At the preliminary questionnaire, the definitions extrapolated from the SLR were elaborated and presented to the panel. The first Delphi round included 15 statements for voting (online supplementary table S3). In the first round, 10 statements reached agreement; the remaining statements, together with one statement modified according to the comments received from the experts, were proposed again for voting in the second and third rounds, with only one more statement reaching agreement (online supplementary table S4). A summary of the results of the Delphi survey can be seen in table 1. Furthermore, based on the need to assess osteophytes (table 1), the Delphi statement on osteophyte scoring with the best agreement was selected (ie, semiquantitative 0–3) (online supplementary table S4).
The web-based exercise was successfully completed in two rounds by 13 participants. Interobserver reliability, pooling both rounds and both MTP and midfoot joints, ranged from 0.50 for synovial hypertrophy (SH) to 0.89 for PD score (table 2), while for the midfoot alone it ranged from 0.51 for SH score to 0.84 for PD score (table 3), and for MTP joints alone it ranged from 0.49 for SH to 0.89 for PD score (table 4). Intraobserver reliability ranged from a minimum value of 0.48 for SH score (0.54 for midfoot and 0.60 for MTP joints, tables 3–4) to 0.9 for PD score (table 2). When kappa values were adjusted for the prevalence of the observed lesions, no significant differences were noted.
The sonographers agreed to use the previously described semiquantitative scoring system for grading SH, joint effusion (JE), PD signal and osteophyte evaluation.
SH, JE and PD signal. During the training session, the sonographers agreed to score SH and JE as absent/present (0–1) and to use for SH also a semiquantitative score (0–3).14 15 PD was evaluated with a semiquantitative score (0–3).15
Training session on cartilage damage (CD). CD was defined as loss of anechoic structure and/or thinning of the cartilage layer16 (online supplementary figure S2). During the training session, the sonographers agreed to use a binary score for CD (absent/present, 0–1)16 and to evaluate this lesion only in the first MTP joint. The assessment of CD was limited in this way for two reasons: to evaluate cartilage by US, the probe has to be perpendicular to the cartilage surface, and dorsal osteophytes could limit the US image of cartilage.
Training session on osteophyte evaluation. The sonographers agreed to use the recently published semiquantitative scoring systems of grading osteophytes (0=none, 1=minor, 2=moderate, 3=major size of osteophytes).17 18
Training session on midfoot joints. During the training session, the sonographers agreed to evaluate and score the midfoot joints as a single joint and to use the same method also for analysing the images of the web-based exercise. On the patient-based exercise, only the highest score of each lesion was recorded.
The patient-based exercise was successfully completed in two rounds lasting about 3.5 hours each, one in the morning and one in the afternoon of the same day, by 11 rheumatologists from five countries. All rheumatologists were experts in US and were members of the OMERACT group. Interobserver reliability, including both rounds, ranged from 0.08 for CD to 0.51 for SH, but when PABAK was considered, it ranged from 0.36 for osteophytes to 0.93 for PD score (table 5). When the midfoot and MTP joints were evaluated separately, interobserver agreement (using PABAK) ranged from 0.37 for osteophyte score to 0.95 for PD score in the midfoot (table 6), and from 0.24 for JE to 0.74 for PD score in the MTP joints (online supplementary table S5). Intraobserver reliability ranged from a minimum value of 0.41 for PD score to 0.64 for SH; with PABAK, it reached higher values, ranging from 0.62 for osteophytes to 0.95 for PD score (table 5). Table 6 and online supplementary table S5 report results divided by midfoot and MTP joints.
The foot is a target area in OA and, despite the high frequency of involvement and disability, the recent European League Against Rheumatism recommendations on the use of imaging in OA have highlighted that imaging studies of the foot are scarce. Therefore, there is a need for more research concerning the benefits of imaging in such less commonly studied sites of OA.5 To our knowledge, this is the first study exploring the reliability of US in scoring inflammatory and structural lesions in OA of the foot. Considering the low prevalence of certain elementary lesions in the patient-based exercise, the reliability assessment by Cohen’s kappa could be misleading and, for this reason, PABAK values were used to optimise the evaluation of the strength of agreement. The assessment of both inflammatory and structural damage-related lesions allowed us to globally evaluate the reliability of US in OA of the foot.
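The effect of low prevalence on Cohen's kappa can be made concrete with a small worked example. The 2×2 counts below are hypothetical and chosen only to mimic a rarely observed lesion; they are not data from this study, and the function name is likewise an assumption.

```python
def agreement_stats(both_pos: int, a_only: int, b_only: int, both_neg: int):
    """Observed agreement, Cohen's kappa and PABAK from a 2x2 agreement table."""
    n = both_pos + a_only + b_only + both_neg
    p_o = (both_pos + both_neg) / n              # proportion of identical scores
    a_pos = (both_pos + a_only) / n              # prevalence according to reader A
    b_pos = (both_pos + b_only) / n              # prevalence according to reader B
    p_e = a_pos * b_pos + (1 - a_pos) * (1 - b_pos)  # chance agreement
    kappa = (p_o - p_e) / (1 - p_e)
    pabak = 2 * p_o - 1                          # binary PABAK depends on p_o only
    return p_o, kappa, pabak


# Hypothetical: 100 joints, each reader scores the lesion present in only 3.
p_o, kappa, pabak = agreement_stats(both_pos=1, a_only=2, b_only=2, both_neg=95)
print(f"observed agreement={p_o:.2f}, kappa={kappa:.2f}, PABAK={pabak:.2f}")
```

In this hypothetical table the readers give the same score for 96 of 100 joints, yet kappa is only about 0.31 because chance agreement is already about 0.94 when almost every joint is negative. PABAK, which depends only on the observed agreement, is 0.92.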
In this reliability exercise, SH and JE were evaluated separately and their detection (present/absent, 0–1) showed similar intraobserver and interobserver agreement for both the web-based and the live exercises, reaching good agreement in all assessments. As suggested by the Delphi exercise, in addition to the binary score, a semiquantitative score (0–3) for SH was used and the results, similar to studies in rheumatoid arthritis and psoriatic arthritis,19 demonstrated moderate intraobserver agreement in the web-based exercise and good agreement in the patient-based exercise.
In all grades of OA, thickening of the synovial lining cell layer, increased vascularity and inflammatory cell infiltration of the synovial membranes are the main histological features.20 Furthermore, angiogenesis and inflammation are closely integrated processes and may affect disease progression and pain.1 In this scenario, imaging of vascularisation with PD mode is important for providing a complete image of joint inflammation in OA. In this reliability exercise, a semiquantitative scoring of PD demonstrated excellent reliability on static images, which was also confirmed on live scans with PABAK values greater than 0.9. However, considering the low prevalence of images with PD signal in the live exercise, these results need to be confirmed.
Globally, these results, both for SH and PD, show a possible relevant role of US in clinical trials in OA. Moving to foot damage, this issue could significantly impact the assessment of disability of patients with OA. Ultrasound may thus be a promising method for detecting cartilage pathology, also in early stages of OA of the foot. However, in this study, which represents the first step in this field, we decided to use a binary score (absent/present, 0–1) for evaluating CD only in the first MTP joint. This choice was due to the difficulty of imaging cartilage in the midfoot: to evaluate cartilage by US, the probe has to be perpendicular to the cartilage surface, which could be difficult to obtain in OA of the foot, particularly for midfoot joints. Using a binary score for CD, we found good intraobserver and interobserver reliability. With regard to osteophytes, however, the results of this study differed considerably from the good to excellent intraobserver and interobserver reliability of osteophytes in hand OA.18 In our study, we could demonstrate only good intraobserver reliability and fair to moderate interobserver reliability.
In conclusion, this study demonstrated that US may be a reliable tool for assessing inflammatory lesions in OA of the foot, while for US lesions related to damage, further studies are needed, particularly in anticipation of the application of US in clinical trials. New tools such as reference atlases could be useful to improve the reliability of US scoring. Finally, based on the results of this study, US seems to be a promising tool to be further tested in diagnostic, prognostic and follow-up studies on foot OA.
AZ, GF and MC contributed equally.
Collaborators Fabiana Figus, Iolanda Rutigliano, Chiara Scirocco.
Contributors Reported in the manuscript.
Competing interests None declared.
Patient consent for publication Obtained.
Provenance and peer review Not commissioned; externally peer reviewed.
Data sharing statement No additional data are available.