Salivary gland ultrasound abnormalities in primary Sjögren’s syndrome: consensual US-SG core items definition and reliability

Objectives Ultrasonography (US) is sensitive for detecting echostructural abnormalities of the major salivary glands (SGs) in primary Sjögren’s syndrome (pSS). Our objectives were to define selected US-SG echostructural abnormalities in pSS, set up a preliminary atlas of these definitions and evaluate the reliability of the consensual definitions on both static and acquisition US-SG images. Methods International experts in SG US in pSS participated in consensus meetings to select and define echostructural abnormalities in pSS. The reliability of US in detecting these abnormalities was assessed in two steps. First, 12 experts used a web-based standardised form to evaluate 60 static US-SG images; intraobserver and interobserver reliabilities were expressed as κ values. Second, five experts who had participated throughout the study evaluated the interobserver reliability of US-SG acquisition in patients with pSS. Results On static images, intraobserver reliability for parotid glands (PGs) and submandibular glands (SMGs) was substantial (κ > 0.60) for the two most reliable items (echogenicity and homogeneity) and for the advised pSS diagnosis. PG interobserver reliability was substantial for homogeneity. SMG interobserver reliability was moderate for homogeneity (κ = 0.46) and fair for echogenicity (κ = 0.38). On acquisition images, PG interobserver reliability was substantial (κ = 0.62) for echogenicity and moderate (κ = 0.52) for homogeneity, and the reliability of the advised pSS diagnosis was substantial (κ = 0.66). SMG interobserver reliability was fair (0.20 < κ ≤ 0.40) for echogenicity and homogeneity and either slight or poor for all other US core items. Conclusion This work identified the two most reliable US-SG items (echogenicity and homogeneity), to be used by experts trained in SG US. The interobserver reliability of PG echogenicity on US is in line with that of the advised diagnosis of pSS.


Introduction
Lymphocytic infiltration of the salivary glands (SGs) is a key pathological feature of primary Sjögren's syndrome (pSS). 1 2 Currently available tools for assessing the SGs include salivary flow measurement, minor SG biopsy, sialography, scintigraphy, CT and MRI. Ultrasonography (US) was introduced more recently. [3][4][5][6][7] US holds considerable appeal: it is non-invasive, involves no ionising radiation, can be repeated many times and is available as an outpatient investigation. Both researchers and clinicians have identified US as a valuable tool for diagnosing pSS [8][9][10][11][12][13][14][15] and as a potential source of classification criteria for this disease. Moreover, the advent of new treatments for pSS 16 has created a need for valid and easy-to-use imaging tools capable of detecting changes in disease activity over time. 17 Using the Outcome Measures in Rheumatology (OMERACT) filter, 18 we recently reported the need for a consensual scoring system, for training of SG US experts and for evaluation of the criterion validity of US, that is, comparison of SG US with minor SG biopsy. 19 The review results were consistent with the literature 20-25, and our review is the first step towards an OMERACT standard that will open further avenues for the use of this promising diagnostic tool. 26 At present, US cannot be used as the sole bedside diagnostic tool in pSS, but it can serve as an early diagnostic tool when used carefully by experienced US experts.

RMD Open

Here, our objectives were to define selected US-SG echostructural abnormalities in pSS, set up a preliminary atlas of these definitions and evaluate the reliability of the consensual definitions on both static and acquisition US-SG images.

Methods
Definition of the core items of SG ultrasound
During the 2012 American College of Rheumatology meeting, international pSS experts (from France, Norway, Italy, England, Serbia, Slovenia, Sweden, The Netherlands and USA) with at least 5 years of experience with US in pSS were invited to work on the study. Of the 10 experts invited, 6 took part in the first meeting, which aimed at a consensus definition of US-SG abnormalities in pSS. They selected a preliminary core set of US-SG items worthy of routine evaluation. These experts then completed an email questionnaire, indicating whether they agreed with each definition (yes/no answers). The same six experts met again at the 2013 and 2014 European League Against Rheumatism (EULAR) meetings, where the preliminary results of the 2012 meeting were presented.

Set-up of an SG ultrasound atlas
Finally, in 2014, consensus was reached on the US-SG core items and a preliminary atlas was drawn up.
This initial atlas included only consensual B-mode images in pSS and will be used by the experts.

Reliability of US-SG core items on static US images
Twelve experts completed a web-based standardised form (table 1) to evaluate 60 static images (30 of parotid glands (PGs) and 30 of submandibular glands (SMGs)) acquired with a Philips iU22 system and a 12.5 MHz linear array transducer (Philips Healthcare, Amsterdam, The Netherlands). The set of B-mode images of the major SGs, all acquired with the same settings, was drawn from the Brest centre data bank so as to include both normal SG parenchymal echostructure (healthy individuals) and abnormal echostructure (patients with pSS); the images were then sent to the other centres. Each expert evaluated the items selected by consensus (ie, the preliminary atlas) in two rounds, 3 months apart, and then gave an advised diagnosis of pSS (ruled out, ruled in or indeterminate). The results were used to assess interobserver and intraobserver reliabilities. All images were read anonymously and in random order.

Table 1 Standardised form used to assess the reliability of ultrasound of salivary gland core items of the images (Sjögren syndrome)

Reliability of US-SG core items on acquisition imaging in patients with pSS
Five of the six experts who had developed the definitions and the atlas assessed the reliability of the consensual items on acquisition imaging. Over a 2-day period, each expert performed US of both PGs and both SMGs in 19 patients with pSS (with or without known SG abnormalities). Several US machines were used (Mylab ALPHA, Mylab60 and Mylab 6; all from Esaote, Genoa, Italy). Because SG echogenicity differed between patients, the B-mode settings were adjusted for each patient. The time each expert needed to examine the 19 patients was recorded. The study was approved by the Brest ethics committee and registered at ClinicalTrials.gov (NCT02358213).

Statistical analysis
Cohen's κ was used to measure interobserver agreement for binary items, and weighted κ (Fleiss-Cohen weights) for items with more than two ordinal categories. 27 κ coefficients were calculated for each pair of observers and summarised as the mean, minimum and maximum for each item. The same coefficients were calculated for each observer to measure intraobserver agreement between the first and second readings, again summarised as the mean, minimum and maximum for each US-SG core item. 28 The number of hypoechoic/anechoic areas was recorded as none, 1-4 or ≥5, and their location as none, isolated, localised, scattered or diffuse. Abnormal lymph nodes were recorded as present or absent. The advised diagnosis was recorded as ruled out, indeterminate or ruled in.
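The agreement statistics described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the software used in the study (which the text does not name): each observer's scores are assumed to be stored as a list covering the same images in the same order, and the function names (`count_category`, `cohen_kappa`, `pairwise_summary`, `intraobserver_summary`) and the toy ratings are hypothetical. Unweighted Cohen's κ is computed for binary items and Fleiss-Cohen (quadratic) weighted κ for ordinal items; pairwise values are then summarised as mean, minimum and maximum.

```python
from itertools import combinations
from statistics import mean

# Ordinal categories for the number of hypoechoic/anechoic areas (from the form).
HYPOECHOIC_COUNT = ["none", "1-4", ">=5"]


def count_category(n_areas):
    """Map a raw count of hypoechoic/anechoic areas to its ordinal category."""
    if n_areas == 0:
        return "none"
    return "1-4" if n_areas <= 4 else ">=5"


def cohen_kappa(r1, r2, categories, weights="none"):
    """Cohen's kappa between two sets of ratings of the same images.

    weights="none": unweighted kappa (binary items).
    weights="quadratic": Fleiss-Cohen weights (ordinal items).
    """
    if len(r1) != len(r2) or not r1:
        raise ValueError("ratings must be non-empty and of equal length")
    if weights not in ("none", "quadratic"):
        raise ValueError("weights must be 'none' or 'quadratic'")
    k = len(categories)
    if k < 2:
        raise ValueError("at least two categories are needed")
    pos = {c: i for i, c in enumerate(categories)}
    n = len(r1)

    # Observed joint proportions p[i][j] and their marginals.
    p = [[0.0] * k for _ in range(k)]
    for a, b in zip(r1, r2):
        p[pos[a]][pos[b]] += 1 / n
    row = [sum(p[i]) for i in range(k)]
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]

    def w(i, j):
        # Disagreement weight: 0 on the diagonal, 1 at maximal distance.
        if weights == "quadratic":
            return ((i - j) / (k - 1)) ** 2
        return 0.0 if i == j else 1.0

    observed = sum(w(i, j) * p[i][j] for i in range(k) for j in range(k))
    expected = sum(w(i, j) * row[i] * col[j] for i in range(k) for j in range(k))
    if expected == 0:
        raise ValueError("kappa is undefined when both raters use one category")
    return 1 - observed / expected


def summarise(kappas):
    return mean(kappas), min(kappas), max(kappas)


def pairwise_summary(ratings, categories, weights="none"):
    """Interobserver: kappa for every pair of observers -> (mean, min, max)."""
    return summarise([cohen_kappa(ratings[a], ratings[b], categories, weights)
                      for a, b in combinations(sorted(ratings), 2)])


def intraobserver_summary(first, second, categories, weights="none"):
    """Intraobserver: reading 1 vs reading 2 per observer -> (mean, min, max)."""
    return summarise([cohen_kappa(first[o], second[o], categories, weights)
                      for o in first])


# Hypothetical binary ratings of four images by three observers.
toy = {"A": [0, 1, 0, 1], "B": [0, 1, 0, 1], "C": [0, 1, 1, 1]}
print(pairwise_summary(toy, categories=[0, 1]))
```

With quadratic weights and three ordered categories, a disagreement between "none" and "≥5" is penalised four times as heavily as one between adjacent categories, which suits ordered items such as the number of hypoechoic/anechoic areas.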

Results
Definition of US-SG echostructural abnormalities (US-SG core items)
The experts selected US-SG echostructural abnormalities in B-mode related to echogenicity, homogeneity, lymph nodes and the posterior border (see online supplementary table 1). The definitions developed during the first meeting were modified during the second meeting (see online supplementary table 1). Complete agreement was reached on the definitions of the following items: echogenicity, homogeneity, lymph nodes, posterior border, calcification, hyperechoic bands, hypoechoic/anechoic areas, location of hypoechoic/anechoic areas and abnormal lymph nodes. We called these items US-SG core items. An initial consensual reference atlas (33 consensual images) was developed based on these definitions (figure 1 and see online supplementary atlas).

Reliability of SG ultrasound core items using static images
Intraobserver reliability
PG intraobserver reliability was substantial for echogenicity, homogeneity, number and location of hypoechoic/anechoic areas, and normal lymph nodes. The reliability of the advised pSS diagnosis was substantial (κ=0.86) (table 2). In contrast, intraobserver reliability was moderate for hyperechoic bands and the number of abnormal lymph nodes, and fair for calcifications and posterior border visibility. SMG intraobserver reliability was substantial for echogenicity, homogeneity, hyperechoic bands, and hypoechoic/anechoic areas and their location, and moderate for normal lymph nodes, calcification and posterior border. The reliability of the advised diagnosis was substantial (κ=0.70).

Interobserver reliability
PG interobserver reliability was substantial for homogeneity and for the number and location of hypoechoic/anechoic areas, moderate for echogenicity and normal lymph nodes, fair for hyperechoic bands and slight for calcification and posterior border. The reliability of the advised diagnosis was substantial (κ=0.78). SMG interobserver reliability was fair for echogenicity and calcification; moderate for homogeneity and for hypoechoic/anechoic areas and their location; and slight for posterior border and normal lymph nodes. The reliability of the advised diagnosis was moderate (κ=0.54).
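The verbal labels used in these results (slight, fair, moderate, substantial) follow κ cut-offs that the text does not spell out in full. The sketch below assumes the bands implied by the abstract (fair, 0.20 < κ ≤ 0.40; substantial, κ > 0.60) and fills in the others from the Landis and Koch benchmarks (poor below 0, slight up to 0.20, moderate up to 0.60). The function name `kappa_label` is hypothetical, and unlike Landis and Koch it has no separate "almost perfect" band above 0.80, because this paper reports κ=0.86 as substantial.

```python
def kappa_label(kappa):
    """Verbal agreement label for a kappa value (assumed cut-offs)."""
    if kappa < 0:
        return "poor"
    if kappa <= 0.20:
        return "slight"
    if kappa <= 0.40:
        return "fair"
    if kappa <= 0.60:
        return "moderate"
    # Landis and Koch call kappa > 0.80 "almost perfect"; this paper
    # reports every value above 0.60 as "substantial".
    return "substantial"


# Static-image interobserver kappa values reported in the text.
reported = {
    "PG advised diagnosis": 0.78,
    "SMG advised diagnosis": 0.54,
    "SMG homogeneity": 0.46,
    "SMG echogenicity": 0.38,
}
for item, k in reported.items():
    print(f"{item}: kappa={k:.2f} -> {kappa_label(k)}")
```

Applied to the static-image interobserver values reported above, these assumed cut-offs reproduce the labels used in the text.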

Distribution of ultrasound core items for static images
Item distributions were similar between the two readings. Echogenicity was regarded as normal in >50% of both PGs and SMGs. Homogeneity was regarded as abnormal in >50% of the SMGs and in 48.6%-52.1% of the PGs. Hyperechoic bands occupying <50% of the gland surface area were seen in >50% of both PGs and SMGs (table 3). In all four SGs, no hypoechoic/anechoic areas were found in 44.9%-53.1% of cases, and ≥5 hypoechoic areas were found in 23.9%-33.0% of cases. Normal-appearing lymph nodes were seen in >20% of PGs and <10% of SMGs. Abnormal lymph nodes and calcifications were rare in both PGs and SMGs. The posterior border was usually not visible, particularly for the SMGs.

Reliability of SG ultrasound core items on acquisition imaging
PG interobserver reliability was substantial for echogenicity; moderate for homogeneity, the number and location of hypoechoic/anechoic areas and the size of the largest hypoechoic/anechoic area; fair for hyperechoic bands; and slight for normal lymph nodes, number of abnormal lymph nodes, calcification and posterior border visibility (table 4). The reliability of the advised diagnosis of pSS was substantial (κ=0.66). SMG interobserver reliability was fair for echogenicity, homogeneity and the number and location of hypoechoic/anechoic areas, and slight for normal lymph nodes, calcifications and posterior border visibility. The reliability of the advised diagnosis was fair (κ=0.38).
As shown in table 5, the mean examination time ranged across observers from 11 to 27 min.

Discussion
Given the multitude of US-SG abnormalities in pSS, reaching consensus on the definition and scoring of the most reliable US-SG abnormalities has become a challenge. 19 We conducted this study to develop an international consensus on the definitions of echostructural abnormalities of the SGs in pSS and to evaluate the reliability of US in detecting them.
Homogeneity and echogenicity showed substantial intraobserver reliability for both PGs and SMGs on static images. By contrast, interobserver reliability for homogeneity was substantial in PGs but moderate in SMGs, and that for echogenicity was moderate in PGs and fair in SMGs. Heterogeneity was defined as the presence of hypoechoic/anechoic areas with or without hyperechoic bands. In both glands, hypoechoic/anechoic areas are abundant in pSS and absent in normal individuals. However, the presence of highly vascularised fatty infiltration in SMGs (ie, differences between SMG and PG echotexture) may contribute to the difference in interobserver reliability observed for the homogeneity item. These findings are consistent with those reported by Yoshiura et al. 7 In line with previous studies, 7 30 several core items showed low reliability in our study, namely hyperechoic bands, calcifications and posterior border visibility for both PGs and SMGs. Even though hyperechoic bands were defined by consensus, their reliable assessment in pSS remains challenging. Hyperechoic bands may also develop in normal individuals owing to advanced age or fibrosis of the SGs.
The static-image interobserver reliability of lymph nodes was moderate for the PGs but only slight for the SMGs, a … The posterior border was also difficult to assess, particularly for PGs, owing to their anatomy and location in the retromandibular fossa. 32 In our study, interobserver and intraobserver reliabilities for echogenicity and homogeneity (echostructure) ranged from fair to substantial for both PGs and SMGs, reflecting the presence and distribution of hypoechoic/anechoic areas.
In the acquisition imaging study in 19 patients with pSS, the homogeneity item showed moderate interobserver reliability for PGs and fair reliability for SMGs, whereas the echogenicity item showed substantial and fair reliability for PGs and SMGs, respectively. Our results contradict a single-centre study by Carotti et al, 10 who reported better reliability for SMGs than for PGs. Nevertheless, the moderate interobserver reliability for PG homogeneity and the fair reliability for SMG homogeneity are consistent with the literature. [36][37][38] In our study, the experts' advised diagnosis of pSS on static US images showed substantial interobserver and intraobserver reliability for PGs and substantial intraobserver reliability for SMGs. On acquisition US, the advised diagnosis showed substantial interobserver reliability for PGs and fair reliability for SMGs. These results may be explained by the better echogenicity of PGs compared with SMGs. These findings may be ascribed to the development of a novel US atlas, a prerequisite for careful US evaluation of the SGs by trained experts in patients with pSS. 39 Our study had several limitations. First, the numbers of US experts and of US acquisitions were small, which precluded a large-scale Delphi exercise. 36 40 However, pSS is not a common disease 41 42 and, consequently, few experts were trained in SG US at the time of the study. Second, different US machines were used for acquisition in patients with pSS, and not all experts had prior training with these machines. In addition, intraobserver reliability of the US-SG core items on acquisition imaging was not evaluated owing to lack of time. Another limitation was that the interobserver reliability of abnormal lymph nodes could not be evaluated, which can be explained by the rarity of this US-SG item in pSS. A further drawback is that the images (both static and acquired) were not representative of patients seen in consultation for suspected pSS. The experts' 'advised diagnosis' was based on the assessment of abnormal echostructure in the images (ie, the typical pattern of an inhomogeneous gland with hypoechoic/anechoic areas in its parenchyma); no diagnosis of pSS was made in the study. This typical pattern is a conclusive characteristic of pSS but is not present in all patients with pSS.
In conclusion, this study is the first attempt to establish a preliminary atlas of consensual US images and definitions of US-SG core items. We assessed the reliability of the US-SG core items with regard to the typical pattern of an inhomogeneous gland with hypoechoic/anechoic areas in its parenchyma (a pattern not present in all patients with pSS) to identify the most reliable ones. These reliability results for the typical pSS pattern may be used carefully by experts trained in SG US, together with other classification criteria, in the diagnosis of pSS. The substantial interobserver reliability of PG echogenicity on US appears to be in line with that of the advised diagnosis of pSS.
Larger US-SG acquisition studies, performed by trained experts with the same US machine, are warranted to validate our results and to evaluate the intraobserver reliability of US-SG items further. Our SMG reliability results call for further work on reliable and relevant definitions and scoring of SMG echotexture abnormalities, to better distinguish the SMGs from neighbouring tissues.