MRI in the Assessment of TMJ-Arthritis in Children with JIA; Repeatability of a Newly Devised Scoring System

Rationale and Objectives: The temporomandibular joint (TMJ) is commonly involved in children with juvenile idiopathic arthritis. The diagnosis and evaluation of the disease progression is dependent on medical imaging. The precision of this imaging is under debate. Several scoring systems have been proposed but transparent testing of the precision of the constituents of the scoring systems is lacking. The present study aims to test the precision of 25 imaging features based on magnetic resonance imaging (MRI).
Materials and Methods: Clinical data and imaging were obtained from the Norwegian juvenile idiopathic arthritis study, the NorJIA study. Twenty-five imaging features of the TMJ in MRI datasets from 86 study participants were evaluated by two experienced radiologists for inter- and intraobserver agreement. Agreement of ordinal variables was measured with Cohen's linear or weighted Kappa as appropriate. Agreement of continuous measurements was assessed with 95% limits of agreement according to Bland-Altman.
Results: In the osteochondral domain, the ordinal imaging variables “loss of condylar volume,” “condylar shape,” “condylar irregularities,” “shape of the eminence/fossa,” “disk abnormalities,” and “condylar inclination” showed inter- and intraobserver agreement above Kappa 0.5. In the inflammatory domain, the ordinal imaging variables “joint fluid,” “overall impression of inflammation,” “synovial enhancement” and “bone marrow oedema” showed inter- and intraobserver agreement above Kappa 0.5. Continuous measurements performed poorly with wide limits of agreement.
Conclusion: A precise MRI-based scoring system for assessment of TMJ in JIA is proposed consisting of seven variables in the osteochondral domain and four variables in the inflammatory domain. Further testing of the clinical validity of the variables is needed.

The diagnosis of TMJ involvement in JIA is based on clinical findings, magnetic resonance imaging (MRI), cone beam computed tomography (CBCT) or a combination of these (11-15). The accuracy of clinical findings and clinical monitoring of the disease course, both active inflammation and permanent damage, is under debate (16,17) and much effort has been made during the past years to establish a valid, MRI-based imaging protocol and classification system. However, methodological difficulties, including lack of references for normal findings, low image resolution and imprecise scoring systems have led to both over- and underreporting of signs of pathology (18,19). For example, Stoll and colleagues, in a study of 35 patients with JIA and 122 controls without JIA, demonstrated a significant overlap between the two groups with respect to MR findings thought to be suggestive of active disease (20).
In 2013, Koos et al (21) proposed a classification system addressing both structural changes and inflammation, applicable to JIA-related findings in the TMJ for both MRI and CBCT. The authors reported that the system was not hampered by significant intra- or inter-reader differences but did not present any data to confirm their statement. Vaid and colleagues (22) proposed an MRI-based scoring system based on 20 patients, classifying changes into acute or chronic (structural damage). The grading system included measurements of small, intraarticular components under 3 mm in size; however, the precision of these measurements was not presented. The overall interobserver agreement for acute and chronic changes, based on composite variables, was moderate to good, with weighted kappa values of 0.51 and 0.68, respectively.
In 2015, a third MRI-based scoring system was published by Kellenberger and co-authors (13). The system is progressive on a 0-4 scale and divided into an inflammatory domain and a deformity domain. The system is in part built on the experiences drawn from the publications by Koos and Vaid, but the 2015 publication lacks a full-scale, adequate test of intra- and interobserver agreement. In 2018, Tolend et al proposed an MRI-based scoring system (23) founded on the experiences drawn from the systems published by Koos, Vaid and Kellenberger. The system was developed by a multi-institutional consensus process that finally proposed eight imaging items covering both the inflammatory and osteochondral damage domains. Each item was assigned either a binary ordinal grade (0-1) or a 3-graded ordinal grade (0-2). The grades of the individual items were then added, resulting in a total score. The authors performed a reliability exercise of the system in 21 selected cases and chose to measure reliability using the intraclass correlation coefficient (ICC). However, measuring agreement of ordinal variables with ICC is debatable (24). Furthermore, the selection of patients and the low number of patients (n = 21) leave unanswered questions on the transferability of the results to the JIA population.
In 2018, Kellenberger published a pictorial essay on JIA-related temporomandibular changes on MRI (14). This publication presents a thorough explanation of the scoring systems already proposed by Tolend and Kellenberger, both through written explanations and through a wide range of MRI examples. Used as a common reference, this publication might help reduce interobserver variability.
To date, however, no MRI-based scoring system of the TMJ has been proven precise and valid. We therefore aimed to examine the precision of MRI-based measurements and scores used to describe anatomy, structural damage and inflammation of the TMJ in a large cohort of children and adolescents with JIA. Second, we aimed to identify markers with sufficient precision to be included in a future scoring system for active arthritis and structural deformity.

Patients
The participants in this study constitute a subset of 86 children and adolescents selected from a prospective, longitudinal observational study addressing TMJ involvement in children with JIA (n = 228), the Norwegian JIA Study (NorJIA), registered at www.clinicaltrials.gov with NCT number NCT03904459. Participants in the NorJIA study were recruited from three tertiary pediatric university hospitals (Haukeland University hospital, Bergen, St Olav University hospital, Trondheim and University hospital of North Norway, Tromsø). Inclusion criteria were a diagnosis of JIA according to the ILAR criteria (25) made by experienced pediatric rheumatologists, and age between 4 and 16 years at inclusion. According to the study protocol, all of the included participants in the NorJIA study were referred to MRI of the TMJ, regardless of clinical symptoms from the TMJ. In cases of clinical TMJ symptoms, and when an MRI was judged to be of specific clinical importance, sedation was used for the younger children. For this particular sub-study, we included MRIs performed between March 2015 and May 2018. Exclusion criteria for this study were suboptimal examinations due to artefacts and the use of braces.
To test the scoring system across the range of skeletal development and varied pathology, an a priori, balanced selection of patients from the NorJIA cohort was made, based on the radiology report and patient age. The selection consisted of approximately 33% participants with moderate/severe findings, 33% with mild findings and 33% with subtle or no findings.

Imaging
All MRI examinations were performed on a 3 Tesla system (Skyra, Siemens Healthineers, Erlangen, Germany), using a 64-channel head coil (32-channel at St Olav). An extensive protocol, including nine sequences, was performed to allow for comparisons of different sequences, either alone or in combination, in the assessment of pathology. The MRI protocol takes into account the recommendations given by Miller (26) and Kellenberger (27), including sagittal T1-weighted MPRAGE, sagittal/oblique proton density-weighted, sagittal/oblique fat-saturated T2-weighted, sagittal/oblique fat-saturated T1-weighted, coronal T1-weighted and coronal T1-weighted two-point Dixon sequences.
Following intravenous gadolinium contrast injection, a dynamic coronal sequence, a sagittal/oblique fat-saturated T1-weighted sequence and a sagittal/oblique proton density-weighted sequence (open mouth) were performed. Intravenous gadolinium contrast was injected in a standardized way in an antecubital vein (Dotarem 279.3 mg/ml, 0.2 ml/kg body weight, 2 ml/s with 20 ml saline chaser). A detailed protocol description is provided in Appendix A.

Image Review
For the present study, the following seven sequences were used: coronal T1-weighted, sagittal T1-weighted MPRAGE, sagittal/oblique fat-saturated T2-weighted, sagittal/oblique fat-saturated T1-weighted, sagittal/oblique proton density-weighted with closed and open mouth and sagittal/oblique fat-saturated T1-weighted after intravenous contrast. The images were assessed independently by two consultant radiologists, twice (at an interval of at least 4 weeks) by O.A. (12 years of experience) and once by T.A.A. (13 years of experience), without any additional information available. Before scoring was performed, previous publications on scoring protocols and imaging atlas were thoroughly studied (14,21-23). The readers calibrated their interpretation of the chosen scoring protocol during two 1-day meetings and two video conferences, followed by consensus scoring of five TMJ MR examinations from a cohort of children with JIA, not included in the present study.
Five imaging markers describing anatomical features, seven describing structural changes (damage) and 13 markers describing inflammation were analyzed for the right and for the left TMJ, separately (Appendix B, C, D and E). To explore the usefulness of an extended MRI protocol, assessment of condylar irregularities was made, first on a minimal (core) set of sequences and second, on an extended (ideal) set of sequences, as suggested by Miller et al (26).

Statistical Analysis
Continuous data were presented as means (±SD), ordinal data as medians (ranges) and dichotomous data as proportions. Intra- and interobserver agreement were analyzed using a simple or a weighted (linear) Cohen's Kappa coefficient with 95% confidence interval. A kappa score of <0.2 was considered poor, 0.21-0.40 fair, 0.41-0.60 moderate, 0.61-0.80 good and 0.81-1.00 very good. Absolute agreement was reported as proportions. Differences in measurements were analyzed using 95% limits of agreement (termed repeatability coefficient, when used for repeat measurements) as per Bland-Altman. Bland-Altman plots are generally interpreted informally, and a clinically acceptable agreement was set at 15%. A significance level of 0.05 was decided a priori and all the reported p values are two-tailed. Statistical analyses were performed using IBM SPSS Statistics, version 26.
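As a minimal sketch of the agreement statistic described above, the following Python snippet computes a simple or linearly weighted Cohen's kappa for two readings of the same joints and maps a kappa value onto the interpretation bands used in this study. The function names, the example grades and the treatment of band boundaries as upper-inclusive are illustrative assumptions; the analyses in the study were run in IBM SPSS Statistics, and the 95% confidence interval is not computed here.

```python
def cohen_kappa(reading_1, reading_2, n_categories, weighted=True):
    """Cohen's kappa for two readings graded 0..n_categories-1.

    weighted=True applies linear disagreement weights |i - j| / (k - 1);
    weighted=False gives the simple (unweighted) kappa.
    """
    if len(reading_1) != len(reading_2) or not reading_1:
        raise ValueError("readings must be non-empty and of equal length")
    if n_categories < 2:
        raise ValueError("need at least two categories")
    k, n = n_categories, len(reading_1)

    # Cross-tabulate the two readings.
    counts = [[0] * k for _ in range(k)]
    for a, b in zip(reading_1, reading_2):
        if not (0 <= a < k and 0 <= b < k):
            raise ValueError("grade outside the scale")
        counts[a][b] += 1

    # Marginal proportions for each reading.
    row = [sum(counts[i]) / n for i in range(k)]
    col = [sum(counts[i][j] for i in range(k)) / n for j in range(k)]

    def disagreement(i, j):
        if weighted:
            return abs(i - j) / (k - 1)
        return 0.0 if i == j else 1.0

    cells = [(i, j) for i in range(k) for j in range(k)]
    observed = sum(disagreement(i, j) * counts[i][j] / n for i, j in cells)
    expected = sum(disagreement(i, j) * row[i] * col[j] for i, j in cells)
    if expected == 0:
        raise ValueError("kappa is undefined when both readings use the same single category")
    return 1.0 - observed / expected


def interpret_kappa(kappa):
    """Interpretation bands from the text; upper-inclusive boundaries are an assumption."""
    for upper, label in ((0.20, "poor"), (0.40, "fair"), (0.60, "moderate"), (0.80, "good")):
        if kappa <= upper:
            return label
    return "very good"


# Hypothetical grades for six joints on a 0-2 scale (not study data).
first_reading = [0, 1, 2, 0, 1, 2]
second_reading = [0, 1, 2, 1, 1, 2]
print(round(cohen_kappa(first_reading, second_reading, 3, weighted=True), 2))   # 0.8
print(round(cohen_kappa(first_reading, second_reading, 3, weighted=False), 2))  # 0.75
print(interpret_kappa(0.57))  # moderate
```

With linear weights, a one-step disagreement on a 0-2 scale counts as half a full disagreement, which is why the weighted value is higher than the simple value for the same pair of readings.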

Ethics
The NorJIA study was approved by the Regional Ethics Committee; REK nr 2012/542. Informed consent was given by the children if ≥16 years, and by the parents if the child was <16 years. Data were collected and stored according to the General Data Protection Regulation.

RESULTS
One set of MRIs from each of a total of 86 children (51 females) with JIA, median age 13 years (IQR 5), was included. Median age at diagnosis was 6 years (IQR 8) and the median duration of disease at the time of MRI was 4.5 years (IQR 6) (Table 1). The distribution of findings for each of the 25 MRI features assessed is shown in Figures 1 and 2 (right side, first reading).

Osteochondral Domain
Assessment of loss of condylar volume on a 0-1 scale, condylar shape/flattening in the sagittal (0-3 and 0-2 scale) and in the coronal plane (0-2 scale), condylar irregularities on a 0-2 scale (both based on a core and an ideal protocol), disk abnormalities on a 0-1 scale and the shape of the articular eminence and glenoid fossa on a 0-2 scale showed good to very good agreement for the same reader, with kappa values of 0.67-0.80 (Table 2) (Fig 3). The inter-reader agreement was also good to very good except for condylar irregularities (both the ideal and the core protocols) and shape of the articular eminence and glenoid fossa, showing moderate agreement with kappa values of 0.57, 0.47 and 0.55, respectively (Table 2) (Figs 4a-b and 5a-b).
Assessment of condylar inclination on a 0-2 scale showed good intra- and interobserver agreement, with kappa values of 0.74 and 0.61. Assessment of the position of the condyle on a 0-6-point location scale showed fair agreement for the same reader and poor inter-reader agreement (Table 2).
As for disk position on a 0-5 scale, with the mouth closed, there was a fair intra-reader and a poor inter-reader agreement (Table 2).

Inflammatory Domain
Joint fluid: The intra-observer agreement for assessment of joint fluid on a 0-2 scale was good, both for the whole joint and for the lower compartment, with kappa values of 0.74 and 0.69, respectively, while the agreement for the upper compartment was moderate (kappa 0.51) (Table 3) (Fig 6). Agreement between observers was good for the whole joint, moderate for the upper and poor for the lower compartment (Table 3). Assessment of pathological fluid on a 0-1 scale performed well for the same observer, and moderately between observers.
Synovial inflammation/enhancement/thickening: There was moderate agreement for grading overall impression of inflammation on a 0-2 scale, with a kappa value of 0.59 for the same reader and 0.57 between readers (Table 3).
Assessing synovial enhancement on a 0-2 scale showed good to moderate agreement, with kappa values of 0.68 for the same reader and 0.54 between readers (Fig 7). Similarly, the agreement for grading inflammation on a 0-4 scale according to the progressive system as suggested by Kellenberger (13) was good to moderate, with kappa values of 0.61 for the same reader and 0.45 between readers. The agreement for assessment of synovial thickening on a 0-2 scale and joint enhancement on a 0-2 scale, as suggested by Tolend (23), was moderate with kappa values of 0.43-0.44 both between readers and for the same reader. Subjective impression of thickened synovium was assessed with moderate agreement for the same reader and fair agreement between readers (Kappa 0.23).

Table 2 footnotes
Abbreviations: JIA, juvenile idiopathic arthritis; MRI, magnetic resonance imaging; TMJ, temporomandibular joint.
1) 0=none, 1=present
2) 0=rounded/ovoid, 1=subtle anterior flattening, 2=mild flattening, involves part of the surface of the condyle, 3=moderate/severe flattening, involves the entire surface of the condyle, or loss of height of the condyle
3) 0=Absent; round/slightly angular shape of the condyle, 1=Mild; extent of flattening involves part of the surface of the condyle, 2=Moderate/severe; extent of flattening involves the entire surface of the condyle, or loss of height of the condyle. According to reference 23
4) 0=Normal shape of temporal bone and mandibular condyle according to age: S-shaped articular eminence/glenoid fossa. Round condyle (young patient). Less rounded, more angular appearing condyle (older patient). Smooth subchondral bone contour. 1=Mild flattening of the mandibular condyle and/or temporal bone. 2=Moderate flattening of the mandibular condyle and/or temporal bone. 3=Severe flattening of the mandibular condyle with loss of height, and/or completely flat temporal bone, and/or presence of small erosions/irregularities. 4="Destruction" of temporomandibular joint by large erosions, fragmentation of the mandibular condyle, intra-articular ossification or bone apposition on mandibular condyle or temporal bone. According to reference 13
5) 0=Convex throughout, 1=mild/partial flattening, 2=moderately or severely flattened throughout
6) 0=absent, 1=present
7) 0=Straight, 1=mild anterior inclination, 2=moderate/significant anterior inclination
8) Based on coronal T1, sagittal/oblique T2-fs, sagittal/oblique T1-fs, sagittal/oblique PD and sagittal/oblique T1-fs with Gd; 0=none, 1=mild (involving only part of the articular surface of the condyle), 2=moderate/severe (presence of deep breaks in the subchondral bone seen in two planes, or irregularities involving the entire articular surface)
9) 0=S-shaped, 1=mild to moderate widening or flattening, 2=severely flattened fossa-eminence
10) Based on coronal T1, sagittal/oblique T2-fs, sagittal/oblique T1-fs with Gd; 0=none, 1=mild (involving only part of the articular surface of the condyle), 2=moderate/severe (presence of deep breaks in the subchondral bone seen in two planes, or irregularities involving the entire articular surface). Right side excluded due to skewed distribution of findings.
11) Overall position of the condyle in the temporal fossa; 0=neutral, 1=anterior, 2=posterior, 3=medial, 4=lateral, 5=superior, 6=inferior
12) 0=none, 1=displaced anteriorly, 2=displaced posteriorly, 3=displaced laterally, 4=displaced medially, 5=not applicable, disk cannot be defined
Bone marrow oedema/enhancement: Assessment of bone marrow oedema on a 0-1 scale showed fair to moderate agreement, with kappa values of 0.35 for the same reader and 0.54 between readers.
The analysis of agreement of the variable bone marrow enhancement on a 0-2 scale was hampered by severely skewed distribution in one of the readings. Therefore, kappa analysis could not be performed. The variable showed a high proportion of absolute agreement (89%).
Direct measurements of joint fluid: The mean measurement of joint fluid in the upper compartment was 0.2 mm (median 0.1), with 95% limits of agreement of -0.6 to 0.4 mm between readers. The mean measurement of joint fluid in the lower compartment was 0.3 mm (median 0.1) with 95% limits of agreement of -1.0 to 0.7 mm between readers.
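To illustrate why limits of this width are considered poor for sub-millimetre measurements, the sketch below computes Bland-Altman 95% limits of agreement (mean difference ± 1.96 SD of the differences) for paired measurements and then relates reported limits to the mean measurement. The function names and the example fluid widths are hypothetical. The text does not spell out how the 15% acceptability criterion was applied, so expressing the widest limit as a fraction of the mean measurement is only one possible reading, assumed here.

```python
from statistics import mean, stdev


def limits_of_agreement(reader_a, reader_b):
    """Return (bias, lower, upper): mean difference and 95% limits of agreement."""
    if len(reader_a) != len(reader_b) or len(reader_a) < 2:
        raise ValueError("need at least two paired measurements")
    diffs = [a - b for a, b in zip(reader_a, reader_b)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # sample SD of the paired differences
    return bias, bias - spread, bias + spread


def widest_limit_relative_to_mean(lower, upper, mean_measurement):
    """Largest absolute limit as a fraction of the mean measurement (assumed criterion)."""
    return max(abs(lower), abs(upper)) / mean_measurement


# Hypothetical fluid widths in mm from two readers (not study data).
reader_a = [0.1, 0.2, 0.3, 0.4]
reader_b = [0.1, 0.1, 0.4, 0.4]
bias, lower, upper = limits_of_agreement(reader_a, reader_b)
print(round(bias, 2), round(lower, 2), round(upper, 2))

# Values reported above for the upper compartment: limits -0.6 to 0.4 mm, mean 0.2 mm.
print(round(widest_limit_relative_to_mean(-0.6, 0.4, 0.2), 1))  # 3.0
```

Under this reading, the reported upper-compartment limits span up to three times the mean measurement, far beyond a 15% margin.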
Based on the presented results, a scoring system consisting of the precise imaging features listed in Table 4 could be considered.

DISCUSSION
Of 25 commonly used MRI-based markers for TMJ changes in children with JIA, 13 showed sufficient precision, of which 11 were judged the more relevant to be included in a robust scoring system; seven within the osteochondral domain and four within the inflammatory domain (Table 4). An additional six markers performed well for the same reader, indicating that these should be used with caution. Interestingly, several of the commonly used markers performed poorly, in particular assessment of synovial thickness and joint enhancement, as well as measurements of joint fluid.

Osteochondral Domain
In the present study, the most precise MRI marker suggestive of osteochondral damage was condylar volume on a 0-1 scale; 0 being within normal limits and 1 representing a clearly deformed condyle in the sagittal and/or coronal views, a feature not seen in children without JIA (14,19,28).
Likewise, assessment of osseous deformity as suggested by Tolend and Kellenberger using a progressive scoring system performed well. However, this grading system is based on a sequence of pathological changes, starting with a mildly flattened mandibular condyle and/or temporal bone (grade 1), followed by moderate flattening of the same structures (grade 2). Grade 3 is characterized by severe flattening of the mandibular condyle with loss of height, and/or completely flat temporal bone, and/or presence of small erosions/irregularities, while grade 4 is defined as destruction of the temporomandibular joint by large erosions, fragmentation of the mandibular condyle, intra-articular ossification or bone apposition on mandibular condyle or temporal bone (13).
We have previously shown that a mildly flattened condyle is seen in around 20% of children without JIA, and as such represents a normal variation rather than early destructive change (19). Moreover, we experienced that both condylar irregularities and erosions may be present before severe condylar flattening, thus biasing a progressive system.
To overcome the abovementioned challenges, we suggest that the different markers be scored separately and then summed. More specifically, we suggest that the most precise markers, such as loss of condylar volume, condylar shape and irregularities, and shape of articular eminence and glenoid fossa, be used to construct a total damage score. Ideally, each of these components should be weighted, for example by using CBCT scores that are more fine-meshed in the osteochondral domain.
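The idea of scoring each marker separately and then adding the grades can be sketched as follows. This is an illustration only: the item names, the maximum grades assigned to each item and the optional weights are assumptions made for the example, not the contents of Table 4, and no weighting scheme has been derived in this study.

```python
# Hypothetical item list with assumed maximum grades (illustrative, not Table 4).
DAMAGE_ITEMS = {
    "loss_of_condylar_volume": 1,   # 0-1 scale
    "condylar_shape": 2,            # 0-2 scale assumed here
    "condylar_irregularities": 2,   # 0-2 scale
    "eminence_fossa_shape": 2,      # 0-2 scale
}


def total_damage_score(grades, weights=None):
    """Sum separately scored items; every weight defaults to 1.0 (unweighted sum)."""
    weights = weights or {}
    total = 0.0
    for item, max_grade in DAMAGE_ITEMS.items():
        grade = grades[item]
        if not 0 <= grade <= max_grade:
            raise ValueError(f"{item}: grade {grade} outside 0-{max_grade}")
        total += weights.get(item, 1.0) * grade
    return total


# A joint with irregularities but no flattening of the eminence/fossa still
# contributes its irregularity grade, with no fixed order of changes implied.
joint = {
    "loss_of_condylar_volume": 1,
    "condylar_shape": 1,
    "condylar_irregularities": 2,
    "eminence_fossa_shape": 0,
}
print(total_damage_score(joint))  # 4.0
print(total_damage_score(joint, weights={"loss_of_condylar_volume": 2.0}))  # 5.0
```

Unlike a progressive grade, the additive total does not assume that irregularities or erosions only appear after severe flattening, which is the bias described above.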
Several authors have explored the importance and incidence of disk abnormalities in the TMJ (29-32), but without addressing the precision of the findings. We have now shown that assessing the disk as either normal or pathological represents a precise variable.
Subjective assessment of the condylar inclination showed good intra- and interobserver agreement. Previous studies have shown that the condylar inclination is symmetrical, and that it normally increases with age (14,19,28). Thus, the finding of asymmetric condylar inclination in a child with JIA could indicate growth disturbances secondary to the disease.

Inflammation
Four markers within the inflammation domain were considered of sufficient precision, both within and between readers, to be included in a future scoring system, namely joint fluid on a 0-2 scale, overall impression of inflammation on a 0-2 scale, synovial enhancement on a 0-2 scale and bone marrow oedema on a 0-1 scale (Table 4).
As for evaluation of joint fluid, the hybrid assessment with both continuous measurements and semi-qualitative evaluation suggested by Tolend (23) performed well in contrast to the subjective grading of the upper and lower compartments separately. However, direct continuous measurement of joint fluid turned out to be rather inaccurate, with significant variation between observers. These results are in line with others (33,34), reflecting difficulties in measuring small distances. To overcome the challenges associated with continuous measurements, we tested the subjective variable "overall impression of joint fluid," although with disappointing results between readers. In conclusion, the mechanisms providing high precision to the variable "joint fluid" are not fully understood, but probably depend on a thorough understanding of the normal appearances of fluid in the recesses and joint compartments.
The variable "overall impression of inflammation 0-2" depends explicitly on the subjective understanding of normal, age-related and physiologic findings in the TMJ. At the same time, the variable requires the reader to define, from his/her own understanding, the difference between normal findings and inflammation. Like the binary variables "overall impression of pathological joint fluid" and "loss of condylar volume," this type of variable has not been tested in other publications. This study shows that the variable as such is precise enough to be studied further.
In contrast to the marker synovial enhancement, which was based on pre- and post-contrast T1 fat-suppressed images only, assessment of joint enhancement, as suggested by Tolend et al (23), is based on both fluid-sensitive images and post-contrast T1-weighted fat-suppressed images. According to their 0-2 score, mild inflammation is defined as high signal intensity focally exceeding signal perimeter of physiologic amount of joint fluid on corresponding fluid-sensitive image, while moderate to severe inflammation is characterized by high signal intensity diffusely involving one or both joint compartments. We observed numerous cases showing subtle, focal, synovial contrast enhancement on T1-weighted fat-suppressed images, with no fluid seen on T2-weighted images, i.e., a grade 0 according to the synovial enhancement score and a grade 1 according to the joint enhancement score. Thus, it seems that combining pre-gadolinium fluid-sensitive images with post-gadolinium fat-suppressed T1-weighted images tends to overestimate pathology, possibly accentuated by slightly different imaging parameters on T1- and T2-weighted images. These difficulties are reflected in the slightly lower agreement between readers for the joint enhancement score as compared to the synovial enhancement score.
We found acceptable agreement between readers for the assessment of condylar bone marrow oedema. In adults with rheumatoid arthritis of the wrist, the precision of this variable is addressed in numerous publications (35-37) with results supporting the findings in our study. However, the precision in these studies is measured as a sum of scores along an ICC scale, so the transferability of the results to the mandibles of a pediatric population is questionable. In their study on MRI and CBCT, Koos and colleagues report "no relevant interobserver differences," which per se supports our findings, even though their statement could have been elaborated further (21). In 2014, Vaid studied a composite variable including contrast enhancement, joint fluid, synovial thickening and bone marrow oedema, with a weighted kappa of 0.51. The complexity of their composite variable makes it hard to say whether their results support or contradict our findings (22). Lastly, Tolend tested both a binary and a 4-graded version of the variable bone marrow oedema with ICC results that do not support our findings (sICC 0.01 and 0.06, avICC 0.61 and 0.57). Still, bone marrow oedema is considered an important marker, as oedema/osteitis is believed to represent relevant pathology in rheumatology. Taken together, we suggest the variable should be part of a future scoring system.

Table 3 footnotes
Abbreviations: JIA, juvenile idiopathic arthritis; MRI, magnetic resonance imaging; TMJ, temporomandibular joint.
1) 0=absent; ≤1 mm fluid in joint recess, 1=small; >1 mm and ≤2 mm fluid in recess or involving entire joint compartment, 2=large; >2 mm fluid in recess or involving entire joint compartment. Adapted from reference 23
2) 0=no joint fluid, 1=a thin line of fluid, 2=more than a thin line of fluid
3) 0=no joint fluid, 1=a thin line of fluid, 2=more than a thin line of fluid
4) 0=no, 1=yes
5) 0=normal, includes normal synovial enhancement and a thin line of joint fluid, 1=mild inflammation, considered pathological, 2=moderate/severe inflammation
6) 0=subtle synovial enhancement, 1=mildly increased synovial enhancement, 2=moderate to severe synovial enhancement (signal intensity ≥ nearby vessel)
7) 0=no inflammation: no or small amounts of joint fluid in any recess, with ≤1 mm width. No enhancement or enhancement confined to physiological joint fluid. 1=mild inflammation: extension of joint enhancement exceeds that of physiological joint fluid but does not involve entire joint compartment and/or presence of bone marrow oedema. 2=moderate inflammation: joint enhancement involves entire joint compartment or there is an enhancing joint effusion. 3=severe inflammation: detectable synovial thickening in addition to increased joint enhancement or effusion. 4=joint space filled with and enlarged by pannus. Adapted from reference 13
8) 0=absent; no synovium visible (apparent joint compartment ≤1 mm width), 1=mild; >1 and <2 mm thickness at the point of maximum synovial thickening, 2=moderate/severe; >2 mm thickness at the point of maximum synovial thickening. Adapted from reference 23
9) 0=normal; high signal intensity confined to signal perimeter of normal amount of fluid on corresponding fluid-sensitive image, 1=mild; high signal intensity focally exceeding signal perimeter of physiologic amount of joint fluid on corresponding fluid-sensitive image, 2=moderate/severe; high signal intensity diffusely involving 1 or both joint compartments. Adapted from reference 23
10) 0=no thickening, 1=mild thickening, 2=moderate/severe thickening
11) 0=absent, 1=present
12) 0=no enhancement, 1=subtle enhancement, what is considered normal, 2=increased, pathological enhancement
As for the progressive inflammation score, this is based on a fixed sequence of changes, like that described for the osteochondral domain. We experienced, in a small number of TMJs, that this sequence was violated, in that subtle synovial thickening was present without synovial enhancement or joint effusion. Thus, according to the progressive system, these joints should be scored as a grade 3 inflammation. Seen together with the difficulties in defining synovial thickening this represents a bias in the progressive system. As in the osteochondral domain, we suggest that each variable be scored separately, and subsequently summarized.
Similar to bone marrow oedema, the variable "bone marrow enhancement" aims to describe an important and closely related part of the rheumatologic pathology, namely osteitis and increased perfusion of the intraosseous part of the condyle. However, we noted that virtually all condyles demonstrated some degree of enhancement, also when compared to the mandibular ramus, which corresponds to a grade 1 in the binary system as proposed by Tolend. The 3-graded system proposed in this study shows a slight differentiation between assumed normal and pathological enhancement with a high proportion of absolute agreement, although kappa analysis could not be performed due to skewed distribution of the findings. We note that Tolend and co-workers do not present data on the repeatability of the binary variant of this variable. The assumed importance of the pathological process, in combination with the paucity of data on the precision of the variable, makes it an interesting topic for further research, but at present it should not be included in a robust scoring system.
Except for the inflammation score in the progressive system, all these scores are relatively crude. However, previous studies have demonstrated difficulties in establishing reliable, fine-meshed imaging markers for the inflammatory domain (23).
In general, we found that the intra-observer agreement was better than agreement between observers, despite thorough calibration and the use of a reference atlas. This is not unexpected, and similar results have been shown in numerous earlier publications. Still, we assume that this finding underscores the importance of performing clinical, JIA-related radiology reporting in a small environment of subspecialists with a special interest in JIA.

Limitations and Strengths
We acknowledge that our study has shortcomings. First, the use of Cohen's Kappa has limitations, especially in the analysis of datasets with skewed distribution (38,39). To compensate for this, we chose to present both the proportion of absolute agreement and the distribution of findings for each variable. We assume this to be a more correct and transparent way of presenting the data than other statistical models, which would introduce other sources of error. Next, the study was performed with two readers only, aiming to examine the potential of a scoring system given optimal conditions, rather than assessing its performance in a clinical setting. Lastly, the distribution of findings for some of the variables under investigation was skewed, thus precluding statistical analysis. The strengths of our study include the high numbers, the meticulous standardization of scoring systems and measurements, and the construction of an atlas for optimizing precision.
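The limitation of kappa under skewed distributions can be made concrete with a small sketch. The example below uses invented binary readings, not study data, in which two readers agree on 90% of joints yet kappa is close to zero because positive findings are rare; it also shows the degenerate case where kappa cannot be computed at all. The function name and the choice to return `None` for the undefined case are assumptions of this illustration.

```python
def agreement_and_kappa(reading_1, reading_2):
    """Return (absolute agreement, simple Cohen's kappa or None if undefined)."""
    n = len(reading_1)
    categories = set(reading_1) | set(reading_2)
    p_observed = sum(a == b for a, b in zip(reading_1, reading_2)) / n
    # Chance agreement from the marginal proportions of each reading.
    p_expected = sum(
        (reading_1.count(c) / n) * (reading_2.count(c) / n) for c in categories
    )
    if p_expected == 1.0:
        # Both readings use one and the same category: kappa is 0/0.
        return p_observed, None
    return p_observed, (p_observed - p_expected) / (1.0 - p_expected)


# Invented example: 100 joints, each reader marks 5 different joints as positive.
reader_a = [0] * 90 + [1] * 5 + [0] * 5
reader_b = [0] * 90 + [0] * 5 + [1] * 5
p_o, kappa = agreement_and_kappa(reader_a, reader_b)
print(round(p_o, 2), round(kappa, 3))  # 0.9 -0.053

# Degenerate case: every joint graded 0 by both readers.
print(agreement_and_kappa([0] * 10, [0] * 10))  # (1.0, None)
```

This is why the proportion of absolute agreement and the distribution of findings are informative alongside kappa when one category dominates.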

CONCLUSION
We propose a robust scoring system for the assessment of TMJ involvement in children with JIA including four variables in the inflammatory domain and seven variables in the osteochondral domain. Further studies on clinical validity of these markers are needed.

ACKNOWLEDGMENTS
This work was partially funded by the Liaison Committee between the Central Norway Regional Health Authority (RHA) and the Norwegian University of Science and Technology (NTNU) and "Norsk Revmatikerforbund." The study has also been supported by the Northern Norway Regional Health Authority and by the Tromsø Research Foundation (TFS).

Appendix table (continued)
Signal intensity of the synovium, capsule and joint fluid higher than that of muscle on post contrast T1-fat saturated images. 0=Normal; high signal intensity confined to signal perimeter of normal amount of fluid on corresponding fluid-sensitive image. 1=Mild; high signal intensity focally exceeding signal perimeter of physiologic amount of joint fluid on corresponding fluid-sensitive image. 2=Moderate/severe; high signal intensity diffusely involving 1 or both joint compartments.
Joint fluid (a): Increased joint fluid with isointense signaling of joint space compared to that of cerebrospinal fluid on fluid-sensitive images. 0=Absent; ≤1 mm fluid in recess. 1=Small; >1 and ≤2 mm in recess or involving entire joint compartment. 2=Large; >2 mm fluid in recess or involving entire joint compartment.
Synovial enhancement: Sagittal/oblique T1-fat saturated images post iv contrast. 0=Subtle synovial enhancement. 1=Mildly increased synovial enhancement. 2=Moderate to severe synovial enhancement (signal intensity ≥ nearby vessel).
Synovial thickening (a): Sagittal/oblique T2 fat-saturated images. 0=Absent; no synovium visible (apparent joint compartment ≤1 mm width). 1=Mild; >1 and <2 mm thickness at the point of maximum synovial thickening. 2=Moderate/severe; >2 mm thickness at the point of maximum synovial thickening.
Joint enhancement (a): Sagittal/oblique T1-fat saturated images post iv contrast and sagittal/oblique T2 fat-saturated images. 0=Normal; high signal intensity confined to signal perimeter of normal amount of fluid on corresponding fluid-sensitive image. 1=Mild; high signal intensity focally exceeding signal perimeter of physiologic amount of joint fluid on corresponding fluid-sensitive image. 2=Moderate/severe; high signal intensity diffusely involving 1 or both joint compartments.