Autism spectrum disorders (ASDs) are heterogeneous neurodevelopmental conditions. In fMRI studies, including most machine learning studies that seek to distinguish ASD from typically developing (TD) samples, cohorts that differ in gender and symptom severity composition are often treated statistically as a single "ASD group". Using resting-state functional connectivity (FC) data, we used random forest to build diagnostic classifiers in four partially overlapping samples drawn from a total of 656 participants (ASD: n = 306; TD: n = 350; ages 6-18 years). The samples were constructed to titrate heterogeneity in gender and symptom severity, and each had different inclusion criteria: (1) all genders, unrestricted severity range; (2) only male participants, unrestricted severity; (3) all genders, higher severity only; (4) only male participants, higher severity. Each set consisted of 200 participants per group (ASD, TD; matched on age and head motion), 160 for training and 40 for validation. fMRI time series from 237 regions of interest (ROIs) were Pearson correlated to form a 237×237 FC matrix, and classifiers were built using random forest in the training samples. Classification accuracies in the validation samples were 62.5%, 65%, 70% and 73.75% for samples 1-4, respectively. Overall, the most informative features came from connectivity within the cingulo-opercular task control (COTC) network and between COTC ROIs and the default mode and dorsal attention networks, but the informative features differed across sets. These findings suggest that diagnostic classifiers vary with ASD sample composition. Specifically, greater sample homogeneity in gender and symptom severity enhances classifier performance. However, given the true heterogeneity of ASDs, performance metrics alone may not adequately reflect classifier utility.
Keywords: Autism Diagnostic Observation Schedule; Autism spectrum disorder; Conditional random forest; Functional connectivity; fMRI; Symptom severity; Machine learning; Heterogeneity.
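
The feature construction summarized in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code. It assumes NumPy, uses synthetic noise in place of real ROI time series, and the scan length (N_TIMEPOINTS) and the function name fc_features are hypothetical choices made only for illustration. It shows how a 237×237 Pearson correlation matrix reduces to one feature vector per participant. It also shows what the reported accuracies would mean in participant counts if the validation set held 40 participants per group (80 in total) and each accuracy were a simple proportion of that set; both conditions are assumptions.

```python
import numpy as np

N_ROIS = 237          # number of regions of interest, from the abstract
N_TIMEPOINTS = 180    # hypothetical scan length; not given in the abstract


def fc_features(time_series: np.ndarray) -> np.ndarray:
    """Return the unique ROI-pair correlations for one participant.

    time_series has shape (n_timepoints, n_rois).
    """
    # np.corrcoef treats rows as variables, so transpose to (n_rois, n_timepoints).
    fc = np.corrcoef(time_series.T)
    # The matrix is symmetric with a unit diagonal; keep the strict upper triangle.
    rows, cols = np.triu_indices(fc.shape[0], k=1)
    return fc[rows, cols]


rng = np.random.default_rng(0)
# Synthetic noise standing in for one participant's preprocessed ROI time series.
example = rng.standard_normal((N_TIMEPOINTS, N_ROIS))
features = fc_features(example)
n_features = features.shape[0]
print(n_features)  # 237 * 236 / 2 = 27966 unique ROI pairs

# Assumption: validation set of 40 participants per group, two groups.
n_validation = 2 * 40
reported_accuracy = [0.625, 0.65, 0.70, 0.7375]  # samples 1-4, from the abstract
n_correct = [round(acc * n_validation) for acc in reported_accuracy]
print(n_correct)  # [50, 52, 56, 59]
```

Stacking one such vector per participant gives the participants × 27,966 feature matrix that a random forest implementation would take as input. Under the stated assumptions, the reported accuracies correspond to 50, 52, 56 and 59 correct classifications out of 80. With only 80 validation participants per set, the difference between adjacent samples amounts to a handful of participants.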