The clinical and genetic spectrum of paediatric speech and language disorders in 52,143 individuals

medRxiv [Preprint]. 2024 Apr 23:2024.04.23.24306192. doi: 10.1101/2024.04.23.24306192.

Abstract

Speech and language disorders are known to have a substantial genetic contribution. Although these disorders are frequently examined as components of other conditions, research on the genetic basis of linguistic differences as separate phenotypic subgroups has been limited. Here, we performed an in-depth characterization of speech and language disorders in 52,143 individuals, reconstructing clinical histories through large-scale data mining of the Electronic Medical Records (EMR) of an entire large paediatric healthcare network. The reported frequency of these disorders was highest between 2 and 5 years of age, and the disorders spanned a spectrum of twenty-six broad speech and language diagnoses. We used Natural Language Processing to assess to what degree clinical diagnoses in full-text notes were reflected in ICD-10 diagnosis codes. We found that aphasia and speech apraxia could be easily retrieved through ICD-10 diagnosis codes, while stuttering as a speech phenotype was coded with appropriate ICD-10 codes in only 12% of individuals. We found significant comorbidity of speech and language disorders with neurodevelopmental conditions (30.31%) and, to a lesser degree, with epilepsies (6.07%) and movement disorders (2.05%). The most common genetic disorders retrievable in our EMR analysis were STXBP1 (n=21), PTEN (n=20), and CACNA1A (n=18). When assessing associations of genetic diagnoses with specific linguistic phenotypes, we observed associations of STXBP1 with aphasia (P=8.57 × 10⁻⁷, CI=18.62-130.39) and of MYO7A with speech and language development delay due to hearing loss (P=1.24 × 10⁻⁵, CI=17.46-Inf). Finally, in a sub-cohort of 726 individuals with whole exome sequencing data, we identified an enrichment of rare variants in synaptic protein and neuronal receptor pathways, as well as associations of UQCRC1 with expressive aphasia and of WASHC4 with abnormality of speech or vocalization.
In summary, our study outlines the landscape of paediatric speech and language disorders, confirming the phenotypic complexity of linguistic traits and identifying novel genotype-phenotype associations. Subgroups of paediatric speech and language disorders differ significantly in the composition of their monogenic aetiologies.

Keywords: Human Phenotype Ontology; electronic medical records; genetics; language disorder; speech disorder.

Publication types

  • Preprint