The application of epiphenotyping approaches to DNA methylation array studies of the human placenta

Res Sq [Preprint]. 2023 Jun 26:rs.3.rs-3069705. doi: 10.21203/rs.3.rs-3069705/v1.

Abstract

Background: Genome-wide DNA methylation (DNAme) profiling of the placenta with Illumina Infinium Methylation bead arrays is often used to explore the connections between in utero exposures, placental pathology, and fetal development. However, many technical and biological factors can produce DNAme variation between samples and between cohorts, and understanding and accounting for these factors is essential for meaningful and replicable data analysis. Recently, "epiphenotyping" approaches have been developed in which DNAme data are used to impute phenotypic variables such as gestational age, sex, cell composition, and ancestry. These epiphenotypes offer a way to compare phenotypic data across cohorts and to understand how phenotypic variables relate to DNAme variability. However, the relationships between placental epiphenotyping variables and other technical and biological variables, and their application to downstream epigenome analyses, have not been well studied.

Results: Using DNAme data from 204 placentas across three cohorts, we applied the PlaNET R package to estimate the epiphenotypes gestational age, ancestry, and cell composition in these samples. PlaNET ancestry estimates were highly correlated with independent polymorphic ancestry informative markers, and epigenetic gestational age was, on average, estimated within 4 days of reported gestational age, underscoring the accuracy of these tools. Cell composition estimates varied both within and between cohorts, but reassuringly were robust to placental processing time. Interestingly, the ratio of cytotrophoblast to syncytiotrophoblast proportion decreased with increasing gestational age, and differed slightly by both maternal ethnicity (lower in white vs. non-white) and genetic ancestry (lower with higher probability of European ancestry). The cohort of origin and cytotrophoblast proportion were the largest drivers of DNAme variation in this dataset, based on their associations with the first principal component.

Conclusions: This work confirms that cohort, array (technical) batch, cell type proportion, self-reported ethnicity, genetic ancestry, and biological sex are important variables to consider in any analysis of Illumina DNAme data. Further, we demonstrate that estimating epiphenotype variables from the DNAme data itself, when possible, provides both an independent check of clinically obtained data and a robust approach to comparing variables across different datasets.
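The Results rank drivers of DNAme variation by their association with the first principal component. The sketch below illustrates that general idea on synthetic data only; it is not the authors' pipeline and does not use the PlaNET package. The function names (`pc1_scores`, `r_squared`), the simulated cohort labels, the effect sizes, and the use of R² as the association measure are all assumptions made for illustration.

```python
import numpy as np

def pc1_scores(beta):
    """PC1 scores (one per sample) and the fraction of variance PC1 explains.

    `beta` is a samples x CpGs matrix of methylation values.
    """
    centered = beta - beta.mean(axis=0)            # center each CpG
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    explained = s[0] ** 2 / np.sum(s ** 2)
    return u[:, 0] * s[0], explained

def r_squared(scores, covariate):
    """R^2 from regressing PC scores on one covariate.

    String covariates are one-hot encoded; numeric ones get an intercept.
    """
    covariate = np.asarray(covariate)
    if covariate.dtype.kind in "OUS":              # categorical (e.g. cohort)
        levels = np.unique(covariate)
        design = (covariate[:, None] == levels[None, :]).astype(float)
    else:                                          # numeric (e.g. a proportion)
        design = np.column_stack([np.ones(len(covariate)),
                                  covariate.astype(float)])
    coef, *_ = np.linalg.lstsq(design, scores, rcond=None)
    resid = scores - design @ coef
    total = scores - scores.mean()
    return 1.0 - (resid @ resid) / (total @ total)

# Synthetic example (hypothetical values, not data from the study).
rng = np.random.default_rng(0)
n_per_cohort, n_cpgs = 20, 200
cohort = np.repeat(["A", "B", "C"], n_per_cohort)
cohort_shift = np.repeat([0.0, 0.1, 0.2], n_per_cohort)
ctb = rng.uniform(0.2, 0.6, size=cohort.size)      # simulated cell proportion
unrelated = rng.normal(size=cohort.size)           # covariate with no effect

beta = (0.5
        + np.outer(cohort_shift, rng.normal(size=n_cpgs))
        + np.outer(0.3 * ctb, rng.normal(size=n_cpgs))
        + rng.normal(scale=0.01, size=(cohort.size, n_cpgs)))
beta = np.clip(beta, 0.0, 1.0)                     # keep values in [0, 1]

scores, explained = pc1_scores(beta)
r2 = {name: r_squared(scores, values)
      for name, values in [("cohort", cohort),
                           ("ctb", ctb),
                           ("unrelated", unrelated)]}
for name, value in r2.items():
    print(f"{name}: R^2 with PC1 = {value:.3f}")
```

In a real analysis the simulated matrix would be replaced by measured array values, and the same comparison is usually repeated over several leading components rather than PC1 alone.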

Publication types

  • Preprint