Cellular identity relies on cell-type-specific gene expression controlled at the transcriptional level by cis-regulatory elements (CREs). CREs are unevenly distributed across the genome, giving rise to individual CREs and clusters of CREs (COREs). Technical and biological features hinder CORE identification. We addressed these issues by developing an unsupervised machine learning approach termed clustering of genomic regions analysis method (CREAM). CREAM automates CORE detection from chromatin accessibility profiles. The COREs it identifies are enriched in CREs that are strongly bound by master transcription regulators, lie proximal to highly expressed and essential genes, and discriminate cell identity. Although COREs share similarities with super-enhancers, we highlight differences in the genomic distribution and structure of these cis-regulatory units. We further show the enhanced value of COREs over super-enhancers for identifying master transcription regulators and the highly expressed and essential genes that define cell identity. COREs are enriched at topologically associated domain (TAD) boundaries. In contrast to super-enhancers, they are also preferentially bound by the chromatin looping factors CTCF and cohesin, forming clusters of CTCF and cohesin binding regions and defining homotypic clusters of transcription regulator binding regions (HCTs). Finally, we show the clinical utility of CREAM: COREs identified across chromatin accessibility profiles stratify more than 400 tumor samples according to their cancer type and delineate cancer type-specific active biological pathways. Collectively, our results support the utility of CREAM to delineate COREs, which capture cell-type-specific biology with greater accuracy than individual CREs or super-enhancers across a wide range of normal and cancer cell types.
© 2019 Madani Tonekaboni et al.; Published by Cold Spring Harbor Laboratory Press.