Ribosomal DNA genes (rDNA) encode the major ribosomal RNAs and, in eukaryotes, typically form tandem repeat arrays. Species have characteristic rDNA copy numbers, but there is substantial intra-species variation in copy number that results from frequent rDNA recombination. Copy number differences can have phenotypic consequences; however, difficulties in quantifying copy number mean we lack a comprehensive understanding of how copy number evolves and what the consequences of this variation are. Here we present a genomic sequence read approach that estimates rDNA copy number from modal coverage, to help overcome limitations of existing mean coverage-based approaches. We validated our method using Saccharomyces cerevisiae strains with known rDNA copy numbers. Application of our pipeline to a global sample of S. cerevisiae isolates showed that different populations have different rDNA copy numbers. Our results demonstrate the utility of the modal coverage method and highlight the high level of rDNA copy number variation within and between populations.
Keywords: Bioinformatics method; Copy number; Ribosomal DNA; Saccharomyces cerevisiae; Sequence read coverage; Tandem repeats.
Copyright © 2022 The Authors. Published by Elsevier Inc. All rights reserved.
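
The core idea of a modal coverage estimate can be illustrated with a minimal sketch. The abstract does not give the details of the pipeline, so the following are assumptions made only for illustration: per-base read depths are already available as lists of integers (for example, from a tool such as samtools depth), zero-depth positions are ignored, depths are optionally grouped into fixed-width bins, and copy number is taken as the ratio of the rDNA modal depth to the whole-genome modal depth. The function names `modal_coverage` and `estimate_copy_number` and the `bin_width` parameter are hypothetical.

```python
from collections import Counter
from typing import Iterable


def modal_coverage(depths: Iterable[int], bin_width: int = 1) -> float:
    """Return the centre of the most frequent coverage bin.

    Assumption: positions with zero depth are skipped so that unmapped
    regions do not dominate the mode.
    """
    bins = Counter(d // bin_width for d in depths if d > 0)
    if not bins:
        raise ValueError("no positions with non-zero coverage")
    top = max(bins.values())
    # Resolve ties toward the lower bin so the result is deterministic.
    best_bin = min(b for b, n in bins.items() if n == top)
    return best_bin * bin_width + (bin_width - 1) / 2


def estimate_copy_number(
    rdna_depths: Iterable[int],
    genome_depths: Iterable[int],
    bin_width: int = 1,
) -> float:
    """Estimate copy number as rDNA modal depth over genome modal depth."""
    return modal_coverage(rdna_depths, bin_width) / modal_coverage(
        genome_depths, bin_width
    )


# Toy depths (invented for illustration, not real data): a few outlier
# positions shift the mean but leave the mode unchanged.
genome_depths = [30] * 50 + [29] * 20 + [31] * 20 + [300] * 5
rdna_depths = [4500] * 40 + [4400] * 10 + [9000] * 3
print(estimate_copy_number(rdna_depths, genome_depths))
```

In this toy input, the handful of very high-depth positions would pull a mean-based ratio away from the value supported by most positions, whereas the mode is unaffected. That robustness to coverage outliers is the motivation for a modal estimate sketched here; the binning and tie-breaking choices are illustrative and not taken from the paper.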