Ancient DNA recovered from ancient samples such as sediments, bones, and teeth is an important genetic resource for reconstructing the evolutionary history of humans, animals, and plants. High-throughput sequencing allows ancient DNA to be studied at whole-genome scale. However, post-mortem DNA damage, caused mainly by deamination of cytosine to uracil (or of methylated cytosine to thymine), can confound variant calling and downstream analysis. In this article, we present "AntCaller", a new variant caller implemented in Python, which extracts information on nucleotide substitutions from the sequencing data and calculates the probability of each genotype using Bayes' rule. In both simulation studies and real data analyses, our method reduced the false discovery rate caused by nucleotide misincorporations and outperformed two mainstream variant callers (GATK and SAMtools) in calling accuracy. In a real application with serious DNA damage, AntCaller still outperformed GATK and SAMtools combined with quality score recalling.
Keywords: Ancient DNA; Chemical damage; NGS; SNV/SNP calling.
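To make the idea of damage-aware Bayesian genotype calling concrete, the following is a minimal illustrative sketch and not AntCaller's actual model or code. It assumes a diploid site, a uniform genotype prior, a per-read sequencing error rate, and a per-read C-to-T misincorporation rate (which in practice would be estimated from the substitution patterns in the data). The names `base_likelihood` and `genotype_posteriors` are hypothetical, and only forward-strand C-to-T damage is modeled; the complementary G-to-A case is omitted for brevity.

```python
from itertools import combinations_with_replacement

BASES = "ACGT"

def base_likelihood(observed, allele, seq_err, damage_c_to_t):
    """P(observed base | true allele) under an assumed two-step model:
    post-mortem damage first (C may become T), then sequencing error."""
    if allele == "C":
        after_damage = {"C": 1.0 - damage_c_to_t, "T": damage_c_to_t}
    else:
        after_damage = {allele: 1.0}
    total = 0.0
    for base, prob in after_damage.items():
        if base == observed:
            total += prob * (1.0 - seq_err)   # read correctly
        else:
            total += prob * seq_err / 3.0     # misread as one of 3 other bases
    return total

def genotype_posteriors(observations, prior=None):
    """Posterior over the 10 unordered diploid genotypes via Bayes' rule.

    observations: list of (observed_base, seq_err, damage_c_to_t) per read.
    prior: optional dict genotype -> prior probability (uniform if None).
    """
    genotypes = list(combinations_with_replacement(BASES, 2))
    if prior is None:
        prior = {g: 1.0 / len(genotypes) for g in genotypes}
    unnormalized = {}
    for a1, a2 in genotypes:
        likelihood = 1.0
        for observed, seq_err, damage in observations:
            # Each read samples one of the two alleles with probability 1/2.
            likelihood *= 0.5 * base_likelihood(observed, a1, seq_err, damage) \
                        + 0.5 * base_likelihood(observed, a2, seq_err, damage)
        unnormalized[(a1, a2)] = prior[(a1, a2)] * likelihood
    evidence = sum(unnormalized.values())
    return {g: p / evidence for g, p in unnormalized.items()}

# Four C reads and two T reads, with and without an assumed damage rate.
damaged = [("C", 0.01, 0.3)] * 4 + [("T", 0.01, 0.3)] * 2
undamaged = [("C", 0.01, 0.0)] * 4 + [("T", 0.01, 0.0)] * 2
post_damaged = genotype_posteriors(damaged)
post_undamaged = genotype_posteriors(undamaged)
print(max(post_damaged, key=post_damaged.get))
print(max(post_undamaged, key=post_undamaged.get))
```

In this toy setting, the same pile of reads favors a heterozygous C/T call when damage is ignored, but a homozygous C/C call once a substantial C-to-T misincorporation rate is assumed, which is the kind of false positive a damage-aware caller is meant to suppress.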