Inbred Strain Variant Database (ISVdb): A Repository for Probabilistically Informed Sequence Differences Among the Collaborative Cross Strains and Their Founders

Daniel Oreper; Yanwei Cai; Lisa M Tarantino; Fernando Pardo-Manuel de Villena; William Valdar

doi:10.1534/g3.117.041491

G3 (Bethesda). 2017 Jun 7;7(6):1623-1630. doi: 10.1534/g3.117.041491.

Authors

Daniel Oreper^{1,2}, Yanwei Cai^{1,2}, Lisa M Tarantino^{2,3}, Fernando Pardo-Manuel de Villena^{2,4}, William Valdar^{5,4}

Affiliations

¹ Curriculum in Bioinformatics and Computational Biology, University of North Carolina, Chapel Hill, North Carolina 27599-7265.
² Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265.
³ Division of Pharmacotherapy and Experimental Therapeutics, Eshelman School of Pharmacy, University of North Carolina, Chapel Hill, North Carolina 27599-7265.
⁴ Lineberger Comprehensive Cancer Center.
⁵ Department of Genetics, University of North Carolina, Chapel Hill, North Carolina 27599-7265. william.valdar@unc.edu

Abstract

The Collaborative Cross (CC) is a panel of recently established multiparental recombinant inbred mouse strains. For the CC, as for any multiparental population (MPP), effective experimental design and analysis benefit from detailed knowledge of the genetic differences between strains. Such differences can be directly determined by sequencing, but until now whole-genome sequencing was not publicly available for individual CC strains. An alternative and complementary approach is to infer genetic differences by combining two pieces of information: probabilistic estimates of the CC haplotype mosaic from a custom genotyping array, and probabilistic variant calls from sequencing of the CC founders. The computation for this inference, especially when performed genome-wide, can be intricate and time-consuming, requiring the researcher to generate nontrivial and potentially error-prone scripts. To provide standardized, easy-to-access CC sequence information, we have developed the Inbred Strain Variant Database (ISVdb). The ISVdb provides, for all the exonic variants from the Sanger Institute mouse sequencing dataset, direct sequence information for CC founders and, critically, the imputed sequence information for CC strains. Notably, the ISVdb also: (1) provides predicted variant consequence metadata; (2) allows rapid simulation of F1 populations; and (3) preserves imputation uncertainty, which will allow imputed data to be refined in the future as additional sequencing and genotyping data are collected. The ISVdb information is housed in an SQL database and is easily accessible through a custom online interface (http://isvdb.unc.edu), reducing the analytic burden on any researcher using the CC.

Keywords: Collaborative Cross; MPP; haplotype; inbred strain; multiparental populations; online GUI; variant imputation.
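
The inference described in the abstract, combining probabilistic haplotype estimates for a CC strain with probabilistic variant calls for the founders, can be illustrated with a minimal sketch. This is not the ISVdb implementation: the function names, the two-founder example, and all probabilities below are hypothetical, and the sketch makes the simplifying assumptions that each strain is homozygous at the locus and that haplotype and variant-call uncertainties are independent. The second function shows how such per-strain distributions could be crossed to simulate an F1 genotype distribution.

```python
from itertools import product


def impute_allele_probs(hap_probs, founder_allele_probs):
    """Marginalize founder allele calls over a strain's founder-haplotype probabilities.

    hap_probs: {founder: P(strain carries this founder's haplotype at the locus)}
    founder_allele_probs: {founder: {allele: P(founder carries this allele)}}
    Assumes (hypothetically) an inbred, homozygous strain at this locus.
    """
    strain_probs = {}
    for founder, p_hap in hap_probs.items():
        for allele, p_allele in founder_allele_probs[founder].items():
            strain_probs[allele] = strain_probs.get(allele, 0.0) + p_hap * p_allele
    return strain_probs


def f1_genotype_probs(dam_probs, sire_probs):
    """Distribution over unordered F1 genotypes, assuming each parent
    transmits one allele drawn independently from its own distribution."""
    genotype_probs = {}
    for (a, p_a), (b, p_b) in product(dam_probs.items(), sire_probs.items()):
        genotype = tuple(sorted((a, b)))
        genotype_probs[genotype] = genotype_probs.get(genotype, 0.0) + p_a * p_b
    return genotype_probs


# Hypothetical inputs: a CC strain whose haplotype at one exonic variant is
# probably from A/J, with some residual probability of CAST/EiJ.
hap_probs = {"A_J": 0.9, "CAST_EiJ": 0.1}
founder_allele_probs = {
    "A_J": {"A": 1.0},
    "CAST_EiJ": {"A": 0.05, "G": 0.95},
}

dam = impute_allele_probs(hap_probs, founder_allele_probs)
sire = {"G": 1.0}  # a second, hypothetical strain fixed for the G allele
f1 = f1_genotype_probs(dam, sire)
```

Keeping the full distribution, rather than collapsing each strain to its most likely allele, mirrors the abstract's point about preserving imputation uncertainty so that calls can be refined as more data arrive.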

Publication types

Research Support, Non-U.S. Gov't
Research Support, N.I.H., Extramural

MeSH terms

Algorithms
Animals
Breeding
Computer Simulation
Crosses, Genetic
Databases, Genetic*
Genetic Variation*
Genomics / methods
Genotype
Haplotypes
Mice
Mice, Inbred Strains*
User-Computer Interface
Web Browser
Workflow
