Target sequence capture in the Brazil nut family (Lecythidaceae): Marker selection and in silico capture from genome skimming data

Mol Phylogenet Evol. 2019 Jun:135:98-104. doi: 10.1016/j.ympev.2019.02.020. Epub 2019 Feb 25.

Abstract

Reconstructing species trees from multilocus datasets is becoming standard practice in phylogenetics. Nevertheless, high-throughput sequencing can be costly, especially in studies with many samples. This potentially high cost makes a priori assessments desirable for making informed decisions about sequencing. We generated twelve transcriptomes for ten species of the Brazil nut family (Lecythidaceae), identified a set of putatively orthologous nuclear loci, and evaluated, in silico, their phylogenetic utility using genome skimming data from 24 species. We designed the markers using MarkerMiner and developed a script, GoldFinder, to efficiently sub-select the best markers for sequencing. We captured, in silico, all 354 designed nuclear loci and performed a maximum likelihood phylogenetic analysis on the concatenated sequence matrix. We also inferred individual gene trees with maximum likelihood and used them for a coalescent-based species tree inference. Both analyses resulted in almost identical topologies. However, our nuclear-loci phylogenies were strongly incongruent with a published plastome phylogeny, suggesting that plastome data alone are not sufficient for species tree estimation. Our results suggest that using hundreds of nuclear markers (here, 354) will significantly improve the Lecythidaceae species tree. The framework described here will be generally useful for developing markers for species tree inference.
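
The concatenation step mentioned above (combining per-locus alignments into a single sequence matrix for maximum likelihood analysis) can be illustrated with a minimal sketch. This is not the authors' pipeline and is unrelated to the internals of MarkerMiner or GoldFinder. The function name `concatenate_loci`, the dictionary-based alignment representation, and the toy taxon names are all assumptions made for illustration. The sketch assumes each locus is already aligned and pads taxa missing from a locus with gap characters.

```python
from typing import Dict, List, Tuple

# Hypothetical representation: one aligned locus = {taxon name: aligned sequence}
Alignment = Dict[str, str]


def concatenate_loci(
    loci: List[Alignment], missing: str = "-"
) -> Tuple[Dict[str, str], List[Tuple[int, int]]]:
    """Concatenate per-locus alignments into one supermatrix.

    Taxa absent from a locus are padded with `missing` characters.
    Returns the supermatrix and the 1-based, inclusive column range
    occupied by each locus (useful for a partitioned analysis).
    """
    # Collect every taxon seen in any locus, in a stable order.
    taxa = sorted({taxon for locus in loci for taxon in locus})
    pieces: Dict[str, List[str]] = {taxon: [] for taxon in taxa}
    partitions: List[Tuple[int, int]] = []
    start = 1
    for locus in loci:
        lengths = {len(seq) for seq in locus.values()}
        if len(lengths) != 1:
            raise ValueError("each locus must be non-empty and aligned to one length")
        length = lengths.pop()
        for taxon in taxa:
            # Use the sequence if present, otherwise fill with gaps.
            pieces[taxon].append(locus.get(taxon, missing * length))
        partitions.append((start, start + length - 1))
        start += length
    matrix = {taxon: "".join(parts) for taxon, parts in pieces.items()}
    return matrix, partitions


# Toy example with invented sequences (not data from the study).
toy_loci = [
    {"taxon_A": "ACGT", "taxon_B": "ACGA"},
    {"taxon_A": "TT", "taxon_C": "TG"},
]
matrix, partitions = concatenate_loci(toy_loci)
for name, seq in matrix.items():
    print(name, seq)
print(partitions)
```

The returned column ranges could be written to a partition file so that each locus receives its own substitution model. The gene-tree and coalescent analyses described in the abstract would instead use the per-locus alignments separately.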

Keywords: Lecythidaceae; MarkerMiner; Markers; Species Tree; Target sequencing; Transcriptomes.

Publication types

  • Research Support, Non-U.S. Gov't
  • Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

  • Bertholletia / genetics*
  • Computer Simulation*
  • Genetic Markers
  • Genome, Plant*
  • Likelihood Functions
  • Phylogeny
  • Selection, Genetic*
  • Sequence Analysis, DNA*
  • Transcriptome / genetics

Substances

  • Genetic Markers