Taxonomic identification accuracy from BOLD and GenBank databases using over a thousand insect DNA barcodes from Colombia

PLoS One. 2023 Apr 24;18(4):e0277379. doi: 10.1371/journal.pone.0277379. eCollection 2023.

Abstract

Rapid recent declines of insect populations have created a need for quick methods to assess insect diversity and to process large volumes of data for identifying species in highly diverse groups. A short DNA sequence from the COI gene is widely used to identify insects by comparing it against sequences of known species. Online sequence repositories provide tools that match a query sequence to a known individual. However, the performance of these tools can differ. Here we assess how accurately two repositories, BOLD Systems and GenBank, identify insects at different taxonomic categories. We did this by comparing the identification made by a taxonomist with the identification suggested by each platform for the same sequence. We used 1,160 COI sequences representing eight orders of insects from Colombia. After the comparison, we reanalyzed the results for a representative subset of the data from the subfamily Scarabaeinae (Coleoptera). Overall, BOLD Systems outperformed GenBank, and the performance of both engines differed among orders and among taxonomic categories (species, genus and family). Rates of accurate identification were higher at the family and genus levels. Accuracy was higher in BOLD for Coleoptera at the family level, and for Coleoptera and Lepidoptera at the genus and species levels. Other orders performed similarly in both repositories. Moreover, the Scarabaeinae subset showed that species were correctly identified only when the BOLD match percentage was above 93.4%, and that 85% of the samples were correctly assigned to a taxonomic category. These results underline the strong potential of the identification engines to place insects accurately into their taxonomic categories from DNA barcodes, and highlight the reliability of BOLD Systems for insect identification in the absence of a large reference database for a highly diverse country.
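
The comparison described in the abstract, taxonomist identification against the identification suggested by each platform, can be illustrated with a minimal sketch. This is not the authors' analysis code: the record layout, the function name `accuracy_by_rank`, and the placeholder taxon names (`FamA`, `GenA sp1`, and so on) are assumptions made for illustration, and the toy records are invented rather than taken from the study.

```python
from collections import defaultdict

RANKS = ("family", "genus", "species")
PLATFORMS = ("BOLD", "GenBank")


def accuracy_by_rank(records):
    """Return {(platform, rank): fraction of matching identifications}.

    A record counts at a given rank only if the taxonomist named the
    specimen at that rank; a missing platform suggestion is a miss.
    """
    hits = defaultdict(int)
    totals = defaultdict(int)
    for rec in records:
        for rank in RANKS:
            expected = rec["taxonomist"].get(rank)
            if expected is None:  # not identified to this rank by the taxonomist
                continue
            for platform in PLATFORMS:
                totals[(platform, rank)] += 1
                if rec[platform].get(rank) == expected:
                    hits[(platform, rank)] += 1
    return {key: hits[key] / totals[key] for key in totals}


# Invented toy records with placeholder names (hypothetical, not study data).
records = [
    {
        "taxonomist": {"family": "FamA", "genus": "GenA", "species": "GenA sp1"},
        "BOLD": {"family": "FamA", "genus": "GenA", "species": "GenA sp1"},
        "GenBank": {"family": "FamA", "genus": "GenB", "species": "GenB sp9"},
    },
    {
        # The taxonomist identified this specimen to genus only.
        "taxonomist": {"family": "FamB", "genus": "GenC"},
        "BOLD": {"family": "FamB", "genus": "GenC", "species": "GenC sp2"},
        "GenBank": {"family": "FamB", "genus": "GenC", "species": "GenC sp2"},
    },
]

acc = accuracy_by_rank(records)
for (platform, rank), value in sorted(acc.items()):
    print(f"{platform:8s} {rank:8s} {value:.2f}")
```

Scoring each rank separately is what allows a platform to be right at family level while wrong at species level, which is the pattern the abstract reports.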

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Coleoptera* / genetics
  • Colombia
  • DNA / genetics
  • DNA Barcoding, Taxonomic / methods
  • Databases, Nucleic Acid*
  • Insecta
  • Phylogeny
  • Reproducibility of Results

Substances

  • DNA

Grants and funding

NBB; No. 848-2019 Minciencias postdoctoral fellowship Fondo Nacional de Financiamiento para la Ciencia, la Tecnología y la Innovación "Francisco José de Caldas"; Funders: Ministerio de Ciencia Tecnología e Innovación (Minciencias) https://minciencias.gov.co/ and Instituto de Investigación de Recursos Biológicos Alexander von Humboldt http://www.humboldt.org.co/; NO - Minciencias had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript. Authors were affiliated with Humboldt Institute and had a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

MG; Colombia BIO Agreement # FP 44842-109-2016 (IAvH No. 16-062); Colciencias https://minciencias.gov.co/node/1434 and Instituto de Investigación de Recursos Biológicos Alexander von Humboldt http://www.humboldt.org.co/; NO - Colciencias had no role in study design, and analysis, decision to publish, or preparation of the manuscript. Authors were affiliated with Humboldt Institute and had a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.

SantanderBio inter-agency Agreement #2243 (IAvH No. 17-199) of the Santander government, sponsored with funds from the Sistema General de Regalías de Colombia, fund administration: Departamento Nacional de Planeación (BPIN 2017000100046), fund executor: Gobernación de Santander https://santander.gov.co/, funds operated by Instituto de Investigación de Recursos Biológicos Alexander von Humboldt and Industrial University of Santander. NO - The Santander government had no role in study design, and analysis, decision to publish, or preparation of the manuscript. Authors were affiliated with Humboldt Institute and had a role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.