Dimensionality reduction is a common tool for visualization and inference of population structure from genotypes, but popular methods either return too many dimensions for easy plotting (PCA) or fail to preserve global geometry (t-SNE and UMAP). Here we explore the utility of variational autoencoders (VAEs), generative machine learning models in which a pair of neural networks first compresses and then recreates the input data, for visualizing population genetic variation. VAEs incorporate nonlinear relationships, allow users to define the dimensionality of the latent space, and in our tests preserve global geometry better than t-SNE and UMAP. Our implementation, which we call popvae, is available as a command-line Python program at github.com/kr-colab/popvae. The approach yields latent embeddings that capture subtle aspects of population structure in humans and Anopheles mosquitoes, and can generate artificial genotypes characteristic of a given sample or population.
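As a concrete illustration of the compress-then-recreate architecture described above, the following sketch defines a minimal VAE for a genotype matrix (individuals by sites, with genotypes scaled to [0, 1]). This is not the popvae implementation; the choice of library (PyTorch), the class name GenotypeVAE, the layer sizes, and the input encoding are assumptions made here for brevity.

```python
# Minimal illustrative VAE for genotype data (not the popvae implementation).
# Input: a matrix of shape (individuals, n_snps) with entries scaled to [0, 1].
import torch
import torch.nn as nn
import torch.nn.functional as F

class GenotypeVAE(nn.Module):
    def __init__(self, n_snps, latent_dim=2, hidden=128):
        super().__init__()
        # Encoder: compresses genotypes into the mean and log-variance
        # of a low-dimensional latent distribution.
        self.enc = nn.Sequential(nn.Linear(n_snps, hidden), nn.ELU())
        self.mu = nn.Linear(hidden, latent_dim)
        self.logvar = nn.Linear(hidden, latent_dim)
        # Decoder: recreates genotypes in [0, 1] from latent coordinates.
        self.dec = nn.Sequential(
            nn.Linear(latent_dim, hidden), nn.ELU(),
            nn.Linear(hidden, n_snps), nn.Sigmoid(),
        )

    def forward(self, x):
        h = self.enc(x)
        mu, logvar = self.mu(h), self.logvar(h)
        # Reparameterization trick: sample z while keeping gradients.
        z = mu + torch.randn_like(mu) * torch.exp(0.5 * logvar)
        return self.dec(z), mu, logvar

def vae_loss(recon, x, mu, logvar):
    # Reconstruction error plus KL divergence from a standard normal prior.
    recon_err = F.binary_cross_entropy(recon, x, reduction="sum")
    kl = -0.5 * torch.sum(1 + logvar - mu.pow(2) - logvar.exp())
    return recon_err + kl
```

After training, plotting each individual's latent mean (mu) gives a user-specified low-dimensional embedding (here two dimensions) analogous to a PCA scatter, while sampling latent coordinates and passing them through the decoder generates artificial genotypes.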
Keywords: data visualization; deep learning; machine learning; neural network; pca; population genetics; population structure; variational autoencoder.
© The Author(s) 2021. Published by Oxford University Press on behalf of Genetics Society of America.