Secondary metabolites are compounds that are not essential for an organism's development but provide significant ecological and physiological benefits. These compounds have applications in medicine, biotechnology and agriculture. Their production is encoded in biosynthetic gene clusters (BGCs), groups of genes that collectively direct their biosynthesis. The advent of metagenomics has allowed researchers to study BGCs directly from environmental samples, identifying numerous previously unknown BGCs encoding unprecedented chemistry. Here, we present the BGC Atlas (https://bgc-atlas.cs.uni-tuebingen.de), a web resource that facilitates the exploration and analysis of BGC diversity in metagenomes. The BGC Atlas identifies and clusters BGCs from publicly available datasets, offering a centralized database and a web interface for metadata-aware exploration of BGCs and gene cluster families (GCFs). We analyzed over 35 000 datasets from MGnify, identifying nearly 1.8 million BGCs, which were clustered into GCFs. The analysis showed that ribosomally synthesized and post-translationally modified peptides are the most abundant compound class and that most GCFs exhibit high environmental specificity. We believe that our tool will enable researchers to easily explore and analyze BGC diversity in environmental samples, significantly enhancing our understanding of bacterial secondary metabolites and promoting the identification of ecological and evolutionary factors that shape the biosynthetic potential of microbial communities.
© The Author(s) 2024. Published by Oxford University Press on behalf of Nucleic Acids Research.