BinderSpace: A package for sequence space analyses for datasets of affinity-selected oligonucleotides and peptide-based molecules

Payam Kelich; Huanhuan Zhao; Jose R Orona; Lela Vuković

doi:10.1002/jcc.27130

J Comput Chem. 2023 Aug 15;44(22):1836-1844. doi: 10.1002/jcc.27130. Epub 2023 May 12.

Authors

Payam Kelich¹, Huanhuan Zhao², Jose R Orona³, Lela Vuković¹

Affiliations

¹ Department of Chemistry and Biochemistry, University of Texas at El Paso, El Paso, Texas, USA.
² Bioinformatics Program, University of Texas at El Paso, El Paso, Texas, USA.
³ Department of Biological Sciences, University of Texas at El Paso, El Paso, Texas, USA.

PMID: 37177839

Abstract

Discovery of target-binding molecules, such as aptamers and peptides, is usually performed using high-throughput experimental screening methods. These methods typically generate large datasets of sequences of target-binding molecules, which can be enriched in high-affinity binders. However, identifying the highest-affinity binders in these large datasets often requires additional low-throughput experiments or other approaches. Bioinformatics analyses could help researchers better understand these datasets and identify the regions of sequence space enriched in high-affinity binders. BinderSpace is an open-source Python package that performs motif analysis, sequence space visualization, clustering analyses, and sequence extraction from clusters of interest. In addition to text-based and visual output of motifs, the motif analysis can provide heat maps of previously measured, user-defined functional properties for all motif-containing molecules. Users can also run principal component analysis (PCA) and t-distributed stochastic neighbor embedding (t-SNE) on whole datasets and on motif-related subsets of the data. Functionally important sequences can also be highlighted in the resulting PCA and t-SNE maps. If points (sequences) in the two-dimensional PCA or t-SNE maps form clusters, users can run clustering analyses on their data and extract sequences from clusters of interest. We demonstrate the use of BinderSpace on a dataset of oligonucleotides binding to single-wall carbon nanotubes in the presence and absence of a bioanalyte, and on a dataset of cyclic peptidomimetics binding to bovine carbonic anhydrase. BinderSpace is freely available on GitHub: https://github.com/vukoviclab/BinderSpace.
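The abstract outlines a pipeline of motif analysis, dimensionality reduction, clustering on the two-dimensional map, and extraction of sequences from a chosen cluster. The Python sketch below shows one generic way these steps can fit together; it is not BinderSpace's API or its implementation. The helper names (`count_motifs`, `one_hot`, `pca_2d`, `kmeans`, `extract_cluster`) are hypothetical, and several choices are assumptions made for illustration: simple k-mer counting stands in for motif analysis, sequences are assumed to be equal-length DNA strings that are one-hot encoded, PCA is computed with NumPy's SVD, and clustering uses k-means with farthest-first initialization. t-SNE is omitted here; scikit-learn's `TSNE` class could be substituted for `pca_2d` to obtain a t-SNE map instead.

```python
"""Toy sequence-space workflow: k-mer motif counts, one-hot encoding,
PCA to 2D, k-means on the 2D map, and extraction of one cluster.
All names are hypothetical illustrations, not the BinderSpace API."""
from collections import Counter

import numpy as np


def count_motifs(seqs, k):
    """Count how many sequences contain each length-k substring (once per sequence)."""
    counts = Counter()
    for s in seqs:
        counts.update({s[i:i + k] for i in range(len(s) - k + 1)})
    return counts


def one_hot(seqs, alphabet="ACGT"):
    """Encode equal-length sequences as rows of a binary matrix (length * |alphabet| columns)."""
    index = {ch: j for j, ch in enumerate(alphabet)}
    width = len(alphabet)
    X = np.zeros((len(seqs), len(seqs[0]) * width))
    for i, s in enumerate(seqs):
        for pos, ch in enumerate(s):
            X[i, pos * width + index[ch]] = 1.0
    return X


def pca_2d(X):
    """Project mean-centered rows of X onto the first two principal components."""
    Xc = X - X.mean(axis=0)
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:2].T


def kmeans(P, k, n_iter=100):
    """Lloyd's k-means with deterministic farthest-first initialization."""
    # Start from the first point, then repeatedly add the point farthest
    # from all centers chosen so far.
    centers = [P[0]]
    for _ in range(1, k):
        d = np.min([((P - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(P[d.argmax()])
    centers = np.array(centers)
    for _ in range(n_iter):
        # Assign each point to its nearest center (squared Euclidean distance).
        d = ((P[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        # Move each center to the mean of its members; keep it if the cluster is empty.
        new = np.array([P[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
                        for c in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels


def extract_cluster(seqs, labels, cluster_id):
    """Return the sequences assigned to one cluster, in input order."""
    return [s for s, lab in zip(seqs, labels) if lab == cluster_id]


if __name__ == "__main__":
    seqs = ["ACGTAC", "ACGTAG", "ACGTAT", "TGCAAC", "TGCAAG", "TGCAAT"]
    print(count_motifs(seqs, 4)["ACGT"])  # 3
    coords = pca_2d(one_hot(seqs))
    labels = kmeans(coords, k=2)
    print(extract_cluster(seqs, labels, labels[0]))  # ['ACGTAC', 'ACGTAG', 'ACGTAT']
```

Farthest-first initialization keeps this toy example deterministic. Plain random initialization of Lloyd's algorithm can settle in a poor local optimum, so practical code usually relies on several restarts or k-means++ seeding.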

Keywords: affinity selection datasets; clustering analysis; dimensionality reduction; high affinity binding; sequence motif analysis; sequence space.

Publication types

Research Support, U.S. Gov't, Non-P.H.S.

MeSH terms

Algorithms
Animals
Cattle
Computational Biology
Nanotubes, Carbon*
Oligonucleotides*
Peptides
Sequence Analysis

Substances

Oligonucleotides
Nanotubes, Carbon
Peptides

Grants and funding

NSF 2106587