Alignment-free ultra-high-throughput comparison of druggable protein-ligand binding sites

J Chem Inf Model. 2010 Jan;50(1):123-35. doi: 10.1021/ci900349y.

Abstract

Inferring the biological function of a protein from its three-dimensional structure and explaining why a drug may bind to various targets are both of crucial importance to modern drug discovery. Here we present a generic 4833-integer vector describing druggable protein-ligand binding sites that can be applied to any protein and any binding cavity. The fingerprint registers counts of pharmacophoric triplets from the Cα atomic coordinates of binding-site-lining residues. Starting from a customized data set of diverse protein-ligand binding site pairs, the most appropriate metric and a similarity threshold could be defined for similar binding sites. The method (FuzCav) has been used in various scenarios: (i) screening a collection of 6000 binding sites for similarity to different queries; (ii) classifying protein families (serine endopeptidases, protein kinases) by binding site diversity; (iii) discriminating adenine-binding cavities from decoys. Fingerprint generation and comparison support ultra-high throughput (ca. 1000 measures/s), do not require prior alignment of protein binding sites, and can detect local similarity among subpockets. The method is thus particularly well suited to the functional annotation of novel genomic structures with low sequence identity to known X-ray templates.
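The idea of an alignment-free triplet-count fingerprint can be illustrated with a minimal Python sketch. It is not the FuzCav implementation: the abstract does not specify the pharmacophoric property set, the distance binning, or the similarity metric, so the labels, the `bin_distance` parameters, the sparse `Counter` representation (in place of the fixed 4833-integer vector), and the `similarity` formula below are all assumptions chosen for illustration.

```python
from collections import Counter
from itertools import combinations, permutations
import math


def bin_distance(d, width=2.0, max_bins=5):
    """Map a distance to a coarse bin index (width and bin count are assumed)."""
    return min(int(d // width), max_bins - 1)


def triplet_key(points):
    """Canonical key for three (label, (x, y, z)) points.

    The key holds the three labels and the three binned pairwise distances.
    Taking the minimum over all vertex orderings makes the key independent
    of the order in which residues are listed.
    """
    best = None
    for p, q, r in permutations(points):
        key = (
            p[0], q[0], r[0],
            bin_distance(math.dist(p[1], q[1])),
            bin_distance(math.dist(q[1], r[1])),
            bin_distance(math.dist(p[1], r[1])),
        )
        if best is None or key < best:
            best = key
    return best


def fingerprint(site):
    """Count every label/distance triplet in a site.

    `site` is a list of (label, (x, y, z)) tuples, one per residue, where the
    coordinate stands for the residue's C-alpha atom. Only internal distances
    are used, so no superposition of sites is needed.
    """
    return Counter(triplet_key(t) for t in combinations(site, 3))


def similarity(fp_a, fp_b):
    """Fraction of distinct triplet types shared, relative to the smaller
    fingerprint (an assumed metric, not necessarily the one in the paper)."""
    smaller = min(len(fp_a), len(fp_b))
    if smaller == 0:
        return 0.0
    shared = sum(1 for key in fp_a if key in fp_b)
    return shared / smaller
```

Because the fingerprint depends only on distances between residues, a rotated or translated copy of a site yields the same counts, which is what removes the need for a prior structural alignment. Normalizing by the smaller fingerprint is one simple way to let a small subpocket score highly against a larger cavity that contains it.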

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Adenosine Triphosphate / chemistry
  • Adenosine Triphosphate / metabolism
  • Algorithms
  • Binding Sites
  • Databases, Protein
  • Drug Evaluation, Preclinical / methods*
  • High-Throughput Screening Assays / methods*
  • Humans
  • Ligands
  • Models, Molecular
  • Pharmaceutical Preparations / chemistry
  • Pharmaceutical Preparations / metabolism*
  • Protein Binding
  • Protein Conformation
  • Proteins / chemistry*
  • Proteins / metabolism*
  • Serine Endopeptidases / chemistry
  • Serine Endopeptidases / metabolism
  • Time Factors

Substances

  • Ligands
  • Pharmaceutical Preparations
  • Proteins
  • Adenosine Triphosphate
  • Serine Endopeptidases