CLCAs - a family of metalloproteases of intriguing phylogenetic distribution and with cases of substituted catalytic sites

PLoS One. 2013 May 9;8(5):e62272. doi: 10.1371/journal.pone.0062272. Print 2013.

Abstract

The zinc-dependent metalloproteases with the His-Glu-x-x-His (HExxH) active site motif, zincins, are a broad group of proteins involved in many metabolic and regulatory functions, and are found in all forms of life. The human genome contains more than 100 genes encoding proteins with known zincin-like domains. A survey of all proteins containing the HExxH motif shows that approximately 52% of HExxH occurrences fall within known protein structural domains (as defined in the Pfam database). Domain families in which a majority of members possess a conserved HExxH motif include, not surprisingly, many known and putative metalloproteases. Furthermore, several HExxH-containing protein domains thus identified can be confidently predicted to be putative peptidases of the zincin fold. Thus, we predict a zincin-like fold for eight uncharacterised Pfam families. Besides the domains with the HExxH motif strictly conserved, and those with sporadic occurrences, intermediate families are identified that contain some members with a conserved HExxH motif, but also many homologues with substitutions at the conserved positions. Such substitutions can be evolutionarily conserved and non-random, yet the functional roles of these inactive zincins are not known. The CLCAs are a novel zincin-like protease family with many cases of substituted active sites. We show that this allegedly metazoan family has a number of bacterial and archaeal members. An extremely patchy phylogenetic distribution of CLCAs in prokaryotes and their conserved protein domain composition strongly suggest an evolutionary scenario of horizontal gene transfer (HGT) from multicellular eukaryotes to bacteria, providing an example of eukaryote-derived xenologues in bacterial genomes. Additionally, in a protein family identified here as closely homologous to CLCA, the CLCA_X (CLCA-like) family, a number of proteins are found in phages and plasmids, supporting the HGT scenario.
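
The motif survey described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' pipeline: the function names, the example sequence, and the domain coordinates are hypothetical, and real domain boundaries would come from an annotation source such as Pfam. The sketch only shows how HExxH occurrences might be located in a sequence and how the share falling inside annotated domains might be computed.

```python
import re

# HExxH: His, Glu, any two residues, His. The lookahead wrapper makes the
# regex engine report overlapping occurrences as well (e.g. HEAAHEAAH).
HEXXH = re.compile(r"(?=(HE[A-Z]{2}H))")
MOTIF_LEN = 5


def find_hexxh(seq):
    """Return 0-based start positions of every HExxH occurrence in seq."""
    return [m.start() for m in HEXXH.finditer(seq.upper())]


def fraction_in_domains(seq, domains):
    """Fraction of HExxH occurrences lying fully inside any domain.

    domains is a list of (start, end) pairs, 0-based and half-open.
    These coordinates are an assumed input format, not a Pfam API.
    """
    hits = find_hexxh(seq)
    if not hits:
        return 0.0
    inside = sum(
        any(start <= h and h + MOTIF_LEN <= end for start, end in domains)
        for h in hits
    )
    return inside / len(hits)


if __name__ == "__main__":
    # Hypothetical toy sequence with two motifs; one hypothetical domain.
    toy = "MKTHEAAHGGSHELLHPP"
    print(find_hexxh(toy))
    print(fraction_in_domains(toy, [(0, 10)]))
```

Applied across a whole proteome with real domain annotations, the same counting would yield a figure comparable in kind to the roughly 52% reported above, although the exact value depends on the sequence set and annotation release used.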

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Motifs / genetics
  • Amino Acid Sequence
  • Archaeal Proteins / genetics
  • Bacterial Proteins / genetics
  • Databases, Protein
  • Genome, Human / genetics*
  • Humans
  • Metalloproteases / classification
  • Metalloproteases / genetics*
  • Molecular Sequence Data
  • Multigene Family / genetics*
  • Phylogeny*
  • Sequence Homology, Amino Acid

Substances

  • Archaeal Proteins
  • Bacterial Proteins
  • Metalloproteases

Grants and funding

AL, MD, MG and KP were supported by the grant N N301 3165 33 from the Ministry of Science and Higher Education, Republic of Poland. The funders had no role in study design, data collection and analysis, decision to publish, or preparation of the manuscript.