Sequence-based prediction of type III secreted proteins

PLoS Pathog. 2009 Apr;5(4):e1000376. doi: 10.1371/journal.ppat.1000376. Epub 2009 Apr 24.

Abstract

The type III secretion system (TTSS) is a key mechanism for host cell interaction used by a variety of bacterial pathogens and symbionts of plants and animals including humans. The TTSS represents a molecular syringe with which the bacteria deliver effector proteins directly into the host cell cytosol. Despite the importance of the TTSS for bacterial pathogenesis, recognition and targeting of type III secreted proteins have so far been poorly understood. Several hypotheses are discussed, including an mRNA-based signal, a chaperone-mediated process, or an N-terminal signal peptide. In this study, we systematically analyzed the amino acid composition and secondary structure of N-termini of 100 experimentally verified effector proteins. Based on this, we developed a machine-learning approach for the prediction of TTSS effector proteins, taking into account N-terminal sequence features such as frequencies of amino acids, short peptides, or residues with certain physico-chemical properties. The resulting computational model revealed a strong type III secretion signal in the N-terminus that can be used to detect effectors with a sensitivity of approximately 71% and a selectivity of approximately 85%. This signal seems to be taxonomically universal and conserved among animal pathogens and plant symbionts, since we could successfully detect effector proteins if the respective group was excluded from training. The application of our prediction approach to 739 complete bacterial and archaeal genome sequences resulted in the identification of between 0% and 12% putative TTSS effector proteins. Comparison of effector proteins with orthologs that are not secreted by the TTSS showed no clear pattern of signal acquisition by fusion, suggesting that convergent evolutionary processes shape the type III secretion signal.
The newly developed program EffectiveT3 (http://www.chlamydiaedb.org) is the first universal in silico prediction program for the identification of novel TTSS effectors. Our findings will facilitate further studies on and improve our understanding of type III secretion and its role in pathogen-host interactions.
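
The N-terminal sequence features mentioned in the abstract (amino acid frequencies and frequencies of residues sharing physico-chemical properties) can be illustrated with a minimal sketch. This is not the EffectiveT3 implementation: the function name `n_terminal_features`, the window length of 25 residues, and the three property groups are assumptions chosen for illustration, since the abstract does not specify them.

```python
from collections import Counter

# The 20 standard amino acids, one-letter codes.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Illustrative property groups (an assumption, not the grouping used in the paper).
PROPERTY_GROUPS = {
    "hydrophobic": set("AVLIMFWC"),
    "polar": set("STNQYGPH"),
    "charged": set("DEKR"),
}


def n_terminal_features(sequence, n_term_len=25):
    """Return relative frequencies of amino acids and property groups
    in the first n_term_len residues of a protein sequence.

    n_term_len=25 is a hypothetical default; the abstract does not state
    the window length.
    """
    window = sequence.upper()[:n_term_len]
    if not window:
        raise ValueError("sequence must not be empty")

    counts = Counter(window)
    total = len(window)

    # Frequency of each single amino acid in the N-terminal window.
    features = {f"aa_{aa}": counts[aa] / total for aa in AMINO_ACIDS}

    # Frequency of residues belonging to each property group.
    for name, members in PROPERTY_GROUPS.items():
        features[f"prop_{name}"] = sum(counts[aa] for aa in members) / total

    return features


if __name__ == "__main__":
    # Made-up sequence, used only to show the call.
    example = "MSSSKKAAAA" + "L" * 30
    feats = n_terminal_features(example, n_term_len=10)
    print(feats["aa_S"], feats["prop_charged"])
```

Feature dictionaries of this kind could be turned into vectors and passed to any standard classifier trained on verified effectors versus non-effectors; the abstract does not name the classifier, so that choice is left open here. Short-peptide frequencies, also mentioned in the abstract, would follow the same pattern with k-mers instead of single residues.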

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Amino Acid Sequence
  • Artificial Intelligence
  • Bacterial Proteins / chemistry
  • Bacterial Proteins / metabolism*
  • Chlamydia
  • Computational Biology / methods*
  • Conserved Sequence
  • Databases, Protein
  • Escherichia
  • Evolution, Molecular
  • Gram-Negative Bacteria / chemistry*
  • Protein Sorting Signals / genetics*
  • Protein Structure, Secondary
  • Salmonella
  • Yersinia

Substances

  • Bacterial Proteins
  • Protein Sorting Signals