Functional assignment of the 20 S proteasome from Trypanosoma brucei using mass spectrometry and new bioinformatics approaches

J Biol Chem. 2001 Jul 27;276(30):28327-39. doi: 10.1074/jbc.M008342200. Epub 2001 Apr 17.

Abstract

As experimental technologies for characterization of proteomes emerge, bioinformatic analysis of the data becomes essential. Separation and identification technologies currently based on two-dimensional gels/mass spectrometry provide the inherent analytical power required. This strategy involves protein spot digestion and accurate mass mapping together with computational interrogation of available databases for protein functional identification. When either no exact match is found or the possible matches only partially account for the molecular weights actually observed, peptide sequencing by tandem mass spectrometry has emerged as the methodology of choice to provide the additional information required. To evaluate the capabilities of bioinformatics methods employed for identifying homologs of a protein of interest, we attempted to identify the major proteins from the 20 S proteasome of Trypanosoma brucei using sequence information determined by mass spectrometry. The results suggest that neither the traditional query engines, BLAST and FASTA, nor specialized software developed for analysis of sequence information obtained by mass spectrometry is able to identify even closely related sequences at statistically significant scores. To address this deficit, new bioinformatics approaches were developed for concomitant use of the multiple fragments of short sequence typically available from methods of tandem mass spectrometry. These approaches rely on the occurrence of congruence across searches of multiple fragments from a single protein. This method resulted in sharply better statistical significance values for correct hits in the database output relative to those achieved for independent searches using single sequence fragments.
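The idea of congruence across fragment searches can be illustrated with a minimal sketch. The abstract does not give the authors' algorithm or scoring statistic, so everything here is an assumption made for illustration: the function name `congruent_hits`, the input layout (each fragment mapped to a list of database entry identifiers with E-values), the `min_fragments` threshold, and the summed -log10(E) score, which is a crude ranking heuristic rather than a calibrated significance value.

```python
from collections import defaultdict
import math


def congruent_hits(searches, min_fragments=2):
    """Rank database entries hit by several fragments of one protein.

    searches: dict mapping a fragment label to a list of
              (entry_id, e_value) pairs from an independent search.
              This layout is hypothetical, not a BLAST/FASTA output format.
    Returns a list of (entry_id, n_fragments, combined_score), sorted so
    that entries supported by more fragments come first.
    """
    # entry_id -> {fragment: best (lowest) E-value seen for that fragment}
    best = defaultdict(dict)
    for fragment, hits in searches.items():
        for entry_id, e_value in hits:
            previous = best[entry_id].get(fragment)
            if previous is None or e_value < previous:
                best[entry_id][fragment] = e_value

    ranked = []
    for entry_id, per_fragment in best.items():
        # Keep only entries on which several fragment searches agree.
        if len(per_fragment) < min_fragments:
            continue
        # Assumed heuristic: sum of -log10(E) over the supporting fragments.
        combined = sum(-math.log10(e) for e in per_fragment.values())
        ranked.append((entry_id, len(per_fragment), combined))

    # More supporting fragments first, then higher combined score.
    ranked.sort(key=lambda row: (-row[1], -row[2]))
    return ranked
```

In this sketch an entry that appears only as a weak hit in each individual search can still rise to the top when all fragments point to it, which is the effect the abstract describes in qualitative terms.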

Publication types

  • Research Support, U.S. Gov't, P.H.S.

MeSH terms

  • Algorithms
  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Cysteine Endopeptidases / chemistry*
  • Databases, Factual
  • Electrophoresis, Gel, Two-Dimensional
  • Molecular Sequence Data
  • Multienzyme Complexes / chemistry*
  • Peptides / chemistry
  • Proteasome Endopeptidase Complex
  • Sequence Homology, Amino Acid
  • Software
  • Spectrometry, Mass, Matrix-Assisted Laser Desorption-Ionization / methods*
  • Trypanosoma brucei brucei / chemistry*
  • Trypsin / metabolism

Substances

  • Multienzyme Complexes
  • Peptides
  • Trypsin
  • Cysteine Endopeptidases
  • Proteasome Endopeptidase Complex

Associated data

  • GENBANK/AF140353
  • GENBANK/AF148124
  • GENBANK/AF148125
  • GENBANK/AF169651
  • GENBANK/AF169652
  • GENBANK/AF169653
  • GENBANK/AF198386
  • GENBANK/AF198387
  • GENBANK/AF226673
  • GENBANK/AF226674
  • GENBANK/AF290945
  • GENBANK/AJ130820
  • GENBANK/AJ131043
  • GENBANK/AJ131148