Computational analysis of protein tyrosine phosphatases: practical guide to bioinformatics and data resources

Methods. 2005 Jan;35(1):90-114. doi: 10.1016/j.ymeth.2004.07.012.

Abstract

The exponential growth of sequence data has become a challenge to database curators and end-users alike, and biologists seeking to utilize the data effectively are faced with numerous analysis methods. Here, with practical examples from our bioinformatics analysis of the protein tyrosine phosphatases (PTPs), we show how computational analysis can be exploited to fuel hypothesis-driven experimental research through the exploration of online databases. We cover the following elements: (i) similarity searches and strategies to collect a non-redundant database of tyrosine-specific PTP domains; (ii) utilization of this database to classify human, fly, and worm PTPs (based on alignments and phylogenetic analysis); (iii) three-dimensional structural analysis to identify conserved regions (structure-function) and non-conserved selectivity-determining regions (substrate specificity); and (iv) genomic analysis, including mapping of exon structure, identification of pseudogenes, and exploration of disease databases. We discuss the importance of manual curation, illustrating it with examples in which pseudogenes give rise to predicted proteins in GenBank, and note that domain servers, such as PFAM and SMART, erroneously include dual-specificity and lipid phosphatases in their collection of tyrosine-specific PTPs. To capitalize on our annotated set of 402 PTP domains (from 47 species and five phyla), we identify sequence conservation across taxonomic categories and explore structure-function relationships among tandem domain receptor-like PTPs. We define three Src homology 2 domain-containing PTP genes in stingray, zebrafish, and fugu and speculate on their evolutionary relationship with human pseudogenes. Our annotated sequences, along with a web service for phylogenetic classification of PTP domains, are available online (http://ptp.cshl.edu and http://science.novonordisk.com/ptp).
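
As an illustration of element (i), the following minimal Python sketch shows one simple way a collection of sequences might be reduced to a non-redundant set. The abstract does not state the redundancy criteria the authors applied, so the rule used here (discard exact duplicates and fragments fully contained in a longer sequence) is an assumption, and the function name `non_redundant` and the toy sequences are hypothetical.

```python
def non_redundant(records):
    """Collapse exact duplicates and fragments contained in a longer sequence.

    records: iterable of (identifier, sequence) pairs.
    Returns the retained (identifier, sequence) pairs, longest first.
    Assumption: this containment rule is illustrative only; it is not
    the curation procedure described in the paper.
    """
    kept = []
    # Visit longer sequences first so that fragments are always compared
    # against full-length entries that have already been retained.
    for ident, seq in sorted(records, key=lambda r: len(r[1]), reverse=True):
        seq = seq.upper()  # treat upper- and lower-case residues alike
        if any(seq in kept_seq for _, kept_seq in kept):
            continue  # identical to, or a fragment of, a retained sequence
        kept.append((ident, seq))
    return kept


# Toy sequences made up for illustration; they are not real PTP domains.
example = [
    ("seq_a", "MKVHCSAGVGRTG"),
    ("seq_b", "mkvhcsagvgrtg"),  # same residues as seq_a, different case
    ("seq_c", "HCSAGVGR"),       # fragment of seq_a
    ("seq_d", "MKLHCSDGIGRSG"),
]
unique = non_redundant(example)
```

In practice, near-identical entries (for example, sequences differing by a few residues) would need an identity threshold from a pairwise alignment rather than exact containment, and, as the abstract stresses, manual curation remains necessary.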

Publication types

  • Research Support, U.S. Gov't, P.H.S.
  • Review

MeSH terms

  • Amino Acid Sequence
  • Animals
  • Computational Biology / methods*
  • Databases, Protein
  • Genome, Human
  • Humans
  • Protein Structure, Tertiary
  • Protein Tyrosine Phosphatases / genetics*
  • Sequence Analysis, Protein
  • Sequence Homology

Substances

  • Protein Tyrosine Phosphatases