Accurate genome annotation, the foundation of life science research in the genome era, is hampered by the limited number of known gene models, nonstandard start codons, and the limited homology to annotated genes in other organisms. LysargiNase mirrors the cleavage specificity of trypsin, providing the opportunity to identify peptides beyond those produced by tryptic digestion. In this study, we applied an in-house developed acetylated LysargiNase (Ac-LysargiNase), which has higher activity and stability, to non-pathogenic Mycolicibacterium smegmatis MC2 155 as a supplement to the trypsin widely used in proteomic studies. We identified 27,582 peptides from 3,844 annotated proteins and 332 novel genome search-specific peptides (GSSPs). Among these GSSPs, 88 peptides were annotated in another M. smegmatis genome database, and 41 were verified as novel peptides using predicted theoretical spectra and their corresponding 15N-labeling spectra. Further analysis revealed that 17 verified GSSPs corrected the N-termini of 13 annotated genes. The other 24 verified GSSPs helped identify 17 novel open reading frames (ORFs) missed in previously annotated M. smegmatis genomes. Among these novel ORFs, four encode relatively small proteins of fewer than 100 amino acid residues, and three were precisely identified with C-terminal peptides. Ac-LysargiNase thus aids genome reannotation by identifying new genes and events in proteogenomic studies.
SIGNIFICANCE: Correct genomic annotation is vital in the field of life sciences. Nonstandard start codons seriously hamper confirmation of the translation initiation sites (TISs) of an open reading frame (ORF), and unknown structural genes are easily missed in automated gene prediction. Although proteogenomics presents new avenues for validating gene expression and refining gene structure based on conventional tryptic peptides, determining the TISs and potential encoding genes remains complicated. Thus, validation of TISs and encoding ORFs is crucial and urgent. We therefore recommend Ac-LysargiNase, a mirror enzyme of trypsin that enables the identification of additional novel peptides for N-terminal correction and ORF identification.
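The mirrored cleavage specificity mentioned above can be illustrated with a minimal in silico digestion sketch. This is not the authors' pipeline: the function name `digest`, the toy sequence, and the simplified rule (cleavage at every Lys/Arg, no missed cleavages, no proline exception for trypsin) are assumptions made purely for illustration.

```python
def digest(sequence: str, enzyme: str) -> list[str]:
    """Split a protein sequence into peptides, assuming full cleavage.

    Simplified rule (an assumption for illustration):
      - "trypsin" cleaves on the C-terminal side of K/R
      - "lysarginase" cleaves on the N-terminal side of K/R
    """
    if enzyme not in ("trypsin", "lysarginase"):
        raise ValueError(f"unknown enzyme: {enzyme}")
    if not sequence:
        return []

    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        if residue in "KR":
            # Trypsin cuts after the basic residue, the mirror enzyme before it.
            cut = i + 1 if enzyme == "trypsin" else i
            # Skip cuts that would produce an empty peptide at either end.
            if start < cut < len(sequence):
                peptides.append(sequence[start:cut])
                start = cut
    peptides.append(sequence[start:])
    return peptides


toy = "MAKTLRGGKE"  # hypothetical sequence, not taken from the study
print(digest(toy, "trypsin"))      # ['MAK', 'TLR', 'GGK', 'E']
print(digest(toy, "lysarginase"))  # ['MA', 'KTL', 'RGG', 'KE']
```

In this toy case the two enzymes yield different peptide sets from the same protein, with the basic residue at the C-terminus of tryptic peptides and at the N-terminus of the mirror-enzyme peptides. Note also that the protein's C-terminal peptide keeps a Lys under mirror cleavage (`KE`) but not under tryptic cleavage (`E`).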
Keywords: Ac-LysargiNase; Genome annotation; Mycolicibacterium smegmatis MC(2) 155; Novel peptides; Proteogenomic.
Copyright © 2022 Elsevier B.V. All rights reserved.