The non-pathogenic bacterium Mycobacterium smegmatis mc²155 is widely used as a model organism in mycobacterial research, yet its transcriptional landscape has not been characterized in detail. Here we report its transcriptome, expression profiles and transcriptional structures, determined by growth-phase-dependent RNA sequencing (RNA-seq) and related experiments. We found: (1) 2,139 transcriptional start sites (TSSs) genome-wide, eight of which were randomly selected and verified by 5'-RACE; (2) 2,233 independent monocistronic or polycistronic mRNAs, organized in operon/sub-operon structures that fall into five groups; (3) 47.50% (1016/2139) of genes transcribed as leaderless mRNAs, with the TSSs of 41.3% (883/2139) of mRNAs coinciding with the first base of the annotated start codon; the initial amino acids of the MSMEG_4921 and MSMEG_6422 proteins were identified by Edman degradation, indicating that leaderless transcripts are a distinctive and widespread feature of M. smegmatis mc²155; (4) 150 genes with potentially incorrect structural annotation, 124 of which have been corrected; and (5) eight highly active promoters, whose activities were further determined by β-galactosidase assays. These data integrate the transcriptional landscape with the genome information of the model organism mc²155 and lay a solid foundation for further work in Mycobacterium.
Keywords: Mycobacterium smegmatis; gene structural re-annotation; highly active promoter; leaderless mRNA; operon; sub-operon; transcriptional start site; transcriptome.