Cross-species inference of long non-coding RNAs greatly expands the ruminant transcriptome

Genet Sel Evol. 2018 Apr 24;50(1):20. doi: 10.1186/s12711-018-0391-0.

Abstract

Background: mRNA-like long non-coding RNAs (lncRNAs) are a significant component of mammalian transcriptomes, although most are expressed only at low levels, with high tissue-specificity and/or at specific developmental stages. Thus, in many cases lncRNA detection by RNA-sequencing (RNA-seq) is compromised by stochastic sampling. To account for this and create a catalogue of ruminant lncRNAs, we compared de novo assembled lncRNAs derived from large RNA-seq datasets in transcriptional atlas projects for sheep and goats with lncRNAs previously assembled in cattle and humans. We then combined the novel lncRNAs with the sheep transcriptional atlas to identify co-regulated sets of protein-coding and non-coding loci.

Results: Few lncRNAs could be reproducibly assembled from a single dataset, even with deep sequencing of the same tissues from multiple animals. Furthermore, there was little sequence overlap between lncRNAs that were assembled from pooled RNA-seq data. We combined positional conservation (synteny) with cross-species mapping of candidate lncRNAs to identify a consensus set of ruminant lncRNAs and then used the RNA-seq data to demonstrate detectable and reproducible expression in each species. In sheep, 20 to 30% of lncRNAs were located close to protein-coding genes with which they are strongly co-expressed, which is consistent with the evolutionary origin of some ncRNAs in enhancer sequences. Nevertheless, most of the lncRNAs are not co-expressed with neighbouring protein-coding genes.

Conclusions: Alongside substantially expanding the ruminant lncRNA repertoire, the outcomes of our analysis demonstrate that stochastic sampling can be partly overcome by combining RNA-seq datasets from related species. This has practical implications for the future discovery of lncRNAs in other species.

Publication types

  • Comparative Study
  • Research Support, Non-U.S. Gov't

MeSH terms

  • Animals
  • Cattle
  • Chromosome Mapping / veterinary
  • Databases, Genetic
  • Gene Expression Profiling / veterinary*
  • Gene Expression Regulation
  • Gene Regulatory Networks
  • Goats / genetics
  • High-Throughput Nucleotide Sequencing / veterinary*
  • Humans
  • Molecular Sequence Annotation
  • Organ Specificity
  • RNA, Long Noncoding / genetics*
  • Sequence Analysis, RNA / veterinary*
  • Sheep / genetics*
  • Synteny

Substances

  • RNA, Long Noncoding