Genome-wide expression for diagnosis of pulmonary tuberculosis: a multicohort analysis

Lancet Respir Med. 2016 Mar;4(3):213-24. doi: 10.1016/S2213-2600(16)00048-5. Epub 2016 Feb 20.

Abstract

Background: Active pulmonary tuberculosis is difficult to diagnose, and treatment response is difficult to monitor effectively. A WHO consensus statement has called for new non-sputum diagnostics. The aim of this study was to use an integrated multicohort analysis of samples from publicly available datasets to derive a diagnostic gene set in the peripheral blood of patients with active tuberculosis.

Methods: We searched two public gene expression microarray repositories and retained datasets that profiled whole blood from clinical cohorts with active pulmonary tuberculosis. Using our validated multicohort analysis framework, we compared gene expression in patients with active tuberculosis against patients with either latent tuberculosis or other diseases. Three datasets were used as discovery datasets, and meta-analytical methods were used to assess gene effects across these cohorts. We then validated the diagnostic capacity of the resulting three-gene set in the remaining 11 datasets.
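The abstract states only that "meta-analytical methods" were used to assess gene effects across the discovery cohorts. As a hedged illustration, the sketch below assumes a common choice for this kind of multicohort expression analysis: a per-dataset Hedges' g effect size (active tuberculosis minus comparator) pooled across datasets with a DerSimonian-Laird random-effects model. The function names and toy values are hypothetical, and this is a sketch under those assumptions rather than the authors' exact pipeline.

```python
import math
from statistics import mean, stdev


def hedges_g(cases, controls):
    """Bias-corrected standardized mean difference (cases minus controls) and its variance.

    Assumption: effect size = Hedges' g computed on log2 expression of one gene in one dataset.
    """
    n1, n2 = len(cases), len(controls)
    s1, s2 = stdev(cases), stdev(controls)
    pooled_sd = math.sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
    d = (mean(cases) - mean(controls)) / pooled_sd
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample bias correction factor
    g = j * d
    var_g = j**2 * ((n1 + n2) / (n1 * n2) + d**2 / (2 * (n1 + n2)))
    return g, var_g


def dersimonian_laird(effects, variances):
    """Random-effects pooled effect, its standard error, and between-dataset variance tau^2."""
    w = [1 / v for v in variances]
    fixed = sum(wi * gi for wi, gi in zip(w, effects)) / sum(w)
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(wi * (gi - fixed) ** 2 for wi, gi in zip(w, effects))
    c = sum(w) - sum(wi**2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (len(effects) - 1)) / c) if c > 0 else 0.0
    # Re-weight each dataset by its within-dataset variance plus tau^2.
    w_star = [1 / (v + tau2) for v in variances]
    pooled = sum(wi * gi for wi, gi in zip(w_star, effects)) / sum(w_star)
    se = math.sqrt(1 / sum(w_star))
    return pooled, se, tau2


if __name__ == "__main__":
    # Hypothetical log2 expression of one gene in three toy discovery datasets
    # (cases = active tuberculosis, controls = comparator group); not study data.
    toy_datasets = [
        ([7.9, 8.4, 8.1, 8.6], [7.0, 7.3, 6.8, 7.1]),
        ([9.0, 8.7, 9.3], [8.1, 8.4, 7.9, 8.2]),
        ([6.5, 6.9, 7.2, 6.8], [6.1, 6.0, 6.4]),
    ]
    stats = [hedges_g(cases, controls) for cases, controls in toy_datasets]
    pooled, se, tau2 = dersimonian_laird([g for g, _ in stats], [v for _, v in stats])
    print(f"pooled g = {pooled:.2f}, z = {pooled / se:.2f}, tau^2 = {tau2:.3f}")
```

A random-effects model is used in this sketch because expression platforms and populations differ between cohorts; tau^2 absorbs that between-dataset spread instead of treating every cohort as measuring one identical effect.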

Findings: A total of 14 datasets containing 2572 samples from 10 countries, from both adult and paediatric patients, were included in the analysis. Of these, three datasets (N=1023) were used to discover a set of three genes (GBP5, DUSP3, and KLF2) that are highly diagnostic for active tuberculosis. We validated the diagnostic power of the three-gene set to separate active tuberculosis from healthy controls (global area under the receiver operating characteristic (ROC) curve (AUC) 0·90 [95% CI 0·85-0·95]), latent tuberculosis (0·88 [0·84-0·92]), and other diseases (0·84 [0·80-0·95]) in eight independent datasets composed of both children and adults from ten countries. Expression of the three-gene set was not confounded by HIV infection status, bacterial drug resistance, or BCG vaccination. Furthermore, in four additional cohorts, we showed that a tuberculosis score derived from the three-gene set declined during treatment of patients with active tuberculosis.
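The abstract reports AUCs for a tuberculosis score built from GBP5, DUSP3, and KLF2 but does not give the score's formula. The sketch below assumes a simple construction: the mean log2 expression of GBP5 and DUSP3 minus the log2 expression of KLF2, treating the first two as higher and KLF2 as lower in active disease. It then computes a single-cohort AUC through the Mann-Whitney formulation. The gene directions, function names, and sample values are assumptions for illustration; the pooled "global" AUCs and confidence intervals in the abstract come from a multicohort analysis that this sketch does not reproduce.

```python
from statistics import mean

# Assumed directions of change in active tuberculosis (hypothetical for this sketch).
UP_GENES = ("GBP5", "DUSP3")
DOWN_GENES = ("KLF2",)


def tb_score(sample):
    """Mean log2 expression of assumed up-regulated genes minus that of assumed down-regulated genes."""
    return mean(sample[g] for g in UP_GENES) - mean(sample[g] for g in DOWN_GENES)


def auc(positive_scores, negative_scores):
    """ROC AUC as the probability that a positive sample outscores a negative one (ties count 1/2)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))


if __name__ == "__main__":
    # Hypothetical log2 expression values, not data from the study.
    active_tb = [{"GBP5": 9.1, "DUSP3": 7.8, "KLF2": 6.2}, {"GBP5": 8.7, "DUSP3": 7.5, "KLF2": 6.6}]
    latent_tb = [{"GBP5": 7.2, "DUSP3": 6.9, "KLF2": 7.4}, {"GBP5": 7.6, "DUSP3": 7.0, "KLF2": 7.1}]
    print(auc([tb_score(s) for s in active_tb], [tb_score(s) for s in latent_tb]))
```

Applied to serial samples from one patient, the same `tb_score` function would allow a check of whether the score falls over the course of treatment, which is the pattern the abstract reports in four cohorts.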

Interpretation: Overall, our integrated multicohort analysis yielded a three-gene set in whole blood that is robustly diagnostic for active tuberculosis, that was validated in multiple independent cohorts, and that has potential clinical application for diagnosis and monitoring treatment response. Prospective laboratory validation will be required before it can be used in a clinical setting.

Funding: National Institute of Allergy and Infectious Diseases, National Library of Medicine, the Stanford Child Health Research Institute, the Society for University Surgeons, and the Bill and Melinda Gates Foundation.

Publication types

  • Meta-Analysis
  • Research Support, N.I.H., Extramural
  • Research Support, Non-U.S. Gov't
  • Validation Study

MeSH terms

  • Area Under Curve
  • Cohort Studies
  • Databases, Genetic
  • Datasets as Topic
  • Dual Specificity Phosphatase 3 / blood
  • Dual Specificity Phosphatase 3 / genetics
  • GTP-Binding Proteins / blood
  • GTP-Binding Proteins / genetics
  • Gene Expression*
  • Genome-Wide Association Study
  • Humans
  • Kruppel-Like Transcription Factors / blood
  • Kruppel-Like Transcription Factors / genetics
  • Latent Tuberculosis / blood
  • Latent Tuberculosis / diagnosis
  • Latent Tuberculosis / genetics
  • Oligonucleotide Array Sequence Analysis
  • ROC Curve
  • Tuberculosis, Pulmonary / blood
  • Tuberculosis, Pulmonary / diagnosis*
  • Tuberculosis, Pulmonary / genetics*

Substances

  • GBP5 protein, human
  • KLF2 protein, human
  • Kruppel-Like Transcription Factors
  • DUSP3 protein, human
  • Dual Specificity Phosphatase 3
  • GTP-Binding Proteins