Objective: Gene expression profiles are becoming increasingly important for diagnostic procedures, allowing clinical predictions such as treatment response and outcome. However, establishing specific and robust gene signatures from microarray data sets requires the analysis of large numbers of patients and the application of complex biostatistical algorithms. Because of these constraints, diagnostic centers with limited access to patients or bioinformatic resources are excluded from implementing these new technologies, especially in the case of rare diseases.
Method: To overcome these limitations, we analyzed the rare t(4;11) leukemia disease entity as a proof of principle. Gene expression data of each t(4;11) leukemia patient were normalized by pairwise subtraction against normal bone marrow (n = 3) to identify significantly deregulated gene sets for each patient.
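The pairwise-subtraction step can be illustrated with a minimal sketch. It assumes log2-scale expression values and a hypothetical cutoff of 1.0 log2 units that a gene must exceed against every normal bone marrow sample; the abstract does not state the actual threshold or significance criterion, so the function name `deregulated_genes` and its parameters are illustrative only.

```python
def deregulated_genes(patient, normals, cutoff=1.0):
    """Return (up, down) gene sets for a single patient.

    patient: dict mapping gene name -> log2 expression value
    normals: list of dicts (one per normal bone marrow sample)
    cutoff:  hypothetical log2 difference threshold (an assumption,
             not the criterion used in the study)
    """
    up, down = set(), set()
    for gene, value in patient.items():
        # Pairwise subtraction: patient minus each normal sample.
        diffs = [value - normal[gene] for normal in normals if gene in normal]
        if len(diffs) != len(normals):
            continue  # skip genes not measured in every normal sample
        if all(d >= cutoff for d in diffs):
            up.add(gene)    # consistently higher than all normals
        elif all(d <= -cutoff for d in diffs):
            down.add(gene)  # consistently lower than all normals
    return up, down
```

Requiring agreement across all pairwise comparisons is one simple way to make a per-patient call with only three reference samples; a statistical test could be substituted where more references are available.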
Result: A 'core signature' of 186 commonly deregulated genes present in each investigated t(4;11) leukemia patient was defined. Linking the obtained gene sets to four biological discriminators (HOXA gene expression, age at diagnosis, fusion gene transcripts and chromosomal breakpoints) divided patients into two distinct subgroups. The first comprised infant patients with low HOXA gene expression and MLL breakpoints within introns 11/12; the second comprised non-infant patients with high HOXA expression and MLL breakpoints within introns 9/10.
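A 'core signature' of genes deregulated in every patient corresponds to a set intersection over the per-patient gene sets. The sketch below shows that idea under the assumption that each patient is represented by a plain set of gene names; the function name `core_signature` and the input layout are hypothetical, and the direction of deregulation is ignored for brevity.

```python
def core_signature(per_patient_sets):
    """Return the genes present in every patient's deregulated gene set.

    per_patient_sets: dict mapping patient ID -> iterable of gene names
                      (hypothetical layout chosen for illustration)
    """
    sets = [set(genes) for genes in per_patient_sets.values()]
    if not sets:
        return set()  # no patients, no signature
    # A gene belongs to the core only if it appears in all patient sets.
    return set.intersection(*sets)
```

Because a gene must appear in every patient's set, the signature can only shrink as patients are added, which keeps it conservative in small cohorts.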
Conclusion: A leukemia entity hitherto regarded as homogeneous was further subdivided based on distinct genetic properties. This approach provided a simplified way to obtain robust and disease-specific gene signatures even in small cohorts.