Summary

The molecular aetiology of polycythaemia vera (PV) remains unknown, and the differential diagnosis between PV and secondary erythrocytosis (SE) can be challenging. Gene expression profiling can identify candidates involved in the pathophysiology of PV and generate a molecular signature to aid in diagnosis. We thus performed cDNA microarray analysis on 40 PV and 12 SE patients. Two independent data sets were obtained. First, using a two-step training/validation design, a set of 64 genes (class predictors) was determined that correctly discriminated PV from SE patients. Separately, 253 genes were found to be upregulated and 391 downregulated more than 1.5-fold in PV compared with healthy controls (P < 0.01). Of the genes overexpressed in PV, 27 contained Sp1 sites; we therefore propose that altered activity of Sp1-like transcription factors may contribute to the molecular aetiology of PV. One Sp1 target, the transcription factor NF-E2 [nuclear factor (erythroid-derived 2)], is overexpressed 2- to 40-fold in PV patients. In PV bone marrow, NF-E2 is overexpressed in megakaryocytes and in erythroid and granulocytic precursors. It has been shown that overexpression of NF-E2 leads to the development of erythropoietin-independent erythroid colonies and that ectopic NF-E2 expression can reprogram monocytic cells towards erythroid and megakaryocytic differentiation. Transcription factor concentration may thus control lineage commitment. We therefore propose that elevated concentrations of NF-E2 in PV patients lead to an overproduction of erythroid cells and, in some patients, of megakaryocytic cells and platelets. In this model, the level of NF-E2 overexpression determines both the severity of erythrocytosis and the concurrent presence or absence of thrombocytosis.