Motivation: Recent flow and mass cytometers generate datasets with 20 to 40 dimensions and a million single cells. Many tools use these datasets to facilitate the discovery of new cell populations associated with diseases or physiology. Isolating such populations requires new gating strategies, but gating strategies become exponentially more difficult to optimize as dimensionality increases. To facilitate this step, we developed Hypergate, an algorithm that, given a cell population of interest, identifies a gating strategy optimized for high yield and purity.
Results: Hypergate achieves higher yield and purity than human experts, Support Vector Machines and Random Forests on public datasets. We use it to revisit some established gating strategies for the identification of innate lymphoid cells, and identify concise and efficient strategies that gate these cells with fewer parameters but higher yield and purity than the current standards. For phenotypic description, Hypergate's outputs are consistent with the field's knowledge and sparser than those from a competing method.
Availability and implementation: Hypergate is implemented in R and available on CRAN. The source code is published at http://github.com/ebecht/hypergate under an Open Source Initiative-compliant licence.
Supplementary information: Supplementary data are available at Bioinformatics online.
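
To make the two criteria named in the Motivation concrete, the following minimal sketch computes the yield and purity of a rectangular gate on toy data. It assumes the usual cytometry reading of these terms (yield as the fraction of target cells that fall inside the gate, purity as the fraction of gated cells that belong to the target population); the abstract does not spell these definitions out. The function names, marker names and expression values are hypothetical and for illustration only. This is not Hypergate's implementation or its R interface.

```python
from typing import Dict, List, Tuple

Cell = Dict[str, float]                 # marker name -> expression value
Gate = Dict[str, Tuple[float, float]]   # marker name -> (lower, upper) bounds


def in_gate(cell: Cell, gate: Gate) -> bool:
    """Return True if the cell lies within every interval of the gate."""
    return all(lo <= cell[marker] <= hi for marker, (lo, hi) in gate.items())


def yield_and_purity(cells: List[Cell], is_target: List[bool], gate: Gate) -> Tuple[float, float]:
    """Assumed definitions: yield = gated targets / all targets,
    purity = gated targets / all gated cells."""
    gated = [in_gate(c, gate) for c in cells]
    true_pos = sum(1 for g, t in zip(gated, is_target) if g and t)
    n_target = sum(is_target)
    n_gated = sum(gated)
    y = true_pos / n_target if n_target else 0.0
    p = true_pos / n_gated if n_gated else 0.0
    return y, p


# Toy data with made-up values; marker names are illustrative only.
cells = [
    {"CD3": 0.1, "CD127": 2.5},
    {"CD3": 0.2, "CD127": 3.0},
    {"CD3": 0.3, "CD127": 1.0},   # target cell missed by the gate
    {"CD3": 2.0, "CD127": 2.8},
    {"CD3": 0.4, "CD127": 2.2},   # non-target cell captured by the gate
]
is_target = [True, True, True, False, False]

# A hypothetical two-parameter gate: CD3 low and CD127 high.
gate = {"CD3": (float("-inf"), 1.0), "CD127": (2.0, float("inf"))}

gate_yield, gate_purity = yield_and_purity(cells, is_target, gate)
```

Under these assumptions, tightening a bound tends to raise purity at the cost of yield, while dropping a parameter tends to do the opposite. This trade-off is what a gating strategy optimized for both criteria has to balance.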