Effective biologics require high specificity and limited off-target binding, but current affinity-selection-based discovery methods do not guarantee these properties. Molecular counterselection against off-targets can identify nonspecific sequences, but it is experimentally costly and can fail to eliminate a large fraction of them. Here, we introduce computational counterselection, a framework that uses machine learning models to remove nonspecific sequences from pools of candidate biologics. We demonstrate the method using sequencing data from single-target affinity selection of antibodies, bypassing combinatorial experiments. Using cross-target selection and individual binding assays, we measure how well each method retains on-target, specific antibodies and identifies and eliminates off-target, nonspecific antibodies, and we show that computational counterselection outperforms molecular counterselection. Further, we show that generally polyspecific antibody sequences can be identified using a general model trained on affinity data from unrelated targets with potential affinity for a broad range of sequences.
Keywords: affinity selection; antibody discovery; biologics; counterselection; developability; machine learning; nonspecificity; polyspecificity; screening.
© 2022 The Authors.
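As an illustration only, the filtering idea behind computational counterselection can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the authors' implementation: the function name, the two thresholds, the example sequences, and the dictionary-based scorers (stand-ins for machine learning models trained on single-target affinity-selection data) are all hypothetical.

```python
from typing import Callable, Iterable, List

# A scorer maps a sequence to a predicted binding score for one target.
Scorer = Callable[[str], float]


def computational_counterselection(
    candidates: Iterable[str],
    on_target: Scorer,
    off_targets: List[Scorer],
    on_min: float,
    off_max: float,
) -> List[str]:
    """Keep sequences predicted to bind the target but none of the off-targets.

    The thresholds on_min and off_max are assumed, illustrative cutoffs.
    """
    kept = []
    for seq in candidates:
        # Discard sequences with weak predicted on-target binding.
        if on_target(seq) < on_min:
            continue
        # Discard sequences predicted to bind any off-target.
        if any(score(seq) > off_max for score in off_targets):
            continue
        kept.append(seq)
    return kept


# Toy scores standing in for model predictions (hypothetical values).
on_scores = {"CARDYW": 0.9, "CARGGW": 0.8, "CTRSSW": 0.2}
off_scores = {"CARDYW": 0.1, "CARGGW": 0.7, "CTRSSW": 0.1}

kept = computational_counterselection(
    candidates=on_scores.keys(),
    on_target=lambda s: on_scores[s],
    off_targets=[lambda s: off_scores[s]],
    on_min=0.5,
    off_max=0.5,
)
```

In this toy example, one sequence is dropped for weak predicted on-target binding and one for predicted off-target binding. The point of the sketch is that each scorer needs only single-target data, so no combinatorial on-target/off-target experiment is required to apply the filter.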