Gene and protein sequence features augment HLA class I ligand predictions

Cell Rep. 2024 Jun 25;43(6):114325. doi: 10.1016/j.celrep.2024.114325. Epub 2024 Jun 11.

Abstract

The sensitivity of malignant tissues to T cell-based immunotherapies depends on the presence of targetable human leukocyte antigen (HLA) class I ligands. Peptide-intrinsic factors, such as HLA class I affinity and proteasomal processing, have been established as determinants of HLA ligand presentation. However, the role of gene and protein sequence features as determinants of epitope presentation has not been systematically evaluated. We perform HLA ligandome mass spectrometry to evaluate the contribution of 7,135 gene and protein sequence features to HLA sampling. This analysis reveals that a number of predicted modifiers of mRNA and protein abundance and turnover, including predicted mRNA methylation and protein ubiquitination sites, are informative of the presence of HLA ligands. Importantly, integration of such "hard-coded" sequence features into a machine learning approach augments HLA ligand predictions to a degree comparable to that achieved with experimental measures of gene expression. Our study highlights the value of gene and protein features for HLA ligand predictions.
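
The comparison described in the abstract (a predictor built on peptide-intrinsic factors alone versus one augmented with gene and protein sequence features) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic, a plain logistic regression stands in for the gradient-boosted model suggested by the XGBoost keyword, and the names `make_toy_data`, `fit_logistic`, `predict`, `auc`, and `compare_models` are hypothetical rather than taken from the study.

```python
import math
import random


def make_toy_data(n, seed):
    """Return synthetic (affinity, gene_feature, label) rows; not real ligandome data."""
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        affinity = rng.gauss(0.0, 1.0)      # stand-in for a peptide-intrinsic score
        gene_feature = rng.gauss(0.0, 1.0)  # stand-in for a gene/protein sequence feature
        # Assumption: both features contribute independently to presentation.
        p = 1.0 / (1.0 + math.exp(-(1.5 * affinity + 1.5 * gene_feature)))
        rows.append((affinity, gene_feature, 1 if rng.random() < p else 0))
    return rows


def predict(w, b, x):
    """Predicted probability of presentation under a logistic model."""
    z = b + sum(wj * xj for wj, xj in zip(w, x))
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(X, y, epochs=100, lr=0.5):
    """Fit logistic regression by batch gradient descent; returns (weights, bias)."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict(w, b, xi) - yi
            for j, xij in enumerate(xi):
                grad_w[j] += err * xij
            grad_b += err
        w = [wj - lr * gj / len(X) for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / len(X)
    return w, b


def auc(scores, labels):
    """Area under the ROC curve via pairwise comparison (ties count as 0.5)."""
    pos = [s for s, l in zip(scores, labels) if l == 1]
    neg = [s for s, l in zip(scores, labels) if l == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def compare_models(seed=0):
    """Return held-out AUC for (peptide-only model, augmented model)."""
    train, test = make_toy_data(1000, seed), make_toy_data(500, seed + 1)
    results = []
    for cols in ([0], [0, 1]):  # affinity only, then affinity + gene feature
        X_train = [[r[c] for c in cols] for r in train]
        X_test = [[r[c] for c in cols] for r in test]
        w, b = fit_logistic(X_train, [r[2] for r in train])
        scores = [predict(w, b, x) for x in X_test]
        results.append(auc(scores, [r[2] for r in test]))
    return tuple(results)


if __name__ == "__main__":
    print(compare_models())
```

Calling `compare_models()` returns the two held-out AUC values. On this toy data the augmented model is expected to score higher, because the simulated label depends on both features by construction; the sketch says nothing about the effect sizes reported in the study.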

Keywords: CP: Immunology; HLA class I; HLA ligand predictions; HLA ligandome; XGBoost; antigen presentation; epitope prediction; epitopes; machine learning.

MeSH terms

  • Amino Acid Sequence
  • Histocompatibility Antigens Class I* / genetics
  • Histocompatibility Antigens Class I* / metabolism
  • Humans
  • Ligands
  • Machine Learning
  • Peptides / chemistry
  • Peptides / metabolism
  • RNA, Messenger / genetics
  • RNA, Messenger / metabolism

Substances

  • Ligands
  • Histocompatibility Antigens Class I
  • RNA, Messenger
  • Peptides