The major histocompatibility complex (MHC) is the collective term for the group of genes that encode the major histocompatibility antigens. MHC molecules bind peptide chains derived from pathogens and display them on the cell surface, where T cells can recognize them and carry out a series of immune functions. MHC molecules are critical in transplantation, autoimmunity, infection, and tumor immunotherapy. Recognizing MHC more accurately by combining machine learning algorithms with bioinformatics analysis is therefore an important task. As an alternative to traditional biological methods, this paper proposes a new MHC recognition method and uses the resulting classifier both to identify MHC and to classify it as MHC I or MHC II. The classifier combines three feature representation methods, SVMProt 188D, bag-of-ngrams (BonG), and information theory (IT), into a mixed feature representation, and it is built on an extreme learning machine (ELM) that uses a linear kernel (lin-kernel) as the activation function. Its accuracy was verified with 10-fold cross-validation and with an independent test set, for both tasks: identifying MHC and distinguishing MHC I from MHC II. Under 10-fold cross-validation, the proposed algorithm obtained 91.66% accuracy when identifying MHC and 94.442% accuracy when identifying MHC categories. Furthermore, an online identification Web site named ELM-MHC was constructed at the following URL: http://server.malab.cn/ELM-MHC/ .
Keywords: MHC I; MHC II; extreme learning machine; identification; machine learning; major histocompatibility complex.
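The evaluation setup named in the abstract (an ELM with a linear kernel, scored by 10-fold cross-validation) can be illustrated with a minimal sketch. It assumes the common kernel-ELM formulation, in which the output weights solve a regularized linear system on the kernel matrix; the abstract does not confirm that this is the exact variant used. The function names, the regularization constant `C`, and the synthetic two-class data are hypothetical placeholders and do not reproduce the 188D, BonG, or IT features or the reported accuracies.

```python
import numpy as np


def train_kernel_elm(X, y, n_classes, C=1.0):
    """Fit a kernel ELM with a linear kernel (assumed formulation)."""
    # Encode labels as +1 / -1 target rows, one column per class.
    T = -np.ones((len(y), n_classes))
    T[np.arange(len(y)), y] = 1.0
    K = X @ X.T  # linear kernel matrix
    # Output weights: solve (K + I / C) beta = T.
    return np.linalg.solve(K + np.eye(len(y)) / C, T)


def predict_kernel_elm(X_train, beta, X_test):
    """Predict the class with the largest output score."""
    return np.argmax((X_test @ X_train.T) @ beta, axis=1)


def cross_validate(X, y, n_classes, k=10, C=1.0, seed=0):
    """Return the mean accuracy over k shuffled folds."""
    rng = np.random.default_rng(seed)
    folds = np.array_split(rng.permutation(len(y)), k)
    accuracies = []
    for i in range(k):
        test = folds[i]
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        beta = train_kernel_elm(X[train], y[train], n_classes, C)
        pred = predict_kernel_elm(X[train], beta, X[test])
        accuracies.append(np.mean(pred == y[test]))
    return float(np.mean(accuracies))


def make_toy_data(n=200, d=20, seed=0):
    """Synthetic stand-in for real feature vectors (hypothetical data)."""
    rng = np.random.default_rng(seed)
    y = rng.integers(0, 2, size=n)
    # Class means at -1 and +1, so no bias term is needed.
    X = rng.normal(size=(n, d)) + (2.0 * y[:, None] - 1.0)
    return X, y


if __name__ == "__main__":
    X, y = make_toy_data()
    print(cross_validate(X, y, n_classes=2))
```

With real data, `X` would hold the concatenated feature vectors of each protein sequence and `y` the class labels (MHC versus non-MHC, or MHC I versus MHC II); the same cross-validation routine would apply to either task.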