Representations of lipid nanoparticles using large language models for transfection efficiency prediction

Bioinformatics. 2024 Jul 1;40(7):btae342. doi: 10.1093/bioinformatics/btae342.

Abstract

Motivation: Lipid nanoparticles (LNPs) are the most widely used vehicles for mRNA vaccine delivery. The structure of the lipids composing the LNPs can have a major impact on the effectiveness of the mRNA payload. Several properties should be optimized to improve delivery and expression, including biodegradability, synthetic accessibility, and transfection efficiency.

Results: To optimize LNPs, we developed and tested models that enable the virtual screening of LNPs with high transfection efficiency. Our best method takes the Simplified Molecular-Input Line-Entry System (SMILES) string of each lipid as input to a large language model. The embeddings generated by the large language model are then used by a downstream gradient-boosting classifier. As we show, our method can more accurately predict lipid properties, which could lead to higher efficiency and reduced experimental time and costs.
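
The two-stage design described in the Results (SMILES string, then embedding, then gradient-boosting classifier) can be illustrated with a minimal, dependency-free sketch. It is not the authors' implementation, and everything named in it is an assumption made for illustration: `embed_smiles` is a hypothetical placeholder that counts a few SMILES characters where the real pipeline would use language-model embeddings, the booster is a small hand-written version using decision stumps and logistic loss rather than a library classifier, and the labels come from a made-up rule (presence of nitrogen), not from transfection experiments.

```python
import math

# ASSUMPTION: placeholder for the language-model encoder. It only counts a few
# SMILES characters; in the described pipeline this vector would be an LLM embedding.
ALPHABET = "CNOPS=()#"

def embed_smiles(smiles):
    """Map a SMILES string to a fixed-length feature vector (toy stand-in)."""
    return [float(smiles.count(ch)) for ch in ALPHABET]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_stump(X, residuals):
    """Least-squares decision stump: (feature, threshold, left_value, right_value)."""
    best, best_sse = None, float("inf")
    for j in range(len(X[0])):
        values = sorted(set(row[j] for row in X))
        for lo, hi in zip(values, values[1:]):
            thr = (lo + hi) / 2.0
            left = [r for row, r in zip(X, residuals) if row[j] <= thr]
            right = [r for row, r in zip(X, residuals) if row[j] > thr]
            lmean = sum(left) / len(left)
            rmean = sum(right) / len(right)
            sse = (sum((r - lmean) ** 2 for r in left)
                   + sum((r - rmean) ** 2 for r in right))
            if sse < best_sse:
                best, best_sse = (j, thr, lmean, rmean), sse
    return best

def stump_predict(stump, row):
    j, thr, left_value, right_value = stump
    return left_value if row[j] <= thr else right_value

def fit_boosted_classifier(X, y, n_rounds=20, learning_rate=0.5):
    """Gradient boosting with logistic loss: each stump fits the residuals y - p."""
    scores = [0.0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - sigmoid(s) for yi, s in zip(y, scores)]
        stump = fit_stump(X, residuals)
        if stump is None:  # no feature varies, nothing left to split on
            break
        j, thr, lmean, rmean = stump
        # Store the shrunken leaf values so prediction is a plain sum.
        stump = (j, thr, learning_rate * lmean, learning_rate * rmean)
        stumps.append(stump)
        scores = [s + stump_predict(stump, row) for s, row in zip(scores, X)]
    return stumps

def predict_proba(stumps, row):
    """Probability of the positive class for one embedded molecule."""
    return sigmoid(sum(stump_predict(st, row) for st in stumps))

# Toy molecules; labels follow a MADE-UP rule (contains nitrogen), not measured data.
train_smiles = [
    "CCCCCCCCCCCC(=O)O",
    "CCCCCCCCCCCCCCCC(=O)OC",
    "CCCCCCCCO",
    "CCN(CC)CCO",
    "CN(C)CCCCCCCC",
    "CCCCCCCCCCCCN(C)C",
]
y_train = [1 if "N" in s else 0 for s in train_smiles]
X_train = [embed_smiles(s) for s in train_smiles]

model = fit_boosted_classifier(X_train, y_train)
for s in ["CCCCN(CCCC)CCCC", "CCCCCCCCCC(=O)O"]:
    print(s, round(predict_proba(model, embed_smiles(s)), 3))
```

Keeping the embedding step and the classifier separate, as in this sketch, means the encoder can be swapped (for example, for a pretrained language model) without changing the downstream classifier code.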

Availability and implementation: Code and data links available at: https://github.com/Sanofi-Public/LipoBART.

Publication types

  • Research Support, Non-U.S. Gov't

MeSH terms

  • Lipids* / chemistry
  • Liposomes
  • Nanoparticles* / chemistry
  • RNA, Messenger / metabolism
  • Transfection* / methods

Substances

  • Lipids
  • RNA, Messenger
  • Lipid Nanoparticles
  • Liposomes

Grants and funding