Developing highly efficient non-viral gene delivery reagents is still difficult for many hard-to-transfect cell types and, to date, has mostly been conducted via brute-force screening routines. High-throughput in silico methods of evaluating biomaterials can enable accelerated optimization and development of devices or therapeutics by exploring large chemical design spaces quickly and at low cost. This work reports the application of state-of-the-art machine learning algorithms to a dataset of synthetic biodegradable polymers, poly(beta-amino ester)s (PBAEs), which have shown promise for therapeutic gene delivery in vitro and in vivo. The dataset includes polymer properties as inputs, and polymeric nanoparticle transfection performance and nanoparticle toxicity in a range of cells as outputs. These data were used to train several machine learning algorithms and to evaluate their ability to predict transfection and to help explain structure-function relationships. By developing an encoding scheme for vectorizing the structure of a PBAE polymer in a machine-readable format, we demonstrate that a random forest model can satisfactorily predict DNA transfection in vitro based on the chemical structure of the constituent PBAE polymer in a cell line-dependent manner. Guided by the model, we synthesized PBAE polymers and used them to form polymeric gene delivery nanoparticles that were predicted in silico to be successful. We validated the computational predictions in vitro in two cell lines, RAW 264.7 macrophages and Hep3B liver cancer cells, and found that the Spearman rank correlation between predicted and experimental transfection was 0.57 and 0.66, respectively. Thus, a computational approach that encodes chemical descriptors of polymers demonstrated that in silico screening of polymeric nanomedicine compositions has utility in predicting the outcomes of de novo biological experiments.
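For readers unfamiliar with the validation metric, the following is a minimal sketch of how a Spearman rank correlation between predicted and measured transfection could be computed. It is not the authors' pipeline: the function names (`average_ranks`, `pearson`, `spearman`) and the numeric values are hypothetical and chosen only for illustration, and the sketch assumes ties are handled by average ranks.

```python
def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values tied with position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman(predicted, measured):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    if len(predicted) != len(measured) or len(predicted) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(predicted), average_ranks(measured))


# Made-up illustrative values, NOT data from the study.
predicted = [0.10, 0.35, 0.20, 0.80, 0.55]
measured = [0.05, 0.15, 0.40, 0.70, 0.90]
print(round(spearman(predicted, measured), 2))  # prints 0.8
```

Because the metric depends only on rank order, it rewards a model that orders candidate polymers correctly even when the predicted transfection values are not on the same scale as the measured ones.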
STATEMENT OF SIGNIFICANCE: Developing highly efficient non-viral gene delivery reagents is difficult for many hard-to-transfect cell types and, to date, has mostly been explored via brute-force screening routines. High-throughput in silico methods of evaluating biomaterials can enable accelerated optimization and development for therapeutic or biomanufacturing purposes by exploring large chemical design spaces quickly and at low cost. This work reports the application of state-of-the-art machine learning algorithms to a large compiled dataset of PBAE DNA delivery nanoparticles across many cell types to develop predictive models for transfection and nanoparticle cytotoxicity. We develop a novel computational pipeline to encode PBAE nanoparticles with chemical descriptors and demonstrate its utility in a de novo experimental context.
Keywords: Computational; Gene delivery; Library; Machine learning; Nanoparticle; Poly(beta-amino ester); Polymer.
Copyright © 2022. Published by Elsevier Ltd.