Moving beyond MARCO

PLoS One. 2023 Mar 24;18(3):e0283124. doi: 10.1371/journal.pone.0283124. eCollection 2023.

Abstract

The use of imaging systems in protein crystallisation means that experimental setups no longer require manual inspection to determine the outcome of the trials. However, this raises the problem of how best to find the images that contain useful information about the crystallisation experiments. The adoption of a deep learning approach in 2018 enabled a four-class machine classification system for the images to exceed human accuracy for the first time. Underpinning this was the creation of a labelled training set drawn from a consortium of several different laboratories. The MARCO classification model does not achieve the same accuracy on local data as it does on images from the original test set; this can be somewhat mitigated by retraining the ML model with local images included. We have characterised the image data used in the original MARCO model and performed extensive experiments to identify the training settings most likely to enhance the local performance of an ML classification model based on the MARCO dataset.

MeSH terms

  • Crystallization*
  • Machine Learning
  • Proteins* / chemistry

Substances

  • Proteins

Grants and funding

The author(s) received no specific funding for this work.