Data-driven pipeline modeling for predicting unknown protein adulteration in dairy products

Food Chem. 2024 Dec 31:471:142736. doi: 10.1016/j.foodchem.2024.142736. Online ahead of print.

Abstract

To predict unknown protein adulterants in food before they come into use and to curb food fraud at its origin, data-driven models were developed using three machine learning (ML) algorithms. Among these, the random forest (RF)-based models performed best, with accuracies of 96.2 %, 95.1 %, and 88.0 % in identifying odorless, tasteless, and colorless adulterants, respectively. The best-performing models were then applied to external prediction, which yielded 51 potential adulterants. From these, two cost-effective candidates were selected for adulteration tests. No significant sensory difference was found between adulterated and unadulterated milk powder, yet the protein content of the adulterated milk powder increased. This study offers a proactive strategy for combating food fraud.
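
The screening logic described above (one classifier per sensory property, external prediction on unseen candidates, then selection of low-cost hits) might be sketched as follows. This is only an illustration under stated assumptions: the abstract does not say how the three property predictions were combined, so requiring all three to be positive is an assumption, and every name, descriptor, threshold, and price below is hypothetical. The simple threshold rules stand in for trained models such as the RF classifiers.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Sequence


@dataclass(frozen=True)
class Candidate:
    name: str                   # hypothetical compound label
    features: Dict[str, float]  # hypothetical molecular descriptors
    cost_per_kg: float          # hypothetical price, arbitrary units


# A property classifier maps descriptors to True (has the property) or False.
Classifier = Callable[[Dict[str, float]], bool]


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Fraction of predictions that match the reference labels."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label sequences must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def screen(candidates: Sequence[Candidate],
           classifiers: Dict[str, Classifier]) -> List[Candidate]:
    """Keep candidates that every property classifier flags as positive.

    Assumption: a potential adulterant must be predicted odorless,
    tasteless, AND colorless.
    """
    return [c for c in candidates
            if all(clf(c.features) for clf in classifiers.values())]


def cheapest(shortlist: Sequence[Candidate], k: int) -> List[Candidate]:
    """Return the k lowest-cost candidates from the shortlist."""
    return sorted(shortlist, key=lambda c: c.cost_per_kg)[:k]


# Stand-in classifiers: toy threshold rules, NOT the study's trained models.
classifiers: Dict[str, Classifier] = {
    "odorless": lambda f: f["volatility"] < 0.5,
    "tasteless": lambda f: f["bitterness"] < 0.5,
    "colorless": lambda f: f["absorbance"] < 0.5,
}

# Toy external set; all values are invented for illustration.
candidates = [
    Candidate("compound_A", {"volatility": 0.10, "bitterness": 0.20, "absorbance": 0.05}, 4.0),
    Candidate("compound_B", {"volatility": 0.80, "bitterness": 0.10, "absorbance": 0.10}, 1.0),
    Candidate("compound_C", {"volatility": 0.20, "bitterness": 0.10, "absorbance": 0.10}, 2.5),
    Candidate("compound_D", {"volatility": 0.05, "bitterness": 0.30, "absorbance": 0.90}, 0.5),
    Candidate("compound_E", {"volatility": 0.30, "bitterness": 0.30, "absorbance": 0.20}, 9.0),
]

shortlist = screen(candidates, classifiers)
selected = cheapest(shortlist, k=2)
print([c.name for c in selected])
```

In this toy set, compound_B and compound_D are excluded by the odor and color rules respectively, and the two cheapest remaining candidates are kept, mirroring the step in which two cost-effective candidates were chosen from the predicted adulterants.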

Keywords: Machine learning; Potential adulterants; Prediction; Protein fraud.