Our inability to predict the behavior of biological systems severely hampers progress in bioengineering and biomedical applications. We cannot predict the effect of genotype changes on phenotype, nor extrapolate large-scale behavior from small-scale experiments. Machine learning techniques have recently reached a new level of maturity and can provide the needed predictive power without a detailed mechanistic understanding. However, they require large amounts of training data. Data of the required quantity and quality can only be produced through a combination of synthetic biology and automation, which together can generate a large diversity of biological systems with high reproducibility. A sustained investment at the intersection of synthetic biology, machine learning, and automation will drive predictive biology forward and produce improved machine learning algorithms.