Synthesis planning of new pharmaceutical compounds is a well-known bottleneck in modern drug design. Template-free methods, such as transformers, have recently been proposed as an alternative to template-based methods for single-step retrosynthetic predictions. Here, we trained and evaluated a transformer model, called the Chemformer, for retrosynthesis predictions within drug discovery. The proprietary data set used for training comprised ∼18 M reactions from literature, patents, and electronic lab notebooks. Chemformer was evaluated for the purpose of both single-step and multistep retrosynthesis. We found that the single-step performance of Chemformer was especially good on reaction classes common in drug discovery, with most reaction classes showing a top-10 round-trip accuracy above 0.97. Moreover, Chemformer reached a higher round-trip accuracy compared to that of a template-based model. By analyzing multistep retrosynthesis experiments, we observed that Chemformer found synthetic routes, leading to commercial starting materials for 95% of the target compounds, an increase of more than 20% compared to the template-based model on a proprietary compound data set. In addition to this, we discovered that Chemformer suggested novel disconnections corresponding to reaction templates, which are not included in the template-based model. These findings were further supported by a publicly available ChEMBL compound data set. The conclusions drawn from this work allow for the design of a synthesis planning tool where template-based and template-free models work in harmony to optimize retrosynthetic recommendations.