Modified DNA bases in mammalian genomes, such as 5-methylcytosine (5mC) and its oxidized forms, are implicated in important epigenetic regulatory processes. In humans and mice, the successive enzymatic conversion of 5mC to its oxidized forms is carried out by the ten-eleven translocation (TET) proteins. We previously reported the structure of a TET-like 5mC oxygenase (NgTET1) from Naegleria gruberi, a single-celled protist evolutionarily distant from vertebrates. Here we show that NgTET1 is a 5-methylpyrimidine oxygenase that acts on both 5mC (major activity) and thymidine (T) (minor activity) in all DNA forms tested, and we provide unprecedented evidence for the formation of 5-formyluridine (5fU) and 5-carboxyuridine (5caU) in vitro. Mutagenesis studies reveal that a delicate balance governs whether 5mC or T is the preferred substrate. Our results further suggest that NgTET1 prefers 5mCpG and TpG dinucleotide sites in DNA. Intriguingly, NgTET1 displays higher T-oxidation activity in vitro than mammalian TET1, supporting a closer evolutionary relationship between NgTET1 and the trypanosome base J-binding proteins, which oxidize thymidine in the first step of base J synthesis. Finally, we demonstrate that NgTET1 can be readily used as a tool in 5mC sequencing technologies such as single-molecule, real-time (SMRT) sequencing to map 5mC in bacterial genomes at base resolution.
Keywords: 5-methylcytosine; NgTET1; SMRT sequencing; TET proteins; bacterial methylome.