The specific recognition of miRNAs by Argonaute (AGO) proteins, the effector proteins of the RNA-induced silencing complex, constitutes the final step of miRNA biogenesis and is crucial for miRNA-target interaction. The genome of Arabidopsis thaliana (Ath) encodes 10 different AGO proteins, and the sorting decision, i.e. which miRNA associates with which AGO protein, was reported to depend exclusively on the identity of the nucleotide at the 5'-terminal position of the mature miRNA. However, with only four different bases possible, a sorting signal based on the 5'-position alone would either not suffice to address all 10 AGOs individually or would imply redundant AGO action. Alternatively, other, as yet unidentified sorting signals may exist. We analyzed a dataset comprising 117 Ath-miRNAs with a clear sorting preference for either AGO1, AGO2, or AGO5, as identified in co-immunoprecipitation experiments combined with sequencing. Mutual information analysis did not identify any single position other than the 5'-nucleotide as informative for sorting at sufficient statistical significance. Nonetheless, classification with Random Forests performed significantly better than random, suggesting that additional positions and combinations thereof also carry information with regard to AGO sorting. Positions 2, 6, 9, and 13 appear to be of particular importance. Furthermore, uracil bases at defined positions appear to be important for sorting, in particular to AGO2 and AGO5. No predictive value was associated with miRNA length or with the base-pairing pattern in the miRNA:miRNA* duplex. From inspecting available AGO gene expression data in Arabidopsis, we conclude that temporal and spatial expression profiles may also contribute to the fine-tuning of miRNA sorting and function.
Keywords: Arabidopsis thaliana; Argonaute proteins; RNA-protein interaction; machine learning; miRNA; mutual information; random forests; sorting.
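
As an illustration of the per-position mutual information analysis mentioned in the abstract, the following minimal sketch computes the mutual information (in bits) between the base at each sequence position and the AGO class label. It is not the authors' implementation: the function names `mutual_information` and `positional_mi` are hypothetical, the toy sequences and labels are made up for illustration, and no significance testing is included.

```python
from collections import Counter
from math import log2


def mutual_information(xs, ys):
    """Mutual information I(X;Y) in bits between two equal-length label sequences."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    n = len(xs)
    if n == 0:
        raise ValueError("need at least one observation")
    count_x = Counter(xs)            # marginal counts of X
    count_y = Counter(ys)            # marginal counts of Y
    count_xy = Counter(zip(xs, ys))  # joint counts of (X, Y)
    mi = 0.0
    for (x, y), c in count_xy.items():
        # p(x,y) * log2( p(x,y) / (p(x) * p(y)) ), written in terms of counts
        mi += (c / n) * log2(c * n / (count_x[x] * count_y[y]))
    return mi


def positional_mi(seqs, labels, n_positions):
    """MI between the base at each of the first n_positions and the class label.

    Assumes every sequence has at least n_positions bases.
    """
    return [
        mutual_information([s[i] for s in seqs], labels)
        for i in range(n_positions)
    ]


# Made-up toy sequences and labels, for illustration only (not data from the study).
toy_seqs = ["UGACA", "UCGGA", "AGCUA", "ACAUG", "CGGAU", "CUUAC"]
toy_labels = ["AGO1", "AGO1", "AGO2", "AGO2", "AGO5", "AGO5"]
toy_mi = positional_mi(toy_seqs, toy_labels, 5)
```

In this toy example the first position determines the label completely, so its mutual information equals the entropy of the label distribution. On real data, each positional value would additionally need to be compared against a null distribution (for example from label permutations, an assumption here) before being called informative.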