We studied intraobserver reproducibility in recognizing the presence or absence of 57 histologic features or patterns in a random subset of 822 tumors from the Childhood Brain Tumor Consortium database. The study protocol was designed to maximize observer consistency. We found that only six histologic features had high (≥0.75) reliability estimates, whereas a large number had intermediate estimates of 0.50-0.74. Supratentorial or infratentorial tumor location sometimes altered reliability. Reliability estimates were unacceptable for certain histologic features often used as diagnostic criteria, descriptors of tumor characteristics, or markers of anaplasia. We hypothesize that low reliability reflects, in part, the need for more specific operational definitions; features with inherently subjective boundaries (e.g., granular bodies) may also contribute to low reliability. We also show that the kappa statistic, a commonly used measure of reliability, is inappropriate for very common or very uncommon histologic features (i.e., features at the extremes of prevalence in the study cases), and we offer a simple empiric method for determining when an alternative measure, the Jaccard statistic, is appropriate.
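To illustrate the prevalence problem named above, the following minimal sketch (not the authors' code; the 2x2 counts are hypothetical) computes Cohen's kappa and the Jaccard statistic from an intraobserver agreement table. It shows how kappa can collapse for a very common feature even when raw agreement is high, while the Jaccard statistic, which ignores joint absences, remains informative.

```python
def kappa(a, b, c, d):
    """Cohen's kappa from a 2x2 table of two readings by one observer:
    a = present/present, b = present/absent,
    c = absent/present,  d = absent/absent."""
    n = a + b + c + d
    po = (a + d) / n                                      # observed agreement
    pe = ((a + b) * (a + c) + (c + d) * (b + d)) / n**2   # chance-expected agreement
    return (po - pe) / (1 - pe)

def jaccard(a, b, c, d):
    """Jaccard statistic: agreement on presence only; joint absences (d)
    do not enter the calculation."""
    return a / (a + b + c)

# Feature near 50% prevalence: the two measures broadly agree.
print(kappa(45, 5, 5, 45), jaccard(45, 5, 5, 45))   # ~0.80, ~0.82

# Very common feature (called present in ~94% of readings): raw agreement
# is 92%, yet kappa drops to ~0.29 because chance-expected agreement is
# high, while Jaccard (~0.92) still reflects the consistent positive calls.
print(kappa(90, 4, 4, 2), jaccard(90, 4, 4, 2))
```

The contrast between the two printed pairs is the behavior the abstract describes: at extreme prevalence, the chance-correction term dominates kappa, so a prevalence-insensitive alternative such as the Jaccard statistic can be the more appropriate summary.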