The predictive abilities of two-group classification models (CMs) are commonly expressed in terms of their Cooper statistics. These statistics are often reported without any indication of their uncertainty, making it impossible to judge whether the predictions of one CM are significantly better than those of a different CM, or whether the predictive performance of a CM exceeds predefined performance criteria in a statistically significant way. Bootstrap resampling routines are reported that provide a means of expressing the uncertainty associated with Cooper statistics. The usefulness of the bootstrapping routines is illustrated by constructing 95% confidence intervals for the Cooper statistics of four alternative skin-corrosivity tests (the rat skin transcutaneous electrical resistance assay, EPISKIN, Skin(2) and CORROSITEX), and of four two-step sequences in which each in vitro test is used in combination with a physicochemical test for skin corrosion based on pH measurements.
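As an illustration of the general idea, the following minimal sketch resamples chemicals with replacement and derives 95% confidence intervals for three Cooper statistics. It is not the routine reported here: the percentile method is an assumption (the text does not state which bootstrap interval is used), the function names `cooper_statistics` and `bootstrap_ci` are hypothetical, and the 40 classifications are invented purely for demonstration and are not data for any of the tests named above.

```python
import random


def cooper_statistics(observed, predicted):
    """Sensitivity, specificity and concordance for binary labels (1 = corrosive)."""
    pairs = list(zip(observed, predicted))
    tp = sum(1 for o, p in pairs if o == 1 and p == 1)
    tn = sum(1 for o, p in pairs if o == 0 and p == 0)
    fp = sum(1 for o, p in pairs if o == 0 and p == 1)
    fn = sum(1 for o, p in pairs if o == 1 and p == 0)
    n_pos, n_neg = tp + fn, tn + fp
    return {
        # A resample may contain no positives (or no negatives); mark as NaN.
        "sensitivity": tp / n_pos if n_pos else float("nan"),
        "specificity": tn / n_neg if n_neg else float("nan"),
        "concordance": (tp + tn) / len(pairs),
    }


def bootstrap_ci(observed, predicted, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap intervals (assumed method) for each statistic."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    n = len(observed)
    samples = {"sensitivity": [], "specificity": [], "concordance": []}
    for _ in range(n_resamples):
        # Draw n chemicals with replacement, keeping each observed/predicted pair together.
        idx = [rng.randrange(n) for _ in range(n)]
        stats = cooper_statistics([observed[i] for i in idx],
                                  [predicted[i] for i in idx])
        for name, value in stats.items():
            if value == value:  # skip NaN from degenerate resamples
                samples[name].append(value)
    intervals = {}
    for name, values in samples.items():
        values.sort()
        lower = values[int((alpha / 2) * (len(values) - 1))]
        upper = values[int((1 - alpha / 2) * (len(values) - 1))]
        intervals[name] = (lower, upper)
    return intervals


# Hypothetical data: 20 corrosive and 20 non-corrosive chemicals.
observed = [1] * 20 + [0] * 20
predicted = [1] * 18 + [0] * 2 + [0] * 17 + [1] * 3

point = cooper_statistics(observed, predicted)
intervals = bootstrap_ci(observed, predicted)
for name in point:
    low, high = intervals[name]
    print(f"{name}: {point[name]:.3f} (95% CI {low:.3f} to {high:.3f})")
```

Under these assumptions, two CMs could be compared by checking whether their intervals overlap, and a CM could be checked against a predefined criterion by asking whether the lower bound of the relevant interval exceeds it.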