This study aimed to evaluate the reliability of repeated graded workload treadmill testing (G-test; 2 mph; 0% grade, increasing by 2% every 2 min) and to compare the reliability of a constant workload treadmill protocol (C-test; 2 mph; 12% grade) with that of the graded protocol in patients with intermittent claudication studied longitudinally. Data came from a clinical trial of an orally stable prostacyclin derivative that included 330 patients with intermittent claudication. The trial employed three active treatment groups and one placebo group. Because there were no significant inter-group differences at baseline or after treatment, data from all groups were pooled for the evaluation of treadmill test reliability. Treadmill data were obtained from a 2-week run-in phase, during which three G-tests were performed, and from the beginning and the end of a 3-month double-blind phase, at each of which a G-test and a C-test were performed in random order. Treadmill test reliability was described by the test process-related and between-subject variances and by variance-derived parameters, namely the reliability coefficient (RC) and the relative precision (RP). A higher RC and a lower RP indicate that test variability is predominantly due to between-subject variance rather than to test process-related variance. Estimates of variance were obtained for both the maximal or absolute claudication distance (ACD) and the initial claudication distance (ICD) with each treadmill test. Reliability estimates are reported for the total study sample and for patients with baseline claudication distances ≤300 feet and >300 feet (approximately ≤100 m and >100 m), as measured with the C-test. The cut-off value was chosen empirically to separate severely diseased claudicants from mildly to moderately diseased claudicants, because theoretical considerations suggest that reliability measures may differ between these subgroups.
With repeated testing during the run-in phase, the G-test had an RC of 0.952 and an RP of 21.9% for the measurement of ACD. When both test protocols were compared in the entire study population for the measurement of ACD, the G-test had an RC of 0.902 and an RP of 31.3%, while the C-test had an RC of 0.876 and an RP of 35.2%. For ICD, the G-test had an RC of 0.809 and an RP of 43.7%, while the C-test had an RC of 0.737 and an RP of 51.3%. For both protocols, RC and RP were numerically better for ACD than for ICD. In patients with a baseline ACD ≤300 feet, the RC for ACD on the G-test was 0.827 and the RP was 41.4%. In contrast, on the C-test the RC decreased to 0.250 and the RP increased to 86.6%. These changes in RC and RP were due to a marked decrease in the between-subject variance, demonstrating the inability of the C-test to discriminate appropriately between different claudication distances in populations with highly limited baseline claudication distances.
During a run-in phase, the G-test has excellent test characteristics. During the longitudinal phase of a trial, the reliability of the G-test and the C-test is comparable in the entire study population. However, in patients with low claudication distances, the G-test should be given preference over the C-test.
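The abstract does not state how RC and RP are computed from the variance components. The following is a minimal sketch under two assumptions: RC is taken as the between-subject variance divided by the total variance (the usual intraclass-type definition), and RP is taken as the square root of (1 - RC), expressed as a percentage. The second assumption is inferred only from the fact that the reported RC and RP pairs satisfy it to within about 0.2 percentage points; it may not be the authors' exact definition. All function and variable names are illustrative.

```python
import math

def reliability_coefficient(var_between, var_process):
    """Assumed definition: share of total variance due to between-subject variance."""
    return var_between / (var_between + var_process)

def relative_precision(rc):
    """Assumed definition: sqrt(1 - RC) as a percentage, i.e. the
    process-related standard deviation relative to the total standard deviation."""
    return 100.0 * math.sqrt(1.0 - rc)

# (RC, RP in %) pairs exactly as reported in the abstract.
REPORTED = {
    "G-test, ACD, run-in": (0.952, 21.9),
    "G-test, ACD, all patients": (0.902, 31.3),
    "C-test, ACD, all patients": (0.876, 35.2),
    "G-test, ICD, all patients": (0.809, 43.7),
    "C-test, ICD, all patients": (0.737, 51.3),
    "G-test, ACD, <=300 feet": (0.827, 41.4),
    "C-test, ACD, <=300 feet": (0.250, 86.6),
}

def max_discrepancy(pairs):
    """Largest absolute gap (percentage points) between reported RP and sqrt(1 - RC)."""
    return max(abs(relative_precision(rc) - rp) for rc, rp in pairs.values())

if __name__ == "__main__":
    for label, (rc, rp) in REPORTED.items():
        print(f"{label}: reported RP {rp:.1f}%, assumed formula {relative_precision(rc):.1f}%")
```

Under these assumptions, holding the process-related variance fixed at a hypothetical value of 1 and lowering the between-subject variance from 3 to 1/3 moves RC from 0.75 to 0.25 and RP from 50% to about 86.6%. This is the same pattern the abstract describes for the C-test in patients with short baseline claudication distances: reliability falls because between-subject variance shrinks, not because the test process becomes noisier.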