Introduction: Reliance on administrative data sources and a cohort restricted in age (Medicare, 65 y and above) may limit conclusions drawn from public reporting of 30-day mortality rates for 3 diagnoses [acute myocardial infarction (AMI), congestive heart failure (CHF), and pneumonia (PNA)] by the Centers for Medicare & Medicaid Services (CMS).
Methods: We categorized patients with diagnostic codes for AMI, CHF, and PNA admitted to 138 Veterans Affairs hospitals (2006-2009) into 2 groups (65 y and above, or all ages [ALL]), then applied 3 different models predicting 30-day mortality [CMS administrative (ADM), ADM plus laboratory data (PLUS), and clinical (CLIN)] to each age/diagnosis group. The C statistic (CSTAT) and Hosmer-Lemeshow goodness of fit (HL-GOF) measured discrimination and calibration, respectively. The Pearson correlation coefficient (r) compared hospitals' risk-standardized mortality rates (RSMRs) calculated with the different models. Hospitals were rated as significantly different (SD) when bootstrapped confidence intervals omitted the national RSMR.
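For readers unfamiliar with these steps, the sketch below illustrates how the evaluation metrics named above (CSTAT, HL-GOF, and a bootstrapped RSMR confidence interval) could be computed. It is a minimal illustration, not the study's code: the function names, the hypothetical arrays y (observed 30-day deaths, 0/1) and p (model-predicted probabilities), and the simplified RSMR formula (observed/expected deaths x national rate) are assumptions for exposition; CMS's actual RSMR uses hierarchical modeling.

```python
# Minimal sketch of the evaluation steps; all names and data are hypothetical.
import numpy as np
from scipy import stats
from sklearn.metrics import roc_auc_score

def c_statistic(y, p):
    """Discrimination: C statistic = area under the ROC curve."""
    return roc_auc_score(y, p)

def hosmer_lemeshow(y, p, groups=10):
    """Calibration: Hosmer-Lemeshow goodness-of-fit statistic and p-value.

    Patients are split into `groups` bins by predicted risk; observed and
    expected deaths are compared within each bin (y, p are numpy arrays).
    """
    order = np.argsort(p)                      # sort patients by predicted risk
    stat = 0.0
    for b in np.array_split(order, groups):
        obs, exp, n = y[b].sum(), p[b].sum(), len(b)
        stat += (obs - exp) ** 2 / (exp * (1 - exp / n))
    pval = stats.chi2.sf(stat, df=groups - 2)  # significant -> poor calibration
    return stat, pval

def bootstrap_rsmr_ci(y, p, national_rate, n_boot=1000, alpha=0.05, seed=0):
    """Bootstrap CI for one hospital's RSMR, approximated here as
    (observed deaths / model-expected deaths) * national mortality rate."""
    rng = np.random.default_rng(seed)
    n = len(y)
    rsmrs = []
    for _ in range(n_boot):
        idx = rng.integers(0, n, n)            # resample patients with replacement
        rsmrs.append(y[idx].sum() / p[idx].sum() * national_rate)
    return np.quantile(rsmrs, [alpha / 2, 1 - alpha / 2])

# A hospital would be flagged SD when this interval omits the national RSMR;
# stats.pearsonr would then compare hospital RSMRs calculated with different models.
```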
Results: The ≥65-y models included 57%-67% of all patients (78%-82% of deaths). The PLUS models improved discrimination and calibration across diagnoses and age groups (CSTAT for CHF/≥65 y: 0.67 ADM vs. 0.773 PLUS vs. 0.761 CLIN; HL-GOF significant in 4/6 ADM vs. 2/6 PLUS models). Correlation of RSMRs was good between the ADM and PLUS models (r: AMI 0.859; CHF 0.821; PNA 0.750) and between the ≥65-y and ALL models (r>0.90). SD ratings changed in 1%-12% of hospitals (greatest change in PNA).
Conclusions: Performance measurement systems should include laboratory data, which improve model performance. Changes in SD ratings suggest caution in using a single metric to label hospital performance.