Background: Most public reporting and pay-for-performance (P4P) programs in the United States continue to be organized and implemented by single insurers. Adequate medical group-level reliability on clinical care process measures is possible in multistakeholder initiatives because patient samples can be pooled across payers. However, the extent to which reliable measurement is achievable in single-insurer P4P initiatives remains unclear.
Methods: We used 7 years (2001 to 2007) of patient-level clinical care process data from an insurer in Washington State, covering 20 medical groups. Eight clinical care process measures were analyzed. We compared the medical group-level reliability and resulting sample size requirements for each of the 8 measures using unadjusted and adjusted binary mixed models. The relationship between baseline intraclass correlation coefficients (ICCs) and change in medical group performance over time was examined for each clinical care process measure.
Results: Only 45% of all medical group measurements (group-years for all observations) had sufficient sample sizes to achieve reliable estimates of group performance. Measures with the largest deficiencies in patient samples per group included appropriate asthma treatment and low-density lipoprotein screening for patients with coronary artery disease. There was an inconsistent relationship between the size of baseline ICCs and medical group performance improvement over time.
Conclusions: Unreliable performance measurement is an important consequence of the prevailing organization and implementation of public reporting and P4P programs in the United States. Multi-payer collaborations may be an important vehicle for ensuring reliable medical group performance measurement and comparisons on clinical care process measures.
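
The link between an ICC and a per-group sample size requirement, as referred to in the Methods, can be illustrated with the standard Spearman-Brown relationship, in which the reliability of a group mean based on n patients is n·ICC / (1 + (n − 1)·ICC). The sketch below is an illustration only: the abstract does not state which formula or reliability threshold the study used, so the 0.70 target, the function names, and the example ICC of 0.05 are all assumptions rather than values from the study.

```python
import math


def reliability(icc: float, n: int) -> float:
    """Spearman-Brown reliability of a group-level mean based on n patients."""
    return n * icc / (1 + (n - 1) * icc)


def required_sample_size(icc: float, target: float = 0.70) -> int:
    """Smallest number of patients per group whose reliability reaches `target`.

    The 0.70 default is an assumed threshold, not one reported in the abstract.
    """
    if not 0 < icc < 1:
        raise ValueError("icc must be strictly between 0 and 1")
    if not 0 < target < 1:
        raise ValueError("target must be strictly between 0 and 1")
    # Solve target = n*icc / (1 + (n - 1)*icc) for n.
    n = target * (1 - icc) / (icc * (1 - target))
    # Subtract a tiny tolerance so floating-point noise on exact integers
    # does not push the result up by one.
    return math.ceil(n - 1e-9)


if __name__ == "__main__":
    icc = 0.05  # hypothetical ICC, chosen only for illustration
    n_needed = required_sample_size(icc)
    print(n_needed, round(reliability(icc, n_needed), 3))
```

Under this relationship, smaller ICCs demand larger patient samples per group, which is why a single insurer's patient panel may fall short where a pooled multi-payer sample would not.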