Background: Machine learning has the potential to help researchers better understand and close the gap in HIV care delivery in large metropolitan regions such as Mecklenburg County, North Carolina, USA.
Objectives: We aim to use novel machine learning models to identify important risk factors associated with delayed linkage to care among HIV patients and to identify regions at high risk of such delay.
Methods: Deidentified 2013-2017 Mecklenburg County surveillance data in eHARS format were requested. Both univariate analyses and a random forest machine learning model (developed in R 3.5.0) were applied to quantify associations between delayed linkage to care (>30 days after diagnosis) and various risk factors for individual HIV patients. We also aggregated linkage to care by zip code to identify high-risk communities within the county.
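As a minimal sketch of the delay definition and the zip-code aggregation described in the Methods, the following Python example flags each record as delayed when linkage occurs more than 30 days after diagnosis and then computes the share of delayed records per zip code. It is illustrative only: the study's analysis was done in R, the record layout and function names are hypothetical, the zip codes and dates are made-up placeholders, and treating a missing linkage date as delayed is an assumption of this sketch rather than something stated in the source.

```python
from collections import defaultdict
from datetime import date

# From the Methods: delayed linkage means >30 days after diagnosis.
DELAY_THRESHOLD_DAYS = 30


def is_delayed(diagnosis_date, linkage_date):
    """Return True if linkage occurred more than 30 days after diagnosis.

    Assumption of this sketch: a missing linkage date counts as delayed.
    """
    if linkage_date is None:
        return True
    return (linkage_date - diagnosis_date).days > DELAY_THRESHOLD_DAYS


def delayed_share_by_zip(records):
    """Aggregate (zip_code, diagnosis_date, linkage_date) tuples by zip code.

    Returns a dict mapping each zip code to the fraction of delayed records.
    """
    counts = defaultdict(lambda: [0, 0])  # zip code -> [delayed, total]
    for zip_code, diagnosis_date, linkage_date in records:
        counts[zip_code][1] += 1
        if is_delayed(diagnosis_date, linkage_date):
            counts[zip_code][0] += 1
    return {z: delayed / total for z, (delayed, total) in counts.items()}


# Made-up placeholder records (hypothetical layout, not study data).
records = [
    ("00001", date(2015, 1, 1), date(2015, 1, 20)),  # 19 days: not delayed
    ("00001", date(2015, 3, 1), date(2015, 5, 1)),   # 61 days: delayed
    ("00002", date(2016, 6, 1), date(2016, 7, 1)),   # 30 days: not delayed
    ("00002", date(2016, 8, 1), None),               # no linkage: delayed here
]
shares = delayed_share_by_zip(records)
```

Ranking zip codes by the resulting share is one simple way to surface candidate high-risk communities. The source does not specify the exact ranking criterion used in the study.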
Results: The type of facility where HIV was diagnosed significantly influenced time to linkage; a first diagnosis in a hospital was associated with the shortest time to linkage. HIV patients with lower CD4+ cell counts (<200 cells/µL) were twice as likely to link to care within 30 days as those with higher CD4+ cell counts. The random forest model predicted the risk of delayed linkage to care with high accuracy (>80% without CD4+ cell count data and >95% with CD4+ cell count data). We also identified the zip codes at highest risk of delayed linkage.
Conclusion: The findings helped public health teams identify communities at high risk of delays in the HIV care continuum across Mecklenburg County. The methodological framework can be applied to other regions facing an HIV epidemic and the challenge of delayed linkage to care.
Copyright © 2021 Wolters Kluwer Health, Inc. All rights reserved.