Harnessing Big Heterogeneous Data to Evaluate the Potential Impact of HIV Responses Among Key Populations in Sub-Saharan Africa: Protocol for the Boloka Data Repository Initiative

JMIR Res Protoc. 2025 Jan 22:14:e63583. doi: 10.2196/63583.

Abstract

Background: In South Africa, there is no centralized HIV surveillance system where key populations (KPs) data, including gay men and other men who have sex with men, female sex workers, transgender persons, people who use drugs, and incarcerated persons, are stored in South Africa despite being on higher risk of HIV acquisition and transmission than the general population. Data on KPs are being collected on a smaller scale by numerous stakeholders and managed in silos. There exists an opportunity to harness a variety of data, such as empirical, contextual, observational, and programmatic data, for evaluating the potential impact of HIV responses among KPs in South Africa.

Objective: This study aimed to leverage and harness big heterogeneous data on HIV among KPs and harmonize and analyze it to inform a targeted HIV response for greater impact in Sub-Saharan Africa.

Methods: The Boloka data repository initiative has 5 stages. There will be engagement of a wide range of stakeholders to facilitate the acquisition of data (stage 1). Through these engagements, different data types will be collated (stage 2). The data will be filtered and screened to enable high-quality analyses (stage 3). The collated data will be stored in the Boloka data repository (stage 4). The Boloka data repository will be made accessible to stakeholders and authorized users (stage 5).

Results: The protocol was funded by the South African Medical Research Council following external peer reviews (December 2022). The study received initial ethics approval (May 2022), renewal (June 2023), and amendment (July 2024) from the University of Johannesburg (UJ) Research Ethics Committee. The research team has been recruited, onboarded, and received non-web-based internet ethics training (January 2023). A list of current and potential data partners has been compiled (January 2023 to date). Data sharing or user agreements have been signed with several data partners (August 2023 to date). Survey and routine data have been and are being secured (January 5, 2023). In (September 2024) we received Ghana Men Study data. The data transfer agreement between the Pan African Centre for Epidemics Research and the Perinatal HIV Research Unit was finalized (October 2024), and we are anticipating receiving data by (December 2024). In total, 7 abstracts are underway, with 1 abstract completed the analysis and expected to submit the full article to the peer-reviewed journal in early January 2024. As of March 2025, we expect to submit the remaining 6 full articles.

Conclusions: A truly "complete" data infrastructure that systematically and rigorously integrates diverse data for KPs will not only improve our understanding of local epidemics but will also improve HIV interventions and policies. Furthermore, it will inform future research directions and become an incredible institutional mechanism for epidemiological and public health training in South Africa and Sub-Saharan Africa.

International registered report identifier (irrid): DERR1-10.2196/63583.

Keywords: Boloka data repository; HIV, key populations; Sub-Saharan Africa; big heterogeneous data.

MeSH terms

  • Africa South of the Sahara / epidemiology
  • Big Data
  • Female
  • HIV Infections* / epidemiology
  • HIV Infections* / transmission
  • Humans
  • Male
  • Sex Workers
  • Sexual and Gender Minorities
  • South Africa / epidemiology