Physician-Confirmed and Administrative Definitions of Stroke in UK Biobank Reflect the Same Underlying Genetic Trait

Kristiina Rannikmäe; Konrad Rawlik; Amy C Ferguson; Nikos Avramidis; Muchen Jiang; Nicola Pirastu; Xia Shen; Emma Davidson; Rebecca Woodfield; Rainer Malik; Martin Dichgans; Albert Tenesa; Cathie Sudlow

doi:10.3389/fneur.2021.787107


Front Neurol. 2022 Feb 2:12:787107. doi: 10.3389/fneur.2021.787107. eCollection 2021.

Authors

Kristiina Rannikmäe¹, Konrad Rawlik², Amy C Ferguson¹, Nikos Avramidis³, Muchen Jiang⁴, Nicola Pirastu⁵, Xia Shen⁵,⁶,⁷, Emma Davidson⁸, Rebecca Woodfield⁹, Rainer Malik¹⁰, Martin Dichgans¹⁰,¹¹,¹², Albert Tenesa¹,²,¹³, Cathie Sudlow¹,¹⁴

Affiliations

¹ Centre for Medical Informatics, Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.
² Roslin Institute, University of Edinburgh, Edinburgh, United Kingdom.
³ School of Biological Sciences, University of Edinburgh, Edinburgh, United Kingdom.
⁴ Medical School, University of Edinburgh, Edinburgh, United Kingdom.
⁵ Usher Institute, University of Edinburgh, Edinburgh, United Kingdom.
⁶ Biostatistics Group, Greater Bay Area Institute of Precision Medicine (Guangzhou), Fudan University, Guangzhou, China.
⁷ Department of Medical Epidemiology and Biostatistics, Karolinska Institutet, Stockholm, Sweden.
⁸ Centre for Clinical Brain Sciences, University of Edinburgh, Edinburgh, United Kingdom.
⁹ Department of Medicine for the Elderly, Western General Hospital, Edinburgh, United Kingdom.
¹⁰ Institute for Stroke and Dementia Research (ISD), University Hospital, LMU Munich, Munich, Germany.
¹¹ Munich Cluster for Systems Neurology, Munich, Germany.
¹² German Center for Neurodegenerative Diseases (DZNE), Munich, Germany.
¹³ MRC Human Genetics Unit, Institute of Genetics and Cancer, University of Edinburgh, Western General Hospital, Edinburgh, United Kingdom.
¹⁴ BHF Data Science Centre, Health Data Research UK, London, United Kingdom.

Abstract

Background: Stroke in UK Biobank (UKB) is ascertained via linkages to coded administrative datasets and self-report. We studied the accuracy of these codes using genetic validation.

Methods: We compiled stroke-specific and broad cerebrovascular disease (CVD) code lists (Read V2/V3, ICD-9/-10) for medical settings (hospital, death record, primary care) and self-report. Among 408,210 UKB participants, we identified all with a relevant code, creating 12 stroke definitions based on the code type and source. We performed genome-wide association studies (GWASs) for each definition, comparing summary results against the largest published stroke GWAS (MEGASTROKE), assessing genetic correlations, and replicating 32 stroke-associated loci.
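The case-ascertainment step (a participant counts as a case under a given definition if at least one code from the chosen code list appears in at least one of the chosen sources) can be sketched as below. This is a minimal illustration, not the study's pipeline: the flattened record format, the function name `find_cases`, and the code `RX001` (a stand-in for a primary care Read code) are hypothetical, and the three ICD-10 prefixes (I61, I63, I64) are only an illustrative subset, not the published code lists.

```python
from collections.abc import Iterable

# Hypothetical flattened record format: (participant_id, source, code).
Record = tuple[str, str, str]


def find_cases(records: Iterable[Record], code_prefixes: set[str], sources: set[str]) -> set[str]:
    """Return IDs of participants with at least one code matching a listed
    prefix, counting only records that come from the selected sources."""
    cases: set[str] = set()
    for pid, source, code in records:
        if source in sources and any(code.startswith(p) for p in code_prefixes):
            cases.add(pid)
    return cases


# Illustrative records; "RX001" is a hypothetical stand-in for a Read code.
records = [
    ("p1", "hospital", "I63.9"),
    ("p2", "hospital", "I10"),  # hypertension, not a stroke code
    ("p3", "death", "I64"),
    ("p4", "primary_care", "RX001"),
]

icd10_stroke = {"I61", "I63", "I64"}  # illustrative subset only
secondary_care = find_cases(records, icd10_stroke, {"hospital", "death"})
all_sources = find_cases(records, icd10_stroke | {"RX0"}, {"hospital", "death", "primary_care"})
print(sorted(secondary_care))  # ['p1', 'p3']
print(sorted(all_sources))     # ['p1', 'p3', 'p4']
```

Varying the code list and the set of sources in this way yields differently sized case groups from the same participants, which is why case numbers depend on the definition chosen.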

Results: The number of stroke cases identified varied widely, from 3,976 (primary care stroke-specific codes) to 19,449 (all codes, all sources). All 12 UKB stroke definitions were significantly genetically correlated with the MEGASTROKE summary GWAS results (rg 0.81–1) and with each other (rg 0.4–1). However, Bonferroni-corrected confidence intervals were wide, suggesting limited precision of some results. Six previously reported stroke-associated loci were replicated using ≥1 UKB stroke definition.
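To see why a Bonferroni correction widens the confidence intervals, the sketch below computes a two-sided normal-approximation (Wald) interval for a genetic correlation estimate at the per-test level alpha / m. It is an illustration under stated assumptions: the abstract does not give the number of tests m, the standard errors, or the exact interval method used, so the function name `bonferroni_ci` and the input values (rg = 0.90, SE = 0.10) are hypothetical. The interval is not clipped to [-1, 1], because rg estimates can exceed 1.

```python
from statistics import NormalDist


def bonferroni_ci(estimate: float, se: float, n_tests: int, alpha: float = 0.05) -> tuple[float, float]:
    """Two-sided Wald interval using the Bonferroni per-test level alpha / n_tests."""
    z = NormalDist().inv_cdf(1 - alpha / (2 * n_tests))
    return estimate - z * se, estimate + z * se


# Hypothetical inputs: rg = 0.90 with a standard error of 0.10.
print(bonferroni_ci(0.90, 0.10, n_tests=1))   # roughly (0.704, 1.096)
print(bonferroni_ci(0.90, 0.10, n_tests=12))  # wider, about (0.61, 1.19)
```

With a single test the critical value is about 1.96; correcting for 12 hypothetical tests raises it to about 2.87, so the same standard error gives a noticeably wider interval.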

Conclusions: Stroke case numbers in UKB depend on the code source and type used, with a 5-fold difference in the maximum case-sample size. All stroke definitions are significantly genetically correlated with the largest stroke GWAS to date.

Keywords: accuracy; genetic correlation; routinely collected health data; stroke; validation.

Grants and funding

MR/S004130/1 (Medical Research Council, United Kingdom)