
Original research
Ancestral diversity in lipoprotein(a) studies helps address evidence gaps
  1. Moa P Lee1,
  2. Sofia F Dimos1,
  3. Laura M Raffield2,
  4. Zhe Wang3,
  5. Anna F Ballou1,
  6. Carolina G Downie1,
  7. Christopher H Arehart4,
  8. Adolfo Correa5,
  9. Paul S de Vries6,
  10. Zhaohui Du7,
  11. Christopher R Gignoux4,
  12. Penny Gordon-Larsen8,
  13. Xiuqing Guo9,
  14. Jeffrey Haessler7,
  15. Annie Green Howard10,
  16. Yao Hu7,
  17. Helina Kassahun11,
  18. Shia T Kent12,
  19. J Antonio G Lopez11,
  20. Keri L Monda12,
  21. Kari E North1,
  22. Ulrike Peters7,
  23. Michael H Preuss3,
  24. Stephen S Rich13,
  25. Shannon L Rhodes12,
  26. Jie Yao9,
  27. Rina Yarosh1,
  28. Michael Y Tsai14,
  29. Jerome I Rotter9,
  30. Charles L Kooperberg7,
  31. Ruth J F Loos3,15,
  32. Christie Ballantyne16,
  33. Christy L Avery1 and
  34. Mariaelisa Graff1
  1. 1Department of Epidemiology, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  2. 2Department of Genetics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  3. 3The Charles Bronfman Institute for Personalized Medicine, Icahn School of Medicine at Mount Sinai, New York, New York, USA
  4. 4Department of Medicine, University of Colorado Anschutz Medical Campus, Aurora, Colorado, USA
  5. 5Department of Population Health Science, The University of Mississippi Medical Center, Jackson, Mississippi, USA
  6. 6Department of Epidemiology, Human Genetics, and Environmental Sciences, The University of Texas Health Science Center at Houston, Houston, Texas, USA
  7. 7Division of Public Health Sciences, Fred Hutchinson Cancer Research Center, Seattle, Washington, USA
  8. 8Department of Nutrition, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  9. 9Department of Pediatrics, UCLA Medical Center, Los Angeles, California, USA
  10. 10Department of Biostatistics, The University of North Carolina at Chapel Hill, Chapel Hill, North Carolina, USA
  11. 11Global Development, Amgen Inc, Thousand Oaks, California, USA
  12. 12Center for Observational Research, Amgen Inc, Thousand Oaks, California, USA
  13. 13University of Virginia School of Medicine, Charlottesville, Virginia, USA
  14. 14Department of Laboratory Medicine & Pathology, University of Minnesota, Minneapolis, Minnesota, USA
  15. 15Novo Nordisk Foundation Center for Basic Metabolic Research, Faculty of Health and Medical Sciences, University of Copenhagen, Kobenhavn, Denmark
  16. 16Department of Medicine, Section of Cardiology, Baylor College of Medicine, Houston, Texas, USA
  1. Correspondence to Dr Christy L Avery; christy_avery{at}


Introduction The independent and causal cardiovascular disease risk factor lipoprotein(a) (Lp(a)) is elevated in >1.5 billion individuals worldwide, but studies have prioritised European populations.

Methods Here, we examined how ancestrally diverse studies could clarify Lp(a)’s genetic architecture, inform efforts examining application of Lp(a) polygenic risk scores (PRS), enable causal inference and identify unexpected Lp(a) phenotypic effects using data from African (n=25 208), East Asian (n=2895), European (n=362 558), South Asian (n=8192) and Hispanic/Latino (n=8946) populations.

Results Fourteen genome-wide significant loci with numerous population specific signals of large effect were identified that enabled construction of Lp(a) PRS of moderate (R2=15% in East Asians) to high (R2=50% in Europeans) accuracy. For all populations, PRS showed promise as a ‘rule out’ for elevated Lp(a) because certainty of assignment to the low-risk threshold was high (88.0%–99.9%) across PRS thresholds (80th–99th percentile). Causal effects of increased Lp(a) with increased glycated haemoglobin were estimated for Europeans (p value =1.4×10−6), although inverse effects in Africans and East Asians suggested the potential for heterogeneous causal effects. Finally, Hispanic/Latinos were the only population in which known associations with coronary atherosclerosis and ischaemic heart disease were identified in external testing of Lp(a) PRS phenotypic effects.

Conclusions Our results emphasise the merits of prioritising ancestral diversity when addressing Lp(a) evidence gaps.

  • biomarkers
  • epidemiology
  • genome-wide association study
  • genetic association studies

Data availability statement

Data may be obtained from a third party and are not publicly available. All data are available through parent studies.

This is an open access article distributed in accordance with the Creative Commons Attribution Non Commercial (CC BY-NC 4.0) license, which permits others to distribute, remix, adapt, build upon this work non-commercially, and license their derivative works on different terms, provided the original work is properly cited, appropriate credit is given, any changes made indicated, and the use is non-commercial. See:



  • Lipoprotein(a) (Lp(a)) is a highly heritable cardiovascular disease (CVD) risk factor for which pivotal clinical trials are underway.

  • Despite being one of the most variable CVD risk factors across populations, the majority of Lp(a) research has been performed in European ancestral populations, limiting the reach and generalisability of the evidence base that informs clinical and public health decision making to five-sixths of the global population.


  • By applying innovations in statistical genetics to five ancestrally diverse populations, we demonstrate how increasing ancestral diversity helps clarify the role of Lp(a) in disease pathogenesis, identify individuals with elevated Lp(a) and enable causal inference studies.


  • Important insights into Lp(a) can be enabled by inclusion of modestly sized populations of non-European ancestry. Increased awareness of the benefits of ancestral diversity will provide an essential foundation for future studies that aim to maximise the benefits and minimise the risks of therapeutic Lp(a) lowering in everyone. 

Introduction

Lipoprotein(a) (Lp(a)), a highly atherogenic and prothrombotic lipoprotein, is an independent and causal cardiovascular disease (CVD) risk factor that is elevated in an estimated 1.5 billion individuals worldwide.1 2 Strong and consistent evidence linking Lp(a) with CVDs has motivated the development of Lp(a)-reducing therapies that are in pivotal clinical trials.3 Despite clinical, regulatory and public health interest in Lp(a), several major evidence gaps remain. First, the role of Lp(a) in the pathogenesis of non-CVD phenotypes remains largely unexplored despite the potential for broad phenotypic effects.4–8 Opportunities to anticipate adverse effects of therapeutic Lp(a) lowering, to illuminate mechanisms of action and to identify novel treatment indications through drug repurposing are therefore missed.9 10 Second, Lp(a) is not routinely measured in clinical practice, as few guidelines advocate universal measurement and the performance of commercially available assays varies.11 12 As a result, populations with very high lifetime CVD risk remain unidentified. Third, Lp(a) is distinguished from other CVD risk factors by pronounced ancestral differences.13 The causes and consequences of these ancestral differences are poorly understood, but may reflect heterogeneity in Lp(a)’s genetic architecture.14 Finally, Lp(a) is not found in commonly studied laboratory animals, limiting the reach of mechanistic studies.15 Causal inference studies could help bridge this gap, although few studies outside European populations or CVD outcomes have been published.4 16 As a result, opportunities to inform Lp(a) and maximise its utility in clinical medicine and public health remain incompletely realised.

Lp(a)’s high heritability (h2=70%–90%), oligogenic genetic architecture and relative stability across the lifespan offer several avenues to address these evidence gaps.17 18 These features have enabled estimation of highly accurate Lp(a) polygenic risk scores (PRS) that, by aggregating variants into a single score, explain as much as 60% of Lp(a) trait variance.16 19 20 Few other CVD risk factors are predicted with such high accuracy. However, highly predictive Lp(a) PRS are only available for European populations and these European-derived PRS are not portable to other populations. This lack of portability reflects heterogeneity in the genetic architecture of Lp(a) across ancestral populations21 as well as the broader challenge of limited ancestral diversity in genetics research.22 23

Here, we constructed ancestry specific Lp(a) PRS using seven different approaches to maximise accuracy for populations of African, East Asian, European and South Asian ancestry. These PRS are publicly available and, for non-European populations, outperformed existing Lp(a) PRS.24 We then showed how these PRS and the genome-wide association studies (GWAS) from which they were derived can inform efforts examining clinical application of Lp(a) PRS, clarify Lp(a)’s genetic architecture, enable causal inference studies and identify unexpected phenotypic effects, even when the size of non-European populations is modest.

Methods

Study populations

We included data from two sources: the Population Architecture using Genomics and Epidemiology (PAGE) study and the UK Biobank (online supplemental table 1 and text). The PAGE study25 is a consortium funded since 2008 by the National Institutes of Health to examine the genetic underpinnings of common complex diseases and phenotypes in ancestrally diverse US populations. Five PAGE studies were included for PRS construction and testing: the Atherosclerosis Risk in Communities study (ARIC),26 the Coronary Artery Risk Development in Young Adults study (CARDIA),27 the Jackson Heart Study (JHS),28 the Multi-Ethnic Study of Atherosclerosis (MESA)29 and the Women’s Health Initiative (WHI).30 A sixth PAGE study, the Mount Sinai BioMe biobank (BioMe), was included to examine external application of PRS.

The UK Biobank is a publicly available, longitudinal study of England, Wales and Scotland residents.31 For all studies except BioMe, eligible participants had genotypic data and Lp(a) measures and were categorised into one of four populations with large enough numbers to support ancestry-specific Lp(a) GWAS: African, East Asian, European or South Asian. For BioMe, eligible participants had imputed GWAS data and inpatient ICD-9 and ICD-10 codes, and were categorised into one of three populations with large enough numbers to support self-reported race/ethnicity-specific analyses: European, African and Hispanic/Latino.

Genotyping and imputation

After genotyping using one of several assays, imputation was performed across studies using the UK10K and 1000 Genomes Project (UK Biobank) or 1000 Genomes Project (PAGE) phase 3 reference panels (online supplemental table 2). In addition to study-specific protocols, we excluded variants meeting any of the following criteria on a population-specific and study-specific basis given the large differences in study size: minor allele frequency (MAF) <0.001; imputation quality score <0.4; or effective minor allele count Neff<30, where Neff =2f(1−f)Nq and f is minor allele frequency, N is the sample size and q is the imputation quality.
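
The exclusion criteria above can be sketched as a small filter. This is an illustrative Python sketch only: the function names and argument names are hypothetical, and only the three thresholds and the formula Neff = 2f(1−f)Nq are taken from the text.

```python
def effective_minor_allele_count(maf: float, n: int, info: float) -> float:
    """Neff = 2 * f * (1 - f) * N * q, as defined in the text."""
    return 2.0 * maf * (1.0 - maf) * n * info


def passes_variant_qc(maf: float, n: int, info: float,
                      min_maf: float = 0.001,
                      min_info: float = 0.4,
                      min_neff: float = 30.0) -> bool:
    """Return True if a variant survives all three exclusion criteria."""
    if maf < min_maf:      # minor allele frequency too low
        return False
    if info < min_info:    # imputation quality too low
        return False
    # effective minor allele count too low for this study's sample size
    return effective_minor_allele_count(maf, n, info) >= min_neff
```

Using total sample sizes purely as illustrative values of N, a variant with MAF 0.002 and imputation quality 0.9 has Neff of about 10.4 when N=2895 and would be excluded, whereas the same variant has Neff of roughly 1300 when N=362 558 and would be retained. This is why a filter that scales with sample size is useful when study sizes differ widely.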

Statistical methods

Genome-wide association studies

We estimated Lp(a) ancestry-specific and study-specific genetic effects using generalised linear models of the form $g(E[Y \mid X, G]) = X\beta + G\gamma$, where Y was a vector of inverse-normalised Lp(a), g was an identity link function, X denoted confounders (age, sex, 15 ancestral principal components and study centre, when appropriate), G was variant dosage, and β and γ were the corresponding regression coefficients. Linear models were implemented in SUGEN (ARIC, CARDIA, WHI),32 EMMAX (JHS),33 SNPTEST (MESA)34 or SAIGE (UK Biobank),35 accounting for relatedness as appropriate. Quality control and visualisation of GWAS results were performed using the EasyQC36 and EasyStrata37 packages.
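
As a rough illustration of the single-variant regression described above, the following Python sketch applies a rank-based inverse-normal transform and fits ordinary least squares for one variant. It is not the SUGEN, EMMAX, SNPTEST or SAIGE implementation and ignores relatedness; the function names and the 0.5 rank offset are assumptions made for the example.

```python
from statistics import NormalDist

import numpy as np


def inverse_normal_transform(y: np.ndarray, offset: float = 0.5) -> np.ndarray:
    """Rank-based inverse-normal transform (the 0.5 offset is an assumption)."""
    n = y.size
    ranks = np.argsort(np.argsort(y)) + 1      # ranks 1..n; ties broken by order
    quantiles = (ranks - offset) / n           # strictly inside (0, 1)
    inv_cdf = NormalDist().inv_cdf
    return np.array([inv_cdf(q) for q in quantiles])


def single_variant_effect(y: np.ndarray, covariates: np.ndarray,
                          dosage: np.ndarray) -> tuple[float, float]:
    """OLS of y on [intercept, covariates, dosage]; returns (effect, SE) for dosage."""
    n = y.size
    design = np.column_stack([np.ones(n), covariates, dosage])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    sigma2 = resid @ resid / (n - design.shape[1])   # residual variance
    cov = sigma2 * np.linalg.inv(design.T @ design)  # coefficient covariance
    return float(coef[-1]), float(np.sqrt(cov[-1, -1]))
```

In use, the phenotype would be transformed once per study and ancestry, and `single_variant_effect` would be called per variant with the covariate matrix holding age, sex, principal components and study centre indicators.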

Ancestry-specific GWAS results were combined using inverse-variance weighted, genomic inflation-corrected meta-analysis.38 Variant effect heterogeneity was assessed within ancestry using Cochran’s Q test. Variants with p values <5×10−9 (Bonferroni correction for 10M tests) were considered genome-wide significant.39 Our ancestry-specific Lp(a) GWAS summary statistics will be available on the NHGRI-EBI GWAS catalogue pending publication given submission requirements.
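
For one variant, the fixed-effect inverse-variance weighted combination and Cochran’s Q can be sketched as follows. This Python sketch omits the genomic inflation correction used in the study, and the function and constant names are hypothetical.

```python
import math

# Bonferroni threshold stated in the text: 0.05 / 10 million tests = 5e-9
GENOME_WIDE_ALPHA = 0.05 / 10_000_000


def ivw_meta(betas: list[float], ses: list[float]) -> tuple[float, float, float]:
    """Fixed-effect inverse-variance weighted estimate, its SE and Cochran's Q."""
    weights = [1.0 / se ** 2 for se in ses]
    total_weight = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total_weight
    se = math.sqrt(1.0 / total_weight)
    # Q is compared with a chi-square distribution on (number of studies - 1) df
    q = sum(w * (b - beta) ** 2 for w, b in zip(weights, betas))
    return beta, se, q


def two_sided_p(beta: float, se: float) -> float:
    """Two-sided normal p value for the Wald statistic beta / se."""
    return math.erfc(abs(beta / se) / math.sqrt(2.0))
```

A variant would then be flagged as genome-wide significant when `two_sided_p(beta, se) < GENOME_WIDE_ALPHA`.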

Identification of independent/secondary signals

We used GCTA-COJO,40 the ‘cojo-slct’ method, to perform forward model selection to identify independently associated variants across each set of ancestry-specific GWAS meta-analyses. To account for linkage disequilibrium (LD) in each ancestry-specific analysis, we identified a subset of genetically unrelated UK Biobank participants and calculated LD between variant pairs in 10 Mb windows.

Variant-based heritability and percentage variance explained

Lp(a) narrow-sense heritability was estimated by ancestry using GCTA and unrelated UK Biobank participants.41 42 Briefly, a genetic relationship matrix was created by ancestry for each chromosome (1–22), including variants imputed with high quality (>0.7). After combining the matrices, we fit ancestry-specific linear models adjusting for age, sex, 15 ancestral principal components and study centre using restricted maximum likelihood to estimate the percentage of Lp(a) variance explained by genome-wide variants and LPA variants.

Lp(a) PRS estimation

We examined seven approaches to estimate Lp(a) ancestry specific PRS. These approaches were distinguished by the statistical method, the data used to construct variant weights, the genomic region examined (LPA locus or genome-wide) and the LD reference panel (online supplemental table 3). For approaches that used external PRS,43 Pruning and Thresholding44 and LDpred2,45 the discovery data were independent of the target data. For approaches that used GCTA-COJO40 to identify independent signals across ancestry-specific GWAS meta-analyses and Crosspred,46 cross-validation was used to address a lack of independence between the discovery and target data.

PRS performance at the study level was compared using the incremental R2 after accounting for age, age2, sex, 15 ancestral principal components and study centre for continuous Lp(a) and the area under the receiver operating characteristic curve (AUC) when Lp(a) was dichotomised. When estimating AUC using Lp(a) PRS as the predictor, measured Lp(a) was dichotomised at 125 nmol/L, the Lp(a) clinical risk enhancer threshold47; participants with Lp(a) levels at or above 125 nmol/L were classified as having high Lp(a). The 95% CIs for R2 were obtained from 1000 non-parametric bootstrap replicates, taking the 2.5% and 97.5% quantiles of the bootstrap distribution.
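
The incremental R2 and its percentile bootstrap interval can be sketched as below. This is a minimal Python illustration under the assumption that incremental R2 is the difference in R2 between a covariates-plus-PRS model and a covariates-only model; the function names are hypothetical.

```python
import numpy as np


def r_squared(design: np.ndarray, y: np.ndarray) -> float:
    """Coefficient of determination from an OLS fit of y on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    resid = y - design @ coef
    centred = y - y.mean()
    return 1.0 - (resid @ resid) / (centred @ centred)


def incremental_r2(y: np.ndarray, covariates: np.ndarray, prs: np.ndarray) -> float:
    """R2 gained by adding the PRS to a covariate-only model."""
    base = np.column_stack([np.ones(y.size), covariates])
    full = np.column_stack([base, prs])
    return r_squared(full, y) - r_squared(base, y)


def bootstrap_ci(y, covariates, prs, n_boot: int = 1000, seed: int = 0):
    """Percentile 95% CI from non-parametric resampling of participants."""
    rng = np.random.default_rng(seed)
    n = y.size
    stats = np.empty(n_boot)
    for b in range(n_boot):
        idx = rng.integers(0, n, size=n)   # sample participants with replacement
        stats[b] = incremental_r2(y[idx], covariates[idx], prs[idx])
    lower, upper = np.quantile(stats, [0.025, 0.975])
    return float(lower), float(upper)
```

Resampling whole participants, rather than residuals, keeps each person’s phenotype, covariates and PRS together in every replicate.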

Certainty in PRS estimation

To inform the use of PRS for the identification of individuals who may benefit from Lp(a) testing, we quantified PRS certainty at the individual level for the best performing PRS within each ancestral population.48 Briefly, the variance of each participant’s assigned PRS was estimated across four PRS stratification thresholds (the 80th, 90th, 95th and 99th PRS percentiles). Selection of the 90th, 95th and 99th percentiles was data driven, whereas the 80th percentile was selected because it corresponded to a mean Lp(a) level of 125 nmol/L in African and European populations (online supplemental figure 1). A percentile threshold corresponding to 125 nmol/L was not examined in East Asian and South Asian populations because either the mean Lp(a) level for the highest percentiles did not exceed 100 nmol/L (East Asians) or there were too few participants in the percentiles with mean Lp(a) >125 nmol/L (South Asians), which could reflect modest PRS accuracy. We then computed a 95% credible interval for the individual PRS point estimates, which is interpreted as the interval in which the true PRS is expected to fall with 95% probability. Finally, we calculated the proportion of individuals whose 95% credible intervals were fully contained within their assigned threshold range as a measure of certainty.
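
The final certainty measure can be illustrated with a short Python sketch. It assumes that each participant’s PRS point estimate and 95% credible interval bounds have already been computed by the upstream method; the function name and the returned dictionary keys are hypothetical.

```python
import numpy as np


def threshold_certainty(prs_point: np.ndarray, ci_lower: np.ndarray,
                        ci_upper: np.ndarray, percentile: float) -> dict:
    """Share of individuals whose credible interval stays on their side of the cutoff."""
    cutoff = np.percentile(prs_point, percentile)
    above = prs_point >= cutoff                      # assigned to the upper range
    certain_above = above & (ci_lower >= cutoff)     # whole interval above cutoff
    certain_below = ~above & (ci_upper < cutoff)     # whole interval below cutoff
    return {
        "cutoff": float(cutoff),
        "upper": float(certain_above.sum() / above.sum()),
        "lower": float(certain_below.sum() / (~above).sum()),
    }
```

For example, calling the function with `percentile=80` returns the proportion of participants at or above the 80th percentile whose intervals lie entirely above it ("upper") and the proportion below it whose intervals lie entirely below it ("lower").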

BioMe phenome-wide association study

To characterise the degree to which Lp(a) association studies reported consistent results across populations, we conducted a phenome-wide association study (PheWAS) in BioMe participants of White, African American and Hispanic/Latino self-identified race/ethnicity using R.49 A genetically inferred Lp(a) value, which we substituted for measured Lp(a), was constructed for each BioMe participant with genotypic data. For White and Hispanic/Latino BioMe participants, we used the approach and weights from the external PRS developed in a European ancestral population, as an Lp(a) PRS for Hispanic/Latinos was unavailable. For African American BioMe participants, we used the independent signal (genome-wide) approach developed in PAGE and UK Biobank participants of African ancestry. Using these genetically inferred Lp(a) measures, we then estimated race/ethnic-specific associations with a maximum of 1004 phecodes with ≥20 cases derived from ICD-9 and ICD-10 inpatient codes, adjusting for age, sex and 15 ancestral principal components. Because statistical power to evaluate >1000 phecodes would be modest, we restricted our attention to phenotypes that were nominally significant (ie, p value <0.05) and were previously associated with Lp(a).

Mendelian randomisation

Causal effects of Lp(a) on two phenotypes with inconsistent evidence of association with Lp(a) (estimated glomerular filtration rate (eGFR) and glycated haemoglobin (HbA1c))8 50 were estimated in UK Biobank participants without chronic kidney disease or diabetes, respectively, by population using Mendelian randomisation (MR). MR is a form of instrumental variable (IV) analysis based on the concept that if X (Lp(a)) affects outcome Y (eGFR or HbA1c), factors affecting X (ie, Lp(a) PRS, G) must also affect Y.51 G therefore serves as an IV to estimate causal effects of X on Y. Strengths of MR include G-Y associations that are assumed to be robust to confounding from variables other than ancestry, which can be addressed through adjustment. Random assignment of G at conception also enables assessment of temporality of the G-Y association and, by extension, the X-Y association. Causal effects were estimated in SAS (PROC SYSLIN) using two-stage least squares after examination of the pleiotropy and instrument strength MR assumptions and adjusting for age, sex, study centre and 15 ancestral principal components.51 52 To facilitate comparison, we also conducted standard association analysis using linear regression.
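
The two-stage least squares logic can be sketched in Python as follows. This is not the SAS PROC SYSLIN analysis; the function names are hypothetical and only point estimates are returned, because standard errors taken naively from the second stage are not valid.

```python
import numpy as np


def _ols(design: np.ndarray, y: np.ndarray) -> np.ndarray:
    """Least-squares coefficients for y regressed on design."""
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def two_stage_least_squares(outcome, exposure, instrument, covariates) -> float:
    """IV estimate of the exposure effect, with the PRS (G) as the instrument."""
    base = np.column_stack([np.ones(outcome.size), covariates])
    # Stage 1: predict the exposure (X) from the instrument (G) and covariates
    stage1 = np.column_stack([base, instrument])
    exposure_hat = stage1 @ _ols(stage1, exposure)
    # Stage 2: regress the outcome (Y) on the predicted exposure and covariates
    stage2 = np.column_stack([base, exposure_hat])
    return float(_ols(stage2, outcome)[-1])


def ols_association(outcome, exposure, covariates) -> float:
    """Standard (non-IV) association estimate, for comparison."""
    design = np.column_stack([np.ones(outcome.size), covariates, exposure])
    return float(_ols(design, outcome)[-1])
```

When an unmeasured confounder affects both exposure and outcome, `ols_association` absorbs the confounding while `two_stage_least_squares` uses only the variation in the exposure explained by the instrument, which is the reason the two sets of estimates can disagree.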

Results

A total of 407 799 participants of African (n=25 208), East Asian (n=2895), European (n=362 558), South Asian (n=8192) and Hispanic/Latino (n=8946) race/ethnicity or ancestry were included (online supplemental table 1). Study participants spanned early to late adulthood (age range: 18 to >80 years) and 56% were female. As exemplified by UK Biobank participants, the distribution of Lp(a) varied by ancestry (figure 1, online supplemental table 4). An estimated 21%, 4%, 14% and 8% of African, East Asian, European and South Asian UK Biobank participants, respectively, had Lp(a) values >125 nmol/L.

Figure 1

Distributions of lipoprotein(a) by ancestry in n=357 096 UK Biobank participants at study baseline.

Lp(a) ancestry specific GWAS

Fourteen genome-wide significant Lp(a) loci were identified, with little evidence of within-ancestry heterogeneity: 3 loci for African, 1 locus for East Asian, 13 loci for European and 2 loci for South Asian participants (figure 2, online supplemental table 5 and figure 2). The major chromosome 6 LPA locus was the most pronounced signal, harboured numerous independent secondary signals (online supplemental tables 6 and 7) and had lead variants with very strong (SD increment per effect allele copy: |0.74–1.21|) effects. LPA also demonstrated a unique genetic architecture for each ancestral population. The African LPA lead variant (rs41269135, A effect allele) was monomorphic (ie, only the non-effect allele G was observed) in East Asian, European and South Asian participants, and the European LPA lead variant (rs10455872, A effect allele) was monomorphic in East Asian participants. Other notable Lp(a) loci included APOE on chromosome 19, which was identified at genome-wide significance levels for African (lead variant rs7412, T effect allele), European (lead variant rs1065853, T effect allele) and South Asian participants (lead variant rs7412, T effect allele). APOE lead variants rs7412 and rs1065853 were in high LD (eg, R2=1 in 1000 Genomes EUR) (online supplemental table 5), suggesting that these lead variants tag the same locus.

Figure 2

Genome-wide significant (p<5×10−9) loci discovered in ancestry-specific lipoprotein(a) genome-wide association studies of African (n=19 333), East Asian (n=2895), European (n=354 843) and South Asian (n=8192) Population Architecture using Genomics and Epidemiology study and UK Biobank participants.

Variant-based heritability and percentage variance explained

Ancestry specific Lp(a) variant-based heritability estimates ranged from 13.1% in East Asian to 38.4% in African participants for the entire genome (online supplemental table 8). To reduce uncertainty and better capture the oligogenic architecture of Lp(a), we also estimated variant-based heritability restricting to the LPA locus. When restricting to the LPA locus, ancestry-specific variant-based heritability estimates approximately doubled in magnitude (LPA-specific h2 range: 22.5% in East Asian to 76.1% in European participants). However, smaller increases were observed for African participants (LPA-specific h2=45.7%; genome-wide h2=38.4%).

Ancestry specific PRS performance

We evaluated seven approaches for estimating ancestry-specific Lp(a) PRS (figure 3, online supplemental table 9). The external PRS that included 43 LPA variants43 was highly predictive in Europeans (R2=50%), but was not transferable across ancestries (R2 range: 2%–11%). Lp(a) PRS prediction accuracy in non-European populations increased considerably when ancestry-specific discovery data were used, with R2=12%–25% in African and R2=10%–12% in East Asian populations despite discovery populations that were modest in size (n=11 287 in Africans and n=706 in East Asians). For South Asians, for whom an independent discovery population was unavailable, increased prediction accuracy was achieved using Crosspred, the approach that did not require one (R2=20%). Prediction accuracy was also increased in East Asian populations (R2=15%) when Crosspred was used. When Lp(a) PRS were extended to include regions outside LPA, prediction accuracy was increased for the African population (R2=27%), but was substantially decreased (R2 range: 9%–27%) for East Asian, European and South Asian populations. The most accurate PRS for each ancestral population showed good to excellent calibration (online supplemental figure 1) and varied in the ability to distinguish participants with high Lp(a) levels (AUC range: 0.66 (East Asians)–0.90 (Europeans)) (online supplemental table 9).

Figure 3

Performance of lipoprotein(a) polygenic risk scores (PRS) by method across study populations in n=385 263 participants of African (n=19 333), East Asian (n=2895), European (n=354 843) and South Asian (n=8192) ancestry from the Population Architecture using Genomics and Epidemiology study and UK Biobank.

PRS certainty

We quantified certainty in PRS at the individual level across different PRS thresholds that might be used to identify high-risk populations (table 1). For the 90th–99th percentile thresholds, PRS assignment to the upper threshold was most certain for European participants (84.7%–94.4% of participants with 95% credible intervals fully contained in the upper threshold) and least certain for East Asian participants (9.1%–19.6% of participants with 95% credible intervals fully contained in the upper threshold). In contrast, certainty in PRS assignment to the lower threshold was high for all populations (90th–99th percentile range: 88.6%–99.9%). We also examined an ancestry-specific threshold corresponding to a mean Lp(a) ≥125 nmol/L in African and European participants (online supplemental figure 1). This threshold occurred at the 80th percentile for both African and European participants. Approximately 94.2% of European and 65.5% of African participants with PRS assigned to the upper threshold had their 95% credible intervals fully contained in the upper threshold. Certainty in assignment to the lower threshold remained high for both European (98.1%) and African (87.9%) participants.

Table 1

PRS-based individual stratification uncertainty across four populations (European, African, South Asian and East Asian) and four thresholds in n=357 096 UK Biobank participants at study baseline

Application of PRS in external studies to identify potentially novel Lp(a) effects

A total of 23 circulatory system phenotypes were nominally associated with genetically inferred Lp(a) in at least one BioMe population (online supplemental table 10). Of note, established associations with ischaemic heart disease (IHD) and coronary atherosclerosis (CAD) were observed only for Hispanic/Latino participants (n=1400 IHD cases; n=1200 CAD cases). These observations may reflect increased statistical power in Hispanic/Latino participants relative to African American (n=680 IHD cases; n=620 CAD cases) and White (n=740 IHD cases; n=710 CAD cases) participants.

Causal inference using PRS as instrumental variables

For eGFR, estimates from MR uniformly indicated no causal effect of Lp(a) (table 2) (p value range: 0.15–0.76). These results contrasted with results obtained using standard methods evaluating measured Lp(a), which instead suggested that Lp(a) was inversely and significantly associated with eGFR in European (p value =1.7×10−8), South Asian (p value =0.039) and African (p value =0.014) populations. For HbA1c, both statistical methods suggested that increased Lp(a) was associated with increased HbA1c in European ancestry participants (p values <1.4×10−6). In contrast, MR estimates of the effect of Lp(a) on HbA1c were inverse in African and East Asian populations.

Table 2

Mendelian randomisation causal estimates and standard estimates for lipoprotein(a) with glycated haemoglobin and estimated glomerular filtration rate in UK Biobank participants at study baseline by ancestral population

Discussion

In this study, we demonstrated how the development and application of Lp(a) PRS in ancestrally diverse populations could help address longstanding Lp(a) evidence gaps. Importantly, studies of ancestrally diverse populations contributed unique information even when the PRS were less accurate than PRS in European populations. As new therapies for Lp(a) emerge, expanding ancestral diversity even further will help strengthen genetically informed risk prediction, enable novel biological insight and ensure that the promises of precision medicine are relevant for all populations.

Despite great interest in Lp(a) PRS and repeated calls for expanding ancestral diversity when constructing and testing PRS more generally, studies continue to prioritise European populations.22 Lp(a) is no exception. Three of five published studies that constructed Lp(a) PRS in the UK Biobank were restricted to European populations, although the UK Biobank included almost 20 000 participants of diverse ancestries.8 53 54 In the two prior studies that included African, East Asian and South Asian UK Biobank participants, Lp(a) PRS accuracy was poor to moderate (eg, R2 ranged from 0.3% in Africans to 16% in South Asians55). Such low accuracy likely reflects the application of discovery data and analytic methods that perform well in European populations, but poorly in other populations. We built on these efforts by prioritising ancestrally diverse populations and sharing our results publicly to enable further efforts. Our results demonstrated that increasing PRS accuracy in non-European populations is feasible and, unlike PRS for many other cardiovascular phenotypes,56 does not require prohibitively large sample sizes. Indeed, PRS built from modestly sized samples of ancestry-matched discovery data outperformed PRS that substituted discovery data collected in several hundred thousand participants of European ancestry, although these PRS remained considerably less accurate than PRS in European populations.

Adding further insight to the clinical application of Lp(a) PRS were estimates of individual PRS certainty. Certainty is an important companion metric to commonly reported population level measures like R2 that do not quantify variability in individual PRS estimates. We demonstrated that Lp(a) PRS showed promise as a ‘rule out’ for elevated Lp(a) because certainty of assignment to the low-risk threshold was high for individuals in all populations and across the four thresholds we evaluated. In contrast, while Lp(a) PRS may be useful to identify individuals of European ancestry for follow-up Lp(a) testing (ie, to ‘rule in’ potentially high Lp(a)), certainty was low for African, East Asian and South Asian participants. Improving Lp(a) PRS certainty for non-European populations assigned to high Lp(a) PRS thresholds—the individuals at highest risk of Lp(a)-associated CVD—is important because Lp(a) remains a ‘hidden CVD risk factor’. In addition to not being part of a standard lipid panel, clinical guidelines do not routinely recommend Lp(a) measurement.11 12 Even as Lp(a) measurement becomes more routine, integration of Lp(a) PRS risk thresholds into medical records or even direct to consumer testing57 could help broaden Lp(a) testing uptake in individuals with high PRS estimates. Although clinical application remains rare, Lp(a) PRS assignment and risk stratification could be performed using summary statistics and workflows, as presented herein, assuming genotypic data are available. 
These efforts are enabled by growing interest in merging electronic health records and genotypic data,58 the potential for Lp(a) PRS to capture lifetime elevations in Lp(a), the growing popularity of direct to consumer genetic testing, and in the USA, decreasing primary care receipt among individuals with no evident chronic medical conditions.59 Integrating Lp(a) PRS into electronic health records could also inform identification of clinical trial participants.54 Although Lp(a) presents a good ‘test case’ for PRS-informed screening given the modest to high accuracy of PRS, without additional work increasing PRS accuracy for everyone, these efforts will continue to benefit European populations primarily.

One outstanding question is how the genetic architecture of Lp(a) varies by ancestry and what the phenotypic consequences of this variation are. One potential major driver of Lp(a) ancestral differences is the number of LPA kringle IV type 2 (KIV-2) repeats.18 19 KIV-2 repeats are not called in array data, although methods that capture KIV-2 repeats from sequencing data are emerging.60 Because array data are more commonly available, the degree to which these innovations will become integrated into PRS estimation is unclear. However, causal inference, particularly the development of genetic instruments for extremely low Lp(a), is one area where KIV-2 enabled PRS accuracy gains could be prioritised. Studies in European populations have not identified a strong genetic instrument for extremely low Lp(a).61 The phenotypic consequences of extremely low Lp(a) are poorly understood, although emerging therapies may reduce Lp(a) by ~80%62 and associations with increased type 2 diabetes risk have been reported.50 Like European populations, large proportions of East Asian populations harbour very low Lp(a) levels. These features, combined with marked ancestral heterogeneity in Lp(a)’s genetic architecture, motivate studies in East Asian populations that construct genetic instruments for extremely low Lp(a) and, if feasible, apply these instruments for causal inference.

East Asian populations are not the only populations for which Lp(a)’s ancestral heterogeneity could enable natural experiments that would be challenging to conduct in studies of European populations. For example, there is a limited understanding of how variants outside LPA regulate Lp(a).61 The genetic architecture of Lp(a) in African populations may be particularly well suited to examine this question because, while the LPA locus explains almost all Lp(a) genetic variance in European populations, LPA accounts for ~50% of genetic variance in African populations.21 This finding is consistent with our observations that genome-wide Lp(a) heritability was highest in African populations and that the best-performing PRS in African populations included variants outside LPA. However, African populations remain severely under-represented in Lp(a) GWAS.

Finally, we demonstrated how ancestrally diverse PRS could be applied in an external biobank to identify potentially novel associations. To date, available Lp(a) PheWAS have largely focused on cardiovascular traits in predominantly European populations.7 8 63 Continued prioritisation of European populations in Lp(a) PheWAS assumes that there is no ancestral heterogeneity in the genetic architecture of Lp(a) or its phenotypic effects and that European populations can support well-powered studies of all phenotypes of interest. Results from BioMe demonstrate how unrealistic these assumptions are. Although accuracy of the Lp(a) PRS in Hispanic/Latinos is likely not as high as accuracy in European populations due to reduced portability of European-derived PRS, known associations with IHD and CAD were identified only in Hispanic/Latinos. It remains unknown how many other associations may be missed due to ongoing biases toward European populations.

Several limitations of the present study warrant consideration. First, our results may be limited by the lack of standardised methods used to measure Lp(a). However, little evidence of within-population effect heterogeneity was observed, suggesting modest influence. Second, KIV-2 repeats were not evaluated because sequencing data are not universally available. Studies have demonstrated the existence of common variants in LD with KIV-2 repeats, which capture some degree of KIV-2 repeat variability, thus reducing this concern.64 Third, while incorporating ancestry-specific genetic information in developing PRS substantially improved predictive values within each ancestral population, there remained a significant imbalance in predictive performance. This limitation stresses that achieving equally informative PRS across ancestrally diverse populations will not be possible without more fundamental shifts toward increasing diversity in genetic research.

In conclusion, the present study emphasises the need to expand scientific inquiry into Lp(a) through deliberate prioritisation of ancestrally diverse populations. Although we used PRS as an example of how such expansion could broaden understanding of Lp(a), we anticipate that there will be further gains from increased diversity that do not include genomics or PRS more specifically. At best, continued failure to expand Lp(a) research priorities to include ancestrally diverse populations will perpetuate Lp(a) evidence gaps. At worst, this continued failure will perpetuate health disparities by limiting the relevance of precision medicine in non-European populations.

Data availability statement

Data may be obtained from a third party and are not publicly available. All data are available through parent studies.

Ethics statements

Patient consent for publication

Ethics approval

This study involves human participants and was approved by the institutional review board of the University of North Carolina at Chapel Hill (#19-2281) and complies with the Declaration of Helsinki. Participants gave informed consent to participate in the study before taking part.


Supplementary materials

  • Supplementary Data

    This web only file has been produced by the BMJ Publishing Group from an electronic file supplied by the author(s) and has not been edited for content.


  • Twitter @jantonioglopez, @christy_avery

  • CLA and MG contributed equally.

  • Contributors All authors reviewed this work and agree to publication. CLA acts as guarantor and accepts full responsibility for the work and the conduct of the study, had full access to the data, and controlled the decision to publish.

  • Funding This work was supported by UK Biobank application 25953. The following grants supported this study: R01HL152828 (Avery, Ballou, Howard, North), R01HL151152 (Avery, Gignoux, Graff, North), R01HG010297, R01HG011345 (Avery, Gignoux, Graff, North), T32HL007055 (Lee) and F32HL149256 (Lee). Amgen Inc (Thousand Oaks, California) partially funded this study. The PAGE (Population Architecture Using Genomics and Epidemiology) programme is funded by the National Human Genome Research Institute with cofunding from the National Institute on Minority Health and Health Disparities and the National Heart, Lung, and Blood Institute. Assistance with data management, data integration, data dissemination, genotype imputation, ancestry deconvolution, population genetics, analysis pipelines and general study coordination was provided by the PAGE Coordinating Center (NIH U01HG007419). Genotyping services were provided by the Center for Inherited Disease Research, which is fully funded through a federal contract from the National Institutes of Health (NIH) to The Johns Hopkins University, contract number HHSN268201200008I. Genotype data quality control and quality assurance services were provided by the Genetic Analysis Center in the Biostatistics Department of the University of Washington, through support provided by the Center for Inherited Disease Research contract.
PAGE data and materials included in this report were funded through the following studies and organisations: (1) The Atherosclerosis Risk in Communities study (ARIC): The Atherosclerosis Risk in Communities study has been funded in whole or in part with Federal funds from the National Heart, Lung, and Blood Institute, National Institutes of Health, Department of Health and Human Services (contract numbers HHSN268201700001I, HHSN268201700002I, HHSN268201700003I, HHSN268201700004I and HHSN268201700005I), R01HL087641, R01HL059367 and R01HL086694; National Human Genome Research Institute contract U01HG004402; and National Institutes of Health contract HHSN268200625226C. Measurement of Lp(a) was supported by Denka Seiken. The authors thank the staff and participants of the ARIC study for their important contributions. Infrastructure was partly supported by Grant Number UL1RR025005, a component of the National Institutes of Health and NIH Roadmap for Medical Research. (2) The Coronary Artery Risk Development in Young Adults Study (CARDIA): The Coronary Artery Risk Development in Young Adults Study (CARDIA) is supported by contracts HHSN268201800003I, HHSN268201800004I, HHSN268201800005I, HHSN268201800006I and HHSN268201800007I from the National Heart, Lung, and Blood Institute (NHLBI). CARDIA is also partially supported by the Intramural Research Program of the National Institute on Aging (NIA) and an intra-agency agreement between NIA and NHLBI (AG0005). GWAS genotyping and data analyses were funded in part by grants U01-HG004729 and R01-HL093029 from the National Institutes of Health to Dr Myriam Fornage. (3) Women’s Health Initiative (WHI): The WHI programme is funded by the National Heart, Lung, and Blood Institute, National Institutes of Health, US Department of Health and Human Services through contracts 75N92021D00001, 75N92021D00002, 75N92021D00003, 75N92021D00004, 75N92021D00005. 
(4) The Jackson Heart Study (JHS) is supported and conducted in collaboration with Jackson State University (HHSN268201800013I), Tougaloo College (HHSN268201800014I), the Mississippi State Department of Health (HHSN268201800015I) and the University of Mississippi Medical Center (HHSN268201800010I, HHSN268201800011I and HHSN268201800012I) contracts from the National Heart, Lung, and Blood Institute (NHLBI) and the National Institute on Minority Health and Health Disparities (NIMHD). The authors also wish to thank the staffs and participants of the JHS. (5) The MESA project is conducted and supported by the National Heart, Lung, and Blood Institute (NHLBI) in collaboration with MESA investigators. Support for MESA is provided by contracts 75N92020D00001, HHSN268201500003I, N01-HC-95159, 75N92020D00005, N01-HC-95160, 75N92020D00002, N01-HC-95161, 75N92020D00003, N01-HC-95162, 75N92020D00006, N01-HC-95163, 75N92020D00004, N01-HC-95164, 75N92020D00007, N01-HC-95165, N01-HC-95166, N01-HC-95167, N01-HC-95168, N01-HC-95169, UL1-TR-000040, UL1-TR-001079, UL1-TR-001420. Also supported in part by the National Center for Advancing Translational Sciences, CTSI grant UL1TR001881, and the National Institute of Diabetes and Digestive and Kidney Disease Diabetes Research Center (DRC) grant DK063491 to the Southern California Diabetes Endocrinology Research Center. Infrastructure for the CHARGE Consortium is supported in part by the National Heart, Lung, and Blood Institute (NHLBI) grant R01HL105756. (6) BioMe: The Mount Sinai BioMe Biobank is supported by The Andrea and Charles Bronfman Philanthropies. We thank all participants and all our recruiters who have assisted and continue to assist in data collection and management. We are grateful for the computational resources and staff expertise provided by Scientific Computing at the Icahn School of Medicine at Mount Sinai. 
Whole genome sequencing (WGS) for the Trans-Omics in Precision Medicine (TOPMed) programme was supported by the National Heart, Lung and Blood Institute (NHLBI). WGS for 'NHLBI TOPMed: Jackson Heart Study' (phs000964) was performed at the Northwest Genomics Center (HHSN268201100037C). Core support including centralised genomic read mapping and genotype calling, along with variant quality metrics and filtering were provided by the TOPMed Informatics Research Center (3R01HL-117626-02S1; contract HHSN268201800002I). Core support including phenotype harmonisation, data management, sample-identity QC and general programme coordination were provided by the TOPMed Data Coordinating Center (R01HL-120393; U01HL-120393; contract HHSN268201800001I). We gratefully acknowledge the studies and participants who provided biological samples and data for TOPMed. The project described was supported by the National Center for Advancing Translational Sciences, National Institutes of Health, through Grant KL2TR002490 (LMR). The views expressed in this manuscript are those of the authors and do not necessarily represent the views of the National Heart, Lung, and Blood Institute; the National Institutes of Health; or the US Department of Health and Human Services.

  • Competing interests None declared.

  • Provenance and peer review Not commissioned; externally peer reviewed.

  • Supplemental material This content has been supplied by the author(s). It has not been vetted by BMJ Publishing Group Limited (BMJ) and may not have been peer-reviewed. Any opinions or recommendations discussed are solely those of the author(s) and are not endorsed by BMJ. BMJ disclaims all liability and responsibility arising from any reliance placed on the content. Where the content includes any translated material, BMJ does not warrant the accuracy and reliability of the translations (including but not limited to local regulations, clinical guidelines, terminology, drug names and drug dosages), and is not responsible for any error and/or omissions arising from translation and adaptation or otherwise.