ELSA provides genetic data to support detailed analyses of how genetic factors are associated with a wide range of age-related traits and outcomes in our large, phenotypically well-characterised sample of older people born in England.

Some researchers who wish to use genetic data in their research may not have access to the tools or knowledge needed to carry out robust quality control or imputation of untyped genotypes, so we have carried out these steps within the team and made the resulting data available for all to use.

Several genetic data products derived from our sample are available, as detailed below.

If you have any questions about the ELSA genetic data products, contact the ELSA Team and we will be happy to help you. 

Directly genotyped data

Available data

Directly genotyped data for 7,412 ELSA participants and 2,230,767 SNPs. These data are for researchers who wish to carry out the quality control of the genotyped genetic data in ELSA.

Access

These data are accessed via the European Genome-phenome Archive (EGA).

In detail

The genome-wide genotyping was performed at University College London (UCL) Genomics in 2013-2014.

This involved genotyping of 7,597 ELSA participants of European ancestry using Illumina HumanOmni2.5 BeadChips (HumanOmni2.5-4v1, HumanOmni2.5-8v1.3), which measure ~2.5 million markers that capture genomic variation down to 2.5% minor allele frequency. Genotyping was performed in two batches.

Allele frequencies were compared between the batches after filtering at a 5% missingness threshold. The between-batch correlation was calculated for a number of chromosomes and exceeded 99%.
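The batch comparison can be illustrated with a minimal Python sketch. It is not the code used by the ELSA team. It assumes genotypes are coded as 0, 1 or 2 copies of one allele with None for a missing call, and that "filtering" means dropping SNPs with more than 5% missing calls in either batch. All function names are hypothetical.

```python
from math import sqrt

def missing_rate(genotypes):
    """Fraction of missing (None) calls for one SNP."""
    return sum(g is None for g in genotypes) / len(genotypes)

def allele_frequency(genotypes):
    """Allele frequency from 0/1/2 calls, ignoring missing calls."""
    called = [g for g in genotypes if g is not None]
    return sum(called) / (2 * len(called))

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)

def batch_frequency_correlation(batch1, batch2, max_missing=0.05):
    """batch1, batch2: dict mapping SNP id -> list of calls in that batch.

    Assumption: a SNP is compared only if its missing rate is at most
    max_missing in both batches.
    """
    freqs1, freqs2 = [], []
    for snp in batch1.keys() & batch2.keys():
        calls1, calls2 = batch1[snp], batch2[snp]
        if missing_rate(calls1) > max_missing or missing_rate(calls2) > max_missing:
            continue
        freqs1.append(allele_frequency(calls1))
        freqs2.append(allele_frequency(calls2))
    return pearson(freqs1, freqs2)
```

A correlation close to 1 across SNPs indicates that the two batches give consistent allele frequency estimates.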

After post-genotyping quality assurance, such as excluding ethnic outliers (self-reported) and duplicates, GWAS data were available for a total of 7,412 ELSA participants and 2,230,767 SNPs.

For more information, contact the ELSA Team.

Quality controlled genetic data

Available data

Using methods employed in the Health and Retirement Study, we carried out robust quality control of the genotyped genetic data in ELSA. In total, 7,183 samples (96.9% of the original cohort of 7,412) and 1,372,240 variants (61.5% of 2,230,767) remained after quality control. These data are for researchers who wish to use clean genotyped data in their research but do not have the knowledge or tools needed to carry out quality control of large genetic datasets.

Access

These data are accessed directly from the ELSA Team.

In detail

Quality control was performed using PLINK, R and VCFtools. The full QC procedure is depicted in Figure 1.

Samples were removed based on call rate (<0.99), suspected non-European ancestry, autosomal heterozygosity deviation (|Fhet|<0.2), and relatedness. SNPs were excluded if the minor allele frequency (MAF) was <0.01%, if more than 2% of genotype data were missing, or if the Hardy-Weinberg Equilibrium (HWE) P-value was <10⁻⁴.

Non-autosomal markers, including those on chromosome X, and indels were also excluded.
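The variant-level filters can be sketched in Python as follows. This is an illustration under stated assumptions, not the pipeline that was run (which used PLINK, R and VCFtools). Genotypes are assumed to be coded 0/1/2 with None for missing calls. The HWE test here is a simple one-degree-of-freedom chi-square test, chosen for brevity, and may differ from the test used in the actual QC. The MAF threshold is left as a required argument, and all function names are hypothetical.

```python
from math import erfc, sqrt

def hwe_chi2_pvalue(n_aa, n_ab, n_bb):
    """Chi-square (1 df) P-value for departure from Hardy-Weinberg Equilibrium.

    Simplification: genetics tools often use an exact test instead.
    """
    n = n_aa + n_ab + n_bb
    p = (2 * n_aa + n_ab) / (2 * n)
    q = 1 - p
    expected = (n * p * p, 2 * n * p * q, n * q * q)
    if min(expected) == 0:
        return 1.0  # monomorphic SNP: no departure can be tested
    observed = (n_aa, n_ab, n_bb)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    # Survival function of the chi-square distribution with 1 df
    return erfc(sqrt(chi2 / 2))

def passes_variant_qc(genotypes, min_maf, max_missing=0.02, min_hwe_p=1e-4):
    """Return True if a SNP passes the missingness, MAF and HWE filters."""
    called = [g for g in genotypes if g is not None]
    if not called:
        return False
    if 1 - len(called) / len(genotypes) > max_missing:
        return False
    freq = sum(called) / (2 * len(called))
    if min(freq, 1 - freq) < min_maf:
        return False
    counts = [called.count(k) for k in (0, 1, 2)]
    return hwe_chi2_pvalue(*counts) >= min_hwe_p
```

For example, a SNP typed in 100 people with genotype counts 25/50/25 passes all three filters, whereas a SNP where every person is heterozygous fails the HWE filter.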

In total, 7,183 samples (96.9% of 7,412 original cohort) and 1,372,240 (61.5% of 2,230,767) variants remained after quality control.

Download the detailed report.

Figure 1. QC steps that were undertaken as part of quality control in ELSA (Download the image)

Genetic imputation

Data available

For researchers who wish to use cleaned genetic data that include imputed estimates of genotypes that were not assayed, we imputed the untyped genotypes using the Haplotype Reference Consortium (HRC.r1-1.GRCh37) as the reference panel and subsequently carried out robust quality control of the imputed data.

Access

Imputed data, either raw or quality controlled, are available directly from the ELSA Team.

In detail

To estimate genotypes that were not assayed, imputation was performed on the Michigan Imputation Server [5], running SHAPEIT for pre-phasing [6] and Minimac3 for imputation [7,8], using the Haplotype Reference Consortium (HRC.r1-1.GRCh37) [5,9] as the reference panel. All variants were aligned to human genome build 19 (hg19).

After imputation, we required very high imputation quality (INFO>0.95) and low missingness (<1%) for further quality control. We limited our analyses to genotyped or imputed variants with an HWE P-value>10⁻⁵. We further applied stringent pruning to remove markers in high linkage disequilibrium (r²>0.1) and excluded genomic regions of high linkage disequilibrium. To investigate population structure, we chose less correlated SNPs for principal components analysis.

SNP pruning was performed using the following procedure:

i) Consider a window of 50 SNPs

ii) Calculate linkage disequilibrium between each pair of SNPs in the window

iii) Remove one of a pair of SNPs if the linkage disequilibrium is greater than 0.5

iv) Shift the window 5 SNPs forward

v) Repeat the procedure.
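The sliding-window procedure above can be sketched in Python as follows. This is an illustration, not the tool that was actually run. It assumes linkage disequilibrium is measured as the squared Pearson correlation (r²) between genotype vectors coded 0/1/2 with no missing values, and it drops the later SNP of each correlated pair, which is an arbitrary choice because the procedure does not say which SNP of a pair is removed. The function names are hypothetical. The parameters (window 50, step 5, threshold 0.5) resemble those accepted by PLINK's --indep-pairwise command, although whether that command was used is not stated here.

```python
def r_squared(x, y):
    """Squared Pearson correlation between two genotype vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # a monomorphic SNP is treated as uncorrelated with everything
    return cov * cov / (vx * vy)

def ld_prune(snps, window=50, step=5, threshold=0.5):
    """snps: list of genotype vectors in genomic order.

    Returns the indices of the SNPs that are kept.
    """
    removed = set()
    start = 0
    while start < len(snps):
        end = min(start + window, len(snps))
        in_window = [i for i in range(start, end) if i not in removed]
        for pos, i in enumerate(in_window):
            if i in removed:
                continue
            for j in in_window[pos + 1:]:
                if j in removed:
                    continue
                if r_squared(snps[i], snps[j]) > threshold:
                    removed.add(j)  # assumption: drop the later SNP of the pair
        start += step  # shift the window forward and repeat
    return [i for i in range(len(snps)) if i not in removed]
```

Because only SNPs that share a window are compared, two highly correlated SNPs that never fall in the same window are both kept.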

Altogether, 1,083,252 autosomal SNPs remained after the SNP pruning and were used to run principal components analysis. The top 10 principal components were retained to account for any ancestry differences in genetic structure that could potentially bias the results.
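As a rough illustration of how principal components capture population structure, the Python sketch below estimates only the leading component using power iteration on a sample-by-sample similarity matrix. It is a toy under stated assumptions: genotypes are coded 0/1/2 with no missing values, SNPs are mean-centred but not rescaled, and the function name is hypothetical. The analysis described above retained the top 10 components and would have used dedicated software.

```python
from math import sqrt

def first_principal_component(genotypes, iterations=200):
    """genotypes: list of samples, each a list of 0/1/2 calls per SNP.

    Returns a unit vector with one entry per sample, proportional to the
    sample scores on the leading principal component.
    """
    n, m = len(genotypes), len(genotypes[0])
    # Centre each SNP (column) on its mean across samples
    means = [sum(row[j] for row in genotypes) / n for j in range(m)]
    centred = [[row[j] - means[j] for j in range(m)] for row in genotypes]
    # Sample-by-sample similarity (Gram) matrix
    gram = [[sum(a * b for a, b in zip(ri, rk)) / m for rk in centred]
            for ri in centred]
    # Power iteration from a fixed, non-constant starting vector
    v = [float(i + 1) for i in range(n)]
    for _ in range(iterations):
        w = [sum(g * x for g, x in zip(row, v)) for row in gram]
        norm = sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("no genetic variation between samples")
        v = [x / norm for x in w]
    return v
```

Samples from genetically distinct groups tend to receive scores of opposite sign on this component, which is why such components are included as covariates in association analyses.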

After the sample quality control, 7,179,780 variants and 7,183 samples were kept.

More detail can be downloaded here.

Genetic imputation using the reference from 1000 Genomes Project

Prior to the method described above, genetic imputation was carried out using the reference panel from the 1000 Genomes Project.

Download the report.

Polygenic Score Data (PGS)

Data available

Polygenic Score Data (PGS) are available for a number of behavioural, emotional and health-related phenotypes.

Access

These data will soon be available publicly through the UK Data Service. In the meantime, please contact the ELSA Team.

In detail

Many health and behavioural outcomes, such as smoking, obesity, Alzheimer’s disease and schizophrenia, have been shown to be highly polygenic implying that their genetic architecture consists of “many” genetic variants. Creating PGSs is a method that captures this signature. A polygenic score (PGS) aggregates millions of individual loci across the human genome and weights them by the strength of their association to produce a single quantitative measure of genetic risk.

The methods employed for creating PGSs in ELSA are those outlined by the Health and Retirement Study (HRS). This was done in order to harmonise the research across age-related longitudinal studies by adopting a consistent methodology for creating PGSs. By making these PGSs publicly available, it is hoped that they will facilitate wide use among the ELSA data users.

PGSs for each phenotype are based on a single, replicated genome-wide association study (GWAS). These scores will be updated as sufficiently large GWAS are published for new phenotypes or as updated meta-analyses for existing phenotypes are released.

Download a detailed report describing the methods employed.

Download a detailed list of the PGS available.
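The weighted aggregation behind a polygenic score can be sketched in a few lines of Python. This is a simplified illustration, not the ELSA or HRS method, which is described in the detailed report. It assumes each person's genotype is given as the number of effect alleles (0 to 2) per SNP and that the weights are per-SNP effect sizes taken from a GWAS. The function name and SNP identifiers are hypothetical.

```python
def polygenic_score(dosages, weights):
    """Weighted sum of effect-allele counts for one person.

    dosages: dict mapping SNP id -> effect-allele count (0 to 2)
    weights: dict mapping SNP id -> GWAS effect size
    Assumption: SNPs missing from either input are skipped.
    """
    shared = dosages.keys() & weights.keys()
    return sum(weights[snp] * dosages[snp] for snp in shared)

# Hypothetical example values
weights = {"rs1": 0.5, "rs2": -0.25, "rs3": 0.1}
person = {"rs1": 2, "rs2": 1, "rs4": 2}
score = polygenic_score(person, weights)
```

In this example only rs1 and rs2 appear in both inputs, so the score is 0.5 × 2 + (−0.25) × 1 = 0.75.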


