Day-10 - NHGRI GWAS catalogue

Post date: Jul 13, 2011 2:44:00 PM

There is another nice resource out there called the catalog of published genome-wide association studies, available at http://www.genome.gov/gwastudies/. It has its limitations, but it is a nice list of associations that we can be pretty confident about. So what I decided to do here was to compare my 23andMe results with this list.

I first made a "light" version of the download (www.genome.gov/admin/gwascatalog.txt), keeping only the fields of interest (positions, gene names, control frequency):

Adiposity 13 80959207 SPRY2 SPRY2 - ARF4P4 rs534870-A rs534870 0.68

Adiposity 16 53816275 FTO FTO rs8050136-C rs8050136 0.60

Schizophrenia 19 42066279 ATP5SL, CEACAM21 PLEKHA3P1 - CEACAM21 rs4803480-A rs4803480 0.13

...
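A minimal sketch of how such a "light" file could be cut from the full download, written in Python rather than taken from the attached scripts. The header names in ASSUMED_FIELDS are an assumption about the catalogue file and should be checked against its first line; make_lite is a hypothetical helper name.

```python
import csv
import io

# Assumed header names; check them against the first line of the real download.
ASSUMED_FIELDS = [
    "Disease/Trait", "Chr_id", "Chr_pos", "Reported Gene(s)", "Mapped_gene",
    "Strongest SNP-Risk Allele", "SNPs", "Risk Allele Frequency",
]

def make_lite(handle, fields=ASSUMED_FIELDS):
    """Yield one list per catalogue row, keeping only the named columns."""
    reader = csv.DictReader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        yield [row[name] for name in fields]

# Tiny in-memory stand-in for the download, built from one row shown above.
demo = io.StringIO(
    "PUBMEDID\t" + "\t".join(ASSUMED_FIELDS) + "\n"
    "(not shown)\tAdiposity\t16\t53816275\tFTO\tFTO\trs8050136-C\trs8050136\t0.60\n"
)
lite_rows = list(make_lite(demo))
print("\t".join(lite_rows[0]))
```

For the real file, the in-memory demo would be replaced by an open file handle on gwascatalog.txt.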

I also wanted some Caucasian (HapMap CEU) minor allele frequency data (get_maf.pl is attached):

wget http://hgdownload.cse.ucsc.edu/goldenPath/hg18/database/hapmapSnpsCEU.txt.gz

gunzip hapmapSnpsCEU.txt.gz

perl get_maf.pl > ceu.maf

grep -v arning ceu.maf > ceu.maf.clean
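get_maf.pl itself is only available as an attachment, so the following is a sketch of the kind of calculation it presumably performs, not the script's actual code. It assumes the UCSC hapmapSnpsCEU column order given in the docstring, and the example row (including the name rs_example and its counts) is made up for illustration.

```python
def minor_allele_freq(homo1, homo2, hetero):
    """Return the minor allele frequency from genotype counts, or None if no data."""
    allele1 = 2 * homo1 + hetero  # homozygotes carry two copies, heterozygotes one
    allele2 = 2 * homo2 + hetero
    total = allele1 + allele2
    if total == 0:
        return None
    return min(allele1, allele2) / total

def parse_hapmap_line(line):
    """Return (rsid, maf) for one tab-separated hapmapSnpsCEU row.

    Assumed column order: bin, chrom, chromStart, chromEnd, name, score,
    strand, observed, allele1, homoCount1, allele2, homoCount2, heteroCount.
    """
    f = line.rstrip("\n").split("\t")
    return f[4], minor_allele_freq(int(f[9]), int(f[11]), int(f[12]))

# Made-up row: 58 homozygotes for A, none for G, 2 heterozygotes.
example = "585\tchr1\t100\t101\trs_example\t0\t+\tA/G\tA\t58\tG\t0\t2"
rsid, maf = parse_hapmap_line(example)
print(rsid, round(maf, 4))
```

With these illustrative counts the minor allele is seen 2 times out of 120 chromosomes.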

I then ran parse_gwas_catalogue.pl (attached) to make a file called gwascatalogue.res.txt:

perl parse_gwas_catalogue.pl gwascatalog.txt.lite
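The parsing script is not reproduced in this post. One step its output implies is counting how many copies of the catalogue's risk allele appear in my genotype (the 'dose' column). A minimal sketch of that count, assuming the risk allele is the single base after the last hyphen and ignoring strand differences between the catalogue and the genotype file; risk_allele_dose is a hypothetical name:

```python
def risk_allele_dose(snp_allele, genotype):
    """Count copies of the risk allele (e.g. 'rs649891-C') in a genotype (e.g. 'CC')."""
    risk = snp_allele.rsplit("-", 1)[-1].strip().upper()
    if risk not in {"A", "C", "G", "T"}:
        return None  # no usable single-base risk allele in this entry
    return genotype.upper().count(risk)

print(risk_allele_dose("rs649891-C", "CC"))  # 2
print(risk_allele_dose("rs199533-C", "AA"))  # 0
```

A real version would also need to decide what to do when the two sources report alleles on opposite strands, which this sketch does not attempt.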

I wanted SNPs that were rare. The file contains both the control frequency reported by each study and the CEU frequency. Here, I'm pulling out SNPs with a frequency of at most 2.5% in CEU (I think this is more accurate than the frequencies listed in the studies).

gawk -F "\t" '$11 <= 0.025 {print }' gwascatalogue.res.txt

disease chr pos gene P..down snp_allele snp risk_all_cont cod_geno dose ceu_maf

Type 2 diabetes 9 10430602 PTPRD PTPRD rs649891-C rs649891 0.35 CC 2 0.0167

Optic disc parameters 1 92077097 CDC7, TGFBR3 RPL39P13 - HSP90B3P rs1192415-G rs1192415 NR GG 2 0.0167

Parkinson's disease 17 44828931 MAPT NSF rs199533-C rs199533 0.78 AA 0 0.0167

Non-alcoholic fatty liver disease histology (AST) 9 78425925 Intergenic OSTF1 - PCSK5 rs12344488-A rs12344488 0.07 AA 2 0.0169

Optic disc parameters 1 92077097 CDC7,TGFBR3 RPL39P13 - HSP90B3P rs1192415-G rs1192415 0.18 GG 2 0.0167

Optic disc size (disc) 1 92077097 HSP90B3P RPL39P13 - HSP90B3P rs1192415-A rs1192415 NR GG 0 0.0167

Ankylosing spondylitis 5 96129512 ERAP1 ERAP1 rs27434-A rs27434 0.23 AA 2 0.0167

Parkinson's disease 17 44828931 NSF NSF rs199533-C rs199533 0.83 AA 0 0.0167

Parkinson's disease 17 43719143 MAPT, C17orf69, KIAA1267, LOC644246, IMP5 C17orf69 rs393152-A rs393152 0.82 GG 0 0.0167

Obesity (extreme) 10 37982097 ZNF248 MTRNR2L7 - TLK2P2 rs7474896-T rs7474896 0.14 TT 2 0.0167

Rheumatoid arthritis 6 138002637 TNFAIP3, OLIG3 OLIG3 - TNFAIP3 rs10499194-C rs10499194 0.71 TT 0 0.0167

The third-from-last column (e.g. 'CC') is my genotype, and the 'dose' field is the number of copies of the risk allele that I carry. Of the above results, I think the following are the most interesting:

Type 2 diabetes 9 10430602 PTPRD PTPRD rs649891-C rs649891 0.35 CC 2 0.0167

Optic disc parameters 1 92077097 CDC7, TGFBR3 RPL39P13 - HSP90B3P rs1192415-G rs1192415 NR GG 2 0.0167

Non-alcoholic fatty liver disease histology (AST) 9 78425925 Intergenic OSTF1 - PCSK5 rs12344488-A rs12344488 0.07 AA 2 0.0169

Optic disc parameters 1 92077097 CDC7,TGFBR3 RPL39P13 - HSP90B3P rs1192415-G rs1192415 0.18 GG 2 0.0167

Ankylosing spondylitis 5 96129512 ERAP1 ERAP1 rs27434-A rs27434 0.23 AA 2 0.0167

Obesity (extreme) 10 37982097 ZNF248 MTRNR2L7 - TLK2P2 rs7474896-T rs7474896 0.14 TT 2 0.0167
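The shortlist above is simply the rare-SNP rows where my dose is 2. As a sketch, the same selection could be done in one pass, assuming gwascatalogue.res.txt is tab-separated with dose in column 10 and ceu_maf in column 11 (as in the gawk command); shortlist is a hypothetical helper name.

```python
def shortlist(rows, maf_cutoff=0.025, min_dose=2):
    """Keep rows whose CEU MAF is at or below the cutoff and whose dose is high enough."""
    kept = []
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        try:
            dose, maf = int(fields[9]), float(fields[10])
        except (IndexError, ValueError):
            continue  # header line or a row with missing values
        if maf <= maf_cutoff and dose >= min_dose:
            kept.append(fields)
    return kept

# Two rows copied from the output above, with tabs restored between fields.
rows = [
    "Type 2 diabetes\t9\t10430602\tPTPRD\tPTPRD\trs649891-C\trs649891\t0.35\tCC\t2\t0.0167",
    "Parkinson's disease\t17\t44828931\tMAPT\tNSF\trs199533-C\trs199533\t0.78\tAA\t0\t0.0167",
]
hits = shortlist(rows)
for fields in hits:
    print(fields[0], fields[6], fields[8])
```

For the full file, rows could be an open file handle on gwascatalogue.res.txt, since the function only iterates over lines.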

I have bad eyesight (very bad!), so I'm intrigued by the optic disc parameters findings! The obesity, T2D, etc. hits are less interesting, as these are common traits with complex genetic bases.