Day-17 - functional annotation with VEP and LOFTEE
Post date: Oct 28, 2014 12:40:24 PM
I'm going to start with the results from DNA.LAND and run VEP and the LOFTEE plugin to see what loss-of-function variants I have.
(1) make a file of only the variants I have
bcftools view cod_imputed.vcf.gz --min-ac 1 | bgzip > cod_imputed.variant.vcf.gz
This reduces the number of variants from 39,128,182 to 3,757,309 (1,550,062 of which are homozygous).
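Counts like these can be reproduced with a short awk pass over the VCF. This is only a sketch, not necessarily how the numbers above were obtained: count_variants is a made-up helper name, and it assumes a single-sample VCF where the genotype is the first sub-field of column 10.

```shell
# Count variant records and homozygous-alt calls in a single-sample VCF stream.
# Assumption: the genotype (GT) is the first colon-separated value in column 10.
count_variants() {
  awk -F'\t' '
    /^#/ { next }                         # skip header lines
    {
      total++
      split($10, gt, ":")                 # GT is assumed to come first
      if (gt[1] == "1/1" || gt[1] == "1|1") hom++
    }
    END { printf "total=%d hom_alt=%d\n", total, hom }
  '
}

# Usage (file name from the step above):
#   zcat cod_imputed.variant.vcf.gz | count_variants
```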
(2) annotate
perl variant_effect_predictor.pl \
-i cod_imputed.variant.vcf.gz \
-o cod_VEP_LOFTEE \
--dir_plugins /bin/vep_plugins/ \
--cache --force_overwrite --fork 14 --buffer_size 10000 \
--dir /bin/vep_data/ \
--plugin LoF,human_ancestor_fa:/bin/vep_data/human_ancestor.fa.rz \
--vcf
mv cod_VEP_LOFTEE cod_VEP_LOFTEE.vcf && bgzip -c cod_VEP_LOFTEE.vcf > cod_VEP_LOFTEE.vcf.gz
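With --vcf, the annotations end up packed into a pipe-delimited CSQ entry in the INFO column, so it helps to know which position holds which value. The sketch below lists the sub-field names with their indices. It assumes the header contains a CSQ line whose description ends in "Format: name|name|..."; csq_fields is a hypothetical helper name.

```shell
# Print each CSQ sub-field name with its 1-based index, read from the
# VCF header on stdin.
# Assumption: the header has a line starting with ##INFO=<ID=CSQ whose
# description ends in "Format: Allele|Gene|...".
csq_fields() {
  grep -m1 '^##INFO=<ID=CSQ' \
    | sed -e 's/.*Format: //' -e 's/".*//' \
    | tr '|' '\n' \
    | awk '{ print NR "\t" $0 }'
}

# Usage (VEP output name from the command above):
#   csq_fields < cod_VEP_LOFTEE
```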
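I now have a VCF of LOFTEE-annotated variants. Most of them look pretty harmless.
Pull out the variants with damaging consequences, for example the 61 homozygous (1/1) variants whose annotations include a frameshift and a high-confidence (HC) LOFTEE call. The final cut keeps only the first transcript annotation of each variant, so some lines below show a milder consequence than the one that matched: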
grep 'frameshift_variant' cod_VEP_LOFTEE.vcf | grep '1/1' | grep 'HC' | gawk '{print $1,$2,$3,$4,$5,$8 }' | sort -nrk1 -nrk2 | cut -f1 -d ','
22 45810275 rs202120654 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000128408|ENST00000342894|Transcript|5_prime_UTR_variant&feature_elongation|398-399|||||||1||||
22 18222868 rs71690189 GGCCACGCTCAACT G NS=1;AC=2;AN=2;CSQ=-|ENSG00000015475|ENST00000342111|Transcript|frameshift_variant&feature_truncation|375-387|285-297|95-99|||||-1|POSITION:0.717391304347826|||HC
21 34169316 rs199532219 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000205929|ENST00000382373|Transcript|intron_variant&feature_elongation||||||||-1||||
19 58413071 rs59723971 T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000269476|ENST00000602124|Transcript|frameshift_variant&NMD_transcript_variant&feature_elongation|453-454|46-47|16|||||-1||||
19 52803669 rs3217319 CTG C NS=1;AC=2;AN=2;CSQ=-|ENSG00000198464|ENST00000334564|Transcript|frameshift_variant&feature_truncation|69-70|5-6|2|||||1|POSITION:0.00405679513184584|||HC
19 52097560 rs141848717 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000267808|ENST00000601315|Transcript|upstream_gene_variant|||||||4981|1||||
19 44589998 rs201737232 TTC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000267022|ENST00000591793|Transcript|3_prime_UTR_variant&NMD_transcript_variant&feature_truncation|2294-2295|||||||1||||
19 41928867 rs3217385 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000177191|ENST00000601379|Transcript|downstream_gene_variant|||||||3093|-1||||
19 21299773 rs202144538 T TTA NS=1;AC=2;AN=2;CSQ=TA|ENSG00000160352|ENST00000596143|Transcript|frameshift_variant&feature_elongation|628-629|303-304|101-102|||||1|POSITION:0.181981981981982|||HC
19 20981870 rs74172370 TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000160229|ENST00000594534|Transcript|frameshift_variant&feature_truncation|429|291|97|||||1|POSITION:0.470873786407767|||HC
19 20807177 rs35575803 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000188171|ENST00000601440|Transcript|frameshift_variant&feature_elongation|1652-1653|1505-1506|502|||||-1|POSITION:0.94833018273472|||HC
18 3452222 rs11571510 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000177426|ENST00000345133|Transcript|intron_variant&feature_truncation||||||||1||||
18 3452221 rs202189171 CCT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000177426|ENST00000345133|Transcript|intron_variant&feature_truncation||||||||1||||
17 26708547 rs199750023 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000004139|ENST00000582323|Transcript|upstream_gene_variant|||||||3297|1||||
16 88599700 rs67322929 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000179588|ENST00000319555|Transcript|frameshift_variant&feature_truncation|1657|1335|445|||||1|POSITION:0.441906653426018|||HC
16 88599695 rs67712719 GGA G NS=1;AC=2;AN=2;CSQ=-|ENSG00000179588|ENST00000319555|Transcript|frameshift_variant&feature_truncation|1652-1653|1330-1331|444|||||1|POSITION:0.440582588546839|||HC
16 12027474 rs71139503 CAT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000234719|ENST00000356023|Transcript|upstream_gene_variant|||||||3981|-1||||
16 732287 rs3216838 GC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000161999|ENST00000565302|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_truncation|1385|||||||-1||||
15 27516740 rs76888516 GAGTC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000182256|ENST00000554696|Transcript|frameshift_variant&feature_truncation|53-56|53-56|18-19|||||1|POSITION:0.0723514211886305|||HC
13 113688286 rs57693403 TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000126217|ENST00000434480|Transcript|intron_variant&feature_truncation||||||||1||||
11 119993897 rs66674912 CCT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000137699|ENST00000532195|Transcript|downstream_gene_variant|||||||4217|-1||||
11 56380546 rs72003051 CCAGA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000255012|ENST00000526538|Transcript|frameshift_variant&feature_truncation|429-432|429-432|143-144|||||-1|POSITION:0.455696202531646|SINGLE_EXON||HC
11 704604 rs35782494 GA G NS=1;AC=2;AN=2;CSQ=-|ENSG00000177106|ENST00000527199|Transcript|upstream_gene_variant|||||||612|1||||
10 27702256 rs112067123 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000182077|ENST00000438700|Transcript|frameshift_variant&feature_elongation|1041-1042|923-924|308|||||-1|POSITION:0.400607638888889|||HC
9 137968914 rs199830264 G GGA NS=1;AC=2;AN=2;CSQ=GA|ENSG00000130558|ENST00000371801|Transcript|downstream_gene_variant|||||||581|1||||
8 74581290 . A AG NS=1;AC=2;AN=2;CSQ=G|ENSG00000040341|ENST00000521451|Transcript|intron_variant&feature_elongation||||||||-1||||
7 44180079 rs5883888 A AG NS=1;AC=2;AN=2;CSQ=G|ENSG00000106631|ENST00000457910|Transcript|downstream_gene_variant|||||||132|-1||||
6 41721682 rs59644697 G GT NS=1;AC=2;AN=2;CSQ=T|ENSG00000096088|ENST00000415707|Transcript|frameshift_variant&feature_elongation|165-166|24-25|8-9|||||-1|POSITION:0.0930232558139535|||HC
6 33048693 rs202162010 CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000223865|ENST00000488575|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_truncation|395|||||||1||||
6 33048688 rs141530233 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000223865|ENST00000488575|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_elongation|389-390|||||||1||||
6 31324603 rs200186034 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000234745|ENST00000481849|Transcript|upstream_gene_variant|||||||2021|-1||||
6 3264526 rs200271803 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000180822|ENST00000454610|Transcript|intron_variant&feature_truncation||||||||1||||
6 3264524 rs34894854 C CT NS=1;AC=2;AN=2;CSQ=T|ENSG00000180822|ENST00000454610|Transcript|intron_variant&feature_elongation||||||||1||||
5 149374881 rs200200374 TG T NS=1;AC=2;AN=2;CSQ=-|ENSG00000164296|ENST00000532987|Transcript|downstream_gene_variant|||||||4521|-1||||
5 149374879 rs3832324 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000164296|ENST00000532987|Transcript|downstream_gene_variant|||||||4523|-1||||
5 140242451 rs11321479 GC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000204970|ENST00000504120|Transcript|intron_variant&feature_truncation||||||||1||||
5 131324250 rs3043838 C CTG NS=1;AC=2;AN=2;CSQ=TG|ENSG00000164398|ENST00000489047|Transcript|upstream_gene_variant|||||||467|-1||||
4 187112348 rs201551129 G GT NS=1;AC=2;AN=2;CSQ=T|ENSG00000145476|ENST00000378802|Transcript|upstream_gene_variant|||||||325|1||||
3 100053684 rs34835464 C CG NS=1;AC=2;AN=2;CSQ=G|ENSG00000114021|ENST00000497785|Transcript|frameshift_variant&feature_elongation|30-31|32-33|11|||||1|POSITION:0.040251572327044|||HC
3 98110406 rs144759043 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000251088|ENST00000508616|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||1||||
3 98073591 rs11288615 TA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000251088|ENST00000508616|Transcript|intron_variant&nc_transcript_variant&feature_truncation||||||||1||||
3 56591278 rs150150392 T TGGGGTAAGCA NS=1;AC=2;AN=2;CSQ=GGGGTAAGCA|ENSG00000180376|ENST00000469966|Transcript|upstream_gene_variant|||||||456|1||||
3 44540791 rs201582604 TTC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000178917|ENST00000489411|Transcript|downstream_gene_variant|||||||222|-1||||
2 185802209 rs200158920 A AAC NS=1;AC=2;AN=2;CSQ=AC|ENSG00000170396|ENST00000302277|Transcript|frameshift_variant&feature_elongation|2680-2681|2086-2087|696|||||1|POSITION:0.57465564738292|||HC
2 120704165 . TG T NS=1;AC=2;AN=2;CSQ=-|ENSG00000088179|ENST00000544261|Transcript|intron_variant&feature_truncation||||||||1||||
2 120704164 rs3214717 A AT NS=1;AC=2;AN=2;CSQ=T|ENSG00000088179|ENST00000544261|Transcript|intron_variant&feature_elongation||||||||1||||
2 105953938 . G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000272994|ENST00000610036|Transcript|upstream_gene_variant|||||||6|-1||||
2 24387178 rs3214485 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000266118|ENST00000584973|Transcript|frameshift_variant&NMD_transcript_variant&feature_elongation|601-602|512-513|171|||||1||||
1 248525638 rs34079073 CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000196944|ENST00000366475|Transcript|frameshift_variant&feature_truncation|757|757|253|||||1|POSITION:0.723018147086915|SINGLE_EXON||HC
1 245133549 rs10536649 GGC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000272195|ENST00000607453|Transcript|frameshift_variant&feature_truncation|840-841|826-827|276|||||-1|POSITION:0.75941230486685|SINGLE_EXON||HC
1 220603312 . TGA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000224867|ENST00000416839|Transcript|frameshift_variant&feature_truncation|125-126|125-126|42|||||-1|POSITION:0.823529411764706|||HC
1 220603310 rs10553491 TGTGA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000224867|ENST00000416839|Transcript|frameshift_variant&feature_truncation|125-128|125-128|42-43|||||-1|POSITION:0.836601307189543|||HC
1 203023239 . T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000143847|ENST00000600447|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||1||||
1 54605319 rs79249469 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000256407|ENST00000311841|Transcript|downstream_gene_variant|||||||3745|-1||||
X 153151284 chrX:153151284:D GT G NS=1;AC=2;AN=2;CSQ=-|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_truncation||||||||-1||||
X 153151280 chrX:153151280:I G GCC NS=1;AC=2;AN=2;CSQ=CC|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||
X 153149708 chrX:153149708:I G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||
X 153149707 chrX:153149707:I C CG NS=1;AC=2;AN=2;CSQ=G|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||
X 122336600 chrX:122336600:I T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000125675|ENST00000542149|Transcript|intron_variant&feature_elongation||||||||1||||
X 104464281 chrX:104464281:D TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000189108|ENST00000344799|Transcript|intron_variant&feature_truncation||||||||1||||
X 104464276 chrX:104464276:D CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000189108|ENST00000344799|Transcript|intron_variant&feature_truncation||||||||1||||
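The chained greps match anywhere on the line, so the frameshift and the HC call can come from different transcripts. A stricter variant is to test each CSQ entry on its own. The sketch below does that under a few assumptions read off the output above: the consequence is CSQ sub-field 5, the LOFTEE call is sub-field 17, and the genotype leads column 10. hc_frameshifts is a made-up name.

```shell
# Print CHROM, POS, ID, REF, ALT and every CSQ entry that is itself a
# high-confidence frameshift, for homozygous-alt calls only.
# Assumptions: Consequence is CSQ sub-field 5, the LoF call is sub-field 17,
# and the genotype is the first colon-separated value in column 10.
hc_frameshifts() {
  awk -F'\t' '
    /^#/ { next }                                   # skip header lines
    {
      split($10, gt, ":")
      if (gt[1] != "1/1" && gt[1] != "1|1") next    # homozygous alt only
      n = split($8, info, ";")
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^CSQ=/) continue
        m = split(substr(info[i], 5), entry, ",")   # one entry per transcript
        for (j = 1; j <= m; j++) {
          split(entry[j], f, "[|]")
          if (f[5] ~ /frameshift_variant/ && f[17] == "HC")
            print $1, $2, $3, $4, $5, entry[j]
        }
      }
    }
  '
}

# Usage:
#   hc_frameshifts < cod_VEP_LOFTEE.vcf
```

Unlike the grep pipeline, this prints the matching transcript annotation rather than the first one, and a variant appears once per matching transcript.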
I don't really feel like going into these in detail, but what are the themes? I used http://www.biomart.org/ and searched the affected genes against GO and MIM.
Interesting.
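For the BioMart search, the gene list can be pulled straight out of the filtered lines. A minimal sketch, assuming the Ensembl gene IDs appear as ENSG followed by digits as in the output above; gene_ids and the output file name are placeholders.

```shell
# Collect the unique Ensembl gene IDs (ENSG followed by digits) from stdin,
# one per line, ready to paste into a BioMart ID filter.
gene_ids() {
  grep -o 'ENSG[0-9][0-9]*' | sort -u
}

# Usage (hypothetical output file name):
#   grep 'frameshift_variant' cod_VEP_LOFTEE.vcf | grep '1/1' | grep 'HC' | gene_ids > genes_for_biomart.txt
```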