Day-17 - functional annotation with VEP and LOFTEE

Post date: Oct 28, 2014 12:40:24 PM

I'm going to start with the results from DNA.LAND and run VEP and the LOFTEE plugin to see what loss-of-function variants I have.

(1) make a file of only the variants I have

bcftools view cod_imputed.vcf.gz --min-ac 1 | bgzip > cod_imputed.variant.vcf.gz

This reduces the file from 39,128,182 variants to 3,757,309 (1,550,062 of which are homozygous).
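
Counts like these can be double-checked with a small awk sketch. It assumes a single-sample VCF where GT is the first FORMAT subfield, so column 10 holds my genotype; count_variants is just a name I made up for illustration.

```shell
# Count variant records and homozygous-alternate calls in a VCF read from stdin.
# Assumption: single-sample VCF, GT is the first FORMAT subfield (column 10 holds the call).
count_variants() {
  awk -F'\t' '
    /^#/ { next }                       # skip header lines
    { total++ }                         # every remaining line is one variant record
    {
      split($10, call, ":")             # call[1] is the genotype, e.g. 1/1 or 0|1
      if (call[1] == "1/1" || call[1] == "1|1") hom++
    }
    END { printf "%d variants, %d homozygous\n", total, hom }
  '
}
```

Something like zcat cod_imputed.variant.vcf.gz | count_variants would then print both numbers in one pass.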

(2) annotate

perl variant_effect_predictor.pl \
  -i cod_imputed.variant.vcf.gz \
  -o cod_VEP_LOFTEE \
  --dir_plugins /bin/vep_plugins/ \
  --cache --force_overwrite --fork 14 --buffer_size 10000 \
  --dir /bin/vep_data/ \
  --plugin LoF,human_ancestor_fa:/bin/vep_data/human_ancestor.fa.rz \
  --vcf

mv cod_VEP_LOFTEE cod_VEP_LOFTEE.vcf && bgzip -c cod_VEP_LOFTEE.vcf > cod_VEP_LOFTEE.vcf.gz

I now have a VCF of VEP consequence annotations with LOFTEE loss-of-function calls. Most of the annotated variants look pretty harmless.

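Pull out the homozygous variants with damaging consequences, for example the 61 lines that mention a frameshift and a high-confidence (HC) LOFTEE call. Each grep matches anywhere on the line and the final cut keeps only the first transcript annotation, so some rows below show a milder consequence; the frameshift and HC flags come from other transcripts of the same variant.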

grep 'frameshift_variant' cod_VEP_LOFTEE.vcf | grep '1/1' | grep 'HC' | gawk '{print $1,$2,$3,$4,$5,$8 }' | sort -nrk1 -nrk2 | cut -f1 -d ','

22 45810275 rs202120654 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000128408|ENST00000342894|Transcript|5_prime_UTR_variant&feature_elongation|398-399|||||||1||||

22 18222868 rs71690189 GGCCACGCTCAACT G NS=1;AC=2;AN=2;CSQ=-|ENSG00000015475|ENST00000342111|Transcript|frameshift_variant&feature_truncation|375-387|285-297|95-99|||||-1|POSITION:0.717391304347826|||HC

21 34169316 rs199532219 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000205929|ENST00000382373|Transcript|intron_variant&feature_elongation||||||||-1||||

19 58413071 rs59723971 T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000269476|ENST00000602124|Transcript|frameshift_variant&NMD_transcript_variant&feature_elongation|453-454|46-47|16|||||-1||||

19 52803669 rs3217319 CTG C NS=1;AC=2;AN=2;CSQ=-|ENSG00000198464|ENST00000334564|Transcript|frameshift_variant&feature_truncation|69-70|5-6|2|||||1|POSITION:0.00405679513184584|||HC

19 52097560 rs141848717 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000267808|ENST00000601315|Transcript|upstream_gene_variant|||||||4981|1||||

19 44589998 rs201737232 TTC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000267022|ENST00000591793|Transcript|3_prime_UTR_variant&NMD_transcript_variant&feature_truncation|2294-2295|||||||1||||

19 41928867 rs3217385 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000177191|ENST00000601379|Transcript|downstream_gene_variant|||||||3093|-1||||

19 21299773 rs202144538 T TTA NS=1;AC=2;AN=2;CSQ=TA|ENSG00000160352|ENST00000596143|Transcript|frameshift_variant&feature_elongation|628-629|303-304|101-102|||||1|POSITION:0.181981981981982|||HC

19 20981870 rs74172370 TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000160229|ENST00000594534|Transcript|frameshift_variant&feature_truncation|429|291|97|||||1|POSITION:0.470873786407767|||HC

19 20807177 rs35575803 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000188171|ENST00000601440|Transcript|frameshift_variant&feature_elongation|1652-1653|1505-1506|502|||||-1|POSITION:0.94833018273472|||HC

18 3452222 rs11571510 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000177426|ENST00000345133|Transcript|intron_variant&feature_truncation||||||||1||||

18 3452221 rs202189171 CCT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000177426|ENST00000345133|Transcript|intron_variant&feature_truncation||||||||1||||

17 26708547 rs199750023 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000004139|ENST00000582323|Transcript|upstream_gene_variant|||||||3297|1||||

16 88599700 rs67322929 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000179588|ENST00000319555|Transcript|frameshift_variant&feature_truncation|1657|1335|445|||||1|POSITION:0.441906653426018|||HC

16 88599695 rs67712719 GGA G NS=1;AC=2;AN=2;CSQ=-|ENSG00000179588|ENST00000319555|Transcript|frameshift_variant&feature_truncation|1652-1653|1330-1331|444|||||1|POSITION:0.440582588546839|||HC

16 12027474 rs71139503 CAT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000234719|ENST00000356023|Transcript|upstream_gene_variant|||||||3981|-1||||

16 732287 rs3216838 GC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000161999|ENST00000565302|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_truncation|1385|||||||-1||||

15 27516740 rs76888516 GAGTC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000182256|ENST00000554696|Transcript|frameshift_variant&feature_truncation|53-56|53-56|18-19|||||1|POSITION:0.0723514211886305|||HC

13 113688286 rs57693403 TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000126217|ENST00000434480|Transcript|intron_variant&feature_truncation||||||||1||||

11 119993897 rs66674912 CCT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000137699|ENST00000532195|Transcript|downstream_gene_variant|||||||4217|-1||||

11 56380546 rs72003051 CCAGA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000255012|ENST00000526538|Transcript|frameshift_variant&feature_truncation|429-432|429-432|143-144|||||-1|POSITION:0.455696202531646|SINGLE_EXON||HC

11 704604 rs35782494 GA G NS=1;AC=2;AN=2;CSQ=-|ENSG00000177106|ENST00000527199|Transcript|upstream_gene_variant|||||||612|1||||

10 27702256 rs112067123 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000182077|ENST00000438700|Transcript|frameshift_variant&feature_elongation|1041-1042|923-924|308|||||-1|POSITION:0.400607638888889|||HC

9 137968914 rs199830264 G GGA NS=1;AC=2;AN=2;CSQ=GA|ENSG00000130558|ENST00000371801|Transcript|downstream_gene_variant|||||||581|1||||

8 74581290 . A AG NS=1;AC=2;AN=2;CSQ=G|ENSG00000040341|ENST00000521451|Transcript|intron_variant&feature_elongation||||||||-1||||

7 44180079 rs5883888 A AG NS=1;AC=2;AN=2;CSQ=G|ENSG00000106631|ENST00000457910|Transcript|downstream_gene_variant|||||||132|-1||||

6 41721682 rs59644697 G GT NS=1;AC=2;AN=2;CSQ=T|ENSG00000096088|ENST00000415707|Transcript|frameshift_variant&feature_elongation|165-166|24-25|8-9|||||-1|POSITION:0.0930232558139535|||HC

6 33048693 rs202162010 CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000223865|ENST00000488575|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_truncation|395|||||||1||||

6 33048688 rs141530233 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000223865|ENST00000488575|Transcript|non_coding_exon_variant&nc_transcript_variant&feature_elongation|389-390|||||||1||||

6 31324603 rs200186034 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000234745|ENST00000481849|Transcript|upstream_gene_variant|||||||2021|-1||||

6 3264526 rs200271803 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000180822|ENST00000454610|Transcript|intron_variant&feature_truncation||||||||1||||

6 3264524 rs34894854 C CT NS=1;AC=2;AN=2;CSQ=T|ENSG00000180822|ENST00000454610|Transcript|intron_variant&feature_elongation||||||||1||||

5 149374881 rs200200374 TG T NS=1;AC=2;AN=2;CSQ=-|ENSG00000164296|ENST00000532987|Transcript|downstream_gene_variant|||||||4521|-1||||

5 149374879 rs3832324 CT C NS=1;AC=2;AN=2;CSQ=-|ENSG00000164296|ENST00000532987|Transcript|downstream_gene_variant|||||||4523|-1||||

5 140242451 rs11321479 GC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000204970|ENST00000504120|Transcript|intron_variant&feature_truncation||||||||1||||

5 131324250 rs3043838 C CTG NS=1;AC=2;AN=2;CSQ=TG|ENSG00000164398|ENST00000489047|Transcript|upstream_gene_variant|||||||467|-1||||

4 187112348 rs201551129 G GT NS=1;AC=2;AN=2;CSQ=T|ENSG00000145476|ENST00000378802|Transcript|upstream_gene_variant|||||||325|1||||

3 100053684 rs34835464 C CG NS=1;AC=2;AN=2;CSQ=G|ENSG00000114021|ENST00000497785|Transcript|frameshift_variant&feature_elongation|30-31|32-33|11|||||1|POSITION:0.040251572327044|||HC

3 98110406 rs144759043 G GA NS=1;AC=2;AN=2;CSQ=A|ENSG00000251088|ENST00000508616|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||1||||

3 98073591 rs11288615 TA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000251088|ENST00000508616|Transcript|intron_variant&nc_transcript_variant&feature_truncation||||||||1||||

3 56591278 rs150150392 T TGGGGTAAGCA NS=1;AC=2;AN=2;CSQ=GGGGTAAGCA|ENSG00000180376|ENST00000469966|Transcript|upstream_gene_variant|||||||456|1||||

3 44540791 rs201582604 TTC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000178917|ENST00000489411|Transcript|downstream_gene_variant|||||||222|-1||||

2 185802209 rs200158920 A AAC NS=1;AC=2;AN=2;CSQ=AC|ENSG00000170396|ENST00000302277|Transcript|frameshift_variant&feature_elongation|2680-2681|2086-2087|696|||||1|POSITION:0.57465564738292|||HC

2 120704165 . TG T NS=1;AC=2;AN=2;CSQ=-|ENSG00000088179|ENST00000544261|Transcript|intron_variant&feature_truncation||||||||1||||

2 120704164 rs3214717 A AT NS=1;AC=2;AN=2;CSQ=T|ENSG00000088179|ENST00000544261|Transcript|intron_variant&feature_elongation||||||||1||||

2 105953938 . G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000272994|ENST00000610036|Transcript|upstream_gene_variant|||||||6|-1||||

2 24387178 rs3214485 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000266118|ENST00000584973|Transcript|frameshift_variant&NMD_transcript_variant&feature_elongation|601-602|512-513|171|||||1||||

1 248525638 rs34079073 CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000196944|ENST00000366475|Transcript|frameshift_variant&feature_truncation|757|757|253|||||1|POSITION:0.723018147086915|SINGLE_EXON||HC

1 245133549 rs10536649 GGC G NS=1;AC=2;AN=2;CSQ=-|ENSG00000272195|ENST00000607453|Transcript|frameshift_variant&feature_truncation|840-841|826-827|276|||||-1|POSITION:0.75941230486685|SINGLE_EXON||HC

1 220603312 . TGA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000224867|ENST00000416839|Transcript|frameshift_variant&feature_truncation|125-126|125-126|42|||||-1|POSITION:0.823529411764706|||HC

1 220603310 rs10553491 TGTGA T NS=1;AC=2;AN=2;CSQ=-|ENSG00000224867|ENST00000416839|Transcript|frameshift_variant&feature_truncation|125-128|125-128|42-43|||||-1|POSITION:0.836601307189543|||HC

1 203023239 . T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000143847|ENST00000600447|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||1||||

1 54605319 rs79249469 G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000256407|ENST00000311841|Transcript|downstream_gene_variant|||||||3745|-1||||

X 153151284 chrX:153151284:D GT G NS=1;AC=2;AN=2;CSQ=-|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_truncation||||||||-1||||

X 153151280 chrX:153151280:I G GCC NS=1;AC=2;AN=2;CSQ=CC|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||

X 153149708 chrX:153149708:I G GC NS=1;AC=2;AN=2;CSQ=C|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||

X 153149707 chrX:153149707:I C CG NS=1;AC=2;AN=2;CSQ=G|ENSG00000198910|ENST00000464967|Transcript|intron_variant&nc_transcript_variant&feature_elongation||||||||-1||||

X 122336600 chrX:122336600:I T TG NS=1;AC=2;AN=2;CSQ=G|ENSG00000125675|ENST00000542149|Transcript|intron_variant&feature_elongation||||||||1||||

X 104464281 chrX:104464281:D TC T NS=1;AC=2;AN=2;CSQ=-|ENSG00000189108|ENST00000344799|Transcript|intron_variant&feature_truncation||||||||1||||

X 104464276 chrX:104464276:D CA C NS=1;AC=2;AN=2;CSQ=-|ENSG00000189108|ENST00000344799|Transcript|intron_variant&feature_truncation||||||||1||||
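
Because the greps above match anywhere on the line, a stricter version would test each transcript annotation on its own. Here is a rough awk sketch of that idea, not what I actually ran. It assumes a single sample in column 10 with GT first, and that the LoF flag is the last "|"-separated CSQ subfield, as it is in the rows above. The function name hc_frameshifts is made up for illustration.

```shell
# Print one line per transcript annotation that is itself a frameshift flagged LoF=HC,
# keeping only homozygous-alternate calls. Reads an uncompressed VCF on stdin.
# Assumptions: single sample in column 10, GT first, LoF is the last CSQ subfield.
hc_frameshifts() {
  awk -F'\t' '
    /^#/ { next }                                   # skip header lines
    {
      split($10, call, ":")
      if (call[1] != "1/1" && call[1] != "1|1") next
      n = split($8, info, ";")                      # INFO is a ;-separated list
      for (i = 1; i <= n; i++) {
        if (info[i] !~ /^CSQ=/) continue
        m = split(substr(info[i], 5), csq, ",")     # one entry per transcript
        for (j = 1; j <= m; j++) {
          k = split(csq[j], f, "|")                 # f[k] is the last subfield (LoF)
          if (csq[j] ~ /frameshift_variant/ && f[k] == "HC")
            print $1, $2, $3, $4, $5, csq[j]
        }
      }
    }
  '
}
```

Running hc_frameshifts < cod_VEP_LOFTEE.vcf would list only the transcripts where the frameshift and the HC call coincide, so a variant can show up more than once.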

I don't really feel like going into these in detail, but what are the themes? I used http://www.biomart.org/ and searched against GO and MIM.

Interesting.
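
For the BioMart lookup, the gene list can be pulled straight out of the filtered rows. A minimal sketch, assuming the query is filtered by Ensembl gene ID and that the IDs look like the ENSG ones above (ENSG plus 11 digits); gene_ids is a hypothetical helper name.

```shell
# Collect the unique Ensembl gene IDs from annotated text on stdin, one per line.
# Assumption: gene IDs have the form ENSG followed by 11 digits, as in the rows above.
gene_ids() {
  grep -oE 'ENSG[0-9]{11}' | sort -u
}
```

Piping the grep output from step (2) through gene_ids gives a list that can be pasted into an ID-list filter.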