UKB

Fun with UKB summary data

September 2018

The objective of this exercise was to apply Google's AutoML software to some practical applications in genetics. As a proof of principle, I decided to use UK Biobank results generated by the lab of my colleague Ben Neale. This was a fun exercise and really just a way for me to get my feet wet with some of Google's machine learning tools.

Objectives:

  • train a visual classifier on Manhattan plots with varying degrees of quality
  • apply the classifier to all results provided by the Neale lab
  • generate a document with a probability and label describing the quality of the GWAS in question, so that we can quickly see what the good and potentially good phenotypes are (a total of >11,000 analytical phenotypes have been shared by the Neale lab)


Step 1: training data

A total of 515 images were uploaded to Google's AutoML Vision platform (https://cloud.google.com/automl/). A minimum of 100 images is recommended for each label the model is trained on. Based on my experience of looking at these plots, I labeled 198 images as "bad", 106 as "good", 108 as "underpowered bad" and 103 as "underpowered good". All plots were generated using Manhattan_plot.R from Dan Howrigan.
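
For the curious, AutoML Vision ingests training data as a simple CSV of image URIs and labels. Below is a sketch of what such a manifest looks like; the bucket and file names are made up, and I have written the labels with underscores to match the label names used later in this post:

    gs://ukb-manhattan-training/pheno_A_Manhattan.png,good
    gs://ukb-manhattan-training/pheno_B_Manhattan.png,bad
    gs://ukb-manhattan-training/pheno_C_Manhattan.png,underpowered_good
    gs://ukb-manhattan-training/pheno_D_Manhattan.png,underpowered_bad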

Below are examples of Manhattan plots falling within each of the categories. Using a naïve score threshold of 0.5 (Google's default), the model achieves a precision of 79.6% (the fraction of predicted labels that are correct, i.e. a measure of false positives) and a recall of 73.6% (the fraction of true labels that are recovered, i.e. a measure of false negatives). A summary of the model is also shown.
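
To make those two numbers concrete, here is a toy calculation of precision and recall. The counts are hypothetical, chosen only so the arithmetic reproduces the reported percentages; they are not the model's actual confusion matrix:

    # Hypothetical counts chosen to reproduce the reported 79.6% / 73.6%
    tp, fp, fn = 39, 10, 14      # true positives, false positives, false negatives
    precision = tp / (tp + fp)   # fraction of predicted labels that are correct
    recall = tp / (tp + fn)      # fraction of true labels that are recovered
    print(f"precision = {precision:.1%}, recall = {recall:.1%}")
    # precision = 79.6%, recall = 73.6%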


Types of Manhattan plots used in training data

A) "good" Manhattan plots, B) "bad" Manhattan plots, C) "underpowered good" Manhattan plots, D) "underpowered bad" Manhattan plots

Performance of the model

Precision versus recall shown on the left and confusion matrix shown on the right, based on 53 test images selected by AutoML for evaluation.

Step 2: applying the model to the remaining GWAS

I next applied the model to the remaining ~10,500 GWAS results. This process consists of a simple script that downloads the summary statistics, generates a Manhattan plot and uses Google's REST API to apply the model to each plot. It takes a second or so per plot, and the output consists of the predicted label ("good", "bad", etc.) for each plot and the probability from the model.
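
For reference, below is a minimal Python sketch of the scoring step, assuming the Manhattan plot PNG has already been generated (the plotting itself used Manhattan_plot.R). The project ID and model ID are placeholders, the access token is assumed to come from gcloud auth application-default print-access-token, and the request shape follows my understanding of the AutoML Vision v1beta1 predict endpoint:

    import base64
    import requests

    # Placeholders -- substitute your own GCP project and AutoML model IDs
    PROJECT_ID = "my-gcp-project"
    MODEL_ID = "ICN1234567890"
    URL = (f"https://automl.googleapis.com/v1beta1/projects/{PROJECT_ID}"
           f"/locations/us-central1/models/{MODEL_ID}:predict")

    def classify_plot(png_path, access_token):
        """Send one Manhattan plot PNG to the model; return (label, score)."""
        with open(png_path, "rb") as fh:
            image_b64 = base64.b64encode(fh.read()).decode("utf-8")
        body = {"payload": {"image": {"imageBytes": image_b64}}}
        resp = requests.post(
            URL,
            headers={"Authorization": f"Bearer {access_token}"},
            json=body,
            timeout=60,
        )
        resp.raise_for_status()
        # Keep the top-scoring classification returned for the image
        top = max(resp.json()["payload"],
                  key=lambda p: p["classification"]["score"])
        return top["displayName"], top["classification"]["score"]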

Below is the breakdown of the results. About 15% are classified as "good" or "underpowered good".

Step 3: share the results

Here is a document describing the results from applying the model to all the data. The third column is the probability for the predicted label (good/bad/underpowered_good/underpowered_bad) and the fourth column is the label itself. Phenotypes with no prediction are the ones that were used in the training data. Here are the images, test_images.zip, used in the training datasets, named by their trait name, e.g. "1140.gwas.imputed_v3.both_sexes.tsv.bgz_Manhattan.png".

Below are some examples of best and worst from some of the predicted labels:

BEST SCORE FOR "GOOD"

6152_8.gwas.imputed_v3.both_sexes, score 0.99999988

WORST SCORE FOR "GOOD"

399_irnt.gwas.imputed_v3.female, score 0.50242817

BEST SCORE FOR "BAD"

22601_54122878.gwas.imputed_v3.both_sexes, score 0.99999571

WORST SCORE FOR "BAD"

6143_4.gwas.imputed_v3.male, score 0.5006054

What are the "good" traits?

If we restrict to unique traits with a prediction of "good" and subset to "irnt" rather than "raw" traits, we get a total of about 400 unique traits. I created a version of the original manifest with just these traits, see here. This should make diving into the better-powered traits a little easier :)
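
Filtering the shared results down to that subset takes only a few lines. A sketch, assuming the predictions were saved as a headerless TSV whose columns match the document above (the file and column names here are my own):

    import pandas as pd

    # Hypothetical file name; columns follow the shared results document
    df = pd.read_csv("ukb_manhattan_predictions.tsv", sep="\t",
                     names=["phenotype", "description", "probability", "label"])

    good = df[df["label"].eq("good") & df["phenotype"].str.contains("_irnt")]
    good = good.drop_duplicates(subset="description")  # one row per unique trait
    good.to_csv("good_irnt_traits.tsv", sep="\t", index=False)
    print(len(good))  # ~400 unique traits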


Caveat emptor!

  • I used the raw Manhattan plots as input to the model. I deliberately did not fix the axes, but rather left the y-axis scale variable. We could approach this another way, e.g. generate a multi-panel Manhattan plot with different levels of zoom on the y-axis (no ceiling, then max -log10(P) = 10, 5, etc.); a sketch of this idea appears after this list. However, the presence of the genome-wide significance line in red is likely to be informative to the model, which may negate the need for a fixed y-axis scale
  • I defined the types of Manhattan plots used in this analysis and also used a reasonably small training dataset. It is possible that both the training dataset and my expertise are insufficient to define all classes of Manhattan plots
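
As a concrete version of the multi-panel idea in the first caveat, something like the following would stack the same Manhattan plot at several fixed y-axis ceilings (a matplotlib sketch; the position and -log10(P) vectors are assumed to come from the summary statistics):

    import math
    import matplotlib.pyplot as plt

    def multi_zoom_manhattan(pos, neglog10p, ceilings=(None, 10, 5)):
        """Stack one Manhattan plot at several fixed y-axis ceilings."""
        fig, axes = plt.subplots(len(ceilings), 1,
                                 figsize=(10, 3 * len(ceilings)))
        for ax, ymax in zip(axes, ceilings):
            ax.scatter(pos, neglog10p, s=2)
            # Genome-wide significance line at -log10(5e-8), roughly 7.3
            ax.axhline(-math.log10(5e-8), color="red", lw=1)
            if ymax is not None:
                ax.set_ylim(0, ymax)
            ax.set_ylabel("-log10(P)")
        axes[-1].set_xlabel("genomic position")
        return fig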