How to do GWAS in the cloud using Amazon Web Services¶
Genome-wide association studies (GWAS) offer a way to rapidly scan entire genomes and find genetic variation associated with a particular disease condition.
Our aim is to teach researchers how to perform genome wide association analysis using Amazon Web Services (AWS). This tutorial will enable researchers with minimal bioinformatics background to set up and access an AWS instance, move data in and out of the AWS instance, run some basic summary statistics and perform a simple association analysis starting with variant calling files (.vcf). We will also produce Manhattan plots to visualize variants associated with traits.
For this tutorial, we will not work with human data. We will use coat color in dogs as the trait of interest (instead of disease), and test the association of a genome-wide set of single nucleotide polymorphisms (SNPs) with two coat color variants: yellow and dark. To extrapolate this tutorial to human disease data, you might consider yellow coat color phenotype as the "case" (or disease) and dark coat color as the "control" (or normal) condition.
This tutorial is based on the ANGUS 2017 GWAS tutorial
Table of contents
Est. Time | Lesson Name | Description |
---|---|---|
10 mins | What is GWAS? | Background |
20 mins | Set up an AWS instance | How to set up an amazon web services instance |
40 mins | Download and move data to AWS | Download dog coat color data and use terminal to access data on remote computer |
10 mins | Install PLINK | Install the software PLINK |
10 mins | Install VCFtools | Install the software vcftools |
10 mins | Install R and RStudio | Install the software R and RStudio |
10 mins | Analyze | Generate summary statistics and association analysis |
20 mins | Manhattan Plots | Make some plots to visualize data |
10 mins | Terminate AWS Instance | Shut down the cloud computer |
Learning Objectives
- learn to set up and access Amazon Web Services
- learn to move data in and out of the AWS instance
- learn to install and run all software necessary for GWAS analysis
- learn to produce Manhattan plots
- Background: Some expertise in biology and fundamental genetics.
- Technology: Basic shell scripting knowledge and access to MacOS. Users must be comfortable with finding and opening a terminal window, navigating to specific directories and running pre-scripted commands in the terminal.
- Financial: First time AWS users require a valid credit card to set up an AWS account.
- Time: AWS account setup needs approval by AWS, and approval times can range from minutes to days.