Introduction
The book “R for Data Science” provides an excellent framework for using data science to turn raw data into understanding, insight, and knowledge. We will use this framework as an outline for this workshop.
R is a statistical computing and data visualization programming language. RStudio is an integrated development environment, or IDE, for R programming. R and RStudio work on Mac, Linux, and Windows operating systems. The RStudio layout displays lots of useful information and allows us to run R code from a script and view and save outputs all from one interface.
When you start RStudio, you will see two key regions in the interface: the console and the output. When working in R, you can type directly into the console, or you can type into a script. Saving commands in a script makes your work easier to reproduce. You will learn more as we go along!
For today’s lesson, we will focus on data from the Genotype-Tissue Expression (GTEx) Project. GTEx is an ongoing effort to build a comprehensive public resource to study tissue-specific gene expression and regulation. Samples were collected from 54 non-diseased tissue sites across nearly 1000 individuals, primarily for molecular assays including whole-genome sequencing (WGS), whole-exome sequencing (WES), and RNA-Seq.
Getting Started
- Click the button to generate a computing environment for this workshop.
- Navigate to the GTEx folder.
- Click GTEx.Rproj and click “Yes” to open up an R project. This will set the working directory to ~/GTEx/.
- If you open the r4rnaseq-workshop.R file, which contains all the commands for today’s workshop, you can click through it and all the commands should run successfully.
- If you open a new R Script by clicking File > New File > R Script, you can code along by typing out all the commands for today’s lesson as I type them.
Click “Run” to send commands from a script to the console, or press Command+Enter (Ctrl+Enter on Windows and Linux).
Note: the source code is available at https://github.com/nih-cfde/training-rstudio-binder/ if you would like to explore the data locally.
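Once the project is open, you can confirm from the console that the working directory is what you expect. The lines below are a minimal sketch. The variable name `wd` is only for illustration, and the exact path printed will depend on where the project lives on your machine.

```r
# Ask R which folder it is currently working in
wd <- getwd()
wd

# List the files R can see in that folder
list.files()

# Check whether the workshop script is present (TRUE or FALSE)
file.exists("r4rnaseq-workshop.R")
```

If the path does not end in the GTEx folder, reopening GTEx.Rproj should reset it.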
R is a calculator
You can perform simple and advanced calculations in R.
2 + 2 * 100
## [1] 202
log10(0.05)
## [1] -1.30103
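The first result above is 202 rather than 400 because R follows the usual order of operations, so multiplication happens before addition. As a small sketch, parentheses change the order. The variable names here are hypothetical and used only to hold the results.

```r
# Multiplication is evaluated before addition
without_parens <- 2 + 2 * 100
without_parens    # 202

# Parentheses force the addition to happen first
with_parens <- (2 + 2) * 100
with_parens       # 400

# log10() is shorthand for a logarithm with base 10
same_log <- log(100, base = 10)
same_log          # 2
```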
You can save variables and recall them later.
pval <- 0.05
pval
## [1] 0.05
-log10(pval)
## [1] 1.30103
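Saved variables can be reused in later calculations and comparisons. The sketch below assumes a hypothetical significance threshold called `alpha`. It is not part of the workshop data, only an illustration of reusing `pval`.

```r
pval <- 0.05
alpha <- 0.01                  # hypothetical threshold, chosen for illustration

# Comparisons return TRUE or FALSE
is_significant <- pval < alpha
is_significant                 # FALSE, because 0.05 is not less than 0.01

# Store a transformed value under a new name and round it for display
neg_log_p <- -log10(pval)
round(neg_log_p, 2)            # 1.3
```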
You can save really long lists of things under short, descriptive names that are easy to recall later.
favorite_genes <- c("BRCA1", "JUN", "GNRH1", "TH", "AR")
favorite_genes
## [1] "BRCA1" "JUN" "GNRH1" "TH" "AR"
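A vector created with c() can be inspected and subset by position. The following is a minimal sketch using the same gene names. The `pvals` vector is made-up data that shows how functions such as log10() work on every element at once.

```r
favorite_genes <- c("BRCA1", "JUN", "GNRH1", "TH", "AR")

# How many items are in the vector?
n_genes <- length(favorite_genes)
n_genes                          # 5

# R counts positions starting from 1
first_gene <- favorite_genes[1]
first_gene                       # "BRCA1"

# Select several positions at once
some_genes <- favorite_genes[c(2, 4)]
some_genes                       # "JUN" "TH"

# Ask whether a value is present in the vector
has_th <- "TH" %in% favorite_genes
has_th                           # TRUE

# Hypothetical p-values: the transformation is applied to each element
pvals <- c(0.05, 0.01, 0.001)
neg_log_pvals <- -log10(pvals)
neg_log_pvals                    # approximately 1.3, 2, and 3
```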
Loading R packages
Many of the functions we will use are pre-installed. The Tidyverse is a collection of R packages that include functions, data, and documentation that provide more tools and capabilities when using R. You can install the popular data visualization package ggplot2 with the command install.packages("ggplot2"). It is a good idea to “comment out” this line of code by adding a # at the beginning so that you don’t re-install the package every time you run the script. For this workshop, the packages listed in the .binder/environment.yml file were pre-installed with Conda.
#install.packages("ggplot2")
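An alternative to commenting the line out is to install a package only when it is missing. The sketch below is one possible pattern and not part of the workshop script. `needs_install()` is a hypothetical helper written for this example, built on the base R function requireNamespace(). The install line is left commented out because the workshop environment already provides the packages.

```r
# Hypothetical helper: returns TRUE when a package is NOT installed yet
needs_install <- function(pkg) {
  !requireNamespace(pkg, quietly = TRUE)
}

if (needs_install("ggplot2")) {
  # Uncomment the next line to install when working outside the workshop
  # install.packages("ggplot2")
  message("ggplot2 is not installed yet")
} else {
  message("ggplot2 is already installed")
}
```

With this pattern the script can be rerun safely, because nothing is re-installed when the package is already available.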
After installing packages, we need to load the functions and tools we want to use from the package with the library() command. Let’s load the ggplot2 package.
library(ggplot2)
Now you have successfully loaded the necessary R packages. Let's complete an exercise:
Exercise
We will also use functions from the packages tidyr and dplyr to tidy and transform data. What command would you run to load these packages?
library(tidyr)
library(dplyr)
You can also navigate to the “Packages” tab in the bottom right pane of RStudio to view a list of available packages. Packages with a checked box next to them have been successfully loaded. You can click a box to load an installed package. Clicking the “Help” tab will provide a quick description of the package and its functions.
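The same check can be made from the console. As a minimal sketch, search() lists everything currently attached to the session, and loaded packages appear there with a "package:" prefix. `is_attached()` is a hypothetical helper defined here for illustration.

```r
# Show everything currently attached to the R session
search()

# Hypothetical helper: TRUE if a package has been loaded with library()
is_attached <- function(pkg) {
  paste0("package:", pkg) %in% search()
}

is_attached("base")      # TRUE, base R is always attached
is_attached("ggplot2")   # TRUE only after running library(ggplot2)
```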
Key functions
Function | Description |
---|---|
<- | The assignment operator |
log10() | A built-in function for a base-10 log transformation |
install.packages() | An R function to install packages |
library() | The command used to load installed packages |