Selecting Kids First Cancer Cohort
The Gabriella Miller Kids First Pediatric Data Portal (KF portal) hosts datasets at the intersection of childhood development and cancer, drawn from over 16,000 samples, with new data added continually.
Kids First Data Portal
The KF portal hosts data at different access levels: open (processed files, reports, plots, etc.) and controlled (raw sequencing files, histological images, etc.). For this tutorial, we will use open access, pre-processed files generated with Kallisto (v0.43.1), which uses pseudoalignment to quantify transcript abundance from raw sequencing reads.
KFDRC RNAseq workflow
The Kids First RNAseq pipeline uses multiple tools and packages for expression detection and fusion calls. The workflow takes raw FASTQ files (controlled access) as input and generates multiple outputs, including the Kallisto transcript quantification files. All output files of this pipeline are available on the portal as open access data. We use these pre-computed outputs for two reasons: the raw input files require controlled access, and running the workflow on many files is computationally taxing.
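To give a sense of what the Kallisto transcript quantification files contain, here is a minimal Python sketch that reads a Kallisto-style abundance table. It assumes the standard Kallisto plain-text layout with the columns target_id, length, eff_length, est_counts, and tpm. The transcript names, the values, and the min_tpm threshold are made up for illustration, and the files on the portal may be compressed or named differently.

```python
import csv
import io

# Tiny made-up stand-in for a Kallisto abundance table (illustrative values only).
SAMPLE_ABUNDANCE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx_A\t1500\t1321.0\t120.0\t400000.0\n"
    "tx_B\t900\t721.0\t0.0\t0.0\n"
    "tx_C\t2100\t1921.0\t260.0\t600000.0\n"
)


def read_abundance(handle):
    """Map each target_id to its estimated counts and TPM."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {
        row["target_id"]: {
            "est_counts": float(row["est_counts"]),
            "tpm": float(row["tpm"]),
        }
        for row in reader
    }


def expressed_transcripts(abundance, min_tpm=1.0):
    """Sorted names of transcripts at or above a hypothetical TPM threshold."""
    return sorted(name for name, v in abundance.items() if v["tpm"] >= min_tpm)


abundance = read_abundance(io.StringIO(SAMPLE_ABUNDANCE))
print(expressed_transcripts(abundance))
```

For a real file, the same read_abundance function could be given an open file handle instead of the in-memory sample.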
Step 1: Filter for open access data
- Log in to the KF portal
- Select the File Repository tab
- Select the Browse All option for the Filter.
At the time of the tutorial (Jan 2021), the portal contained a total of 88,728 files. Since new datasets are constantly uploaded to the KF portal, the query numbers may change when run in the future.
- Select the Access filter listed under the FILE field
- Select the Open value
- Click View Results to update the selection. This results in 18,162 files.
Step 2: Apply File Filters to obtain RNAseq files
Select the File Filters tab and apply the following filters:
- Experimental Strategy --> RNA-Seq
- Data Type --> Gene expression
- File Format --> tsv
This results in 1,477 files.
Step 3: Select cancer type
Switch to the Clinical Filters tab and apply:
- Diagnosis (Source Text) --> Medulloblastoma and Ependymoma (select both values)
This reduces the number of files to 235.
Step 4: Subset cohort
To reduce possible sources of variation from sex and race, we subset further to include data from only white male patients.
Under the Clinical Filters tab select:
- Gender --> Male
- Race --> White
This results in 99 files.
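The logic of Steps 1 to 4 can be sketched in Python as a chain of filters over a file manifest. This is only an illustration of how the selections combine, not how the portal is implemented: the manifest rows and the column names (access, experimental_strategy, and so on) are hypothetical. The sketch assumes that values chosen within one filter are alternatives (a file matches if it has any of them), while separate filters must all match.

```python
# Default field values for a hypothetical manifest row that passes every filter.
DEFAULTS = {
    "access": "Open",
    "experimental_strategy": "RNA-Seq",
    "data_type": "Gene expression",
    "file_format": "tsv",
    "diagnosis": "Medulloblastoma",
    "gender": "Male",
    "race": "White",
}


def make_row(name, **overrides):
    """Build one made-up manifest row, overriding selected default fields."""
    return {"file": name, **DEFAULTS, **overrides}


MANIFEST = [
    make_row("f1.tsv"),
    make_row("f2.tsv", access="Controlled"),
    make_row("f3.tsv", file_format="pdf"),
    make_row("f4.tsv", diagnosis="Neuroblastoma"),
    make_row("f5.tsv", diagnosis="Ependymoma", gender="Female"),
    make_row("f6.tsv", diagnosis="Ependymoma"),
]

# One entry per tutorial step: field name -> set of accepted values.
STEPS = [
    ("Step 1", {"access": {"Open"}}),
    ("Step 2", {
        "experimental_strategy": {"RNA-Seq"},
        "data_type": {"Gene expression"},
        "file_format": {"tsv"},
    }),
    ("Step 3", {"diagnosis": {"Medulloblastoma", "Ependymoma"}}),
    ("Step 4", {"gender": {"Male"}, "race": {"White"}}),
]


def apply_filters(rows, criteria):
    """Keep rows whose value for every filtered field is among the accepted values."""
    return [
        row for row in rows
        if all(row.get(field) in accepted for field, accepted in criteria.items())
    ]


def run_funnel(rows, steps):
    """Apply each step in order and record how many rows remain after it."""
    counts = []
    for label, criteria in steps:
        rows = apply_filters(rows, criteria)
        counts.append((label, len(rows)))
    return rows, counts


selected, counts = run_funnel(MANIFEST, STEPS)
for label, remaining in counts:
    print(label, remaining)
print([row["file"] for row in selected])
```

On the portal itself, the same sequence of filters narrowed the selection from 88,728 files to 18,162, then 1,477, then 235, and finally 99 at the time of the tutorial.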
Step 5: Copy files to Cavatica
Make sure your Cavatica integration is enabled, since it is required for file transfers. Find more details in our Push to Cavatica lesson. You do not need to have the Data Repository Integrations set up to continue with this lesson.
- Click on the ANALYZE IN CAVATICA button.
- Select the CREATE A PROJECT option and provide an appropriate name for your project. In this tutorial, cancer-dge was chosen as the project name.
- Use the SAVE option to create the project.
Following project creation, the option will update to enable copying of the selected files to Cavatica.
Once the files have been copied to the project folder, a pop-up box summarizes the details and provides a link to view the project folder on Cavatica. If the pop-up box disappears before you have a chance to click on the project link, you can log in to Cavatica and follow the steps to view files in Cavatica.
The KF portal lets you share a query and its unique combination of filters, including as a short URL. Log in to your KF account and click on the query link to load the selected cohort.
In our next lesson, we will explore the newly created project folder and files on the Cavatica platform!
A video walkthrough of the cancer cohort selection on Kids First portal: