Selecting a Kids First Cancer Cohort

The Gabriella Miller Kids First Pediatric Data Portal (KF Portal) hosts datasets at the intersection of congenital birth defects and pediatric cancers, with genomic files for more than 16,000 participants.

Kids First Data Portal

Check out our lessons on Kids First to learn more about the different Data Portal features and building simple to complex queries.

Files on the KF Portal are managed through different access levels. Open access files (including processed files of somatic samples) can be viewed and downloaded by any user. Controlled access files (including raw sequencing files and imaging data) require approvals through dbGaP. For this tutorial, we will use open access pre-processed files generated using Kallisto (v0.43.1), which uses pseudoalignments to quantify transcript abundance from raw data.
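To give a sense of what a Kallisto quantification table contains, here is a minimal sketch that parses one in Python. It assumes the standard Kallisto `abundance.tsv` layout (tab-separated columns `target_id`, `length`, `eff_length`, `est_counts`, `tpm`). The transcript names and values are made up for illustration, and files downloaded from the portal may be compressed or differ in detail, so check a real file before relying on this.

```python
import csv
import io

# Tiny made-up sample in the Kallisto abundance.tsv layout (tab-separated).
# Real files would be opened from disk instead of built as a string.
SAMPLE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx_A\t1500\t1321.0\t120.0\t35.5\n"
    "tx_B\t900\t721.0\t0.0\t0.0\n"
    "tx_C\t2100\t1921.0\t60.5\t12.25\n"
)


def read_abundance(handle):
    """Return {target_id: tpm} from a Kallisto-style abundance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


tpm_by_transcript = read_abundance(io.StringIO(SAMPLE))

# Keep only transcripts with a non-zero abundance estimate.
expressed = sorted(t for t, tpm in tpm_by_transcript.items() if tpm > 0)
print(expressed)
```

The `tpm` column (transcripts per million) is the normalized abundance estimate, while `est_counts` holds the estimated read counts per transcript.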

KFDRC RNA-Seq Workflow

The Kids First RNA-Seq Workflow uses multiple tools/packages for expression detection and fusion calls. The workflow requires raw FASTQ files (controlled access) as input and generates multiple outputs including the Kallisto transcript quantification files. All the output files of this pipeline are available on the KF Portal as open access data. Due to access restrictions and computational intensity, this tutorial will not cover the Kallisto workflow, but users with their own RNA-Seq data may consider starting from this point.
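For readers starting from their own FASTQ files, the sketch below assembles a paired-end `kallisto quant` command in Python without running it. It is not the KFDRC workflow's actual invocation: the index name, output directory, FASTQ file names, and thread count are all hypothetical placeholders, and only the basic `-i` (index), `-o` (output directory), and `-t` (threads) options are shown.

```python
import shlex


def kallisto_quant_command(index, out_dir, fastq_1, fastq_2, threads=4):
    """Build (but do not run) a paired-end `kallisto quant` command."""
    return [
        "kallisto", "quant",
        "-i", index,          # transcriptome index built with `kallisto index`
        "-o", out_dir,        # directory that will receive abundance.tsv
        "-t", str(threads),   # number of threads
        fastq_1, fastq_2,     # paired-end reads
    ]


# All file names below are placeholders, not files from the KF Portal.
cmd = kallisto_quant_command(
    "transcripts.idx",
    "sample_01",
    "sample_01_R1.fastq.gz",
    "sample_01_R2.fastq.gz",
)
print(shlex.join(cmd))
```

Building the command as a list keeps each argument separate, so it can later be passed to `subprocess.run(cmd, check=True)` on a machine where Kallisto is installed.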

Step 1: Filter for open access data

Log in to the KF Portal.
Select the File Repository tab.
Select the Browse All option for the Filter.

File Repository

Data Summary

At the time this tutorial was written (January 2021), the portal contained a total of 88,728 files. Because new datasets are constantly uploaded to the KF Portal, exact numbers within your query may change slightly when run in the future.

Select the Access filter listed under the FILE field.
Select the Open value.
Click View Results to update the selection. This results in 18,162 files.

Open access filter

Step 2: Apply File Filters to obtain RNA-Seq files

Select the File Filters tab and apply the following filters:

Experimental Strategy: RNA-Seq
Data Type: Gene expression
File Format: tsv

This results in 1,477 files.

File filters

Step 3: Select cancer type

Switch to the Clinical Filters tab and apply the following filter:

Diagnosis (Source Text): Medulloblastoma and Ependymoma

This reduces the number of files to 235.

Cancer type

Step 4: Subset cohort

To reduce possible sources of variation due to participant demographics, we will further narrow the query to only include data from white male patients.

Under the Clinical Filters tab, select:

Gender: Male
Race: White

This results in 99 files.
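The filtering in Steps 1 to 4 amounts to applying a sequence of predicates to a file manifest, with each step narrowing the previous result. The sketch below illustrates that logic on a handful of made-up records. The field names and the example rows are hypothetical and do not reflect the portal's actual data model or export format.

```python
# Hypothetical manifest rows; field names and values are illustrative only.
FIELDS = ("access", "strategy", "data_type", "format", "diagnosis", "gender", "race")
ROWS = [
    ("Open", "RNA-Seq", "Gene expression", "tsv", "Medulloblastoma", "Male", "White"),
    ("Open", "RNA-Seq", "Gene expression", "tsv", "Ependymoma", "Female", "White"),
    ("Open", "RNA-Seq", "Gene expression", "tsv", "Neuroblastoma", "Male", "White"),
    ("Open", "WGS", "Simple nucleotide variation", "maf", "Ependymoma", "Male", "White"),
    ("Controlled", "RNA-Seq", "Unaligned reads", "fastq", "Medulloblastoma", "Male", "White"),
]
FILES = [dict(zip(FIELDS, row)) for row in ROWS]

# One predicate per tutorial step, applied in order.
STEPS = [
    ("Step 1: open access", lambda f: f["access"] == "Open"),
    ("Step 2: RNA-Seq gene expression tsv",
     lambda f: (f["strategy"], f["data_type"], f["format"])
     == ("RNA-Seq", "Gene expression", "tsv")),
    ("Step 3: cancer type",
     lambda f: f["diagnosis"] in {"Medulloblastoma", "Ependymoma"}),
    ("Step 4: white male participants",
     lambda f: f["gender"] == "Male" and f["race"] == "White"),
]


def apply_steps(files, steps):
    """Apply each filter in turn and record how many files remain."""
    remaining = list(files)
    counts = []
    for label, keep in steps:
        remaining = [f for f in remaining if keep(f)]
        counts.append((label, len(remaining)))
    return remaining, counts


cohort, counts = apply_steps(FILES, STEPS)
for label, n in counts:
    print(f"{label}: {n} files")
```

Note that the two diagnoses are combined with OR (a file matches if it carries either one), while the separate filters are combined with AND, which is why the file count can only shrink from one step to the next.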

Subset by Clinical Filters

Step 5: Copy files to Cavatica

Important

Make sure the Cavatica integration is enabled so that files can be transferred. Find more details in our Push to Cavatica lesson. You do not need to set up the Data Repository Integrations to continue with this lesson.

Click on the ANALYZE IN CAVATICA button.
Select the CREATE A PROJECT option and provide an appropriate name for your project. This tutorial uses cancer-dge as the project name.
Use the SAVE option to create the project.

Create project on Cavatica

Following project creation, the option will update to enable copying of the selected files to Cavatica.

Copy files to Cavatica

Once the files have been copied to the project folder, a pop-up box summarizes the details and provides a link to view the project folder on Cavatica. If the pop-up box disappears before you have a chance to click on the project link, you can log in to Cavatica and follow the steps to view files in Cavatica.

Successful copy to Cavatica

Query link

The KF Portal lets you share a query, with its unique combination of filters, in several ways, including as a short URL. Log in to your KF account and click on the query link to obtain the selected cohort.

Sharing query

You can learn more about the different options to save/share queries in the KF Portal from our lesson.

In our next lesson, we will explore the newly created project folder and files on the Cavatica platform!

Media resources

A video walkthrough of the cancer cohort selection on the Kids First Portal:

Last update: December 10, 2021