
Selecting Kids First Cancer Cohort

The Gabriella Miller Kids First Pediatric Data Portal (KF portal) hosts datasets at the intersection of childhood development and cancer, drawn from over 16,000 samples, and new data are added continually.

Kids First Data Portal

Check out our lessons on Kids First to learn more about the different Data Portal features and about building queries, from simple to complex.

Data on the KF portal are hosted at different access levels, including open (processed files, reports, plots, etc.) and controlled (raw sequencing files, histological images, etc.). For this tutorial, we will use open access, pre-processed files generated with Kallisto (v0.43.1), which uses pseudoalignment to quantify transcript abundance from raw sequencing reads.
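To give a sense of what these quantification files contain, here is a minimal sketch that reads a Kallisto abundance table using only the Python standard library. It assumes the portal files keep Kallisto's standard `abundance.tsv` columns (`target_id`, `length`, `eff_length`, `est_counts`, `tpm`). The transcript names and values are made up for illustration, and `read_abundance` is a hypothetical helper, not part of any Kids First tooling.

```python
import csv
import io

# Illustrative rows only; these are not real Kids First data.
SAMPLE_ABUNDANCE = (
    "target_id\tlength\teff_length\test_counts\ttpm\n"
    "tx_A\t1500\t1321.0\t120.0\t35.5\n"
    "tx_B\t800\t621.0\t0.0\t0.0\n"
    "tx_C\t2300\t2121.0\t48.5\t8.9\n"
)


def read_abundance(handle):
    """Return {target_id: tpm} from a tab-separated Kallisto abundance table."""
    reader = csv.DictReader(handle, delimiter="\t")
    return {row["target_id"]: float(row["tpm"]) for row in reader}


# For a downloaded file, pass open(path, newline="") instead of the StringIO.
tpm_by_transcript = read_abundance(io.StringIO(SAMPLE_ABUNDANCE))

# Transcripts with non-zero abundance in this sample.
expressed = sorted(t for t, tpm in tpm_by_transcript.items() if tpm > 0)
print(expressed)
```

Keying the table by `target_id` makes it straightforward to look up the same transcript across several samples later on.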

KFDRC RNAseq workflow

The Kids First RNAseq pipeline uses multiple tools and packages for expression detection and fusion calling. The workflow requires raw FASTQ files (controlled access) as input and generates multiple outputs, including the Kallisto transcript quantification files. All the output files of this pipeline are available on the portal as open access data. Running the workflow yourself is impractical for two reasons: the FASTQ inputs are access-restricted, and the workflow is computationally taxing to run on many files.

Step 1: Filter for open access data

  • Log in to the KF portal
  • Select the File Repository tab
  • Select the Browse All option for the Filter.

File Repository

Data Summary

At the time of the tutorial (Jan 2021), the portal contained a total of 88,728 files. Since new datasets are constantly uploaded to the KF portal, the query numbers may change when run in the future.

  • Select the Access filter listed under the FILE field
  • Select Open value
  • Click View Results to update selection. This results in 18,162 files.

Open access filter

Step 2: Apply File Filters to obtain RNAseq files

Select the File Filters tab and apply the following filters:

  • Experimental Strategy --> RNA-Seq
  • Data Type --> Gene expression
  • File Format --> tsv

This results in 1,477 files.

File filters

Step 3: Select cancer type

Switch to the Clinical Filter tab and apply:

  • Diagnosis (Source Text) --> Medulloblastoma and Ependymoma.

This reduces the number of files to 235.

Cancer type

Step 4: Subset cohort

To reduce possible sources of variation from sex and race, we subset further to include data from only white male patients.

Under the Clinical Filters tab select:

  • Gender --> Male
  • Race --> White

This results in 99 files.
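The cumulative effect of Steps 1 to 4 can be sketched as a chain of field filters. This is an illustration only: the records, field names, and the `make_record`, `matches`, and `select` helpers are hypothetical and do not reflect the portal's export schema or API. It also assumes that multiple values chosen within one filter (the two diagnoses) are combined with OR, while separate filters are combined with AND.

```python
# Hypothetical file records. Field names echo the portal's filter labels,
# but this is NOT an actual Kids First export format.
BASE = {
    "access": "Open",
    "strategy": "RNA-Seq",
    "data_type": "Gene expression",
    "format": "tsv",
    "diagnosis": "Medulloblastoma",
    "gender": "Male",
    "race": "White",
}


def make_record(file_id, **overrides):
    """Build an illustrative record from BASE with selected fields changed."""
    return {**BASE, "id": file_id, **overrides}


FILES = [
    make_record("f1"),
    make_record("f2", diagnosis="Ependymoma"),
    make_record("f3", access="Controlled", format="fastq"),
    make_record("f4", gender="Female"),
    make_record("f5", diagnosis="Neuroblastoma"),
]

# One set of allowed values per field.
# Assumption: OR within a field, AND across fields.
COHORT_FILTERS = {
    "access": {"Open"},                              # Step 1
    "strategy": {"RNA-Seq"},                         # Step 2
    "data_type": {"Gene expression"},                # Step 2
    "format": {"tsv"},                               # Step 2
    "diagnosis": {"Medulloblastoma", "Ependymoma"},  # Step 3
    "gender": {"Male"},                              # Step 4
    "race": {"White"},                               # Step 4
}


def matches(record, filters):
    """True when the record has an allowed value for every filtered field."""
    return all(record.get(field) in allowed for field, allowed in filters.items())


def select(records, filters):
    """Return the records that pass every filter, preserving input order."""
    return [r for r in records if matches(r, filters)]


cohort = select(FILES, COHORT_FILTERS)
print([r["id"] for r in cohort])
```

Keeping each filter as a set of allowed values mirrors how the portal narrows the file count step by step: adding a field can only shrink the selection, while adding a value to an existing field can only grow it.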

Subset by Clinical Filters

Step 5: Copy files to Cavatica

Important

It is crucial to ensure the Cavatica integrations are enabled to allow for file transfers. Find more details in our Push to Cavatica lesson. You do not need the Data Repository Integrations set up to continue with this lesson.

  • Click on the ANALYZE IN CAVATICA button.
  • Select the CREATE A PROJECT option and provide an appropriate name for your project. In this tutorial, cancer-dge was chosen as the project name.
  • Use the SAVE option to create the project.

Create project on Cavatica

Following project creation, the option will update to enable copying of the selected files to Cavatica.

Copy files to Cavatica

Once the files are copied to the project folder, a pop-up box summarizes the details and provides a link to view the project folder on Cavatica. If the pop-up box disappears before you have a chance to click on the project link, you can log in to Cavatica and follow the steps to view files in Cavatica.

Successful copy to Cavatica

Query link

The KF portal lets you share a query, with its unique combination of filters, in several ways, including as a short URL. Log in to your KF account and click on the query link to load the selected cohort.

Sharing query

You can learn more about the different options to save/share queries in the KF portal from our lesson.

In our next lesson, we will explore the newly created project folder and files on the Cavatica platform!

Media resources

A video walkthrough of the cancer cohort selection on Kids First portal:


Last update: April 2, 2021