Using the CFDE Search Portal to Find Files
The Common Fund Data Ecosystem Coordinating Center (CFDE-CC) supports efforts to make Common Fund data sets more findable, accessible, interoperable, and reusable (FAIR) for the scientific community through collaboration, end-user training, and data set sustainability. The CFDE-CC manages and organizes CFDE activities, engages with participating Common Fund programs, connects with user communities, supports training, develops tools and standards, and provides technical expertise to Common Fund programs.
The CFDE Search Portal uses the Crosscut Metadata Model (C2M2), a flexible metadata standard for describing experimental resources in biomedicine and related fields. This portal supports faceted search of metadata concepts such as anatomical location, species, and assay type, across a wide variety of datasets using a controlled vocabulary (we do not currently support protected metadata). This allows researchers to find a wide variety of data that would otherwise need to be searched individually, using varying nomenclatures. The portal only accepts C2M2 data packages from Common Fund Programs.
This tutorial focuses on the Human Microbiome Project (HMP). The goal is to identify small FASTQ files with persistent identifiers from a longitudinal multi-omics study. Please refer to the Portal User Guide for a detailed description of all portal features.
Learning Objectives
- learn how to access the CFDE search portal
- learn how to create personal collections
- learn how to search for files meeting a specific criterion
Create a new personal collection
Go to the CFDE data portal at app.nih-cfde.org/.
Log in (upper right).
Under your username (upper right), create a new personal collection.
For name, you can use "Tuesday demo" or anything else. You can leave description blank.
Find some files
Go back to the CFDE data portal main page.
Select File (upper left).
Use the facets on the left to select:
- Common Fund Program: HMP
- Project: "Longitudinal multi 'omics"
- "has persistent ID": True
- Uncompressed size in Bytes: 50000000 to 60000000 (50 MB to 60 MB)
With these selections, the first result should have the "Filename" of SRR5935743_1.fastq.
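To make the logic of the facet selections concrete, here is a minimal Python sketch of the same filter applied to a few in-memory records. This is only an illustration: the FileRecord class, its field names, and the sample records are hypothetical and do not reflect the portal's actual data model or API. It also assumes the size range is inclusive at both ends, which the portal may or may not do.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class FileRecord:
    """Hypothetical stand-in for one row of the portal's File table."""
    filename: str
    program: str
    project: str
    persistent_id: Optional[str]
    size_bytes: int


def matches_facets(rec: FileRecord) -> bool:
    """Apply the four facet selections used in this tutorial."""
    return (
        rec.program == "HMP"
        and rec.project == "Longitudinal multi 'omics"
        and rec.persistent_id is not None              # "has persistent ID": True
        and 50_000_000 <= rec.size_bytes <= 60_000_000  # 50 MB to 60 MB, assumed inclusive
    )


# Made-up records, for illustration only.
records: List[FileRecord] = [
    FileRecord("keep.fastq", "HMP", "Longitudinal multi 'omics", "drs://example/1", 55_000_000),
    FileRecord("too_big.fastq", "HMP", "Longitudinal multi 'omics", "drs://example/2", 75_000_000),
    FileRecord("no_id.fastq", "HMP", "Longitudinal multi 'omics", None, 52_000_000),
    FileRecord("other_program.fastq", "GTEx", "Some other project", "drs://example/3", 51_000_000),
]

selected = [rec for rec in records if matches_facets(rec)]
for rec in selected:
    print(rec.filename, rec.size_bytes)
```

Each facet narrows the result set independently, so the order in which you click them in the portal does not matter.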
Add files to your personal collection
Click on the first result to get a detailed view. Then add it to your personal collection:
- Scroll down to "Part of personal collection" and click Link records.
- Select your personal collection, then click Link (upper right).
Click Back in your browser to return to your filtered search.
Repeat the linking steps with the third result (Filename: SRR5950647_1.fastq), adding it to the same personal collection.
Info
For today, here are the direct links to the two files we'll be using: file 1, SRR5935743_1.fastq and file 2, SRR5950647_1.fastq.
(Note: you could select any files you like, but these are small enough to work with and I know what the results will be, so they're good for today's demo. I suggest trying new/different files as a Thursday exercise!)
Export your personal collection
Go to your collection, select Export, and choose the NCPI manifest format.
Info
What is NCPI? "NCPI" stands for "NIH Cloud Platform Interoperability", an effort by the NIH to convene around interoperation for cloud workbenches.
Examine the NCPI manifest file
You should now have a CSV file in your Downloads that, when examined with a spreadsheet program, looks like this:
The key piece of information in here is the drs_uri column, which provides a Data Repository Service location from which to download the files.
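If you would rather inspect the manifest with a script than a spreadsheet, the following Python sketch pulls out the drs_uri column using only the standard library. Several things here are assumptions: the sample manifest text, its file_name column, and the drs.example.org host are made up for illustration, and your exported file may contain additional columns. The drs_to_https helper applies the hostname-based URI convention from the GA4GH DRS specification (drs://host/id maps to https://host/ga4gh/drs/v1/objects/id); if your manifest uses compact identifier-style DRS URIs instead, they would need to be resolved differently.

```python
import csv
import io
from urllib.parse import urlparse


def read_drs_uris(csv_text, column="drs_uri"):
    """Return the non-empty values of the drs_uri column from manifest CSV text."""
    reader = csv.DictReader(io.StringIO(csv_text))
    uris = []
    for row in reader:
        value = (row.get(column) or "").strip()
        if value:
            uris.append(value)
    return uris


def drs_to_https(drs_uri):
    """Map a hostname-based DRS URI to its GA4GH DRS v1 object endpoint."""
    parsed = urlparse(drs_uri)
    if parsed.scheme != "drs" or not parsed.netloc:
        raise ValueError(f"not a hostname-based DRS URI: {drs_uri!r}")
    object_id = parsed.path.lstrip("/")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"


# Hypothetical manifest contents; a real export will have different values
# and may have more columns.
SAMPLE_MANIFEST = """file_name,drs_uri
example_1.fastq,drs://drs.example.org/abc123
example_2.fastq,drs://drs.example.org/def456
"""

uris = read_drs_uris(SAMPLE_MANIFEST)
for uri in uris:
    print(uri, "->", drs_to_https(uri))
```

To run this against your own export, read the downloaded CSV file into a string and pass it to read_drs_uris in place of SAMPLE_MANIFEST.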