Download KF data from the Cavatica portal
The easiest way to download a large number of KF files is through Cavatica.
Step 1: Download the list of links
On your local machine, create a folder on the Desktop called "KF_Data".
Then push your files from the KF portal to Cavatica. If you have multiple Cavatica projects, be sure to note which project you transferred your files into.
Switch to the Cavatica interface and click on the project name.
Click the "Files" tab on the top left to view the list of files you just pushed from the KF portal.
Select all files by checking the box next to "Name". A tiny pop-up window tells you how many files you have selected.
- Click on the drop-down arrow next to the "Download" button (shown in the figure below).
Do not click on the "Download" button
Clicking on the "Download" button instead of the arrow next to it downloads the files individually, one at a time. While this works for a small number of files, it is impractical for a large number. If you selected 50 files, 50 pop-up windows will appear, one for each manual download. If this happens by accident, you can close the pop-up windows one at a time.
- A pop-up window will appear after a few seconds.
Click "Download link" and save the text file in the "KF_Data" folder on the "Desktop" of your local machine.
A text file called "download-links.txt" will appear on your computer. It contains a list of links that you can use for downloading vcf files.
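Before moving on, it can help to confirm that the links file holds one link per selected file. The following is a minimal sketch, assuming each link sits on its own line and starts with http:// or https://. The helper name `count_links` is hypothetical and not part of Cavatica or wget.

```shell
# count_links FILE: print how many lines in FILE start with http:// or https://
# (hypothetical helper; assumes one URL per line)
count_links() {
    grep -c -E '^https?://' "$1"
}

# Example usage (run inside the KF_Data folder):
#   count_links download-links.txt
```

If the number printed differs from the number of files you selected in Cavatica, it is probably worth downloading the links file again before continuing.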
Step 2: Download data with wget
The download links expire after 48 hours. If you need to download files again after the links have expired, go back to Cavatica and download a new "download-links.txt" file.
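A rough way to tell whether a links file is likely past the 48-hour window is to look at its age on disk. This sketch assumes your `find` supports the `-mmin` test (GNU and BSD/macOS versions do) and treats the file's modification time as a proxy for when the links were generated, which may not be exact. The helper name `links_expired` is hypothetical.

```shell
# links_expired FILE: succeed (exit 0) if FILE was last modified more than
# 48 hours (2880 minutes) ago. Hypothetical helper; the modification time
# is only a proxy for when Cavatica generated the links.
links_expired() {
    [ -n "$(find "$1" -mmin +2880)" ]
}

# Example usage (run inside the KF_Data folder):
#   if links_expired download-links.txt; then
#       echo "Links are probably expired; get a new download-links.txt"
#   fi
```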
- Open up your Mac or Linux terminal window and navigate to the KF_Data folder by typing:
cd ~/Desktop/KF_Data
- Run the following wget command to download all the files:
wget -i download-links.txt
The -i flag tells wget to read the list of URLs from a file.
- Depending on the number of files you selected, the download may take anywhere from a few seconds to several minutes. When it is complete, all your files will appear in your "KF_Data" folder.
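Once wget finishes, you may want to check that every link produced a file. The sketch below assumes one URL per line and that wget saved each file under the last path component of its URL, possibly followed by the query string. It matches on the file name prefix, so a similarly named file (for example an index file) would also count as a match. The helper name `missing_downloads` is hypothetical.

```shell
# missing_downloads LINKS_FILE DIR: print the base name of every link in
# LINKS_FILE that has no matching file in DIR (hypothetical helper).
missing_downloads() (
    links_file="$1"
    dir="$2"
    while IFS= read -r url || [ -n "$url" ]; do
        [ -n "$url" ] || continue      # skip blank lines
        name="${url%%\?*}"             # drop everything from the first "?"
        name="${name##*/}"             # keep the last path component
        found=no
        for f in "$dir/$name"*; do     # name with or without trailing "?..."
            if [ -e "$f" ]; then found=yes; break; fi
        done
        if [ "$found" = no ]; then echo "$name"; fi
    done < "$links_file"
)

# Example usage (run inside the KF_Data folder):
#   missing_downloads download-links.txt .
```

No output means every link has at least one matching file in the folder.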
Step 3: Clean up file names
Notice that the file names look odd? There's a lot of text after the vcf.gz extension.
Not only does this extra text clutter your file names, it also prevents your computer from recognizing the file format.
One way to clean up file names is to change each file name manually. However, manual editing takes too much time if you are working with multiple files. Luckily, you can run one line of code to clean up all file names at once:
for i in *vcf.gz\?*; do mv "$i" "$(echo "$i" | cut -d'?' -f1)"; done
This command loops through the list of file names in "KF_Data" and renames the files by deleting all characters after the vcf.gz extension, starting from the ? in the file name (-d specifies a text delimiter, -f1 keeps the first element, which is the file name up to the ?).
This clean-up command was written to look for files with the pattern vcf.gz in their names. To apply the same renaming command to files with other extensions, such as .tsv.gz files, modify the code as follows:
for i in *tsv.gz\?*; do mv "$i" "$(echo "$i" | cut -d'?' -f1)"; done
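If you expect to clean up several extensions, the same idea can be wrapped in a small function that takes the extension as an argument. This is only a sketch: the helper name `strip_query` is hypothetical, and it uses shell parameter expansion in place of cut to remove everything from the first ? onward. It also skips a file rather than overwriting one that already has the cleaned name.

```shell
# strip_query EXT: in the current folder, rename every file whose name
# contains EXT followed by "?" by cutting the name at the first "?".
# Hypothetical helper; uses parameter expansion instead of cut.
strip_query() (
    for f in *"$1"\?*; do
        [ -e "$f" ] || continue        # the pattern matched nothing
        new="${f%%\?*}"                # text before the first "?"
        if [ -e "$new" ]; then
            echo "skipping $f: $new already exists" >&2
            continue
        fi
        mv -- "$f" "$new"
    done
)

# Example usage (run inside the KF_Data folder):
#   strip_query vcf.gz
#   strip_query tsv.gz
```

Files that do not contain the given extension followed by a ?, such as "download-links.txt", are left untouched.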
Check your folder - you have now successfully downloaded KF files to your local machine!