Skip to content

Running Workflow via Nextflow

The following documentation details on how to run the Covid19 analysis pipeline using Nextflow on any computing environment.

Installation

1. Check out Github repo

First, check out the Github repo:

git clone https://github.com/compbiocore/covid19_analysis.git

2. Install Nextflow and Singularity

Option A: On Any Computing Environment

If you do not have Singularity already; you can install it by referring to the Singularity installation guide here.

If you do not have Nextflow already; you can install it by referring to the Nextflow installation guide here.

After installing Singularity, ensure that in your Nextflow configuration file, you have enabled Singularity in Nextflow. You can refer to the Singularity configuration guide here; or in another words, add the following block in the nextflow.config file that Nextflow is sourcing:

...
singularity {
    enabled = true
}

Option B: On Brown OSCAR Computing Environment

If you are on Brown OSCAR computing environment, you can simply install Nextflow and Singularity computing environment by following the set up instructions here. And then to initialize the Nextflow environment, simply type in:

nextflow_start

Running the Nextflow Workflow

Once you have finished installing (or already have the requisites satisfied), you can run the Nextflow pipeline with the following command:

cd $PROJECT_REPO
nextflow run $PROJECT_REPO/workflows/covid19.nf \
--output_dir $OUTPUT_DIR --username $GISAID_USER --password='$GISAID_PASSWORD' \
--project_github $PROJECT_REPO

Output Directory

Below is a brief walk-through and explaination of all the workflow workproducts:

Output 1: GISAID Sequence Files and Metadata

In $OUTPUT_DIR/gisaid: - gisaid.fasta, the sequence containing for all sequences downloaded from GISAID given a certain geolocation (e.g., USA/Rhode Island). - gisaid.csv, the GISAID metadata file for all the sequences given the certain geolocation - sra_run.txt, all of the SRA id's linked to the GISAID sequences in this workflow.

Output 2: Analysis Files