This tutorial shows how to run a standard, predefined QIIME2 analysis on Brown's HPC cluster, OSCAR, using the bioflows tool. The analysis covers the first half of the QIIME2 Moving Pictures tutorial.
We assume that you have worked through the RNA-Seq tutorial, have the prerequisites in place, and know how to set up a control file, create a working directory, and start a screen session. The sections below cover the details specific to this workflow and its YAML setup.
Getting Started
The workflow consists of the following steps:
- qiime tools import for importing raw amplicon sequencing data into a QIIME2 artifact
- qiime demux for demultiplexing data
- qiime dada2 for denoising: detecting and correcting sequencing errors and generating the feature table and representative sequences
- qiime feature-table for summarizing and visualizing the feature table and representative sequences
- qiime phylogeny align-to-tree-mafft-fasttree for multiple sequence alignment and phylogeny inference (with mafft and fasttree)
- qiime diversity core-metrics-phylogenetic for computing alpha and beta diversity statistics
- qiime feature-classifier classify-sklearn for taxonomic assignment
- qiime taxa barplot for generating interactive taxonomy barplots
- qiime composition for differential abundance testing with ANCOM
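For orientation, the first few steps correspond to the following QIIME2 CLI calls from the Moving Pictures tutorial, using the file names and parameters from the configuration file below. This is a sketch for running or debugging those steps by hand, not the scripts bioflows generates; the `run` wrapper, the `DRY_RUN` variable and the function name are hypothetical, added so the commands can be printed without QIIME2 installed.

```shell
# Hypothetical wrapper: with DRY_RUN=1 it prints each command instead of running it.
run() {
  if [ "${DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Import, demultiplexing and DADA2 steps with the parameters used in test_run.yaml.
# Each step stops the function if the previous command failed.
moving_pictures_first_steps() {
  run qiime tools import \
    --type EMPSingleEndSequences \
    --input-path emp-single-end-sequences \
    --output-path emp-single-end-sequences.qza || return
  run qiime demux emp-single \
    --i-seqs emp-single-end-sequences.qza \
    --m-barcodes-file sample-metadata.tsv \
    --m-barcodes-column BarcodeSequence \
    --o-per-sample-sequences demux.qza || return
  run qiime demux summarize \
    --i-data demux.qza \
    --o-visualization demux.qzv || return
  run qiime dada2 denoise-single \
    --i-demultiplexed-seqs demux.qza \
    --p-trim-left 0 \
    --p-trunc-len 120 \
    --o-representative-sequences rep-seqs-dada2.qza \
    --o-table table-dada2.qza \
    --o-denoising-stats stats-dada2.qza
}
```

`DRY_RUN=1 moving_pictures_first_steps` prints the four commands. With the qiime2-2019.1 environment active and `DRY_RUN` unset, the same function runs them in the current directory, stopping at the first failure.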
Set up the YAML configuration file (control file)
For this example, copy the following configuration into a text file and save it as /users/username/test_run.yaml.
Don't forget to edit the work_dir and ssh_user parameters to reflect your own working directory and CCV user name.
bioproject: Project_test_localhost # Project Name Required
experiment: qiime_pilot # Experiment type Required
--type: EMPSingleEndSequences
--input-path: emp-single-end-sequences
--output-path: emp-single-end-sequences.qza
--m-barcodes-file: sample-metadata.tsv
#--output-suffix: test1
conda_command: source /gpfs/runtime/cbc_conda/bin/activate_cbc_conda; conda activate qiime2-2019.1
work_dir: /users/username
log_dir: logs
paired_end: False
local_targets: False
saga_host: localhost
ssh_user: username  # your CCV user name
saga_scheduler: slurm
reference_fasta_path: /gpfs/scratch/test.fa
gtf_file: /gpfs/scratch/aragaven/lapierre/caenorhabditis_elegans.PRJNA13758.WBPS8.canonical_geneset.gtf
- qiime:
    subcommand: "demux emp-single"
    --i-seqs: emp-single-end-sequences.qza
    --m-barcodes-file: sample-metadata.tsv
    --m-barcodes-column: BarcodeSequence
    --o-per-sample-sequences: demux.qza
- qiime:
    subcommand: demux summarize
    --i-data: demux.qza
    --o-visualization: demux.qzv
- qiime:
    subcommand: "dada2 denoise-single"
    --i-demultiplexed-seqs: demux.qza
    --p-trim-left: 0
    --p-trunc-len: 120
    --o-representative-sequences: rep-seqs-dada2.qza
    --o-table: table-dada2.qza
    --o-denoising-stats: stats-dada2.qza
- qiime:
    subcommand: metadata tabulate
    --m-input-file: stats-dada2.qza
    --o-visualization: stats-dada2.qzv
- qiime:
    subcommand: feature-table summarize
    --i-table: table-dada2.qza
    --o-visualization: table.qzv
    --m-sample-metadata-file: sample-metadata.tsv
- qiime:
    subcommand: feature-table tabulate-seqs
    --i-data: rep-seqs-dada2.qza
    --o-visualization: rep-seqs.qzv
- qiime:
    subcommand: phylogeny align-to-tree-mafft-fasttree
    --p-n-threads: 2
    --i-sequences: rep-seqs-dada2.qza
    --o-alignment: aligned-rep-seqs.qza
    --o-masked-alignment: masked-aligned-rep-seqs.qza
    --o-tree: unrooted-tree.qza
    --o-rooted-tree: rooted-tree.qza
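Each step reads artifacts written by earlier steps, so a mismatched file name only surfaces once a job fails on the cluster. For example, the Moving Pictures tutorial renames the DADA2 outputs to table.qza and rep-seqs.qza with mv, and a configuration that reads those names without a matching rename step will fail. The hypothetical helper below, `check_artifact_flow` (not part of bioflows or QIIME2), scans the control file line by line rather than parsing YAML, and reports any `--i-*` or `--m-input-file` .qza input that no earlier `--output-path` or `--o-*` line produces.

```shell
# check_artifact_flow FILE
# Hypothetical helper: assumes one "--flag: value" pair per line, as in test_run.yaml.
# Exits 1 and prints the offending lines if an input artifact is never produced earlier.
check_artifact_flow() {
  awk '
    {
      line = $0
      sub(/#.*/, "", line)                      # drop comments such as "#--output-suffix"
      gsub(/^[ \t]+|[ \t]+$/, "", line)         # trim indentation and trailing blanks
      if (line !~ /^--[a-z0-9-]+:[ \t]*[^ \t]/) next
      key = line; sub(/:.*/, "", key)           # e.g. "--i-table"
      val = line; sub(/^[^:]*:[ \t]*/, "", val) # e.g. "table-dada2.qza"
    }
    # Anything written by the import (--output-path) or an --o-* flag is available later.
    key == "--output-path" || key ~ /^--o-/ { made[val] = 1; next }
    # Artifact inputs must have been written by an earlier line.
    (key ~ /^--i-/ || key == "--m-input-file") && val ~ /\.qza$/ && !(val in made) {
      printf "line %d: %s %s is not produced by an earlier step\n", NR, key, val
      bad = 1
    }
    END { exit (bad ? 1 : 0) }
  ' "$1"
}
```

Running `check_artifact_flow /users/username/test_run.yaml` before submitting prints nothing and exits 0 when every input can be traced to an earlier output. Lines without a colon after the flag are skipped, so it does not replace a real YAML syntax check.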
Submit the workflow
If you haven't done so already, copy the configuration above into a text file and save it as /users/username/test_run.yaml. The data are the same as in the QIIME2 Moving Pictures tutorial. Because they are in the EMP format, no manifest file is needed; if you use other data, you will need to provide a manifest file in the format described in the QIIME2 documentation, specified in the YAML in the same way as in the RNA-Seq tutorial.
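In the Moving Pictures tutorial, the EMP single-end data consist of a directory, emp-single-end-sequences, holding barcodes.fastq.gz and sequences.fastq.gz, plus the sample-metadata.tsv file. The sketch below checks that these are present and non-empty before you submit. It assumes, without confirmation from the bioflows documentation, that the relative paths in the YAML are resolved against work_dir; `check_emp_inputs` is a hypothetical helper name.

```shell
# check_emp_inputs DIR
# Hypothetical helper: confirms DIR (assumed to be your work_dir) holds the EMP
# single-end inputs named in test_run.yaml. Returns 1 if any file is missing or empty.
check_emp_inputs() {
  local dir=$1 missing=0 f
  for f in emp-single-end-sequences/barcodes.fastq.gz \
           emp-single-end-sequences/sequences.fastq.gz \
           sample-metadata.tsv; do
    if [ ! -s "$dir/$f" ]; then
      echo "missing or empty: $dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

For example, `check_emp_inputs /users/username` lists every missing file on standard error rather than stopping at the first one.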
If you haven't already started a screen session during setup, start one; then activate the CBC conda environment and launch the workflow with the following commands:
screen -S rnaseq_tutorial
source activate_cbc_conda
bioflows-qiime2 test_run.yaml
(TODO: bioflows-qiime2 is not a defined wrapper...)
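Before launching, a short pre-flight check can catch two easy mistakes: the bioflows-qiime2 command not being on your PATH, and a work_dir that was never edited and does not exist. `preflight` is a hypothetical helper, not part of bioflows; it reads work_dir with a simple line match rather than a YAML parser, so it assumes work_dir sits on a single line as in the configuration above.

```shell
# preflight YAML
# Hypothetical pre-submit check: looks for the bioflows-qiime2 command and checks
# that the work_dir value in YAML names an existing directory.
preflight() {
  local yaml=$1 wd
  if ! command -v bioflows-qiime2 >/dev/null 2>&1; then
    echo "bioflows-qiime2 is not on PATH; is the CBC conda environment active?" >&2
    return 1
  fi
  # Take the first "work_dir:" line, dropping the key, any comment and trailing blanks.
  wd=$(awk '/^[ \t]*work_dir:/ { sub(/^[ \t]*work_dir:[ \t]*/, ""); sub(/[ \t]*#.*$/, ""); sub(/[ \t]+$/, ""); print; exit }' "$yaml")
  if [ -z "$wd" ] || [ ! -d "$wd" ]; then
    echo "work_dir '$wd' in $yaml is not an existing directory" >&2
    return 1
  fi
  echo "ok: work_dir is $wd"
}
```

Running `preflight test_run.yaml && bioflows-qiime2 test_run.yaml` only starts the workflow when both checks pass.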
Workflow outputs
The bioflows-qiime2 call automatically creates several directories: qiime2, slurm_scripts, logs, and checkpoints. Whether a given directory receives any output depends on which analyses bioflows has run. (TODO: not actually sure what output gets made)
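Because the exact set of outputs is not yet documented, one way to see what a run actually produced is to list the QIIME2 artifacts (.qza) and visualizations (.qzv) under your working directory. The hypothetical helper below uses only find and sort, so it makes no assumption about which of the directories above the files end up in.

```shell
# list_qiime_outputs [DIR]
# Hypothetical helper: prints every .qza artifact and .qzv visualization below DIR
# (default: the current directory), sorted by path.
list_qiime_outputs() {
  find "${1:-.}" -type f \( -name '*.qza' -o -name '*.qzv' \) | sort
}
```

For example, `list_qiime_outputs /users/username` after a run. The .qzv files can then be copied to your own machine and opened in QIIME 2 View (view.qiime2.org) or with `qiime tools view`.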
