qckitfastq: A comprehensive quality control R package for Next Generation Sequencing FASTQ data
Overview
This R package contains tools for comprehensive quality control of FASTQ format data. We aim to replicate existing tools for FASTQ quality control and to advance FASTQ metrics in cases where data is truncated for the analysis. Efficient processing of FASTQ format data is achieved by implementing the core functions in C++ using Rcpp.
The metrics that qckitfastq provides are as follows:
1. data dimension
2. per base sequence content
3. per base quality score statistics
4. per read GC content
5. per read mean quality score
6. overrepresented sequence
7. per base kmer count
8. overrepresented kmer
The above metrics include both tables of analysis results and visualizations of those results.
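To make a few of these metrics concrete, the following is a minimal base R sketch of per read GC content, per read mean quality score, and k-mer counting on three toy reads. It is an illustration only and does not use or reproduce the qckitfastq implementation; the function names `gc_content`, `mean_quality`, and `count_kmers` are hypothetical, and the quality strings are assumed to be Phred+33 encoded.

```r
# Toy reads held in memory (hypothetical data, not from the package).
reads <- c("ACGTGC", "GGGCCC", "ATATAT")
quals <- c("IIIIII", "######", "5I5I5I")  # assumed Phred+33 encoding

# Per read GC content: percentage of bases that are G or C.
gc_content <- function(seq) {
  bases <- strsplit(seq, "")[[1]]
  100 * mean(bases %in% c("G", "C"))
}

# Per read mean quality: decode Phred+33 characters, then average.
mean_quality <- function(qual) {
  mean(utf8ToInt(qual) - 33L)
}

# k-mer counts for one read: tabulate every substring of length k.
count_kmers <- function(seq, k) {
  n_kmers <- nchar(seq) - k + 1
  if (n_kmers < 1) return(table(character(0)))  # read shorter than k
  starts <- seq_len(n_kmers)
  table(substring(seq, starts, starts + k - 1))
}

gc_per_read   <- vapply(reads, gc_content, numeric(1), USE.NAMES = FALSE)
qual_per_read <- vapply(quals, mean_quality, numeric(1), USE.NAMES = FALSE)
kmers         <- count_kmers("ATATAT", 2)  # AT occurs 3 times, TA 2 times
```

On real data these quantities would be computed over every read in the FASTQ file, which is where compiled code becomes worthwhile.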
Getting started
Prerequisites
qckitfastq depends on both CRAN packages and Bioconductor packages. Commands to install all prerequisites from R are given below:
# CRAN dependencies
install.packages(c('magrittr','ggplot2','dplyr','testthat','data.table','reshape2','grDevices','graphics','stats','utils','Rcpp','kableExtra','rlang','knitr','rmarkdown'))
# Bioconductor dependencies (BiocManager is installed first if missing)
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("RSeqAn","seqTools","zlibbioc"))
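Before installing, it can be useful to see which prerequisites are already present. The sketch below is one possible way to do that in base R; the helper name `missing_packages` is hypothetical and is not part of qckitfastq.

```r
# Return the subset of `pkgs` that cannot be loaded in this R installation.
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Check a few of the prerequisites listed above.
to_install <- missing_packages(c("magrittr", "ggplot2", "Rcpp", "seqTools"))
```

Any names returned in `to_install` still need to be installed with `install.packages` (CRAN) or `BiocManager::install` (Bioconductor), as appropriate.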
Installing
From Bioconductor
The release version of qckitfastq is on Bioconductor. To install it from there, follow the instructions on the package page.
From GitHub repo
This repository contains the development version. You will need devtools to install it.
devtools::install_github("compbiocore/qckitfastq",build_vignettes=TRUE)
library(qckitfastq)
Usage
The simplest way to run qckitfastq, and its intended usage, is to execute run_all, a single command that produces a report of all of the included metrics in a user-provided directory, using default parameters and default filenames. These defaults cannot be changed. An example using tempdir() and an example fq.gz file is given below:
library(qckitfastq)
infile <- system.file("extdata","10^5_reads_test.fq.gz",package="qckitfastq")
testfolder <- tempdir()
run_all(infile,testfolder)
However, each metric can also be run separately for closer examination, for parameter tuning, or to save reports under a different filename. In those cases, we recommend starting with the qckitfastq vignette. The vignette can also be viewed in RStudio with the following commands:
library(qckitfastq)
browseVignettes("qckitfastq")
Release history
See NEWS for changes.
Authors
- August Guang, creator and maintainer.
- Wenyue Xing, creator.