qckitfastq: A comprehensive quality control R package for Next Generation Sequencing FASTQ data

Overview

This R package contains tools for comprehensive quality control of FASTQ format data. We aim to replicate existing tools for FASTQ quality control and to improve on metrics for which the data is truncated before analysis. Processing is kept efficient by implementing the core functions in C++ using Rcpp.

qckitfastq provides the following metrics:

1. data dimension
2. per base sequence content
3. per base quality score statistics
4. per read GC content
5. per read mean quality score
6. overrepresented sequence
7. per base kmer count
8. overrepresented kmer

Each of the above metrics includes both a results table and a visualization of the results.
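To make a few of these metrics concrete, here is a minimal base-R sketch of per read GC content, per read mean quality score, and per base sequence content. It is only an illustration on made-up reads, assuming Phred+33 quality encoding. It does not use qckitfastq and is not how the package computes these metrics. All variable names are hypothetical.

```r
# Toy reads and quality strings (made-up data, Phred+33 encoding assumed)
reads <- c("ACGTGGCC", "TTTTAACG", "GGGCCCAT")
quals <- c("IIIIHHHH", "########", "IIII####")

# Split each read into single bases
base_list <- strsplit(reads, "")

# Per read GC content: percentage of bases in each read that are G or C
gc_percent <- vapply(base_list,
                     function(b) 100 * mean(b %in% c("G", "C")),
                     numeric(1))

# Per read mean quality score: decode each character to a Phred score, then average
mean_quality <- vapply(quals,
                       function(q) mean(utf8ToInt(q) - 33L),
                       numeric(1), USE.NAMES = FALSE)

# Per base sequence content: count of each nucleotide at each read position
# (rows are positions, columns are nucleotides; assumes equal-length reads)
base_matrix <- do.call(rbind, base_list)
per_base_content <- vapply(c("A", "C", "G", "T"),
                           function(b) colSums(base_matrix == b),
                           numeric(ncol(base_matrix)))

print(data.frame(read = reads, gc_percent, mean_quality))
print(per_base_content)
```

On real FASTQ files, reads differ in length and number in the millions, which is why the package implements these computations in C++.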

Getting started

Prerequisites

qckitfastq depends on both CRAN and Bioconductor packages. The following R commands install all prerequisites:

install.packages(c('magrittr','ggplot2','dplyr','testthat','data.table','reshape2','grDevices','graphics','stats','utils','Rcpp','kableExtra','rlang','knitr','rmarkdown'))
if (!requireNamespace("BiocManager", quietly = TRUE))
    install.packages("BiocManager")
BiocManager::install(c("RSeqAn","seqTools","zlibbioc"))
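
After installation, it can be useful to confirm that nothing is missing. The snippet below is a small sketch using only base R. The helper name `is_installed` is hypothetical and is not part of qckitfastq.

```r
# Packages named in the install commands above
cran_pkgs <- c("magrittr", "ggplot2", "dplyr", "testthat", "data.table",
               "reshape2", "grDevices", "graphics", "stats", "utils",
               "Rcpp", "kableExtra", "rlang", "knitr", "rmarkdown")
bioc_pkgs <- c("RSeqAn", "seqTools", "zlibbioc")

# requireNamespace() returns FALSE (rather than raising an error)
# when a package cannot be loaded
is_installed <- function(pkgs) {
  vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
}

status <- is_installed(c(cran_pkgs, bioc_pkgs))
missing_pkgs <- names(status)[!status]

if (length(missing_pkgs) > 0) {
  message("Missing packages: ", paste(missing_pkgs, collapse = ", "))
} else {
  message("All prerequisites are installed.")
}
```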

Installing

From Bioconductor

The release version of qckitfastq is on Bioconductor. To install it, follow the instructions on the Bioconductor package page.

From GitHub repo

This repository contains the development version. You will need devtools to install.

devtools::install_github("compbiocore/qckitfastq",build_vignettes=TRUE)
library(qckitfastq)

Usage

The simplest way to run qckitfastq, and its intended usage, is to call run_all, a single command that writes a report of all of the included metrics to a user-provided directory using default parameters and default filenames. These default parameters and filenames cannot be changed. An example using tempdir() and an example fq.gz file is given below:

library(qckitfastq)
infile <- system.file("extdata","10^5_reads_test.fq.gz",package="qckitfastq")
testfolder <- tempdir()
run_all(infile,testfolder)

However, each metric can also be run separately for closer examination, for parameter tuning, or to save reports under a different filename. In those cases, we recommend taking a look at the qckitfastq vignette to get started. The vignette can also be viewed in RStudio with the following commands:

library(qckitfastq)
browseVignettes("qckitfastq")

Release history

See NEWS for changes.

Authors

  • August Guang, creator and maintainer.
  • Wenyue Xing, creator.