Linking to and configuring RSeqAn

Introduction

RSeqAn was created to make it easy to integrate the SeqAn C++ library for biological sequence analysis into R packages. This vignette describes how to link to RSeqAn from another R package and how to configure RSeqAn for your own build system, for example by enabling zlib or bzip2.

Linking

Dependencies for linking

Prerequisites for linking to RSeqAn are:

  • The compiler must support the C++14 standard, which is the default from GCC 6 onward. Tell the build system to use C++14 either by setting the SystemRequirements field of the DESCRIPTION file (SystemRequirements: C++14) or, preferably, by specifying it in src/Makevars (CXX_STD = CXX14).
  • Rcpp must be installed, listed in the Imports field of the DESCRIPTION file (Imports: Rcpp), and imported in the NAMESPACE file (importFrom(Rcpp, sourceCpp)). Note: if you generate your NAMESPACE with roxygen2, you do not need to edit the NAMESPACE file by hand.

Linking to RSeqAn

As long as the prerequisites are satisfied, linking to RSeqAn is simple. Add RSeqAn to the Imports field of the DESCRIPTION file, then add the line

LinkingTo: Rcpp, RSeqAn

to the same file.

In C++ code, include SeqAn headers as usual with #include <seqan/$filename.h> and add the // [[Rcpp::depends(RSeqAn)]] attribute. For an example, see the qckitfastq package source code.

Configuring RSeqAn

By default, SeqAn, and thus RSeqAn, is not set up to use libraries such as zlib and bzip2, although it supports them. To enable these libraries and set options for them (assuming they are installed), set the corresponding preprocessor flags in src/Makevars (preferred) or with Sys.setenv(). For example, to enable zlib:

  • In src/Makevars, write: PKG_CXXFLAGS=-DSEQAN_HAS_ZLIB
  • Using Sys.setenv(): Sys.setenv("PKG_CXXFLAGS"="-DSEQAN_HAS_ZLIB")
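As a combined illustration, the settings described so far could sit together in a single src/Makevars. This is only a sketch: the bzip2 define (SEQAN_HAS_BZIP2) and the PKG_LIBS line are assumptions that go beyond the zlib example above, and whether explicit linker flags are needed depends on your platform and on how the libraries are installed.

```make
# src/Makevars (sketch; adjust to the libraries installed on your system)

# Request C++14, as described in the linking prerequisites.
CXX_STD = CXX14

# Enable SeqAn's zlib support; the bzip2 define is an assumption added
# here for illustration and requires bzip2 to be installed.
PKG_CXXFLAGS = -DSEQAN_HAS_ZLIB -DSEQAN_HAS_BZIP2

# Assumption: the compression libraries may also need to be linked
# explicitly. Drop this line if your platform already resolves them.
PKG_LIBS = -lz -lbz2
```

Keeping the flags in src/Makevars rather than in Sys.setenv() means they travel with the package source and apply to every build of it.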

Other preprocessor defines that can be set are listed in the SeqAn documentation.

Example script

The example script below follows the SeqAn SAM and BAM I/O tutorial and uses Sys.setenv() to set the preprocessor defines:

Sys.setenv("PKG_CXXFLAGS"="-DSEQAN_HAS_ZLIB -std=c++14")

// [[Rcpp::depends(RSeqAn)]]

#include <seqan/bam_io.h>
#include <Rcpp.h>

using namespace Rcpp;

// [[Rcpp::export]]
int readBam()
{
    // toy.bam is in the vignettes folder
    seqan::CharString bamFileName = "toy.bam";

    // Open input file, BamFileIn can read SAM and BAM files.
    seqan::BamFileIn bamFileIn(toCString(bamFileName));

    // Open output file; BamFileOut also accepts an ostream and a format tag.
    // Note the usage of Rcout instead of std::cout
    seqan::BamFileOut bamFileOut(context(bamFileIn), Rcout, seqan::Sam());

    // Copy header.
    seqan::BamHeader header;
    seqan::readHeader(header, bamFileIn);
    seqan::writeHeader(bamFileOut, header);

    // Copy records.
    seqan::BamAlignmentRecord record;
    while (!atEnd(bamFileIn))
    {
        seqan::readRecord(record, bamFileIn);
        seqan::writeRecord(bamFileOut, record);
    }

    return 0;
}

readBam()
## @SQ  SN:ref  LN:45
## @SQ  SN:ref2 LN:40
## r001 163 ref 7   30  8M4I4M1D3M  =   37  39  TTAGATAAAGAGGATACTG *   XX:B:S,12561,2,20,112
## r002 0   ref 9   30  1S2I6M1P1I1P1I4M2I  *   0   0   AAAAGATAAGGGATAAA   *
## r003 0   ref 9   30  5H6M    *   0   0   AGCTAA  *   SA:Z:ref,29,-,6H5M,17,0;
## r004 0   ref 16  30  6M14N1I5M   *   0   0   ATAGCTCTCAGC    *
## r003 16  ref 29  30  6H5M    *   0   0   TAGGC   *   SA:Z:ref,9,+,5S6M,30,1;
## r001 83  ref 37  30  9M  =   7   -39 CAGCGCCAT   *   NM:i:1
## r005 4   *   0   0   8X  *   0   8   AAAAAAAA    *
## [1] 0