BROAD OVERVIEW OF SINGLE CELL RNA SEQ TECHNOLOGY

Ashok Ragavendran

COMPARISON OF BULK RNA-SEQ AND SINGLE CELL DATA

Bulk RNA-seq

  • Measures the average expression level for each gene across a large population of input cells
  • Useful for comparative transcriptomics, e.g. samples of the same tissue from different species
  • Useful for quantifying expression signatures from ensembles, e.g. in disease studies
  • Insufficient for studying heterogeneous systems, e.g. early development studies, complex tissues (brain)
  • Does not provide insights into the stochastic nature of gene expression

Challenges in Single Cell Data

The main difference between bulk and single cell RNA-seq is that each sequencing library represents a single cell instead of a population of cells. The main sources of discrepancy between libraries are:

  • Amplification (up to 1 million fold)
  • Gene ‘dropouts’, in which a gene is observed at a moderate expression level in one cell but is not detected in another cell (Kharchenko, Silberstein, and Scadden 2014).

In both cases the discrepancies arise from the low starting amount of transcripts, since the RNA comes from a single cell.

Improving the transcript capture efficiency and reducing the amplification bias are currently active areas of research.


ref: Analysis of single cell RNA-seq data by Vladimir Kiselev, Tallulah Andrews, Jennifer Westoby, Davis McCarthy, Maren Büttner and Martin Hemberg.

OVERVIEW OF EXPERIMENTAL METHODS

Development of new methods and protocols for scRNA-seq is currently a very active area of research and several protocols have been published over the last few years. Technological developments and protocol improvements have fueled consistent and exponential increases in the number of cells that can be studied in single-cell RNA-seq analyses.

Figure: Scaling of scRNA-seq experiments

(a) Key technologies that have allowed jumps in experimental scale. A jump to ∼100 cells was enabled by sample multiplexing, and then a jump to ∼1,000 cells was achieved by large-scale studies using integrated fluidic circuits, followed by a jump to several thousands of cells with liquid-handling robotics. Further orders-of-magnitude increases bringing the number of cells assayed into the tens of thousands were enabled by random capture technologies using nanodroplets and picowell technologies. Recent studies have used in situ barcoding to inexpensively reach the next order of magnitude of hundreds of thousands of cells. (b) Cell numbers reported in representative publications by publication date. Key technologies are indicated.


ref: Svensson et al. Exponential scaling of single-cell RNA-seq in the past decade. Nature Protocols (2018)

The methods can be categorized in different ways, but the two most important aspects are quantification and capture.

For quantification, there are two types, full-length and tag-based.

  • full-length: Tries to achieve a uniform read coverage of each transcript.
  • tag-based: These protocols only capture either the 5’- or 3’-end of each RNA.

The choice of quantification method has important implications for the types of analyses the data can be used for. In theory, full-length protocols should provide even coverage of transcripts, but in practice there are often biases in the coverage. The main advantage of tag-based protocols is that they can be combined with unique molecular identifiers (UMIs), which can help improve the quantification (see chapter 4.6 of the course cited below). On the other hand, being restricted to one end of the transcript may reduce mappability, and it also makes it harder to distinguish different isoforms.
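To make the role of UMIs concrete, here is a minimal Python sketch of UMI-based counting. It assumes reads have already been assigned a cell barcode, a UMI, and a gene; the barcodes, UMIs, and gene names are made up for illustration, and real pipelines also correct sequencing errors in UMIs, which is omitted here.

```python
from collections import defaultdict

def count_umis(reads):
    """Collapse reads into molecule counts per (cell barcode, gene).

    `reads` is an iterable of (cell_barcode, umi, gene) tuples, one per
    aligned read. Reads sharing all three values are treated as
    amplification duplicates of one original molecule.
    """
    umis_seen = defaultdict(set)
    for cell_barcode, umi, gene in reads:
        umis_seen[(cell_barcode, gene)].add(umi)
    return {key: len(umis) for key, umis in umis_seen.items()}

# Toy input: hypothetical barcodes, UMIs, and gene names.
reads = [
    ("CELL_A", "AACG", "GeneX"),
    ("CELL_A", "AACG", "GeneX"),  # duplicate of the read above
    ("CELL_A", "TTGC", "GeneX"),
    ("CELL_A", "AACG", "GeneY"),  # same UMI, different gene: its own molecule
    ("CELL_B", "AACG", "GeneX"),
]
umi_counts = count_umis(reads)
for (cell_barcode, gene), n_molecules in sorted(umi_counts.items()):
    print(cell_barcode, gene, n_molecules)
```

Counting distinct UMIs rather than reads means that a molecule amplified many times still contributes one count, which is why UMI protocols show less amplification noise.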

The strategy used for capture determines the throughput, how the cells can be selected, and what kind of additional information besides the sequencing can be obtained. The three most widely used options are microwell-, microfluidic- and droplet-based.


ref: Analysis of single cell RNA-seq data by Vladimir Kiselev, Tallulah Andrews, Jennifer Westoby, Davis McCarthy, Maren Büttner and Martin Hemberg.

A recent study on the comparison of methods


  • While Smart-seq2 detected the most genes per cell and across cells, CEL-seq2, Drop-seq, MARS-seq, and SCRB-seq quantified mRNA levels with less amplification noise due to the use of unique molecular identifiers (UMIs).
  • Power simulations at different sequencing depths showed that Drop-seq is more cost-efficient for transcriptome quantification of large numbers of cells, while MARS-seq, SCRB-seq, and Smart-seq2 are more efficient when analyzing fewer cells.

ref: Ziegenhain, C. et al. Comparative Analysis of Single-Cell RNA Sequencing Methods. Molecular Cell 65, 631–643.e4 (2017)

Overview of Methods For Analysis

In [7]:
# Embed the scRNA-tools catalogue of analysis software in the notebook
IRdisplay::display_html('<iframe src="http://www.scrna-tools.org/" width="2000" height="1000"></iframe>')

Overview of Workflows

Figures: scRNA-seq workflow overviews

Bioinformatics processing

  • Follows the standard analysis workflow for processing RNA-seq data
  • Many of the standard QC procedures apply, including FastQC and mapping QC
  • For droplet-based methods, only a fraction of droplets contain both a bead and an intact cell
  • Variation in droplet size, amplification efficiency, and sequencing leads both “background” and real cells to have a wide range of library sizes
  • Various approaches have been used to distinguish the cell barcodes that correspond to real cells

Cell barcode calling is provided as standard output on the 10x platform; for other platforms you might have to carry it out yourself.
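One simple way to separate real cells from background is a cutoff on the total UMI count per barcode. The Python sketch below is only an illustration of that idea: the function name, the `expected_cells` parameter, and the default thresholds are assumptions, not the procedure of any particular pipeline.

```python
def call_cells(umi_totals, expected_cells, top_quantile=0.99, fraction=0.1):
    """Return the barcodes whose total UMI count passes a simple cutoff.

    The cutoff is `fraction` times the count found at the `top_quantile`
    position among the `expected_cells` largest barcodes.
    """
    ranked = sorted(umi_totals.values(), reverse=True)
    top = ranked[:expected_cells]
    if not top:
        return set()
    # Position of the reference barcode within the top group (0 = largest).
    # A quantile of 0.99 skips the most extreme 1% of barcodes.
    index = round((1.0 - top_quantile) * (len(top) - 1))
    cutoff = top[index] * fraction
    return {bc for bc, total in umi_totals.items() if total >= cutoff}

# Toy library sizes per barcode (hypothetical values).
umi_totals = {
    "BC1": 5000, "BC2": 4200, "BC3": 3900,  # clear cells
    "BC4": 600,                             # smaller cell, still above cutoff
    "BC5": 40, "BC6": 12, "BC7": 3,         # background droplets
}
cells = call_cells(umi_totals, expected_cells=3)
print(sorted(cells))
```

A fixed cutoff like this can miss small cells whose library size overlaps the background, which is one reason more elaborate approaches exist.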


_ref_: [Analysis of single cell RNA-seq data, Chapter 4](http://hemberg-lab.github.io/scRNA.seq.course/construction-of-expression-matrix.html), which provides a good overview of some of the methods used here

Data Analysis workflow

  • QC on cells and genes
  • Normalization
  • Correction for Batch Effects
  • Detection of Highly variable genes
  • Clustering and classification
  • Differential Expression Testing
  • Pseudotemporal ordering
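The first two steps of the workflow above can be sketched in a few lines of Python. This is a toy version under stated assumptions: the thresholds (`min_genes`, `min_cells`), the scale factor, and the count matrix are made up for illustration, and dedicated packages use more careful QC metrics and normalization models.

```python
import math

def filter_and_normalize(counts, min_genes=2, min_cells=2, scale=10_000):
    """Basic QC followed by library-size normalization.

    `counts` maps cell -> {gene: raw count}.
    """
    # QC on cells: keep cells with at least `min_genes` detected genes.
    cells = {
        cell: genes for cell, genes in counts.items()
        if sum(1 for v in genes.values() if v > 0) >= min_genes
    }
    # QC on genes: keep genes detected in at least `min_cells` kept cells.
    detected = {}
    for genes in cells.values():
        for gene, value in genes.items():
            if value > 0:
                detected[gene] = detected.get(gene, 0) + 1
    kept_genes = sorted(g for g, n in detected.items() if n >= min_cells)
    # Normalization: scale each cell to `scale` total counts over the kept
    # genes, then apply log(1 + x).
    normalized = {}
    for cell, genes in cells.items():
        total = sum(genes.get(g, 0) for g in kept_genes)
        if total == 0:
            continue
        normalized[cell] = {
            g: math.log1p(genes.get(g, 0) / total * scale) for g in kept_genes
        }
    return normalized

# Toy count matrix (hypothetical cells and genes).
counts = {
    "c1": {"G1": 10, "G2": 0, "G3": 30},
    "c2": {"G1": 5, "G2": 5, "G3": 0},
    "c3": {"G1": 0, "G2": 0, "G3": 1},  # only one gene detected
    "c4": {"G1": 2, "G2": 0, "G3": 6, "G4": 1},
}
normalized = filter_and_normalize(counts)
for cell, genes in normalized.items():
    print(cell, {g: round(v, 2) for g, v in genes.items()})
```

In this toy data, cells c1 and c4 have the same relative expression of G1 and G3 at different sequencing depths, so they end up with identical normalized values.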

Differential Expression Testing

  • Two main assumptions on the statistical distributions for counts
  • Non-parametric methods
  • Bayesian Approaches
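As an illustration of the non-parametric option, the following Python sketch compares the expression of one gene between two groups of cells with a Mann-Whitney U (Wilcoxon rank-sum) test. It is a simplified sketch: the p-value uses the normal approximation without a tie correction, the expression values are made up, and a real analysis would test every gene and correct for multiple testing.

```python
import math

def average_ranks(values):
    """Rank values from 1 to n, giving tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1  # positions i..j are 0-based
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def mann_whitney_u(group_a, group_b):
    """Return (U for group_a, two-sided normal-approximation p-value)."""
    n_a, n_b = len(group_a), len(group_b)
    ranks = average_ranks(list(group_a) + list(group_b))
    u_a = sum(ranks[:n_a]) - n_a * (n_a + 1) / 2
    mean_u = n_a * n_b / 2
    # Variance without tie correction: a simplification for this sketch.
    sd_u = math.sqrt(n_a * n_b * (n_a + n_b + 1) / 12)
    z = (u_a - mean_u) / sd_u
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return u_a, p_value

# Toy expression values of one gene in two clusters (hypothetical).
cluster_1 = [0, 0, 1, 2, 0, 1]
cluster_2 = [5, 7, 0, 9, 6, 8]
u_stat, p_value = mann_whitney_u(cluster_1, cluster_2)
print(u_stat, round(p_value, 4))
```

Rank-based tests make no assumption about the count distribution, which is convenient given the many zeros in single cell data, although heavy ties reduce their power.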

Pseudotemporal Ordering


  • Current algorithms are best at identifying trajectories between the most phenotypically distant cell states, which are molecularly very distinct.
  • They are less robust in reconstructing trajectories from early states towards intermediate or transitory cell states.
  • Limitations include reconstructing only linear trajectories (Waterfall, Monocle 1), identifying only a single branch point (Wishbone), or requiring a priori knowledge of the number of branches (Diffusion Pseudotime, DPT).
  • Monocle 2 addresses many of these challenges but is not designed to reconstruct trajectories between any two chosen cell states, which might include transitions from or to rare cell types.
  • Moreover, because they are designed to identify branching trajectories, Wishbone, DPT, and Monocle 2 are less suited to detecting convergent differentiation paths, such as during plasmacytoid dendritic cell development from distinct precursor cells.

ref: da Rocha, E. L. et al. Reconstruction of complex single-cell trajectories using CellRouter. Nat Commun 9, 892 (2018)
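The basic idea of pseudotemporal ordering can be shown with a toy Python sketch: connect each cell to its nearest neighbors in expression space and use the shortest-path distance from a chosen root cell as pseudotime. This is not an implementation of any of the tools named above. The coordinates, the root cell, and `k` are assumptions, and the sketch recovers only a single linear ordering, the simplest case discussed in the list.

```python
import heapq
import math

def knn_graph(points, k):
    """Build a symmetric k-nearest-neighbor graph with Euclidean weights."""
    graph = {i: {} for i in range(len(points))}
    for i, p in enumerate(points):
        dists = sorted(
            (math.dist(p, q), j) for j, q in enumerate(points) if j != i
        )
        for d, j in dists[:k]:
            graph[i][j] = d
            graph[j][i] = d
    return graph

def pseudotime(points, root, k=2):
    """Shortest-path distance from `root` to every cell along the kNN graph."""
    graph = knn_graph(points, k)
    dist = {i: math.inf for i in graph}
    dist[root] = 0.0
    heap = [(0.0, root)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neighbor, weight in graph[node].items():
            candidate = d + weight
            if candidate < dist[neighbor]:
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist

# Toy cells in a 2D reduced space, listed in shuffled order (hypothetical).
points = [(3.0, 0.0), (0.0, 0.0), (1.0, 0.0), (4.0, 0.0), (2.0, 0.0)]
dist = pseudotime(points, root=1)   # cell 1 is the assumed starting state
ordering = sorted(dist, key=dist.get)
print(ordering)
```

Cells that cannot be reached from the root keep an infinite pseudotime, a reminder that such orderings depend on the graph being connected and on the choice of root.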

Publicly Available 10X datasets

In [2]:
# Embed the 10x Genomics public single cell gene expression datasets page
IRdisplay::display_html('<iframe src="https://support.10xgenomics.com/single-cell-gene-expression/datasets" width="1000" height="500"></iframe>')