Single-cell RNA sequencing (scRNA-Seq) technologies are opening the way for transcriptome-wide profiling across diverse and complex mammalian tissues, facilitating unbiased identification of novel cell sub-populations and their functional roles. As in other high-throughput assays, a fraction of the heterogeneity observed in scRNA-Seq data results from batch effects and other technical artifacts. In particular, these protocols’ reliance on minuscule amounts of starting mRNA can lead to widespread “drop-out effects,” in which expressed transcripts are missed. Because of the biases inherent to these assays, data normalization is an essential step prior to any downstream analysis. Furthermore, given the wide range of scRNA-Seq study designs used in the field, we cannot expect to find a one-size-fits-all solution to these problems.
SCONE (Single-Cell Overview of Normalized Expression) is an R Bioconductor package that supports a rational, data-driven framework for assessing the efficacy of various normalization workflows, encouraging users to explore the trade-offs inherent to their data set before finalizing a data normalization strategy. We provide an interface for running multiple normalization workflows in parallel. We also offer tools for ranking workflows and visualizing trade-offs. We import some common normalization modules used in traditional bulk sequencing, and provide support for integrating user-specified normalization modules.
- Expression Matrix (e.g. Read Counts)
- Library Alignment Metrics
- Biological Exposures
- Batch Conditions
- Control Gene Sets
General Normalization Workflow
- Data Imputation Module: replacing zero-abundance values with expected values under a drop-out model.
- Scaling or Quantile Normalization Module: either i) normalization that scales each sample’s transcriptome abundances by a single factor or ii) more complex offsets that match quantiles across samples.
- Regression Module: approaches for removing unwanted correlated variation from the data (e.g. RUVg, Risso et al. 2014).
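The three modules above can be combined in many ways, which is why a single run can yield a large number of candidate workflows. The following is a minimal Python sketch, not SCONE's R interface: the option labels and the function names `enumerate_workflows` and `total_count_scale` are hypothetical and chosen only for illustration. It enumerates every combination of module options and shows the simplest form of the scaling module, in which each sample is multiplied by a single factor so that all samples have the same total count.

```python
from itertools import product

# Hypothetical option labels for each module (illustrative only; these are
# not SCONE argument values).
IMPUTATION = ["none", "expected_under_dropout"]
SCALING = ["none", "total_count", "full_quantile"]
REGRESSION = ["none", "ruv_k1", "ruv_k2"]


def enumerate_workflows():
    """Return every (imputation, scaling, regression) combination."""
    return list(product(IMPUTATION, SCALING, REGRESSION))


def total_count_scale(counts):
    """Scale each sample (column) by one factor so all column sums match.

    counts: list of rows (genes), each a list of per-sample counts.
    The common target is the mean column sum across samples.
    Assumes every sample has a non-zero total count.
    """
    n_samples = len(counts[0])
    col_sums = [sum(row[j] for row in counts) for j in range(n_samples)]
    target = sum(col_sums) / n_samples
    factors = [target / s for s in col_sums]
    return [[row[j] * factors[j] for j in range(n_samples)] for row in counts]


workflows = enumerate_workflows()

# Toy matrix: 3 genes (rows) by 2 samples (columns).
counts = [[10, 20], [30, 60], [0, 20]]
scaled = total_count_scale(counts)
scaled_sums = [sum(row[j] for row in scaled) for j in range(2)]
```

With two, three, and three options per module, the sketch produces 18 workflows; adding further options or regression covariates multiplies that number quickly. Note that a zero count stays zero under scaling, so handling drop-outs is left to the imputation module.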
- Hundreds of Normalized Expression Matrices
- Up to 8 Performance Metrics per Matrix
- Ranking by Performance Scores
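Ranking by performance scores can be sketched as follows. This is an illustrative Python example under stated assumptions, not a description of SCONE's exact scoring: the function `rank_workflows`, the workflow names, and the metric values are all hypothetical. Each metric is oriented so that larger is better, workflows are ranked within each metric, and the ranks are averaged across metrics.

```python
def rank_workflows(scores, higher_is_better):
    """Order workflows from best to worst by mean rank across metrics.

    scores: dict mapping workflow name -> list of metric values.
    higher_is_better: list of booleans, one per metric.
    Ties are not treated specially in this sketch.
    """
    names = list(scores)
    n_metrics = len(higher_is_better)
    mean_rank = {name: 0.0 for name in names}
    for m in range(n_metrics):
        # Flip the sign of metrics where lower is better, so that a
        # higher rank always means better performance.
        sign = 1 if higher_is_better[m] else -1
        ordered = sorted(names, key=lambda n: sign * scores[n][m])
        for rank, name in enumerate(ordered, start=1):
            mean_rank[name] += rank / n_metrics
    return sorted(names, key=lambda n: mean_rank[n], reverse=True)


# Made-up values for two metrics: the first is better when high,
# the second is better when low.
scores = {
    "none": [0.2, 0.9],
    "total_count": [0.5, 0.4],
    "total_count+ruv_k1": [0.6, 0.3],
}
ranking = rank_workflows(scores, higher_is_better=[True, False])
```

Averaging ranks rather than raw values keeps metrics on different scales from dominating the ordering, at the cost of discarding how large the differences between workflows are.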
Download the R Bioconductor package here
Vignette is available here
Workshops and Tutorials
Risso, D., Cole, M.B., and Street, K. (2016) Analysis of single-cell RNA-seq data with R and Bioconductor. BioC2016. Stanford, CA. [Workshop materials here]
Risso, D., Ngai, J., Speed, T.P., and Dudoit, S. (2014) Normalization of RNA-seq data using factor analysis of control genes or samples. Nat. Biotech., 32, 896–902.