ImpulseDE is an R/Bioconductor package for detecting differentially expressed genes, either within a single time course (analysis of differential expression behavior over time) or between two time courses (comparative analysis). It is based on the impulse model introduced by Chechik and Koller (2009), which captures the impulse-like changes in gene expression typically observed in cells responding to perturbations or environmental changes.
To test for differential expression in a single time course experiment, we evaluate the extent to which the impulse model (the alternative hypothesis) fits the expression profile better than a flat line (the null hypothesis). For two time courses (case and control), we evaluate the extent to which two models, one fitted separately to each time course (the alternative hypothesis), describe the expression profiles better than a single model fitted to the combined data (the null hypothesis, i.e. both data sets are generated by the same model).
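
The comparison between null and alternative hypotheses can be illustrated with sums of squared errors (SSE). The sketch below is an assumption-laden illustration, not the package's test statistic, which is not specified here: it simply measures the fraction of the null model's SSE that the alternative model removes. All function names and the example values are hypothetical.

```python
def sse(observed, fitted):
    """Sum of squared errors between observed and fitted values."""
    return sum((o - f) ** 2 for o, f in zip(observed, fitted))


def flat_line_sse(observed):
    """SSE of the single-time-course null model: a flat line at the mean."""
    mean = sum(observed) / len(observed)
    return sse(observed, [mean] * len(observed))


def relative_improvement(sse_null, sse_alt):
    """Fraction of the null SSE removed by the alternative model.

    Assumed statistic for illustration only; larger values mean the
    alternative describes the data better than the null.
    """
    if sse_null == 0:
        return 0.0
    return (sse_null - sse_alt) / sse_null


if __name__ == "__main__":
    # Single time course: impulse fit (hypothetical fitted values) vs. flat line.
    observed = [1.0, 1.2, 4.8, 5.1, 2.2, 1.9]
    impulse_fit = [1.0, 1.1, 4.9, 5.0, 2.1, 2.0]
    print(relative_improvement(flat_line_sse(observed), sse(observed, impulse_fit)))

    # Two time courses: separate fits (alternative) vs. one combined fit (null).
    case, control = [1.0, 5.0, 2.0], [1.0, 1.1, 0.9]
    case_fit, control_fit = [1.0, 4.9, 2.1], [1.0, 1.0, 1.0]
    combined_fit = [1.0, 3.0, 1.5]  # hypothetical single model for both courses
    sse_alt = sse(case, case_fit) + sse(control, control_fit)
    sse_null = sse(case, combined_fit) + sse(control, combined_fit)
    print(relative_improvement(sse_null, sse_alt))
```

In both settings the alternative has more free parameters than the null, so it always fits at least as well; this is why significance has to be assessed against a null distribution rather than read directly from the SSE difference.
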

ImpulseDE requires the following input data:
- any kind of normalized high-throughput expression data set, including microarray and RNA-Seq gene expression data as well as ChIP-seq data
- at least six time points (required for model inference)

The analysis proceeds in the following steps:
- The genes are clustered (separately for each time course) using k-means, where k is determined iteratively to optimize the similarity of gene patterns within a cluster.
- The parameters of the impulse model are fit to the mean expression profile of each cluster (minimizing the sum of squared error; SSE) using two different optimization strategies. Since both approaches return local optima, the analysis is repeated 50 times (default) for each strategy using different initializations for the parameters’ values.
- The three best parameter sets for each cluster (out of 100 by default, i.e. 50 runs for each of the two strategies) are used as starting points to fit the model to each gene separately.
- Random sampling and bootstrapping are used (Storey et al., 2005) to determine significance levels (evidence for rejecting the null hypothesis).
- The resulting p-values are FDR-corrected (q-value) to account for multiple testing (Benjamini and Hochberg, 1995).
- The implementation uses multi-threading (running fits to different clusters or different randomized samples in parallel) to further reduce running time.
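
The last two statistical steps above (significance from randomized data, then FDR correction) can be sketched as follows. This is a simplified illustration under stated assumptions: the empirical p-value uses a common add-one convention that is not necessarily what the package does, and the function names are hypothetical. The adjustment itself follows the standard Benjamini-Hochberg step-up procedure.

```python
def empirical_p_value(observed_stat, null_stats):
    """Empirical p-value of a statistic against randomized (null) statistics.

    Assumption: add-one convention, (1 + #null >= observed) / (1 + N),
    which avoids reporting a p-value of exactly zero.
    """
    exceed = sum(1 for s in null_stats if s >= observed_stat)
    return (1 + exceed) / (1 + len(null_stats))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(p_values)
    # Indices of the p-values sorted in ascending order.
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


if __name__ == "__main__":
    # Hypothetical per-gene statistics and a pooled null distribution.
    null_stats = [0.5, 0.8, 1.1, 1.3, 2.0, 2.4, 0.9, 1.7, 0.3]
    gene_stats = {"geneA": 2.5, "geneB": 1.2, "geneC": 0.4}
    p = {g: empirical_p_value(s, null_stats) for g, s in gene_stats.items()}
    q = benjamini_hochberg(list(p.values()))
    for (gene, p_val), q_val in zip(p.items(), q):
        print(gene, round(p_val, 3), round(q_val, 3))
```

Because each randomized sample is processed independently, this is also the part of the workflow that parallelizes most naturally across threads.
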
Chechik, G. and Koller, D. (2009) Timing of Gene Expression Responses to Environmental Changes. J. Comput. Biol., 16, 279–290.
Storey, J.D. et al. (2005) Significance analysis of time course microarray experiments. Proc. Natl. Acad. Sci. USA, 102(36), 12837–12841.
Benjamini, Y. and Hochberg, Y. (1995) Controlling the false discovery rate: a practical and powerful approach to multiple testing. J. R. Stat. Soc. Series B Stat. Methodol., 57, 289–300.