

Designed for RNA-seq workflows, ExpreSEd provides a streamlined pipeline from raw counts to differential expression results. Pass a SummarizedExperiment object through the R package interface, or supply raw count matrices and sample metadata (TSV/CSV) directly through the command-line interface.


Reproducibility / How to Run

This project includes:
- Conda environment file at r-package/environment.yml
- Dockerfile at r-package/Dockerfile
- Nextflow pipeline at nextflow/main.nf

From repo root (JW26ADS8192.v0.0.2):

# Step 1. Create Conda environment
conda env create -f r-package/environment.yml
conda activate ads8192

# Step 2. Build and Run a Docker Image
docker build -t hw2-jewel:0.0.3 ./r-package
docker run --rm -it -v "$PWD":/ads8192 -w /ads8192 hw2-jewel:0.0.3

# Step 3. Run Nextflow workflow
nextflow run nextflow/main.nf -c nextflow/nextflow.config -profile docker --outdir nextflow/results

RStudio Analysis

Installation

# Install the package from GitHub
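# install_github() expects "owner/repo"; replace the placeholders with the GitHub account and repository hosting ExpreSEd
remotes::install_github("<owner>/<repo>")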

Quick Start

# Load package and example data (SummarizedExperiment)
library(ExpreSEd)
data(example_se)

# Pick the optimal 'minimum gene count' filtering threshold
example_se_filtering_assessment <- determine_filter_threshold(
  se_ln            = example_se,
  count_thresholds = c(0, 1, 5, 10, 20, 50, 100, 200, 500), 
  assay_name       = "counts", 
  ref_level        = "Tconv", 
  group_var        = "cell_type", 
  p_threshold      = 0.05
  )

# Filter out the low expression genes
se_filtered <- filter_low_exp_genes(
  se_ln               = example_se, 
  min_count_per_group = 10, 
  assay_name          = "counts"
  )

# Run the DESeq2 pipeline
se_dge <- run_DESeq2(se_ln = se_filtered, group_var = "cell_type", ref_level = "Tconv")

# Shrink log2 fold-change estimates
se_dge_shrink <- log2_shrinkage(dds = se_dge, shrinkage = "apeglm")

# Summarize Gene Expression
DESeq2_gene_reg_summary <- gene_regulation_summary(
  res_df       = se_dge_shrink,
  p_threshold  = 0.05,
  fc_threshold = 0.5
  )

# Visualize
example_se_volcano <- generate_volcano(
  res_df        = se_dge_shrink,
  fc_threshold  = 0.5,
  xlab          = "log2 Fold Change (Treg vs Tconv)",
  set_title     = "Volcano Plot - Lymph Node Treg vs Tconv",
  p_threshold   = 0.05
  )

# Export Results
example_se_exports <- export_outputs(
  res_df         = se_dge_shrink,
  summary_df     = DESeq2_gene_reg_summary,
  filtering_diag = example_se_filtering_assessment,
  volcano        = example_se_volcano,
  output_dir     = file.path(tempdir(), "de_output")
  )
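The exact filtering rule behind filter_low_exp_genes() and determine_filter_threshold() is not spelled out here. As a rough illustration only, the base-R sketch below assumes a gene is kept when every sample in at least one group has a count of at least min_count_per_group, and then counts how many genes survive each candidate value in count_thresholds. The helper keep_genes_by_group() and the toy matrix are hypothetical, not part of ExpreSEd, and the sketch covers only gene retention, not the differential-expression side of the threshold assessment.

```r
# Hypothetical sketch, NOT the ExpreSEd implementation.
# Assumed rule: keep a gene if, in at least one group, every sample
# has a count >= min_count_per_group.
keep_genes_by_group <- function(counts, groups, min_count_per_group) {
  groups <- as.factor(groups)
  # One logical vector per group: TRUE where all samples in that group pass
  per_group <- lapply(levels(groups), function(g) {
    sub <- counts[, groups == g, drop = FALSE]
    rowSums(sub >= min_count_per_group) == ncol(sub)
  })
  # A gene is kept if it passes in any group
  Reduce(`|`, per_group)
}

# Toy count matrix: 3 genes x 4 samples (two Tconv, two Treg)
counts <- matrix(
  c(  0,   2, 30, 40,
      5,   6,  7,  8,
    100, 120,  0,  1),
  nrow = 3, byrow = TRUE,
  dimnames = list(c("geneA", "geneB", "geneC"), paste0("s", 1:4))
)
groups <- c("Tconv", "Tconv", "Treg", "Treg")

# Number of genes retained at each candidate threshold
thresholds <- c(0, 1, 5, 10, 50, 100)
n_kept <- sapply(thresholds, function(thr) sum(keep_genes_by_group(counts, groups, thr)))
names(n_kept) <- thresholds
print(n_kept)
```

Requiring all samples within one group to pass, instead of pooling across groups, keeps genes that are expressed only in one cell type, which matters when comparing Treg with Tconv.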

Command-Line Interface (via Rapp)

Installation

# Install the ExpreSEd command-line apps (requires the R package installed above)
Rscript -e "Rapp::install_pkg_cli_apps('ExpreSEd')"

Quick Start

# Locate the bundled example data (TSV) and store the paths as shell variables
EX_COUNTS=$(Rscript -e 'cat(system.file("testdata", "example_counts.tsv", package = "ExpreSEd"))')
EX_META=$(Rscript -e 'cat(system.file("testdata", "example_meta.tsv", package = "ExpreSEd"))')

# Pick the optimal 'minimum gene count' filtering threshold
ExpreSEd determine_filter_threshold --count "$EX_COUNTS" --meta "$EX_META" --output ./results/

# Filter out low-expression genes
ExpreSEd filter_low_exp_genes --count "$EX_COUNTS" --meta "$EX_META" --output ./results/

# Run the DESeq2 pipeline
ExpreSEd run_DESeq2 --input ./results/se_filtered.rds --output ./results/

# Shrink log2 fold-change estimates
ExpreSEd log2_shrinkage --input ./results/se_dge.rds --output ./results/

# Summarize Gene Expression
ExpreSEd gene_regulation_summary --input ./results/dge_shrink.rds --output ./results/

# Visualize
ExpreSEd generate_volcano --input ./results/dge_shrink.rds --output ./results/
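Each CLI step reads the .rds file written by the previous step (--input) and saves its own result under --output, so steps can be rerun individually. The base-R sketch below illustrates that file-based hand-off with a hypothetical run_step() helper and toy data; it is not how ExpreSEd implements its commands, only a sketch of the pattern, assuming each step takes one R object in and produces one R object out.

```r
# Hypothetical helper (not part of ExpreSEd): read an .rds input, apply a
# transformation, and save the result as <output_dir>/<out_name>.rds,
# mirroring the --input / --output convention of the CLI.
run_step <- function(input, output_dir, out_name, fun) {
  obj <- readRDS(input)
  result <- fun(obj)
  dir.create(output_dir, showWarnings = FALSE, recursive = TRUE)
  out_path <- file.path(output_dir, paste0(out_name, ".rds"))
  saveRDS(result, out_path)
  invisible(out_path)
}

# Toy run inside a temporary directory
results_dir <- file.path(tempdir(), "results")
dir.create(results_dir, showWarnings = FALSE, recursive = TRUE)
saveRDS(data.frame(gene = c("a", "b"), count = c(3, 12)),
        file.path(results_dir, "step0.rds"))

# Step 1 keeps rows with count >= 10; step 2 reads step 1's output file
step1 <- run_step(file.path(results_dir, "step0.rds"), results_dir, "step1",
                  function(df) df[df$count >= 10, , drop = FALSE])
step2 <- run_step(step1, results_dir, "step2",
                  function(df) transform(df, log2_count = log2(count)))

final <- readRDS(step2)
print(final)
```

Because every intermediate object lands on disk, a failed or re-parameterized step only needs its own command rerun, which is the same property the Nextflow workflow relies on.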

Notes

The ExpreSEd R package includes one additional function, export_outputs() (7 functions in total), which writes filtering_analysis.tsv, dge_shrink.tsv, volcano_plot.pdf, and volcano_plot.png to the directory given by output_dir in a single call. The ExpreSEd CLI exposes only the other 6 functions; each CLI command exports its own deliverables as part of the same step.
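As a quick check on an exported results table such as dge_shrink.tsv, up- and down-regulated genes can be tallied with p_threshold = 0.05 and fc_threshold = 0.5, as in the Quick Start. The base-R sketch below assumes the table carries DESeq2-style log2FoldChange and padj columns and that a gene counts as regulated when padj < p_threshold and |log2FoldChange| > fc_threshold. Both the column names and this rule are assumptions; gene_regulation_summary() may define them differently. The helper summarize_regulation() and the toy table are hypothetical.

```r
# Hypothetical sketch, NOT gene_regulation_summary() itself.
# Assumes DESeq2-style columns 'log2FoldChange' and 'padj'.
summarize_regulation <- function(res, p_threshold = 0.05, fc_threshold = 0.5) {
  # Genes with missing padj are treated as not significant
  sig <- !is.na(res$padj) & res$padj < p_threshold
  status <- ifelse(sig & res$log2FoldChange > fc_threshold, "up",
            ifelse(sig & res$log2FoldChange < -fc_threshold, "down",
                   "not_significant"))
  table(factor(status, levels = c("up", "down", "not_significant")))
}

# Toy results table written to and read back from a TSV file
res <- data.frame(
  gene           = c("g1", "g2", "g3", "g4", "g5"),
  log2FoldChange = c(1.2, -0.8, 0.3, 2.0, -1.5),
  padj           = c(0.01, 0.03, 0.001, 0.20, NA)
)
tsv <- file.path(tempdir(), "dge_shrink_toy.tsv")
write.table(res, tsv, sep = "\t", quote = FALSE, row.names = FALSE)

res_in <- read.delim(tsv)
reg_counts <- summarize_regulation(res_in)
print(reg_counts)
```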