Designed for RNA-seq workflows, ExpreSEd provides a streamlined pipeline from count data to differential expression results. Pass a SummarizedExperiment object through the R package interface, or supply raw count matrices and sample metadata (TSV/CSV) directly through the command-line interface.
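When counts and sample metadata are supplied as separate TSV files, the sample IDs in the two files generally need to agree before differential expression can be run. The following is a minimal sketch of such a check, not part of ExpreSEd. The function name check_sample_ids is hypothetical, and the file layout is an assumption: gene IDs in the first column of the counts file with one column per sample, and a header row plus sample IDs in the first column of the metadata file.

```shell
# Hypothetical helper (not part of ExpreSEd).
# Assumed layout, which may differ from what ExpreSEd expects:
#   counts TSV: header row, gene IDs in column 1, one column per sample
#   metadata TSV: header row, sample IDs in column 1
check_sample_ids() {
  counts="$1"
  meta="$2"
  # Sample IDs from the counts header: every field after the first, sorted.
  count_ids="$(head -n 1 "$counts" | tr '\t' '\n' | tail -n +2 | sort)"
  # Sample IDs from the metadata: column 1 of every row after the header, sorted.
  meta_ids="$(tail -n +2 "$meta" | cut -f 1 | sort)"
  if [ "$count_ids" = "$meta_ids" ]; then
    echo "sample IDs match"
  else
    echo "sample IDs differ between counts and metadata" >&2
    return 1
  fi
}
```

For example, `check_sample_ids counts.tsv meta.tsv` prints a confirmation when both files name the same set of samples. The comparison ignores order, so it only detects missing or extra samples.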
Reproducibility / How to Run
This project has:
- Conda env file at r-package/environment.yml
- Dockerfile at r-package/Dockerfile
- Nextflow pipeline at nextflow/main.nf
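Before running the steps, it can help to confirm that the three files listed above are present in the checkout. The following is a minimal sketch, assuming the paths are relative to the repo root. The function name check_repro_files is hypothetical and not part of the project.

```shell
# Hypothetical helper (not part of ExpreSEd): report any missing
# reproducibility file. Paths are the ones listed above, relative to
# the repo root passed as the first argument (default: current directory).
check_repro_files() {
  root="${1:-.}"
  rc=0
  for f in r-package/environment.yml r-package/Dockerfile nextflow/main.nf; do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f" >&2
      rc=1
    fi
  done
  return "$rc"
}
```

Running `check_repro_files . && echo "all files present"` from the repo root reports each missing file on stderr and returns a non-zero status if any is absent.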
From repo root (JW26ADS8192.v0.0.2):
# Step 1. Create Conda environment
conda env create -f r-package/environment.yml
conda activate ads8192
# Step 2. Build and Run a Docker Image
docker build -t hw2-jewel:0.0.3 ./r-package
docker run --rm -it -v "$PWD":/ads8192 -w /ads8192 hw2-jewel:0.0.3
# Step 3. Run Nextflow workflow
nextflow run nextflow/main.nf -c nextflow/nextflow.config -profile docker --outdir nextflow/results
RStudio Analysis
Quick Start
# Load package and example data (SummarizedExperiment)
library(ExpreSEd)
data(example_se)
# Pick the optimal 'minimum gene count' filtering threshold.
example_se_filtering_assessment <- determine_filter_threshold(
  se_ln = example_se,
  count_thresholds = c(0, 1, 5, 10, 20, 50, 100, 200, 500),
  assay_name = "counts",
  ref_level = "Tconv",
  group_var = "cell_type",
  p_threshold = 0.05
)
# Filter out the low expression genes
se_filtered <- filter_low_exp_genes(
  se_ln = example_se,
  min_count_per_group = 10,
  assay_name = "counts"
)
# Run the DESeq2 pipeline
se_dge <- run_DESeq2(se_ln = se_filtered, group_var = "cell_type", ref_level = "Tconv")
# Shrink log2 fold-change estimates
se_dge_shrink <- log2_shrinkage(dds = se_dge, shrinkage = "apeglm")
# Summarize Gene Expression
DESeq2_gene_reg_summary <- gene_regulation_summary(
  res_df = se_dge_shrink,
  p_threshold = 0.05,
  fc_threshold = 0.5
)
# Visualize
example_se_volcano <- generate_volcano(
  res_df = se_dge_shrink,
  fc_threshold = 0.5,
  xlab = "log2 Fold Change (Treg vs Tconv)",
  set_title = "Volcano Plot - Lymph Node Treg vs Tconv",
  p_threshold = 0.05
)
# Export Results
example_se_exports <- export_outputs(
  res_df = se_dge_shrink,
  summary_df = DESeq2_gene_reg_summary,
  filtering_diag = example_se_filtering_assessment,
  volcano = example_se_volcano,
  output_dir = file.path(tempdir(), "de_output")
)
Command-Line Interface (via Rapp)
Quick Start
# Locate the bundled example data (TSV) and expose the paths as environment variables (run in R)
ex_counts_path <- system.file("testdata", "example_counts.tsv", package = "ExpreSEd")
ex_meta_path <- system.file("testdata", "example_meta.tsv", package = "ExpreSEd")
Sys.setenv(EX_COUNTS = ex_counts_path)
Sys.setenv(EX_META = ex_meta_path)
# Pick the optimal 'minimum gene count' filtering threshold.
ExpreSEd determine_filter_threshold --count $EX_COUNTS --meta $EX_META --output ./results/
# Filter out the low expression genes
ExpreSEd filter_low_exp_genes --count $EX_COUNTS --meta $EX_META --output ./results/
# Run the DESeq2 pipeline
ExpreSEd run_DESeq2 --input ./results/se_filtered.rds --output ./results/
# Shrink log2 fold-change estimates
ExpreSEd log2_shrinkage --input ./results/se_dge.rds --output ./results/
# Summarize Gene Expression
ExpreSEd gene_regulation_summary --input ./results/dge_shrink.rds --output ./results/
# Visualize
ExpreSEd generate_volcano --input ./results/dge_shrink.rds --output ./results/
Notes
The ExpreSEd R package includes one additional function, export_outputs (7 functions in total), which generates and exports filtering_analysis.tsv, dge_shrink.tsv, volcano_plot.pdf, and volcano_plot.png together to the current working directory. The ExpreSEd CLI includes only 6 functions, each of which generates and exports its deliverable in a single step.
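Because each CLI function writes its deliverable in a single step, the six commands from the CLI Quick Start can be chained so that a failure in one step stops the run. The following is a minimal sketch, not part of ExpreSEd. The function name run_expresed_cli and the EXPRESED_BIN override are hypothetical, and the intermediate file names (se_filtered.rds, se_dge.rds, dge_shrink.rds) are taken from the Quick Start commands on the assumption that each step writes them into the output directory.

```shell
# Hypothetical wrapper (not part of ExpreSEd): run the six CLI steps in
# order and stop at the first failure.
# EXPRESED_BIN is an assumption of this sketch; it lets the executable
# name be overridden and defaults to ExpreSEd.
run_expresed_cli() {
  counts="$1"
  meta="$2"
  out="${3:-./results/}"
  bin="${EXPRESED_BIN:-ExpreSEd}"
  dir="${out%/}"   # output directory without a trailing slash
  mkdir -p "$out" &&
    "$bin" determine_filter_threshold --count "$counts" --meta "$meta" --output "$out" &&
    "$bin" filter_low_exp_genes --count "$counts" --meta "$meta" --output "$out" &&
    "$bin" run_DESeq2 --input "$dir/se_filtered.rds" --output "$out" &&
    "$bin" log2_shrinkage --input "$dir/se_dge.rds" --output "$out" &&
    "$bin" gene_regulation_summary --input "$dir/dge_shrink.rds" --output "$out" &&
    "$bin" generate_volcano --input "$dir/dge_shrink.rds" --output "$out"
}
```

With the environment variables from the Quick Start set, this could be called as `run_expresed_cli "$EX_COUNTS" "$EX_META" ./results/`. The `&&` chain means later steps are skipped as soon as one command returns a non-zero status.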