Introducing The Daylily Multi-Omic Analysis Framework: 30x WGS Pipelines Running < 1hr, for dollars a sample & achieving Fscores of up to 0.998.
Daylily
daylily
is an analysis framework which automates everything from AWS compute resource generation and management through running various WGS pipelines and tools. It is highly configurable, and offers several pipelines to suit your needs. Pre-packages pipelines can run as quickly as < 1hr, for dollars a sample and achieve Fscores of 0.998. For exhaustive pipeline details, please see the daylily repo README. For more information please contact me at john@daylilyinformatics.com.
Benchmarking
All 7 GIAB Samples Run vs. b37
The 7 giab samples may be run with the following commands once daylily installation is complete.
# From the cloned repo root directory, run
source dyinit # initialize daylily cli
cp .test_data/data/giab_30x_b37_analysis_manifest.csv config/analysis_manifest.csv. # Copy the manifest to run the 30x google brain novaseq fastqs
dy-a slurm # activate the slurm analysis profile
dy-r produce_snv_concordances # run the mapping->dedup->varcalling pipeline, default: bwa2 meme + doppelmark + deepvariant
# Wait ~1hr + EC2 spot instance config time
more results/day/b37/other_reports/giab* # view concordance results for each sample.
dy-d reset # clear daylily slurm profile.
Results
Further detail on analysis results can be found here.
Pipeline | SNPts/SNPtv fscore | INS fscore | DEL fscore | Indel fscore | e2e walltime | e2e instance min | Avg EC2 Cost |
---|---|---|---|---|---|---|---|
Sentieon BWA + SentDeDup + DNAscope (BD) | 0.996 / 0.996 | 0.997* | 0.997 | 0.998* | 61m | 68m* | $3.34^*1 - 128vcpu |
BWA-MEM2 + DpplDeDup + Octopus (B2O) | 0.994 / 0.992 | 0.991 | 0.971 | 0.800 | 72.4m | 273m | $12.92 - various vcpu |
BWA-MEM2 + DpplDeDup + Deepvariant (B2D) | 0.997 / 0.996* | 0.996 | 0.998* | 0.998* | 57m* | 156m | $8.54 - 128 vcpu |
^=s/w licensing required to run the sentieon tool *=highest value
Full Pipeline DAG (including QC metrics)
The entire pipeline will produce this multiqc report, and the dag is as follows:
Runtime Performance of All Rules
Additional Pipeline Features
Built In Observability
Detailed per-Projecct Cost Tracking and Budgeting
Automated Spin Up of Daylily Ephemeral Clusters
- Including cost tracking, spot market navigation, job scheduling and spot instance loss recovery.
Daylily Informatics Services
Consulting Services
If you wish to contract with me to integrate daylily into your compute environment, the stock pipeline tales < 1 week to stand up. Please contact me at john@daylilyinformatics.com for details.
Managed WGS Analysis
If you wish to get spun up as quickly as a day, a managed analysis service can be deployed in a matter of days. More details may be found here, and contact john@daylilyinformatics.com for more information.
1: plus sentieon licensing fees