Introducing the Daylily Multi-Omic Analysis Framework: 30x WGS Pipelines Running in < 1hr, for Dollars a Sample, and Achieving Fscores of up to 0.998

Daylily

Daylily is an analysis framework that automates everything from provisioning and managing AWS compute resources through running various WGS pipelines and tools. It is highly configurable and offers several pipelines to suit your needs. Pre-packaged pipelines can run in under 1 hour, for dollars a sample, and achieve Fscores of up to 0.998. For exhaustive pipeline details, please see the daylily repo README. For more information, please contact me at john@daylilyinformatics.com.

Benchmarking

All 7 GIAB Samples Run vs. b37

The 7 GIAB samples may be run with the following commands once daylily installation is complete.

# From the cloned repo root directory, run
source dyinit  # initialize daylily cli
cp .test_data/data/giab_30x_b37_analysis_manifest.csv config/analysis_manifest.csv  # copy the manifest to run the 30x Google Brain NovaSeq fastqs
dy-a slurm  # activate the slurm analysis profile
dy-r produce_snv_concordances  # run the mapping->dedup->varcalling pipeline, default: bwa2 meme + doppelmark + deepvariant

# Wait ~1hr + EC2 spot instance config time

more results/day/b37/other_reports/giab*  # view concordance results for each sample.

dy-d reset  # clear daylily slurm profile.
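
A quick pre-flight check before launching the run can catch a missing or empty manifest. The sketch below is a hypothetical helper, `check_manifest`, which is not part of the daylily CLI; it assumes only that the manifest is a CSV with one header line followed by one row per sample.

```shell
# Hypothetical pre-flight helper (not part of the daylily CLI).
# Succeeds only if the manifest exists and holds a header plus at least one data row.
check_manifest() {
  manifest="$1"
  if [ ! -f "$manifest" ]; then
    echo "missing manifest: $manifest" >&2
    return 1
  fi
  # Count non-empty lines; the first one is assumed to be the CSV header.
  rows=$(grep -c . "$manifest" || true)
  if [ "$rows" -lt 2 ]; then
    echo "manifest has no sample rows: $manifest" >&2
    return 1
  fi
  echo "$((rows - 1)) sample row(s) found in $manifest"
}
```

For example, `check_manifest config/analysis_manifest.csv && dy-r produce_snv_concordances` would launch the pipeline only if the manifest looks populated.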

Results

Further detail on analysis results can be found here.

| Pipeline | SNPts / SNPtv fscore | INS fscore | DEL fscore | Indel fscore | e2e walltime | e2e instance min | Avg EC2 Cost |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Sentieon BWA + SentDeDup + DNAscope (BD) | 0.996 / 0.996 | 0.997* | 0.997 | 0.998* | 61m | 68m* | $3.34^*1 - 128 vcpu |
| BWA-MEM2 + DpplDeDup + Octopus (B2O) | 0.994 / 0.992 | 0.991 | 0.971 | 0.800 | 72.4m | 273m | $12.92 - various vcpu |
| BWA-MEM2 + DpplDeDup + Deepvariant (B2D) | 0.997 / 0.996* | 0.996 | 0.998* | 0.998* | 57m* | 156m | $8.54 - 128 vcpu |

^ = software licensing required to run the Sentieon tool
* = best value in the column
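
To compare cost efficiency across the pipelines above, the average EC2 cost can be divided by the end-to-end instance minutes. The sketch below does that arithmetic with a hypothetical helper, `cost_per_instance_min`; it assumes the cost and instance-minute columns describe the same run, and it ignores Sentieon licensing fees.

```shell
# Hypothetical helper: average EC2 cost per instance minute.
# Usage: cost_per_instance_min <cost in dollars> <instance minutes>
cost_per_instance_min() {
  awk -v cost="$1" -v minutes="$2" 'BEGIN {
    if (minutes + 0 <= 0) exit 1          # refuse to divide by zero
    printf "%.4f\n", cost / minutes
  }' || { echo "instance minutes must be greater than 0" >&2; return 1; }
}

# Values taken from the results table above.
cost_per_instance_min 3.34 68     # BD  -> 0.0491
cost_per_instance_min 12.92 273   # B2O -> 0.0473
cost_per_instance_min 8.54 156    # B2D -> 0.0547
```

Under these assumptions the per-minute rates are similar across the three pipelines, which suggests the cost differences come mainly from the number of instance minutes each pipeline consumes.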

Full Pipeline DAG (including QC metrics)

The entire pipeline will produce this MultiQC report, and the DAG is as follows:

Runtime Performance of All Rules

Additional Pipeline Features

Built In Observability

Detailed per-Project Cost Tracking and Budgeting

Automated Spin Up of Daylily Ephemeral Clusters

  • Including cost tracking, spot market navigation, job scheduling and spot instance loss recovery.

Daylily Informatics Services

Consulting Services

If you wish to contract with me to integrate daylily into your compute environment, the stock pipeline takes less than 1 week to stand up. Please contact me at john@daylilyinformatics.com for details.

Managed WGS Analysis

If you wish to get up and running quickly, a managed analysis service can be deployed in a matter of days, and in some cases in as little as one day. More details may be found here; contact john@daylilyinformatics.com for more information.


1: plus Sentieon licensing fees