Quick Start & Reference

Everything you need to install, configure, run, and analyze experiments with the DOE Helper Tool.

1 Installation & Setup

Get up and running in under two minutes.

System Requirements

Python 3.10 or higher. Works on Linux, macOS, and Windows. Full-factorial designs use only the standard library — no external packages needed.

Terminal
```bash
# Install from PyPI
$ pip install doehelper

# Verify installation
$ doe --version
```

Dependencies

| Package | Version | Purpose |
|---|---|---|
| pyDOE3 | ≥1.0 | PB, fractional, LHS, CCD, Box-Behnken, Taguchi designs |
| numpy | ≥1.26 | Array operations, RSM, hat matrix, design evaluation |
| pandas | ≥2.0 | Data manipulation |
| matplotlib | ≥3.7 | Pareto, effects, diagnostic, normal/half-normal, RSM surface plots |
| scipy | ≥1.11 | ANOVA F-tests, confidence intervals, surface optimization, power analysis |
| Jinja2 | ≥3.1 | Runner script template rendering |

2 Writing a Configuration File

Everything starts with a JSON config file. It defines your factors, responses, design type, and execution settings.

config.json
{ "metadata": { "name": "My Experiment", "description": "Testing 3 factors at 2 levels" }, "factors": [ {"name": "temperature", "levels": ["150", "200"], "type": "continuous", "unit": "°C"}, {"name": "pressure", "levels": ["2", "6"], "type": "continuous", "unit": "bar"}, {"name": "catalyst", "levels": ["A", "B"], "type": "categorical"} ], "fixed_factors": { "duration": "60" }, "responses": [ {"name": "yield", "optimize": "maximize", "unit": "%"}, {"name": "cost", "optimize": "minimize", "unit": "USD"} ], "runner": { "arg_style": "double-dash", "result_file": "json" }, "settings": { "operation": "full_factorial", "test_script": "test.sh", "out_directory": "results", "block_count": 1 } }
Factor Types Explained

Categorical — Discrete, unordered levels (e.g., "A", "B", "dark", "light"). Cannot be interpolated.

Continuous — Numeric values that can be interpolated (e.g., temperature 150–200). Required for CCD and Box-Behnken star/center points.

Ordinal — Ordered categorical levels (e.g., "low", "medium", "high"). Treated as categorical in most designs.
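The three types can appear side by side in one config. A short illustrative `factors` excerpt (the `agitation` factor is a hypothetical example, not part of the config shown above):

```json
"factors": [
  {"name": "catalyst", "levels": ["A", "B"], "type": "categorical"},
  {"name": "temperature", "levels": ["150", "175", "200"], "type": "continuous", "unit": "°C"},
  {"name": "agitation", "levels": ["low", "medium", "high"], "type": "ordinal"}
]
```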

Argument Styles

double-dash (default): --temperature 150 --pressure 2 --catalyst A

env: TEMPERATURE=150 PRESSURE=2 CATALYST=A ./test.sh

positional: ./test.sh 150 2 A
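To make the mapping concrete, here is a small Python sketch of how a hypothetical `build_command` helper could assemble each style (illustrative only — the generated runner script does this for you):

```python
# Sketch: how each arg_style maps factor settings onto a command line.
factors = {"temperature": "150", "pressure": "2", "catalyst": "A"}

def build_command(style, script="./test.sh", factors=factors):
    if style == "double-dash":
        args = [a for name, v in factors.items() for a in (f"--{name}", v)]
        return [script, *args]
    if style == "positional":
        return [script, *factors.values()]
    if style == "env":
        # values passed as environment variables instead of arguments
        env = {name.upper(): v for name, v in factors.items()}
        return env, [script]
    raise ValueError(f"unknown arg_style: {style}")

print(build_command("double-dash"))
# -> ['./test.sh', '--temperature', '150', '--pressure', '2', '--catalyst', 'A']
```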

Choosing a Design Type
| Goal | Design | Why |
|---|---|---|
| Test everything | Full Factorial | All combinations, all interactions |
| Reduce runs (2-level) | Fractional Factorial | Half the runs, some aliasing |
| Screen many factors | Plackett-Burman | N+1 runs for N factors |
| Modern screening | Definitive Screening | 3-level, detects curvature, 2k+1 runs |
| Robust design | Taguchi | Orthogonal arrays, S/N ratios |
| Continuous space filling | Latin Hypercube | Even coverage, configurable N |
| Find the optimum | Central Composite | Quadratic model, star points |
| Avoid corner points | Box-Behnken | Safe RSM, no extreme combos |
| Custom run count | D-Optimal | Algorithmic design, max information per run |
| Formulation/blending | Mixture (Simplex) | Components that sum to 1 |

The factors Section

Required. Array of factor objects, each with:

| Field | Required | Description |
|---|---|---|
| name | Yes | Unique factor name |
| levels | Yes | Array of at least 2 level values (strings) |
| type | No | categorical (default), continuous, or ordinal |
| unit | No | Unit of measurement (for display) |
| description | No | Human-readable description |

The responses Section

Optional (defaults to a single response named "response"). Each response has:

| Field | Required | Description |
|---|---|---|
| name | Yes | Must match keys in result JSON files |
| optimize | No | maximize (default) or minimize |
| unit | No | Unit of measurement |
| description | No | Human-readable description |
| weight | No | Relative importance for multi-objective optimization (default: 1.0) |
| bounds | No | [worst, best] for desirability scaling in --multi mode (auto-computed if omitted) |

The settings Section

| Field | Default | Description |
|---|---|---|
| operation | full_factorial | Design type (11 supported — see table above) |
| test_script | | Path to test script |
| block_count | 1 | Number of blocks (replicates) |
| out_directory | results | Directory for per-run JSON results |
| processed_directory | | Directory for analysis outputs |
| lhs_samples | 0 (auto) | LHS sample count; 0 = max(10, 2×factors) |
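The `lhs_samples` auto rule is worth spelling out. A one-function sketch of the documented default (not the tool's source):

```python
# lhs_samples = 0 means "auto": use max(10, 2 x number of factors).
def lhs_sample_count(lhs_samples: int, n_factors: int) -> int:
    if lhs_samples == 0:  # auto
        return max(10, 2 * n_factors)
    return lhs_samples

print(lhs_sample_count(0, 3))   # -> 10 (floor of 10 samples)
print(lhs_sample_count(0, 8))   # -> 16 (2 x 8 factors)
```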

Validation Rules

3 CLI Command Reference

Every command the tool supports, with all flags and annotated examples.

Generate a Design & Runner Script

Usage
doe generate --config FILE [--output FILE] [--format sh|py] [--seed N] [--dry-run]
| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --output | run_experiments.sh | Output script path |
| --format | sh | Script format: sh (Bash) or py (Python) |
| --seed | random | Seed for reproducible run order |
| --dry-run | off | Print design matrix without writing files |

Pro Tip

Always use --dry-run first to preview your design matrix before committing to a full run. Use --seed for reproducible experiments.

Analyze Experiment Results

Usage
doe analyze --config FILE [--results-dir DIR] [--no-plots] [--csv DIR] [--partial]
| Flag | Description |
|---|---|
| --config | Path to JSON config file (required) |
| --results-dir | Override out_directory from config |
| --no-plots | Skip generating Pareto charts and effects plots (headless mode) |
| --csv | Export main effects and summary stats to CSV files |
| --partial | Analyze only completed runs (skip missing result files) |

What It Computes

  • ANOVA table — Full analysis of variance with SS decomposition, F-tests, and p-values. Uses Lenth's pseudo-standard-error for unreplicated designs (same approach as R's FrF2 package). Includes lack-of-fit test when replicates are available. Significant terms (p < 0.05) are highlighted.
  • Main effects — For 2-level factors: mean(high) − mean(low). For 3+ levels: max(means) − min(means).
  • Interaction effects — Two-factor interactions for all pairs of 2-level factors.
  • 95% Confidence intervals — Using the t-distribution on effect estimates.
  • Summary statistics — Per-factor, per-level: count, mean, std, min, max.
  • Model diagnostics — 2×2 diagnostic panel: residuals vs fitted values, normal probability plot of residuals, residuals vs run order, and predicted vs actual. Includes PRESS statistic and predicted R².
  • Pareto chart — Ranked bar chart with cumulative contribution line.
  • Main effects plot — Grid of line plots showing mean response at each factor level.
  • Normal probability plot — Effects plotted against normal quantiles; significant effects deviate from the reference line and are labeled.
  • Half-normal plot — Absolute effects against half-normal quantiles for screening.
  • Response surface plots — 3D surface plots for each pair of continuous factors.
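The main-effect definition above — mean(high) − mean(low) for a 2-level factor — is simple to state in code. A toy sketch (illustrative data, not the tool's implementation):

```python
from statistics import mean

# Toy results: (temperature level, yield) pairs from a 2-level design
runs = [("150", 60.0), ("150", 62.0), ("200", 70.0), ("200", 74.0)]

def main_effect_2level(runs, low="150", high="200"):
    # mean(high) - mean(low), as defined above
    hi = mean(y for lvl, y in runs if lvl == high)
    lo = mean(y for lvl, y in runs if lvl == low)
    return hi - lo

print(main_effect_2level(runs))  # -> 11.0
```

For factors with 3+ levels the tool instead reports max(means) − min(means) across all levels.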

Get Optimization Recommendations

Usage
doe optimize --config FILE [--results-dir DIR] [--response NAME] [--partial] [--multi] [--steepest]
| Flag | Description |
|---|---|
| --config | Path to JSON config file (required) |
| --results-dir | Override out_directory from config |
| --response | Optimize for a single response (default: all responses) |
| --partial | Use only completed runs for optimization |
| --multi | Multi-objective optimization using Derringer-Suich desirability functions |
| --steepest | Show steepest ascent/descent pathway for sequential experimentation |

The optimizer reports the best observed run, fits linear and quadratic RSM models, finds the true surface optimum using L-BFGS-B optimization with multi-start, and ranks factors by importance. Use --response to focus on a specific metric.

Use --steepest to generate a table of follow-up experiment points along the gradient direction (standard RSM Phase 1 methodology).

Multi-Objective Optimization

When your experiment has multiple responses that conflict (e.g., maximize yield AND minimize cost), use --multi to find the best compromise using Derringer-Suich desirability functions.

Terminal
```
$ doe optimize --config config.json --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.7008

Response    Weight   Desirability   Predicted   Direction
---------------------------------------------------------------------
yield       1.0      0.5648         67.55 %     ↑
purity      1.0      0.9545         97.99 %     ↑
cost        1.0      0.6383         47.76 USD   ↓

Recommended settings:
  temperature = 200 °C
  pressure    = 4 bar
  catalyst    = 2 g/L

Trade-off summary:
  yield:  67.55 (best observed: 78.52, sacrifice: +10.97)
  purity: 97.99 (best observed: 97.99, sacrifice: +0.00)
  cost:   47.76 (best observed: 34.05, sacrifice: +13.71)
```

To prioritize certain responses, add weight and optional bounds to your config:

config.json excerpt
"responses": [ {"name": "yield", "optimize": "maximize", "weight": 3, "bounds": [60, 95]}, {"name": "cost", "optimize": "minimize", "weight": 1, "bounds": [20, 80]} ]

Weights & Bounds

weight (default: 1.0) controls relative importance — a weight of 3 means that response matters 3× more than a weight of 1. bounds (optional) define [worst, best] for desirability scaling; if omitted, bounds are auto-computed from observed data.
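The desirability arithmetic is easy to sketch. Below is a minimal stdlib approximation of the scaling described above (the tool's exact implementation may differ): each response is mapped to [0, 1] using its [worst, best] bounds, then combined with a weighted geometric mean. Because bounds are ordered [worst, best], a minimized response simply has worst > best:

```python
def desirability(y, worst, best):
    # Linear scaling to [0, 1]; bounds order handles maximize vs minimize.
    d = (y - worst) / (best - worst)
    return min(1.0, max(0.0, d))

def overall_D(ds_and_weights):
    # Weighted geometric mean: D = (prod d_i^w_i)^(1 / sum w_i)
    prod, wsum = 1.0, 0.0
    for d, w in ds_and_weights:
        prod *= d ** w
        wsum += w
    return prod ** (1.0 / wsum)

d_yield = desirability(77.5, worst=60, best=95)  # maximize: bounds [60, 95]
d_cost  = desirability(50.0, worst=80, best=20)  # minimize: bounds [80, 20]
print(overall_D([(d_yield, 3), (d_cost, 1)]))
```

With a weight of 3 on yield, a shortfall in yield desirability drags D down three times as hard as the same shortfall in cost.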

Generate an Interactive HTML Report

Usage
doe report --config FILE [--results-dir DIR] [--output FILE] [--partial]

Generates a self-contained HTML file with:

  • Design summary (factors, levels, operation type)
  • ANOVA tables with F-tests, p-values, and significance highlighting
  • Main effects and interaction tables for each response
  • Pareto charts, main effects plots, and normal/half-normal probability plots (base64 embedded)
  • Model diagnostic panels (residuals vs fitted, normal probability, etc.)
  • 3D response surface plots for continuous factor pairs
  • Optimization results with true surface optimum
  • Full design matrix as an interactive table
  • Collapsible sections for easy navigation

Use --partial to generate reports from incomplete experiments. The report will note which runs are missing.

Self-Contained

The HTML report has zero external dependencies. All plots are base64-encoded directly in the file. Share it via email, Slack, or drop it in a wiki — it just works.

Display Design Summary

Usage
doe info --config FILE

Prints the experiment plan summary without writing any files: operation type, number of factors, base runs, total runs with blocking, response definitions, fixed factors, alias structure (for fractional factorials), and design evaluation metrics (D-efficiency, A-efficiency, G-efficiency). Use this to quickly verify your config before running anything.

Compute Statistical Power

Usage
doe power --config FILE [--sigma FLOAT] [--delta FLOAT] [--alpha FLOAT] [--results-dir DIR]
| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --sigma | auto | Error standard deviation (estimated from results if omitted) |
| --delta | 2×sigma | Minimum detectable effect size |
| --alpha | 0.05 | Significance level |
| --results-dir | from config | Override out_directory from config |

Computes statistical power for detecting effects of a given size using the non-central F distribution. Power < 0.80 indicates you may need more runs or blocks to reliably detect the specified effect size.

When to Use Power Analysis

Run this before your experiment to check if your design has enough runs. If power is low, consider adding blocks (replicates) or switching to a design with more runs. If results already exist, sigma is estimated automatically from residuals.
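To build intuition for how run count drives power, here is a rough stdlib sketch using a normal approximation (the tool itself uses the non-central F distribution; the standard-error formula below is an assumed textbook form for a single 2-level effect, not the tool's code):

```python
from math import sqrt
from statistics import NormalDist

def approx_power(n_runs, delta, sigma, alpha=0.05):
    # SE of a 2-level effect estimate (assumed form): 2*sigma/sqrt(N)
    se = 2 * sigma / sqrt(n_runs)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    # Probability the effect estimate clears the critical value
    return 1 - NormalDist().cdf(z_crit - delta / se)

print(round(approx_power(8, delta=2.0, sigma=1.0), 3))
```

Doubling the run count shrinks the standard error by √2, which is why adding blocks is the usual fix for low power.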

Augment an Existing Design

Usage
doe augment --config FILE --type TYPE [--output FILE] [--format sh|py]
| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --type | required | Augmentation type: fold_over, star_points, or center_points |
| --output | run_experiments_augmented.sh | Output script path |
| --format | sh | Script format: sh or py |

Extends an existing design with additional runs without re-running completed experiments:

  • fold_over — Mirrors all runs (swaps high/low levels) to de-alias confounded effects in fractional factorials.
  • star_points — Adds axial (star) points for continuous factors, converting a factorial design into a CCD for response surface modeling.
  • center_points — Adds 3 center-point replicates to detect curvature and estimate pure error.
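The fold-over operation is simple in coded units: negate every ±1 entry of the original design. A sketch on a toy 4-run half fraction (illustrative, not the tool's code):

```python
# A 4-run half fraction of a 2^3 design in coded (-1/+1) units.
design = [
    (-1, -1, -1),
    (+1, -1, +1),
    (-1, +1, +1),
    (+1, +1, -1),
]

def fold_over(design):
    # Mirror every run: swap high and low levels across all factors.
    return [tuple(-x for x in run) for run in design]

augmented = design + fold_over(design)
print(len(augmented))  # -> 8 runs: the original fraction plus its mirror
```

The combined 8 runs form a full factorial in these three factors, which is what breaks the aliasing between main effects and two-factor interactions.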

Record Results Interactively

Usage
doe record --config FILE --run N|all [--seed N]
| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --run | required | Run number to record, or all to enter all pending runs |
| --seed | 42 | Seed for consistent run ordering |

For real-world experiments without a test script: the tool displays each run's factor settings, prompts for response values, validates numeric input, and saves the result as run_N.json. If results already exist, the tool shows the current values and asks before overwriting.

Example session
```
$ doe record --config config.json --run 3

Run 3 / 8 (Block 1)
  temperature = 200 °C
  pressure = 6 bar
  catalyst = B

Enter value for 'yield' (%): 87.3
Enter value for 'cost' (USD): 42.10

Saved → results/run_3.json
```

Check Experiment Progress

Usage
doe status --config FILE [--seed N]
| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --seed | 42 | Seed for consistent run ordering |

Shows a progress bar, lists completed and pending runs, and displays full factor details for the next run to complete. Especially useful for long-running real-world experiments.

Example output
```
Experiment: Chemical Reactor Optimization
Design: box_behnken | 15 runs | 3 factors | 3 responses

Progress: 9/15 complete
[############........] 60%

Pending runs:
  Run 10: temperature=200, pressure=6, catalyst=B
  Run 11: temperature=150, pressure=4, catalyst=A
  ...

Next run to complete: Run 10
  temperature = 200 °C
  pressure = 6 bar
  catalyst = B

Record results with:
  doe record --config config.json --run 10
```

Export a Printable Worksheet

Usage
doe export-worksheet --config FILE [--format csv|markdown] [--output FILE] [--seed N]

| Flag | Default | Description |
|---|---|---|
| --config | required | Path to JSON config file |
| --format | csv | Output format: csv or markdown |
| --output | stdout | Write to file instead of stdout |
| --seed | 42 | Seed for consistent run ordering |

Generates a worksheet with all runs, factor values, and empty columns for response measurements and notes. Pre-fills response values for any runs that already have results. Perfect for printing and taking to the lab or field.

Pro Tip

Use --format markdown for documentation or wikis, and --format csv for importing into Excel or Google Sheets.

4 Writing a Test Script

Your test script is the bridge between the DOE tool and your actual experiment. It must follow a simple protocol.

The Protocol

Your script must: (1) accept factor values via the configured arg_style, (2) accept --out <path> for the output file, and (3) write a JSON file with keys matching your response names.

test.sh (double-dash style)
```bash
#!/bin/bash
# Parse arguments
while [[ $# -gt 0 ]]; do
  case $1 in
    --temperature) TEMP=$2; shift 2;;
    --pressure)    PRES=$2; shift 2;;
    --catalyst)    CAT=$2;  shift 2;;
    --out)         OUT=$2;  shift 2;;
    *) shift;;
  esac
done

# Run your experiment here...
YIELD=$(your_experiment $TEMP $PRES $CAT)
COST=$(calculate_cost $TEMP $PRES $CAT)

# Write results as JSON
echo "{\"yield\": $YIELD, \"cost\": $COST}" > "$OUT"
```
test.py (double-dash style)
```python
import argparse, json

parser = argparse.ArgumentParser()
parser.add_argument("--temperature", required=True)
parser.add_argument("--pressure", required=True)
parser.add_argument("--catalyst", required=True)
parser.add_argument("--out", required=True)
args = parser.parse_args()

# Run your experiment here...
result = {
    "yield": run_experiment(args.temperature, args.pressure, args.catalyst),
    "cost": calculate_cost(args.temperature, args.pressure, args.catalyst),
}

with open(args.out, "w") as f:
    json.dump(result, f)
```

No Test Script? No Problem.

For real-world experiments (lab work, physical tests, field measurements), you don't need a test script at all. Leave test_script empty or omit it, and use the manual workflow instead:

Manual experiment workflow
```bash
# 1. Design your experiment
$ doe generate --config config.json --seed 42

# 2. Print a worksheet for the lab
$ doe export-worksheet --config config.json --format csv --output worksheet.csv

# 3. Check what to run next
$ doe status --config config.json

# 4. After each physical experiment, record the result
$ doe record --config config.json --run 1

# 5. Analyze as you go (partial results OK)
$ doe analyze --config config.json --partial

# 6. When all runs are done, get the full analysis
$ doe analyze --config config.json
$ doe optimize --config config.json
$ doe report --config config.json --output report.html
```

Works With Any Experiment

The analysis pipeline doesn't care how results were produced. Whether you ran a simulation, measured something in a lab, or collected field data — as long as the response values end up in run_N.json files, everything works.

5 Advanced Tips & Patterns

Use blocking for multi-day experiments

If your experiments span multiple days, sessions, or batches, use "block_count": 2 (or more) in your config. Each block is an independent replicate with its own randomized order. This lets you detect and account for day-to-day variation.

config.json excerpt
"settings": { "operation": "plackett_burman", "block_count": 2, // base runs x 2 = total ... }
Export to CSV for custom analysis in R or pandas

The --csv flag on the analyze command exports structured data files:

  • main_effects_{response}.csv — Factor, effect magnitude, std error, % contribution, CI bounds
  • summary_stats_{response}.csv — Per-factor, per-level statistics (N, mean, std, min, max)
Terminal
```bash
$ doe analyze --config config.json --csv results/csv/
$ ls results/csv/
main_effects_yield.csv   summary_stats_yield.csv
main_effects_cost.csv    summary_stats_cost.csv
```
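The exported files are plain CSV, so any analysis stack can consume them. A stdlib sketch of sorting factors by effect size (the column names here are illustrative — check the header row of your exported file):

```python
import csv, io

# Stand-in for open("results/csv/main_effects_yield.csv")
demo = """factor,effect,pct_contribution
temperature,11.0,62.5
pressure,4.0,22.7
catalyst,2.6,14.8
"""

rows = list(csv.DictReader(io.StringIO(demo)))
rows.sort(key=lambda r: float(r["effect"]), reverse=True)
print(rows[0]["factor"])  # -> temperature (largest effect)
```

The same files load directly into pandas (`pd.read_csv`) or R (`read.csv`).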
Run headless in CI/CD pipelines

Use --no-plots to skip matplotlib chart generation. This avoids display issues in SSH sessions and CI environments while still computing all statistical results.

GitHub Actions example
```yaml
- name: Run DOE analysis
  run: |
    doe generate --config config.json --seed 42
    bash run_experiments.sh
    doe analyze --config config.json --no-plots --csv results/csv/
```
The screening-to-optimization pipeline

The most efficient experimental strategy uses two (or three) stages:

  1. Screen with Plackett-Burman (N+1 runs): identify the 2–3 factors that matter most out of many candidates.
  2. Optimize with Box-Behnken or CCD (15–20 runs): fit a quadratic model to the important factors and find the true optimum.
  3. Confirm with 3–5 runs at the predicted optimum to validate.

Total: ~25 runs to fully optimize, compared to hundreds with grid search.
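The arithmetic behind that claim, sketched for a hypothetical 7-factor problem where 3 factors survive screening:

```python
n_candidates = 7                  # factors worth screening
screening = n_candidates + 1      # Plackett-Burman: N+1 runs
optimization = 15                 # Box-Behnken on the 3 surviving factors
confirmation = 3                  # runs at the predicted optimum

print(screening + optimization + confirmation)  # -> 26 runs total
print(2 ** n_candidates)  # -> 128 runs for even a 2-level full factorial
```

The gap widens quickly: at 3 levels per factor a full grid over 7 factors would need 3^7 = 2187 runs.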

Reproducible experiments with --seed

The --seed flag controls run-order randomization. The same seed always produces the same run order, making your experiments reproducible. The design matrix itself (which factor combinations are tested) is always deterministic — the seed only affects the order within each block.
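The behavior is the same as seeding a shuffle: the set of runs is fixed by the design, and the seed only fixes the permutation. A minimal sketch of the idea (not the tool's code):

```python
import random

design = ["run_1", "run_2", "run_3", "run_4"]  # deterministic design matrix

def run_order(seed):
    order = design.copy()
    random.Random(seed).shuffle(order)  # seed controls only the ordering
    return order

print(run_order(42) == run_order(42))  # -> True: same seed, same order
print(sorted(run_order(7)) == sorted(design))  # same runs, possibly new order
```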

6 Complete Tutorials

End-to-end walkthroughs of real experimental workflows.

Tutorial 1: Database Performance Tuning

A Plackett-Burman design with 6 PostgreSQL configuration parameters, 2 blocks for replication, and CSV export for downstream analysis in R.

Complete workflow
```bash
# Step 1: Preview
$ doe info --config use_cases/04_database_performance_tuning/config.json

# Step 2: Generate
$ doe generate --config use_cases/04_database_performance_tuning/config.json \
    --output results/run.sh --seed 42

# Step 3: Execute
$ bash results/run.sh

# Step 4: Analyze (headless + CSV)
$ doe analyze --config use_cases/04_database_performance_tuning/config.json \
    --no-plots --csv results/csv/

# Step 5: Optimize
$ doe optimize --config use_cases/04_database_performance_tuning/config.json

# Step 6: Report
$ doe report --config use_cases/04_database_performance_tuning/config.json \
    --output results/report.html
```

Tutorial 2: Chemical Reactor Optimization

A Box-Behnken design with 3 continuous factors and 3 responses (yield, purity, cost). Demonstrates multi-response analysis, RSM, and the trade-offs inherent in multi-objective optimization.

Complete workflow
```bash
# Preview the Box-Behnken design (15 runs)
$ doe info --config use_cases/01_reactor_optimization/config.json

# Generate, run, and analyze
$ doe generate --config use_cases/01_reactor_optimization/config.json \
    --output results/run.sh --seed 42
$ bash results/run.sh
$ doe analyze --config use_cases/01_reactor_optimization/config.json

# Get optimization recommendations
$ doe optimize --config use_cases/01_reactor_optimization/config.json

# Generate interactive HTML report
$ doe report --config use_cases/01_reactor_optimization/config.json \
    --output results/report.html
```

Tutorial 3: Real-World Lab Experiment (Manual Entry)

A hands-on walkthrough for running physical experiments without a test script. Uses the record, status, and export-worksheet commands to manage a manual workflow.

Complete manual workflow
```bash
# Create a config with no test_script
$ cat config.json
{
  "factors": [
    {"name": "temperature", "levels": ["150", "200"], "type": "continuous", "unit": "°C"},
    {"name": "pressure", "levels": ["2", "6"], "type": "continuous", "unit": "bar"},
    {"name": "catalyst", "levels": ["A", "B"], "type": "categorical"}
  ],
  "responses": [
    {"name": "yield", "optimize": "maximize", "unit": "%"},
    {"name": "cost", "optimize": "minimize", "unit": "USD"}
  ],
  "settings": {"operation": "full_factorial", "out_directory": "results"}
}

# Preview the design
$ doe info --config config.json

# Print worksheet for lab notebook
$ doe export-worksheet --config config.json --format markdown

# Check progress
$ doe status --config config.json

# Record results after each experiment
$ doe record --config config.json --run 1
$ doe record --config config.json --run 2

# Peek at partial results
$ doe analyze --config config.json --partial

# Record remaining runs and finalize
$ doe record --config config.json --run all
$ doe analyze --config config.json
$ doe report --config config.json --output report.html
```

7 Using AI to Help Design Experiments

Not sure how to set up your experiment? Use the AI Prompts page to generate experiment configurations with the help of an AI assistant like Claude or ChatGPT.

How It Works

The AI Prompts page provides ready-to-use prompts that guide an AI assistant through the process of creating a DOE configuration file for your specific problem. Describe your experiment in plain English, and get a complete config.json, simulation script, and analysis workflow.

The prompts cover common scenarios:

Example: asking AI to design your experiment
"I'm optimizing a PostgreSQL database. I want to test shared_buffers (256MB to 4GB), work_mem (4MB to 256MB), effective_cache_size (1GB to 8GB), and max_parallel_workers (2 to 8). I care about query throughput and p99 latency. Generate a DOE config.json for me."

Browse all AI prompts →