Fractional Factorial Design

ML Hyperparameter Screening

Screen 5 hyperparameters in 8 runs instead of 32 — saving expensive GPU time.

Summary

This experiment investigates ML hyperparameter screening, using a fractional factorial design to screen 5 hyperparameters with minimal runs.

The design varies 5 factors: learning rate (0.001 to 0.1), batch size (32 to 256), dropout (0.1 to 0.5), hidden layers (2 to 6), and optimizer (sgd or adam). The goal is to optimize 2 responses: accuracy (%, maximize) and training time (sec, minimize). Fixed conditions held constant across all runs: epochs = 50, dataset = cifar10.

A fractional factorial design reduces the number of runs from 32 to 8 by deliberately confounding higher-order interactions. This is ideal for screening — identifying which of the 5 factors matter most before investing in a full study.
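As a rough illustration of how the run reduction works (a hypothetical sketch in ±1 coding, not the tool's actual generator), a 2^(5−2) design builds a full factorial in three base factors and derives the remaining two columns from interaction products, e.g. D = AB and E = AC:

```python
from itertools import product

# Hypothetical sketch of a 2^(5-2) fractional factorial in +/-1 coding:
# three base factors form a full 2^3 factorial, and the remaining two
# columns are generated from interaction products (D = AB, E = AC).
runs = []
for a, b, c in product((-1, 1), repeat=3):
    runs.append((a, b, c, a * b, a * c))

print(len(runs))  # 8 runs, versus 2**5 = 32 for the full factorial
```

The cost of generating columns this way is deliberate confounding: each generated factor shares its column with the interaction that produced it.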

Key Findings

For accuracy, the most influential factors were optimizer (33.5%), hidden layers (31.0%), and learning rate (29.6%). The best observed value was 78.1 (at learning rate = 0.001, batch size = 32, dropout = 0.5, hidden layers = 6, optimizer = sgd).

For training time, the most influential factors were optimizer (33.6%), hidden layers (25.5%), and learning rate (24.7%). The best observed value was 77.3, achieved at the same settings (learning rate = 0.001, batch size = 32, dropout = 0.5, hidden layers = 6, optimizer = sgd).


The Scenario

You are training a deep learning model and need to screen 5 hyperparameters to find which ones matter most. A full factorial would require 2⁵ = 32 runs, and each training run takes significant GPU time. A fractional factorial cuts this to 8 runs, a quarter of the full design.

Why Fractional Factorial?

A Resolution III fractional factorial uses only 8 runs. You're screening — you just need to know which hyperparameters have the largest effects. Some main effects are aliased with 2-factor interactions, but follow-up experiments can resolve ambiguities.
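The aliasing can be checked directly. A minimal sketch, assuming the generators D = AB and E = AC: multiplying D = AB through by A gives AD = B, so the column used to estimate B's main effect is identical to the A × D interaction column:

```python
from itertools import product

# Sketch of Resolution III aliasing, assuming generators D = AB and E = AC.
# Since D = AB, multiplying both sides by A gives AD = B: the design column
# for the main effect of B equals the A x D interaction column exactly.
rows = [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]
col_b = [r[1] for r in rows]           # main-effect column for B
col_ad = [r[0] * r[3] for r in rows]   # A x D interaction column
print(col_b == col_ad)  # True: B and AD are aliased
```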

Experimental Setup

Factors

Factor          Low     High    Type
learning_rate   0.001   0.1     continuous
batch_size      32      256     continuous
dropout         0.1     0.5     continuous
hidden_layers   2       6       continuous
optimizer       sgd     adam    categorical

Fixed: epochs = 50, dataset = cifar10

Responses

Response        Direction    Unit
accuracy        ↑ maximize   %
training_time   ↓ minimize   sec

Conflicting objectives

Look for factors that improve accuracy without hurting training time — those are the "free wins."

Experimental Matrix

The Fractional Factorial Design produces 8 runs. Each row is one experiment with specific factor settings.

Run   learning_rate   batch_size   dropout   hidden_layers   optimizer
1     0.001           256          0.5       2               sgd
2     0.1             32           0.1       2               sgd
3     0.1             256          0.1       6               sgd
4     0.1             256          0.5       6               adam
5     0.001           256          0.1       2               adam
6     0.1             32           0.5       2               adam
7     0.001           32           0.1       6               adam
8     0.001           32           0.5       6               sgd

Step-by-Step Workflow

Complete workflow
# Preview (notice: 8 runs instead of 32)
$ doe info --config use_cases/03_ml_hyperparameter_screening/config.json

# Generate with positional arg style
$ doe generate --config use_cases/03_ml_hyperparameter_screening/config.json \
    --output results/run.sh --seed 7

# Run the simulated training
$ bash results/run.sh

# Analyze both responses
$ doe analyze --config use_cases/03_ml_hyperparameter_screening/config.json

# Optimize and report
$ doe optimize --config use_cases/03_ml_hyperparameter_screening/config.json
$ doe optimize --config use_cases/03_ml_hyperparameter_screening/config.json --multi  # multi-objective
$ doe report --config use_cases/03_ml_hyperparameter_screening/config.json \
    --output results/report.html

Positional argument style

This use case uses positional args: factor values are passed in order, without flag names. The runner script calls sim.sh 0.001 32 0.1 2 sgd --out run_1.json. This style is useful when your training script expects ordered arguments.
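For illustration, a hypothetical Python stand-in for such a training script (the name sim.py, argument order, and output fields are assumed here, not part of this use case) would read its factors positionally and write a JSON result:

```python
import json
import sys

# Hypothetical stand-in for the sim.sh training script (names and output
# fields assumed): five factor values arrive as ordered positional
# arguments, followed by "--out <path>" naming the JSON result file.
# Example: python sim.py 0.001 32 0.1 2 sgd --out run_1.json
def main(argv):
    lr, batch, dropout, layers, optimizer = argv[1:6]
    out_path = argv[argv.index("--out") + 1]
    result = {
        "learning_rate": float(lr),
        "batch_size": int(batch),
        "dropout": float(dropout),
        "hidden_layers": int(layers),
        "optimizer": optimizer,
    }
    with open(out_path, "w") as f:
        json.dump(result, f)

if __name__ == "__main__" and len(sys.argv) > 6:
    main(sys.argv)
```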

Features Exercised

Feature          Value
Design type      fractional_factorial
Factor types     continuous (4) + categorical (1)
Arg style        positional
Run reduction    8 runs instead of 32 (75% savings)
Multi-response   accuracy ↑, training_time ↓
--seed           7 (reproducible run order)

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: accuracy

The Pareto chart identifies which hyperparameters contribute most to model accuracy.
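As background, a Pareto ranking like this can be reproduced from the raw runs: a factor's main effect is the mean response at its high level minus the mean at its low level, and percent contribution is commonly taken as the squared effect over the sum of squared effects (the convention assumed here). A minimal sketch with made-up numbers, not this experiment's data:

```python
# Sketch of a Pareto-style ranking (illustrative numbers, not real runs):
# a main effect is the mean response at the high level minus the mean at
# the low level; percent contribution is taken here as the squared effect
# over the sum of squared effects.
levels = {  # factor -> list of (coded level, response) pairs
    "learning_rate": [(-1, 70.0), (-1, 72.0), (1, 62.0), (1, 64.0)],
    "dropout":       [(-1, 66.0), (1, 68.0), (-1, 67.0), (1, 69.0)],
}

def main_effect(pairs):
    hi = [y for s, y in pairs if s == 1]
    lo = [y for s, y in pairs if s == -1]
    return sum(hi) / len(hi) - sum(lo) / len(lo)

effects = {f: main_effect(p) for f, p in levels.items()}
total = sum(e * e for e in effects.values())
for f, e in sorted(effects.items(), key=lambda kv: -abs(kv[1])):
    print(f"{f}: effect={e:+.2f}, contribution={100 * e * e / total:.1f}%")
```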

Pareto Chart

Pareto chart for accuracy

Main Effects Plot

Main effects plot for accuracy

Response: training_time

Training time is driven by a different set of hyperparameters, revealing trade-offs with accuracy.

Pareto Chart

Pareto chart for training time

Main Effects Plot

Main effects plot for training time

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.


How to Read These Surfaces

Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.

  • Flat surface — these two factors have little effect on the response.
  • Tilted plane — strong linear effect; moving along one axis consistently changes the response.
  • Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
  • Saddle shape — significant interaction; the best setting of one factor depends on the other.
  • Red dots far from surface — poor model fit in that region; be cautious about predictions there.

accuracy (%) — R² = 1.000, Adj R² = 1.000
Caution: with only 8 runs, an R² of exactly 1.000 means the quadratic model is saturated and interpolates every data point, so treat the surface shape as a visual guide rather than a validated fit.
Curvature detected in learning_rate, batch_size — look for a peak or valley in the surface.
Strongest linear driver: batch_size (decreases accuracy).
Notable interaction: learning_rate × hidden_layers — the effect of one depends on the level of the other. Look for a twisted surface.

training_time (sec) — R² = 1.000, Adj R² = 1.000
Caution: an R² of exactly 1.000 on 8 runs indicates a saturated model, so interpret the surface qualitatively rather than as a validated fit.
Curvature detected in learning_rate, batch_size — look for a peak or valley in the surface.
Strongest linear driver: batch_size (increases training_time).
Notable interaction: learning_rate × hidden_layers — the effect of one depends on the level of the other. Look for a twisted surface.

accuracy: batch size vs dropout

RSM surface: accuracy — batch size vs dropout

accuracy: batch size vs hidden layers

RSM surface: accuracy — batch size vs hidden layers

accuracy: dropout vs hidden layers

RSM surface: accuracy — dropout vs hidden layers

accuracy: learning rate vs batch size

RSM surface: accuracy — learning rate vs batch size

accuracy: learning rate vs dropout

RSM surface: accuracy — learning rate vs dropout

accuracy: learning rate vs hidden layers

RSM surface: accuracy — learning rate vs hidden layers

training_time: batch size vs dropout

RSM surface: training_time — batch size vs dropout

training_time: batch size vs hidden layers

RSM surface: training_time — batch size vs hidden layers

training_time: dropout vs hidden layers

RSM surface: training_time — dropout vs hidden layers

training_time: learning rate vs batch size

RSM surface: training_time — learning rate vs batch size

training_time: learning rate vs dropout

RSM surface: training_time — learning rate vs dropout

training_time: learning rate vs hidden layers

RSM surface: training_time — learning rate vs hidden layers

Full Analysis Output

doe analyze
=== Main Effects: accuracy ===
Factor             Effect     Std Error   % Contribution
--------------------------------------------------------------
optimizer         -7.8675     3.0940      33.5%
hidden_layers      7.2775     3.0940      31.0%
learning_rate     -6.9425     3.0940      29.6%
dropout            0.7725     3.0940       3.3%
batch_size        -0.6125     3.0940       2.6%

=== Interaction Effects: accuracy ===
Factor A          Factor B         Interaction   % Contribution
------------------------------------------------------------------------
batch_size        optimizer         9.9875       18.3%
dropout           hidden_layers     9.9875       18.3%
learning_rate     dropout           7.8675       14.4%
learning_rate     batch_size       -7.2775       13.3%
batch_size        hidden_layers     6.9425       12.7%
dropout           optimizer         6.9425       12.7%
batch_size        dropout          -2.0625        3.8%
hidden_layers     optimizer        -2.0625        3.8%
learning_rate     optimizer        -0.7725        1.4%
learning_rate     hidden_layers     0.6125        1.1%

=== Summary Statistics: accuracy ===
learning_rate:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
0.001        4    70.7525     7.2472    59.9100    74.9300
0.1          4    63.8100     9.6972    57.4100    78.1000
batch_size:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
256          4    67.5875    10.4395    57.4100    78.1000
32           4    66.9750     8.3340    58.1600    74.5000
dropout:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
0.1          4    66.8950     8.7327    57.4100    74.9300
0.5          4    67.6675    10.1010    58.1600    78.1000
hidden_layers:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
2            4    63.6425     7.6527    58.1600    74.9300
6            4    70.9200     9.2096    57.4100    78.1000
optimizer:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
adam         4    71.2150     8.9006    58.1600    78.1000
sgd          4    63.3475     7.6291    57.4100    74.5000

=== Main Effects: training_time ===
Factor             Effect     Std Error   % Contribution
--------------------------------------------------------------
optimizer         19.3500     7.2462      33.6%
hidden_layers    -14.7000     7.2462      25.5%
learning_rate     14.2000     7.2462      24.7%
dropout           -9.2500     7.2462      16.1%
batch_size        -0.1000     7.2462       0.2%

=== Interaction Effects: training_time ===
Factor A          Factor B         Interaction   % Contribution
------------------------------------------------------------------------
batch_size        optimizer       -24.0500       18.9%
dropout           hidden_layers   -24.0500       18.9%
learning_rate     dropout         -19.3500       15.2%
learning_rate     batch_size       14.7000       11.5%
batch_size        hidden_layers   -14.2000       11.1%
dropout           optimizer       -14.2000       11.1%
learning_rate     optimizer         9.2500        7.3%
batch_size        dropout           3.7500        2.9%
hidden_layers     optimizer         3.7500        2.9%
learning_rate     hidden_layers     0.1000        0.1%

=== Summary Statistics: training_time ===
learning_rate:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
0.001        4    98.6000    15.6327    86.1000   121.2000
0.1          4   112.8000    24.5218    77.3000   133.7000
batch_size:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
256          4   105.7500    26.1586    77.3000   133.7000
32           4   105.6500    17.2003    86.1000   120.5000
dropout:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
0.1          4   110.3250    20.2307    90.8000   133.7000
0.5          4   101.0750    22.6672    77.3000   121.2000
hidden_layers:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
2            4   113.0500    14.8460    90.8000   121.2000
6            4    98.3500    24.8126    77.3000   133.7000
optimizer:
Level        N       Mean        Std        Min        Max
------------------------------------------------------------
adam         4    96.0250    17.6872    77.3000   119.7000
sgd          4   115.3750    20.4371    86.1000   133.7000

Optimization Recommendations

doe optimize
=== Optimization: accuracy ===
Direction: maximize

Best observed run: #4
  learning_rate = 0.001
  batch_size = 32
  dropout = 0.5
  hidden_layers = 6
  optimizer = sgd
  Value: 78.1

RSM Model (linear, R² = 0.76):
  Coefficients:
    intercept:     +67.2812
    learning_rate:  -4.2637
    batch_size:     +0.4462
    dropout:        -3.0338
    hidden_layers:  +4.7863
    optimizer:      +0.4937

Predicted optimum:
  learning_rate = 0.001
  batch_size = 32
  dropout = 0.1
  hidden_layers = 6
  optimizer = adam
  Predicted value: 78.4250

Factor importance:
  1. hidden_layers (effect: 9.6, contribution: 36.8%)
  2. learning_rate (effect: -8.5, contribution: 32.7%)
  3. dropout (effect: -6.1, contribution: 23.3%)
  4. optimizer (effect: 1.0, contribution: 3.8%)
  5. batch_size (effect: -0.9, contribution: 3.4%)

=== Optimization: training_time ===
Direction: minimize

Best observed run: #4
  learning_rate = 0.001
  batch_size = 32
  dropout = 0.5
  hidden_layers = 6
  optimizer = sgd
  Value: 77.3

RSM Model (linear, R² = 0.73):
  Coefficients:
    intercept:     +105.7000
    learning_rate:  +10.4750
    batch_size:      -1.0500
    dropout:         +7.4750
    hidden_layers:   -9.4750
    optimizer:       -3.4500

Predicted optimum:
  learning_rate = 0.1
  batch_size = 32
  dropout = 0.5
  hidden_layers = 2
  optimizer = adam
  Predicted value: 137.6250

Factor importance:
  1. learning_rate (effect: 21.0, contribution: 32.8%)
  2. hidden_layers (effect: -19.0, contribution: 29.7%)
  3. dropout (effect: 15.0, contribution: 23.4%)
  4. optimizer (effect: -6.9, contribution: 10.8%)
  5. batch_size (effect: 2.1, contribution: 3.3%)

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
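A minimal sketch of this calculation, assuming one-sided linear desirability functions and illustrative bounds and weights (not the tool's fitted values):

```python
import math

# Sketch of Derringer-Suich desirability with one-sided linear scaling
# (bounds and weights below are illustrative, not the tool's fitted values).
def desirability(y, lo, hi, maximize=True):
    # Scale a response to [0, 1]: 1 is fully desirable, 0 is unacceptable.
    d = (y - lo) / (hi - lo) if maximize else (hi - y) / (hi - lo)
    return min(1.0, max(0.0, d))

def overall(ds, weights):
    # Weighted geometric mean: any single 0 drives the whole score to 0.
    return math.prod(d ** w for d, w in zip(ds, weights)) ** (1 / sum(weights))

d_acc = desirability(78.1, lo=57.0, hi=79.0, maximize=True)     # accuracy
d_time = desirability(77.3, lo=75.0, hi=134.0, maximize=False)  # training_time
print(round(overall([d_acc, d_time], [1.5, 1.0]), 4))
```

The geometric mean (rather than an arithmetic one) is the key design choice: a run that completely fails any single objective scores zero overall, no matter how well it does elsewhere.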

Overall Desirability
D = 1.0000

Per-Response Desirability

Response         Weight   Desirability   Predicted   Dir
accuracy         1.5      1.0000         80.87 %     ↑
training_time    1.0      1.0000         73.29 sec   ↓

Recommended Settings

Factor          Value
learning_rate   0.09247
batch_size      34.55
dropout         0.4207
hidden_layers   5.91
optimizer       adam

Source: from RSM model prediction

Trade-off Summary

Sacrifice = how much worse the compromise is than the single-objective best; negative values mean the model predicts an improvement over the best observed run.

Response        Predicted   Best Observed   Sacrifice
accuracy        80.87       78.10           -2.77
training_time   73.29       77.30           -4.01

Top 3 Runs by Desirability

Run   D        Factor Settings
#4    0.9545   learning_rate=0.001, batch_size=32, dropout=0.1, hidden_layers=6, optimizer=adam
#3    0.8029   learning_rate=0.1, batch_size=256, dropout=0.5, hidden_layers=6, optimizer=adam
#6    0.7830   learning_rate=0.001, batch_size=32, dropout=0.5, hidden_layers=6, optimizer=sgd

Model Quality

Response        R²       Type
accuracy        0.9745   linear
training_time   0.9277   linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 1.0000

Response         Weight   Desirability   Predicted   Direction
---------------------------------------------------------------------
accuracy         1.5      1.0000         80.87 %     ↑
training_time    1.0      1.0000         73.29 sec   ↓

Recommended settings:
  learning_rate = 0.09247
  batch_size = 34.55
  dropout = 0.4207
  hidden_layers = 5.91
  optimizer = adam
  (from RSM model prediction)

Trade-off summary:
  accuracy: 80.87 (best observed: 78.10, sacrifice: -2.77)
  training_time: 73.29 (best observed: 77.30, sacrifice: -4.01)

Model quality:
  accuracy: R² = 0.9745 (linear)
  training_time: R² = 0.9277 (linear)

Top 3 observed runs by overall desirability:
  1. Run #4 (D=0.9545): learning_rate=0.001, batch_size=32, dropout=0.1, hidden_layers=6, optimizer=adam
  2. Run #3 (D=0.8029): learning_rate=0.1, batch_size=256, dropout=0.5, hidden_layers=6, optimizer=adam
  3. Run #6 (D=0.7830): learning_rate=0.001, batch_size=32, dropout=0.5, hidden_layers=6, optimizer=sgd