Summary
This experiment investigates ML hyperparameter screening: a fractional factorial design screens 5 hyperparameters with minimal runs.
The design varies 5 factors: learning rate (0.001 to 0.1), batch size (32 to 256), dropout (0.1 to 0.5), hidden layers (2 to 6), and optimizer (sgd vs. adam). The goal is to optimize 2 responses: accuracy (%, maximize) and training time (sec, minimize). Fixed conditions held constant across all runs: epochs = 50, dataset = cifar10.
A fractional factorial design reduces the number of runs from 32 to 8 by deliberately confounding higher-order interactions. This is ideal for screening — identifying which of the 5 factors matter most before investing in a full study.
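The construction can be sketched in a few lines. This uses the textbook generators D = AB, E = AC for a 2^(5-2) fraction; the generators are an assumption for illustration, so the tool's actual fraction (and the matrix below) need not match this one.

```python
from itertools import product

# Full factorial in three base factors A, B, C (2^3 = 8 runs),
# with the remaining two factors generated as D = AB and E = AC.
design = [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]

for run in design:
    print(run)  # coded levels: -1 = low, +1 = high
```

Mapping each coded column to a factor's low/high setting yields a concrete run sheet like the experimental matrix below.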
Key Findings
For accuracy, the most influential factors were optimizer (33.5%), hidden layers (31.0%), and learning rate (29.6%). The best observed value was 78.1% (run #4: learning rate = 0.1, batch size = 256, dropout = 0.5, hidden layers = 6, optimizer = adam).
For training time, the most influential factors were optimizer (33.6%), hidden layers (25.5%), and learning rate (24.7%). The best observed value was 77.3 sec, also at run #4.
Recommended Next Steps
- Follow up with a response surface design (CCD or Box-Behnken) on the top 3–4 factors to model curvature and find the true optimum.
- Consider whether any fixed factors should be varied in a future study.
- The screening results can guide factor reduction — drop factors contributing less than 5% and re-run with a smaller, more focused design.
The Scenario
You are training a deep learning model and need to screen 5 hyperparameters to find which ones matter most. A full factorial would require 2^5 = 32 runs, and each training run takes significant GPU time. A 2^(5-2) fractional factorial cuts this to 8 runs.
ℹ Why Fractional Factorial?
A Resolution III fractional factorial uses only 8 runs. You're screening: you just need to know which hyperparameters have the largest effects. Some main effects are aliased with 2-factor interactions, but follow-up experiments can resolve ambiguities.
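Aliasing is easy to see directly: under a set of generators such as D = AB, E = AC (one standard choice, not necessarily the tool's), the sign column of a main effect is identical to that of a two-factor interaction, so their effects cannot be separated. A minimal sketch:

```python
from itertools import product

# 2^(5-2) fraction built from generators D = AB, E = AC (coded -1/+1).
design = [(a, b, c, a * b, a * c) for a, b, c in product((-1, 1), repeat=3)]

A = [r[0] for r in design]
BD = [r[1] * r[3] for r in design]  # B x D interaction column
assert A == BD  # A's estimated effect absorbs the BD interaction
```

This is why a large "main effect" from a screening design should be confirmed with a follow-up run before acting on it.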
Experimental Setup
Factors
| Factor | Low | High | Type |
|---|---|---|---|
| learning_rate | 0.001 | 0.1 | continuous |
| batch_size | 32 | 256 | continuous |
| dropout | 0.1 | 0.5 | continuous |
| hidden_layers | 2 | 6 | continuous |
| optimizer | sgd | adam | categorical |
Fixed: epochs = 50, dataset = cifar10
Responses
| Response | Direction | Unit |
|---|---|---|
| accuracy | ↑ maximize | % |
| training_time | ↓ minimize | sec |
✔ Conflicting objectives
Look for factors that improve accuracy without hurting training time — those are the "free wins."
Experimental Matrix
The Fractional Factorial Design produces 8 runs. Each row is one experiment with specific factor settings.
| Run | learning_rate | batch_size | dropout | hidden_layers | optimizer |
|---|---|---|---|---|---|
| 1 | 0.001 | 256 | 0.5 | 2 | sgd |
| 2 | 0.1 | 32 | 0.1 | 2 | sgd |
| 3 | 0.1 | 256 | 0.1 | 6 | sgd |
| 4 | 0.1 | 256 | 0.5 | 6 | adam |
| 5 | 0.001 | 256 | 0.1 | 2 | adam |
| 6 | 0.1 | 32 | 0.5 | 2 | adam |
| 7 | 0.001 | 32 | 0.1 | 6 | adam |
| 8 | 0.001 | 32 | 0.5 | 6 | sgd |
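Each main effect reported in the analysis output is simply the mean response at a factor's high level minus the mean at its low level. A quick check using the learning-rate level means from the accuracy summary statistics in this report:

```python
# Level means for learning_rate (accuracy), from the summary statistics.
mean_low = 70.7525   # learning_rate = 0.001
mean_high = 63.8100  # learning_rate = 0.1

effect = mean_high - mean_low
print(round(effect, 4))  # -6.9425, matching the main-effects table
```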
Step-by-Step Workflow
$ doe info --config use_cases/03_ml_hyperparameter_screening/config.json
$ doe generate --config use_cases/03_ml_hyperparameter_screening/config.json \
--output results/run.sh --seed 7
$ bash results/run.sh
$ doe analyze --config use_cases/03_ml_hyperparameter_screening/config.json
$ doe optimize --config use_cases/03_ml_hyperparameter_screening/config.json
$ doe optimize --config use_cases/03_ml_hyperparameter_screening/config.json --multi
$ doe report --config use_cases/03_ml_hyperparameter_screening/config.json \
--output results/report.html
⚠ Positional argument style
This use case uses positional args: factors are passed in order without flag names. The runner script calls `sim.sh 0.001 32 0.1 2 sgd --out run_1.json`. Useful when your training script expects ordered arguments.
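A hypothetical sketch of how such a runner assembles the positional command line; the real script is generated by `doe generate`, and `run_one` and its argument order are illustrative only:

```python
def run_one(run_id, settings):
    """Build the positional arg list for one run: factor values in
    design order, then the output flag. Execute with subprocess.run(cmd)."""
    cmd = ["./sim.sh", *(str(v) for v in settings),
           "--out", f"run_{run_id}.json"]
    return cmd

print(run_one(1, [0.001, 32, 0.1, 2, "sgd"]))
# ['./sim.sh', '0.001', '32', '0.1', '2', 'sgd', '--out', 'run_1.json']
```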
Features Exercised
| Feature | Value |
|---|---|
| Design type | fractional_factorial |
| Factor types | continuous (4) + categorical (1) |
| Arg style | positional |
| Run reduction | 8 runs instead of 32 (75% savings) |
| Multi-response | accuracy ↑, training_time ↓ |
| --seed | 7 (reproducible run order) |
Analysis Results
Generated from actual experiment runs using the DOE Helper Tool.
Response: accuracy
The Pareto chart identifies which hyperparameters contribute most to model accuracy.
Pareto Chart
Main Effects Plot
Response: training_time
Training time is driven by a different set of hyperparameters, revealing trade-offs with accuracy.
Pareto Chart
Main Effects Plot
Response Surface Plots
3D surfaces fitted with quadratic RSM. Red dots are observed data points.
📊 How to Read These Surfaces
Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.
- Flat surface — these two factors have little effect on the response.
- Tilted plane — strong linear effect; moving along one axis consistently changes the response.
- Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
- Saddle shape — significant interaction; the best setting of one factor depends on the other.
- Red dots far from surface — poor model fit in that region; be cautious about predictions there.
accuracy (%) — R² = 1.000, Adj R² = 1.000
With only 8 runs, a quadratic model is saturated, so R² = 1.000 means the surface interpolates the data exactly rather than that the fit is validated; treat the surface shape as indicative, not confirmed.
Curvature detected in learning_rate, batch_size — look for a peak or valley in the surface.
Strongest linear driver: batch_size (decreases accuracy).
Notable interaction: learning_rate × hidden_layers — the effect of one depends on the level of the other. Look for a twisted surface.
training_time (sec) — R² = 1.000, Adj R² = 1.000
With only 8 runs, a quadratic model is saturated, so R² = 1.000 means the surface interpolates the data exactly rather than that the fit is validated; treat the surface shape as indicative, not confirmed.
Curvature detected in learning_rate, batch_size — look for a peak or valley in the surface.
Strongest linear driver: batch_size (increases training_time).
Notable interaction: learning_rate × hidden_layers — the effect of one depends on the level of the other. Look for a twisted surface.
accuracy: batch size vs dropout
accuracy: batch size vs hidden layers
accuracy: dropout vs hidden layers
accuracy: learning rate vs batch size
accuracy: learning rate vs dropout
accuracy: learning rate vs hidden layers
training time: batch size vs dropout
training time: batch size vs hidden layers
training time: dropout vs hidden layers
training time: learning rate vs batch size
training time: learning rate vs dropout
training time: learning rate vs hidden layers
Full Analysis Output
=== Main Effects: accuracy ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
optimizer -7.8675 3.0940 33.5%
hidden_layers 7.2775 3.0940 31.0%
learning_rate -6.9425 3.0940 29.6%
dropout 0.7725 3.0940 3.3%
batch_size -0.6125 3.0940 2.6%
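The % Contribution column appears to be each |effect| as a share of the sum of |effects|; recomputing it from the table above reproduces the reported values. This is a sketch of an inferred formula, since the tool's exact calculation isn't documented here:

```python
# Main effects on accuracy, copied from the table above.
effects = {
    "optimizer": -7.8675, "hidden_layers": 7.2775,
    "learning_rate": -6.9425, "dropout": 0.7725, "batch_size": -0.6125,
}
total = sum(abs(e) for e in effects.values())
contrib = {f: 100 * abs(e) / total for f, e in effects.items()}
print({f: round(c, 1) for f, c in contrib.items()})
# {'optimizer': 33.5, 'hidden_layers': 31.0, 'learning_rate': 29.6,
#  'dropout': 3.3, 'batch_size': 2.6}
```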
=== Interaction Effects: accuracy ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
batch_size optimizer 9.9875 18.3%
dropout hidden_layers 9.9875 18.3%
learning_rate dropout 7.8675 14.4%
learning_rate batch_size -7.2775 13.3%
batch_size hidden_layers 6.9425 12.7%
dropout optimizer 6.9425 12.7%
batch_size dropout -2.0625 3.8%
hidden_layers optimizer -2.0625 3.8%
learning_rate optimizer -0.7725 1.4%
learning_rate hidden_layers 0.6125 1.1%
=== Summary Statistics: accuracy ===
learning_rate:
Level N Mean Std Min Max
------------------------------------------------------------
0.001 4 70.7525 7.2472 59.9100 74.9300
0.1 4 63.8100 9.6972 57.4100 78.1000
batch_size:
Level N Mean Std Min Max
------------------------------------------------------------
256 4 67.5875 10.4395 57.4100 78.1000
32 4 66.9750 8.3340 58.1600 74.5000
dropout:
Level N Mean Std Min Max
------------------------------------------------------------
0.1 4 66.8950 8.7327 57.4100 74.9300
0.5 4 67.6675 10.1010 58.1600 78.1000
hidden_layers:
Level N Mean Std Min Max
------------------------------------------------------------
2 4 63.6425 7.6527 58.1600 74.9300
6 4 70.9200 9.2096 57.4100 78.1000
optimizer:
Level N Mean Std Min Max
------------------------------------------------------------
adam 4 71.2150 8.9006 58.1600 78.1000
sgd 4 63.3475 7.6291 57.4100 74.5000
=== Main Effects: training_time ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
optimizer 19.3500 7.2462 33.6%
hidden_layers -14.7000 7.2462 25.5%
learning_rate 14.2000 7.2462 24.7%
dropout -9.2500 7.2462 16.1%
batch_size -0.1000 7.2462 0.2%
=== Interaction Effects: training_time ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
batch_size optimizer -24.0500 18.9%
dropout hidden_layers -24.0500 18.9%
learning_rate dropout -19.3500 15.2%
learning_rate batch_size 14.7000 11.5%
batch_size hidden_layers -14.2000 11.1%
dropout optimizer -14.2000 11.1%
learning_rate optimizer 9.2500 7.3%
batch_size dropout 3.7500 2.9%
hidden_layers optimizer 3.7500 2.9%
learning_rate hidden_layers 0.1000 0.1%
=== Summary Statistics: training_time ===
learning_rate:
Level N Mean Std Min Max
------------------------------------------------------------
0.001 4 98.6000 15.6327 86.1000 121.2000
0.1 4 112.8000 24.5218 77.3000 133.7000
batch_size:
Level N Mean Std Min Max
------------------------------------------------------------
256 4 105.7500 26.1586 77.3000 133.7000
32 4 105.6500 17.2003 86.1000 120.5000
dropout:
Level N Mean Std Min Max
------------------------------------------------------------
0.1 4 110.3250 20.2307 90.8000 133.7000
0.5 4 101.0750 22.6672 77.3000 121.2000
hidden_layers:
Level N Mean Std Min Max
------------------------------------------------------------
2 4 113.0500 14.8460 90.8000 121.2000
6 4 98.3500 24.8126 77.3000 133.7000
optimizer:
Level N Mean Std Min Max
------------------------------------------------------------
adam 4 96.0250 17.6872 77.3000 119.7000
sgd 4 115.3750 20.4371 86.1000 133.7000
Optimization Recommendations
=== Optimization: accuracy ===
Direction: maximize
Best observed run: #4
learning_rate = 0.001
batch_size = 32
dropout = 0.5
hidden_layers = 6
optimizer = sgd
Value: 78.1
RSM Model (linear, R² = 0.76):
Coefficients:
intercept: +67.2812
learning_rate: -4.2637
batch_size: +0.4462
dropout: -3.0338
hidden_layers: +4.7863
optimizer: +0.4937
Predicted optimum:
learning_rate = 0.001
batch_size = 32
dropout = 0.1
hidden_layers = 6
optimizer = adam
Predicted value: 78.4250
Factor importance:
1. hidden_layers (effect: 9.6, contribution: 36.8%)
2. learning_rate (effect: -8.5, contribution: 32.7%)
3. dropout (effect: -6.1, contribution: 23.3%)
4. optimizer (effect: 1.0, contribution: 3.8%)
5. batch_size (effect: -0.9, contribution: 3.4%)
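The predicted optimum follows directly from the linear coefficients in coded units (-1 = low, +1 = high). Reproducing the reported 78.4250 requires assuming the optimizer is coded sgd = +1, adam = -1; that coding is an inference from the numbers, not documented tool behavior:

```python
# Linear RSM coefficients for accuracy, from the output above.
coef = {"learning_rate": -4.2637, "batch_size": 0.4462, "dropout": -3.0338,
        "hidden_layers": 4.7863, "optimizer": 0.4937}
intercept = 67.2812

# Predicted optimum: learning_rate low, batch_size low, dropout low,
# hidden_layers high, optimizer adam (coded -1 under the assumption above).
x = {"learning_rate": -1, "batch_size": -1, "dropout": -1,
     "hidden_layers": +1, "optimizer": -1}

pred = intercept + sum(coef[k] * x[k] for k in coef)
print(round(pred, 4))  # ~78.425, matching the reported predicted value
```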
=== Optimization: training_time ===
Direction: minimize
Best observed run: #4
learning_rate = 0.001
batch_size = 32
dropout = 0.5
hidden_layers = 6
optimizer = sgd
Value: 77.3
RSM Model (linear, R² = 0.73):
Coefficients:
intercept: +105.7000
learning_rate: +10.4750
batch_size: -1.0500
dropout: +7.4750
hidden_layers: -9.4750
optimizer: -3.4500
Predicted optimum:
learning_rate = 0.1
batch_size = 32
dropout = 0.5
hidden_layers = 2
optimizer = adam
Predicted value: 137.6250
Factor importance:
1. learning_rate (effect: 21.0, contribution: 32.8%)
2. hidden_layers (effect: -19.0, contribution: 29.7%)
3. dropout (effect: 15.0, contribution: 23.4%)
4. optimizer (effect: -6.9, contribution: 10.8%)
5. batch_size (effect: 2.1, contribution: 3.3%)
Multi-Objective Optimization
When responses compete, Derringer–Suich desirability finds the best compromise.
Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
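A minimal sketch of the mechanics, using simple linear one-sided ramps; the tool's actual anchor points and ramp shapes for each d(y) are not shown in this report:

```python
import math

def desirability(y, lo, hi, maximize=True):
    """Linear Derringer-Suich ramp scaling a response onto [0, 1]."""
    d = (y - lo) / (hi - lo) if maximize else (hi - y) / (hi - lo)
    return min(1.0, max(0.0, d))

def overall(ds, weights):
    """Weighted geometric mean: D = (prod d_i^w_i)^(1 / sum w_i)."""
    return math.prod(d ** w for d, w in zip(ds, weights)) ** (1 / sum(weights))

# Both responses at full desirability gives D = 1, as in this report.
print(overall([1.0, 1.0], [1.5, 1.0]))  # 1.0
```

Because the combination is a geometric mean, any single response with d = 0 drives D to 0, which is what makes desirability a "compromise" method rather than a weighted sum.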
Overall Desirability
D = 1.0000
Per-Response Desirability
| Response | Weight | Desirability | Predicted | Dir |
|---|---|---|---|---|
| accuracy | 1.5 | 1.0000 | 80.87 % | ↑ |
| training_time | 1.0 | 1.0000 | 73.29 sec | ↓ |
Recommended Settings
| Factor | Value |
|---|---|
| learning_rate | 0.09247 |
| batch_size | 34.55 |
| dropout | 0.4207 |
| hidden_layers | 5.91 |
| optimizer | adam |
Source: from RSM model prediction
Trade-off Summary
Sacrifice = how much worse the compromise is than the single-objective best observed (a negative value means the model predicts better than any observed run).
| Response | Predicted | Best Observed | Sacrifice |
|---|---|---|---|
| accuracy | 80.87 | 78.10 | -2.77 |
| training_time | 73.29 | 77.30 | -4.01 |
Top 3 Runs by Desirability
| Run | D | Factor Settings |
|---|---|---|
| #4 | 0.9545 | learning_rate=0.001, batch_size=32, dropout=0.1, hidden_layers=6, optimizer=adam |
| #3 | 0.8029 | learning_rate=0.1, batch_size=256, dropout=0.5, hidden_layers=6, optimizer=adam |
| #6 | 0.7830 | learning_rate=0.001, batch_size=32, dropout=0.5, hidden_layers=6, optimizer=sgd |
Model Quality
| Response | R² | Type |
|---|---|---|
| accuracy | 0.9745 | linear |
| training_time | 0.9277 | linear |
Full Multi-Objective Output
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 1.0000
Response Weight Desirability Predicted Direction
---------------------------------------------------------------------
accuracy 1.5 1.0000 80.87 % ↑
training_time 1.0 1.0000 73.29 sec ↓
Recommended settings:
learning_rate = 0.09247
batch_size = 34.55
dropout = 0.4207
hidden_layers = 5.91
optimizer = adam
(from RSM model prediction)
Trade-off summary:
accuracy: 80.87 (best observed: 78.10, sacrifice: -2.77)
training_time: 73.29 (best observed: 77.30, sacrifice: -4.01)
Model quality:
accuracy: R² = 0.9745 (linear)
training_time: R² = 0.9277 (linear)
Top 3 observed runs by overall desirability:
1. Run #4 (D=0.9545): learning_rate=0.001, batch_size=32, dropout=0.1, hidden_layers=6, optimizer=adam
2. Run #3 (D=0.8029): learning_rate=0.1, batch_size=256, dropout=0.5, hidden_layers=6, optimizer=adam
3. Run #6 (D=0.7830): learning_rate=0.001, batch_size=32, dropout=0.5, hidden_layers=6, optimizer=sgd