Summary
This experiment tunes HPC job scheduler packing parameters to maximize throughput and resource efficiency.
The design varies three factors: nodes (4 to 64), tasks per node (8 to 48), and memory per task (1 to 8 GB). It optimizes two responses, both maximized: throughput (jobs/h) and efficiency (%).
A Central Composite Design (CCD) was selected to fit a full quadratic response surface model, capturing curvature and interaction effects. With 3 factors this produces 22 runs: 8 factorial corners, 6 axial (star) points, and 8 replicated center points. The axial points extend beyond the factorial range; here some fall outside physically feasible bounds (e.g., negative node counts in the matrix below), so those runs would need clamping or rescaling before execution.
Quadratic response surface models were fitted to capture potential curvature and factor interactions. The RSM contour plots below visualize how pairs of factors jointly affect each response.
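A design like the one tabulated below can be reproduced with a short script. This is a minimal sketch using only NumPy; the axial distance α ≈ 1.8257 and the 8 center-point replicates are inferred from the levels in the experimental matrix, not taken from the actual generator.

```python
import itertools
import numpy as np

def central_composite(ranges, alpha=1.8257, n_center=8):
    """Build a CCD in coded units, then scale to real factor ranges.

    ranges: list of (low, high) per factor; alpha: axial distance in
    coded units; n_center: number of replicated center points.
    """
    k = len(ranges)
    # 2^k factorial corners at coded levels -1/+1
    factorial = np.array(list(itertools.product([-1, 1], repeat=k)), float)
    # 2k axial (star) points at coded levels -alpha/+alpha
    axial = np.zeros((2 * k, k))
    for i in range(k):
        axial[2 * i, i] = -alpha
        axial[2 * i + 1, i] = alpha
    center = np.zeros((n_center, k))
    coded = np.vstack([factorial, axial, center])
    # scale coded units to real units: midpoint + coded * half-range
    lo = np.array([r[0] for r in ranges], float)
    hi = np.array([r[1] for r in ranges], float)
    return (lo + hi) / 2 + coded * (hi - lo) / 2

# 8 factorial + 6 axial + 8 center = 22 runs, matching the matrix below
runs = central_composite([(4, 64), (8, 48), (1, 8)])
```

With these assumed parameters the axial levels reproduce the out-of-range values seen in the matrix (nodes ≈ -20.77 and 88.77, tasks per node ≈ -8.51 and 64.51).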
Key Findings
For throughput, the most influential factors were tasks per node (34.4%), mem per task (34.0%), and nodes (31.7%). The best observed value was 261.62 jobs/h (at nodes = 64, tasks per node = 48, mem per task = 1).
For efficiency, the most influential factors were tasks per node (40.1%), nodes (37.1%), and mem per task (22.8%). The best observed value was 86.93% (at nodes = 34, tasks per node = 28, mem per task = 4.5).
Recommended Next Steps
- Run confirmation experiments at the predicted optimal settings to validate the model.
- No factors were held fixed in this study; consider whether additional factors should be included in a future design.
Experimental Setup
Factors
| Factor | Levels | Type | Unit |
|---|---|---|---|
| nodes | 4, 64 | continuous | count |
| tasks_per_node | 8, 48 | continuous | count |
| mem_per_task | 1, 8 | continuous | GB |
Fixed: none
Responses
| Response | Direction | Unit |
|---|---|---|
| throughput | ↑ maximize | jobs/h |
| efficiency | ↑ maximize | % |
Experimental Matrix
The Central Composite Design produces 22 runs. Each row is one experiment with specific factor settings.
| Run | nodes | tasks_per_node | mem_per_task |
|---|---|---|---|
| 1 | 34 | 28 | 4.5 |
| 2 | 64 | 8 | 8 |
| 3 | 4 | 48 | 1 |
| 4 | 34 | 64.5148 | 4.5 |
| 5 | 34 | 28 | 4.5 |
| 6 | -20.7723 | 28 | 4.5 |
| 7 | 34 | 28 | -1.8901 |
| 8 | 34 | 28 | 4.5 |
| 9 | 64 | 48 | 1 |
| 10 | 88.7723 | 28 | 4.5 |
| 11 | 34 | 28 | 4.5 |
| 12 | 34 | -8.51484 | 4.5 |
| 13 | 34 | 28 | 4.5 |
| 14 | 4 | 8 | 8 |
| 15 | 34 | 28 | 4.5 |
| 16 | 64 | 8 | 1 |
| 17 | 34 | 28 | 10.8901 |
| 18 | 64 | 48 | 8 |
| 19 | 34 | 28 | 4.5 |
| 20 | 4 | 8 | 1 |
| 21 | 4 | 48 | 8 |
| 22 | 34 | 28 | 4.5 |
How to Run
$ doe info --config use_cases/12_job_scheduler_packing/config.json
$ doe generate --config use_cases/12_job_scheduler_packing/config.json --output results/run.sh --seed 42
$ bash results/run.sh
$ doe analyze --config use_cases/12_job_scheduler_packing/config.json
$ doe optimize --config use_cases/12_job_scheduler_packing/config.json
$ doe report --config use_cases/12_job_scheduler_packing/config.json --output report.html
Analysis Results
Generated from actual experiment runs.
Response: throughput
Pareto Chart
Main Effects Plot
Response: efficiency
Pareto Chart
Main Effects Plot
Response Surface Plots
3D surfaces fitted with quadratic RSM. Red dots are observed data points.
How to Read These Surfaces
Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.
- Flat surface — these two factors have little effect on the response.
- Tilted plane — strong linear effect; moving along one axis consistently changes the response.
- Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
- Saddle shape — significant interaction; the best setting of one factor depends on the other.
- Red dots far from surface — poor model fit in that region; be cautious about predictions there.
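The surfaces come from an ordinary least-squares fit of a full second-order (quadratic) model: intercept, linear, two-way interaction, and squared terms. The following sketch shows such a fit with NumPy; the data here are synthetic for illustration, not the actual run results.

```python
import numpy as np

def quadratic_design_matrix(X):
    """Expand factor columns into a full second-order model:
    intercept, linear, two-way interaction, and squared terms."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]                      # linear
    cols += [X[:, i] * X[:, j]                               # interactions
             for i in range(k) for j in range(i + 1, k)]
    cols += [X[:, i] ** 2 for i in range(k)]                 # curvature
    return np.column_stack(cols)

# Synthetic example: 12 runs of 2 coded factors, response with curvature
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(12, 2))
y = 3 + 2 * X[:, 0] - X[:, 1] ** 2 + 0.1 * rng.normal(size=12)

A = quadratic_design_matrix(X)
beta, *_ = np.linalg.lstsq(A, y, rcond=None)
y_hat = A @ beta
r2 = 1 - np.sum((y - y_hat) ** 2) / np.sum((y - y.mean()) ** 2)
```

Adjusted R², reported alongside R² above, additionally penalizes the number of model terms relative to the number of runs.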
throughput (jobs/h) — R² = 0.653, Adj R² = 0.393
Moderate fit — surface shows general trends but some noise remains.
Curvature detected in tasks_per_node, nodes — look for a peak or valley in the surface.
Strongest linear driver: nodes (decreases throughput).
Notable interaction: nodes × tasks_per_node — the effect of one depends on the level of the other. Look for a twisted surface.
efficiency (%) — R² = 0.785, Adj R² = 0.624
Moderate fit — surface shows general trends but some noise remains.
Curvature detected in tasks_per_node, mem_per_task — look for a peak or valley in the surface.
Strongest linear driver: nodes (increases efficiency).
Notable interaction: nodes × tasks_per_node — the effect of one depends on the level of the other. Look for a twisted surface.
efficiency: nodes vs mem per task
efficiency: nodes vs tasks per node
efficiency: tasks per node vs mem per task
throughput: nodes vs mem per task
throughput: nodes vs tasks per node
throughput: tasks per node vs mem per task
Full Analysis Output
=== Main Effects: throughput ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
tasks_per_node 254.1600 14.8504 55.2%
nodes 120.8200 14.8504 26.2%
mem_per_task 85.4775 14.8504 18.6%
=== Summary Statistics: throughput ===
nodes:
Level N Mean Std Min Max
------------------------------------------------------------
-20.7723 1 5.0000 0.0000 5.0000 5.0000
34 12 85.3292 78.5758 7.4600 261.6200
4 4 91.3550 82.6748 7.7300 182.2200
64 4 109.5325 32.5750 60.6700 125.8200
88.7723 1 125.8200 0.0000 125.8200 125.8200
tasks_per_node:
Level N Mean Std Min Max
------------------------------------------------------------
-8.51484 1 7.4600 0.0000 7.4600 7.4600
28 12 73.8075 58.2104 5.0000 137.1600
48 4 117.6150 60.1993 36.6000 182.2200
64.5148 1 261.6200 0.0000 261.6200 261.6200
8 4 83.2725 60.8794 7.7300 138.8700
mem_per_task:
Level N Mean Std Min Max
------------------------------------------------------------
-1.8901 1 62.3800 0.0000 62.3800 62.3800
1 4 57.7050 50.3036 7.7300 125.8200
10.8901 1 125.8200 0.0000 125.8200 125.8200
4.5 12 80.5475 81.7799 5.0000 261.6200
8 4 143.1825 26.7422 125.8200 182.2200
=== Main Effects: efficiency ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
mem_per_task 24.5550 2.6532 35.0%
tasks_per_node 23.0325 2.6532 32.9%
nodes 22.4800 2.6532 32.1%
=== Summary Statistics: efficiency ===
nodes:
Level N Mean Std Min Max
------------------------------------------------------------
-20.7723 1 49.3900 0.0000 49.3900 49.3900
34 12 67.2883 12.6911 47.3400 86.9300
4 4 71.3800 14.7700 56.8400 84.8400
64 4 66.0150 11.7100 48.4500 71.8700
88.7723 1 71.8700 0.0000 71.8700 71.8700
tasks_per_node:
Level N Mean Std Min Max
------------------------------------------------------------
-8.51484 1 67.0100 0.0000 67.0100 67.0100
28 12 67.8133 12.5615 47.3400 86.9300
48 4 70.9825 10.8711 56.8400 83.3500
64.5148 1 47.9500 0.0000 47.9500 47.9500
8 4 66.4125 15.5680 48.4500 84.8400
mem_per_task:
Level N Mean Std Min Max
------------------------------------------------------------
-1.8901 1 47.3400 0.0000 47.3400 47.3400
1 4 65.5000 16.1277 48.4500 84.8400
10.8901 1 71.8700 0.0000 71.8700 71.8700
4.5 12 67.4592 12.4089 47.9500 86.9300
8 4 71.8950 9.3326 60.4900 83.3500
Optimization Recommendations
=== Optimization: throughput ===
Direction: maximize
Best observed run: #10
nodes = 88.7723
tasks_per_node = 28
mem_per_task = 4.5
Value: 261.62
RSM Model (linear, R² = 0.09):
Coefficients:
intercept: +89.0145
nodes: +21.1838
tasks_per_node: +12.8323
mem_per_task: +3.8512
Predicted optimum:
nodes = 88.7723
tasks_per_node = 28
mem_per_task = 4.5
Predicted value: 127.6907
Factor importance:
1. nodes (effect: 221.1, contribution: 55.2%)
2. mem_per_task (effect: 93.3, contribution: 23.3%)
3. tasks_per_node (effect: 86.2, contribution: 21.5%)
=== Optimization: efficiency ===
Direction: maximize
Best observed run: #3
nodes = 4
tasks_per_node = 48
mem_per_task = 1
Value: 86.93
RSM Model (linear, R² = 0.17):
Coefficients:
intercept: +67.1955
nodes: -5.7499
tasks_per_node: +1.4127
mem_per_task: -1.6230
Predicted optimum:
nodes = -20.7723
tasks_per_node = 28
mem_per_task = 4.5
Predicted value: 77.6933
Factor importance:
1. nodes (effect: 25.3, contribution: 41.0%)
2. tasks_per_node (effect: 23.4, contribution: 37.9%)
3. mem_per_task (effect: 13.1, contribution: 21.1%)
Multi-Objective Optimization
When responses compete, Derringer–Suich desirability finds the best compromise.
Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
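The calculation can be sketched as follows. This is a simplified one-sided Derringer–Suich form for maximize-only responses; the lower/upper bounds in the example are assumptions for illustration (the report's actual scaling bounds are not shown), while the weights 1.5 and 1.0 match those used here.

```python
import math

def desirability_max(y, lo, hi, shape=1.0):
    """One-sided desirability for a maximized response:
    0 below lo, 1 above hi, a power ramp in between."""
    if y <= lo:
        return 0.0
    if y >= hi:
        return 1.0
    return ((y - lo) / (hi - lo)) ** shape

def overall_desirability(d_values, weights):
    """Weighted geometric mean of per-response desirabilities."""
    total = sum(weights)
    return math.prod(d ** w for d, w in zip(d_values, weights)) ** (1 / total)

# Illustrative: the two responses of this study, with assumed bounds
d = [desirability_max(182.22, 0.0, 270.0),   # throughput (bounds assumed)
     desirability_max(83.35, 40.0, 90.0)]    # efficiency (bounds assumed)
D = overall_desirability(d, weights=[1.5, 1.0])
```

Because the combination is a geometric mean, any response with desirability 0 drives the overall D to 0, which is what makes this method reject settings that completely sacrifice one objective.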
Overall Desirability
D = 0.7468
Per-Response Desirability
| Response | Weight | Desirability | Predicted | Dir |
|---|---|---|---|---|
| throughput | 1.5 | 0.6733 | 182.22 jobs/h | ↑ |
| efficiency | 1.0 | 0.8723 | 83.35 % | ↑ |
Recommended Settings
| Factor | Value |
|---|---|
| nodes | 4 count |
| tasks_per_node | 48 count |
| mem_per_task | 8 GB |
Source: observed run #7
Trade-off Summary
Sacrifice = how much worse than single-objective best.
| Response | Predicted | Best Observed | Sacrifice |
|---|---|---|---|
| throughput | 182.22 | 261.62 | +79.40 |
| efficiency | 83.35 | 86.93 | +3.58 |
Top 3 Runs by Desirability
| Run | D | Factor Settings |
|---|---|---|
| #7 | 0.7468 | nodes=4, tasks_per_node=48, mem_per_task=8 |
| #1 | 0.5235 | nodes=64, tasks_per_node=8, mem_per_task=8 |
| #5 | 0.5235 | nodes=88.7723, tasks_per_node=28, mem_per_task=4.5 |
Model Quality
| Response | R² | Type |
|---|---|---|
| throughput | 0.1197 | linear |
| efficiency | 0.0177 | linear |
Full Multi-Objective Output
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.7468
Response Weight Desirability Predicted Direction
---------------------------------------------------------------------
throughput 1.5 0.6733 182.22 jobs/h ↑
efficiency 1.0 0.8723 83.35 % ↑
Recommended settings:
nodes = 4 count
tasks_per_node = 48 count
mem_per_task = 8 GB
(from observed run #7)
Trade-off summary:
throughput: 182.22 (best observed: 261.62, sacrifice: +79.40)
efficiency: 83.35 (best observed: 86.93, sacrifice: +3.58)
Model quality:
throughput: R² = 0.1197 (linear)
efficiency: R² = 0.0177 (linear)
Top 3 observed runs by overall desirability:
1. Run #7 (D=0.7468): nodes=4, tasks_per_node=48, mem_per_task=8
2. Run #1 (D=0.5235): nodes=64, tasks_per_node=8, mem_per_task=8
3. Run #5 (D=0.5235): nodes=88.7723, tasks_per_node=28, mem_per_task=4.5