Central Composite

Job Scheduler Packing

Find the optimal node count, tasks-per-node, and memory allocation for Slurm job throughput.

Summary

This experiment optimizes HPC job scheduler packing parameters to maximize throughput and resource efficiency.

The design varies 3 factors: nodes (count, 4 to 64), tasks per node (count, 8 to 48), and mem per task (GB, 1 to 8). The goal is to maximize 2 responses: throughput (jobs/h) and efficiency (%).

A Central Composite Design (CCD) was selected to fit a full quadratic response surface model, including curvature and interaction effects. With 3 factors this produces 22 runs: 8 factorial corners, 6 axial (star) points that extend beyond the factorial range, and 8 replicated center points.
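The construction can be sketched in a few lines of Python. This is an illustrative generator, not the tool's actual implementation; the axial distance alpha = 1.82574 is an assumption inferred from the star points in the experimental matrix (34 - 1.82574 × 30 reproduces the nodes axial point -20.7723).

```python
from itertools import product

def ccd(factors, alpha=1.82574, n_center=8):
    """Sketch of a central composite design generator.

    factors: {name: (low, high)} bounds.
    alpha: axial distance in coded units. 1.82574 is an assumption
    inferred from the star points in the matrix; the real tool may differ.
    """
    names = list(factors)
    center = {n: (lo + hi) / 2 for n, (lo, hi) in factors.items()}
    half = {n: (hi - lo) / 2 for n, (lo, hi) in factors.items()}

    runs = []
    # 2^k factorial corners at coded levels -1 / +1
    for signs in product((-1, 1), repeat=len(names)):
        runs.append({n: center[n] + s * half[n] for n, s in zip(names, signs)})
    # 2k axial (star) points: one factor at +/- alpha, the rest at center
    for n in names:
        for s in (-alpha, alpha):
            run = dict(center)
            run[n] = center[n] + s * half[n]
            runs.append(run)
    # replicated center points, used to estimate pure error
    runs.extend(dict(center) for _ in range(n_center))
    return runs

design = ccd({"nodes": (4, 64), "tasks_per_node": (8, 48), "mem_per_task": (1, 8)})
print(len(design))  # 8 + 6 + 8 = 22
```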

Quadratic response surface models were fitted to capture potential curvature and factor interactions. The RSM contour plots below visualize how pairs of factors jointly affect each response.
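As a sketch of what "fitting a quadratic response surface" means here, the model y = b0 + Σ bᵢxᵢ + Σ bᵢⱼxᵢxⱼ + Σ bᵢᵢxᵢ² can be fitted by ordinary least squares with numpy. The tool's actual estimator is not shown in this report, so this is an assumed, minimal version:

```python
import numpy as np

def quad_features(X):
    """Expand columns [x1..xk] into full quadratic model terms:
    intercept, linear, pairwise interactions, pure quadratics."""
    n, k = X.shape
    cols = [np.ones(n)]
    cols += [X[:, i] for i in range(k)]
    cols += [X[:, i] * X[:, j] for i in range(k) for j in range(i + 1, k)]
    cols += [X[:, i] ** 2 for i in range(k)]
    return np.column_stack(cols)

def fit_rsm(X, y):
    """Ordinary least squares fit of the quadratic surface; returns
    the coefficient vector and the model R^2."""
    A = quad_features(X)
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ beta
    r2 = 1 - float(resid @ resid) / float(((y - y.mean()) ** 2).sum())
    return beta, r2

# noise-free demo: the fit recovers a known quadratic almost exactly
rng = np.random.default_rng(0)
X = rng.uniform(-1, 1, size=(30, 3))
y = 2 + X[:, 0] - 3 * X[:, 1] + 0.5 * X[:, 0] * X[:, 1] + X[:, 2] ** 2
beta, r2 = fit_rsm(X, y)
print(round(r2, 3))  # 1.0
```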

Key Findings

For throughput, the most influential factors were tasks per node (34.4%), mem per task (34.0%), and nodes (31.7%). The best observed value was 261.62 jobs/h (at nodes = 64, tasks per node = 48, mem per task = 1).

For efficiency, the most influential factors were tasks per node (40.1%), nodes (37.1%), and mem per task (22.8%). The best observed value was 86.93% (at nodes = 34, tasks per node = 28, mem per task = 4.5).


Experimental Setup

Factors

Factor          Levels  Type        Unit
nodes           4, 64   continuous  count
tasks_per_node  8, 48   continuous  count
mem_per_task    1, 8    continuous  GB

Fixed: none

Responses

Response    Direction   Unit
throughput  ↑ maximize  jobs/h
efficiency  ↑ maximize  %

Experimental Matrix

The Central Composite Design produces 22 runs. Each row is one experiment with specific factor settings.

Run  nodes     tasks_per_node  mem_per_task
1    34        28              4.5
2    64        8               8
3    4         48              1
4    34        64.5148         4.5
5    34        28              4.5
6    -20.7723  28              4.5
7    34        28              -1.8901
8    34        28              4.5
9    64        48              1
10   88.7723   28              4.5
11   34        28              4.5
12   34        -8.5148         4.5
13   34        28              4.5
14   4         8               8
15   34        28              4.5
16   64        8               1
17   34        28              10.8901
18   64        48              8
19   34        28              4.5
20   4         8               1
21   4         48              8
22   34        28              4.5

How to Run

terminal
$ doe info --config use_cases/12_job_scheduler_packing/config.json
$ doe generate --config use_cases/12_job_scheduler_packing/config.json --output results/run.sh --seed 42
$ bash results/run.sh
$ doe analyze --config use_cases/12_job_scheduler_packing/config.json
$ doe optimize --config use_cases/12_job_scheduler_packing/config.json
$ doe report --config use_cases/12_job_scheduler_packing/config.json --output report.html

Analysis Results

Generated from actual experiment runs.

Response: throughput

Pareto Chart

Pareto chart for throughput

Main Effects Plot

Main effects plot for throughput

Response: efficiency

Pareto Chart

Pareto chart for efficiency

Main Effects Plot

Main effects plot for efficiency

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.
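For readers reproducing plots like these, a minimal sketch of how a fitted quadratic surface is evaluated over two factors in coded units, with the remaining factor held at its center. The coefficients below are placeholders for illustration, not the fitted values from this report:

```python
import numpy as np

# placeholder quadratic coefficients (b0, b1, b2, b12, b11, b22);
# NOT the fitted values from this report
b = dict(b0=80.0, b1=20.0, b2=12.0, b12=8.0, b11=-5.0, b22=-3.0)

def surface(x1, x2, b):
    """Predicted response over two coded factors, others held at center (0)."""
    return (b["b0"] + b["b1"] * x1 + b["b2"] * x2
            + b["b12"] * x1 * x2 + b["b11"] * x1 ** 2 + b["b22"] * x2 ** 2)

# 50x50 grid over the coded factorial region [-1, 1]
x1, x2 = np.meshgrid(np.linspace(-1, 1, 50), np.linspace(-1, 1, 50))
z = surface(x1, x2, b)
print(z.shape)  # (50, 50); pass x1, x2, z to e.g. matplotlib's plot_surface
```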


How to Read These Surfaces

Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.

  • Flat surface — these two factors have little effect on the response.
  • Tilted plane — strong linear effect; moving along one axis consistently changes the response.
  • Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
  • Saddle shape — significant interaction; the best setting of one factor depends on the other.
  • Red dots far from surface — poor model fit in that region; be cautious about predictions there.

throughput (jobs/h) — R² = 0.653, Adj R² = 0.393
Moderate fit — surface shows general trends but some noise remains.
Curvature detected in tasks_per_node, nodes — look for a peak or valley in the surface.
Strongest linear driver: nodes (decreases throughput).
Notable interaction: nodes × tasks_per_node — the effect of one depends on the level of the other. Look for a twisted surface.

efficiency (%) — R² = 0.785, Adj R² = 0.624
Moderate fit — surface shows general trends but some noise remains.
Curvature detected in tasks_per_node, mem_per_task — look for a peak or valley in the surface.
Strongest linear driver: nodes (increases efficiency).
Notable interaction: nodes × tasks_per_node — the effect of one depends on the level of the other. Look for a twisted surface.
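The Adj R² values quoted above can be checked with the standard adjustment formula, assuming n = 22 runs and p = 9 quadratic-model terms (3 linear, 3 interaction, 3 squared):

```python
def adjusted_r2(r2, n, p):
    """Penalize R^2 for model size: p predictors (excluding the
    intercept) fitted on n observations."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)

# throughput: R^2 = 0.653 over n = 22 runs, p = 9 quadratic terms
print(round(adjusted_r2(0.653, 22, 9), 3))  # 0.393
# efficiency: R^2 = 0.785
print(round(adjusted_r2(0.785, 22, 9), 3))  # 0.624
```

Both match the reported Adj R² values, which supports the n = 22, p = 9 assumption.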

efficiency: nodes vs mem per task

RSM surface: efficiency — nodes vs mem per task

efficiency: nodes vs tasks per node

RSM surface: efficiency — nodes vs tasks per node

efficiency: tasks per node vs mem per task

RSM surface: efficiency — tasks per node vs mem per task

throughput: nodes vs mem per task

RSM surface: throughput — nodes vs mem per task

throughput: nodes vs tasks per node

RSM surface: throughput — nodes vs tasks per node

throughput: tasks per node vs mem per task

RSM surface: throughput — tasks per node vs mem per task

Full Analysis Output

doe analyze
=== Main Effects: throughput ===
Factor          Effect     Std Error  % Contribution
--------------------------------------------------------------
tasks_per_node  254.1600   14.8504    55.2%
nodes           120.8200   14.8504    26.2%
mem_per_task     85.4775   14.8504    18.6%

=== Summary Statistics: throughput ===
nodes:
Level      N    Mean      Std      Min       Max
------------------------------------------------------------
-20.7723   1      5.0000   0.0000    5.0000    5.0000
34        12     85.3292  78.5758    7.4600  261.6200
4          4     91.3550  82.6748    7.7300  182.2200
64         4    109.5325  32.5750   60.6700  125.8200
88.7723    1    125.8200   0.0000  125.8200  125.8200

tasks_per_node:
Level      N    Mean      Std      Min       Max
------------------------------------------------------------
-8.51484   1      7.4600   0.0000    7.4600    7.4600
28        12     73.8075  58.2104    5.0000  137.1600
48         4    117.6150  60.1993   36.6000  182.2200
64.5148    1    261.6200   0.0000  261.6200  261.6200
8          4     83.2725  60.8794    7.7300  138.8700

mem_per_task:
Level      N    Mean      Std      Min       Max
------------------------------------------------------------
-1.8901    1     62.3800   0.0000   62.3800   62.3800
1          4     57.7050  50.3036    7.7300  125.8200
10.8901    1    125.8200   0.0000  125.8200  125.8200
4.5       12     80.5475  81.7799    5.0000  261.6200
8          4    143.1825  26.7422  125.8200  182.2200

=== Main Effects: efficiency ===
Factor          Effect    Std Error  % Contribution
--------------------------------------------------------------
mem_per_task    24.5550   2.6532     35.0%
tasks_per_node  23.0325   2.6532     32.9%
nodes           22.4800   2.6532     32.1%

=== Summary Statistics: efficiency ===
nodes:
Level      N    Mean     Std      Min      Max
------------------------------------------------------------
-20.7723   1    49.3900   0.0000  49.3900  49.3900
34        12    67.2883  12.6911  47.3400  86.9300
4          4    71.3800  14.7700  56.8400  84.8400
64         4    66.0150  11.7100  48.4500  71.8700
88.7723    1    71.8700   0.0000  71.8700  71.8700

tasks_per_node:
Level      N    Mean     Std      Min      Max
------------------------------------------------------------
-8.51484   1    67.0100   0.0000  67.0100  67.0100
28        12    67.8133  12.5615  47.3400  86.9300
48         4    70.9825  10.8711  56.8400  83.3500
64.5148    1    47.9500   0.0000  47.9500  47.9500
8          4    66.4125  15.5680  48.4500  84.8400

mem_per_task:
Level      N    Mean     Std      Min      Max
------------------------------------------------------------
-1.8901    1    47.3400   0.0000  47.3400  47.3400
1          4    65.5000  16.1277  48.4500  84.8400
10.8901    1    71.8700   0.0000  71.8700  71.8700
4.5       12    67.4592  12.4089  47.9500  86.9300
8          4    71.8950   9.3326  60.4900  83.3500

Optimization Recommendations

doe optimize
=== Optimization: throughput ===
Direction: maximize
Best observed run: #10
  nodes = 88.7723
  tasks_per_node = 28
  mem_per_task = 4.5
  Value: 261.62
RSM Model (linear, R² = 0.09):
  Coefficients:
    intercept: +89.0145
    nodes: +21.1838
    tasks_per_node: +12.8323
    mem_per_task: +3.8512
Predicted optimum:
  nodes = 88.7723
  tasks_per_node = 28
  mem_per_task = 4.5
  Predicted value: 127.6907
Factor importance:
  1. nodes (effect: 221.1, contribution: 55.2%)
  2. mem_per_task (effect: 93.3, contribution: 23.3%)
  3. tasks_per_node (effect: 86.2, contribution: 21.5%)

=== Optimization: efficiency ===
Direction: maximize
Best observed run: #3
  nodes = 4
  tasks_per_node = 48
  mem_per_task = 1
  Value: 86.93
RSM Model (linear, R² = 0.17):
  Coefficients:
    intercept: +67.1955
    nodes: -5.7499
    tasks_per_node: +1.4127
    mem_per_task: -1.6230
Predicted optimum:
  nodes = -20.7723
  tasks_per_node = 28
  mem_per_task = 4.5
  Predicted value: 77.6933
Factor importance:
  1. nodes (effect: 25.3, contribution: 41.0%)
  2. tasks_per_node (effect: 23.4, contribution: 37.9%)
  3. mem_per_task (effect: 13.1, contribution: 21.1%)

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
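The combination step can be sketched as follows. The per-response desirabilities and weights are the values reported in this section; the weighted geometric mean reproduces the reported overall D:

```python
import math

def overall_desirability(d, w):
    """Weighted geometric mean of per-response desirabilities:
    D = (prod d_i^w_i) ** (1 / sum(w))."""
    return math.prod(di ** wi for di, wi in zip(d, w)) ** (1 / sum(w))

# from this report: throughput d = 0.6733 (weight 1.5),
# efficiency d = 0.8723 (weight 1.0)
D = overall_desirability([0.6733, 0.8723], [1.5, 1.0])
print(round(D, 4))  # 0.7468
```

Because the geometric mean goes to zero if any single desirability is zero, a candidate that completely fails one response can never win, regardless of how well it does on the others.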

Overall Desirability
D = 0.7468

Per-Response Desirability

Response    Weight  Desirability  Predicted      Dir
throughput  1.5     0.6733        182.22 jobs/h  ↑
efficiency  1.0     0.8723        83.35 %        ↑

Recommended Settings

Factor          Value
nodes           4 count
tasks_per_node  48 count
mem_per_task    8 GB

Source: from observed run #7

Trade-off Summary

Sacrifice = how much worse the compromise setting performs than the single-objective best.

Response    Predicted  Best Observed  Sacrifice
throughput  182.22     261.62         +79.40
efficiency  83.35      86.93          +3.58

Top 3 Runs by Desirability

Run  D       Factor Settings
#7   0.7468  nodes=4, tasks_per_node=48, mem_per_task=8
#1   0.5235  nodes=64, tasks_per_node=8, mem_per_task=8
#5   0.5235  nodes=88.7723, tasks_per_node=28, mem_per_task=4.5

Model Quality

Response    R²      Type
throughput  0.1197  linear
efficiency  0.0177  linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.7468

Response     Weight  Desirability  Predicted      Direction
---------------------------------------------------------------------
throughput   1.5     0.6733        182.22 jobs/h  ↑
efficiency   1.0     0.8723        83.35 %        ↑

Recommended settings:
  nodes = 4 count
  tasks_per_node = 48 count
  mem_per_task = 8 GB
  (from observed run #7)

Trade-off summary:
  throughput: 182.22 (best observed: 261.62, sacrifice: +79.40)
  efficiency: 83.35 (best observed: 86.93, sacrifice: +3.58)

Model quality:
  throughput: R² = 0.1197 (linear)
  efficiency: R² = 0.0177 (linear)

Top 3 observed runs by overall desirability:
  1. Run #7 (D=0.7468): nodes=4, tasks_per_node=48, mem_per_task=8
  2. Run #1 (D=0.5235): nodes=64, tasks_per_node=8, mem_per_task=8
  3. Run #5 (D=0.5235): nodes=88.7723, tasks_per_node=28, mem_per_task=4.5