NUMA Memory Placement — DOE Use Case

Summary

This experiment investigates numa memory placement. Full factorial design to optimize memory placement and thread binding on a dual-socket NUMA system.

The design varies 3 factors: mem policy, ranging from local to interleave, thread bind, ranging from close to spread, and hugepages, ranging from off to 2M. The goal is to optimize 2 responses: stream triad (GB/s) (maximize) and latency ns (ns) (minimize). Fixed conditions held constant across all runs include sockets = 2, cores per socket = 32.

A full factorial design was used to explore all 8 possible combinations of the 3 factors at two levels. This guarantees that every main effect and interaction can be estimated independently, at the cost of a larger experiment (8 runs).

Key Findings

For stream triad, the most influential factors were thread bind (49.7%), mem policy (26.1%), hugepages (24.2%). The best observed value was 293.9 (at mem policy = interleave, thread bind = spread, hugepages = off).

For latency ns, the most influential factors were thread bind (45.6%), mem policy (31.6%), hugepages (22.8%). The best observed value was 34.0 (at mem policy = local, thread bind = spread, hugepages = 2M).

Recommended Next Steps

Consider whether any fixed factors should be varied in a future study.

Experimental Setup

Factors

Factor	Levels	Type
`mem_policy`	local, interleave	categorical
`thread_bind`	close, spread	categorical
`hugepages`	off, 2M	categorical

Fixed: sockets=2, cores_per_socket=32

Responses

Response	Direction	Unit
`stream_triad`	↑ maximize	GB/s
`latency_ns`	↓ minimize	ns

Experimental Matrix

The Full Factorial Design produces 8 runs. Each row is one experiment with specific factor settings.

Run	`mem_policy`	`thread_bind`	`hugepages`
1	local	spread	2M
2	interleave	close	off
3	interleave	spread	off
4	interleave	spread	2M
5	local	spread	off
6	interleave	close	2M
7	local	close	off
8	local	close	2M

How to Run

terminal
$ doe info --config use_cases/10_numa_memory_placement/config.json
$ doe generate --config use_cases/10_numa_memory_placement/config.json --output results/run.sh --seed 42
$ bash results/run.sh
$ doe analyze --config use_cases/10_numa_memory_placement/config.json
$ doe optimize --config use_cases/10_numa_memory_placement/config.json
$ doe optimize --config use_cases/10_numa_memory_placement/config.json --multi  # multi-objective
$ doe report --config use_cases/10_numa_memory_placement/config.json --output report.html

Analysis Results

Generated from actual experiment runs.

Response: stream_triad

Pareto Chart

Main Effects Plot

Response: latency_ns

Pareto Chart

Main Effects Plot

Full Analysis Output

doe analyze
=== Main Effects: stream_triad ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
mem_policy              35.7750      11.1421            51.4%
thread_bind             26.2250      11.1421            37.7%
hugepages               -7.5750      11.1421            10.9%

=== Interaction Effects: stream_triad ===
Factor A             Factor B              Interaction   % Contribution
------------------------------------------------------------------------
mem_policy           thread_bind               22.0250            54.5%
mem_policy           hugepages                  9.4250            23.3%
thread_bind          hugepages                  8.9750            22.2%

=== Summary Statistics: stream_triad ===

mem_policy:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  interleave          4   225.3000    23.7380   196.1000   250.3000
  local               4   261.0750    30.0114   226.4000   293.9000

thread_bind:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  close               4   230.0750    25.0366   196.1000   250.3000
  spread              4   256.3000    35.1010   217.3000   293.9000

hugepages:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  2M                  4   246.9750    34.2391   217.3000   293.9000
  off                 4   239.4000    33.2692   196.1000   276.5000

=== Main Effects: latency_ns ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
mem_policy              14.0250       5.1510            51.8%
thread_bind              8.5250       5.1510            31.5%
hugepages                4.5250       5.1510            16.7%

=== Interaction Effects: latency_ns ===
Factor A             Factor B              Interaction   % Contribution
------------------------------------------------------------------------
thread_bind          hugepages                 16.5250            53.8%
mem_policy           thread_bind                9.2250            30.0%
mem_policy           hugepages                 -4.9750            16.2%

=== Summary Statistics: latency_ns ===

mem_policy:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  interleave          4    51.5500    15.3921    34.0000    68.4000
  local               4    65.5750    11.2796    52.4000    78.3000

thread_bind:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  close               4    54.3000     7.7158    44.2000    61.0000
  spread              4    62.8250    19.6798    34.0000    78.3000

hugepages:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  2M                  4    56.3000    15.6499    34.0000    70.6000
  off                 4    60.8250    15.3854    44.2000    78.3000

Optimization Recommendations

doe optimize
=== Optimization: stream_triad ===
Direction: maximize

Best observed run: #4
  mem_policy = interleave
  thread_bind = spread
  hugepages = 2M
  Value: 293.9

RSM Model (linear, R² = 0.20):
  Coefficients:
    intercept:  +243.1875
    mem_policy:  -0.5625
    thread_bind:  +11.3375
    hugepages:  -6.5625
  Predicted optimum:
    mem_policy = interleave
    thread_bind = spread
    hugepages = 2M
    Predicted value: 261.6500

Factor importance:
  1. thread_bind  (effect: 22.7, contribution: 61.4%)
  2. hugepages  (effect: -13.1, contribution: 35.5%)
  3. mem_policy  (effect: -1.1, contribution: 3.0%)

=== Optimization: latency_ns ===
Direction: minimize

Best observed run: #8
  mem_policy = local
  thread_bind = close
  hugepages = 2M
  Value: 34.0

RSM Model (linear, R² = 0.03):
  Coefficients:
    intercept:  +58.5625
    mem_policy:  -0.3375
    thread_bind:  +2.3375
    hugepages:  +0.4125
  Predicted optimum:
    mem_policy = interleave
    thread_bind = spread
    hugepages = off
    Predicted value: 61.6500

Factor importance:
  1. thread_bind  (effect: 4.7, contribution: 75.7%)
  2. hugepages  (effect: 0.8, contribution: 13.4%)
  3. mem_policy  (effect: -0.7, contribution: 10.9%)

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.

Overall Desirability

D = 0.5441

Per-Response Desirability

Response	Weight	Desirability	Predicted	Dir
`stream_triad`	1.5	0.5232	247.50 0.5232 247.50 GB/s	↑
`latency_ns`	1.0	0.5770	52.40 0.5770 52.40 ns	↓

Recommended Settings

Factor	Value
`mem_policy`	local
`thread_bind`	spread
`hugepages`	2M

Source: from observed run #1

Trade-off Summary

Sacrifice = how much worse than single-objective best.

Response	Predicted	Best Observed	Sacrifice
`latency_ns`	52.40	34.00	+18.40

Top 3 Runs by Desirability

Run	D	Factor Settings
#4	0.5144	mem_policy=interleave, thread_bind=close, hugepages=off
#6	0.4977	mem_policy=local, thread_bind=close, hugepages=off

Model Quality

Response	R²	Type
`latency_ns`	0.1848	linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.5441

Response                  Weight Desirability    Predicted  Direction
---------------------------------------------------------------------
stream_triad                 1.5       0.5232      247.50 GB/s   ↑
latency_ns                   1.0       0.5770       52.40 ns   ↓

Recommended settings:
  mem_policy = local
  thread_bind = spread
  hugepages = 2M
  (from observed run #1)

Trade-off summary:
  stream_triad: 247.50 (best observed: 293.90, sacrifice: +46.40)
  latency_ns: 52.40 (best observed: 34.00, sacrifice: +18.40)

Model quality:
  stream_triad: R² = 0.0475 (linear)
  latency_ns: R² = 0.1848 (linear)

Top 3 observed runs by overall desirability:
  1. Run #1 (D=0.5441): mem_policy=local, thread_bind=spread, hugepages=2M
  2. Run #4 (D=0.5144): mem_policy=interleave, thread_bind=close, hugepages=off
  3. Run #6 (D=0.4977): mem_policy=local, thread_bind=close, hugepages=off