Latin Hypercube

Cache Blocking Strategy

Explore DGEMM tile sizes and prefetch distances for peak cache performance.

Summary

This experiment tunes a cache blocking strategy: it searches for the tile sizes and prefetch distance that give the best DGEMM performance on a 4096x4096 matrix.

The design varies four factors: block i, block j, and block k (tile sizes in elements, each ranging from 16 to 256) and prefetch dist (prefetch distance in iterations, ranging from 1 to 8). Two responses are optimized: gflops (GFLOPS, maximize) and cache miss rate (%, also configured as maximize, although a lower miss rate is normally the goal when tuning cache blocking).
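To make the three tile-size factors concrete, here is a minimal sketch of a blocked matrix multiply. The function name `blocked_matmul` and the loop order are illustrative assumptions, not the kernel benchmarked in this use case, and the prefetch distance is not modeled because pure Python has no prefetch hints.

```python
def blocked_matmul(a, b, block_i, block_j, block_k):
    """Multiply a (n x m) by b (m x p), visiting the matrices one tile at a time."""
    n, m, p = len(a), len(b), len(b[0])
    c = [[0.0] * p for _ in range(n)]
    # The three outer loops step over tiles; the three inner loops stay inside
    # one tile, so the data they touch is bounded by the tile sizes.
    for i0 in range(0, n, block_i):
        for k0 in range(0, m, block_k):
            for j0 in range(0, p, block_j):
                for i in range(i0, min(i0 + block_i, n)):
                    for k in range(k0, min(k0 + block_k, m)):
                        a_ik = a[i][k]
                        for j in range(j0, min(j0 + block_j, p)):
                            c[i][j] += a_ik * b[k][j]
    return c


# Small integer-valued matrices so the tiled and naive results match exactly.
a = [[float(i + j) for j in range(6)] for i in range(5)]
b = [[float(i * j + 1) for j in range(4)] for i in range(6)]
naive = [[sum(a[i][k] * b[k][j] for k in range(6)) for j in range(4)] for i in range(5)]
tiled = blocked_matmul(a, b, block_i=2, block_j=3, block_k=4)
```

The tile sizes change only the order in which elements are visited, not the result, which is why they can be tuned freely for cache behavior.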

Latin Hypercube Sampling was used to space 25 runs across the 4-dimensional factor space with good coverage and minimal gaps, making it ideal for computer experiments where the response surface may be complex.
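The stratification idea behind Latin Hypercube Sampling can be sketched in a few lines. This is an illustrative standard-library implementation under the factor ranges given above; the function name `latin_hypercube` is hypothetical and the sampler the `doe` tool uses internally may differ, so the points will not reproduce the matrix shown on this page.

```python
import random


def latin_hypercube(n_runs, bounds, seed=None):
    """Return n_runs points with exactly one point per stratum in every dimension."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        width = (high - low) / n_runs
        strata = list(range(n_runs))
        rng.shuffle(strata)  # independent random pairing of strata across dimensions
        # One uniform draw inside each stratum of this dimension.
        columns.append([low + (s + rng.random()) * width for s in strata])
    return [list(row) for row in zip(*columns)]


# block_i, block_j, block_k in elements; prefetch_dist in iterations.
bounds = [(16, 256), (16, 256), (16, 256), (1, 8)]
design = latin_hypercube(25, bounds, seed=42)
```

Each factor's range is cut into 25 equal strata and every stratum is sampled exactly once, which is what gives the design its even one-dimensional coverage.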

Quadratic response surface models were fitted to capture potential curvature and factor interactions. The RSM surface plots below visualize how pairs of factors jointly affect each response.
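As a rough illustration of what a quadratic model in four factors involves, the sketch below enumerates its terms. It assumes a full second-order model (intercept, linear, squared, and two-factor interaction terms); whether the tool fits exactly this set is an assumption.

```python
from itertools import combinations

factors = ["block_i", "block_j", "block_k", "prefetch_dist"]


def quadratic_terms(names):
    """List the terms of a full second-order polynomial in the given factors."""
    linear = list(names)
    squares = [f"{n}^2" for n in names]
    interactions = [f"{a}*{b}" for a, b in combinations(names, 2)]
    return ["intercept"] + linear + squares + interactions


terms = quadratic_terms(factors)
n_runs = 25
residual_df = n_runs - len(terms)  # degrees of freedom left to estimate error
print(len(terms), residual_df)  # 15 10
```

With 15 coefficients estimated from 25 runs, only 10 degrees of freedom remain for error, so the fitted surfaces should be read as indicative trends.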

Key Findings

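For gflops, the main-effects analysis assigns an identical 25.0% contribution to all four factors (block i, block j, block k, and prefetch dist), so it does not single out a dominant factor. The best observed value was 52.02 GFLOPS (at block i = 37.0728, block j = 93.3655, block k = 57.6266).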

For cache miss rate, the main-effects analysis likewise assigns 25.0% to each of the four factors. The best observed value under the configured maximize direction was 9.28% (at block i = 125.8, block j = 23.4436, block k = 84.1187).

Recommended Next Steps

Experimental Setup

Factors

Factor          Levels    Type        Unit
------------------------------------------------
block_i         16, 256   continuous  elements
block_j         16, 256   continuous  elements
block_k         16, 256   continuous  elements
prefetch_dist   1, 8      continuous  iterations

Fixed: none

Responses

Response          Direction    Unit
-------------------------------------
gflops            ↑ maximize   GFLOPS
cache_miss_rate   ↑ maximize   %

Experimental Matrix

The Latin Hypercube Design produces 25 runs. Each row is one experiment with specific factor settings.

Run   block_i    block_j    block_k    prefetch_dist
----------------------------------------------------
1     125.565    93.9219    246.99     6.14317
2     253.366    135.373    109.113    1.58733
3     191.13     72.442     83.4999    4.06585
4     205.217    91.2761    229.287    1.99472
5     23.3792    220.391    37.4516    2.3045
6     31.3417    33.1503    217.371    5.67698
7     119.712    47.9475    147.486    1.08597
8     95.3808    63.2384    72.2209    2.57363
9     137.114    205.738    116.808    7.94096
10    65.4981    17.0158    122.866    3.73912
11    42.3318    180.021    196.229    7.65567
12    148.041    105.512    30.5067    6.50236
13    86.1808    244.233    225.707    4.9552
14    236.364    215.429    160.454    7.22458
15    83.0051    234.301    183.575    7.07721
16    179.17     152.993    155.23     2.774
17    211.767    247.23     101.146    6.73868
18    45.9894    41.2622    53.5912    3.29855
19    107.477    179.089    205.576    3.01821
20    160.35     127.63     244.219    4.2807
21    238.162    121.006    57.7357    5.82436
22    187.837    194.161    82.1465    4.84304
23    219.758    74.6757    134.286    4.61224
24    152.99     166.884    20.5865    1.38158
25    55.4892    142.126    178.238    5.28402

How to Run

terminal
$ doe info --config use_cases/14_cache_blocking/config.json
$ doe generate --config use_cases/14_cache_blocking/config.json --output results/run.sh --seed 42
$ bash results/run.sh
$ doe analyze --config use_cases/14_cache_blocking/config.json
$ doe optimize --config use_cases/14_cache_blocking/config.json
$ doe report --config use_cases/14_cache_blocking/config.json --output report.html

Analysis Results

Generated from actual experiment runs.

Response: gflops

Pareto Chart

Pareto chart for gflops

Main Effects Plot

Main effects plot for gflops

Response: cache_miss_rate

Pareto Chart

Pareto chart for cache_miss_rate

Main Effects Plot

Main effects plot for cache_miss_rate

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.


How to Read These Surfaces

Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.

  • Flat surface — these two factors have little effect on the response.
  • Tilted plane — strong linear effect; moving along one axis consistently changes the response.
  • Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
  • Saddle shape — significant interaction; the best setting of one factor depends on the other.
  • Red dots far from surface — poor model fit in that region; be cautious about predictions there.
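The shapes in the list above can also be read off the second-order coefficients of a two-factor slice. The sketch below is illustrative only: it assumes a slice of the form z = b0 + b1*x + b2*y + b11*x^2 + b22*y^2 + b12*x*y, and the function name and labels are hypothetical, not output of the `doe` tool.

```python
def classify_surface(b11, b22, b12, tol=1e-12):
    """Classify the shape of a two-factor quadratic slice from its second-order terms."""
    if abs(b11) < tol and abs(b22) < tol and abs(b12) < tol:
        return "plane"  # no curvature: flat or tilted, depending on b1 and b2
    # Determinant of the Hessian [[2*b11, b12], [b12, 2*b22]].
    det = 4.0 * b11 * b22 - b12 * b12
    if det < -tol:
        return "saddle"  # curvature of opposite sign along two directions
    if det > tol:
        return "dome" if b11 < 0 else "bowl"  # interior maximum or minimum
    return "ridge"  # curved along one direction only


print(classify_surface(-1.0, -1.0, 0.5))  # dome
print(classify_surface(0.2, -0.3, 0.1))   # saddle
print(classify_surface(0.0, 0.0, 1.0))    # saddle
```

The last call shows why a strong interaction term alone produces the twisted, saddle-like surface described above.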

gflops (GFLOPS) — R² = 0.656, Adj R² = 0.175
Moderate fit — surface shows general trends but some noise remains.
Curvature detected in prefetch_dist, block_i — look for a peak or valley in the surface.
Strongest linear driver: block_i (increases gflops).
Notable interaction: block_i × block_j — the effect of one depends on the level of the other. Look for a twisted surface.

cache_miss_rate (%) — R² = 0.446, Adj R² = -0.329
Weak fit — interpret the surface shape with caution.
Curvature detected in block_k, block_i — look for a peak or valley in the surface.
Strongest linear driver: block_j (increases cache_miss_rate).
Notable interaction: block_i × block_j — the effect of one depends on the level of the other. Look for a twisted surface.
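The gap between R² and adjusted R² reported above follows from the standard adjustment for model size. The sketch below assumes 25 runs and 14 predictors (a full quadratic model in four factors, excluding the intercept); the predictor count is an assumption, not something the output states.

```python
def adjusted_r2(r2, n_runs, n_predictors):
    """Standard adjusted R^2: penalizes R^2 for the number of fitted predictors."""
    return 1.0 - (1.0 - r2) * (n_runs - 1) / (n_runs - n_predictors - 1)


gflops_adj = adjusted_r2(0.656, 25, 14)
miss_adj = adjusted_r2(0.446, 25, 14)
print(round(gflops_adj, 3), round(miss_adj, 3))
```

Under these assumptions the results come out at about 0.174 and -0.330, close to the reported 0.175 and -0.329; the small difference is consistent with the R² values being rounded to three decimals. A negative adjusted R² means the model explains less than would be expected from fitting that many terms to noise.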

RSM surface plots for cache miss rate (one per factor pair):

  • block i vs block j
  • block i vs block k
  • block i vs prefetch dist
  • block j vs block k
  • block j vs prefetch dist
  • block k vs prefetch dist

RSM surface plots for gflops (one per factor pair):

  • block i vs block j
  • block i vs block k
  • block i vs prefetch dist
  • block j vs block k
  • block j vs prefetch dist
  • block k vs prefetch dist

Full Analysis Output

doe analyze
=== Main Effects: gflops ===
Factor          Effect    Std Error   % Contribution
--------------------------------------------------------------
block_i         44.0200   3.0082      25.0%
block_j         44.0200   3.0082      25.0%
block_k         44.0200   3.0082      25.0%
prefetch_dist   44.0200   3.0082      25.0%

=== Summary Statistics: gflops ===
block_i:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
100.441    1   8.0000    0.0000   8.0000    8.0000
108.716    1   8.0000    0.0000   8.0000    8.0000
117.135    1   8.0000    0.0000   8.0000    8.0000
122.48     1   8.0000    0.0000   8.0000    8.0000
135.282    1   8.0000    0.0000   8.0000    8.0000
146.065    1   46.2000   0.0000   46.2000   46.2000
155.04     1   8.0000    0.0000   8.0000    8.0000
162.672    1   22.7500   0.0000   22.7500   22.7500
176.471    1   22.7700   0.0000   22.7700   22.7700
185.381    1   44.3100   0.0000   44.3100   44.3100
196.569    1   8.0000    0.0000   8.0000    8.0000
205.77     1   14.6600   0.0000   14.6600   14.6600
21.5346    1   8.0000    0.0000   8.0000    8.0000
214.607    1   33.6200   0.0000   33.6200   33.6200
226.543    1   35.5400   0.0000   35.5400   35.5400
233.123    1   8.0000    0.0000   8.0000    8.0000
237.44     1   8.0000    0.0000   8.0000    8.0000
255.346    1   10.5900   0.0000   10.5900   10.5900
35.0991    1   30.3000   0.0000   30.3000   30.3000
38.9592    1   45.8800   0.0000   45.8800   45.8800
49.3449    1   52.0200   0.0000   52.0200   52.0200
59.482     1   28.0500   0.0000   28.0500   28.0500
64.5572    1   8.0000    0.0000   8.0000    8.0000
79.3272    1   21.2100   0.0000   21.2100   21.2100
89.2871    1   33.2500   0.0000   33.2500   33.2500

block_j:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
104.313    1   45.8800   0.0000   45.8800   45.8800
118.972    1   30.3000   0.0000   30.3000   30.3000
122.525    1   44.3100   0.0000   44.3100   44.3100
131.734    1   14.6600   0.0000   14.6600   14.6600
146.999    1   8.0000    0.0000   8.0000    8.0000
159.9      1   8.0000    0.0000   8.0000    8.0000
165.97     1   33.2500   0.0000   33.2500   33.2500
173.031    1   8.0000    0.0000   8.0000    8.0000
185.565    1   35.5400   0.0000   35.5400   35.5400
193.468    1   10.5900   0.0000   10.5900   10.5900
205.02     1   8.0000    0.0000   8.0000    8.0000
21.2032    1   8.0000    0.0000   8.0000    8.0000
217.343    1   21.2100   0.0000   21.2100   21.2100
222.59     1   33.6200   0.0000   33.6200   33.6200
236.066    1   8.0000    0.0000   8.0000    8.0000
237.966    1   46.2000   0.0000   46.2000   46.2000
254.692    1   22.7700   0.0000   22.7700   22.7700
33.3359    1   8.0000    0.0000   8.0000    8.0000
38.2403    1   22.7500   0.0000   22.7500   22.7500
53.8312    1   8.0000    0.0000   8.0000    8.0000
63.5083    1   52.0200   0.0000   52.0200   52.0200
64.0667    1   28.0500   0.0000   28.0500   28.0500
75.2314    1   8.0000    0.0000   8.0000    8.0000
91.2207    1   8.0000    0.0000   8.0000    8.0000
98.0826    1   8.0000    0.0000   8.0000    8.0000

block_k:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
101.38     1   21.2100   0.0000   21.2100   21.2100
109.289    1   8.0000    0.0000   8.0000    8.0000
118.737    1   22.7700   0.0000   22.7700   22.7700
123.817    1   28.0500   0.0000   28.0500   28.0500
135.075    1   8.0000    0.0000   8.0000    8.0000
146.78     1   35.5400   0.0000   35.5400   35.5400
157.73     1   8.0000    0.0000   8.0000    8.0000
167.406    1   46.2000   0.0000   46.2000   46.2000
175.099    1   45.8800   0.0000   45.8800   45.8800
185.835    1   8.0000    0.0000   8.0000    8.0000
195.306    1   8.0000    0.0000   8.0000    8.0000
205.069    1   33.2500   0.0000   33.2500   33.2500
209.282    1   30.3000   0.0000   30.3000   30.3000
219.528    1   14.6600   0.0000   14.6600   14.6600
227.546    1   8.0000    0.0000   8.0000    8.0000
24.2475    1   8.0000    0.0000   8.0000    8.0000
241.84     1   8.0000    0.0000   8.0000    8.0000
246.554    1   10.5900   0.0000   10.5900   10.5900
27.8374    1   22.7500   0.0000   22.7500   22.7500
38.5433    1   8.0000    0.0000   8.0000    8.0000
52.3275    1   33.6200   0.0000   33.6200   33.6200
62.6205    1   8.0000    0.0000   8.0000    8.0000
71.1996    1   8.0000    0.0000   8.0000    8.0000
82.1852    1   44.3100   0.0000   44.3100   44.3100
84.3081    1   52.0200   0.0000   52.0200   52.0200

prefetch_dist:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
1.07742    1   8.0000    0.0000   8.0000    8.0000
1.50185    1   10.5900   0.0000   10.5900   10.5900
1.58273    1   45.8800   0.0000   45.8800   45.8800
1.86147    1   22.7500   0.0000   22.7500   22.7500
2.36858    1   8.0000    0.0000   8.0000    8.0000
2.59996    1   46.2000   0.0000   46.2000   46.2000
2.81946    1   30.3000   0.0000   30.3000   30.3000
3.23597    1   52.0200   0.0000   52.0200   52.0200
3.50345    1   44.3100   0.0000   44.3100   44.3100
3.70061    1   8.0000    0.0000   8.0000    8.0000
4.06586    1   8.0000    0.0000   8.0000    8.0000
4.11329    1   8.0000    0.0000   8.0000    8.0000
4.6244     1   14.6600   0.0000   14.6600   14.6600
4.74691    1   8.0000    0.0000   8.0000    8.0000
5.03241    1   8.0000    0.0000   8.0000    8.0000
5.43876    1   21.2100   0.0000   21.2100   21.2100
5.70389    1   8.0000    0.0000   8.0000    8.0000
5.86451    1   33.2500   0.0000   33.2500   33.2500
6.28784    1   33.6200   0.0000   33.6200   33.6200
6.52731    1   28.0500   0.0000   28.0500   28.0500
6.72222    1   8.0000    0.0000   8.0000    8.0000
6.93657    1   8.0000    0.0000   8.0000    8.0000
7.24071    1   35.5400   0.0000   35.5400   35.5400
7.69683    1   8.0000    0.0000   8.0000    8.0000
7.88867    1   22.7700   0.0000   22.7700   22.7700

=== Main Effects: cache_miss_rate ===
Factor          Effect    Std Error   % Contribution
--------------------------------------------------------------
block_i         6.3000    0.2999      25.0%
block_j         6.3000    0.2999      25.0%
block_k         6.3000    0.2999      25.0%
prefetch_dist   6.3000    0.2999      25.0%

=== Summary Statistics: cache_miss_rate ===
block_i:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
100.441    1   6.3800    0.0000   6.3800    6.3800
108.716    1   6.5300    0.0000   6.5300    6.5300
117.135    1   5.0800    0.0000   5.0800    5.0800
122.48     1   5.0800    0.0000   5.0800    5.0800
135.282    1   6.7100    0.0000   6.7100    6.7100
146.065    1   3.7600    0.0000   3.7600    3.7600
155.04     1   6.7600    0.0000   6.7600    6.7600
162.672    1   5.8500    0.0000   5.8500    5.8500
176.471    1   5.0600    0.0000   5.0600    5.0600
185.381    1   2.9800    0.0000   2.9800    2.9800
196.569    1   5.4300    0.0000   5.4300    5.4300
205.77     1   5.5400    0.0000   5.5400    5.5400
21.5346    1   7.3900    0.0000   7.3900    7.3900
214.607    1   5.0400    0.0000   5.0400    5.0400
226.543    1   4.3100    0.0000   4.3100    4.3100
233.123    1   9.2800    0.0000   9.2800    9.2800
237.44     1   7.8100    0.0000   7.8100    7.8100
255.346    1   3.8500    0.0000   3.8500    3.8500
35.0991    1   3.8500    0.0000   3.8500    3.8500
38.9592    1   3.8700    0.0000   3.8700    3.8700
49.3449    1   4.7700    0.0000   4.7700    4.7700
59.482     1   5.4500    0.0000   5.4500    5.4500
64.5572    1   7.3300    0.0000   7.3300    7.3300
79.3272    1   4.0900    0.0000   4.0900    4.0900
89.2871    1   5.0300    0.0000   5.0300    5.0300

block_j:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
104.313    1   3.8700    0.0000   3.8700    3.8700
118.972    1   3.8500    0.0000   3.8500    3.8500
122.525    1   2.9800    0.0000   2.9800    2.9800
131.734    1   5.5400    0.0000   5.5400    5.5400
146.999    1   5.4300    0.0000   5.4300    5.4300
159.9      1   5.0800    0.0000   5.0800    5.0800
165.97     1   5.0300    0.0000   5.0300    5.0300
173.031    1   7.3900    0.0000   7.3900    7.3900
185.565    1   4.3100    0.0000   4.3100    4.3100
193.468    1   3.8500    0.0000   3.8500    3.8500
205.02     1   6.7600    0.0000   6.7600    6.7600
21.2032    1   7.3300    0.0000   7.3300    7.3300
217.343    1   4.0900    0.0000   4.0900    4.0900
222.59     1   5.0400    0.0000   5.0400    5.0400
236.066    1   5.0800    0.0000   5.0800    5.0800
237.966    1   3.7600    0.0000   3.7600    3.7600
254.692    1   5.0600    0.0000   5.0600    5.0600
33.3359    1   7.8100    0.0000   7.8100    7.8100
38.2403    1   5.8500    0.0000   5.8500    5.8500
53.8312    1   6.7100    0.0000   6.7100    6.7100
63.5083    1   4.7700    0.0000   4.7700    4.7700
64.0667    1   5.4500    0.0000   5.4500    5.4500
75.2314    1   6.3800    0.0000   6.3800    6.3800
91.2207    1   9.2800    0.0000   9.2800    9.2800
98.0826    1   6.5300    0.0000   6.5300    6.5300

block_k:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
101.38     1   4.0900    0.0000   4.0900    4.0900
109.289    1   9.2800    0.0000   9.2800    9.2800
118.737    1   5.0600    0.0000   5.0600    5.0600
123.817    1   5.4500    0.0000   5.4500    5.4500
135.075    1   7.3900    0.0000   7.3900    7.3900
146.78     1   4.3100    0.0000   4.3100    4.3100
157.73     1   5.0800    0.0000   5.0800    5.0800
167.406    1   3.7600    0.0000   3.7600    3.7600
175.099    1   3.8700    0.0000   3.8700    3.8700
185.835    1   6.7600    0.0000   6.7600    6.7600
195.306    1   6.3800    0.0000   6.3800    6.3800
205.069    1   5.0300    0.0000   5.0300    5.0300
209.282    1   3.8500    0.0000   3.8500    3.8500
219.528    1   5.5400    0.0000   5.5400    5.5400
227.546    1   7.8100    0.0000   7.8100    7.8100
24.2475    1   5.0800    0.0000   5.0800    5.0800
241.84     1   6.7100    0.0000   6.7100    6.7100
246.554    1   3.8500    0.0000   3.8500    3.8500
27.8374    1   5.8500    0.0000   5.8500    5.8500
38.5433    1   5.4300    0.0000   5.4300    5.4300
52.3275    1   5.0400    0.0000   5.0400    5.0400
62.6205    1   6.5300    0.0000   6.5300    6.5300
71.1996    1   7.3300    0.0000   7.3300    7.3300
82.1852    1   2.9800    0.0000   2.9800    2.9800
84.3081    1   4.7700    0.0000   4.7700    4.7700

prefetch_dist:
Level      N   Mean      Std      Min       Max
------------------------------------------------------------
1.07742    1   7.3300    0.0000   7.3300    7.3300
1.50185    1   3.8500    0.0000   3.8500    3.8500
1.58273    1   3.8700    0.0000   3.8700    3.8700
1.86147    1   5.8500    0.0000   5.8500    5.8500
2.36858    1   6.3800    0.0000   6.3800    6.3800
2.59996    1   3.7600    0.0000   3.7600    3.7600
2.81946    1   3.8500    0.0000   3.8500    3.8500
3.23597    1   4.7700    0.0000   4.7700    4.7700
3.50345    1   2.9800    0.0000   2.9800    2.9800
3.70061    1   7.8100    0.0000   7.8100    7.8100
4.06586    1   6.7600    0.0000   6.7600    6.7600
4.11329    1   5.0800    0.0000   5.0800    5.0800
4.6244     1   5.5400    0.0000   5.5400    5.5400
4.74691    1   5.0800    0.0000   5.0800    5.0800
5.03241    1   7.3900    0.0000   7.3900    7.3900
5.43876    1   4.0900    0.0000   4.0900    4.0900
5.70389    1   9.2800    0.0000   9.2800    9.2800
5.86451    1   5.0300    0.0000   5.0300    5.0300
6.28784    1   5.0400    0.0000   5.0400    5.0400
6.52731    1   5.4500    0.0000   5.4500    5.4500
6.72222    1   6.5300    0.0000   6.5300    6.5300
6.93657    1   5.4300    0.0000   5.4300    5.4300
7.24071    1   4.3100    0.0000   4.3100    4.3100
7.69683    1   6.7100    0.0000   6.7100    6.7100
7.88867    1   5.0600    0.0000   5.0600    5.0600

Optimization Recommendations

doe optimize
=== Optimization: gflops ===
Direction: maximize
Best observed run: #25
  block_i = 178.763
  block_j = 159.836
  block_k = 200.864
  prefetch_dist = 4.24895
  Value: 52.02

RSM Model (linear, R² = 0.02):
Coefficients:
  intercept: +21.1777
  block_i: +0.5849
  block_j: -0.8405
  block_k: -0.9666
  prefetch_dist: +3.7788
Predicted optimum:
  block_i = 153.05
  block_j = 40.55
  block_k = 39.4576
  prefetch_dist = 6.72165
  Predicted value: 25.1056

Factor importance:
  1. block_i (effect: 44.0, contribution: 25.0%)
  2. block_j (effect: 44.0, contribution: 25.0%)
  3. block_k (effect: 44.0, contribution: 25.0%)
  4. prefetch_dist (effect: 44.0, contribution: 25.0%)

=== Optimization: cache_miss_rate ===
Direction: maximize
Best observed run: #24
  block_i = 65.4257
  block_j = 235.986
  block_k = 80.5579
  prefetch_dist = 2.06055
  Value: 9.28

RSM Model (linear, R² = 0.15):
Coefficients:
  intercept: +5.4943
  block_i: +0.4046
  block_j: +0.4066
  block_k: -0.4976
  prefetch_dist: -0.4461
Predicted optimum:
  block_i = 226.979
  block_j = 196.757
  block_k = 93.4081
  prefetch_dist = 2.71501
  Predicted value: 6.4110

Factor importance:
  1. block_i (effect: 6.3, contribution: 25.0%)
  2. block_j (effect: 6.3, contribution: 25.0%)
  3. block_k (effect: 6.3, contribution: 25.0%)
  4. prefetch_dist (effect: 6.3, contribution: 25.0%)

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
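The weighted geometric mean can be checked by hand against the figures reported in this section. The sketch below uses the per-response desirabilities (1.0000 and 0.4941) and weights (1.5 and 1.0) from the output on this page; the function name is illustrative and not part of the `doe` tool.

```python
import math


def overall_desirability(desirabilities, weights):
    """Weighted geometric mean: D = (prod d_i^w_i)^(1 / sum w_i)."""
    if any(d <= 0.0 for d in desirabilities):
        return 0.0  # one fully undesirable response drives the overall score to zero
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# gflops: d = 1.0000, weight 1.5; cache_miss_rate: d = 0.4941, weight 1.0
D = overall_desirability([1.0000, 0.4941], [1.5, 1.0])
print(round(D, 4))  # 0.7543
```

Because gflops already sits at a desirability of 1, the overall score reduces to 0.4941 raised to the power 1.0/2.5, which matches the reported D = 0.7543.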

Overall Desirability
D = 0.7543

Per-Response Desirability

Response          Weight   Desirability   Predicted      Dir
--------------------------------------------------------------
gflops            1.5      1.0000         57.91 GFLOPS   ↑
cache_miss_rate   1.0      0.4941         6.09 %         ↑

Recommended Settings

Factor          Value
---------------------------------
block_i         251.5 elements
block_j         228.5 elements
block_k         202.9 elements
prefetch_dist   1.243 iterations

Source: from RSM model prediction

Trade-off Summary

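Sacrifice = how much worse the compromise is than the single-objective best, computed here as best observed minus predicted. A negative value means the prediction exceeds the best observed run.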

Response          Predicted   Best Observed   Sacrifice
--------------------------------------------------------
gflops            57.91       52.02           -5.89
cache_miss_rate   6.09        9.28            +3.19

Top 3 Runs by Desirability

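Run   D        Factor Settings
------------------------------------------------------------------------------------------
#25   0.6038   block_i=42.5109, block_j=54.9322, block_k=105.318, prefetch_dist=4.39188
#14   0.4673   block_i=153.207, block_j=70.2871, block_k=171.652, prefetch_dist=2.17463
#16   0.4628   block_i=181.498, block_j=131.038, block_k=247.396, prefetch_dist=4.1246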

Model Quality

Response          R²       Type
--------------------------------------
gflops            0.6796   quadratic
cache_miss_rate   0.0618   linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.7543

Response          Weight   Desirability   Predicted      Direction
---------------------------------------------------------------------
gflops            1.5      1.0000         57.91 GFLOPS   ↑
cache_miss_rate   1.0      0.4941         6.09 %         ↑

Recommended settings:
  block_i = 251.5 elements
  block_j = 228.5 elements
  block_k = 202.9 elements
  prefetch_dist = 1.243 iterations
  (from RSM model prediction)

Trade-off summary:
  gflops: 57.91 (best observed: 52.02, sacrifice: -5.89)
  cache_miss_rate: 6.09 (best observed: 9.28, sacrifice: +3.19)

Model quality:
  gflops: R² = 0.6796 (quadratic)
  cache_miss_rate: R² = 0.0618 (linear)

Top 3 observed runs by overall desirability:
  1. Run #25 (D=0.6038): block_i=42.5109, block_j=54.9322, block_k=105.318, prefetch_dist=4.39188
  2. Run #14 (D=0.4673): block_i=153.207, block_j=70.2871, block_k=171.652, prefetch_dist=2.17463
  3. Run #16 (D=0.4628): block_i=181.498, block_j=131.038, block_k=247.396, prefetch_dist=4.1246