Database Performance Tuning

Summary

This experiment investigates database performance tuning. Plackett-Burman screening of 6 PostgreSQL parameters with 2 replicate blocks.

The design varies 6 factors: shared buffers (MB), ranging from 256 to 1024, work mem (MB), ranging from 4 to 64, max connections, ranging from 50 to 200, effective cache (MB), ranging from 512 to 4096, wal level, ranging from minimal to replica, and checkpoint timeout (sec), ranging from 60 to 900. The goal is to optimize 2 responses: throughput (tps) (maximize) and p99 latency (ms) (minimize). Fixed conditions held constant across all runs include pg version = 16, storage = nvme.

A Plackett-Burman screening design was used to efficiently test 6 factors in only 16 runs. This design assumes interactions are negligible and focuses on identifying the most influential main effects.

Key Findings

For throughput, the most influential factors were max connections (43.1%), work mem (17.6%), wal level (16.7%). The best observed value was 6654.0 (at shared buffers = 256, work mem = 4, max connections = 200).

For p99 latency, the most influential factors were max connections (38.8%), wal level (18.0%), shared buffers (16.8%). The best observed value was 7.1 (at shared buffers = 256, work mem = 64, max connections = 50).

Recommended Next Steps

Follow up with a response surface design (CCD or Box-Behnken) on the top 3–4 factors to model curvature and find the true optimum.
Consider whether any fixed factors should be varied in a future study.
The screening results can guide factor reduction — drop factors contributing less than 5% and re-run with a smaller, more focused design.

The Scenario

You are tuning a PostgreSQL database for an OLTP workload. There are 6 configuration parameters to investigate, but running benchmarks is time-consuming. A full factorial would require 64 runs — way too many. Plackett-Burman gives you the answer in 8 base runs, replicated across 2 blocks for 16 total.

ℹ

Why Plackett-Burman with Blocking?

PB gives 8 runs for 6 factors (N+1 rounded to multiple of 4). With block_count: 2, you get 16 total runs and can assess run-to-run variability. If the same factors dominate in both blocks, the finding is robust.

Experimental Setup

Factors

Factor	Low	High	Unit
`shared_buffers`	256	1024	MB
`work_mem`	4	64	MB
`max_connections`	50	200	—
`effective_cache`	512	4096	MB
`wal_level`	minimal	replica	—
`checkpoint_timeout`	60	900	sec

Fixed: pg_version = 16, storage = nvme

Responses

Response	Direction	Unit
`throughput`	↑ maximize	tps
`p99_latency`	↓ minimize	ms

Experimental Matrix

The Plackett-Burman Design produces 16 runs. Each row is one experiment with specific factor settings.

Run	Block	`shared_buffers`	`work_mem`	`max_connections`	`effective_cache`	`wal_level`	`checkpoint_timeout`
1	1	1024	64	200	512	minimal	60
2	1	256	4	200	4096	minimal	60
3	1	256	64	50	4096	minimal	900
4	1	1024	64	200	4096	replica	900
5	1	256	64	50	512	replica	60
6	1	1024	4	50	4096	replica	60
7	1	256	4	200	512	replica	900
8	1	1024	4	50	512	minimal	900
9	2	1024	64	200	512	minimal	60
10	2	1024	64	200	4096	replica	900
11	2	256	64	50	512	replica	60
12	2	256	4	200	512	replica	900
13	2	256	4	200	4096	minimal	60
14	2	256	64	50	4096	minimal	900
15	2	1024	4	50	4096	replica	60
16	2	1024	4	50	512	minimal	900

Step-by-Step Workflow

Complete workflow

# Preview: 8 base x 2 blocks = 16 total runs
$ doe info --config use_cases/04_database_performance_tuning/config.json

# Generate and execute
$ doe generate --config use_cases/04_database_performance_tuning/config.json \
    --output results/run.sh --seed 42
$ bash results/run.sh

# Analyze headless (no display needed)
$ doe analyze --config use_cases/04_database_performance_tuning/config.json --no-plots

# Export to CSV for R / pandas / Excel
$ doe analyze --config use_cases/04_database_performance_tuning/config.json \
    --csv results/csv/

# Optimize and report
$ doe optimize --config use_cases/04_database_performance_tuning/config.json
$ doe optimize --config use_cases/04_database_performance_tuning/config.json --multi  # multi-objective
$ doe report --config use_cases/04_database_performance_tuning/config.json \
    --output results/report.html

Interpreting the Results

Typical PostgreSQL Findings

shared_buffers and effective_cache usually have the largest throughput effects
max_connections often drives latency (more connections = more contention)
wal_level affects write throughput but not reads

Blocking Analysis

✔

Replicate Consistency

If the same factors dominate in both blocks → robust finding. If results differ between blocks → environmental noise may be significant, consider more replicates.

Next Steps

Drop unimportant factors (fix them at convenient values)
Run a full factorial or CCD on the 2–3 important factors
Fine-tune for your specific workload

Features Exercised

Feature	Value
Design type	`plackett_burman`
Factor types	`continuous` (5) + `categorical` (1)
Blocking	`block_count: 2` (16 total = 8 × 2)
`--no-plots`	Headless analysis for CI/SSH
`--csv`	Export 4 CSV files for downstream analysis
Arg style	`double-dash` (default)

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: throughput

The Pareto chart shows which database configuration parameters have the greatest impact on throughput.

Pareto Chart

Main Effects Plot

Response: p99_latency

P99 latency is sensitive to a different mix of parameters, highlighting the need to balance throughput and tail latency.

Pareto Chart

Main Effects Plot

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

📊

How to Read These Surfaces

Each plot shows predicted response (vertical axis) across two factors while other factors are held at center. Red dots are actual experimental observations.

Flat surface — these two factors have little effect on the response.
Tilted plane — strong linear effect; moving along one axis consistently changes the response.
Curved/domed surface — quadratic curvature; there is an optimum somewhere in the middle.
Saddle shape — significant interaction; the best setting of one factor depends on the other.
Red dots far from surface — poor model fit in that region; be cautious about predictions there.

throughput (tps) — R² = 0.184, Adj R² = 0.184
Weak fit — interpret the surface shape with caution.
Curvature detected in shared_buffers, work_mem — look for a peak or valley in the surface.
Strongest linear driver: shared_buffers (decreases throughput).
Notable interaction: effective_cache × wal_level — the effect of one depends on the level of the other. Look for a twisted surface.

p99_latency (ms) — R² = 0.378, Adj R² = 0.378
Weak fit — interpret the surface shape with caution.
Curvature detected in work_mem, max_connections — look for a peak or valley in the surface.
Strongest linear driver: checkpoint_timeout (decreases p99_latency).
Notable interaction: max_connections × wal_level — the effect of one depends on the level of the other. Look for a twisted surface.

p99: latency effective cache vs checkpoint timeout

p99: latency max connections vs checkpoint timeout

p99: latency max connections vs effective cache

p99: latency shared buffers vs checkpoint timeout

p99: latency shared buffers vs effective cache

p99: latency shared buffers vs max connections

p99: latency shared buffers vs work mem

p99: latency work mem vs checkpoint timeout

p99: latency work mem vs effective cache

p99: latency work mem vs max connections

throughput: effective cache vs checkpoint timeout

throughput: max connections vs checkpoint timeout

throughput: max connections vs effective cache

throughput: shared buffers vs checkpoint timeout

throughput: shared buffers vs effective cache

throughput: shared buffers vs max connections

throughput: shared buffers vs work mem

throughput: work mem vs checkpoint timeout

throughput: work mem vs effective cache

throughput: work mem vs max connections

Full Analysis Output

doe analyze
=== Main Effects: throughput ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
effective_cache      -1413.5000     294.0442            35.6%
max_connections      -1230.5000     294.0442            31.0%
wal_level             -603.2500     294.0442            15.2%
shared_buffers         394.7500     294.0442            10.0%
work_mem               271.7500     294.0442             6.9%
checkpoint_timeout     -51.2500     294.0442             1.3%

=== Interaction Effects: throughput ===
Factor A             Factor B              Interaction   % Contribution
------------------------------------------------------------------------
shared_buffers       wal_level              -1413.5000            15.7%
work_mem             checkpoint_timeout      1413.5000            15.7%
shared_buffers       work_mem               -1230.5000            13.7%
wal_level            checkpoint_timeout      1230.5000            13.7%
shared_buffers       effective_cache         -603.2500             6.7%
max_connections      checkpoint_timeout       603.2500             6.7%
work_mem             max_connections          394.7500             4.4%
effective_cache      wal_level                394.7500             4.4%
shared_buffers       checkpoint_timeout       351.5000             3.9%
work_mem             wal_level               -351.5000             3.9%
max_connections      effective_cache         -351.5000             3.9%
shared_buffers       max_connections          271.7500             3.0%
effective_cache      checkpoint_timeout      -271.7500             3.0%
work_mem             effective_cache           51.2500             0.6%
max_connections      wal_level                 51.2500             0.6%

=== Summary Statistics: throughput ===

shared_buffers:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1024                8  4834.6250  1034.5819  2904.0000  5971.0000
  256                 8  5229.3750  1343.5052  3016.0000  6654.0000

work_mem:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  4                   8  4896.1250  1347.2024  2904.0000  6654.0000
  64                  8  5167.8750  1052.2628  3016.0000  6099.0000

max_connections:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  200                 8  5647.2500   842.9372  3920.0000  6654.0000
  50                  8  4416.7500  1178.4099  2904.0000  6099.0000

effective_cache:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  4096                8  5738.7500   791.8100  4142.0000  6654.0000
  512                 8  4325.2500  1093.5004  2904.0000  5781.0000

wal_level:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  minimal             8  5333.6250  1274.1940  2904.0000  6654.0000
  replica             8  4730.3750  1064.3736  3016.0000  5971.0000

checkpoint_timeout:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  60                  8  5057.6250  1232.4647  3016.0000  6654.0000
  900                 8  5006.3750  1201.6407  2904.0000  6099.0000

=== Main Effects: p99_latency ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
effective_cache          5.2825       1.0780            47.6%
checkpoint_timeout      -2.2000       1.0780            19.8%
max_connections          2.1100       1.0780            19.0%
work_mem                -1.1600       1.0780            10.4%
wal_level               -0.2525       1.0780             2.3%
shared_buffers           0.1025       1.0780             0.9%

=== Interaction Effects: p99_latency ===
Factor A             Factor B              Interaction   % Contribution
------------------------------------------------------------------------
shared_buffers       wal_level                  5.2825            17.2%
work_mem             checkpoint_timeout        -5.2825            17.2%
shared_buffers       checkpoint_timeout        -2.8550             9.3%
work_mem             wal_level                  2.8550             9.3%
max_connections      effective_cache            2.8550             9.3%
work_mem             effective_cache            2.2000             7.1%
max_connections      wal_level                  2.2000             7.1%
shared_buffers       work_mem                   2.1100             6.9%
wal_level            checkpoint_timeout        -2.1100             6.9%
shared_buffers       max_connections           -1.1600             3.8%
effective_cache      checkpoint_timeout         1.1600             3.8%
shared_buffers       effective_cache           -0.2525             0.8%
max_connections      checkpoint_timeout         0.2525             0.8%
work_mem             max_connections            0.1025             0.3%
effective_cache      wal_level                  0.1025             0.3%

=== Summary Statistics: p99_latency ===

shared_buffers:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1024                8    12.1450     4.2941     7.1000    21.1100
  256                 8    12.2475     4.6257     7.5000    20.6700

work_mem:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  4                   8    12.7763     4.1295     7.9700    21.1100
  64                  8    11.6163     4.6926     7.1000    20.6700

max_connections:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  200                 8    11.1412     2.8838     7.1000    15.5500
  50                  8    13.2513     5.3835     7.5000    21.1100

effective_cache:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  4096                8     9.5550     2.2131     7.1000    12.6600
  512                 8    14.8375     4.3588     7.9700    21.1100

wal_level:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  minimal             8    12.3225     4.1826     7.5000    21.1100
  replica             8    12.0700     4.7236     7.1000    20.6700

checkpoint_timeout:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  60                  8    13.2963     3.4572     8.7100    20.6700
  900                 8    11.0962     5.0124     7.1000    21.1100

Optimization Recommendations

doe optimize
=== Optimization: throughput ===
Direction: maximize

Best observed run: #4
  shared_buffers = 1024
  work_mem = 4
  max_connections = 50
  effective_cache = 4096
  wal_level = replica
  checkpoint_timeout = 60
  Value: 6654.0

RSM Model (linear, R² = 0.69):
  Coefficients:
    intercept:  +5032.0000
    shared_buffers:  -529.1250
    work_mem:  -160.6250
    max_connections:  -28.0000
    effective_cache:  -341.2500
    wal_level:  -5.6250
    checkpoint_timeout:  -686.8750
  Predicted optimum:
    shared_buffers = 256
    work_mem = 64
    max_connections = 50
    effective_cache = 512
    wal_level = replica
    checkpoint_timeout = 60
    Predicted value: 6451.0000

Factor importance:
  1. checkpoint_timeout  (effect: -1373.8, contribution: 39.2%)
  2. shared_buffers  (effect: 1058.2, contribution: 30.2%)
  3. effective_cache  (effect: 682.5, contribution: 19.5%)
  4. work_mem  (effect: -321.2, contribution: 9.2%)
  5. max_connections  (effect: 56.0, contribution: 1.6%)
  6. wal_level  (effect: -11.2, contribution: 0.3%)

=== Optimization: p99_latency ===
Direction: minimize

Best observed run: #14
  shared_buffers = 256
  work_mem = 4
  max_connections = 200
  effective_cache = 4096
  wal_level = minimal
  checkpoint_timeout = 60
  Value: 7.1

RSM Model (linear, R² = 0.77):
  Coefficients:
    intercept:  +12.1963
    shared_buffers:  +1.7000
    work_mem:  +0.9225
    max_connections:  -0.3212
    effective_cache:  +2.1300
    wal_level:  +0.6612
    checkpoint_timeout:  +2.1337
  Predicted optimum:
    shared_buffers = 1024
    work_mem = 64
    max_connections = 200
    effective_cache = 4096
    wal_level = replica
    checkpoint_timeout = 900
    Predicted value: 19.4225

Factor importance:
  1. checkpoint_timeout  (effect: 4.3, contribution: 27.1%)
  2. effective_cache  (effect: -4.3, contribution: 27.1%)
  3. shared_buffers  (effect: -3.4, contribution: 21.6%)
  4. work_mem  (effect: 1.8, contribution: 11.7%)
  5. wal_level  (effect: 1.3, contribution: 8.4%)
  6. max_connections  (effect: 0.6, contribution: 4.1%)

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.

Overall Desirability

D = 0.8766

Per-Response Desirability

Response	Weight	Desirability	Predicted	Dir
`throughput`	1.5	0.8701	6305.59 0.8701 6305.59 tps	↑
`p99_latency`	1.0	0.8866	8.15 0.8866 8.15 ms	↓

Recommended Settings

Factor	Value
`shared_buffers`	994.8 MB
`work_mem`	55.71 MB
`max_connections`	88.9
`effective_cache`	3854 MB
`wal_level`	minimal
`checkpoint_timeout`	829.7 sec

Source: from RSM model prediction

Trade-off Summary

Sacrifice = how much worse than single-objective best.

Response	Predicted	Best Observed	Sacrifice
`p99_latency`	8.15	7.10	+1.05

Top 3 Runs by Desirability

Run	D	Factor Settings
#14	0.8514	shared_buffers=1024, work_mem=4, max_connections=50, effective_cache=4096, wal_level=replica, checkpoint_timeout=60
#10	0.8140	shared_buffers=1024, work_mem=64, max_connections=200, effective_cache=512, wal_level=minimal, checkpoint_timeout=60

Model Quality

Response	R²	Type
`p99_latency`	0.2893	linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.8766

Response                  Weight Desirability    Predicted  Direction
---------------------------------------------------------------------
throughput                   1.5       0.8701     6305.59 tps   ↑
p99_latency                  1.0       0.8866        8.15 ms   ↓

Recommended settings:
  shared_buffers = 994.8 MB
  work_mem = 55.71 MB
  max_connections = 88.9
  effective_cache = 3854 MB
  wal_level = minimal
  checkpoint_timeout = 829.7 sec
  (from RSM model prediction)

Trade-off summary:
  throughput: 6305.59 (best observed: 6654.00, sacrifice: +348.41)
  p99_latency: 8.15 (best observed: 7.10, sacrifice: +1.05)

Model quality:
  throughput: R² = 0.4324 (linear)
  p99_latency: R² = 0.2893 (linear)

Top 3 observed runs by overall desirability:
  1. Run #6 (D=0.8618): shared_buffers=1024, work_mem=64, max_connections=200, effective_cache=512, wal_level=minimal, checkpoint_timeout=60
  2. Run #14 (D=0.8514): shared_buffers=1024, work_mem=4, max_connections=50, effective_cache=4096, wal_level=replica, checkpoint_timeout=60
  3. Run #10 (D=0.8140): shared_buffers=1024, work_mem=64, max_connections=200, effective_cache=512, wal_level=minimal, checkpoint_timeout=60