Fractional Factorial Design

Data Replication Lag

Fractional factorial of 5 replication parameters for lag and failover readiness

Summary

This experiment screens five MySQL replication parameters for their effect on replication lag and failover readiness.

The design varies 5 factors: sync mode (async or semi_sync), binlog batch size (1 to 100 txns), parallel workers (1 to 16 threads), network buffer size (64 to 1024 KB), and compression (off or on). The goal is to optimize 2 responses: replication lag in ms (minimize) and failover readiness in % (maximize). Two conditions are held constant across all runs: engine = mysql_8 and gtid = on.

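A fractional factorial design reduces the number of runs from 32 to 8 by deliberately confounding effects. With 8 runs for 5 factors this is a quarter fraction, so each main effect is aliased with at least one two-factor interaction; the identical rows in the ANOVA tables below reflect this. That makes the design suited to screening: identifying which of the 5 factors matter most before investing in a full study.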
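The reduction from 32 to 8 runs can be sketched in a few lines of Python. This is a minimal illustration, not the tool's implementation: the two generators used here (binlog_batch_size = sync_mode × network_buffer_kb and parallel_workers = sync_mode × compression, in coded -1/+1 units) are inferred from the run table on this page and are an assumption about how the design was built.

```python
from itertools import product

# Three base factors get a full 2^3 factorial in coded units (-1 = low, +1 = high).
BASE = ("sync_mode", "network_buffer_kb", "compression")

# Assumed generators (inferred from the run table, not from the tool's docs):
# each generated factor is the product of two base-factor columns.
GENERATORS = {
    "binlog_batch_size": ("sync_mode", "network_buffer_kb"),
    "parallel_workers": ("sync_mode", "compression"),
}


def fractional_design():
    """Return the 8 coded runs of a 2^(5-2) fractional factorial."""
    runs = []
    for signs in product((-1, 1), repeat=len(BASE)):
        run = dict(zip(BASE, signs))
        for factor, parents in GENERATORS.items():
            value = 1
            for parent in parents:
                value *= run[parent]
            run[factor] = value
        runs.append(run)
    return runs


design = fractional_design()
full_factorial_runs = 2 ** (len(BASE) + len(GENERATORS))
print(f"{len(design)} of {full_factorial_runs} runs")
```

Because two of the five columns are products of other columns, their main effects cannot be separated from those two-factor interactions. That is the price of running a quarter of the full factorial.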

Key Findings

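For replication_lag_ms, the most influential factors were parallel_workers (34.2%), binlog_batch_size (19.8%), and sync_mode (16.7%). No main effect is significant at the 0.05 level in the ANOVA; the smallest main-effect p-value is 0.1444, for parallel_workers. The best observed value was 0.0 ms (at sync_mode = semi_sync, binlog_batch_size = 100, parallel_workers = 16).

For failover_ready_pct, the most influential factors were parallel_workers (45.3%), compression (24.9%), and network_buffer_kb (23.9%). Again no main effect is significant at the 0.05 level; the smallest main-effect p-value is 0.1818, for parallel_workers. The best observed value was 100.9 % (at sync_mode = semi_sync, binlog_batch_size = 100, parallel_workers = 16).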

Recommended Next Steps

Experimental Setup

Factors

| Factor | Low | High | Unit |
|---|---|---|---|
| sync_mode | async | semi_sync | |
| binlog_batch_size | 1 | 100 | txns |
| parallel_workers | 1 | 16 | threads |
| network_buffer_kb | 64 | 1024 | KB |
| compression | off | on | |

Fixed: engine = mysql_8, gtid = on

Responses

| Response | Direction | Unit |
|---|---|---|
| replication_lag_ms | ↓ minimize | ms |
| failover_ready_pct | ↑ maximize | % |

Configuration

use_cases/44_data_replication_lag/config.json
{
  "metadata": {
    "name": "Data Replication Lag",
    "description": "Fractional factorial of 5 replication parameters for lag and failover readiness"
  },
  "factors": [
    { "name": "sync_mode", "levels": ["async", "semi_sync"], "type": "categorical", "unit": "" },
    { "name": "binlog_batch_size", "levels": ["1", "100"], "type": "continuous", "unit": "txns" },
    { "name": "parallel_workers", "levels": ["1", "16"], "type": "continuous", "unit": "threads" },
    { "name": "network_buffer_kb", "levels": ["64", "1024"], "type": "continuous", "unit": "KB" },
    { "name": "compression", "levels": ["off", "on"], "type": "categorical", "unit": "" }
  ],
  "fixed_factors": {
    "engine": "mysql_8",
    "gtid": "on"
  },
  "responses": [
    { "name": "replication_lag_ms", "optimize": "minimize", "unit": "ms" },
    { "name": "failover_ready_pct", "optimize": "maximize", "unit": "%" }
  ],
  "settings": {
    "operation": "fractional_factorial",
    "test_script": "use_cases/44_data_replication_lag/sim.sh"
  }
}

Experimental Matrix

The Fractional Factorial Design produces 8 runs. Each row is one experiment with specific factor settings.

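| Run | sync_mode | binlog_batch_size | parallel_workers | network_buffer_kb | compression |
|---|---|---|---|---|---|
| 1 | async | 100 | 16 | 64 | off |
| 2 | semi_sync | 1 | 1 | 64 | off |
| 3 | semi_sync | 100 | 1 | 1024 | off |
| 4 | semi_sync | 100 | 16 | 1024 | on |
| 5 | async | 100 | 1 | 64 | on |
| 6 | semi_sync | 1 | 16 | 64 | on |
| 7 | async | 1 | 1 | 1024 | on |
| 8 | async | 1 | 16 | 1024 | off |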
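To see which effects this 8-run matrix cannot separate, one can code each factor as -1 (low) or +1 (high) and compare every main-effect column against every two-factor interaction column. The sketch below does that for the matrix above; the helper names are illustrative and not part of the DOE tool.

```python
from itertools import combinations

FACTORS = ["sync_mode", "binlog_batch_size", "parallel_workers",
           "network_buffer_kb", "compression"]

# Low level of each factor, as listed in the Factors table.
LOW = {"sync_mode": "async", "binlog_batch_size": 1, "parallel_workers": 1,
       "network_buffer_kb": 64, "compression": "off"}

# The 8 runs of the experimental matrix, in the order shown above.
RUNS = [
    ("async", 100, 16, 64, "off"),
    ("semi_sync", 1, 1, 64, "off"),
    ("semi_sync", 100, 1, 1024, "off"),
    ("semi_sync", 100, 16, 1024, "on"),
    ("async", 100, 1, 64, "on"),
    ("semi_sync", 1, 16, 64, "on"),
    ("async", 1, 1, 1024, "on"),
    ("async", 1, 16, 1024, "off"),
]


def coded_column(factor):
    """Return the factor's column in coded units: -1 for low, +1 for high."""
    idx = FACTORS.index(factor)
    return [-1 if run[idx] == LOW[factor] else 1 for run in RUNS]


def main_effect_aliases():
    """Map each main effect to the two-factor interactions it is aliased with."""
    found = {}
    for main in FACTORS:
        main_col = coded_column(main)
        negated = [-v for v in main_col]
        partners = []
        for a, b in combinations(FACTORS, 2):
            if main in (a, b):
                continue
            inter = [x * y for x, y in zip(coded_column(a), coded_column(b))]
            # Identical up to sign means the two effects cannot be told apart.
            if inter == main_col or inter == negated:
                partners.append(f"{a}*{b}")
        found[main] = partners
    return found


for factor, partners in main_effect_aliases().items():
    print(f"{factor}: {', '.join(partners)}")
```

Every main effect turns out to share a column with at least one two-factor interaction, which is why aliased pairs such as parallel_workers and sync_mode*compression carry identical sums of squares in the ANOVA tables.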

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/44_data_replication_lag/config.json

2. Generate the runner script

$ doe generate --config use_cases/44_data_replication_lag/config.json \
    --output use_cases/44_data_replication_lag/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/44_data_replication_lag/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/44_data_replication_lag/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/44_data_replication_lag/config.json

6. Multi-objective optimization

With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/44_data_replication_lag/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/44_data_replication_lag/config.json \
    --output use_cases/44_data_replication_lag/results/report.html
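The seven steps above can also be chained from a small Python script. This is only a sketch: it assumes the `doe` command is on the PATH and that the script runs from the repository root, and it uses only the commands and flags shown in the workflow. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess

CASE_DIR = "use_cases/44_data_replication_lag"
CONFIG = f"{CASE_DIR}/config.json"
RESULTS = f"{CASE_DIR}/results"


def build_commands():
    """Return the workflow commands in execution order, one argv list per step."""
    return [
        ["doe", "info", "--config", CONFIG],
        ["doe", "generate", "--config", CONFIG,
         "--output", f"{RESULTS}/run.sh", "--seed", "42"],
        ["bash", f"{RESULTS}/run.sh"],
        ["doe", "analyze", "--config", CONFIG],
        ["doe", "optimize", "--config", CONFIG],
        ["doe", "optimize", "--config", CONFIG, "--multi"],
        ["doe", "report", "--config", CONFIG,
         "--output", f"{RESULTS}/report.html"],
    ]


def run_workflow(dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for cmd in build_commands():
        print("$", shlex.join(cmd))
        if not dry_run:
            # check=True stops the workflow at the first failing step.
            subprocess.run(cmd, check=True)


run_workflow(dry_run=True)
```

Calling `run_workflow(dry_run=False)` would execute the steps for real, stopping at the first command that exits with a non-zero status.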

Features Exercised

| Feature | Value |
|---|---|
| Design type | fractional_factorial |
| Factor types | continuous (3), categorical (2) |
| Arg style | double-dash |
| Responses | 2 (replication_lag_ms ↓, failover_ready_pct ↑) |
| Total runs | 8 |

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: replication_lag_ms

Top factors: parallel_workers (34.2%), binlog_batch_size (19.8%), sync_mode (16.7%).

ANOVA

| Source | DF | SS | MS | F | p-value |
|---|---|---|---|---|---|
| sync_mode | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| binlog_batch_size | 1 | 19800.5000 | 19800.5000 | 1.006 | 0.3619 |
| parallel_workers | 1 | 58824.5000 | 58824.5000 | 2.989 | 0.1444 |
| network_buffer_kb | 1 | 13122.0000 | 13122.0000 | 0.667 | 0.4513 |
| compression | 1 | 8712.0000 | 8712.0000 | 0.443 | 0.5353 |
| sync_mode*binlog_batch_size | 1 | 13122.0000 | 13122.0000 | 0.667 | 0.4513 |
| sync_mode*parallel_workers | 1 | 8712.0000 | 8712.0000 | 0.443 | 0.5353 |
| sync_mode*network_buffer_kb | 1 | 19800.5000 | 19800.5000 | 1.006 | 0.3619 |
| sync_mode*compression | 1 | 58824.5000 | 58824.5000 | 2.989 | 0.1444 |
| binlog_batch_size*parallel_workers | 1 | 140450.0000 | 140450.0000 | 7.136 | 0.0443 |
| binlog_batch_size*network_buffer_kb | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| binlog_batch_size*compression | 1 | 0.5000 | 0.5000 | 0.000 | 0.9962 |
| parallel_workers*network_buffer_kb | 1 | 0.5000 | 0.5000 | 0.000 | 0.9962 |
| parallel_workers*compression | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| network_buffer_kb*compression | 1 | 140450.0000 | 140450.0000 | 7.136 | 0.0443 |
| Error (Lenth PSE) | 5 | 98415.0000 | 19683.0000 | | |
| Total | 7 | 254854.0000 | 36407.7143 | | |
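The main-effect rows of this table can be reproduced from the effect estimates listed in the full analysis output further down the page. For a balanced two-level design with N runs, the sum of squares of an effect is N × effect² / 4, and F is that value (with 1 degree of freedom) divided by the error mean square. The sketch below applies this to replication_lag_ms. The percentage formula (share of the summed absolute main effects) happens to match the reported figures; whether the tool computes it this way internally is an assumption.

```python
# Main effects for replication_lag_ms, copied from the `doe analyze` output.
EFFECTS = {
    "parallel_workers": -171.5,
    "binlog_batch_size": 99.5,
    "sync_mode": -83.5,
    "network_buffer_kb": 81.0,
    "compression": 66.0,
}
N_RUNS = 8
MS_ERROR = 19683.0  # Error (Lenth PSE) mean square from the ANOVA table


def sum_of_squares(effect, n_runs=N_RUNS):
    """SS of a two-level effect in a balanced design: N * effect^2 / 4."""
    return n_runs * effect ** 2 / 4


total_abs_effect = sum(abs(e) for e in EFFECTS.values())

rows = {}
for name, effect in EFFECTS.items():
    ss = sum_of_squares(effect)
    rows[name] = {
        "ss": ss,
        "f": ss / MS_ERROR,  # each effect has 1 degree of freedom, so MS == SS
        "contribution_pct": 100 * abs(effect) / total_abs_effect,
    }

for name, row in rows.items():
    print(f"{name:20s} SS={row['ss']:.4f}  F={row['f']:.3f}  "
          f"{row['contribution_pct']:.1f}%")
```

For parallel_workers this gives SS = 58824.5, F ≈ 2.989, and a 34.2% contribution, matching the table and the "Top factors" line above.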

Pareto Chart

Pareto chart for replication_lag_ms

Main Effects Plot

Main effects plot for replication_lag_ms

Normal Probability Plot of Effects

Normal probability plot for replication_lag_ms

Half-Normal Plot of Effects

Half-normal plot for replication_lag_ms

Model Diagnostics

Model diagnostics for replication_lag_ms

Response: failover_ready_pct

Top factors: parallel_workers (45.3%), compression (24.9%), network_buffer_kb (23.9%).

ANOVA

| Source | DF | SS | MS | F | p-value |
|---|---|---|---|---|---|
| sync_mode | 1 | 2.1012 | 2.1012 | 0.032 | 0.8650 |
| binlog_batch_size | 1 | 0.0312 | 0.0312 | 0.000 | 0.9834 |
| parallel_workers | 1 | 157.5312 | 157.5312 | 2.403 | 0.1818 |
| network_buffer_kb | 1 | 43.7112 | 43.7112 | 0.667 | 0.4513 |
| compression | 1 | 47.5312 | 47.5312 | 0.725 | 0.4334 |
| sync_mode*binlog_batch_size | 1 | 43.7113 | 43.7113 | 0.667 | 0.4513 |
| sync_mode*parallel_workers | 1 | 47.5313 | 47.5313 | 0.725 | 0.4334 |
| sync_mode*network_buffer_kb | 1 | 0.0313 | 0.0313 | 0.000 | 0.9834 |
| sync_mode*compression | 1 | 157.5313 | 157.5313 | 2.403 | 0.1818 |
| binlog_batch_size*parallel_workers | 1 | 116.2812 | 116.2812 | 1.773 | 0.2404 |
| binlog_batch_size*network_buffer_kb | 1 | 2.1013 | 2.1013 | 0.032 | 0.8650 |
| binlog_batch_size*compression | 1 | 1.2012 | 1.2012 | 0.018 | 0.8976 |
| parallel_workers*network_buffer_kb | 1 | 1.2013 | 1.2013 | 0.018 | 0.8976 |
| parallel_workers*compression | 1 | 2.1013 | 2.1013 | 0.032 | 0.8650 |
| network_buffer_kb*compression | 1 | 116.2812 | 116.2812 | 1.773 | 0.2404 |
| Error (Lenth PSE) | 5 | 327.8344 | 65.5669 | | |
| Total | 7 | 368.3888 | 52.6270 | | |

Pareto Chart

Pareto chart for failover_ready_pct

Main Effects Plot

Main effects plot for failover_ready_pct

Normal Probability Plot of Effects

Normal probability plot for failover_ready_pct

Half-Normal Plot of Effects

Half-normal plot for failover_ready_pct

Model Diagnostics

Model diagnostics for failover_ready_pct

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

RSM surface for failover_ready_pct: binlog_batch_size vs network_buffer_kb

RSM surface for failover_ready_pct: binlog_batch_size vs parallel_workers

RSM surface for failover_ready_pct: parallel_workers vs network_buffer_kb

RSM surface for replication_lag_ms: binlog_batch_size vs network_buffer_kb

RSM surface for replication_lag_ms: binlog_batch_size vs parallel_workers

RSM surface for replication_lag_ms: parallel_workers vs network_buffer_kb

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
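The scaling and combination steps can be sketched as follows. This is a minimal illustration of one-sided Derringer–Suich desirability functions with a weighted geometric mean, not the tool's implementation. The bounds passed to the scaling functions in the usage lines are hypothetical, since the bounds the tool uses are not shown on this page; the weights 1.0 and 1.5 are the ones reported in the results below.

```python
import math


def desirability_maximize(y, low, target, shape=1.0):
    """Larger-is-better: 0 at or below `low`, 1 at or above `target`."""
    if y <= low:
        return 0.0
    if y >= target:
        return 1.0
    return ((y - low) / (target - low)) ** shape


def desirability_minimize(y, target, high, shape=1.0):
    """Smaller-is-better: 1 at or below `target`, 0 at or above `high`."""
    if y <= target:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - target)) ** shape


def overall_desirability(values, weights):
    """Weighted geometric mean: D = (prod d_i ** w_i) ** (1 / sum(w_i))."""
    if any(d == 0.0 for d in values):
        return 0.0  # a single unacceptable response vetoes the whole setting
    log_sum = sum(w * math.log(d) for d, w in zip(values, weights))
    return math.exp(log_sum / sum(weights))


# Hypothetical bounds, for illustration only.
d_lag = desirability_minimize(150.0, target=0.0, high=600.0)
d_ready = desirability_maximize(90.0, low=80.0, target=100.0)
print(d_lag, d_ready, overall_desirability([d_lag, d_ready], [1.0, 1.5]))

# With the weights from this page, two equal desirabilities of 0.9545
# combine to the same overall value.
print(round(overall_desirability([0.9545, 0.9545], [1.0, 1.5]), 4))
```

Because the combination is a geometric mean, a setting that drives any single response to zero desirability scores zero overall, however good the other response is.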

Overall Desirability
D = 0.9545

Per-Response Desirability

| Response | Weight | Desirability | Predicted | Dir |
|---|---|---|---|---|
| replication_lag_ms | 1.0 | 0.9545 | 0.00 ms | ↓ |
| failover_ready_pct | 1.5 | 0.9545 | 100.90 % | ↑ |

Recommended Settings

| Factor | Value |
|---|---|
| sync_mode | async |
| binlog_batch_size | 100 txns |
| parallel_workers | 16 threads |
| network_buffer_kb | 64 KB |
| compression | off |

Source: from observed run #1

Trade-off Summary

Sacrifice = how much worse than single-objective best.

| Response | Predicted | Best Observed | Sacrifice |
|---|---|---|---|
| replication_lag_ms | 0.00 | 0.00 | +0.00 |
| failover_ready_pct | 100.90 | 100.90 | +0.00 |

Top 3 Runs by Desirability

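| Run | D | Factor Settings |
|---|---|---|
| #1 | 0.9545 | sync_mode=async, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=64, compression=off |
| #8 | 0.8543 | sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=1, network_buffer_kb=1024, compression=off |
| #4 | 0.7457 | sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=1024, compression=on |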

Model Quality

| Response | R² | Type |
|---|---|---|
| replication_lag_ms | 0.6482 | linear |
| failover_ready_pct | 0.7414 | linear |

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.9545

Response              Weight  Desirability  Predicted   Direction
---------------------------------------------------------------------
replication_lag_ms    1.0     0.9545        0.00 ms     ↓
failover_ready_pct    1.5     0.9545        100.90 %    ↑

Recommended settings:
  sync_mode = async
  binlog_batch_size = 100 txns
  parallel_workers = 16 threads
  network_buffer_kb = 64 KB
  compression = off
  (from observed run #1)

Trade-off summary:
  replication_lag_ms: 0.00 (best observed: 0.00, sacrifice: +0.00)
  failover_ready_pct: 100.90 (best observed: 100.90, sacrifice: +0.00)

Model quality:
  replication_lag_ms: R² = 0.6482 (linear)
  failover_ready_pct: R² = 0.7414 (linear)

Top 3 observed runs by overall desirability:
  1. Run #1 (D=0.9545): sync_mode=async, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=64, compression=off
  2. Run #8 (D=0.8543): sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=1, network_buffer_kb=1024, compression=off
  3. Run #4 (D=0.7457): sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=1024, compression=on

Full Analysis Output

doe analyze
=== Main Effects: replication_lag_ms ===
Factor               Effect       Std Error   % Contribution
--------------------------------------------------------------
parallel_workers     -171.5000    67.4608     34.2%
binlog_batch_size      99.5000    67.4608     19.8%
sync_mode             -83.5000    67.4608     16.7%
network_buffer_kb      81.0000    67.4608     16.2%
compression            66.0000    67.4608     13.2%

=== ANOVA Table: replication_lag_ms ===
Source                                DF   SS            MS            F       p-value
-----------------------------------------------------------------------------
sync_mode                             1    13944.5000    13944.5000    0.708   0.4383
binlog_batch_size                     1    19800.5000    19800.5000    1.006   0.3619
parallel_workers                      1    58824.5000    58824.5000    2.989   0.1444
network_buffer_kb                     1    13122.0000    13122.0000    0.667   0.4513
compression                           1    8712.0000     8712.0000     0.443   0.5353
sync_mode*binlog_batch_size           1    13122.0000    13122.0000    0.667   0.4513
sync_mode*parallel_workers            1    8712.0000     8712.0000     0.443   0.5353
sync_mode*network_buffer_kb           1    19800.5000    19800.5000    1.006   0.3619
sync_mode*compression                 1    58824.5000    58824.5000    2.989   0.1444
binlog_batch_size*parallel_workers    1    140450.0000   140450.0000   7.136   0.0443
binlog_batch_size*network_buffer_kb   1    13944.5000    13944.5000    0.708   0.4383
binlog_batch_size*compression         1    0.5000        0.5000        0.000   0.9962
parallel_workers*network_buffer_kb    1    0.5000        0.5000        0.000   0.9962
parallel_workers*compression          1    13944.5000    13944.5000    0.708   0.4383
network_buffer_kb*compression         1    140450.0000   140450.0000   7.136   0.0443
Error (Lenth PSE)                     5    98415.0000    19683.0000
Total                                 7    254854.0000   36407.7143
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Interaction Effects: replication_lag_ms ===
Factor A             Factor B             Interaction   % Contribution
------------------------------------------------------------------------
binlog_batch_size    parallel_workers     -265.0000     23.7%
network_buffer_kb    compression           265.0000     23.7%
sync_mode            compression          -171.5000     15.4%
sync_mode            network_buffer_kb     -99.5000     8.9%
binlog_batch_size    network_buffer_kb      83.5000     7.5%
parallel_workers     compression           -83.5000     7.5%
sync_mode            binlog_batch_size     -81.0000     7.3%
sync_mode            parallel_workers       66.0000     5.9%
binlog_batch_size    compression            -0.5000     0.0%
parallel_workers     network_buffer_kb       0.5000     0.0%

=== Summary Statistics: replication_lag_ms ===
sync_mode:
Level        N   Mean       Std        Min       Max
------------------------------------------------------------
async        4   259.7500   230.1787   99.0000   601.0000
semi_sync    4   176.2500   165.2904   0.0000    371.0000

binlog_batch_size:
Level        N   Mean       Std        Min       Max
------------------------------------------------------------
1            4   168.2500   66.2590    87.0000   247.0000
100          4   267.7500   271.9576   0.0000    601.0000

parallel_workers:
Level        N   Mean       Std        Min       Max
------------------------------------------------------------
1            4   303.7500   232.1571   87.0000   601.0000
16           4   132.2500   106.9871   0.0000    247.0000

network_buffer_kb:
Level        N   Mean       Std        Min       Max
------------------------------------------------------------
1024         4   177.5000   152.1414   0.0000    371.0000
64           4   258.5000   239.6463   87.0000   601.0000

compression:
Level        N   Mean       Std        Min       Max
------------------------------------------------------------
off          4   185.0000   131.1488   87.0000   371.0000
on           4   251.0000   254.6514   0.0000    601.0000

=== Main Effects: failover_ready_pct ===
Factor               Effect     Std Error   % Contribution
--------------------------------------------------------------
parallel_workers      8.8750    2.5648      45.3%
compression          -4.8750    2.5648      24.9%
network_buffer_kb    -4.6750    2.5648      23.9%
sync_mode             1.0250    2.5648      5.2%
binlog_batch_size     0.1250    2.5648      0.6%

=== ANOVA Table: failover_ready_pct ===
Source                                DF   SS         MS         F       p-value
-----------------------------------------------------------------------------
sync_mode                             1    2.1012     2.1012     0.032   0.8650
binlog_batch_size                     1    0.0312     0.0312     0.000   0.9834
parallel_workers                      1    157.5312   157.5312   2.403   0.1818
network_buffer_kb                     1    43.7112    43.7112    0.667   0.4513
compression                           1    47.5312    47.5312    0.725   0.4334
sync_mode*binlog_batch_size           1    43.7113    43.7113    0.667   0.4513
sync_mode*parallel_workers            1    47.5313    47.5313    0.725   0.4334
sync_mode*network_buffer_kb           1    0.0313     0.0313     0.000   0.9834
sync_mode*compression                 1    157.5313   157.5313   2.403   0.1818
binlog_batch_size*parallel_workers    1    116.2812   116.2812   1.773   0.2404
binlog_batch_size*network_buffer_kb   1    2.1013     2.1013     0.032   0.8650
binlog_batch_size*compression         1    1.2012     1.2012     0.018   0.8976
parallel_workers*network_buffer_kb    1    1.2013     1.2013     0.018   0.8976
parallel_workers*compression          1    2.1013     2.1013     0.032   0.8650
network_buffer_kb*compression         1    116.2812   116.2812   1.773   0.2404
Error (Lenth PSE)                     5    327.8344   65.5669
Total                                 7    368.3888   52.6270
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Interaction Effects: failover_ready_pct ===
Factor A             Factor B             Interaction   % Contribution
------------------------------------------------------------------------
sync_mode            compression           8.8750       23.7%
binlog_batch_size    parallel_workers      7.6250       20.4%
network_buffer_kb    compression          -7.6250       20.4%
sync_mode            parallel_workers     -4.8750       13.0%
sync_mode            binlog_batch_size     4.6750       12.5%
binlog_batch_size    network_buffer_kb    -1.0250       2.7%
parallel_workers     compression           1.0250       2.7%
binlog_batch_size    compression           0.7750       2.1%
parallel_workers     network_buffer_kb    -0.7750       2.1%
sync_mode            network_buffer_kb    -0.1250       0.3%

=== Summary Statistics: failover_ready_pct ===
sync_mode:
Level        N   Mean      Std       Min       Max
------------------------------------------------------------
async        4   91.2750   9.2500    78.7000   99.3000
semi_sync    4   92.3000   6.0443    87.7000   100.9000

binlog_batch_size:
Level        N   Mean      Std       Min       Max
------------------------------------------------------------
1            4   91.7250   3.9500    87.7000   97.0000
100          4   91.8500   10.3529   78.7000   100.9000

parallel_workers:
Level        N   Mean      Std       Min       Max
------------------------------------------------------------
1            4   87.3500   5.9518    78.7000   92.1000
16           4   96.2250   5.9044    87.7000   100.9000

network_buffer_kb:
Level        N   Mean      Std       Min       Max
------------------------------------------------------------
1024         4   94.1250   5.8312    88.5000   100.9000
64           4   89.4500   8.6153    78.7000   99.3000

compression:
Level        N   Mean      Std       Min       Max
------------------------------------------------------------
off          4   94.2250   4.8562    88.5000   99.3000
on           4   89.3500   9.1307    78.7000   100.9000

Optimization Recommendations

doe optimize
=== Optimization: replication_lag_ms ===
Direction: minimize
Best observed run: #1
  sync_mode = semi_sync
  binlog_batch_size = 100
  parallel_workers = 16
  network_buffer_kb = 1024
  compression = on
  Value: 0.0

RSM Model (linear, R² = 0.4952, Adj R² = -0.7669):
Coefficients:
  intercept            +218.0000
  sync_mode            -88.7500
  binlog_batch_size    -40.5000
  parallel_workers     -7.0000
  network_buffer_kb    +2.7500
  compression          -78.7500

Predicted optimum (from linear model, at observed points):
  sync_mode = async
  binlog_batch_size = 1
  parallel_workers = 16
  network_buffer_kb = 1024
  compression = off
  Predicted value: 421.7500

Surface optimum (via L-BFGS-B, linear model):
  sync_mode = semi_sync
  binlog_batch_size = 100
  parallel_workers = 16
  network_buffer_kb = 64
  compression = on
  Predicted value: 0.2500

Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. sync_mode (effect: -177.5, contribution: 40.8%)
  2. compression (effect: -157.5, contribution: 36.2%)
  3. binlog_batch_size (effect: -81.0, contribution: 18.6%)
  4. parallel_workers (effect: -14.0, contribution: 3.2%)
  5. network_buffer_kb (effect: -5.5, contribution: 1.3%)

=== Optimization: failover_ready_pct ===
Direction: maximize
Best observed run: #1
  sync_mode = semi_sync
  binlog_batch_size = 100
  parallel_workers = 16
  network_buffer_kb = 1024
  compression = on
  Value: 100.9

RSM Model (linear, R² = 0.6654, Adj R² = -0.1712):
Coefficients:
  intercept            +91.7875
  sync_mode            +2.6375
  binlog_batch_size    +2.3375
  parallel_workers     -1.3375
  network_buffer_kb    +2.1875
  compression          +3.4125

Predicted optimum (from linear model, at observed points):
  sync_mode = semi_sync
  binlog_batch_size = 100
  parallel_workers = 16
  network_buffer_kb = 1024
  compression = on
  Predicted value: 101.0250

Surface optimum (via L-BFGS-B, linear model):
  sync_mode = semi_sync
  binlog_batch_size = 100
  parallel_workers = 1
  network_buffer_kb = 1024
  compression = on
  Predicted value: 103.7000

Model quality: Moderate fit — use predictions directionally, not precisely.

Factor importance:
  1. compression (effect: 6.8, contribution: 28.6%)
  2. sync_mode (effect: 5.3, contribution: 22.1%)
  3. binlog_batch_size (effect: 4.7, contribution: 19.6%)
  4. network_buffer_kb (effect: -4.4, contribution: 18.4%)
  5. parallel_workers (effect: -2.7, contribution: 11.2%)
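The predicted values in this output can be checked by evaluating the linear models by hand. The sketch below assumes the coefficients apply to factors coded as -1 (low level) and +1 (high level), with async and off as the low levels. That coding is an assumption, but it reproduces the "Predicted value" lines printed by the tool.

```python
# Coefficients copied from the linear RSM models in the `doe optimize` output.
MODELS = {
    "replication_lag_ms": {
        "intercept": 218.0, "sync_mode": -88.75, "binlog_batch_size": -40.5,
        "parallel_workers": -7.0, "network_buffer_kb": 2.75, "compression": -78.75,
    },
    "failover_ready_pct": {
        "intercept": 91.7875, "sync_mode": 2.6375, "binlog_batch_size": 2.3375,
        "parallel_workers": -1.3375, "network_buffer_kb": 2.1875, "compression": 3.4125,
    },
}

# (low, high) levels of each factor; low codes to -1, high to +1 (assumed coding).
LEVELS = {
    "sync_mode": ("async", "semi_sync"),
    "binlog_batch_size": (1, 100),
    "parallel_workers": (1, 16),
    "network_buffer_kb": (64, 1024),
    "compression": ("off", "on"),
}


def coded(factor, value):
    """Translate a factor level into coded units."""
    low, high = LEVELS[factor]
    if value == low:
        return -1.0
    if value == high:
        return 1.0
    raise ValueError(f"{value!r} is not a design level of {factor}")


def predict(response, settings):
    """Evaluate the linear model: intercept + sum(coefficient * coded level)."""
    model = MODELS[response]
    return model["intercept"] + sum(
        model[factor] * coded(factor, value) for factor, value in settings.items()
    )


lag_surface_optimum = {
    "sync_mode": "semi_sync", "binlog_batch_size": 100, "parallel_workers": 16,
    "network_buffer_kb": 64, "compression": "on",
}
readiness_surface_optimum = {
    "sync_mode": "semi_sync", "binlog_batch_size": 100, "parallel_workers": 1,
    "network_buffer_kb": 1024, "compression": "on",
}

print(round(predict("replication_lag_ms", lag_surface_optimum), 4))        # 0.25
print(round(predict("failover_ready_pct", readiness_surface_optimum), 4))  # 103.7
```

A predicted failover readiness above 100 % is a reminder that the linear model extrapolates freely. With adjusted R² values below zero for both responses, these predictions are best read as directional.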