Summary
This experiment investigates data replication lag using a fractional factorial design over 5 replication parameters, measuring both lag and failover readiness.
The design varies 5 factors: sync mode (async or semi_sync), binlog batch size (1 to 100 txns), parallel workers (1 to 16 threads), network buffer (64 to 1024 KB), and compression (off or on). The goal is to optimize 2 responses: replication lag in ms (minimize) and failover readiness in % (maximize). Conditions held constant across all runs are engine = mysql_8 and gtid = on.
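A fractional factorial design reduces the number of runs from 32 (the full 2^5 factorial) to 8 by deliberately confounding effects with one another. In this 8-run design each main effect is aliased with two-factor interactions, which is why several rows of the ANOVA tables share identical sums of squares. This makes the design suited to screening: identifying which of the 5 factors matter most before investing in a full study.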
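The reduction from 32 to 8 runs can be sketched as follows. This is a minimal illustration, not the tool's implementation: it assumes the two extra factors are generated as products of the base columns (D = A*B, E = A*C), a choice inferred from the run matrix in this report rather than documented by the tool.

```python
from itertools import product

FACTORS = ["sync_mode", "binlog_batch_size", "parallel_workers",
           "network_buffer_kb", "compression"]


def fractional_design():
    """Build a 2^(5-2) design in coded units (-1 = low, +1 = high).

    The first three factors get a full 2^3 factorial. The generators
    D = A*B and E = A*C are an assumption inferred from the report's matrix.
    """
    runs = []
    for a, b, c in product([-1, 1], repeat=3):
        runs.append((a, b, c, a * b, a * c))
    return runs


design = fractional_design()
full_runs = 2 ** len(FACTORS)
print(f"full factorial: {full_runs} runs, fraction: {len(design)} runs")
for run in design:
    print(dict(zip(FACTORS, run)))
```

Because two columns are computed from the other three, only 2^3 = 8 distinct combinations are ever run.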
Key Findings
For replication lag ms, the most influential factors were parallel workers (34.2%), binlog batch size (19.8%), and sync mode (16.7%). The best observed value was 0.0 ms (at sync mode = semi_sync, binlog batch size = 100, parallel workers = 16).
For failover ready pct, the most influential factors were parallel workers (45.3%), compression (24.9%), and network buffer kb (23.9%). The best observed value was 100.9% (at sync mode = semi_sync, binlog batch size = 100, parallel workers = 16).
Recommended Next Steps
- Follow up with a response surface design (CCD or Box-Behnken) on the top 3–4 factors to model curvature and find the true optimum.
- Consider whether any fixed factors should be varied in a future study.
- The screening results can guide factor reduction — drop factors contributing less than 5% and re-run with a smaller, more focused design.
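The 5% rule in the last step can be applied mechanically. The sketch below uses the percent contributions from the main-effects tables in this report. Requiring a factor to fall below the threshold for every response before dropping it is an assumption made here, not something the tool prescribes.

```python
# Percent contributions copied from the main-effects tables in this report.
CONTRIBUTION = {
    "replication_lag_ms": {
        "parallel_workers": 34.2, "binlog_batch_size": 19.8,
        "sync_mode": 16.7, "network_buffer_kb": 16.2, "compression": 13.2,
    },
    "failover_ready_pct": {
        "parallel_workers": 45.3, "compression": 24.9,
        "network_buffer_kb": 23.9, "sync_mode": 5.2, "binlog_batch_size": 0.6,
    },
}


def below_threshold(per_factor, threshold=5.0):
    """Factors contributing less than `threshold` percent to one response."""
    return [name for name, pct in per_factor.items() if pct < threshold]


def droppable(contribution, threshold=5.0):
    """Factors below the threshold for every response (assumed rule)."""
    factors = next(iter(contribution.values()))
    return [f for f in factors
            if all(resp[f] < threshold for resp in contribution.values())]


for response, per_factor in CONTRIBUTION.items():
    print(response, "->", below_threshold(per_factor))
print("droppable for all responses:", droppable(CONTRIBUTION))
```

With these numbers, binlog_batch_size falls below 5% only for failover_ready_pct, so no factor qualifies for removal across both responses.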
Experimental Setup
Factors
| Factor | Low | High | Unit |
| sync_mode | async | semi_sync | |
| binlog_batch_size | 1 | 100 | txns |
| parallel_workers | 1 | 16 | threads |
| network_buffer_kb | 64 | 1024 | KB |
| compression | off | on | |
Fixed: engine = mysql_8, gtid = on
Responses
| Response | Direction | Unit |
| replication_lag_ms | ↓ minimize | ms |
| failover_ready_pct | ↑ maximize | % |
Configuration
{
  "metadata": {
    "name": "Data Replication Lag",
    "description": "Fractional factorial of 5 replication parameters for lag and failover readiness"
  },
  "factors": [
    {
      "name": "sync_mode",
      "levels": [
        "async",
        "semi_sync"
      ],
      "type": "categorical",
      "unit": ""
    },
    {
      "name": "binlog_batch_size",
      "levels": [
        "1",
        "100"
      ],
      "type": "continuous",
      "unit": "txns"
    },
    {
      "name": "parallel_workers",
      "levels": [
        "1",
        "16"
      ],
      "type": "continuous",
      "unit": "threads"
    },
    {
      "name": "network_buffer_kb",
      "levels": [
        "64",
        "1024"
      ],
      "type": "continuous",
      "unit": "KB"
    },
    {
      "name": "compression",
      "levels": [
        "off",
        "on"
      ],
      "type": "categorical",
      "unit": ""
    }
  ],
  "fixed_factors": {
    "engine": "mysql_8",
    "gtid": "on"
  },
  "responses": [
    {
      "name": "replication_lag_ms",
      "optimize": "minimize",
      "unit": "ms"
    },
    {
      "name": "failover_ready_pct",
      "optimize": "maximize",
      "unit": "%"
    }
  ],
  "settings": {
    "operation": "fractional_factorial",
    "test_script": "use_cases/44_data_replication_lag/sim.sh"
  }
}
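As a quick sanity check on a configuration like the one above, the factor list can be read with Python's standard json module. The snippet embeds a trimmed copy of the factors so it runs on its own. How the tool itself parses the file is not shown in this report, so treat this as an independent sketch.

```python
import json
from math import prod

# Trimmed copy of the "factors" section from the configuration above.
CONFIG_TEXT = """
{
  "factors": [
    {"name": "sync_mode", "levels": ["async", "semi_sync"], "type": "categorical"},
    {"name": "binlog_batch_size", "levels": ["1", "100"], "type": "continuous"},
    {"name": "parallel_workers", "levels": ["1", "16"], "type": "continuous"},
    {"name": "network_buffer_kb", "levels": ["64", "1024"], "type": "continuous"},
    {"name": "compression", "levels": ["off", "on"], "type": "categorical"}
  ]
}
"""

config = json.loads(CONFIG_TEXT)

# A full factorial needs every combination of levels.
full_factorial_runs = prod(len(f["levels"]) for f in config["factors"])

# Levels are stored as strings, so continuous ones must be converted
# before any numeric use (string sorting would put "1024" before "64").
numeric_levels = {
    f["name"]: [float(v) for v in f["levels"]]
    for f in config["factors"]
    if f["type"] == "continuous"
}

print("full factorial runs:", full_factorial_runs)
print("numeric levels:", numeric_levels)
```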
Experimental Matrix
The Fractional Factorial Design produces 8 runs. Each row is one experiment with specific factor settings.
| Run | sync_mode | binlog_batch_size | parallel_workers | network_buffer_kb | compression |
| 1 | async | 100 | 16 | 64 | off |
| 2 | semi_sync | 1 | 1 | 64 | off |
| 3 | semi_sync | 100 | 1 | 1024 | off |
| 4 | semi_sync | 100 | 16 | 1024 | on |
| 5 | async | 100 | 1 | 64 | on |
| 6 | semi_sync | 1 | 16 | 64 | on |
| 7 | async | 1 | 1 | 1024 | on |
| 8 | async | 1 | 16 | 1024 | off |
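The alias structure of this matrix can be recovered directly from the eight rows. The sketch below codes each factor as -1 (low) or +1 (high), multiplies every pair of columns, and reports which two-factor interactions are indistinguishable from a main effect. The coding convention is an assumption. The run data is copied from the table above.

```python
from itertools import combinations

# (low, high) level for each factor; low maps to -1 and high to +1.
LEVELS = {
    "sync_mode": ("async", "semi_sync"),
    "binlog_batch_size": ("1", "100"),
    "parallel_workers": ("1", "16"),
    "network_buffer_kb": ("64", "1024"),
    "compression": ("off", "on"),
}

# Runs 1-8 exactly as listed in the experimental matrix.
RUNS = [
    ("async", "100", "16", "64", "off"),
    ("semi_sync", "1", "1", "64", "off"),
    ("semi_sync", "100", "1", "1024", "off"),
    ("semi_sync", "100", "16", "1024", "on"),
    ("async", "100", "1", "64", "on"),
    ("semi_sync", "1", "16", "64", "on"),
    ("async", "1", "1", "1024", "on"),
    ("async", "1", "16", "1024", "off"),
]

names = list(LEVELS)
coded = [[1 if value == LEVELS[n][1] else -1 for n, value in zip(names, run)]
         for run in RUNS]
column = {n: [row[i] for row in coded] for i, n in enumerate(names)}

# An interaction column identical to a main-effect column is aliased with it.
aliases = {}
for x, y in combinations(names, 2):
    interaction = [p * q for p, q in zip(column[x], column[y])]
    for n in names:
        if interaction == column[n]:
            aliases.setdefault(n, []).append(f"{x}*{y}")

for name in names:
    print(f"{name} is aliased with {aliases.get(name, [])}")
```

Every main effect turns out to share its column with at least one two-factor interaction, so an apparent main effect in the analysis may in fact come from the aliased interaction.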
Step-by-Step Workflow
1. Preview the design
$ doe info --config use_cases/44_data_replication_lag/config.json
2. Generate the runner script
$ doe generate --config use_cases/44_data_replication_lag/config.json \
    --output use_cases/44_data_replication_lag/results/run.sh --seed 42
3. Execute the experiments
$ bash use_cases/44_data_replication_lag/results/run.sh
4. Analyze results
$ doe analyze --config use_cases/44_data_replication_lag/config.json
5. Get optimization recommendations
$ doe optimize --config use_cases/44_data_replication_lag/config.json
6. Multi-objective optimization
With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.
$ doe optimize --config use_cases/44_data_replication_lag/config.json --multi
7. Generate the HTML report
$ doe report --config use_cases/44_data_replication_lag/config.json \
    --output use_cases/44_data_replication_lag/results/report.html
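For scripted use, the seven steps can be collected into one list of commands. The sketch below only prints them. It uses the subcommands and flags shown in the steps above and nothing else. The helper name workflow_commands is hypothetical.

```python
import shlex

BASE = "use_cases/44_data_replication_lag"
CONFIG = f"{BASE}/config.json"


def workflow_commands(seed=42):
    """Return the workflow steps as argument lists (hypothetical helper)."""
    return [
        ["doe", "info", "--config", CONFIG],
        ["doe", "generate", "--config", CONFIG,
         "--output", f"{BASE}/results/run.sh", "--seed", str(seed)],
        ["bash", f"{BASE}/results/run.sh"],
        ["doe", "analyze", "--config", CONFIG],
        ["doe", "optimize", "--config", CONFIG],
        ["doe", "optimize", "--config", CONFIG, "--multi"],
        ["doe", "report", "--config", CONFIG,
         "--output", f"{BASE}/results/report.html"],
    ]


commands = workflow_commands()
for cmd in commands:
    # Dry run: print each command instead of executing it.
    print(shlex.join(cmd))
```

To execute the steps rather than print them, each list could be passed to subprocess.run(cmd, check=True), assuming the doe command is installed and on the PATH.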
Features Exercised
| Feature | Value |
| Design type | fractional_factorial |
| Factor types | continuous (3), categorical (2) |
| Arg style | double-dash |
| Responses | 2 (replication_lag_ms ↓, failover_ready_pct ↑) |
| Total runs | 8 |
Analysis Results
Generated from actual experiment runs using the DOE Helper Tool.
Response: replication_lag_ms
Top factors: parallel_workers (34.2%), binlog_batch_size (19.8%), sync_mode (16.7%).
ANOVA
| Source | DF | SS | MS | F | p-value |
| sync_mode | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| binlog_batch_size | 1 | 19800.5000 | 19800.5000 | 1.006 | 0.3619 |
| parallel_workers | 1 | 58824.5000 | 58824.5000 | 2.989 | 0.1444 |
| network_buffer_kb | 1 | 13122.0000 | 13122.0000 | 0.667 | 0.4513 |
| compression | 1 | 8712.0000 | 8712.0000 | 0.443 | 0.5353 |
| sync_mode*binlog_batch_size | 1 | 13122.0000 | 13122.0000 | 0.667 | 0.4513 |
| sync_mode*parallel_workers | 1 | 8712.0000 | 8712.0000 | 0.443 | 0.5353 |
| sync_mode*network_buffer_kb | 1 | 19800.5000 | 19800.5000 | 1.006 | 0.3619 |
| sync_mode*compression | 1 | 58824.5000 | 58824.5000 | 2.989 | 0.1444 |
| binlog_batch_size*parallel_workers | 1 | 140450.0000 | 140450.0000 | 7.136 | 0.0443 |
| binlog_batch_size*network_buffer_kb | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| binlog_batch_size*compression | 1 | 0.5000 | 0.5000 | 0.000 | 0.9962 |
| parallel_workers*network_buffer_kb | 1 | 0.5000 | 0.5000 | 0.000 | 0.9962 |
| parallel_workers*compression | 1 | 13944.5000 | 13944.5000 | 0.708 | 0.4383 |
| network_buffer_kb*compression | 1 | 140450.0000 | 140450.0000 | 7.136 | 0.0443 |
| Error (Lenth PSE) | 5 | 98415.0000 | 19683.0000 | | |
| Total | 7 | 254854.0000 | 36407.7143 | | |
[Figures: Pareto Chart, Main Effects Plot, Normal Probability Plot of Effects, Half-Normal Plot of Effects, Model Diagnostics]
Response: failover_ready_pct
Top factors: parallel_workers (45.3%), compression (24.9%), network_buffer_kb (23.9%).
ANOVA
| Source | DF | SS | MS | F | p-value |
| sync_mode | 1 | 2.1012 | 2.1012 | 0.032 | 0.8650 |
| binlog_batch_size | 1 | 0.0312 | 0.0312 | 0.000 | 0.9834 |
| parallel_workers | 1 | 157.5312 | 157.5312 | 2.403 | 0.1818 |
| network_buffer_kb | 1 | 43.7112 | 43.7112 | 0.667 | 0.4513 |
| compression | 1 | 47.5312 | 47.5312 | 0.725 | 0.4334 |
| sync_mode*binlog_batch_size | 1 | 43.7113 | 43.7113 | 0.667 | 0.4513 |
| sync_mode*parallel_workers | 1 | 47.5313 | 47.5313 | 0.725 | 0.4334 |
| sync_mode*network_buffer_kb | 1 | 0.0313 | 0.0313 | 0.000 | 0.9834 |
| sync_mode*compression | 1 | 157.5313 | 157.5313 | 2.403 | 0.1818 |
| binlog_batch_size*parallel_workers | 1 | 116.2812 | 116.2812 | 1.773 | 0.2404 |
| binlog_batch_size*network_buffer_kb | 1 | 2.1013 | 2.1013 | 0.032 | 0.8650 |
| binlog_batch_size*compression | 1 | 1.2012 | 1.2012 | 0.018 | 0.8976 |
| parallel_workers*network_buffer_kb | 1 | 1.2013 | 1.2013 | 0.018 | 0.8976 |
| parallel_workers*compression | 1 | 2.1013 | 2.1013 | 0.032 | 0.8650 |
| network_buffer_kb*compression | 1 | 116.2812 | 116.2812 | 1.773 | 0.2404 |
| Error (Lenth PSE) | 5 | 327.8344 | 65.5669 | | |
| Total | 7 | 368.3888 | 52.6270 | | |
[Figures: Pareto Chart, Main Effects Plot, Normal Probability Plot of Effects, Half-Normal Plot of Effects, Model Diagnostics]
Response Surface Plots
3D surfaces fitted with quadratic RSM. Red dots are observed data points.
[Figures: failover_ready_pct surfaces for binlog_batch_size vs network_buffer_kb, binlog_batch_size vs parallel_workers, and parallel_workers vs network_buffer_kb]
[Figures: replication_lag_ms surfaces for binlog_batch_size vs network_buffer_kb, binlog_batch_size vs parallel_workers, and parallel_workers vs network_buffer_kb]
Multi-Objective Optimization
When responses compete, Derringer–Suich desirability finds the best compromise.
Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
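The combination step described above can be written in a few lines. This sketch covers only the weighted geometric mean. How the tool scales each response to its 0–1 desirability is not described in this report and is not reproduced here. The inputs are the per-response desirabilities and weights reported in this section.

```python
import math


def overall_desirability(d_values, weights):
    """Weighted geometric mean of per-response desirabilities in [0, 1]."""
    if any(d == 0 for d in d_values):
        return 0.0  # one fully undesirable response vetoes the whole setting
    total_weight = sum(weights)
    log_sum = sum(w * math.log(d) for d, w in zip(d_values, weights))
    return math.exp(log_sum / total_weight)


# replication_lag_ms: d = 0.9545, weight 1.0
# failover_ready_pct: d = 0.9545, weight 1.5
D = overall_desirability([0.9545, 0.9545], [1.0, 1.5])
print(f"D = {D:.4f}")
```

Because both responses have the same desirability here, the weights do not change the result. They matter only when the individual values differ.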
Overall Desirability
D = 0.9545
Per-Response Desirability
| Response | Weight | Desirability | Predicted | Dir |
| replication_lag_ms | 1.0 | 0.9545 | 0.00 ms | ↓ |
| failover_ready_pct | 1.5 | 0.9545 | 100.90 % | ↑ |
Recommended Settings
| Factor | Value |
| sync_mode | async |
| binlog_batch_size | 100 txns |
| parallel_workers | 16 threads |
| network_buffer_kb | 64 KB |
| compression | off |
Source: from observed run #1
Trade-off Summary
Sacrifice = how much worse than single-objective best.
| Response | Predicted | Best Observed | Sacrifice |
| replication_lag_ms | 0.00 | 0.00 | +0.00 |
| failover_ready_pct | 100.90 | 100.90 | +0.00 |
Top 3 Runs by Desirability
| Run | D | Factor Settings |
| #1 | 0.9545 | sync_mode=async, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=64, compression=off |
| #8 | 0.8543 | sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=1, network_buffer_kb=1024, compression=off |
| #4 | 0.7457 | sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=1024, compression=on |
Model Quality
| Response | R² | Type |
| replication_lag_ms | 0.6482 | linear |
| failover_ready_pct | 0.7414 | linear |
Full Multi-Objective Output
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.9545
Response Weight Desirability Predicted Direction
---------------------------------------------------------------------
replication_lag_ms 1.0 0.9545 0.00 ms ↓
failover_ready_pct 1.5 0.9545 100.90 % ↑
Recommended settings:
sync_mode = async
binlog_batch_size = 100 txns
parallel_workers = 16 threads
network_buffer_kb = 64 KB
compression = off
(from observed run #1)
Trade-off summary:
replication_lag_ms: 0.00 (best observed: 0.00, sacrifice: +0.00)
failover_ready_pct: 100.90 (best observed: 100.90, sacrifice: +0.00)
Model quality:
replication_lag_ms: R² = 0.6482 (linear)
failover_ready_pct: R² = 0.7414 (linear)
Top 3 observed runs by overall desirability:
1. Run #1 (D=0.9545): sync_mode=async, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=64, compression=off
2. Run #8 (D=0.8543): sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=1, network_buffer_kb=1024, compression=off
3. Run #4 (D=0.7457): sync_mode=semi_sync, binlog_batch_size=100, parallel_workers=16, network_buffer_kb=1024, compression=on
Full Analysis Output
=== Main Effects: replication_lag_ms ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
parallel_workers -171.5000 67.4608 34.2%
binlog_batch_size 99.5000 67.4608 19.8%
sync_mode -83.5000 67.4608 16.7%
network_buffer_kb 81.0000 67.4608 16.2%
compression 66.0000 67.4608 13.2%
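The % Contribution column above is consistent with each effect's absolute value divided by the sum of absolute main effects. That normalization is inferred from the numbers and is not documented in this report. The sketch below checks it against the table.

```python
# Main effects for replication_lag_ms, copied from the table above.
effects = {
    "parallel_workers": -171.5,
    "binlog_batch_size": 99.5,
    "sync_mode": -83.5,
    "network_buffer_kb": 81.0,
    "compression": 66.0,
}

# Assumed normalization: |effect| as a share of the sum of |effects|.
total = sum(abs(e) for e in effects.values())
contribution = {name: round(100 * abs(e) / total, 1)
                for name, e in effects.items()}

for name, pct in contribution.items():
    print(f"{name:20s} {pct:5.1f}%")
```

The sign of an effect gives its direction, while the contribution reflects only its magnitude relative to the other main effects.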
=== ANOVA Table: replication_lag_ms ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
sync_mode 1 13944.5000 13944.5000 0.708 0.4383
binlog_batch_size 1 19800.5000 19800.5000 1.006 0.3619
parallel_workers 1 58824.5000 58824.5000 2.989 0.1444
network_buffer_kb 1 13122.0000 13122.0000 0.667 0.4513
compression 1 8712.0000 8712.0000 0.443 0.5353
sync_mode*binlog_batch_size 1 13122.0000 13122.0000 0.667 0.4513
sync_mode*parallel_workers 1 8712.0000 8712.0000 0.443 0.5353
sync_mode*network_buffer_kb 1 19800.5000 19800.5000 1.006 0.3619
sync_mode*compression 1 58824.5000 58824.5000 2.989 0.1444
binlog_batch_size*parallel_workers 1 140450.0000 140450.0000 7.136 0.0443
binlog_batch_size*network_buffer_kb 1 13944.5000 13944.5000 0.708 0.4383
binlog_batch_size*compression 1 0.5000 0.5000 0.000 0.9962
parallel_workers*network_buffer_kb 1 0.5000 0.5000 0.000 0.9962
parallel_workers*compression 1 13944.5000 13944.5000 0.708 0.4383
network_buffer_kb*compression 1 140450.0000 140450.0000 7.136 0.0443
Error (Lenth PSE) 5 98415.0000 19683.0000
Total 7 254854.0000 36407.7143
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)
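For a two-level design with N runs, an effect's sum of squares is N × effect² / 4, and with one degree of freedom the mean square equals the sum of squares. The sketch below reproduces two rows of the table above from their effects. The error mean square is taken as given from the "Error (Lenth PSE)" row. The tool's exact pseudo-standard-error calculation is not reproduced.

```python
N_RUNS = 8
MS_ERROR = 19683.0  # mean square from the "Error (Lenth PSE)" row


def sum_of_squares(effect, n_runs=N_RUNS):
    """Sum of squares of a two-level effect: N * effect^2 / 4."""
    return n_runs * effect ** 2 / 4


def f_statistic(effect, ms_error=MS_ERROR):
    """F ratio for a 1-df term, where MS equals SS."""
    return sum_of_squares(effect) / ms_error


# parallel_workers main effect and the
# binlog_batch_size*parallel_workers interaction from this report.
for label, effect in [("parallel_workers", -171.5),
                      ("binlog_batch_size*parallel_workers", -265.0)]:
    print(f"{label}: SS = {sum_of_squares(effect):.4f}, "
          f"F = {f_statistic(effect):.3f}")
```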
=== Interaction Effects: replication_lag_ms ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
binlog_batch_size parallel_workers -265.0000 23.7%
network_buffer_kb compression 265.0000 23.7%
sync_mode compression -171.5000 15.4%
sync_mode network_buffer_kb -99.5000 8.9%
binlog_batch_size network_buffer_kb 83.5000 7.5%
parallel_workers compression -83.5000 7.5%
sync_mode binlog_batch_size -81.0000 7.3%
sync_mode parallel_workers 66.0000 5.9%
binlog_batch_size compression -0.5000 0.0%
parallel_workers network_buffer_kb 0.5000 0.0%
=== Summary Statistics: replication_lag_ms ===
sync_mode:
Level N Mean Std Min Max
------------------------------------------------------------
async 4 259.7500 230.1787 99.0000 601.0000
semi_sync 4 176.2500 165.2904 0.0000 371.0000
binlog_batch_size:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 168.2500 66.2590 87.0000 247.0000
100 4 267.7500 271.9576 0.0000 601.0000
parallel_workers:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 303.7500 232.1571 87.0000 601.0000
16 4 132.2500 106.9871 0.0000 247.0000
network_buffer_kb:
Level N Mean Std Min Max
------------------------------------------------------------
1024 4 177.5000 152.1414 0.0000 371.0000
64 4 258.5000 239.6463 87.0000 601.0000
compression:
Level N Mean Std Min Max
------------------------------------------------------------
off 4 185.0000 131.1488 87.0000 371.0000
on 4 251.0000 254.6514 0.0000 601.0000
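Each main effect for replication_lag_ms can be recovered from the level means above as the mean at the second listed level minus the mean at the first. The sketch keeps the levels in the order the tables list them. For network_buffer_kb that order is 1024 then 64, so the reported +81.0 corresponds to mean(64) minus mean(1024). Treating the listing order as the low/high order is an inference from the numbers.

```python
# (level, mean) pairs in the order the summary tables list them.
LEVEL_MEANS = {
    "sync_mode": [("async", 259.75), ("semi_sync", 176.25)],
    "binlog_batch_size": [("1", 168.25), ("100", 267.75)],
    "parallel_workers": [("1", 303.75), ("16", 132.25)],
    "network_buffer_kb": [("1024", 177.50), ("64", 258.50)],
    "compression": [("off", 185.00), ("on", 251.00)],
}

# Effect = mean at second listed level - mean at first listed level.
effects = {name: second[1] - first[1]
           for name, (first, second) in LEVEL_MEANS.items()}

for name, effect in effects.items():
    first, second = LEVEL_MEANS[name]
    print(f"{name:20s} {second[0]} vs {first[0]}: {effect:+.4f}")
```

Read numerically, moving network_buffer_kb from 64 KB to 1024 KB lowers mean lag by 81 ms in these runs.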
=== Main Effects: failover_ready_pct ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
parallel_workers 8.8750 2.5648 45.3%
compression -4.8750 2.5648 24.9%
network_buffer_kb -4.6750 2.5648 23.9%
sync_mode 1.0250 2.5648 5.2%
binlog_batch_size 0.1250 2.5648 0.6%
=== ANOVA Table: failover_ready_pct ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
sync_mode 1 2.1012 2.1012 0.032 0.8650
binlog_batch_size 1 0.0312 0.0312 0.000 0.9834
parallel_workers 1 157.5312 157.5312 2.403 0.1818
network_buffer_kb 1 43.7112 43.7112 0.667 0.4513
compression 1 47.5312 47.5312 0.725 0.4334
sync_mode*binlog_batch_size 1 43.7113 43.7113 0.667 0.4513
sync_mode*parallel_workers 1 47.5313 47.5313 0.725 0.4334
sync_mode*network_buffer_kb 1 0.0313 0.0313 0.000 0.9834
sync_mode*compression 1 157.5313 157.5313 2.403 0.1818
binlog_batch_size*parallel_workers 1 116.2812 116.2812 1.773 0.2404
binlog_batch_size*network_buffer_kb 1 2.1013 2.1013 0.032 0.8650
binlog_batch_size*compression 1 1.2012 1.2012 0.018 0.8976
parallel_workers*network_buffer_kb 1 1.2013 1.2013 0.018 0.8976
parallel_workers*compression 1 2.1013 2.1013 0.032 0.8650
network_buffer_kb*compression 1 116.2812 116.2812 1.773 0.2404
Error (Lenth PSE) 5 327.8344 65.5669
Total 7 368.3888 52.6270
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)
=== Interaction Effects: failover_ready_pct ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
sync_mode compression 8.8750 23.7%
binlog_batch_size parallel_workers 7.6250 20.4%
network_buffer_kb compression -7.6250 20.4%
sync_mode parallel_workers -4.8750 13.0%
sync_mode binlog_batch_size 4.6750 12.5%
binlog_batch_size network_buffer_kb -1.0250 2.7%
parallel_workers compression 1.0250 2.7%
binlog_batch_size compression 0.7750 2.1%
parallel_workers network_buffer_kb -0.7750 2.1%
sync_mode network_buffer_kb -0.1250 0.3%
=== Summary Statistics: failover_ready_pct ===
sync_mode:
Level N Mean Std Min Max
------------------------------------------------------------
async 4 91.2750 9.2500 78.7000 99.3000
semi_sync 4 92.3000 6.0443 87.7000 100.9000
binlog_batch_size:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 91.7250 3.9500 87.7000 97.0000
100 4 91.8500 10.3529 78.7000 100.9000
parallel_workers:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 87.3500 5.9518 78.7000 92.1000
16 4 96.2250 5.9044 87.7000 100.9000
network_buffer_kb:
Level N Mean Std Min Max
------------------------------------------------------------
1024 4 94.1250 5.8312 88.5000 100.9000
64 4 89.4500 8.6153 78.7000 99.3000
compression:
Level N Mean Std Min Max
------------------------------------------------------------
off 4 94.2250 4.8562 88.5000 99.3000
on 4 89.3500 9.1307 78.7000 100.9000
Optimization Recommendations
=== Optimization: replication_lag_ms ===
Direction: minimize
Best observed run: #1
sync_mode = semi_sync
binlog_batch_size = 100
parallel_workers = 16
network_buffer_kb = 1024
compression = on
Value: 0.0
RSM Model (linear, R² = 0.4952, Adj R² = -0.7669):
Coefficients:
intercept +218.0000
sync_mode -88.7500
binlog_batch_size -40.5000
parallel_workers -7.0000
network_buffer_kb +2.7500
compression -78.7500
Predicted optimum (from linear model, at observed points):
sync_mode = async
binlog_batch_size = 1
parallel_workers = 16
network_buffer_kb = 1024
compression = off
Predicted value: 421.7500
Surface optimum (via L-BFGS-B, linear model):
sync_mode = semi_sync
binlog_batch_size = 100
parallel_workers = 16
network_buffer_kb = 64
compression = on
Predicted value: 0.2500
Model quality: Weak fit — consider adding center points or using a different design.
Factor importance:
1. sync_mode (effect: -177.5, contribution: 40.8%)
2. compression (effect: -157.5, contribution: 36.2%)
3. binlog_batch_size (effect: -81.0, contribution: 18.6%)
4. parallel_workers (effect: -14.0, contribution: 3.2%)
5. network_buffer_kb (effect: -5.5, contribution: 1.3%)
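The predicted values above follow from the linear model evaluated in coded units. The sketch assumes -1 for the low level and +1 for the high level of each factor, which reproduces both reported predictions. It then checks all 32 corner points to find where the model is smallest.

```python
from itertools import product

# Coefficients of the linear model for replication_lag_ms, from the output above.
COEF = {
    "intercept": 218.0,
    "sync_mode": -88.75,
    "binlog_batch_size": -40.5,
    "parallel_workers": -7.0,
    "network_buffer_kb": 2.75,
    "compression": -78.75,
}
NAMES = [n for n in COEF if n != "intercept"]


def predict(coded):
    """Evaluate the linear model at coded settings (-1 = low, +1 = high)."""
    return COEF["intercept"] + sum(COEF[n] * coded[n] for n in NAMES)


# "Surface optimum": semi_sync, 100, 16, 64 KB, on.
surface_opt = {"sync_mode": 1, "binlog_batch_size": 1, "parallel_workers": 1,
               "network_buffer_kb": -1, "compression": 1}
# Settings listed under "Predicted optimum": async, 1, 16, 1024 KB, off.
listed_opt = {"sync_mode": -1, "binlog_batch_size": -1, "parallel_workers": 1,
              "network_buffer_kb": 1, "compression": -1}

print(predict(surface_opt), predict(listed_opt))

# For a linear model the extreme values lie at corners of the coded cube.
corners = [dict(zip(NAMES, levels)) for levels in product([-1, 1], repeat=5)]
best_corner = min(corners, key=predict)
print("lowest predicted lag:", predict(best_corner), "at", best_corner)
```

The lowest corner coincides with the surface optimum settings, at a predicted lag of 0.25 ms.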
=== Optimization: failover_ready_pct ===
Direction: maximize
Best observed run: #1
sync_mode = semi_sync
binlog_batch_size = 100
parallel_workers = 16
network_buffer_kb = 1024
compression = on
Value: 100.9
RSM Model (linear, R² = 0.6654, Adj R² = -0.1712):
Coefficients:
intercept +91.7875
sync_mode +2.6375
binlog_batch_size +2.3375
parallel_workers -1.3375
network_buffer_kb +2.1875
compression +3.4125
Predicted optimum (from linear model, at observed points):
sync_mode = semi_sync
binlog_batch_size = 100
parallel_workers = 16
network_buffer_kb = 1024
compression = on
Predicted value: 101.0250
Surface optimum (via L-BFGS-B, linear model):
sync_mode = semi_sync
binlog_batch_size = 100
parallel_workers = 1
network_buffer_kb = 1024
compression = on
Predicted value: 103.7000
Model quality: Moderate fit — use predictions directionally, not precisely.
Factor importance:
1. compression (effect: 6.8, contribution: 28.6%)
2. sync_mode (effect: 5.3, contribution: 22.1%)
3. binlog_batch_size (effect: 4.7, contribution: 19.6%)
4. network_buffer_kb (effect: -4.4, contribution: 18.4%)
5. parallel_workers (effect: -2.7, contribution: 11.2%)
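The negative adjusted R² values reported for both models follow from the standard adjustment formula: fitting 5 linear terms plus an intercept to 8 runs leaves only 2 residual degrees of freedom. The sketch below applies the formula to the rounded R² values printed above, so the last digit can differ slightly from the tool's output.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R^2 for a model with n_terms predictors plus an intercept."""
    return 1 - (1 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


# R^2 values as reported in the optimization output (8 runs, 5 factors).
adj_lag = adjusted_r2(0.4952, n_runs=8, n_terms=5)
adj_failover = adjusted_r2(0.6654, n_runs=8, n_terms=5)

print(f"replication_lag_ms: Adj R2 = {adj_lag:.4f}")
print(f"failover_ready_pct: Adj R2 = {adj_failover:.4f}")
```

An adjusted R² below zero indicates that the fit explains less than would be expected from the number of terms alone, which matches the weak-fit and moderate-fit warnings in the output.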