Box-Behnken Design

Stream Processing Windowing

Box-Behnken design to tune Flink window size, watermark delay, and parallelism for latency and accuracy

Summary

This experiment investigates windowing in stream processing: a Box-Behnken design is used to tune Apache Flink window size, watermark delay, and parallelism for end-to-end latency and result accuracy.

The design varies 3 factors: window size (5 to 120 s), watermark delay (1 to 30 s), and parallelism (4 to 32 slots). The goal is to optimize 2 responses: end-to-end latency in ms (minimize) and result accuracy in % (maximize). Conditions held constant across all runs are a checkpoint interval of 10000 ms and the RocksDB state backend.

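A Box-Behnken design was chosen because it efficiently fits quadratic models with 3 continuous factors while avoiding extreme corner combinations. It requires only 15 runs (12 edge-midpoint runs plus 3 center points) instead of the 27 needed for a three-level full factorial; a two-level full factorial would need only 8 runs but cannot estimate curvature.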
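
The run count follows directly from how a three-factor Box-Behnken design is built. A minimal Python sketch of the standard construction: each pair of factors takes the four (±1, ±1) combinations while the third factor stays at its midpoint, and center points are appended. The 3 center points are taken from the experimental matrix; the function names are illustrative and not part of the DOE tool.

```python
from itertools import combinations, product

# Factor ranges for this use case: (low, high) in actual units.
RANGES = {
    "window_size_s": (5, 120),
    "watermark_delay_s": (1, 30),
    "parallelism": (4, 32),
}


def box_behnken_coded(k, center_points):
    """Pairwise Box-Behnken layout in coded units (-1, 0, +1).

    Every pair of factors takes the four (+/-1, +/-1) combinations while the
    remaining factor(s) sit at 0; all-zero center points are appended.
    """
    runs = []
    for i, j in combinations(range(k), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0] * k
            row[i], row[j] = a, b
            runs.append(tuple(row))
    runs.extend([(0,) * k] * center_points)
    return runs


def decode(coded_row):
    """Convert a coded row back to actual factor settings."""
    settings = {}
    for x, (name, (low, high)) in zip(coded_row, RANGES.items()):
        center, half_range = (low + high) / 2, (high - low) / 2
        settings[name] = center + x * half_range
    return settings


design = box_behnken_coded(k=3, center_points=3)
print(len(design), "Box-Behnken runs vs", 3 ** 3, "for a 3^3 full factorial")
for row in design:
    print(row, decode(row))
```

Working in coded units keeps the construction independent of the factor ranges; only `decode` knows about seconds and slots.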

Quadratic response surface models were fitted to capture potential curvature and factor interactions. The response surface plots below visualize how pairs of factors jointly affect each response.
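
For three factors, the full quadratic model has 10 coefficients: an intercept, 3 linear terms, 3 two-factor interactions, and 3 squared terms (the same terms listed under "RSM Model (quadratic ...)" in the optimization output). With n = 15 runs and p = 9 non-intercept terms, only 5 residual degrees of freedom remain, which is why adjusted R² can turn negative even when R² is positive. Here x_i denotes the setting of factor i (commonly in coded units); the numeric check uses the quadratic latency model's reported R² = 0.3227 and lands close to its reported Adj R² = -0.8963.

```latex
\begin{aligned}
\hat{y} &= \beta_0 + \sum_{i=1}^{3} \beta_i x_i
          + \sum_{i=1}^{2} \sum_{j=i+1}^{3} \beta_{ij} x_i x_j
          + \sum_{i=1}^{3} \beta_{ii} x_i^2 \\
R^2_{\mathrm{adj}} &= 1 - \left(1 - R^2\right) \frac{n - 1}{n - p - 1}
  = 1 - (1 - 0.3227) \cdot \frac{15 - 1}{15 - 9 - 1} \approx -0.896
\end{aligned}
```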

Key Findings

For end-to-end latency, the most influential factors were parallelism (39.9%), window size (36.6%), and watermark delay (23.5%), although none of the three is statistically significant (ANOVA p-values 0.29 to 0.57). The best observed value was 126.0 ms, at window size = 120 s, watermark delay = 1 s, and parallelism = 18.

For result accuracy, the most influential factors were parallelism (38.5%), window size (36.4%), and watermark delay (25.1%); again, none is statistically significant (ANOVA p-values 0.51 to 0.73). The best observed value was 97.9%, at the center point (window size = 62.5 s, watermark delay = 15.5 s, parallelism = 18).
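
The % contribution figures are consistent with a simple range-based calculation: each factor's effect is its largest minus its smallest level mean, and its contribution is that effect divided by the sum of all three effects. The sketch below reproduces the reported percentages from the level means in the summary statistics of the analysis output. This is an inference about how the tool computes them, not its documented algorithm; because it ranks marginal level means, it says nothing about statistical significance.

```python
# Per-level response means copied from the "Summary Statistics" section of
# the `doe analyze` output (the order of levels does not matter here).
LEVEL_MEANS = {
    "end_to_end_latency_ms": {
        "window_size_s": [293.7500, 588.7500, 655.7143],
        "watermark_delay_s": [406.7500, 639.2857, 504.5000],
        "parallelism": [538.7143, 346.5000, 740.7500],
    },
    "result_accuracy": {
        "window_size_s": [87.8000, 89.6250, 92.3857],
        "watermark_delay_s": [90.4750, 91.5571, 88.4000],
        "parallelism": [89.6286, 88.7000, 93.5500],
    },
}


def contributions(level_means):
    """Effect = max level mean - min level mean; contribution = share of the total effect."""
    effects = {name: max(means) - min(means) for name, means in level_means.items()}
    total = sum(effects.values())
    return {name: (effect, 100.0 * effect / total) for name, effect in effects.items()}


for response, level_means in LEVEL_MEANS.items():
    print(response)
    ranked = sorted(contributions(level_means).items(), key=lambda kv: -kv[1][0])
    for name, (effect, pct) in ranked:
        print(f"  {name:18} effect={effect:9.4f}  contribution={pct:5.1f}%")
```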

Recommended Next Steps

Experimental Setup

Factors

Factor | Low | High | Unit
window_size_s | 5 | 120 | s
watermark_delay_s | 1 | 30 | s
parallelism | 4 | 32 | slots

Fixed: checkpoint_interval_ms = 10000, state_backend = rocksdb

Responses

Response | Direction | Unit
end_to_end_latency_ms | ↓ minimize | ms
result_accuracy | ↑ maximize | %

Configuration

use_cases/39_stream_processing_windowing/config.json
{
  "metadata": {
    "name": "Stream Processing Windowing",
    "description": "Box-Behnken design to tune Flink window size, watermark delay, and parallelism for latency and accuracy"
  },
  "factors": [
    { "name": "window_size_s", "levels": ["5", "120"], "type": "continuous", "unit": "s" },
    { "name": "watermark_delay_s", "levels": ["1", "30"], "type": "continuous", "unit": "s" },
    { "name": "parallelism", "levels": ["4", "32"], "type": "continuous", "unit": "slots" }
  ],
  "fixed_factors": { "checkpoint_interval_ms": "10000", "state_backend": "rocksdb" },
  "responses": [
    { "name": "end_to_end_latency_ms", "optimize": "minimize", "unit": "ms" },
    { "name": "result_accuracy", "optimize": "maximize", "unit": "%" }
  ],
  "settings": {
    "operation": "box_behnken",
    "test_script": "use_cases/39_stream_processing_windowing/sim.sh"
  }
}
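
The middle level used for every factor in the experimental matrix is the midpoint of the range declared in config.json. A minimal Python sketch that parses an inline copy of the `factors` array shown above and derives each factor's center and the usual -1..+1 coded scale. The helper names `factor_geometry` and `to_coded` are hypothetical and not part of the DOE tool.

```python
import json

# Inline copy of the "factors" array from config.json.
CONFIG_TEXT = """
{"factors": [
  {"name": "window_size_s", "levels": ["5", "120"], "type": "continuous", "unit": "s"},
  {"name": "watermark_delay_s", "levels": ["1", "30"], "type": "continuous", "unit": "s"},
  {"name": "parallelism", "levels": ["4", "32"], "type": "continuous", "unit": "slots"}
]}
"""


def factor_geometry(config):
    """Return {name: (low, center, high)} for each continuous factor.

    Levels are stored as strings in the config, so they are converted to float.
    """
    out = {}
    for factor in config["factors"]:
        low, high = (float(v) for v in factor["levels"])
        out[factor["name"]] = (low, (low + high) / 2, high)
    return out


def to_coded(value, low, high):
    """Map an actual setting onto the -1..+1 coded scale used by RSM designs."""
    return (value - (low + high) / 2) / ((high - low) / 2)


geometry = factor_geometry(json.loads(CONFIG_TEXT))
for name, (low, center, high) in geometry.items():
    print(f"{name}: low={low:g} center={center:g} high={high:g}")
```

To read the real file instead of the inline copy, open use_cases/39_stream_processing_windowing/config.json and pass the result of `json.load` to `factor_geometry`.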

Experimental Matrix

The Box-Behnken design produces 15 runs: 12 edge-midpoint runs plus 3 replicated center points (runs 2, 5, and 6). Each row is one experiment with specific factor settings.

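Run | window_size_s | watermark_delay_s | parallelism
1 | 62.5 | 1 | 4
2 | 62.5 | 15.5 | 18
3 | 120 | 15.5 | 32
4 | 120 | 15.5 | 4
5 | 62.5 | 15.5 | 18
6 | 62.5 | 15.5 | 18
7 | 5 | 15.5 | 32
8 | 120 | 1 | 18
9 | 62.5 | 1 | 32
10 | 120 | 30 | 18
11 | 5 | 15.5 | 4
12 | 62.5 | 30 | 32
13 | 5 | 1 | 18
14 | 5 | 30 | 18
15 | 62.5 | 30 | 4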
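
The matrix can be checked against the Box-Behnken structure. The sketch below (a hypothetical helper in plain Python) confirms that 3 runs sit at the center and the other 12 each have exactly one factor at its midpoint, then tallies runs per level; the tallies match the N column of the summary statistics in the analysis output (7 runs at each midpoint, 4 at each extreme).

```python
from collections import Counter

FACTORS = ("window_size_s", "watermark_delay_s", "parallelism")
MIDPOINTS = (62.5, 15.5, 18)

# Runs 1-15 of the experimental matrix, in order:
# (window_size_s, watermark_delay_s, parallelism).
MATRIX = [
    (62.5, 1, 4), (62.5, 15.5, 18), (120, 15.5, 32), (120, 15.5, 4), (62.5, 15.5, 18),
    (62.5, 15.5, 18), (5, 15.5, 32), (120, 1, 18), (62.5, 1, 32), (120, 30, 18),
    (5, 15.5, 4), (62.5, 30, 32), (5, 1, 18), (5, 30, 18), (62.5, 30, 4),
]


def classify_runs(matrix, midpoints):
    """Return (center_runs, edge_runs) for a three-factor Box-Behnken matrix.

    Center runs have every factor at its midpoint; edge runs have exactly one.
    Anything else (for example a cube corner) is rejected.
    """
    center = edge = 0
    for row in matrix:
        at_mid = sum(value == mid for value, mid in zip(row, midpoints))
        if at_mid == len(midpoints):
            center += 1
        elif at_mid == 1:
            edge += 1
        else:
            raise ValueError(f"not a Box-Behnken point: {row}")
    return center, edge


level_counts = {name: Counter(row[i] for row in MATRIX) for i, name in enumerate(FACTORS)}

print(classify_runs(MATRIX, MIDPOINTS))  # (3, 12)
for name, counts in level_counts.items():
    print(name, dict(counts))
```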

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/39_stream_processing_windowing/config.json

2. Generate the runner script

$ doe generate --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/39_stream_processing_windowing/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/39_stream_processing_windowing/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/39_stream_processing_windowing/config.json

6. Multi-objective optimization

With two competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/39_stream_processing_windowing/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/report.html

Features Exercised

Feature | Value
Design type | box_behnken
Factor types | continuous (all 3)
Arg style | double-dash
Responses | 2 (end_to_end_latency_ms ↓, result_accuracy ↑)
Total runs | 15

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: end_to_end_latency_ms

Top factors: parallelism (39.9%), window_size_s (36.6%), watermark_delay_s (23.5%).

ANOVA

Source | DF | SS | MS | F | p-value
window_size_s | 2 | 345764.4048 | 172882.2024 | 1.437 | 0.2930
watermark_delay_s | 2 | 145040.1548 | 72520.0774 | 0.603 | 0.5704
parallelism | 2 | 310956.1548 | 155478.0774 | 1.292 | 0.3264
Lack of Fit | 6 | 314296.6190 | 52382.7698 | 0.435 | 0.8183
Pure Error | 2 | 240648.0000 | 120324.0000 | |
Error | 8 | 554944.6190 | 120324.0000 | |
Total | 14 | 1356705.3333 | 96907.5238 | |

Pareto Chart

Pareto chart for end_to_end_latency_ms

Main Effects Plot

Main effects plot for end_to_end_latency_ms

Normal Probability Plot of Effects

Normal probability plot for end_to_end_latency_ms

Half-Normal Plot of Effects

Half-normal plot for end_to_end_latency_ms

Model Diagnostics

Model diagnostics for end_to_end_latency_ms

Response: result_accuracy

Top factors: parallelism (38.5%), window_size_s (36.4%), watermark_delay_s (25.1%).

ANOVA

Source | DF | SS | MS | F | p-value
window_size_s | 2 | 57.0333 | 28.5166 | 0.742 | 0.5063
watermark_delay_s | 2 | 25.3847 | 12.6923 | 0.330 | 0.7281
parallelism | 2 | 55.4050 | 27.7025 | 0.721 | 0.5155
Lack of Fit | 6 | 93.5463 | 15.5911 | 0.406 | 0.8346
Pure Error | 2 | 76.8800 | 38.4400 | |
Error | 8 | 170.4263 | 38.4400 | |
Total | 14 | 308.2493 | 22.0178 | |

Pareto Chart

Pareto chart for result_accuracy

Main Effects Plot

Main effects plot for result_accuracy

Normal Probability Plot of Effects

Normal probability plot for result_accuracy

Half-Normal Plot of Effects

Half-normal plot for result_accuracy

Model Diagnostics

Model diagnostics for result_accuracy

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

RSM surface: end_to_end_latency_ms, watermark_delay_s vs parallelism

RSM surface: end_to_end_latency_ms, window_size_s vs parallelism

RSM surface: end_to_end_latency_ms, window_size_s vs watermark_delay_s

RSM surface: result_accuracy, watermark_delay_s vs parallelism

RSM surface: result_accuracy, window_size_s vs parallelism

RSM surface: result_accuracy, window_size_s vs watermark_delay_s

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
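
The standard one-sided Derringer–Suich transforms for "smaller is better" and "larger is better" responses are sketched below. The bounds `low`/`high` and the shape exponent `s` are illustrative assumptions: this page does not show the bounds that doe optimize --multi uses, so these functions will not necessarily reproduce its per-response values.

```python
def desirability_minimize(y, low, high, s=1.0):
    """'Smaller is better': d = 1 at or below `low`, 0 at or above `high`."""
    if y <= low:
        return 1.0
    if y >= high:
        return 0.0
    return ((high - y) / (high - low)) ** s


def desirability_maximize(y, low, high, s=1.0):
    """'Larger is better': d = 0 at or below `low`, 1 at or above `high`."""
    if y <= low:
        return 0.0
    if y >= high:
        return 1.0
    return ((y - low) / (high - low)) ** s


# Hypothetical bounds, for illustration only.
print(desirability_minimize(600, low=100, high=1100))  # 0.5
print(desirability_maximize(95, low=80, high=100))     # 0.75
```

Raising `s` above 1 makes the desirability fall off faster as the response moves away from its target bound.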

Overall Desirability
D = 0.7094

Per-Response Desirability

Response | Weight | Desirability | Predicted | Direction
end_to_end_latency_ms | 1.0 | 0.5755 | 568.00 ms | ↓ minimize
result_accuracy | 1.5 | 0.8155 | 95.30 % | ↑ maximize
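
The overall D = 0.7094 follows from the two per-response desirabilities and weights in this table via a weighted geometric mean. A minimal sketch (the function name is illustrative):

```python
import math


def overall_desirability(ds, weights):
    """Weighted geometric mean: D = (prod d_i ** w_i) ** (1 / sum w_i).

    Any d_i of 0 makes D = 0, so one unacceptable response vetoes the settings.
    """
    if any(d <= 0.0 for d in ds):
        return 0.0
    log_sum = sum(w * math.log(d) for d, w in zip(ds, weights))
    return math.exp(log_sum / sum(weights))


D = overall_desirability([0.5755, 0.8155], weights=[1.0, 1.5])
print(f"D = {D:.4f}")  # D = 0.7094
```

The higher weight on result_accuracy (1.5) pulls D toward that response's desirability of 0.8155.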

Recommended Settings

Factor | Value
window_size_s | 120 s
watermark_delay_s | 15.5 s
parallelism | 4 slots

Source: from observed run #12

Trade-off Summary

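Sacrifice is how much worse the compromise prediction is than the best value observed for that response on its own (the single-objective best).

Response | Predicted | Best Observed | Sacrifice
end_to_end_latency_ms | 568.00 | 126.00 | +442.00
result_accuracy | 95.30 | 97.90 | +2.60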
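
Both sacrifice figures (+442.00 ms and +2.60 %) are consistent with a sign convention in which a positive value always means "worse", whichever direction the response is optimized. A minimal sketch of that convention (an inference from the reported numbers, not the tool's code):

```python
def sacrifice(predicted, best_observed, direction):
    """How much worse the compromise prediction is than the single-objective best.

    Positive values mean 'worse' for both minimized and maximized responses.
    """
    if direction == "minimize":
        return predicted - best_observed
    if direction == "maximize":
        return best_observed - predicted
    raise ValueError(f"unknown direction: {direction!r}")


print(f"end_to_end_latency_ms: {sacrifice(568.00, 126.00, 'minimize'):+.2f}")  # +442.00
print(f"result_accuracy: {sacrifice(95.30, 97.90, 'maximize'):+.2f}")          # +2.60
```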

Top 3 Runs by Desirability

Run | D | Factor Settings
#12 | 0.7094 | window_size_s=120, watermark_delay_s=15.5, parallelism=4
#2 | 0.6524 | window_size_s=62.5, watermark_delay_s=1, parallelism=32
#6 | 0.6443 | window_size_s=62.5, watermark_delay_s=30, parallelism=4

Model Quality

Response | R² | Model Type
end_to_end_latency_ms | 0.0720 | linear
result_accuracy | 0.0390 | linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.7094

Response                 Weight  Desirability  Predicted  Direction
---------------------------------------------------------------------
end_to_end_latency_ms    1.0     0.5755        568.00 ms  ↓
result_accuracy          1.5     0.8155        95.30 %    ↑

Recommended settings:
  window_size_s = 120 s
  watermark_delay_s = 15.5 s
  parallelism = 4 slots
  (from observed run #12)

Trade-off summary:
  end_to_end_latency_ms: 568.00 (best observed: 126.00, sacrifice: +442.00)
  result_accuracy: 95.30 (best observed: 97.90, sacrifice: +2.60)

Model quality:
  end_to_end_latency_ms: R² = 0.0720 (linear)
  result_accuracy: R² = 0.0390 (linear)

Top 3 observed runs by overall desirability:
  1. Run #12 (D=0.7094): window_size_s=120, watermark_delay_s=15.5, parallelism=4
  2. Run #2 (D=0.6524): window_size_s=62.5, watermark_delay_s=1, parallelism=32
  3. Run #6 (D=0.6443): window_size_s=62.5, watermark_delay_s=30, parallelism=4

Full Analysis Output

doe analyze
=== Main Effects: end_to_end_latency_ms ===
Factor              Effect      Std Error   % Contribution
--------------------------------------------------------------
parallelism         394.2500    80.3772     39.9%
window_size_s       361.9643    80.3772     36.6%
watermark_delay_s   232.5357    80.3772     23.5%

=== ANOVA Table: end_to_end_latency_ms ===
Source              DF   SS             MS            F       p-value
-----------------------------------------------------------------------------
window_size_s       2    345764.4048    172882.2024   1.437   0.2930
watermark_delay_s   2    145040.1548    72520.0774    0.603   0.5704
parallelism         2    310956.1548    155478.0774   1.292   0.3264
Lack of Fit         6    314296.6190    52382.7698    0.435   0.8183
Pure Error          2    240648.0000    120324.0000
Error               8    554944.6190    120324.0000
Total               14   1356705.3333   96907.5238

=== Summary Statistics: end_to_end_latency_ms ===

window_size_s:
Level   N   Mean       Std        Min        Max
------------------------------------------------------------
120     4   293.7500   179.9007   126.0000   510.0000
5       4   588.7500   373.7007   170.0000   1065.0000
62.5    7   655.7143   287.1966   366.0000   1186.0000

watermark_delay_s:
Level   N   Mean       Std        Min        Max
------------------------------------------------------------
1       4   406.7500   219.6988   126.0000   647.0000
15.5    7   639.2857   363.9692   170.0000   1186.0000
30      4   504.5000   301.1207   167.0000   900.0000

parallelism:
Level   N   Mean       Std        Min        Max
------------------------------------------------------------
18      7   538.7143   353.2344   126.0000   1186.0000
32      4   346.5000   128.4199   170.0000   478.0000
4       4   740.7500   287.3017   488.0000   1065.0000

=== Main Effects: result_accuracy ===
Factor              Effect   Std Error   % Contribution
--------------------------------------------------------------
parallelism         4.8500   1.2116      38.5%
window_size_s       4.5857   1.2116      36.4%
watermark_delay_s   3.1571   1.2116      25.1%

=== ANOVA Table: result_accuracy ===
Source              DF   SS         MS        F       p-value
-----------------------------------------------------------------------------
window_size_s       2    57.0333    28.5166   0.742   0.5063
watermark_delay_s   2    25.3847    12.6923   0.330   0.7281
parallelism         2    55.4050    27.7025   0.721   0.5155
Lack of Fit         6    93.5463    15.5911   0.406   0.8346
Pure Error          2    76.8800    38.4400
Error               8    170.4263   38.4400
Total               14   308.2493   22.0178

=== Summary Statistics: result_accuracy ===

window_size_s:
Level   N   Mean      Std      Min       Max
------------------------------------------------------------
120     4   87.8000   5.0020   80.9000   92.6000
5       4   89.6250   4.9675   84.8000   94.3000
62.5    7   92.3857   4.1257   86.1000   97.9000

watermark_delay_s:
Level   N   Mean      Std      Min       Max
------------------------------------------------------------
1       4   90.4750   2.7597   87.8000   93.5000
15.5    7   91.5571   4.8483   84.8000   97.9000
30      4   88.4000   6.2976   80.9000   95.2000

parallelism:
Level   N   Mean      Std      Min       Max
------------------------------------------------------------
18      7   89.6286   6.0753   80.9000   97.9000
32      4   88.7000   2.8925   84.8000   91.6000
4       4   93.5500   1.4480   92.1000   95.2000

Optimization Recommendations

doe optimize
=== Optimization: end_to_end_latency_ms ===
Direction: minimize
Best observed run: #7
  window_size_s = 62.5
  watermark_delay_s = 15.5
  parallelism = 18
  Value: 126.0

RSM Model (linear, R² = 0.0841, Adj R² = -0.1657):
  Coefficients:
    intercept                          +541.3333
    window_size_s                      -103.2500
    watermark_delay_s                  -1.5000
    parallelism                        -60.0000

RSM Model (quadratic, R² = 0.3227, Adj R² = -0.8963):
  Coefficients:
    intercept                          +792.3333
    window_size_s                      -103.2500
    watermark_delay_s                  -1.5000
    parallelism                        -60.0000
    window_size_s*watermark_delay_s    -6.2500
    window_size_s*parallelism          +47.7500
    watermark_delay_s*parallelism      +139.2500
    window_size_s^2                    -146.5417
    watermark_delay_s^2                -164.0417
    parallelism^2                      -160.0417

Curvature analysis:
  watermark_delay_s    coef=-164.0417  concave (has a maximum)
  parallelism          coef=-160.0417  concave (has a maximum)
  window_size_s        coef=-146.5417  concave (has a maximum)

Notable interactions:
  watermark_delay_s*parallelism     coef=+139.2500  (synergistic)
  window_size_s*parallelism         coef=+47.7500  (synergistic)
  window_size_s*watermark_delay_s   coef=-6.2500  (antagonistic)

Predicted optimum (from linear model, at observed points):
  window_size_s = 5
  watermark_delay_s = 15.5
  parallelism = 4
  Predicted value: 704.5833

Surface optimum (via L-BFGS-B, linear model):
  window_size_s = 120
  watermark_delay_s = 30
  parallelism = 32
  Predicted value: 376.5833

Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. window_size_s (effect: 226.6, contribution: 39.9%)
  2. parallelism (effect: 197.9, contribution: 34.8%)
  3. watermark_delay_s (effect: 143.6, contribution: 25.3%)

=== Optimization: result_accuracy ===
Direction: maximize
Best observed run: #10
  window_size_s = 62.5
  watermark_delay_s = 15.5
  parallelism = 18
  Value: 97.9

RSM Model (linear, R² = 0.2671, Adj R² = 0.0673):
  Coefficients:
    intercept                          +90.4267
    window_size_s                      -1.5125
    watermark_delay_s                  -2.3250
    parallelism                        -1.6125

RSM Model (quadratic, R² = 0.5704, Adj R² = -0.2030):
  Coefficients:
    intercept                          +93.3333
    window_size_s                      -1.5125
    watermark_delay_s                  -2.3250
    parallelism                        -1.6125
    window_size_s*watermark_delay_s    +0.3750
    window_size_s*parallelism          -1.2500
    watermark_delay_s*parallelism      +2.9750
    window_size_s^2                    -3.5417
    watermark_delay_s^2                -1.4167
    parallelism^2                      -0.4917

Curvature analysis:
  window_size_s        coef=-3.5417  concave (has a maximum)
  watermark_delay_s    coef=-1.4167  concave (has a maximum)
  parallelism          coef=-0.4917  concave (has a maximum)

Notable interactions:
  watermark_delay_s*parallelism     coef=+2.9750  (synergistic)
  window_size_s*parallelism         coef=-1.2500  (antagonistic)
  window_size_s*watermark_delay_s   coef=+0.3750  (synergistic)

Predicted optimum (from linear model, at observed points):
  window_size_s = 62.5
  watermark_delay_s = 1
  parallelism = 4
  Predicted value: 94.3642

Surface optimum (via L-BFGS-B, linear model):
  window_size_s = 5
  watermark_delay_s = 1
  parallelism = 4
  Predicted value: 95.8767

Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. window_size_s (effect: 4.9, contribution: 38.4%)
  2. watermark_delay_s (effect: 4.6, contribution: 36.3%)
  3. parallelism (effect: 3.2, contribution: 25.2%)