Summary
This experiment investigates stream processing windowing, using a Box-Behnken design to tune Flink window size, watermark delay, and parallelism for latency and accuracy.
The design varies 3 factors: window size (5 to 120 s), watermark delay (1 to 30 s), and parallelism (4 to 32 slots). The goal is to optimize 2 responses: end-to-end latency in ms (minimize) and result accuracy in % (maximize). Two conditions were held constant across all runs: checkpoint interval = 10000 ms and state backend = rocksdb.
A Box-Behnken design was chosen because it efficiently fits quadratic models with 3 continuous factors while avoiding extreme corner combinations, requiring only 15 runs instead of the 27 needed for a full factorial at three levels.
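Quadratic response surface models were fitted to capture potential curvature and factor interactions, although both fits are weak (quadratic R² = 0.32 for latency and 0.57 for accuracy, each with a negative adjusted R²). The response surface plots below visualize how pairs of factors jointly affect each response.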
Key Findings
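For end-to-end latency, the factors ranked by share of the total main effect were parallelism (39.9%), window size (36.6%), and watermark delay (23.5%), although none is statistically significant in the ANOVA (all p > 0.29). The best observed value was 126.0 ms. The per-level summary statistics place that run at window size = 120 s, watermark delay = 1 s, parallelism = 18 slots, since 126.0 is the minimum only for those three levels; the runs at window size = 62.5 s, which include the center points, recorded at least 366 ms.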
For result accuracy, the ranking was parallelism (38.5%), window size (36.4%), and watermark delay (25.1%), again with no factor statistically significant (all p > 0.50). The best observed value was 97.9% (at window size = 62.5 s, watermark delay = 15.5 s, parallelism = 18 slots).
Recommended Next Steps
- Run confirmation experiments at the predicted optimal settings to validate the model.
- Consider whether any fixed factors should be varied in a future study.
Experimental Setup
Factors
| Factor | Low | High | Unit |
| window_size_s | 5 | 120 | s |
| watermark_delay_s | 1 | 30 | s |
| parallelism | 4 | 32 | slots |
Fixed: checkpoint_interval_ms = 10000, state_backend = rocksdb
Responses
| Response | Direction | Unit |
| end_to_end_latency_ms | ↓ minimize | ms |
| result_accuracy | ↑ maximize | % |
Configuration
{
  "metadata": {
    "name": "Stream Processing Windowing",
    "description": "Box-Behnken design to tune Flink window size, watermark delay, and parallelism for latency and accuracy"
  },
  "factors": [
    {
      "name": "window_size_s",
      "levels": ["5", "120"],
      "type": "continuous",
      "unit": "s"
    },
    {
      "name": "watermark_delay_s",
      "levels": ["1", "30"],
      "type": "continuous",
      "unit": "s"
    },
    {
      "name": "parallelism",
      "levels": ["4", "32"],
      "type": "continuous",
      "unit": "slots"
    }
  ],
  "fixed_factors": {
    "checkpoint_interval_ms": "10000",
    "state_backend": "rocksdb"
  },
  "responses": [
    {
      "name": "end_to_end_latency_ms",
      "optimize": "minimize",
      "unit": "ms"
    },
    {
      "name": "result_accuracy",
      "optimize": "maximize",
      "unit": "%"
    }
  ],
  "settings": {
    "operation": "box_behnken",
    "test_script": "use_cases/39_stream_processing_windowing/sim.sh"
  }
}
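Because the configuration stores factor levels as strings, any script that consumes it has to convert them before doing arithmetic. The following is a minimal Python sketch of such a check, not part of the DOE tool: the helper names `factor_ranges` and `check_responses` are hypothetical, and the embedded JSON is a trimmed copy of the configuration above.

```python
import json

# Trimmed copy of the configuration above (metadata, fixed factors and settings omitted).
CONFIG_TEXT = """
{
  "factors": [
    {"name": "window_size_s", "levels": ["5", "120"], "type": "continuous", "unit": "s"},
    {"name": "watermark_delay_s", "levels": ["1", "30"], "type": "continuous", "unit": "s"},
    {"name": "parallelism", "levels": ["4", "32"], "type": "continuous", "unit": "slots"}
  ],
  "responses": [
    {"name": "end_to_end_latency_ms", "optimize": "minimize", "unit": "ms"},
    {"name": "result_accuracy", "optimize": "maximize", "unit": "%"}
  ]
}
"""


def factor_ranges(config):
    """Return {factor name: (low, center, high)}, converting string levels to floats."""
    ranges = {}
    for factor in config["factors"]:
        low, high = (float(level) for level in factor["levels"])
        if low >= high:
            raise ValueError(f"{factor['name']}: low level must be below high level")
        ranges[factor["name"]] = (low, (low + high) / 2, high)
    return ranges


def check_responses(config):
    """Raise if a response has an optimization direction other than minimize/maximize."""
    for response in config["responses"]:
        if response["optimize"] not in ("minimize", "maximize"):
            raise ValueError(f"{response['name']}: unknown direction {response['optimize']!r}")


config = json.loads(CONFIG_TEXT)
check_responses(config)
ranges = factor_ranges(config)
for name, (low, center, high) in ranges.items():
    print(f"{name}: low={low:g}, center={center:g}, high={high:g}")
```

The computed centers (62.5, 15.5, 18) are the middle levels that appear in the experimental matrix.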
Experimental Matrix
The Box-Behnken Design produces 15 runs. Each row is one experiment with specific factor settings.
| Run | window_size_s | watermark_delay_s | parallelism |
| 1 | 62.5 | 1 | 4 |
| 2 | 62.5 | 15.5 | 18 |
| 3 | 120 | 15.5 | 32 |
| 4 | 120 | 15.5 | 4 |
| 5 | 62.5 | 15.5 | 18 |
| 6 | 62.5 | 15.5 | 18 |
| 7 | 5 | 15.5 | 32 |
| 8 | 120 | 1 | 18 |
| 9 | 62.5 | 1 | 32 |
| 10 | 120 | 30 | 18 |
| 11 | 5 | 15.5 | 4 |
| 12 | 62.5 | 30 | 32 |
| 13 | 5 | 1 | 18 |
| 14 | 5 | 30 | 18 |
| 15 | 62.5 | 30 | 4 |
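The matrix above follows the standard three-factor Box-Behnken construction: each pair of factors is run at all four low/high combinations while the third factor sits at its midpoint, and three center points are added. A minimal Python sketch of that construction is shown here for illustration; it is not the tool's own generator, the function names are made up for this example, and the run order differs from the table because the tool randomizes it.

```python
from itertools import combinations, product

# Low/high levels taken from the factor table.
FACTORS = {
    "window_size_s": (5, 120),
    "watermark_delay_s": (1, 30),
    "parallelism": (4, 32),
}


def box_behnken_coded(n_factors, center_points):
    """Coded (-1/0/+1) Box-Behnken runs: edge midpoints plus center points."""
    runs = []
    for i, j in combinations(range(n_factors), 2):
        for a, b in product((-1, 1), repeat=2):
            point = [0] * n_factors
            point[i], point[j] = a, b
            runs.append(tuple(point))
    runs.extend([(0,) * n_factors] * center_points)
    return runs


def decode(coded_run, factors):
    """Map coded values to real units: -1 -> low, 0 -> midpoint, +1 -> high."""
    return tuple(
        low + (c + 1) / 2 * (high - low)
        for c, (low, high) in zip(coded_run, factors.values())
    )


design = [decode(run, FACTORS) for run in box_behnken_coded(3, center_points=3)]
print(len(design))  # 12 edge runs + 3 center runs
print(3 ** 3)       # size of a three-level full factorial, for comparison
```

No run combines the extreme levels of all three factors at once, which is the "avoids corner combinations" property of the design.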
Step-by-Step Workflow
1. Preview the design
$ doe info --config use_cases/39_stream_processing_windowing/config.json
2. Generate the runner script
$ doe generate --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/run.sh --seed 42
3. Execute the experiments
$ bash use_cases/39_stream_processing_windowing/results/run.sh
4. Analyze results
$ doe analyze --config use_cases/39_stream_processing_windowing/config.json
5. Get optimization recommendations
$ doe optimize --config use_cases/39_stream_processing_windowing/config.json
6. Multi-objective optimization
With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.
$ doe optimize --config use_cases/39_stream_processing_windowing/config.json --multi
7. Generate the HTML report
$ doe report --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/report.html
Features Exercised
| Feature | Value |
| Design type | box_behnken |
| Factor types | continuous (all 3) |
| Arg style | double-dash |
| Responses | 2 (end_to_end_latency_ms ↓, result_accuracy ↑) |
| Total runs | 15 |
Analysis Results
Generated from actual experiment runs using the DOE Helper Tool.
Response: end_to_end_latency_ms
Top factors: parallelism (39.9%), window_size_s (36.6%), watermark_delay_s (23.5%).
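The percentages above appear to be each factor's share of the summed main effects, where a factor's effect is the spread between its highest and lowest level mean. The Python sketch below checks that reading against the level means printed in the summary statistics of the full analysis output; it is an inference from the reported numbers, not the tool's documented formula.

```python
# Mean latency (ms) per factor level, copied from the summary statistics.
LEVEL_MEANS = {
    "window_size_s": {"5": 588.7500, "62.5": 655.7143, "120": 293.7500},
    "watermark_delay_s": {"1": 406.7500, "15.5": 639.2857, "30": 504.5000},
    "parallelism": {"4": 740.7500, "18": 538.7143, "32": 346.5000},
}


def main_effects(level_means):
    """Effect of a factor = largest level mean minus smallest level mean."""
    return {
        factor: max(means.values()) - min(means.values())
        for factor, means in level_means.items()
    }


def percent_contribution(effects):
    """Each effect as a percentage of the sum of all effects."""
    total = sum(effects.values())
    return {factor: 100 * effect / total for factor, effect in effects.items()}


effects = main_effects(LEVEL_MEANS)
contributions = percent_contribution(effects)
for factor in sorted(effects, key=effects.get, reverse=True):
    print(f"{factor}: effect={effects[factor]:.4f}, share={contributions[factor]:.1f}%")
```

Note that these shares describe relative effect size only; they say nothing about statistical significance, which the ANOVA addresses.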
ANOVA
| Source | DF | SS | MS | F | p-value |
| window_size_s | 2 | 345764.4048 | 172882.2024 | 1.437 | 0.2930 |
| watermark_delay_s | 2 | 145040.1548 | 72520.0774 | 0.603 | 0.5704 |
| parallelism | 2 | 310956.1548 | 155478.0774 | 1.292 | 0.3264 |
| Lack of Fit | 6 | 314296.6190 | 52382.7698 | 0.435 | 0.8183 |
| Pure Error | 2 | 240648.0000 | 120324.0000 | | |
| Error | 8 | 554944.6190 | 120324.0000 | | |
| Total | 14 | 1356705.3333 | 96907.5238 | | |
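The F and p-value columns can be checked by hand. The Python sketch below assumes, based on the printed numbers, that each F is the factor's MS divided by the Error row's MS of 120324.0 and that p-values use 2 and 8 degrees of freedom. For 2 numerator degrees of freedom the upper tail of the F distribution has a closed form, so no statistics library is needed; the function name is hypothetical.

```python
# Values copied from the ANOVA table for end_to_end_latency_ms.
MS_ERROR = 120324.0
DF_ERROR = 8
FACTOR_MS = {
    "window_size_s": 172882.2024,
    "watermark_delay_s": 72520.0774,
    "parallelism": 155478.0774,
}


def f_upper_tail_df1_2(f_value, df_error):
    """P(F > f_value) for an F(2, df_error) distribution.

    This closed form holds only when the numerator has exactly 2 degrees of freedom.
    """
    return (1 + 2 * f_value / df_error) ** (-df_error / 2)


results = {}
for factor, ms in FACTOR_MS.items():
    f_value = ms / MS_ERROR
    results[factor] = (f_value, f_upper_tail_df1_2(f_value, DF_ERROR))
    print(f"{factor}: F={f_value:.3f}, p={results[factor][1]:.4f}")
```

All three p-values are well above 0.05, so the data do not show a statistically significant main effect for any factor on latency.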
Plots for end_to_end_latency_ms: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, model diagnostics.
Response: result_accuracy
Top factors: parallelism (38.5%), window_size_s (36.4%), watermark_delay_s (25.1%).
ANOVA
| Source | DF | SS | MS | F | p-value |
| window_size_s | 2 | 57.0333 | 28.5166 | 0.742 | 0.5063 |
| watermark_delay_s | 2 | 25.3847 | 12.6923 | 0.330 | 0.7281 |
| parallelism | 2 | 55.4050 | 27.7025 | 0.721 | 0.5155 |
| Lack of Fit | 6 | 93.5463 | 15.5911 | 0.406 | 0.8346 |
| Pure Error | 2 | 76.8800 | 38.4400 | | |
| Error | 8 | 170.4263 | 38.4400 | | |
| Total | 14 | 308.2493 | 22.0178 | | |
Plots for result_accuracy: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, model diagnostics.
Response Surface Plots
3D surfaces fitted with quadratic RSM. Red dots are observed data points.
- end_to_end_latency_ms: watermark_delay_s vs parallelism
- end_to_end_latency_ms: window_size_s vs parallelism
- end_to_end_latency_ms: window_size_s vs watermark_delay_s
- result_accuracy: watermark_delay_s vs parallelism
- result_accuracy: window_size_s vs parallelism
- result_accuracy: window_size_s vs watermark_delay_s
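A fitted quadratic surface can also be evaluated numerically at any setting. The Python sketch below does this for latency, using the quadratic coefficients printed in the optimization output. It assumes the model is expressed in coded units (low = -1, midpoint = 0, high = +1); the report does not state this, but the printed linear-model predictions are consistent with that coding. The function names are illustrative only.

```python
# Quadratic RSM coefficients for end_to_end_latency_ms, copied from the optimization output.
# Assumption: factors are coded so that low = -1, midpoint = 0, high = +1.
INTERCEPT = 792.3333
LINEAR = {"window_size_s": -103.2500, "watermark_delay_s": -1.5000, "parallelism": -60.0000}
INTERACTION = {
    ("window_size_s", "watermark_delay_s"): -6.2500,
    ("window_size_s", "parallelism"): 47.7500,
    ("watermark_delay_s", "parallelism"): 139.2500,
}
QUADRATIC = {"window_size_s": -146.5417, "watermark_delay_s": -164.0417, "parallelism": -160.0417}
RANGES = {"window_size_s": (5, 120), "watermark_delay_s": (1, 30), "parallelism": (4, 32)}


def to_coded(name, value):
    """Convert a real-unit setting to the coded -1..+1 scale."""
    low, high = RANGES[name]
    return (2 * value - (low + high)) / (high - low)


def predict_latency(settings):
    """Evaluate intercept + linear + two-way interaction + squared terms."""
    x = {name: to_coded(name, value) for name, value in settings.items()}
    y = INTERCEPT
    y += sum(LINEAR[name] * x[name] for name in x)
    y += sum(coef * x[a] * x[b] for (a, b), coef in INTERACTION.items())
    y += sum(QUADRATIC[name] * x[name] ** 2 for name in x)
    return y


center = predict_latency({"window_size_s": 62.5, "watermark_delay_s": 15.5, "parallelism": 18})
edge = predict_latency({"window_size_s": 120, "watermark_delay_s": 1, "parallelism": 18})
print(f"center point: {center:.1f} ms")
print(f"window=120, delay=1, parallelism=18: {edge:.1f} ms")
```

Given the low R² of these models, such predictions are best treated as rough guides rather than reliable estimates.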
Multi-Objective Optimization
When responses compete, Derringer–Suich desirability finds the best compromise.
Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
Overall Desirability
D = 0.7094
Per-Response Desirability
| Response | Weight | Desirability | Predicted | Dir |
| end_to_end_latency_ms | 1.0 | 0.5755 | 568.00 ms | ↓ |
| result_accuracy | 1.5 | 0.8155 | 95.30 % | ↑ |
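The overall desirability can be recomputed from the per-response values and weights listed above. The Python sketch below implements the weighted geometric mean described in this section; it takes the individual desirabilities as given and does not attempt to reproduce how the tool scales each response to the 0–1 range, which the report does not specify.

```python
import math


def overall_desirability(desirabilities, weights):
    """Weighted geometric mean: D = (prod d_i ** w_i) ** (1 / sum w_i)."""
    if any(d <= 0 for d in desirabilities):
        return 0.0  # one fully undesirable response makes the whole compromise unacceptable
    log_sum = sum(w * math.log(d) for d, w in zip(desirabilities, weights))
    return math.exp(log_sum / sum(weights))


# Per-response desirabilities and weights as reported:
# end_to_end_latency_ms (d=0.5755, weight 1.0), result_accuracy (d=0.8155, weight 1.5).
D = overall_desirability([0.5755, 0.8155], [1.0, 1.5])
print(f"D = {D:.4f}")
```

The result agrees with the reported D = 0.7094 to four decimal places. Because accuracy carries the larger weight, D sits closer to the accuracy desirability than an unweighted geometric mean would.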
Recommended Settings
| Factor | Value |
| window_size_s | 120 s |
| watermark_delay_s | 15.5 s |
| parallelism | 4 slots |
Source: from observed run #12
Trade-off Summary
Sacrifice = how much worse than single-objective best.
| Response | Predicted | Best Observed | Sacrifice |
| end_to_end_latency_ms | 568.00 | 126.00 | +442.00 |
| result_accuracy | 95.30 | 97.90 | +2.60 |
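The sacrifice column appears to be the absolute gap between the compromise prediction and the best single-objective observation, since both reported values are positive regardless of direction. A short Python sketch of that arithmetic, using the figures from the full multi-objective output (the function name is hypothetical):

```python
def sacrifice(predicted, best_observed):
    """Absolute gap between the compromise value and the single-objective best."""
    return abs(predicted - best_observed)


# (predicted at the compromise, best observed) as reported in the multi-objective output.
TRADE_OFFS = {
    "end_to_end_latency_ms": (568.00, 126.00),
    "result_accuracy": (95.30, 97.90),
}

gaps = {name: sacrifice(pred, best) for name, (pred, best) in TRADE_OFFS.items()}
for name, gap in gaps.items():
    print(f"{name}: sacrifice = +{gap:.2f}")
```

In relative terms the compromise gives up far more on latency than on accuracy, which is in line with the higher weight placed on accuracy.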
Top 3 Runs by Desirability
| Run | D | Factor Settings |
| #12 | 0.7094 | window_size_s=120, watermark_delay_s=15.5, parallelism=4 |
| #2 | 0.6524 | window_size_s=62.5, watermark_delay_s=1, parallelism=32 |
| #6 | 0.6443 | window_size_s=62.5, watermark_delay_s=30, parallelism=4 |
Model Quality
| Response | R² | Type |
| end_to_end_latency_ms | 0.0720 | linear |
| result_accuracy | 0.0390 | linear |
Full Multi-Objective Output
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.7094
Response Weight Desirability Predicted Direction
---------------------------------------------------------------------
end_to_end_latency_ms 1.0 0.5755 568.00 ms ↓
result_accuracy 1.5 0.8155 95.30 % ↑
Recommended settings:
window_size_s = 120 s
watermark_delay_s = 15.5 s
parallelism = 4 slots
(from observed run #12)
Trade-off summary:
end_to_end_latency_ms: 568.00 (best observed: 126.00, sacrifice: +442.00)
result_accuracy: 95.30 (best observed: 97.90, sacrifice: +2.60)
Model quality:
end_to_end_latency_ms: R² = 0.0720 (linear)
result_accuracy: R² = 0.0390 (linear)
Top 3 observed runs by overall desirability:
1. Run #12 (D=0.7094): window_size_s=120, watermark_delay_s=15.5, parallelism=4
2. Run #2 (D=0.6524): window_size_s=62.5, watermark_delay_s=1, parallelism=32
3. Run #6 (D=0.6443): window_size_s=62.5, watermark_delay_s=30, parallelism=4
Full Analysis Output
=== Main Effects: end_to_end_latency_ms ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
parallelism 394.2500 80.3772 39.9%
window_size_s 361.9643 80.3772 36.6%
watermark_delay_s 232.5357 80.3772 23.5%
=== ANOVA Table: end_to_end_latency_ms ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
window_size_s 2 345764.4048 172882.2024 1.437 0.2930
watermark_delay_s 2 145040.1548 72520.0774 0.603 0.5704
parallelism 2 310956.1548 155478.0774 1.292 0.3264
Lack of Fit 6 314296.6190 52382.7698 0.435 0.8183
Pure Error 2 240648.0000 120324.0000
Error 8 554944.6190 120324.0000
Total 14 1356705.3333 96907.5238
=== Summary Statistics: end_to_end_latency_ms ===
window_size_s:
Level N Mean Std Min Max
------------------------------------------------------------
120 4 293.7500 179.9007 126.0000 510.0000
5 4 588.7500 373.7007 170.0000 1065.0000
62.5 7 655.7143 287.1966 366.0000 1186.0000
watermark_delay_s:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 406.7500 219.6988 126.0000 647.0000
15.5 7 639.2857 363.9692 170.0000 1186.0000
30 4 504.5000 301.1207 167.0000 900.0000
parallelism:
Level N Mean Std Min Max
------------------------------------------------------------
18 7 538.7143 353.2344 126.0000 1186.0000
32 4 346.5000 128.4199 170.0000 478.0000
4 4 740.7500 287.3017 488.0000 1065.0000
=== Main Effects: result_accuracy ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
parallelism 4.8500 1.2116 38.5%
window_size_s 4.5857 1.2116 36.4%
watermark_delay_s 3.1571 1.2116 25.1%
=== ANOVA Table: result_accuracy ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
window_size_s 2 57.0333 28.5166 0.742 0.5063
watermark_delay_s 2 25.3847 12.6923 0.330 0.7281
parallelism 2 55.4050 27.7025 0.721 0.5155
Lack of Fit 6 93.5463 15.5911 0.406 0.8346
Pure Error 2 76.8800 38.4400
Error 8 170.4263 38.4400
Total 14 308.2493 22.0178
=== Summary Statistics: result_accuracy ===
window_size_s:
Level N Mean Std Min Max
------------------------------------------------------------
120 4 87.8000 5.0020 80.9000 92.6000
5 4 89.6250 4.9675 84.8000 94.3000
62.5 7 92.3857 4.1257 86.1000 97.9000
watermark_delay_s:
Level N Mean Std Min Max
------------------------------------------------------------
1 4 90.4750 2.7597 87.8000 93.5000
15.5 7 91.5571 4.8483 84.8000 97.9000
30 4 88.4000 6.2976 80.9000 95.2000
parallelism:
Level N Mean Std Min Max
------------------------------------------------------------
18 7 89.6286 6.0753 80.9000 97.9000
32 4 88.7000 2.8925 84.8000 91.6000
4 4 93.5500 1.4480 92.1000 95.2000
Optimization Recommendations
=== Optimization: end_to_end_latency_ms ===
Direction: minimize
Best observed run: #7
window_size_s = 62.5
watermark_delay_s = 15.5
parallelism = 18
Value: 126.0
RSM Model (linear, R² = 0.0841, Adj R² = -0.1657):
Coefficients:
intercept +541.3333
window_size_s -103.2500
watermark_delay_s -1.5000
parallelism -60.0000
RSM Model (quadratic, R² = 0.3227, Adj R² = -0.8963):
Coefficients:
intercept +792.3333
window_size_s -103.2500
watermark_delay_s -1.5000
parallelism -60.0000
window_size_s*watermark_delay_s -6.2500
window_size_s*parallelism +47.7500
watermark_delay_s*parallelism +139.2500
window_size_s^2 -146.5417
watermark_delay_s^2 -164.0417
parallelism^2 -160.0417
Curvature analysis:
watermark_delay_s coef=-164.0417 concave (has a maximum)
parallelism coef=-160.0417 concave (has a maximum)
window_size_s coef=-146.5417 concave (has a maximum)
Notable interactions:
watermark_delay_s*parallelism coef=+139.2500 (synergistic)
window_size_s*parallelism coef=+47.7500 (synergistic)
window_size_s*watermark_delay_s coef=-6.2500 (antagonistic)
Predicted optimum (from linear model, at observed points):
window_size_s = 5
watermark_delay_s = 15.5
parallelism = 4
Predicted value: 704.5833
Surface optimum (via L-BFGS-B, linear model):
window_size_s = 120
watermark_delay_s = 30
parallelism = 32
Predicted value: 376.5833
Model quality: Weak fit — consider adding center points or using a different design.
Factor importance:
1. window_size_s (effect: 226.6, contribution: 39.9%)
2. parallelism (effect: 197.9, contribution: 34.8%)
3. watermark_delay_s (effect: 143.6, contribution: 25.3%)
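The linear-model predictions in the block above can be reproduced by hand if the coefficients are read as coded-unit slopes (low = -1, midpoint = 0, high = +1), which is an assumption inferred from the numbers rather than something the output states. A linear model over a box always reaches its extremes at a corner, so the Python sketch below simply enumerates the 8 corners instead of running an optimizer.

```python
from itertools import product

# Linear RSM for end_to_end_latency_ms, copied from the output above.
# Assumption: coefficients apply to coded factors (low = -1, midpoint = 0, high = +1),
# in the order window_size_s, watermark_delay_s, parallelism.
INTERCEPT = 541.3333
COEFFS = (-103.2500, -1.5000, -60.0000)


def predict(coded_point):
    """Linear model prediction at a coded point."""
    return INTERCEPT + sum(c * x for c, x in zip(COEFFS, coded_point))


corners = list(product((-1, 1), repeat=3))
best_corner = min(corners, key=predict)  # latency is minimized
surface_optimum = predict(best_corner)
observed_point_value = predict((-1, 0, -1))  # window_size_s=5, watermark_delay_s=15.5, parallelism=4

print(best_corner, round(surface_optimum, 4))
print(round(observed_point_value, 4))
```

Under this reading the all-high corner (120 s, 30 s, 32 slots) gives the printed surface optimum of 376.5833, and the setting listed as "Predicted optimum ... at observed points" gives 704.5833. The second value is the higher of the two, which looks inconsistent with a minimization goal and may be worth checking before relying on that line.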
=== Optimization: result_accuracy ===
Direction: maximize
Best observed run: #10
window_size_s = 62.5
watermark_delay_s = 15.5
parallelism = 18
Value: 97.9
RSM Model (linear, R² = 0.2671, Adj R² = 0.0673):
Coefficients:
intercept +90.4267
window_size_s -1.5125
watermark_delay_s -2.3250
parallelism -1.6125
RSM Model (quadratic, R² = 0.5704, Adj R² = -0.2030):
Coefficients:
intercept +93.3333
window_size_s -1.5125
watermark_delay_s -2.3250
parallelism -1.6125
window_size_s*watermark_delay_s +0.3750
window_size_s*parallelism -1.2500
watermark_delay_s*parallelism +2.9750
window_size_s^2 -3.5417
watermark_delay_s^2 -1.4167
parallelism^2 -0.4917
Curvature analysis:
window_size_s coef=-3.5417 concave (has a maximum)
watermark_delay_s coef=-1.4167 concave (has a maximum)
parallelism coef=-0.4917 concave (has a maximum)
Notable interactions:
watermark_delay_s*parallelism coef=+2.9750 (synergistic)
window_size_s*parallelism coef=-1.2500 (antagonistic)
window_size_s*watermark_delay_s coef=+0.3750 (synergistic)
Predicted optimum (from linear model, at observed points):
window_size_s = 62.5
watermark_delay_s = 1
parallelism = 4
Predicted value: 94.3642
Surface optimum (via L-BFGS-B, linear model):
window_size_s = 5
watermark_delay_s = 1
parallelism = 4
Predicted value: 95.8767
Model quality: Weak fit — consider adding center points or using a different design.
Factor importance:
1. window_size_s (effect: 4.9, contribution: 38.4%)
2. watermark_delay_s (effect: 4.6, contribution: 36.3%)
3. parallelism (effect: 3.2, contribution: 25.2%)
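The "Weak fit" verdicts above follow from the adjusted R² values, which penalize R² for the number of model terms relative to the 15 runs. The Python sketch below applies the standard adjusted R² formula to the reported R² values, counting 3 terms for the linear models and 9 for the quadratic ones (intercept excluded). Small differences in the last digit are expected because the inputs are already rounded.

```python
def adjusted_r2(r2, n_runs, n_terms):
    """Adjusted R² = 1 - (1 - R²) * (n - 1) / (n - p - 1); n_terms excludes the intercept."""
    return 1 - (1 - r2) * (n_runs - 1) / (n_runs - n_terms - 1)


N_RUNS = 15
# (response, model type) -> (reported R², number of non-intercept terms)
MODELS = {
    ("end_to_end_latency_ms", "linear"): (0.0841, 3),
    ("end_to_end_latency_ms", "quadratic"): (0.3227, 9),
    ("result_accuracy", "linear"): (0.2671, 3),
    ("result_accuracy", "quadratic"): (0.5704, 9),
}

adjusted = {key: adjusted_r2(r2, N_RUNS, p) for key, (r2, p) in MODELS.items()}
for (response, kind), value in adjusted.items():
    print(f"{response} ({kind}): adjusted R² = {value:.4f}")
```

A negative adjusted R² means the model explains less variation than would be expected from fitting that many terms to noise, so the quadratic coefficients, curvature labels, and interaction signs reported above should be read with caution.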