Stream Processing Windowing

Summary

This experiment uses a Box-Behnken design to tune Flink window size, watermark delay, and parallelism for end-to-end latency and result accuracy.

The design varies three factors: window size (`window_size_s`, 5 to 120 s), watermark delay (`watermark_delay_s`, 1 to 30 s), and parallelism (`parallelism`, 4 to 32 slots). It optimizes two responses: end-to-end latency (`end_to_end_latency_ms`, in ms, minimize) and result accuracy (`result_accuracy`, in %, maximize). Two conditions are held constant across all runs: `checkpoint_interval_ms` = 10000 and `state_backend` = rocksdb.

A Box-Behnken design was chosen because it efficiently fits quadratic models with 3 continuous factors while avoiding extreme corner combinations. It requires only 15 runs instead of the 27 needed for a full factorial at three levels.

Quadratic response surface models were fitted to capture potential curvature and factor interactions. The response surface plots below show how pairs of factors jointly affect each response.

Key Findings

For `end_to_end_latency_ms`, the factors ranked by contribution were parallelism (39.9%), window size (36.6%), and watermark delay (23.5%), although none is statistically significant in the ANOVA (p ≥ 0.29 for all three). The best observed value was 126.0 ms (at `window_size_s` = 62.5, `watermark_delay_s` = 15.5, `parallelism` = 18).

For `result_accuracy`, the factors ranked by contribution were parallelism (38.5%), window size (36.4%), and watermark delay (25.1%), again with no statistically significant effect in the ANOVA (p > 0.50 for all three). The best observed value was 97.9% (at `window_size_s` = 62.5, `watermark_delay_s` = 15.5, `parallelism` = 18).

Recommended Next Steps

Run confirmation experiments at the predicted optimal settings to validate the model.
Consider whether any fixed factors should be varied in a future study.

Experimental Setup

Factors

Factor	Low	High	Unit
`window_size_s`	5	120	s
`watermark_delay_s`	1	30	s
`parallelism`	4	32	slots

Fixed: checkpoint_interval_ms = 10000, state_backend = rocksdb

Responses

Response	Direction	Unit
`end_to_end_latency_ms`	↓ minimize	ms
`result_accuracy`	↑ maximize	%

Configuration

use_cases/39_stream_processing_windowing/config.json

{
  "metadata": {
    "name": "Stream Processing Windowing",
    "description": "Box-Behnken design to tune Flink window size, watermark delay, and parallelism for latency and accuracy"
  },
  "factors": [
    {
      "name": "window_size_s",
      "levels": [
        "5",
        "120"
      ],
      "type": "continuous",
      "unit": "s"
    },
    {
      "name": "watermark_delay_s",
      "levels": [
        "1",
        "30"
      ],
      "type": "continuous",
      "unit": "s"
    },
    {
      "name": "parallelism",
      "levels": [
        "4",
        "32"
      ],
      "type": "continuous",
      "unit": "slots"
    }
  ],
  "fixed_factors": {
    "checkpoint_interval_ms": "10000",
    "state_backend": "rocksdb"
  },
  "responses": [
    {
      "name": "end_to_end_latency_ms",
      "optimize": "minimize",
      "unit": "ms"
    },
    {
      "name": "result_accuracy",
      "optimize": "maximize",
      "unit": "%"
    }
  ],
  "settings": {
    "operation": "box_behnken",
    "test_script": "use_cases/39_stream_processing_windowing/sim.sh"
  }
}
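Note that the factor levels in this config are stored as strings. As a minimal sketch of how a consumer might parse them and derive the center level that appears in the design matrix, the following Python snippet embeds a trimmed copy of the `factors` section; the `center_levels` helper is hypothetical and is not part of the DOE tool.

```python
import json

# Trimmed copy of the "factors" section of config.json shown above.
CONFIG_TEXT = """
{
  "factors": [
    {"name": "window_size_s", "levels": ["5", "120"], "type": "continuous", "unit": "s"},
    {"name": "watermark_delay_s", "levels": ["1", "30"], "type": "continuous", "unit": "s"},
    {"name": "parallelism", "levels": ["4", "32"], "type": "continuous", "unit": "slots"}
  ]
}
"""


def center_levels(config: dict) -> dict:
    """Hypothetical helper: map factor name -> (low, center, high) as floats."""
    out = {}
    for factor in config["factors"]:
        # Levels are JSON strings, so cast before doing arithmetic.
        low, high = (float(v) for v in factor["levels"])
        out[factor["name"]] = (low, (low + high) / 2, high)
    return out


levels = center_levels(json.loads(CONFIG_TEXT))
for name, (low, mid, high) in levels.items():
    print(f"{name}: low={low:g}, center={mid:g}, high={high:g}")
```

The centers computed this way (62.5, 15.5, and 18) are the middle levels used in the experimental matrix.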

Experimental Matrix

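The Box-Behnken design produces 15 runs: 12 edge-midpoint runs, in which two factors sit at their low or high level and the third at its center, plus 3 center-point replicates (runs 2, 5, and 6). Each row is one experiment with specific factor settings.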

Run	`window_size_s`	`watermark_delay_s`	`parallelism`
1	62.5	1	4
2	62.5	15.5	18
3	120	15.5	32
4	120	15.5	4
5	62.5	15.5	18
6	62.5	15.5	18
7	5	15.5	32
8	120	1	18
9	62.5	1	32
10	120	30	18
11	5	15.5	4
12	62.5	30	32
13	5	1	18
14	5	30	18
15	62.5	30	4
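The set of runs in the matrix above can be reproduced with a short Python sketch of the standard three-factor Box-Behnken construction: every pair of factors takes all four low/high combinations while the remaining factor stays at its center, and center points are appended. The function names are illustrative, and the run order here is the standard order, not the randomized order the tool emits.

```python
from itertools import combinations, product

# Factor ranges from the Factors table (low, high).
RANGES = {
    "window_size_s": (5, 120),
    "watermark_delay_s": (1, 30),
    "parallelism": (4, 32),
}


def box_behnken_3(n_center: int = 3) -> list:
    """Coded (-1/0/+1) Box-Behnken runs for exactly three factors."""
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1, 1), repeat=2):
            row = [0, 0, 0]
            row[i], row[j] = a, b
            runs.append(tuple(row))
    runs += [(0, 0, 0)] * n_center  # replicated center points
    return runs


def decode(coded: int, low: float, high: float) -> float:
    """Map a coded level in [-1, +1] back to natural units."""
    return (low + high) / 2 + coded * (high - low) / 2


design = [
    tuple(decode(c, *RANGES[name]) for c, name in zip(run, RANGES))
    for run in box_behnken_3()
]

print(len(design), "runs; a three-level full factorial would need", 3 ** 3)
for row in design:
    print(row)
```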

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/39_stream_processing_windowing/config.json

2. Generate the runner script

$ doe generate --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/39_stream_processing_windowing/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/39_stream_processing_windowing/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/39_stream_processing_windowing/config.json

6. Multi-objective optimization

With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/39_stream_processing_windowing/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/39_stream_processing_windowing/config.json \
    --output use_cases/39_stream_processing_windowing/results/report.html

Features Exercised

Feature	Value
Design type	`box_behnken`
Factor types	`continuous` (all 3)
Arg style	`double-dash`
Responses	2 (end_to_end_latency_ms ↓, result_accuracy ↑)
Total runs	15

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: end_to_end_latency_ms

Top factors: parallelism (39.9%), window_size_s (36.6%), watermark_delay_s (23.5%).
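The effect sizes and percentage contributions appear to be derived from the per-level means: each effect equals the largest level mean minus the smallest, and each contribution is that effect divided by the sum of all three effects. This is inferred from the numbers in this report, not from the tool's documentation. A minimal Python sketch, using the level means listed under Summary Statistics in the `doe analyze` output:

```python
# Level means for end_to_end_latency_ms, copied from the Summary Statistics
# block of the `doe analyze` output in this report.
LEVEL_MEANS = {
    "window_size_s": [293.7500, 588.7500, 655.7143],
    "watermark_delay_s": [406.7500, 639.2857, 504.5000],
    "parallelism": [538.7143, 346.5000, 740.7500],
}


def contributions(level_means: dict) -> dict:
    """Return {factor: (effect, percent contribution)}.

    Assumption: effect = max level mean - min level mean, and the
    contribution is the effect's share of the summed effects.
    """
    effects = {name: max(m) - min(m) for name, m in level_means.items()}
    total = sum(effects.values())
    return {name: (eff, 100 * eff / total) for name, eff in effects.items()}


result = contributions(LEVEL_MEANS)
for name, (effect, pct) in sorted(result.items(), key=lambda kv: -kv[1][0]):
    print(f"{name:<20} effect={effect:9.4f}  contribution={pct:4.1f}%")
```

Under that assumption the sketch reproduces the reported effects (394.25, 361.9643, 232.5357) and the 39.9% / 36.6% / 23.5% split.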

ANOVA

Source	DF	SS	MS	F	p-value
window_size_s	2	345764.4048	172882.2024	1.437	0.2930
watermark_delay_s	2	145040.1548	72520.0774	0.603	0.5704
parallelism	2	310956.1548	155478.0774	1.292	0.3264
Lack of Fit	6	314296.6190	52382.7698	0.435	0.8183
Pure Error	2	240648.0000	120324.0000
Error	8	554944.6190	120324.0000
Total	14	1356705.3333	96907.5238
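The F statistics and p-values in this table can be checked by hand. The sketch below assumes that each factor's F is its mean square divided by the MS shown in the Error row (120324.0) and that the p-value is the upper tail of an F distribution with 2 and 8 degrees of freedom; both assumptions are inferred from the numbers, not from the tool's documentation. For a numerator df of 2 the tail probability has a closed form, so no statistics library is needed.

```python
def f_survival_df1_2(f_stat: float, df2: int) -> float:
    """P(F > f_stat) for an F distribution with (2, df2) degrees of freedom.

    Closed form valid only when the numerator df is exactly 2:
    (1 + 2 * f / df2) ** (-df2 / 2).
    """
    return (1.0 + 2.0 * f_stat / df2) ** (-df2 / 2.0)


# Assumed denominator: MS and DF from the Error row of the ANOVA table.
MS_DENOMINATOR = 120324.0
DF_DENOMINATOR = 8

# Factor mean squares for end_to_end_latency_ms, copied from the table.
FACTOR_MS = {
    "window_size_s": 172882.2024,
    "watermark_delay_s": 72520.0774,
    "parallelism": 155478.0774,
}

anova = {}
for name, ms in FACTOR_MS.items():
    f_stat = ms / MS_DENOMINATOR
    anova[name] = (f_stat, f_survival_df1_2(f_stat, DF_DENOMINATOR))
    print(f"{name:<20} F={f_stat:.3f}  p={anova[name][1]:.4f}")
```

All three p-values are well above 0.05, so the factor ranking for this response should be read as descriptive, not as evidence of a real effect.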

Figures for end_to_end_latency_ms: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, and model diagnostics.

Response: result_accuracy

Top factors: parallelism (38.5%), window_size_s (36.4%), watermark_delay_s (25.1%).

ANOVA

Source	DF	SS	MS	F	p-value
window_size_s	2	57.0333	28.5166	0.742	0.5063
watermark_delay_s	2	25.3847	12.6923	0.330	0.7281
parallelism	2	55.4050	27.7025	0.721	0.5155
Lack of Fit	6	93.5463	15.5911	0.406	0.8346
Pure Error	2	76.8800	38.4400
Error	8	170.4263	38.4400
Total	14	308.2493	22.0178

Figures for result_accuracy: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, and model diagnostics.

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

Surface plots (one per response and factor pair):
end_to_end_latency_ms: watermark_delay_s vs parallelism
end_to_end_latency_ms: window_size_s vs parallelism
end_to_end_latency_ms: window_size_s vs watermark_delay_s
result_accuracy: watermark_delay_s vs parallelism
result_accuracy: window_size_s vs parallelism
result_accuracy: window_size_s vs watermark_delay_s
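To illustrate what fitting a quadratic RSM on this design involves, the following Python sketch builds the 10-term quadratic model matrix for a three-factor Box-Behnken design in coded units and solves the least-squares normal equations. The responses are SYNTHETIC, generated from made-up coefficients purely to show that the design can identify all ten terms; they are not the measurements from this experiment, and the solver is a plain-Python stand-in, not the tool's implementation.

```python
from itertools import combinations, product


def bbd3(n_center=3):
    """Three-factor Box-Behnken design in coded units (15 runs by default)."""
    runs = []
    for i, j in combinations(range(3), 2):
        for a, b in product((-1.0, 1.0), repeat=2):
            row = [0.0, 0.0, 0.0]
            row[i], row[j] = a, b
            runs.append(row)
    return runs + [[0.0, 0.0, 0.0] for _ in range(n_center)]


def quad_terms(x):
    """Intercept, 3 linear, 3 interaction, and 3 squared terms."""
    x1, x2, x3 = x
    return [1.0, x1, x2, x3, x1 * x2, x1 * x3, x2 * x3, x1 * x1, x2 * x2, x3 * x3]


def solve(a, b):
    """Solve a @ x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_quadratic(xs, ys):
    """Ordinary least squares via the normal equations (X'X) beta = X'y."""
    rows = [quad_terms(x) for x in xs]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    return solve(xtx, xty)


# SYNTHETIC coefficients and noise-free responses, for illustration only.
TRUE_COEFS = [100.0, -5.0, 2.0, -3.0, 1.0, 0.5, -1.5, 4.0, -2.0, 3.0]
xs = bbd3()
ys = [sum(c * t for c, t in zip(TRUE_COEFS, quad_terms(x))) for x in xs]

coefs = fit_quadratic(xs, ys)
print([round(c, 6) for c in coefs])
```

With 15 runs and 10 coefficients, only 5 residual degrees of freedom remain, which is one reason adjusted R² can drop sharply for the quadratic model when the data are noisy.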

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
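A minimal Python sketch of the combination step follows. The weighted geometric mean is the part described above; the linear one-sided ramps and their bounds are assumptions for illustration, because this report does not state which shape or bounds the tool uses for the individual desirabilities.

```python
import math


def desirability_minimize(y, best, worst):
    """Assumed linear ramp: 1 at or below `best`, 0 at or above `worst`."""
    return min(1.0, max(0.0, (worst - y) / (worst - best)))


def desirability_maximize(y, worst, best):
    """Assumed linear ramp: 0 at or below `worst`, 1 at or above `best`."""
    return min(1.0, max(0.0, (y - worst) / (best - worst)))


def overall_desirability(d_values, weights):
    """Weighted geometric mean: D = (prod d_i ** w_i) ** (1 / sum w_i)."""
    if any(d == 0.0 for d in d_values):
        return 0.0  # one fully undesirable response vetoes the whole setting
    log_sum = sum(w * math.log(d) for d, w in zip(d_values, weights))
    return math.exp(log_sum / sum(weights))


# Per-response desirabilities and weights as listed in this report:
# end_to_end_latency_ms (d=0.5755, weight 1.0), result_accuracy (d=0.8155, weight 1.5).
D = overall_desirability([0.5755, 0.8155], [1.0, 1.5])
print(f"D = {D:.4f}")
```

Plugging in the per-response values from this report gives D ≈ 0.7094, matching the overall desirability shown below. Because the mean is geometric, a response with desirability 0 forces D to 0 regardless of the other responses.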

Overall Desirability

D = 0.7094

Per-Response Desirability

Response	Weight	Desirability	Predicted	Dir
`end_to_end_latency_ms`	1.0	0.5755	568.00 ms	↓
`result_accuracy`	1.5	0.8155	95.30 %	↑

Recommended Settings

Factor	Value
`window_size_s`	120 s
`watermark_delay_s`	15.5 s
`parallelism`	4 slots

Source: from observed run #12

Trade-off Summary

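Sacrifice is how much worse the compromise prediction is than the best value observed for that response on its own.

Response	Predicted	Best Observed	Sacrifice
`end_to_end_latency_ms`	568.00	126.00	+442.00
`result_accuracy`	95.30	97.90	+2.60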

Top 3 Runs by Desirability

Run	D	Factor Settings
#12	0.7094	window_size_s=120, watermark_delay_s=15.5, parallelism=4
#2	0.6524	window_size_s=62.5, watermark_delay_s=1, parallelism=32
#6	0.6443	window_size_s=62.5, watermark_delay_s=30, parallelism=4

Model Quality

Response	R²	Type
`end_to_end_latency_ms`	0.0720	linear
`result_accuracy`	0.0390	linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.7094

Response                  Weight Desirability    Predicted  Direction
---------------------------------------------------------------------
end_to_end_latency_ms        1.0       0.5755      568.00 ms   ↓
result_accuracy              1.5       0.8155       95.30 %   ↑

Recommended settings:
  window_size_s = 120 s
  watermark_delay_s = 15.5 s
  parallelism = 4 slots
  (from observed run #12)

Trade-off summary:
  end_to_end_latency_ms: 568.00 (best observed: 126.00, sacrifice: +442.00)
  result_accuracy: 95.30 (best observed: 97.90, sacrifice: +2.60)

Model quality:
  end_to_end_latency_ms: R² = 0.0720 (linear)
  result_accuracy: R² = 0.0390 (linear)

Top 3 observed runs by overall desirability:
  1. Run #12 (D=0.7094): window_size_s=120, watermark_delay_s=15.5, parallelism=4
  2. Run #2 (D=0.6524): window_size_s=62.5, watermark_delay_s=1, parallelism=32
  3. Run #6 (D=0.6443): window_size_s=62.5, watermark_delay_s=30, parallelism=4

Full Analysis Output

doe analyze
=== Main Effects: end_to_end_latency_ms ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
parallelism            394.2500      80.3772            39.9%
window_size_s          361.9643      80.3772            36.6%
watermark_delay_s      232.5357      80.3772            23.5%

=== ANOVA Table: end_to_end_latency_ms ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
window_size_s                2  345764.4048  172882.2024      1.437     0.2930
watermark_delay_s            2  145040.1548   72520.0774      0.603     0.5704
parallelism                  2  310956.1548  155478.0774      1.292     0.3264
Lack of Fit                  6  314296.6190   52382.7698      0.435     0.8183
Pure Error                   2  240648.0000  120324.0000
Error                        8  554944.6190  120324.0000
Total                       14 1356705.3333   96907.5238

=== Summary Statistics: end_to_end_latency_ms ===

window_size_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  120                 4   293.7500   179.9007   126.0000   510.0000
  5                   4   588.7500   373.7007   170.0000  1065.0000
  62.5                7   655.7143   287.1966   366.0000  1186.0000

watermark_delay_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1                   4   406.7500   219.6988   126.0000   647.0000
  15.5                7   639.2857   363.9692   170.0000  1186.0000
  30                  4   504.5000   301.1207   167.0000   900.0000

parallelism:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  18                  7   538.7143   353.2344   126.0000  1186.0000
  32                  4   346.5000   128.4199   170.0000   478.0000
  4                   4   740.7500   287.3017   488.0000  1065.0000

=== Main Effects: result_accuracy ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
parallelism              4.8500       1.2116            38.5%
window_size_s            4.5857       1.2116            36.4%
watermark_delay_s        3.1571       1.2116            25.1%

=== ANOVA Table: result_accuracy ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
window_size_s                2      57.0333      28.5166      0.742     0.5063
watermark_delay_s            2      25.3847      12.6923      0.330     0.7281
parallelism                  2      55.4050      27.7025      0.721     0.5155
Lack of Fit                  6      93.5463      15.5911      0.406     0.8346
Pure Error                   2      76.8800      38.4400
Error                        8     170.4263      38.4400
Total                       14     308.2493      22.0178

=== Summary Statistics: result_accuracy ===

window_size_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  120                 4    87.8000     5.0020    80.9000    92.6000
  5                   4    89.6250     4.9675    84.8000    94.3000
  62.5                7    92.3857     4.1257    86.1000    97.9000

watermark_delay_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1                   4    90.4750     2.7597    87.8000    93.5000
  15.5                7    91.5571     4.8483    84.8000    97.9000
  30                  4    88.4000     6.2976    80.9000    95.2000

parallelism:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  18                  7    89.6286     6.0753    80.9000    97.9000
  32                  4    88.7000     2.8925    84.8000    91.6000
  4                   4    93.5500     1.4480    92.1000    95.2000

Optimization Recommendations

doe optimize
=== Optimization: end_to_end_latency_ms ===
Direction: minimize

Best observed run: #7
  window_size_s = 62.5
  watermark_delay_s = 15.5
  parallelism = 18
  Value: 126.0

RSM Model (linear, R² = 0.0841, Adj R² = -0.1657):
  Coefficients:
    intercept                      +541.3333
    window_size_s                  -103.2500
    watermark_delay_s              -1.5000
    parallelism                    -60.0000

RSM Model (quadratic, R² = 0.3227, Adj R² = -0.8963):
  Coefficients:
    intercept                      +792.3333
    window_size_s                  -103.2500
    watermark_delay_s              -1.5000
    parallelism                    -60.0000
    window_size_s*watermark_delay_s -6.2500
    window_size_s*parallelism      +47.7500
    watermark_delay_s*parallelism  +139.2500
    window_size_s^2                -146.5417
    watermark_delay_s^2            -164.0417
    parallelism^2                  -160.0417

  Curvature analysis:
    watermark_delay_s              coef=-164.0417  concave (has a maximum)
    parallelism                    coef=-160.0417  concave (has a maximum)
    window_size_s                  coef=-146.5417  concave (has a maximum)

  Notable interactions:
    watermark_delay_s*parallelism  coef=+139.2500  (synergistic)
    window_size_s*parallelism      coef=+47.7500  (synergistic)
    window_size_s*watermark_delay_s coef=-6.2500  (antagonistic)

  Predicted optimum (from linear model, at observed points):
    window_size_s = 5
    watermark_delay_s = 15.5
    parallelism = 4
    Predicted value: 704.5833

  Surface optimum (via L-BFGS-B, linear model):
    window_size_s = 120
    watermark_delay_s = 30
    parallelism = 32
    Predicted value: 376.5833

  Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. window_size_s  (effect: 226.6, contribution: 39.9%)
  2. parallelism  (effect: 197.9, contribution: 34.8%)
  3. watermark_delay_s  (effect: 143.6, contribution: 25.3%)

=== Optimization: result_accuracy ===
Direction: maximize

Best observed run: #10
  window_size_s = 62.5
  watermark_delay_s = 15.5
  parallelism = 18
  Value: 97.9

RSM Model (linear, R² = 0.2671, Adj R² = 0.0673):
  Coefficients:
    intercept                      +90.4267
    window_size_s                  -1.5125
    watermark_delay_s              -2.3250
    parallelism                    -1.6125

RSM Model (quadratic, R² = 0.5704, Adj R² = -0.2030):
  Coefficients:
    intercept                      +93.3333
    window_size_s                  -1.5125
    watermark_delay_s              -2.3250
    parallelism                    -1.6125
    window_size_s*watermark_delay_s +0.3750
    window_size_s*parallelism      -1.2500
    watermark_delay_s*parallelism  +2.9750
    window_size_s^2                -3.5417
    watermark_delay_s^2            -1.4167
    parallelism^2                  -0.4917

  Curvature analysis:
    window_size_s                  coef=-3.5417  concave (has a maximum)
    watermark_delay_s              coef=-1.4167  concave (has a maximum)
    parallelism                    coef=-0.4917  concave (has a maximum)

  Notable interactions:
    watermark_delay_s*parallelism  coef=+2.9750  (synergistic)
    window_size_s*parallelism      coef=-1.2500  (antagonistic)
    window_size_s*watermark_delay_s coef=+0.3750  (synergistic)

  Predicted optimum (from linear model, at observed points):
    window_size_s = 62.5
    watermark_delay_s = 1
    parallelism = 4
    Predicted value: 94.3642

  Surface optimum (via L-BFGS-B, linear model):
    window_size_s = 5
    watermark_delay_s = 1
    parallelism = 4
    Predicted value: 95.8767

  Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. window_size_s  (effect: 4.9, contribution: 38.4%)
  2. watermark_delay_s  (effect: 4.6, contribution: 36.3%)
  3. parallelism  (effect: 3.2, contribution: 25.2%)
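
The RSM coefficients in the `doe optimize` output appear to be expressed in coded units, where each factor's low level maps to -1 and its high level to +1. This is inferred from the fact that evaluating the linear models at the reported surface optima reproduces the reported predictions; it is not taken from the tool's documentation. A minimal Python sketch of that check, with illustrative helper names:

```python
# Factor ranges from the Factors table (low, high).
RANGES = {
    "window_size_s": (5, 120),
    "watermark_delay_s": (1, 30),
    "parallelism": (4, 32),
}

# Linear-model coefficients copied from the `doe optimize` output.
LINEAR_LATENCY = {
    "intercept": 541.3333,
    "window_size_s": -103.2500,
    "watermark_delay_s": -1.5000,
    "parallelism": -60.0000,
}
LINEAR_ACCURACY = {
    "intercept": 90.4267,
    "window_size_s": -1.5125,
    "watermark_delay_s": -2.3250,
    "parallelism": -1.6125,
}


def to_coded(value, low, high):
    """Map a natural-unit setting onto the coded [-1, +1] scale."""
    return (value - (low + high) / 2) / ((high - low) / 2)


def predict_linear(model, settings):
    """Evaluate intercept + sum(coef * coded level); coded units are assumed."""
    y = model["intercept"]
    for name, value in settings.items():
        y += model[name] * to_coded(value, *RANGES[name])
    return y


latency_opt = predict_linear(
    LINEAR_LATENCY,
    {"window_size_s": 120, "watermark_delay_s": 30, "parallelism": 32},
)
accuracy_opt = predict_linear(
    LINEAR_ACCURACY,
    {"window_size_s": 5, "watermark_delay_s": 1, "parallelism": 4},
)
print(f"latency at surface optimum:  {latency_opt:.4f}")
print(f"accuracy at surface optimum: {accuracy_opt:.4f}")
```

Under the coded-unit assumption these evaluate to about 376.5833 ms and 95.8767 %, the values reported for the two linear-model surface optima.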