Message Queue Consumer Tuning

Summary

This experiment uses a Latin Hypercube design to explore how four Kafka consumer parameters affect throughput and consumer lag.

The design varies four factors: fetch_min_bytes (1 to 1048576 bytes), max_poll_records (100 to 5000 records), num_consumers (1 to 12), and session_timeout (6000 to 45000 ms). It targets two responses: throughput_mbps in MB/s (maximize) and consumer_lag in records (minimize). Two conditions are held constant across all runs: partitions = 12 and replication_factor = 3.

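Latin Hypercube Sampling (LHS) spreads the 10 runs across the 4-dimensional factor space: each factor's range is divided into 10 equal-width intervals, and every interval is sampled exactly once. This gives even coverage of every factor with few runs, which suits computer experiments where the response surface may be complex.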
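The stratified sampling idea can be sketched in a few lines of standard-library Python. This is a minimal illustration, not the DOE Helper Tool's generator: the function name `latin_hypercube` is hypothetical, it assumes uniform sampling on a linear scale, and with `seed=42` it will not reproduce the tool's experimental matrix.

```python
import random


def latin_hypercube(bounds, n, seed=None):
    """Return n points; each factor's [low, high] range is split into n
    equal-width strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n))
        rng.shuffle(strata)  # random pairing of strata across factors
        width = (high - low) / n
        # one uniform draw inside each assigned stratum
        columns.append([low + (s + rng.random()) * width for s in strata])
    # transpose the per-factor columns into rows (one row per run)
    return [list(row) for row in zip(*columns)]


# Factor bounds from config.json, in column order:
# fetch_min_bytes, max_poll_records, num_consumers, session_timeout
BOUNDS = [(1, 1048576), (100, 5000), (1, 12), (6000, 45000)]

design = latin_hypercube(BOUNDS, n=10, seed=42)
for run, row in enumerate(design, start=1):
    print(run, [round(v, 2) for v in row])
```

Shuffling the strata independently per factor is what makes the design "Latin": projecting the points onto any single axis gives exactly one point per interval.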

Key Findings

For throughput_mbps, the main-effects analysis assigns the same 25.0% contribution to all four factors, so it does not rank them. The linear RSM fit under Optimization Recommendations is more informative: throughput rises with fetch_min_bytes and falls with max_poll_records and num_consumers. The best observed value was 140.2 MB/s (at fetch_min_bytes = 970069, max_poll_records = 3331.44, num_consumers = 4.31814, session_timeout = 12633.7).

For consumer_lag, the main-effects contributions are likewise tied at 25.0%. The linear RSM fit shows lag falling with fetch_min_bytes and rising with max_poll_records and num_consumers. The best (lowest) observed value was 41053 records, in the same run.

Recommended Next Steps

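Consider varying the fixed factors in a future study. num_consumers already reaches 12, the fixed partition count, and within a Kafka consumer group each partition is assigned to at most one consumer, so only a larger partition count would let additional consumers do useful work.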

Experimental Setup

Factors

Factor	Low	High	Unit
`fetch_min_bytes`	1	1048576	bytes
`max_poll_records`	100	5000	records
`num_consumers`	1	12	count
`session_timeout`	6000	45000	ms

Fixed: partitions = 12, replication_factor = 3
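Three of the factors correspond to standard Kafka consumer properties (`fetch.min.bytes`, `max.poll.records`, `session.timeout.ms`), while num_consumers is the size of the consumer group. The sketch below is a hypothetical helper, `to_consumer_settings`, showing how one continuous design row might be turned into client settings; how `sim.sh` actually consumes the values is not shown in this document, so the mapping is an assumption.

```python
PARTITIONS = 12  # fixed factor from config.json


def to_consumer_settings(fetch_min_bytes, max_poll_records, num_consumers,
                         session_timeout):
    """Map one design row to Kafka consumer properties (hypothetical helper).

    The LHS values are continuous, but Kafka expects integers, so round them.
    """
    consumers = max(1, round(num_consumers))  # group size must be whole
    props = {
        "fetch.min.bytes": int(round(fetch_min_bytes)),
        "max.poll.records": int(round(max_poll_records)),
        "session.timeout.ms": int(round(session_timeout)),
    }
    # Each partition goes to at most one consumer in a group,
    # so consumers beyond the partition count sit idle.
    active = min(consumers, PARTITIONS)
    return props, consumers, active


# Run 1 of the experimental matrix
props, consumers, active = to_consumer_settings(986235, 1075.86, 7.47733, 41717.7)
print(props, consumers, active)
```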

Responses

Response	Direction	Unit
`throughput_mbps`	↑ maximize	MB/s
`consumer_lag`	↓ minimize	records

Configuration

use_cases/36_message_queue_consumer/config.json

{
  "metadata": {
    "name": "Message Queue Consumer Tuning",
    "description": "Latin Hypercube exploration of 4 Kafka consumer parameters for throughput and lag"
  },
  "factors": [
    {
      "name": "fetch_min_bytes",
      "levels": [
        "1",
        "1048576"
      ],
      "type": "continuous",
      "unit": "bytes"
    },
    {
      "name": "max_poll_records",
      "levels": [
        "100",
        "5000"
      ],
      "type": "continuous",
      "unit": "records"
    },
    {
      "name": "num_consumers",
      "levels": [
        "1",
        "12"
      ],
      "type": "continuous",
      "unit": "count"
    },
    {
      "name": "session_timeout",
      "levels": [
        "6000",
        "45000"
      ],
      "type": "continuous",
      "unit": "ms"
    }
  ],
  "fixed_factors": {
    "partitions": "12",
    "replication_factor": "3"
  },
  "responses": [
    {
      "name": "throughput_mbps",
      "optimize": "maximize",
      "unit": "MB/s"
    },
    {
      "name": "consumer_lag",
      "optimize": "minimize",
      "unit": "records"
    }
  ],
  "settings": {
    "operation": "latin_hypercube",
    "test_script": "use_cases/36_message_queue_consumer/sim.sh"
  }
}

Experimental Matrix

The Latin Hypercube Design produces 10 runs. Each row is one experiment with specific factor settings.

Run	`fetch_min_bytes`	`max_poll_records`	`num_consumers`	`session_timeout`
1	986235	1075.86	7.47733	41717.7
2	920030	3801.11	8.72204	9906.16
3	830968	2339.27	11.8311	6732.94
4	58843.3	1660.44	1.27799	30255.8
5	141574	4320.66	6.08947	37036.4
6	242682	4643.54	4.16848	14297.4
7	549392	115.015	10.6907	40944.8
8	447946	2984.3	2.8737	23822.4
9	716869	3150.88	8.31272	26845.4
10	343136	1091.19	5.13086	18621.8
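The matrix can be checked for the Latin Hypercube property: split each factor's range into 10 equal-width strata, and every stratum should be hit by exactly one run. A short Python sketch; `is_latin_hypercube` is a hypothetical helper, and equal-width strata assume uniform sampling on a linear scale.

```python
# Experimental matrix from the table above (runs 1-10):
# fetch_min_bytes, max_poll_records, num_consumers, session_timeout
MATRIX = [
    (986235, 1075.86, 7.47733, 41717.7),
    (920030, 3801.11, 8.72204, 9906.16),
    (830968, 2339.27, 11.8311, 6732.94),
    (58843.3, 1660.44, 1.27799, 30255.8),
    (141574, 4320.66, 6.08947, 37036.4),
    (242682, 4643.54, 4.16848, 14297.4),
    (549392, 115.015, 10.6907, 40944.8),
    (447946, 2984.3, 2.8737, 23822.4),
    (716869, 3150.88, 8.31272, 26845.4),
    (343136, 1091.19, 5.13086, 18621.8),
]
BOUNDS = [(1, 1048576), (100, 5000), (1, 12), (6000, 45000)]


def is_latin_hypercube(rows, bounds):
    """True if, for every factor, each of the n equal-width strata
    contains exactly one run."""
    n = len(rows)
    for j, (low, high) in enumerate(bounds):
        width = (high - low) / n
        strata = sorted(int((row[j] - low) // width) for row in rows)
        if strata != list(range(n)):
            return False
    return True


print(is_latin_hypercube(MATRIX, BOUNDS))  # prints True
```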

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/36_message_queue_consumer/config.json

2. Generate the runner script

$ doe generate --config use_cases/36_message_queue_consumer/config.json \
    --output use_cases/36_message_queue_consumer/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/36_message_queue_consumer/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/36_message_queue_consumer/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/36_message_queue_consumer/config.json

6. Multi-objective optimization

With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/36_message_queue_consumer/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/36_message_queue_consumer/config.json \
    --output use_cases/36_message_queue_consumer/results/report.html

Features Exercised

Feature	Value
Design type	`latin_hypercube`
Factor types	`continuous` (all 4)
Arg style	`double-dash`
Responses	2 (throughput_mbps ↑, consumer_lag ↓)
Total runs	10

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: throughput_mbps

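Top factors: all four factors tie at 25.0%. With 10 runs and a distinct level of every factor in each run, each one-factor ANOVA is saturated (DF = 9, SS equal to the total SS), so this ranking does not discriminate between factors.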

ANOVA

Source	DF	SS	MS	F	p-value
fetch_min_bytes	9	3545.0040	393.8893
max_poll_records	9	3545.0040	393.8893
num_consumers	9	3545.0040	393.8893
session_timeout	9	3545.0040	393.8893
Error (Lenth PSE)	0	0.0000	0.0000
Total	9	3545.0040	393.8893
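The identical rows follow from the design itself: every factor has 10 distinct levels across 10 runs, so each level forms its own group and the between-level SS absorbs the entire total SS. A minimal Python sketch of a one-way main-effects calculation reproduces the throughput numbers from the fetch_min_bytes levels and values listed under Summary Statistics in the full analysis output. It assumes the tool's "Effect" is the spread between the highest and lowest level means, which the numbers are consistent with (56.6 = 140.2 - 83.6); `one_way_main_effect` is a hypothetical helper.

```python
from statistics import mean

# fetch_min_bytes levels and throughput_mbps values from the
# "Summary Statistics: throughput_mbps" section of the analysis output
levels = [1.0336e6, 15356.2, 163954, 299933, 380045,
          520612, 609944, 698210, 757530, 892524]
throughput = [119.3, 127.7, 140.2, 83.6, 92.5,
              127.6, 108.2, 83.7, 112.5, 126.3]


def one_way_main_effect(levels, y):
    """Group y by factor level; return (effect, ss_between, ss_total)."""
    groups = {}
    for lv, v in zip(levels, y):
        groups.setdefault(lv, []).append(v)
    grand = mean(y)
    means = {lv: mean(vs) for lv, vs in groups.items()}
    # assumed definition: spread between highest and lowest level mean
    effect = max(means.values()) - min(means.values())
    ss_between = sum(len(groups[lv]) * (m - grand) ** 2
                     for lv, m in means.items())
    ss_total = sum((v - grand) ** 2 for v in y)
    return effect, ss_between, ss_total


effect, ss_between, ss_total = one_way_main_effect(levels, throughput)
print(round(effect, 4), round(ss_between, 4), round(ss_total, 4))
```

Replacing `levels` with any other factor's 10 distinct levels returns the same three numbers, which is why every factor receives 25.0%.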

Plots (throughput_mbps): Pareto chart, main effects plot, normal and half-normal probability plots of effects, model diagnostics.

Response: consumer_lag

Top factors: all four factors tie at 25.0%; each one-factor ANOVA below is saturated (DF = 9), so this ranking does not single out any factor.

ANOVA

Source	DF	SS	MS	F	p-value
fetch_min_bytes	9	752536281.6000	83615142.4000
max_poll_records	9	752536281.6000	83615142.4000
num_consumers	9	752536281.6000	83615142.4000
session_timeout	9	752536281.6000	83615142.4000
Error (Lenth PSE)	0	0.0000	0.0000
Total	9	752536281.6000	83615142.4000

Plots (consumer_lag): Pareto chart, main effects plot, normal and half-normal probability plots of effects, model diagnostics.

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

Panels, one per factor pair for each response:
consumer_lag: fetch_min_bytes vs max_poll_records, fetch_min_bytes vs num_consumers, fetch_min_bytes vs session_timeout, max_poll_records vs num_consumers, max_poll_records vs session_timeout, num_consumers vs session_timeout
throughput_mbps: the same six factor pairs
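A quick parameter count shows why the quadratic surfaces are plotted per factor pair while the four-factor models elsewhere in this report are linear. A full second-order model in k factors has an intercept, k linear terms, k pure quadratic terms and k(k-1)/2 two-factor interactions:

```latex
p(k) = 1 + k + k + \binom{k}{2} = \frac{(k+1)(k+2)}{2},
\qquad p(2) = 6, \qquad p(4) = 15 > n = 10
```

With 10 runs, a two-factor quadratic (6 coefficients) can be fitted for each panel, but a full four-factor quadratic (15 coefficients) cannot; the four-factor linear model (5 coefficients) leaves 5 residual degrees of freedom. That each panel is a separate two-factor fit is an inference from this count, not something the report states.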

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
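A minimal Python sketch of one-sided Derringer-Suich desirabilities and their weighted geometric mean follows. The weights (1.5 and 1.0) come from the table below; the lower and upper bounds are an assumption, taken here as the observed response ranges in the Summary Statistics (83.6 to 140.2 MB/s, 41053 to 67998 records), and the shape exponent `s = 1` is also assumed. Under those bounds both predicted values lie beyond the best observed run, so each desirability clips to 1 and D = 1.0, matching the reported D = 1.0000; a D of exactly 1 may therefore indicate that the model is predicting past the observed data.

```python
import math


def desirability(y, low, high, goal, s=1.0):
    """One-sided Derringer-Suich desirability, clipped to [0, 1]."""
    if goal == "maximize":
        d = (y - low) / (high - low)
    else:  # "minimize"
        d = (high - y) / (high - low)
    return min(1.0, max(0.0, d)) ** s


def overall_desirability(ds, weights):
    """Weighted geometric mean: (prod d_i ** w_i) ** (1 / sum w_i)."""
    if any(d == 0.0 for d in ds):
        return 0.0  # any fully undesirable response vetoes the compromise
    total = sum(weights)
    return math.exp(sum(w * math.log(d) for d, w in zip(ds, weights)) / total)


# Bounds = observed range of each response (an assumption)
d_tp = desirability(153.79, 83.6, 140.2, "maximize")
d_lag = desirability(36100.05, 41053.0, 67998.0, "minimize")
print(d_tp, d_lag, overall_desirability([d_tp, d_lag], [1.5, 1.0]))
```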

Overall Desirability

D = 1.0000

Per-Response Desirability

Response	Weight	Desirability	Predicted	Dir
`throughput_mbps`	1.5	1.0000	153.79 MB/s	↑
`consumer_lag`	1.0	1.0000	36100.05 records	↓

Recommended Settings

Factor	Value
`fetch_min_bytes`	9.852e+05 bytes
`max_poll_records`	4770 records
`num_consumers`	9.231 count
`session_timeout`	2.643e+04 ms

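Source: RSM model prediction. num_consumers is modeled as continuous, so round it to a whole number of consumers (9) before applying these settings.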

Trade-off Summary

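Sacrifice = how much worse the compromise prediction is than the single-objective best observed value. A negative sacrifice means the model predicts a better value than any observed run.

Response	Predicted	Best Observed	Sacrifice
`throughput_mbps`	153.79	140.20	-13.59
`consumer_lag`	36100.05	41053.00	-4952.95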

Top 3 Runs by Desirability

Run	D	Factor Settings
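#8	0.9545	fetch_min_bytes=870780, max_poll_records=885.544, num_consumers=8.09972, session_timeout=40153.6
#7	0.7896	fetch_min_bytes=454005, max_poll_records=4632.03, num_consumers=9.17463, session_timeout=41435.9
#4	0.7874	fetch_min_bytes=799475, max_poll_records=311.355, num_consumers=10.5724, session_timeout=18028.1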

Model Quality

Response	R²	Type
`throughput_mbps`	0.7799	linear
`consumer_lag`	0.5716	linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 1.0000

Response                  Weight Desirability    Predicted  Direction
---------------------------------------------------------------------
throughput_mbps              1.5       1.0000      153.79 MB/s   ↑
consumer_lag                 1.0       1.0000    36100.05 records   ↓

Recommended settings:
  fetch_min_bytes = 9.852e+05 bytes
  max_poll_records = 4770 records
  num_consumers = 9.231 count
  session_timeout = 2.643e+04 ms
  (from RSM model prediction)

Trade-off summary:
  throughput_mbps: 153.79 (best observed: 140.20, sacrifice: -13.59)
  consumer_lag: 36100.05 (best observed: 41053.00, sacrifice: -4952.95)

Model quality:
  throughput_mbps: R² = 0.7799 (linear)
  consumer_lag: R² = 0.5716 (linear)

Top 3 observed runs by overall desirability:
  1. Run #8 (D=0.9545): fetch_min_bytes=870780, max_poll_records=885.544, num_consumers=8.09972, session_timeout=40153.6
  2. Run #7 (D=0.7896): fetch_min_bytes=454005, max_poll_records=4632.03, num_consumers=9.17463, session_timeout=41435.9
  3. Run #4 (D=0.7874): fetch_min_bytes=799475, max_poll_records=311.355, num_consumers=10.5724, session_timeout=18028.1

Full Analysis Output

doe analyze
=== Main Effects: throughput_mbps ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
fetch_min_bytes         56.6000       6.2761            25.0%
max_poll_records        56.6000       6.2761            25.0%
num_consumers           56.6000       6.2761            25.0%
session_timeout         56.6000       6.2761            25.0%

=== ANOVA Table: throughput_mbps ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
fetch_min_bytes              9    3545.0040     393.8893                      
max_poll_records             9    3545.0040     393.8893                      
num_consumers                9    3545.0040     393.8893                      
session_timeout              9    3545.0040     393.8893                      
Error (Lenth PSE)            0       0.0000       0.0000
Total                        9    3545.0040     393.8893
  Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: throughput_mbps ===

fetch_min_bytes:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.0336e+06          1   119.3000     0.0000   119.3000   119.3000
  15356.2             1   127.7000     0.0000   127.7000   127.7000
  163954              1   140.2000     0.0000   140.2000   140.2000
  299933              1    83.6000     0.0000    83.6000    83.6000
  380045              1    92.5000     0.0000    92.5000    92.5000
  520612              1   127.6000     0.0000   127.6000   127.6000
  609944              1   108.2000     0.0000   108.2000   108.2000
  698210              1    83.7000     0.0000    83.7000    83.7000
  757530              1   112.5000     0.0000   112.5000   112.5000
  892524              1   126.3000     0.0000   126.3000   126.3000

max_poll_records:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1319.92             1   127.6000     0.0000   127.6000   127.6000
  1574.92             1   126.3000     0.0000   126.3000   126.3000
  2085.21             1    83.6000     0.0000    83.6000    83.6000
  220.517             1   108.2000     0.0000   108.2000   108.2000
  2733.33             1   140.2000     0.0000   140.2000   140.2000
  3360.07             1    83.7000     0.0000    83.7000    83.7000
  3813.25             1   127.7000     0.0000   127.7000   127.7000
  4156.63             1   112.5000     0.0000   112.5000   112.5000
  4871.73             1    92.5000     0.0000    92.5000    92.5000
  878.977             1   119.3000     0.0000   119.3000   119.3000

num_consumers:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.79478             1    83.7000     0.0000    83.7000    83.7000
  10.3309             1    83.6000     0.0000    83.6000    83.6000
  11.5825             1   127.7000     0.0000   127.7000   127.7000
  3.09512             1    92.5000     0.0000    92.5000    92.5000
  4.04136             1   126.3000     0.0000   126.3000   126.3000
  5.16223             1   127.6000     0.0000   127.6000   127.6000
  6.31549             1   119.3000     0.0000   119.3000   119.3000
  7.37684             1   140.2000     0.0000   140.2000   140.2000
  7.69578             1   112.5000     0.0000   112.5000   112.5000
  8.75552             1   108.2000     0.0000   108.2000   108.2000

session_timeout:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  12852               1    83.7000     0.0000    83.7000    83.7000
  16963.7             1   127.6000     0.0000   127.6000   127.6000
  20531.3             1    83.6000     0.0000    83.6000    83.6000
  21757.5             1   108.2000     0.0000   108.2000   108.2000
  25760.8             1   112.5000     0.0000   112.5000   112.5000
  31653.8             1   140.2000     0.0000   140.2000   140.2000
  36484.3             1   127.7000     0.0000   127.7000   127.7000
  39865.3             1    92.5000     0.0000    92.5000    92.5000
  43991               1   119.3000     0.0000   119.3000   119.3000
  7673.13             1   126.3000     0.0000   126.3000   126.3000

=== Main Effects: consumer_lag ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
fetch_min_bytes      26945.0000    2891.6283            25.0%
max_poll_records     26945.0000    2891.6283            25.0%
num_consumers        26945.0000    2891.6283            25.0%
session_timeout      26945.0000    2891.6283            25.0%

=== ANOVA Table: consumer_lag ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
fetch_min_bytes              9 752536281.6000 83615142.4000                      
max_poll_records             9 752536281.6000 83615142.4000                      
num_consumers                9 752536281.6000 83615142.4000                      
session_timeout              9 752536281.6000 83615142.4000                      
Error (Lenth PSE)            0       0.0000       0.0000
Total                        9 752536281.6000 83615142.4000
  Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: consumer_lag ===

fetch_min_bytes:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.0336e+06          1 49398.0000     0.0000 49398.0000 49398.0000
  15356.2             1 44250.0000     0.0000 44250.0000 44250.0000
  163954              1 41053.0000     0.0000 41053.0000 41053.0000
  299933              1 67998.0000     0.0000 67998.0000 67998.0000
  380045              1 59790.0000     0.0000 59790.0000 59790.0000
  520612              1 44350.0000     0.0000 44350.0000 44350.0000
  609944              1 47997.0000     0.0000 47997.0000 47997.0000
  698210              1 65248.0000     0.0000 65248.0000 65248.0000
  757530              1 53359.0000     0.0000 53359.0000 53359.0000
  892524              1 53445.0000     0.0000 53445.0000 53445.0000

max_poll_records:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1319.92             1 44350.0000     0.0000 44350.0000 44350.0000
  1574.92             1 53445.0000     0.0000 53445.0000 53445.0000
  2085.21             1 67998.0000     0.0000 67998.0000 67998.0000
  220.517             1 47997.0000     0.0000 47997.0000 47997.0000
  2733.33             1 41053.0000     0.0000 41053.0000 41053.0000
  3360.07             1 65248.0000     0.0000 65248.0000 65248.0000
  3813.25             1 44250.0000     0.0000 44250.0000 44250.0000
  4156.63             1 53359.0000     0.0000 53359.0000 53359.0000
  4871.73             1 59790.0000     0.0000 59790.0000 59790.0000
  878.977             1 49398.0000     0.0000 49398.0000 49398.0000

num_consumers:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.79478             1 65248.0000     0.0000 65248.0000 65248.0000
  10.3309             1 67998.0000     0.0000 67998.0000 67998.0000
  11.5825             1 44250.0000     0.0000 44250.0000 44250.0000
  3.09512             1 59790.0000     0.0000 59790.0000 59790.0000
  4.04136             1 53445.0000     0.0000 53445.0000 53445.0000
  5.16223             1 44350.0000     0.0000 44350.0000 44350.0000
  6.31549             1 49398.0000     0.0000 49398.0000 49398.0000
  7.37684             1 41053.0000     0.0000 41053.0000 41053.0000
  7.69578             1 53359.0000     0.0000 53359.0000 53359.0000
  8.75552             1 47997.0000     0.0000 47997.0000 47997.0000

session_timeout:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  12852               1 65248.0000     0.0000 65248.0000 65248.0000
  16963.7             1 44350.0000     0.0000 44350.0000 44350.0000
  20531.3             1 67998.0000     0.0000 67998.0000 67998.0000
  21757.5             1 47997.0000     0.0000 47997.0000 47997.0000
  25760.8             1 53359.0000     0.0000 53359.0000 53359.0000
  31653.8             1 41053.0000     0.0000 41053.0000 41053.0000
  36484.3             1 44250.0000     0.0000 44250.0000 44250.0000
  39865.3             1 59790.0000     0.0000 59790.0000 59790.0000
  43991               1 49398.0000     0.0000 49398.0000 49398.0000
  7673.13             1 53445.0000     0.0000 53445.0000 53445.0000

Optimization Recommendations

doe optimize
=== Optimization: throughput_mbps ===
Direction: maximize

Best observed run: #8
  fetch_min_bytes = 970069
  max_poll_records = 3331.44
  num_consumers = 4.31814
  session_timeout = 12633.7
  Value: 140.2

RSM Model (linear, R² = 0.5593, Adj R² = 0.2067):
  Coefficients:
    intercept                      +112.2549
    fetch_min_bytes                +13.0494
    max_poll_records               -18.7241
    num_consumers                  -13.1177
    session_timeout                -1.8402

  Predicted optimum (from linear model, at observed points):
    fetch_min_bytes = 510416
    max_poll_records = 438.719
    num_consumers = 3.95258
    session_timeout = 33998.2
    Predicted value: 133.3187

  Surface optimum (via L-BFGS-B, linear model):
    fetch_min_bytes = 1.04858e+06
    max_poll_records = 100
    num_consumers = 1
    session_timeout = 6000
    Predicted value: 158.9862

  Model quality: Moderate fit — use predictions directionally, not precisely.

Factor importance:
  1. fetch_min_bytes  (effect: 56.6, contribution: 25.0%)
  2. max_poll_records  (effect: 56.6, contribution: 25.0%)
  3. num_consumers  (effect: 56.6, contribution: 25.0%)
  4. session_timeout  (effect: 56.6, contribution: 25.0%)

=== Optimization: consumer_lag ===
Direction: minimize

Best observed run: #8
  fetch_min_bytes = 970069
  max_poll_records = 3331.44
  num_consumers = 4.31814
  session_timeout = 12633.7
  Value: 41053.0

RSM Model (linear, R² = 0.7825, Adj R² = 0.6085):
  Coefficients:
    intercept                      +52652.1380
    fetch_min_bytes                -6703.0612
    max_poll_records               +9372.9667
    num_consumers                  +8354.2386
    session_timeout                +1202.9993

  Predicted optimum (from linear model, at observed points):
    fetch_min_bytes = 80107.4
    max_poll_records = 2501.35
    num_consumers = 11.0078
    session_timeout = 17155.5
    Predicted value: 64477.2543

  Surface optimum (via L-BFGS-B, linear model):
    fetch_min_bytes = 1.04858e+06
    max_poll_records = 100
    num_consumers = 1
    session_timeout = 6000
    Predicted value: 27018.8722

  Model quality: Good fit — general trends are captured, some noise remains.

Factor importance:
  1. fetch_min_bytes  (effect: 26945.0, contribution: 25.0%)
  2. max_poll_records  (effect: 26945.0, contribution: 25.0%)
  3. num_consumers  (effect: 26945.0, contribution: 25.0%)
  4. session_timeout  (effect: 26945.0, contribution: 25.0%)
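Two details of this output can be checked by hand. If the coefficients are read in coded units (each factor scaled to [-1, +1], an assumption that reproduces both reported surface optima exactly), the L-BFGS-B optima are simply the corners picked by the coefficient signs, because a linear model over a box is always optimized at a vertex. The Adj R² values follow from R² with n = 10 runs and p = 4 predictors. `corner_optimum` and `adjusted_r2` are hypothetical helpers in this Python sketch.

```python
# Coefficients copied from the "doe optimize" output; coded units assumed.
BOUNDS = {
    "fetch_min_bytes": (1, 1048576),
    "max_poll_records": (100, 5000),
    "num_consumers": (1, 12),
    "session_timeout": (6000, 45000),
}
THROUGHPUT = (112.2549, {"fetch_min_bytes": 13.0494, "max_poll_records": -18.7241,
                         "num_consumers": -13.1177, "session_timeout": -1.8402})
LAG = (52652.1380, {"fetch_min_bytes": -6703.0612, "max_poll_records": 9372.9667,
                    "num_consumers": 8354.2386, "session_timeout": 1202.9993})


def corner_optimum(model, goal):
    """Optimize a linear model over the coded box [-1, +1]^k: push each
    factor to the bound favored by the sign of its coefficient."""
    intercept, coefs = model
    sign = 1 if goal == "maximize" else -1
    coded = {f: (1 if sign * b > 0 else -1) for f, b in coefs.items()}
    value = intercept + sum(b * coded[f] for f, b in coefs.items())
    # decode to natural units: center + half_range * coded
    natural = {f: (lo + hi) / 2 + (hi - lo) / 2 * coded[f]
               for f, (lo, hi) in BOUNDS.items()}
    return natural, value


def adjusted_r2(r2, n, p):
    """Adjusted R^2 for n runs and p predictors (intercept excluded)."""
    return 1 - (1 - r2) * (n - 1) / (n - p - 1)


print(corner_optimum(THROUGHPUT, "maximize"))
print(corner_optimum(LAG, "minimize"))
print(round(adjusted_r2(0.5593, 10, 4), 4), round(adjusted_r2(0.7825, 10, 4), 4))
```

Both responses land on the same corner, including num_consumers = 1 at the edge of the design space, so these surface optima are extrapolations from moderately fitting linear models rather than tested settings.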