Latin Hypercube Design

Message Queue Consumer Tuning

Latin Hypercube exploration of 4 Kafka consumer parameters for throughput and lag

Summary

This experiment tunes a Kafka message queue consumer. It uses a Latin Hypercube design to explore 4 consumer parameters and their effect on throughput and consumer lag.

The design varies 4 factors: fetch min bytes (1 to 1048576 bytes), max poll records (100 to 5000 records), num consumers (1 to 12), and session timeout (6000 to 45000 ms). The goal is to optimize 2 responses: throughput mbps (MB/s, maximize) and consumer lag (records, minimize). Two conditions are held constant across all runs: partitions = 12 and replication factor = 3.

Latin Hypercube Sampling was used to space 10 runs across the 4-dimensional factor space with good coverage and minimal gaps, making it ideal for computer experiments where the response surface may be complex.
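
The stratification idea behind Latin Hypercube Sampling can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the DOE tool's own generator: the function names `latin_hypercube` and `stratum_indices` are hypothetical, and the seed will not reproduce the matrix published on this page. Each factor range is cut into 10 equal strata, and every stratum is sampled exactly once.

```python
import random

# Factor ranges taken from the design description above.
FACTORS = {
    "fetch_min_bytes": (1, 1048576),
    "max_poll_records": (100, 5000),
    "num_consumers": (1, 12),
    "session_timeout": (6000, 45000),
}


def latin_hypercube(factors, n_runs, seed=None):
    """Return n_runs dicts; each factor's range is cut into n_runs equal
    strata and every stratum is sampled exactly once."""
    rng = random.Random(seed)
    columns = {}
    for name, (low, high) in factors.items():
        strata = list(range(n_runs))
        rng.shuffle(strata)  # random pairing of strata across factors
        width = (high - low) / n_runs
        # One uniform draw inside each stratum.
        columns[name] = [low + (s + rng.random()) * width for s in strata]
    return [{name: columns[name][i] for name in factors} for i in range(n_runs)]


def stratum_indices(values, low, high, n_runs):
    """Map each value to the (sorted) index of the stratum it falls in."""
    width = (high - low) / n_runs
    return sorted(min(int((v - low) / width), n_runs - 1) for v in values)


design = latin_hypercube(FACTORS, n_runs=10, seed=42)
for run_number, run in enumerate(design, start=1):
    print(run_number, {k: round(v, 2) for k, v in run.items()})
```

With only 10 runs in 4 dimensions, this construction guarantees even coverage of each factor taken alone; coverage of factor combinations still depends on the random pairing.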

Key Findings

For throughput mbps, the analysis assigns an identical 25.0% contribution to all four factors (fetch min bytes, max poll records, num consumers, session timeout), so the main-effects ranking does not separate them. The best observed value was 140.2 MB/s (at fetch min bytes = 970069, max poll records = 3331.44, num consumers = 4.31814).

For consumer lag, all four factors again receive an identical 25.0% contribution. The best (lowest) observed value was 41053.0 records (at fetch min bytes = 970069, max poll records = 3331.44, num consumers = 4.31814).

Recommended Next Steps

Experimental Setup

Factors

Factor            Low    High     Unit
fetch_min_bytes   1      1048576  bytes
max_poll_records  100    5000     records
num_consumers     1      12       count
session_timeout   6000   45000    ms

Fixed: partitions = 12, replication_factor = 3

Responses

Response         Direction   Unit
throughput_mbps  ↑ maximize  MB/s
consumer_lag     ↓ minimize  records

Configuration

use_cases/36_message_queue_consumer/config.json
{
  "metadata": {
    "name": "Message Queue Consumer Tuning",
    "description": "Latin Hypercube exploration of 4 Kafka consumer parameters for throughput and lag"
  },
  "factors": [
    { "name": "fetch_min_bytes", "levels": ["1", "1048576"], "type": "continuous", "unit": "bytes" },
    { "name": "max_poll_records", "levels": ["100", "5000"], "type": "continuous", "unit": "records" },
    { "name": "num_consumers", "levels": ["1", "12"], "type": "continuous", "unit": "count" },
    { "name": "session_timeout", "levels": ["6000", "45000"], "type": "continuous", "unit": "ms" }
  ],
  "fixed_factors": { "partitions": "12", "replication_factor": "3" },
  "responses": [
    { "name": "throughput_mbps", "optimize": "maximize", "unit": "MB/s" },
    { "name": "consumer_lag", "optimize": "minimize", "unit": "records" }
  ],
  "settings": {
    "operation": "latin_hypercube",
    "test_script": "use_cases/36_message_queue_consumer/sim.sh"
  }
}
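
Note that the config stores every level as a string. The following sketch shows one way to read the factor bounds and response goals from it in Python. It works on an abridged copy of the config above (metadata and settings omitted); the helper names `factor_bounds` and `response_goals` are hypothetical and are not part of the DOE tool.

```python
import json

# Abridged copy of config.json from above (metadata and settings omitted).
CONFIG_TEXT = """
{
  "factors": [
    {"name": "fetch_min_bytes", "levels": ["1", "1048576"], "type": "continuous", "unit": "bytes"},
    {"name": "max_poll_records", "levels": ["100", "5000"], "type": "continuous", "unit": "records"},
    {"name": "num_consumers", "levels": ["1", "12"], "type": "continuous", "unit": "count"},
    {"name": "session_timeout", "levels": ["6000", "45000"], "type": "continuous", "unit": "ms"}
  ],
  "fixed_factors": {"partitions": "12", "replication_factor": "3"},
  "responses": [
    {"name": "throughput_mbps", "optimize": "maximize", "unit": "MB/s"},
    {"name": "consumer_lag", "optimize": "minimize", "unit": "records"}
  ]
}
"""


def factor_bounds(config):
    """Convert the string-valued "levels" pairs into numeric (low, high) bounds."""
    bounds = {}
    for factor in config["factors"]:
        low, high = (float(level) for level in factor["levels"])
        if low >= high:
            raise ValueError(f"{factor['name']}: low must be below high")
        bounds[factor["name"]] = (low, high)
    return bounds


def response_goals(config):
    """Map each response name to its optimization direction."""
    return {r["name"]: r["optimize"] for r in config["responses"]}


config = json.loads(CONFIG_TEXT)
bounds = factor_bounds(config)
goals = response_goals(config)
for name, (low, high) in bounds.items():
    print(f"{name}: {low:g} .. {high:g}")
print(goals)
```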

Experimental Matrix

The Latin Hypercube Design produces 10 runs. Each row is one experiment with specific factor settings.

Run  fetch_min_bytes  max_poll_records  num_consumers  session_timeout
1    986235           1075.86           7.47733        41717.7
2    920030           3801.11           8.72204        9906.16
3    830968           2339.27           11.8311        6732.94
4    58843.3          1660.44           1.27799        30255.8
5    141574           4320.66           6.08947        37036.4
6    242682           4643.54           4.16848        14297.4
7    549392           115.015           10.6907        40944.8
8    447946           2984.3            2.8737         23822.4
9    716869           3150.88           8.31272        26845.4
10   343136           1091.19           5.13086        18621.8
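
The defining property of a Latin Hypercube design can be checked directly on this matrix: when each factor range is divided into 10 equal strata, every stratum should contain exactly one run. The sketch below copies the values from the table above; the helper name `strata_hit` is hypothetical.

```python
# Experimental matrix copied from the table above (runs 1-10).
MATRIX = [
    (986235, 1075.86, 7.47733, 41717.7),
    (920030, 3801.11, 8.72204, 9906.16),
    (830968, 2339.27, 11.8311, 6732.94),
    (58843.3, 1660.44, 1.27799, 30255.8),
    (141574, 4320.66, 6.08947, 37036.4),
    (242682, 4643.54, 4.16848, 14297.4),
    (549392, 115.015, 10.6907, 40944.8),
    (447946, 2984.3, 2.8737, 23822.4),
    (716869, 3150.88, 8.31272, 26845.4),
    (343136, 1091.19, 5.13086, 18621.8),
]
NAMES = ["fetch_min_bytes", "max_poll_records", "num_consumers", "session_timeout"]
BOUNDS = [(1, 1048576), (100, 5000), (1, 12), (6000, 45000)]


def strata_hit(values, low, high, n):
    """Sorted stratum index (0..n-1) of every value in one factor column."""
    width = (high - low) / n
    return sorted(min(int((v - low) / width), n - 1) for v in values)


n_runs = len(MATRIX)
results = {}
# zip(*MATRIX) transposes the rows into one column per factor.
for name, (low, high), column in zip(NAMES, BOUNDS, zip(*MATRIX)):
    results[name] = strata_hit(column, low, high, n_runs) == list(range(n_runs))
    print(f"{name}: one run per stratum = {results[name]}")
```

All four columns pass this check, which is the coverage guarantee the design relies on.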

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/36_message_queue_consumer/config.json

2. Generate the runner script

$ doe generate --config use_cases/36_message_queue_consumer/config.json \
    --output use_cases/36_message_queue_consumer/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/36_message_queue_consumer/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/36_message_queue_consumer/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/36_message_queue_consumer/config.json

6. Multi-objective optimization

With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/36_message_queue_consumer/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/36_message_queue_consumer/config.json \
    --output use_cases/36_message_queue_consumer/results/report.html

Features Exercised

Feature       Value
Design type   latin_hypercube
Factor types  continuous (all 4)
Arg style     double-dash
Responses     2 (throughput_mbps ↑, consumer_lag ↓)
Total runs    10

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: throughput_mbps

All four factors tie at a 25.0% contribution: fetch_min_bytes, max_poll_records, num_consumers, and session_timeout.

ANOVA

Source             DF  SS         MS        F  p-value
fetch_min_bytes    9   3545.0040  393.8893
max_poll_records   9   3545.0040  393.8893
num_consumers      9   3545.0040  393.8893
session_timeout    9   3545.0040  393.8893
Error (Lenth PSE)  0   0.0000     0.0000
Total              9   3545.0040  393.8893
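
The identical rows in this table can be reproduced by hand. In the summary statistics of the full analysis output, every factor level has N = 1, so each factor is effectively treated as having 10 distinct levels with one observation each. The level means are then just the raw responses, and every factor absorbs all 9 degrees of freedom and the whole sum of squares. The sketch below recomputes the reported numbers from the 10 throughput values. The standard-error formula, sqrt(MS / n), is an assumption that happens to match the reported 6.2761; the tool's actual formula is not documented here.

```python
# throughput_mbps for the 10 runs, taken from the summary statistics in the
# full analysis output (one observation per factor level).
throughput = [119.3, 127.7, 140.2, 83.6, 92.5, 127.6, 108.2, 83.7, 112.5, 126.3]

n = len(throughput)
mean = sum(throughput) / n

# Total sum of squares and mean square on n - 1 = 9 degrees of freedom.
ss_total = sum((y - mean) ** 2 for y in throughput)
ms_total = ss_total / (n - 1)

# With one run per level, the level means ARE the raw responses, so the
# effect (max level mean - min level mean) is the response range, whichever
# factor is used for grouping.
effect = max(throughput) - min(throughput)

# Assumed formula: it reproduces the reported standard error.
std_error = (ms_total / n) ** 0.5

print(f"SS={ss_total:.4f} MS={ms_total:.4f} effect={effect:.4f} SE={std_error:.4f}")
```

Because the same effect is obtained for every factor, the 25.0% contributions reflect how the levels were grouped and should not be read as evidence that the four factors matter equally.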

Pareto Chart

[Figure: Pareto chart for throughput_mbps]

Main Effects Plot

[Figure: Main effects plot for throughput_mbps]

Normal Probability Plot of Effects

[Figure: Normal probability plot for throughput_mbps]

Half-Normal Plot of Effects

[Figure: Half-normal plot for throughput_mbps]

Model Diagnostics

[Figure: Model diagnostics for throughput_mbps]

Response: consumer_lag

All four factors tie at a 25.0% contribution: fetch_min_bytes, max_poll_records, num_consumers, and session_timeout.

ANOVA

Source             DF  SS              MS             F  p-value
fetch_min_bytes    9   752536281.6000  83615142.4000
max_poll_records   9   752536281.6000  83615142.4000
num_consumers      9   752536281.6000  83615142.4000
session_timeout    9   752536281.6000  83615142.4000
Error (Lenth PSE)  0   0.0000          0.0000
Total              9   752536281.6000  83615142.4000

Pareto Chart

[Figure: Pareto chart for consumer_lag]

Main Effects Plot

[Figure: Main effects plot for consumer_lag]

Normal Probability Plot of Effects

[Figure: Normal probability plot for consumer_lag]

Half-Normal Plot of Effects

[Figure: Half-normal plot for consumer_lag]

Model Diagnostics

[Figure: Model diagnostics for consumer_lag]

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

[Figure: RSM surface, consumer lag: fetch min bytes vs max poll records]
[Figure: RSM surface, consumer lag: fetch min bytes vs num consumers]
[Figure: RSM surface, consumer lag: fetch min bytes vs session timeout]
[Figure: RSM surface, consumer lag: max poll records vs num consumers]
[Figure: RSM surface, consumer lag: max poll records vs session timeout]
[Figure: RSM surface, consumer lag: num consumers vs session timeout]
[Figure: RSM surface, throughput mbps: fetch min bytes vs max poll records]
[Figure: RSM surface, throughput mbps: fetch min bytes vs num consumers]
[Figure: RSM surface, throughput mbps: fetch min bytes vs session timeout]
[Figure: RSM surface, throughput mbps: max poll records vs num consumers]
[Figure: RSM surface, throughput mbps: max poll records vs session timeout]
[Figure: RSM surface, throughput mbps: num consumers vs session timeout]

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
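
A minimal sketch of this scaling and combination step is shown below. It assumes linear desirability ramps between the worst and best observed values of each response (83.6 to 140.2 MB/s and 67998 to 41053 records, taken from the analysis output) and uses the weights 1.5 and 1.0 reported for this use case. The bounds and ramp shape the DOE tool actually uses are not stated on this page, so treat these choices and the function names as illustrative assumptions.

```python
def desirability_maximize(y, low, high):
    """0 at or below `low`, 1 at or above `high`, linear in between."""
    return min(max((y - low) / (high - low), 0.0), 1.0)


def desirability_minimize(y, low, high):
    """1 at or below `low`, 0 at or above `high`, linear in between."""
    return min(max((high - y) / (high - low), 0.0), 1.0)


def overall_desirability(ds, weights):
    """Weighted geometric mean: D = (prod d_i ** w_i) ** (1 / sum w_i)."""
    product = 1.0
    for d, w in zip(ds, weights):
        product *= d ** w
    return product ** (1.0 / sum(weights))


# Assumed bounds: observed response ranges from the analysis output.
THROUGHPUT_RANGE = (83.6, 140.2)   # MB/s, higher is better
LAG_RANGE = (41053.0, 67998.0)     # records, lower is better
WEIGHTS = (1.5, 1.0)               # throughput_mbps, consumer_lag


def score(throughput, lag):
    d_t = desirability_maximize(throughput, *THROUGHPUT_RANGE)
    d_l = desirability_minimize(lag, *LAG_RANGE)
    return overall_desirability([d_t, d_l], WEIGHTS)


# One observed run (127.6 MB/s, 44350 records) and the RSM prediction.
print(round(score(127.6, 44350.0), 4))
print(score(153.79, 36100.05))  # both responses clamp to 1, so D = 1.0
```

Under these assumptions, any prediction that beats the best observed value on both responses clamps to a desirability of 1, which is consistent with the reported D = 1.0000. Because the geometric mean multiplies the terms, a single response with zero desirability drives the overall score to zero.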

Overall Desirability
D = 1.0000

Per-Response Desirability

Response         Weight  Desirability  Predicted         Dir
throughput_mbps  1.5     1.0000        153.79 MB/s       ↑
consumer_lag     1.0     1.0000        36100.05 records  ↓

Recommended Settings

Factor            Value
fetch_min_bytes   9.852e+05 bytes
max_poll_records  4770 records
num_consumers     9.231 count
session_timeout   2.643e+04 ms

Source: from RSM model prediction

Trade-off Summary

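Sacrifice = how much worse the compromise is than the single-objective best. A negative value means the predicted value is better than the best observed run.

Response         Predicted  Best Observed  Sacrifice
throughput_mbps  153.79     140.20         -13.59
consumer_lag     36100.05   41053.00       -4952.95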
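
The sacrifice values can be recomputed from the predicted and best observed numbers in the full multi-objective output. The sign convention in this sketch is inferred from those reported numbers rather than taken from the tool's documentation: the difference is oriented so that a positive value means "worse than the best observed run" for either direction.

```python
def sacrifice(predicted, best_observed, direction):
    """Positive = predicted is worse than the best observed value."""
    if direction == "maximize":
        return best_observed - predicted
    return predicted - best_observed


# Values copied from the multi-objective output.
throughput_sacrifice = sacrifice(153.79, 140.20, "maximize")
lag_sacrifice = sacrifice(36100.05, 41053.00, "minimize")

print(f"throughput_mbps: {throughput_sacrifice:.2f}")
print(f"consumer_lag: {lag_sacrifice:.2f}")
```

Both values come out negative here, meaning the model predicts an improvement over every observed run on both responses. Since that prediction extrapolates a linear fit, a confirmation run at the recommended settings would be a reasonable check.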

Top 3 Runs by Desirability

Run  D       Factor Settings
#8   0.9545  fetch_min_bytes=870780, max_poll_records=885.544, num_consumers=8.09972, session_timeout=40153.6
#7   0.7896  fetch_min_bytes=454005, max_poll_records=4632.03, num_consumers=9.17463, session_timeout=41435.9
#4   0.7874  fetch_min_bytes=799475, max_poll_records=311.355, num_consumers=10.5724, session_timeout=18028.1

Model Quality

Response         R²      Type
throughput_mbps  0.7799  linear
consumer_lag     0.5716  linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 1.0000

Response          Weight  Desirability  Predicted         Direction
---------------------------------------------------------------------
throughput_mbps   1.5     1.0000        153.79 MB/s       ↑
consumer_lag      1.0     1.0000        36100.05 records  ↓

Recommended settings:
  fetch_min_bytes = 9.852e+05 bytes
  max_poll_records = 4770 records
  num_consumers = 9.231 count
  session_timeout = 2.643e+04 ms
  (from RSM model prediction)

Trade-off summary:
  throughput_mbps: 153.79 (best observed: 140.20, sacrifice: -13.59)
  consumer_lag: 36100.05 (best observed: 41053.00, sacrifice: -4952.95)

Model quality:
  throughput_mbps: R² = 0.7799 (linear)
  consumer_lag: R² = 0.5716 (linear)

Top 3 observed runs by overall desirability:
  1. Run #8 (D=0.9545): fetch_min_bytes=870780, max_poll_records=885.544, num_consumers=8.09972, session_timeout=40153.6
  2. Run #7 (D=0.7896): fetch_min_bytes=454005, max_poll_records=4632.03, num_consumers=9.17463, session_timeout=41435.9
  3. Run #4 (D=0.7874): fetch_min_bytes=799475, max_poll_records=311.355, num_consumers=10.5724, session_timeout=18028.1

Full Analysis Output

doe analyze
=== Main Effects: throughput_mbps ===
Factor             Effect    Std Error  % Contribution
--------------------------------------------------------------
fetch_min_bytes    56.6000   6.2761     25.0%
max_poll_records   56.6000   6.2761     25.0%
num_consumers      56.6000   6.2761     25.0%
session_timeout    56.6000   6.2761     25.0%

=== ANOVA Table: throughput_mbps ===
Source              DF  SS         MS        F  p-value
-----------------------------------------------------------------------------
fetch_min_bytes     9   3545.0040  393.8893
max_poll_records    9   3545.0040  393.8893
num_consumers       9   3545.0040  393.8893
session_timeout     9   3545.0040  393.8893
Error (Lenth PSE)   0   0.0000     0.0000
Total               9   3545.0040  393.8893

Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: throughput_mbps ===

fetch_min_bytes:
Level       N  Mean      Std     Min       Max
------------------------------------------------------------
1.0336e+06  1  119.3000  0.0000  119.3000  119.3000
15356.2     1  127.7000  0.0000  127.7000  127.7000
163954      1  140.2000  0.0000  140.2000  140.2000
299933      1  83.6000   0.0000  83.6000   83.6000
380045      1  92.5000   0.0000  92.5000   92.5000
520612      1  127.6000  0.0000  127.6000  127.6000
609944      1  108.2000  0.0000  108.2000  108.2000
698210      1  83.7000   0.0000  83.7000   83.7000
757530      1  112.5000  0.0000  112.5000  112.5000
892524      1  126.3000  0.0000  126.3000  126.3000

max_poll_records:
Level       N  Mean      Std     Min       Max
------------------------------------------------------------
1319.92     1  127.6000  0.0000  127.6000  127.6000
1574.92     1  126.3000  0.0000  126.3000  126.3000
2085.21     1  83.6000   0.0000  83.6000   83.6000
220.517     1  108.2000  0.0000  108.2000  108.2000
2733.33     1  140.2000  0.0000  140.2000  140.2000
3360.07     1  83.7000   0.0000  83.7000   83.7000
3813.25     1  127.7000  0.0000  127.7000  127.7000
4156.63     1  112.5000  0.0000  112.5000  112.5000
4871.73     1  92.5000   0.0000  92.5000   92.5000
878.977     1  119.3000  0.0000  119.3000  119.3000

num_consumers:
Level       N  Mean      Std     Min       Max
------------------------------------------------------------
1.79478     1  83.7000   0.0000  83.7000   83.7000
10.3309     1  83.6000   0.0000  83.6000   83.6000
11.5825     1  127.7000  0.0000  127.7000  127.7000
3.09512     1  92.5000   0.0000  92.5000   92.5000
4.04136     1  126.3000  0.0000  126.3000  126.3000
5.16223     1  127.6000  0.0000  127.6000  127.6000
6.31549     1  119.3000  0.0000  119.3000  119.3000
7.37684     1  140.2000  0.0000  140.2000  140.2000
7.69578     1  112.5000  0.0000  112.5000  112.5000
8.75552     1  108.2000  0.0000  108.2000  108.2000

session_timeout:
Level       N  Mean      Std     Min       Max
------------------------------------------------------------
12852       1  83.7000   0.0000  83.7000   83.7000
16963.7     1  127.6000  0.0000  127.6000  127.6000
20531.3     1  83.6000   0.0000  83.6000   83.6000
21757.5     1  108.2000  0.0000  108.2000  108.2000
25760.8     1  112.5000  0.0000  112.5000  112.5000
31653.8     1  140.2000  0.0000  140.2000  140.2000
36484.3     1  127.7000  0.0000  127.7000  127.7000
39865.3     1  92.5000   0.0000  92.5000   92.5000
43991       1  119.3000  0.0000  119.3000  119.3000
7673.13     1  126.3000  0.0000  126.3000  126.3000

=== Main Effects: consumer_lag ===
Factor             Effect      Std Error  % Contribution
--------------------------------------------------------------
fetch_min_bytes    26945.0000  2891.6283  25.0%
max_poll_records   26945.0000  2891.6283  25.0%
num_consumers      26945.0000  2891.6283  25.0%
session_timeout    26945.0000  2891.6283  25.0%

=== ANOVA Table: consumer_lag ===
Source              DF  SS              MS             F  p-value
-----------------------------------------------------------------------------
fetch_min_bytes     9   752536281.6000  83615142.4000
max_poll_records    9   752536281.6000  83615142.4000
num_consumers       9   752536281.6000  83615142.4000
session_timeout     9   752536281.6000  83615142.4000
Error (Lenth PSE)   0   0.0000          0.0000
Total               9   752536281.6000  83615142.4000

Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: consumer_lag ===

fetch_min_bytes:
Level       N  Mean        Std     Min         Max
------------------------------------------------------------
1.0336e+06  1  49398.0000  0.0000  49398.0000  49398.0000
15356.2     1  44250.0000  0.0000  44250.0000  44250.0000
163954      1  41053.0000  0.0000  41053.0000  41053.0000
299933      1  67998.0000  0.0000  67998.0000  67998.0000
380045      1  59790.0000  0.0000  59790.0000  59790.0000
520612      1  44350.0000  0.0000  44350.0000  44350.0000
609944      1  47997.0000  0.0000  47997.0000  47997.0000
698210      1  65248.0000  0.0000  65248.0000  65248.0000
757530      1  53359.0000  0.0000  53359.0000  53359.0000
892524      1  53445.0000  0.0000  53445.0000  53445.0000

max_poll_records:
Level       N  Mean        Std     Min         Max
------------------------------------------------------------
1319.92     1  44350.0000  0.0000  44350.0000  44350.0000
1574.92     1  53445.0000  0.0000  53445.0000  53445.0000
2085.21     1  67998.0000  0.0000  67998.0000  67998.0000
220.517     1  47997.0000  0.0000  47997.0000  47997.0000
2733.33     1  41053.0000  0.0000  41053.0000  41053.0000
3360.07     1  65248.0000  0.0000  65248.0000  65248.0000
3813.25     1  44250.0000  0.0000  44250.0000  44250.0000
4156.63     1  53359.0000  0.0000  53359.0000  53359.0000
4871.73     1  59790.0000  0.0000  59790.0000  59790.0000
878.977     1  49398.0000  0.0000  49398.0000  49398.0000

num_consumers:
Level       N  Mean        Std     Min         Max
------------------------------------------------------------
1.79478     1  65248.0000  0.0000  65248.0000  65248.0000
10.3309     1  67998.0000  0.0000  67998.0000  67998.0000
11.5825     1  44250.0000  0.0000  44250.0000  44250.0000
3.09512     1  59790.0000  0.0000  59790.0000  59790.0000
4.04136     1  53445.0000  0.0000  53445.0000  53445.0000
5.16223     1  44350.0000  0.0000  44350.0000  44350.0000
6.31549     1  49398.0000  0.0000  49398.0000  49398.0000
7.37684     1  41053.0000  0.0000  41053.0000  41053.0000
7.69578     1  53359.0000  0.0000  53359.0000  53359.0000
8.75552     1  47997.0000  0.0000  47997.0000  47997.0000

session_timeout:
Level       N  Mean        Std     Min         Max
------------------------------------------------------------
12852       1  65248.0000  0.0000  65248.0000  65248.0000
16963.7     1  44350.0000  0.0000  44350.0000  44350.0000
20531.3     1  67998.0000  0.0000  67998.0000  67998.0000
21757.5     1  47997.0000  0.0000  47997.0000  47997.0000
25760.8     1  53359.0000  0.0000  53359.0000  53359.0000
31653.8     1  41053.0000  0.0000  41053.0000  41053.0000
36484.3     1  44250.0000  0.0000  44250.0000  44250.0000
39865.3     1  59790.0000  0.0000  59790.0000  59790.0000
43991       1  49398.0000  0.0000  49398.0000  49398.0000
7673.13     1  53445.0000  0.0000  53445.0000  53445.0000

Optimization Recommendations

doe optimize
=== Optimization: throughput_mbps ===
Direction: maximize
Best observed run: #8
  fetch_min_bytes = 970069
  max_poll_records = 3331.44
  num_consumers = 4.31814
  session_timeout = 12633.7
  Value: 140.2

RSM Model (linear, R² = 0.5593, Adj R² = 0.2067):
Coefficients:
  intercept         +112.2549
  fetch_min_bytes   +13.0494
  max_poll_records  -18.7241
  num_consumers     -13.1177
  session_timeout   -1.8402

Predicted optimum (from linear model, at observed points):
  fetch_min_bytes = 510416
  max_poll_records = 438.719
  num_consumers = 3.95258
  session_timeout = 33998.2
  Predicted value: 133.3187

Surface optimum (via L-BFGS-B, linear model):
  fetch_min_bytes = 1.04858e+06
  max_poll_records = 100
  num_consumers = 1
  session_timeout = 6000
  Predicted value: 158.9862

Model quality: Moderate fit — use predictions directionally, not precisely.

Factor importance:
  1. fetch_min_bytes (effect: 56.6, contribution: 25.0%)
  2. max_poll_records (effect: 56.6, contribution: 25.0%)
  3. num_consumers (effect: 56.6, contribution: 25.0%)
  4. session_timeout (effect: 56.6, contribution: 25.0%)

=== Optimization: consumer_lag ===
Direction: minimize
Best observed run: #8
  fetch_min_bytes = 970069
  max_poll_records = 3331.44
  num_consumers = 4.31814
  session_timeout = 12633.7
  Value: 41053.0

RSM Model (linear, R² = 0.7825, Adj R² = 0.6085):
Coefficients:
  intercept         +52652.1380
  fetch_min_bytes   -6703.0612
  max_poll_records  +9372.9667
  num_consumers     +8354.2386
  session_timeout   +1202.9993

Predicted optimum (from linear model, at observed points):
  fetch_min_bytes = 80107.4
  max_poll_records = 2501.35
  num_consumers = 11.0078
  session_timeout = 17155.5
  Predicted value: 64477.2543

Surface optimum (via L-BFGS-B, linear model):
  fetch_min_bytes = 1.04858e+06
  max_poll_records = 100
  num_consumers = 1
  session_timeout = 6000
  Predicted value: 27018.8722

Model quality: Good fit — general trends are captured, some noise remains.

Factor importance:
  1. fetch_min_bytes (effect: 26945.0, contribution: 25.0%)
  2. max_poll_records (effect: 26945.0, contribution: 25.0%)
  3. num_consumers (effect: 26945.0, contribution: 25.0%)
  4. session_timeout (effect: 26945.0, contribution: 25.0%)
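
The surface optima above can be checked by hand, because a linear model has no interior optimum: the best point always sits at a corner of the factor box, chosen by the sign of each coefficient. The sketch below assumes the coefficients apply to factors coded onto [-1, +1]. That coding is an inference, not something the output states, but it reproduces both reported predictions to within rounding. The function names are hypothetical.

```python
BOUNDS = {
    "fetch_min_bytes": (1, 1048576),
    "max_poll_records": (100, 5000),
    "num_consumers": (1, 12),
    "session_timeout": (6000, 45000),
}

# Coefficients copied from the `doe optimize` output.
THROUGHPUT_MODEL = {
    "intercept": 112.2549, "fetch_min_bytes": 13.0494,
    "max_poll_records": -18.7241, "num_consumers": -13.1177,
    "session_timeout": -1.8402,
}
LAG_MODEL = {
    "intercept": 52652.1380, "fetch_min_bytes": -6703.0612,
    "max_poll_records": 9372.9667, "num_consumers": 8354.2386,
    "session_timeout": 1202.9993,
}


def to_coded(value, low, high):
    """Map a natural-unit value onto [-1, +1] (assumed coding)."""
    return 2 * (value - low) / (high - low) - 1


def predict(model, settings):
    y = model["intercept"]
    for name, (low, high) in BOUNDS.items():
        y += model[name] * to_coded(settings[name], low, high)
    return y


def corner_optimum(model, maximize):
    """Push each factor to the bound that its coefficient's sign favours."""
    settings = {}
    for name, (low, high) in BOUNDS.items():
        wants_high = (model[name] > 0) == maximize
        settings[name] = high if wants_high else low
    return settings


best_throughput = corner_optimum(THROUGHPUT_MODEL, maximize=True)
best_lag = corner_optimum(LAG_MODEL, maximize=False)
print(best_throughput, round(predict(THROUGHPUT_MODEL, best_throughput), 4))
print(best_lag, round(predict(LAG_MODEL, best_lag), 4))
```

Both models select the same corner (maximum fetch_min_bytes, minimum of the other three factors), matching the L-BFGS-B result. Since the optimum lies on the boundary of the explored region, it is an extrapolation of a linear trend rather than a located peak.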