Summary
This experiment investigates query engine join strategy. Plackett-Burman screening of 6 query engine parameters for join performance and memory pressure.
The design varies 6 factors: join algorithm, ranging from hash to sort_merge, hash table mb (MB), ranging from 256 to 4096, sort buffer mb (MB), ranging from 64 to 512, broadcast threshold mb (MB), ranging from 10 to 256, adaptive execution, ranging from off to on, and partitions (count), ranging from 50 to 400. The goal is to optimize 2 responses: query time s (s) (minimize) and peak memory gb (GB) (minimize). Fixed conditions held constant across all runs include engine = spark_sql, data size gb = 100.
A Plackett-Burman screening design was used to efficiently test 6 factors in only 8 runs. This design assumes interactions are negligible and focuses on identifying the most influential main effects.
Key Findings
For query time s, the most influential factors were hash table mb (46.4%), sort buffer mb (16.4%), adaptive execution (16.2%). The best observed value was 34.5 (at join algorithm = hash, hash table mb = 256, sort buffer mb = 512).
For peak memory gb, the most influential factors were join algorithm (29.9%), hash table mb (28.1%), broadcast threshold mb (16.3%). The best observed value was 1.8 (at join algorithm = hash, hash table mb = 256, sort buffer mb = 512).
Recommended Next Steps
- Follow up with a response surface design (CCD or Box-Behnken) on the top 3–4 factors to model curvature and find the true optimum.
- Consider whether any fixed factors should be varied in a future study.
- The screening results can guide factor reduction — drop factors contributing less than 5% and re-run with a smaller, more focused design.
Experimental Setup
Factors
| Factor | Low | High | Unit |
join_algorithm | hash | sort_merge | |
hash_table_mb | 256 | 4096 | MB |
sort_buffer_mb | 64 | 512 | MB |
broadcast_threshold_mb | 10 | 256 | MB |
adaptive_execution | off | on | |
partitions | 50 | 400 | count |
Fixed: engine = spark_sql, data_size_gb = 100
Responses
| Response | Direction | Unit |
query_time_s | ↓ minimize | s |
peak_memory_gb | ↓ minimize | GB |
Configuration
{
"metadata": {
"name": "Query Engine Join Strategy",
"description": "Plackett-Burman screening of 6 query engine parameters for join performance and memory pressure"
},
"factors": [
{
"name": "join_algorithm",
"levels": [
"hash",
"sort_merge"
],
"type": "categorical",
"unit": ""
},
{
"name": "hash_table_mb",
"levels": [
"256",
"4096"
],
"type": "continuous",
"unit": "MB"
},
{
"name": "sort_buffer_mb",
"levels": [
"64",
"512"
],
"type": "continuous",
"unit": "MB"
},
{
"name": "broadcast_threshold_mb",
"levels": [
"10",
"256"
],
"type": "continuous",
"unit": "MB"
},
{
"name": "adaptive_execution",
"levels": [
"off",
"on"
],
"type": "categorical",
"unit": ""
},
{
"name": "partitions",
"levels": [
"50",
"400"
],
"type": "continuous",
"unit": "count"
}
],
"fixed_factors": {
"engine": "spark_sql",
"data_size_gb": "100"
},
"responses": [
{
"name": "query_time_s",
"optimize": "minimize",
"unit": "s"
},
{
"name": "peak_memory_gb",
"optimize": "minimize",
"unit": "GB"
}
],
"settings": {
"operation": "plackett_burman",
"test_script": "use_cases/42_query_engine_join_strategy/sim.sh"
}
}
Experimental Matrix
The Plackett-Burman Design produces 8 runs. Each row is one experiment with specific factor settings.
| Run | join_algorithm | hash_table_mb | sort_buffer_mb | broadcast_threshold_mb | adaptive_execution | partitions |
| 1 | sort_merge | 4096 | 512 | 10 | off | 50 |
| 2 | hash | 256 | 512 | 256 | off | 50 |
| 3 | hash | 4096 | 64 | 256 | off | 400 |
| 4 | sort_merge | 4096 | 512 | 256 | on | 400 |
| 5 | hash | 4096 | 64 | 10 | on | 50 |
| 6 | sort_merge | 256 | 64 | 256 | on | 50 |
| 7 | hash | 256 | 512 | 10 | on | 400 |
| 8 | sort_merge | 256 | 64 | 10 | off | 400 |
Step-by-Step Workflow
1
Preview the design
$ doe info --config use_cases/42_query_engine_join_strategy/config.json
2
Generate the runner script
$ doe generate --config use_cases/42_query_engine_join_strategy/config.json \
--output use_cases/42_query_engine_join_strategy/results/run.sh --seed 42
3
Execute the experiments
$ bash use_cases/42_query_engine_join_strategy/results/run.sh
4
Analyze results
$ doe analyze --config use_cases/42_query_engine_join_strategy/config.json
5
Get optimization recommendations
$ doe optimize --config use_cases/42_query_engine_join_strategy/config.json
6
Multi-objective optimization
With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.
$ doe optimize --config use_cases/42_query_engine_join_strategy/config.json --multi
7
Generate the HTML report
$ doe report --config use_cases/42_query_engine_join_strategy/config.json \
--output use_cases/42_query_engine_join_strategy/results/report.html
Features Exercised
| Feature | Value |
| Design type | plackett_burman |
| Factor types | continuous (4), categorical (2) |
| Arg style | double-dash |
| Responses | 2 (query_time_s ↓, peak_memory_gb ↓) |
| Total runs | 8 |
Analysis Results
Generated from actual experiment runs using the DOE Helper Tool.
Response: query_time_s
Top factors: hash_table_mb (46.4%), sort_buffer_mb (16.4%), adaptive_execution (16.2%).
ANOVA
| Source | DF | SS | MS | F | p-value |
| Source | DF | SS | MS | F | p-value |
| join_algorithm | 1 | 72.6012 | 72.6012 | 0.565 | 0.4767 |
| hash_table_mb | 1 | 810.0312 | 810.0312 | 6.305 | 0.0403 |
| sort_buffer_mb | 1 | 101.5312 | 101.5312 | 0.790 | 0.4035 |
| broadcast_threshold_mb | 1 | 3.2512 | 3.2512 | 0.025 | 0.8781 |
| adaptive_execution | 1 | 98.7012 | 98.7012 | 0.768 | 0.4098 |
| partitions | 1 | 6.3012 | 6.3012 | 0.049 | 0.8311 |
| join_algorithm*hash_table_mb | 1 | 101.5312 | 101.5312 | 0.790 | 0.4035 |
| join_algorithm*sort_buffer_mb | 1 | 810.0312 | 810.0312 | 6.305 | 0.0403 |
| join_algorithm*broadcast_threshold_mb | 1 | 98.7012 | 98.7012 | 0.768 | 0.4098 |
| join_algorithm*adaptive_execution | 1 | 3.2512 | 3.2512 | 0.025 | 0.8781 |
| join_algorithm*partitions | 1 | 162.9012 | 162.9012 | 1.268 | 0.2973 |
| hash_table_mb*sort_buffer_mb | 1 | 72.6012 | 72.6012 | 0.565 | 0.4767 |
| hash_table_mb*broadcast_threshold_mb | 1 | 6.3013 | 6.3013 | 0.049 | 0.8311 |
| hash_table_mb*adaptive_execution | 1 | 162.9013 | 162.9013 | 1.268 | 0.2973 |
| hash_table_mb*partitions | 1 | 3.2513 | 3.2513 | 0.025 | 0.8781 |
| sort_buffer_mb*broadcast_threshold_mb | 1 | 162.9012 | 162.9012 | 1.268 | 0.2973 |
| sort_buffer_mb*adaptive_execution | 1 | 6.3013 | 6.3013 | 0.049 | 0.8311 |
| sort_buffer_mb*partitions | 1 | 98.7012 | 98.7012 | 0.768 | 0.4098 |
| broadcast_threshold_mb*adaptive_execution | 1 | 72.6012 | 72.6012 | 0.565 | 0.4767 |
| broadcast_threshold_mb*partitions | 1 | 810.0313 | 810.0313 | 6.305 | 0.0403 |
| adaptive_execution*partitions | 1 | 101.5313 | 101.5313 | 0.790 | 0.4035 |
| Error | (Lenth | PSE) | 7 | 899.3381 | 128.4769 |
| Total | 7 | 1255.3187 | 179.3312 | | |
Pareto Chart
Main Effects Plot
Normal Probability Plot of Effects
Half-Normal Plot of Effects
Model Diagnostics
Response: peak_memory_gb
Top factors: join_algorithm (29.9%), hash_table_mb (28.1%), broadcast_threshold_mb (16.3%).
ANOVA
| Source | DF | SS | MS | F | p-value |
| Source | DF | SS | MS | F | p-value |
| join_algorithm | 1 | 170.2012 | 170.2012 | 9.262 | 0.0188 |
| hash_table_mb | 1 | 150.5113 | 150.5113 | 8.190 | 0.0243 |
| sort_buffer_mb | 1 | 6.3013 | 6.3013 | 0.343 | 0.5765 |
| broadcast_threshold_mb | 1 | 50.5013 | 50.5013 | 2.748 | 0.1413 |
| adaptive_execution | 1 | 12.2512 | 12.2512 | 0.667 | 0.4411 |
| partitions | 1 | 27.7512 | 27.7512 | 1.510 | 0.2588 |
| join_algorithm*hash_table_mb | 1 | 6.3012 | 6.3012 | 0.343 | 0.5765 |
| join_algorithm*sort_buffer_mb | 1 | 150.5112 | 150.5112 | 8.190 | 0.0243 |
| join_algorithm*broadcast_threshold_mb | 1 | 12.2512 | 12.2512 | 0.667 | 0.4411 |
| join_algorithm*adaptive_execution | 1 | 50.5012 | 50.5012 | 2.748 | 0.1413 |
| join_algorithm*partitions | 1 | 0.1513 | 0.1513 | 0.008 | 0.9303 |
| hash_table_mb*sort_buffer_mb | 1 | 170.2012 | 170.2012 | 9.262 | 0.0188 |
| hash_table_mb*broadcast_threshold_mb | 1 | 27.7512 | 27.7512 | 1.510 | 0.2588 |
| hash_table_mb*adaptive_execution | 1 | 0.1512 | 0.1512 | 0.008 | 0.9303 |
| hash_table_mb*partitions | 1 | 50.5012 | 50.5012 | 2.748 | 0.1413 |
| sort_buffer_mb*broadcast_threshold_mb | 1 | 0.1512 | 0.1512 | 0.008 | 0.9303 |
| sort_buffer_mb*adaptive_execution | 1 | 27.7512 | 27.7512 | 1.510 | 0.2588 |
| sort_buffer_mb*partitions | 1 | 12.2512 | 12.2512 | 0.667 | 0.4411 |
| broadcast_threshold_mb*adaptive_execution | 1 | 170.2012 | 170.2012 | 9.262 | 0.0188 |
| broadcast_threshold_mb*partitions | 1 | 150.5112 | 150.5112 | 8.190 | 0.0243 |
| adaptive_execution*partitions | 1 | 6.3013 | 6.3013 | 0.343 | 0.5765 |
| Error | (Lenth | PSE) | 7 | 128.6381 | 18.3769 |
| Total | 7 | 417.6687 | 59.6670 | | |
Pareto Chart
Main Effects Plot
Normal Probability Plot of Effects
Half-Normal Plot of Effects
Model Diagnostics
Response Surface Plots
3D surfaces fitted with quadratic RSM. Red dots are observed data points.
peak memory gb broadcast threshold mb vs partitions
peak memory gb hash table mb vs broadcast threshold mb
peak memory gb hash table mb vs partitions
peak memory gb hash table mb vs sort buffer mb
peak memory gb sort buffer mb vs broadcast threshold mb
peak memory gb sort buffer mb vs partitions
query time s broadcast threshold mb vs partitions
query time s hash table mb vs broadcast threshold mb
query time s hash table mb vs partitions
query time s hash table mb vs sort buffer mb
query time s sort buffer mb vs broadcast threshold mb
query time s sort buffer mb vs partitions
Multi-Objective Optimization
When responses compete, Derringer–Suich desirability finds the best compromise.
Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.
Overall Desirability
D = 0.8408
Per-Response Desirability
| Response | Weight | Desirability | Predicted | Dir |
query_time_s |
1.5 |
|
31.15 1.0000 31.15 s |
↓ |
peak_memory_gb |
1.0 |
|
9.92 0.6482 9.92 GB |
↓ |
Recommended Settings
| Factor | Value |
join_algorithm | sort_merge |
hash_table_mb | 4065 MB |
sort_buffer_mb | 73.95 MB |
broadcast_threshold_mb | 50.15 MB |
adaptive_execution | on |
partitions | 60.19 count |
Source: from RSM model prediction
Trade-off Summary
Sacrifice = how much worse than single-objective best.
| Response | Predicted | Best Observed | Sacrifice |
peak_memory_gb | 9.92 | 1.80 | +8.12 |
Top 3 Runs by Desirability
| Run | D | Factor Settings |
| #5 | 0.7174 | join_algorithm=sort_merge, hash_table_mb=4096, sort_buffer_mb=512, broadcast_threshold_mb=10, adaptive_execution=off, partitions=50 |
| #4 | 0.6940 | join_algorithm=sort_merge, hash_table_mb=256, sort_buffer_mb=64, broadcast_threshold_mb=10, adaptive_execution=off, partitions=400 |
Model Quality
| Response | R² | Type |
peak_memory_gb | 0.9547 | linear |
Full Multi-Objective Output
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================
Overall desirability: D = 0.8408
Response Weight Desirability Predicted Direction
---------------------------------------------------------------------
query_time_s 1.5 1.0000 31.15 s ↓
peak_memory_gb 1.0 0.6482 9.92 GB ↓
Recommended settings:
join_algorithm = sort_merge
hash_table_mb = 4065 MB
sort_buffer_mb = 73.95 MB
broadcast_threshold_mb = 50.15 MB
adaptive_execution = on
partitions = 60.19 count
(from RSM model prediction)
Trade-off summary:
query_time_s: 31.15 (best observed: 34.50, sacrifice: -3.35)
peak_memory_gb: 9.92 (best observed: 1.80, sacrifice: +8.12)
Model quality:
query_time_s: R² = 0.9931 (linear)
peak_memory_gb: R² = 0.9547 (linear)
Top 3 observed runs by overall desirability:
1. Run #7 (D=0.8012): join_algorithm=hash, hash_table_mb=4096, sort_buffer_mb=64, broadcast_threshold_mb=10, adaptive_execution=on, partitions=50
2. Run #5 (D=0.7174): join_algorithm=sort_merge, hash_table_mb=4096, sort_buffer_mb=512, broadcast_threshold_mb=10, adaptive_execution=off, partitions=50
3. Run #4 (D=0.6940): join_algorithm=sort_merge, hash_table_mb=256, sort_buffer_mb=64, broadcast_threshold_mb=10, adaptive_execution=off, partitions=400
Full Analysis Output
=== Main Effects: query_time_s ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
hash_table_mb -20.1250 4.7346 46.4%
sort_buffer_mb 7.1250 4.7346 16.4%
adaptive_execution 7.0250 4.7346 16.2%
join_algorithm 6.0250 4.7346 13.9%
partitions -1.7750 4.7346 4.1%
broadcast_threshold_mb -1.2750 4.7346 2.9%
=== ANOVA Table: query_time_s ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
join_algorithm 1 72.6012 72.6012 0.565 0.4767
hash_table_mb 1 810.0312 810.0312 6.305 0.0403
sort_buffer_mb 1 101.5312 101.5312 0.790 0.4035
broadcast_threshold_mb 1 3.2512 3.2512 0.025 0.8781
adaptive_execution 1 98.7012 98.7012 0.768 0.4098
partitions 1 6.3012 6.3012 0.049 0.8311
join_algorithm*hash_table_mb 1 101.5312 101.5312 0.790 0.4035
join_algorithm*sort_buffer_mb 1 810.0312 810.0312 6.305 0.0403
join_algorithm*broadcast_threshold_mb 1 98.7012 98.7012 0.768 0.4098
join_algorithm*adaptive_execution 1 3.2512 3.2512 0.025 0.8781
join_algorithm*partitions 1 162.9012 162.9012 1.268 0.2973
hash_table_mb*sort_buffer_mb 1 72.6012 72.6012 0.565 0.4767
hash_table_mb*broadcast_threshold_mb 1 6.3013 6.3013 0.049 0.8311
hash_table_mb*adaptive_execution 1 162.9013 162.9013 1.268 0.2973
hash_table_mb*partitions 1 3.2513 3.2513 0.025 0.8781
sort_buffer_mb*broadcast_threshold_mb 1 162.9012 162.9012 1.268 0.2973
sort_buffer_mb*adaptive_execution 1 6.3013 6.3013 0.049 0.8311
sort_buffer_mb*partitions 1 98.7012 98.7012 0.768 0.4098
broadcast_threshold_mb*adaptive_execution 1 72.6012 72.6012 0.565 0.4767
broadcast_threshold_mb*partitions 1 810.0313 810.0313 6.305 0.0403
adaptive_execution*partitions 1 101.5313 101.5313 0.790 0.4035
Error (Lenth PSE) 7 899.3381 128.4769
Total 7 1255.3187 179.3312
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)
=== Interaction Effects: query_time_s ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
join_algorithm sort_buffer_mb 20.1250 17.7%
broadcast_threshold_mb partitions 20.1250 17.7%
join_algorithm partitions 9.0250 7.9%
hash_table_mb adaptive_execution -9.0250 7.9%
sort_buffer_mb broadcast_threshold_mb 9.0250 7.9%
join_algorithm hash_table_mb -7.1250 6.3%
adaptive_execution partitions 7.1250 6.3%
join_algorithm broadcast_threshold_mb 7.0250 6.2%
sort_buffer_mb partitions 7.0250 6.2%
hash_table_mb sort_buffer_mb -6.0250 5.3%
broadcast_threshold_mb adaptive_execution 6.0250 5.3%
hash_table_mb broadcast_threshold_mb 1.7750 1.6%
sort_buffer_mb adaptive_execution -1.7750 1.6%
join_algorithm adaptive_execution -1.2750 1.1%
hash_table_mb partitions 1.2750 1.1%
=== Summary Statistics: query_time_s ===
join_algorithm:
Level N Mean Std Min Max
------------------------------------------------------------
hash 4 42.8500 10.8709 35.1000 58.9000
sort_merge 4 48.8750 16.6151 34.5000 69.0000
hash_table_mb:
Level N Mean Std Min Max
------------------------------------------------------------
256 4 55.9250 12.1082 39.8000 69.0000
4096 4 35.8000 1.3491 34.5000 37.6000
sort_buffer_mb:
Level N Mean Std Min Max
------------------------------------------------------------
512 4 42.3000 11.2892 34.5000 58.9000
64 4 49.4250 16.0359 35.1000 69.0000
broadcast_threshold_mb:
Level N Mean Std Min Max
------------------------------------------------------------
10 4 46.5000 12.7046 35.1000 58.9000
256 4 45.2250 15.9984 34.5000 69.0000
adaptive_execution:
Level N Mean Std Min Max
------------------------------------------------------------
off 4 42.3500 9.2324 36.0000 56.0000
on 4 49.3750 17.3292 34.5000 69.0000
partitions:
Level N Mean Std Min Max
------------------------------------------------------------
400 4 46.7500 12.4762 34.5000 58.9000
50 4 44.9750 16.1457 35.1000 69.0000
=== Main Effects: peak_memory_gb ===
Factor Effect Std Error % Contribution
--------------------------------------------------------------
join_algorithm -9.2250 2.7310 29.9%
hash_table_mb 8.6750 2.7310 28.1%
broadcast_threshold_mb 5.0250 2.7310 16.3%
partitions -3.7250 2.7310 12.1%
adaptive_execution -2.4750 2.7310 8.0%
sort_buffer_mb -1.7750 2.7310 5.7%
=== ANOVA Table: peak_memory_gb ===
Source DF SS MS F p-value
-----------------------------------------------------------------------------
join_algorithm 1 170.2012 170.2012 9.262 0.0188
hash_table_mb 1 150.5113 150.5113 8.190 0.0243
sort_buffer_mb 1 6.3013 6.3013 0.343 0.5765
broadcast_threshold_mb 1 50.5013 50.5013 2.748 0.1413
adaptive_execution 1 12.2512 12.2512 0.667 0.4411
partitions 1 27.7512 27.7512 1.510 0.2588
join_algorithm*hash_table_mb 1 6.3012 6.3012 0.343 0.5765
join_algorithm*sort_buffer_mb 1 150.5112 150.5112 8.190 0.0243
join_algorithm*broadcast_threshold_mb 1 12.2512 12.2512 0.667 0.4411
join_algorithm*adaptive_execution 1 50.5012 50.5012 2.748 0.1413
join_algorithm*partitions 1 0.1513 0.1513 0.008 0.9303
hash_table_mb*sort_buffer_mb 1 170.2012 170.2012 9.262 0.0188
hash_table_mb*broadcast_threshold_mb 1 27.7512 27.7512 1.510 0.2588
hash_table_mb*adaptive_execution 1 0.1512 0.1512 0.008 0.9303
hash_table_mb*partitions 1 50.5012 50.5012 2.748 0.1413
sort_buffer_mb*broadcast_threshold_mb 1 0.1512 0.1512 0.008 0.9303
sort_buffer_mb*adaptive_execution 1 27.7512 27.7512 1.510 0.2588
sort_buffer_mb*partitions 1 12.2512 12.2512 0.667 0.4411
broadcast_threshold_mb*adaptive_execution 1 170.2012 170.2012 9.262 0.0188
broadcast_threshold_mb*partitions 1 150.5112 150.5112 8.190 0.0243
adaptive_execution*partitions 1 6.3013 6.3013 0.343 0.5765
Error (Lenth PSE) 7 128.6381 18.3769
Total 7 417.6687 59.6670
Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)
=== Interaction Effects: peak_memory_gb ===
Factor A Factor B Interaction % Contribution
------------------------------------------------------------------------
hash_table_mb sort_buffer_mb 9.2250 14.7%
broadcast_threshold_mb adaptive_execution -9.2250 14.7%
join_algorithm sort_buffer_mb -8.6750 13.9%
broadcast_threshold_mb partitions -8.6750 13.9%
join_algorithm adaptive_execution 5.0250 8.0%
hash_table_mb partitions -5.0250 8.0%
hash_table_mb broadcast_threshold_mb 3.7250 5.9%
sort_buffer_mb adaptive_execution -3.7250 5.9%
join_algorithm broadcast_threshold_mb -2.4750 4.0%
sort_buffer_mb partitions -2.4750 4.0%
join_algorithm hash_table_mb 1.7750 2.8%
adaptive_execution partitions -1.7750 2.8%
join_algorithm partitions 0.2750 0.4%
hash_table_mb adaptive_execution -0.2750 0.4%
sort_buffer_mb broadcast_threshold_mb 0.2750 0.4%
=== Summary Statistics: peak_memory_gb ===
join_algorithm:
Level N Mean Std Min Max
------------------------------------------------------------
hash 4 16.7000 6.3209 11.5000 25.9000
sort_merge 4 7.4750 6.5220 1.8000 15.7000
hash_table_mb:
Level N Mean Std Min Max
------------------------------------------------------------
256 4 7.7500 6.5200 1.8000 15.0000
4096 4 16.4250 6.8222 9.7000 25.9000
sort_buffer_mb:
Level N Mean Std Min Max
------------------------------------------------------------
512 4 12.9750 2.8535 9.7000 15.7000
64 4 11.2000 11.3569 1.8000 25.9000
broadcast_threshold_mb:
Level N Mean Std Min Max
------------------------------------------------------------
10 4 9.5750 4.9755 2.7000 14.4000
256 4 14.6000 9.8810 1.8000 25.9000
adaptive_execution:
Level N Mean Std Min Max
------------------------------------------------------------
off 4 13.3250 9.7804 2.7000 25.9000
on 4 10.8500 6.2836 1.8000 15.7000
partitions:
Level N Mean Std Min Max
------------------------------------------------------------
400 4 13.9500 9.6338 2.7000 25.9000
50 4 10.2250 6.0961 1.8000 15.0000
Optimization Recommendations
=== Optimization: query_time_s ===
Direction: minimize
Best observed run: #4
join_algorithm = hash
hash_table_mb = 256
sort_buffer_mb = 512
broadcast_threshold_mb = 256
adaptive_execution = off
partitions = 50
Value: 34.5
RSM Model (linear, R² = 0.9985, Adj R² = 0.9894):
Coefficients:
intercept +45.8625
join_algorithm -3.4125
hash_table_mb +2.2125
sort_buffer_mb +4.6875
broadcast_threshold_mb -4.3375
adaptive_execution +8.8875
partitions +4.5125
Predicted optimum (from linear model, at observed points):
join_algorithm = hash
hash_table_mb = 256
sort_buffer_mb = 512
broadcast_threshold_mb = 10
adaptive_execution = on
partitions = 400
Predicted value: 69.4875
Surface optimum (via L-BFGS-B, linear model):
join_algorithm = sort_merge
hash_table_mb = 256
sort_buffer_mb = 64
broadcast_threshold_mb = 256
adaptive_execution = off
partitions = 50
Predicted value: 17.8125
Model quality: Excellent fit — surface predictions are reliable.
Factor importance:
1. adaptive_execution (effect: 17.8, contribution: 31.7%)
2. sort_buffer_mb (effect: -9.4, contribution: 16.7%)
3. partitions (effect: -9.0, contribution: 16.1%)
4. broadcast_threshold_mb (effect: -8.7, contribution: 15.5%)
5. join_algorithm (effect: -6.8, contribution: 12.2%)
6. hash_table_mb (effect: 4.4, contribution: 7.9%)
=== Optimization: peak_memory_gb ===
Direction: minimize
Best observed run: #8
join_algorithm = hash
hash_table_mb = 256
sort_buffer_mb = 512
broadcast_threshold_mb = 10
adaptive_execution = on
partitions = 400
Value: 1.8
RSM Model (linear, R² = 0.9083, Adj R² = 0.3584):
Coefficients:
intercept +12.0875
join_algorithm +0.5625
hash_table_mb +1.6875
sort_buffer_mb -1.0875
broadcast_threshold_mb +4.7875
adaptive_execution -4.4875
partitions +0.1375
Predicted optimum (from linear model, at observed points):
join_algorithm = hash
hash_table_mb = 4096
sort_buffer_mb = 64
broadcast_threshold_mb = 256
adaptive_execution = off
partitions = 400
Predicted value: 23.7125
Surface optimum (via L-BFGS-B, linear model):
join_algorithm = hash
hash_table_mb = 256
sort_buffer_mb = 512
broadcast_threshold_mb = 10
adaptive_execution = on
partitions = 50
Predicted value: -0.6625
Model quality: Excellent fit — surface predictions are reliable.
Factor importance:
1. broadcast_threshold_mb (effect: 9.6, contribution: 37.5%)
2. adaptive_execution (effect: -9.0, contribution: 35.2%)
3. hash_table_mb (effect: 3.4, contribution: 13.2%)
4. sort_buffer_mb (effect: 2.2, contribution: 8.5%)
5. join_algorithm (effect: 1.1, contribution: 4.4%)
6. partitions (effect: -0.3, contribution: 1.1%)