Feature Store Freshness

Summary

This experiment uses a Latin Hypercube design over 4 feature store parameters to study serving latency and feature freshness.

The design varies 4 factors: materialization interval (1 to 60 min), cache TTL (10 to 300 s), batch size (100 to 10000 rows), and online replicas (1 to 6). The goal is to minimize 2 responses: serving latency (ms) and freshness lag (min). Two conditions are held constant across all runs: offline store = s3_parquet and online store = redis.

Latin Hypercube Sampling was used to spread 10 runs across the 4-dimensional factor space so that each factor's range is covered evenly, which suits computer experiments where the response surface may be complex.

Key Findings

For serving latency, all four factors received an identical contribution of 25.0% (materialization_interval_m, cache_ttl_s, batch_size, online_replicas), so this analysis does not rank them. The best observed value was 1.1 ms (at materialization_interval_m = 39.8455, cache_ttl_s = 221.076, batch_size = 7226.38, online_replicas = 5.69694).

For freshness lag, all four factors likewise received an identical contribution of 25.0% (materialization_interval_m, cache_ttl_s, batch_size, online_replicas), so this analysis does not rank them. The best observed value was -0.2 min (at materialization_interval_m = 11.1795, cache_ttl_s = 210.907, batch_size = 4240.79, online_replicas = 3.83861).

Recommended Next Steps

Consider whether any fixed factors should be varied in a future study.

Experimental Setup

Factors

Factor	Low	High	Unit
`materialization_interval_m`	1	60	min
`cache_ttl_s`	10	300	s
`batch_size`	100	10000	rows
`online_replicas`	1	6	count

Fixed: offline_store = s3_parquet, online_store = redis

Responses

Response	Direction	Unit
`serving_latency_ms`	↓ minimize	ms
`freshness_lag_min`	↓ minimize	min

Configuration

use_cases/46_feature_store_freshness/config.json

{
  "metadata": {
    "name": "Feature Store Freshness",
    "description": "Latin Hypercube of 4 feature store parameters for serving latency and feature freshness"
  },
  "factors": [
    {
      "name": "materialization_interval_m",
      "levels": [
        "1",
        "60"
      ],
      "type": "continuous",
      "unit": "min"
    },
    {
      "name": "cache_ttl_s",
      "levels": [
        "10",
        "300"
      ],
      "type": "continuous",
      "unit": "s"
    },
    {
      "name": "batch_size",
      "levels": [
        "100",
        "10000"
      ],
      "type": "continuous",
      "unit": "rows"
    },
    {
      "name": "online_replicas",
      "levels": [
        "1",
        "6"
      ],
      "type": "continuous",
      "unit": "count"
    }
  ],
  "fixed_factors": {
    "offline_store": "s3_parquet",
    "online_store": "redis"
  },
  "responses": [
    {
      "name": "serving_latency_ms",
      "optimize": "minimize",
      "unit": "ms"
    },
    {
      "name": "freshness_lag_min",
      "optimize": "minimize",
      "unit": "min"
    }
  ],
  "settings": {
    "operation": "latin_hypercube",
    "test_script": "use_cases/46_feature_store_freshness/sim.sh"
  }
}
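
As a minimal sketch of how this configuration might be consumed, the Python below parses a trimmed copy of the factor list and converts the string-valued `levels` into numeric bounds. The helpers `factor_bounds` and `to_coded` are hypothetical names, not part of the doe tool. The coded scale of -1 to +1 is a common response surface convention and is assumed here rather than confirmed by the source.

```python
import json

# Trimmed copy of the "factors" array from config.json (names and levels as in the source).
CONFIG_TEXT = """
{
  "factors": [
    {"name": "materialization_interval_m", "levels": ["1", "60"], "unit": "min"},
    {"name": "cache_ttl_s", "levels": ["10", "300"], "unit": "s"},
    {"name": "batch_size", "levels": ["100", "10000"], "unit": "rows"},
    {"name": "online_replicas", "levels": ["1", "6"], "unit": "count"}
  ]
}
"""

def factor_bounds(config):
    """Return {factor name: (low, high)} with the string levels converted to floats."""
    return {f["name"]: (float(f["levels"][0]), float(f["levels"][1]))
            for f in config["factors"]}

def to_coded(value, low, high):
    """Map a natural-unit value to the coded scale where low -> -1 and high -> +1."""
    center = (low + high) / 2.0
    half_range = (high - low) / 2.0
    return (value - center) / half_range

bounds = factor_bounds(json.loads(CONFIG_TEXT))
for name, (low, high) in bounds.items():
    print(f"{name}: {low:g} to {high:g}")
```

Because the levels are stored as strings, any consumer has to convert them before doing arithmetic, which is the only design point this sketch is meant to illustrate.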

Experimental Matrix

The Latin Hypercube Design produces 10 runs. Each row is one experiment with specific factor settings.

Run	`materialization_interval_m`	`cache_ttl_s`	`batch_size`	`online_replicas`
1	23.7626	34.7302	6312.9	4.85806
2	4.65909	173.938	5056.2	1.34696
3	37.3373	138.469	9952.39	3.76598
4	50.6236	103.511	854.404	4.06335
5	48.107	277.393	2532.27	2.40346
6	33.8919	56.4223	4505.17	3.13861
7	11.0284	191.75	3222.72	5.92413
8	28.8017	252.505	1183.33	5.25669
9	17.3975	240.757	7101.59	1.55125
10	55.9749	74.9791	8960.08	2.81131
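
The stratification behind a matrix like the one above can be sketched in a few lines of standard-library Python. This is an illustrative generator, not the doe tool's implementation, and its seed will not reproduce the rows shown. The function names `latin_hypercube` and `strata_of` are hypothetical. The check at the end confirms that the `materialization_interval_m` column from the table occupies each of the 10 equal-width strata of its range exactly once.

```python
import random

def latin_hypercube(n_runs, bounds, seed=None):
    """Generate an n_runs x len(bounds) Latin Hypercube sample.

    Each factor range is cut into n_runs equal strata; every stratum is used
    exactly once per factor, in an independently shuffled order.
    """
    rng = random.Random(seed)
    columns = []
    for low, high in bounds:
        strata = list(range(n_runs))
        rng.shuffle(strata)
        width = (high - low) / n_runs
        # One uniform draw inside each assigned stratum.
        columns.append([low + (s + rng.random()) * width for s in strata])
    # Transpose columns into rows (one row per run).
    return [list(row) for row in zip(*columns)]

def strata_of(values, low, high):
    """Return the sorted stratum index of each value when the range is split into len(values) strata."""
    n = len(values)
    return sorted(min(int((v - low) / (high - low) * n), n - 1) for v in values)

BOUNDS = [(1, 60), (10, 300), (100, 10000), (1, 6)]  # factor ranges from the Factors table
design = latin_hypercube(10, BOUNDS, seed=42)

# materialization_interval_m column copied from the Experimental Matrix above.
doc_column = [23.7626, 4.65909, 37.3373, 50.6236, 48.107,
              33.8919, 11.0284, 28.8017, 17.3975, 55.9749]
print(strata_of(doc_column, 1, 60))  # each of the 10 strata appears once
```

One consequence of this construction is that no factor value ever repeats across runs, which matters when the results are later grouped by factor level.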

Step-by-Step Workflow

1. Preview the design

$ doe info --config use_cases/46_feature_store_freshness/config.json

2. Generate the runner script

$ doe generate --config use_cases/46_feature_store_freshness/config.json \
    --output use_cases/46_feature_store_freshness/results/run.sh --seed 42

3. Execute the experiments

$ bash use_cases/46_feature_store_freshness/results/run.sh

4. Analyze results

$ doe analyze --config use_cases/46_feature_store_freshness/config.json

5. Get optimization recommendations

$ doe optimize --config use_cases/46_feature_store_freshness/config.json

6. Multi-objective optimization

With 2 competing responses, use --multi to find the best compromise via Derringer–Suich desirability.

$ doe optimize --config use_cases/46_feature_store_freshness/config.json --multi

7. Generate the HTML report

$ doe report --config use_cases/46_feature_store_freshness/config.json \
    --output use_cases/46_feature_store_freshness/results/report.html

Features Exercised

Feature	Value
Design type	`latin_hypercube`
Factor types	`continuous` (all 4)
Arg style	`double-dash`
Responses	2 (serving_latency_ms ↓, freshness_lag_min ↓)
Total runs	10

Analysis Results

Generated from actual experiment runs using the DOE Helper Tool.

Response: serving_latency_ms

Factor contributions: all four factors tie at 25.0% (materialization_interval_m, cache_ttl_s, batch_size, online_replicas).

ANOVA

Source	DF	SS	MS	F	p-value
materialization_interval_m	9	186.1210	20.6801
cache_ttl_s	9	186.1210	20.6801
batch_size	9	186.1210	20.6801
online_replicas	9	186.1210	20.6801
Error (Lenth PSE)	0	0.0000	0.0000
Total	9	186.1210	20.6801
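
The identical rows in this table have a simple arithmetic reading, sketched below under an assumption inferred from the numbers rather than from the tool's documentation: each factor is treated as categorical with 10 distinct levels, one observation per level. Under that reading the between-level sum of squares for every factor equals the total sum of squares, which leaves 9 degrees of freedom per factor and none for error. The response values are taken from the summary statistics in the full analysis output.

```python
import math

# serving_latency_ms for the 10 runs, as listed in the summary statistics
# (one observation per factor level).
latency = [7.3, 3.3, 1.2, 3.4, 13.2, 12.9, 1.4, 7.8, 6.1, 1.1]

n = len(latency)
mean = sum(latency) / n
ss_total = sum((y - mean) ** 2 for y in latency)

# With every level observed once, each level mean is the observation itself,
# so the between-level sum of squares for any factor equals the total.
ss_factor = sum(1 * (y - mean) ** 2 for y in latency)

effect = max(latency) - min(latency)           # range of the level means
std_error = math.sqrt(ss_total / (n - 1) / n)  # sample std dev / sqrt(n)

print(f"SS total  = {ss_total:.4f}")
print(f"SS factor = {ss_factor:.4f}")
print(f"effect    = {effect:.4f}")
print(f"std error = {std_error:.4f}")
```

The values this produces are consistent with the reported SS of 186.1210, effect of 12.1 and standard error of 1.4381. If the assumption holds, the 25.0% split reflects the unique factor levels of the Latin Hypercube design rather than equal real influence of the four factors.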

Plots for serving_latency_ms: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, model diagnostics.

Response: freshness_lag_min

Factor contributions: all four factors tie at 25.0% (materialization_interval_m, cache_ttl_s, batch_size, online_replicas).

ANOVA

Source	DF	SS	MS	F	p-value
materialization_interval_m	9	143.4810	15.9423
cache_ttl_s	9	143.4810	15.9423
batch_size	9	143.4810	15.9423
online_replicas	9	143.4810	15.9423
Error (Lenth PSE)	0	0.0000	0.0000
Total	9	143.4810	15.9423

Plots for freshness_lag_min: Pareto chart, main effects plot, normal probability plot of effects, half-normal plot of effects, model diagnostics.

Response Surface Plots

3D surfaces fitted with quadratic RSM. Red dots are observed data points.

freshness_lag_min: batch_size vs online_replicas
freshness_lag_min: cache_ttl_s vs batch_size
freshness_lag_min: cache_ttl_s vs online_replicas
freshness_lag_min: materialization_interval_m vs batch_size
freshness_lag_min: materialization_interval_m vs cache_ttl_s
freshness_lag_min: materialization_interval_m vs online_replicas
serving_latency_ms: batch_size vs online_replicas
serving_latency_ms: cache_ttl_s vs batch_size
serving_latency_ms: cache_ttl_s vs online_replicas
serving_latency_ms: materialization_interval_m vs batch_size
serving_latency_ms: materialization_interval_m vs cache_ttl_s
serving_latency_ms: materialization_interval_m vs online_replicas

Multi-Objective Optimization

When responses compete, Derringer–Suich desirability finds the best compromise. Each response is scaled to a 0–1 desirability, then combined via a weighted geometric mean.

Overall Desirability

D = 0.8846

Per-Response Desirability

Response	Weight	Desirability	Predicted	Dir
`serving_latency_ms`	1.0	0.7893	3.30 ms	↓
`freshness_lag_min`	1.5	0.9545	-0.20 min	↓
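
The combination step can be checked by hand. The sketch below computes the weighted geometric mean of the per-response desirabilities and weights in the table above. The one-sided `desirability_minimize` function is included only to illustrate the 0–1 scaling; its linear shape and its `target` and `upper` parameters are assumptions, since the report does not state the bounds the tool used.

```python
import math

def desirability_minimize(y, target, upper):
    """One-sided desirability for a minimized response: 1 at or below target, 0 at or above upper."""
    if y <= target:
        return 1.0
    if y >= upper:
        return 0.0
    return (upper - y) / (upper - target)

def overall_desirability(d_values, weights):
    """Weighted geometric mean of per-response desirabilities."""
    if any(d <= 0.0 for d in d_values):
        return 0.0  # a fully undesirable response vetoes the compromise
    total_weight = sum(weights)
    log_sum = sum(w * math.log(d) for d, w in zip(d_values, weights))
    return math.exp(log_sum / total_weight)

# Per-response desirabilities and weights from the table above.
D = overall_desirability([0.7893, 0.9545], [1.0, 1.5])
print(f"D = {D:.4f}")  # D = 0.8846
```

The result agrees with the reported overall desirability. The geometric mean is used instead of an arithmetic one so that a response with zero desirability drives the overall score to zero.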

Recommended Settings

Factor	Value
`materialization_interval_m`	50.8848 min
`cache_ttl_s`	135.321 s
`batch_size`	7640.26 rows
`online_replicas`	4.98895 count

Source: from observed run #1

Trade-off Summary

Sacrifice = how much worse than single-objective best.

Response	Predicted	Best Observed	Sacrifice
`serving_latency_ms`	3.30	1.10	+2.20
`freshness_lag_min`	-0.20	-0.20	+0.00

Top 3 Runs by Desirability

Run	D	Factor Settings
#1	0.8846	materialization_interval_m=50.8848, cache_ttl_s=135.321, batch_size=7640.26, online_replicas=4.98895
#6	0.8827	materialization_interval_m=39.2028, cache_ttl_s=44.4917, batch_size=401.765, online_replicas=2.39833
#3	0.8210	materialization_interval_m=8.10386, cache_ttl_s=185.242, batch_size=6454.55, online_replicas=2.69511

Model Quality

Response	R²	Type
`serving_latency_ms`	0.1848	linear
`freshness_lag_min`	0.3923	linear

Full Multi-Objective Output

doe optimize --multi
============================================================
MULTI-OBJECTIVE OPTIMIZATION
Method: Derringer-Suich Desirability Function
============================================================

Overall desirability: D = 0.8846

Response                  Weight Desirability    Predicted  Direction
---------------------------------------------------------------------
serving_latency_ms           1.0       0.7893        3.30 ms   ↓
freshness_lag_min            1.5       0.9545       -0.20 min   ↓

Recommended settings:
  materialization_interval_m = 50.8848 min
  cache_ttl_s = 135.321 s
  batch_size = 7640.26 rows
  online_replicas = 4.98895 count
  (from observed run #1)

Trade-off summary:
  serving_latency_ms: 3.30 (best observed: 1.10, sacrifice: +2.20)
  freshness_lag_min: -0.20 (best observed: -0.20, sacrifice: +0.00)

Model quality:
  serving_latency_ms: R² = 0.1848 (linear)
  freshness_lag_min: R² = 0.3923 (linear)

Top 3 observed runs by overall desirability:
  1. Run #1 (D=0.8846): materialization_interval_m=50.8848, cache_ttl_s=135.321, batch_size=7640.26, online_replicas=4.98895
  2. Run #6 (D=0.8827): materialization_interval_m=39.2028, cache_ttl_s=44.4917, batch_size=401.765, online_replicas=2.39833
  3. Run #3 (D=0.8210): materialization_interval_m=8.10386, cache_ttl_s=185.242, batch_size=6454.55, online_replicas=2.69511

Full Analysis Output

doe analyze
=== Main Effects: serving_latency_ms ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
materialization_interval_m    12.1000       1.4381            25.0%
cache_ttl_s             12.1000       1.4381            25.0%
batch_size              12.1000       1.4381            25.0%
online_replicas         12.1000       1.4381            25.0%

=== ANOVA Table: serving_latency_ms ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
materialization_interval_m    9     186.1210      20.6801                      
cache_ttl_s                  9     186.1210      20.6801                      
batch_size                   9     186.1210      20.6801                      
online_replicas              9     186.1210      20.6801                      
Error (Lenth PSE)            0       0.0000       0.0000
Total                        9     186.1210      20.6801
  Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: serving_latency_ms ===

materialization_interval_m:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  14.4806             1     7.3000     0.0000     7.3000     7.3000
  2.05638             1     3.3000     0.0000     3.3000     3.3000
  23.0759             1     1.2000     0.0000     1.2000     1.2000
  26.1949             1     3.4000     0.0000     3.4000     3.4000
  31.0427             1    13.2000     0.0000    13.2000    13.2000
  37.6193             1    12.9000     0.0000    12.9000    12.9000
  47.3515             1     1.4000     0.0000     1.4000     1.4000
  49.5084             1     7.8000     0.0000     7.8000     7.8000
  57.5698             1     6.1000     0.0000     6.1000     6.1000
  8.63293             1     1.1000     0.0000     1.1000     1.1000

cache_ttl_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  104.795             1     7.8000     0.0000     7.8000     7.8000
  138.348             1    12.9000     0.0000    12.9000    12.9000
  158.475             1     1.2000     0.0000     1.2000     1.2000
  187.818             1     1.1000     0.0000     1.1000     1.1000
  226.288             1     7.3000     0.0000     7.3000     7.3000
  265.043             1     6.1000     0.0000     6.1000     6.1000
  292.057             1    13.2000     0.0000    13.2000    13.2000
  37.0665             1     3.4000     0.0000     3.4000     3.4000
  55.7061             1     1.4000     0.0000     1.4000     1.4000
  73.0513             1     3.3000     0.0000     3.3000     3.3000

batch_size:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1079.5              1    12.9000     0.0000    12.9000    12.9000
  1376.18             1     1.1000     0.0000     1.1000     1.1000
  2621.87             1     3.3000     0.0000     3.3000     3.3000
  3692.19             1     6.1000     0.0000     6.1000     6.1000
  4713.28             1     7.3000     0.0000     7.3000     7.3000
  5078.94             1     1.4000     0.0000     1.4000     1.4000
  6307.98             1    13.2000     0.0000    13.2000    13.2000
  7401.28             1     3.4000     0.0000     3.4000     3.4000
  8882.93             1     7.8000     0.0000     7.8000     7.8000
  9898.7              1     1.2000     0.0000     1.2000     1.2000

online_replicas:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.31093             1     6.1000     0.0000     6.1000     6.1000
  1.89122             1    13.2000     0.0000    13.2000    13.2000
  2.46586             1     3.3000     0.0000     3.3000     3.3000
  2.64077             1     1.1000     0.0000     1.1000     1.1000
  3.02501             1     1.4000     0.0000     1.4000     1.4000
  3.89754             1     7.8000     0.0000     7.8000     7.8000
  4.30846             1    12.9000     0.0000    12.9000    12.9000
  4.68266             1     3.4000     0.0000     3.4000     3.4000
  5.20503             1     1.2000     0.0000     1.2000     1.2000
  5.59373             1     7.3000     0.0000     7.3000     7.3000

=== Main Effects: freshness_lag_min ===
Factor                   Effect    Std Error   % Contribution
--------------------------------------------------------------
materialization_interval_m    10.9000       1.2626            25.0%
cache_ttl_s             10.9000       1.2626            25.0%
batch_size              10.9000       1.2626            25.0%
online_replicas         10.9000       1.2626            25.0%

=== ANOVA Table: freshness_lag_min ===
Source                      DF           SS           MS          F    p-value
-----------------------------------------------------------------------------
materialization_interval_m    9     143.4810      15.9423                      
cache_ttl_s                  9     143.4810      15.9423                      
batch_size                   9     143.4810      15.9423                      
online_replicas              9     143.4810      15.9423                      
Error (Lenth PSE)            0       0.0000       0.0000
Total                        9     143.4810      15.9423
  Note: Error estimated using Lenth's pseudo-standard-error (unreplicated design)

=== Summary Statistics: freshness_lag_min ===

materialization_interval_m:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  14.4806             1     1.1000     0.0000     1.1000     1.1000
  2.05638             1    -0.2000     0.0000    -0.2000    -0.2000
  23.0759             1     2.6000     0.0000     2.6000     2.6000
  26.1949             1     1.3000     0.0000     1.3000     1.3000
  31.0427             1     4.2000     0.0000     4.2000     4.2000
  37.6193             1     9.5000     0.0000     9.5000     9.5000
  47.3515             1     2.2000     0.0000     2.2000     2.2000
  49.5084             1    10.7000     0.0000    10.7000    10.7000
  57.5698             1     8.7000     0.0000     8.7000     8.7000
  8.63293             1     1.2000     0.0000     1.2000     1.2000

cache_ttl_s:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  104.795             1    10.7000     0.0000    10.7000    10.7000
  138.348             1     9.5000     0.0000     9.5000     9.5000
  158.475             1     2.6000     0.0000     2.6000     2.6000
  187.818             1     1.2000     0.0000     1.2000     1.2000
  226.288             1     1.1000     0.0000     1.1000     1.1000
  265.043             1     8.7000     0.0000     8.7000     8.7000
  292.057             1     4.2000     0.0000     4.2000     4.2000
  37.0665             1     1.3000     0.0000     1.3000     1.3000
  55.7061             1     2.2000     0.0000     2.2000     2.2000
  73.0513             1    -0.2000     0.0000    -0.2000    -0.2000

batch_size:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1079.5              1     9.5000     0.0000     9.5000     9.5000
  1376.18             1     1.2000     0.0000     1.2000     1.2000
  2621.87             1    -0.2000     0.0000    -0.2000    -0.2000
  3692.19             1     8.7000     0.0000     8.7000     8.7000
  4713.28             1     1.1000     0.0000     1.1000     1.1000
  5078.94             1     2.2000     0.0000     2.2000     2.2000
  6307.98             1     4.2000     0.0000     4.2000     4.2000
  7401.28             1     1.3000     0.0000     1.3000     1.3000
  8882.93             1    10.7000     0.0000    10.7000    10.7000
  9898.7              1     2.6000     0.0000     2.6000     2.6000

online_replicas:
  Level               N       Mean        Std        Min        Max
  ------------------------------------------------------------
  1.31093             1     8.7000     0.0000     8.7000     8.7000
  1.89122             1     4.2000     0.0000     4.2000     4.2000
  2.46586             1    -0.2000     0.0000    -0.2000    -0.2000
  2.64077             1     1.2000     0.0000     1.2000     1.2000
  3.02501             1     2.2000     0.0000     2.2000     2.2000
  3.89754             1    10.7000     0.0000    10.7000    10.7000
  4.30846             1     9.5000     0.0000     9.5000     9.5000
  4.68266             1     1.3000     0.0000     1.3000     1.3000
  5.20503             1     2.6000     0.0000     2.6000     2.6000
  5.59373             1     1.1000     0.0000     1.1000     1.1000

Optimization Recommendations

doe optimize
=== Optimization: serving_latency_ms ===
Direction: minimize

Best observed run: #6
  materialization_interval_m = 39.8455
  cache_ttl_s = 221.076
  batch_size = 7226.38
  online_replicas = 5.69694
  Value: 1.1

RSM Model (linear, R² = 0.5680, Adj R² = 0.2224):
  Coefficients:
    intercept                      +5.7965
    materialization_interval_m     +1.2850
    cache_ttl_s                    -2.2339
    batch_size                     +1.1635
    online_replicas                -6.1432

  Predicted optimum (from linear model, at observed points):
    materialization_interval_m = 33.4834
    cache_ttl_s = 96.7479
    batch_size = 5949.18
    online_replicas = 1.32179
    Predicted value: 12.3878

  Surface optimum (via L-BFGS-B, linear model):
    materialization_interval_m = 1
    cache_ttl_s = 300
    batch_size = 100
    online_replicas = 6
    Predicted value: -5.0291

  Model quality: Moderate fit — use predictions directionally, not precisely.

Factor importance:
  1. materialization_interval_m  (effect: 12.1, contribution: 25.0%)
  2. cache_ttl_s  (effect: 12.1, contribution: 25.0%)
  3. batch_size  (effect: 12.1, contribution: 25.0%)
  4. online_replicas  (effect: 12.1, contribution: 25.0%)

=== Optimization: freshness_lag_min ===
Direction: minimize

Best observed run: #1
  materialization_interval_m = 11.1795
  cache_ttl_s = 210.907
  batch_size = 4240.79
  online_replicas = 3.83861
  Value: -0.2

RSM Model (linear, R² = 0.2319, Adj R² = -0.3826):
  Coefficients:
    intercept                      +4.1202
    materialization_interval_m     -0.4132
    cache_ttl_s                    -1.2667
    batch_size                     -2.3067
    online_replicas                -2.1018

  Predicted optimum (from linear model, at observed points):
    materialization_interval_m = 16.6117
    cache_ttl_s = 263.863
    batch_size = 944.75
    online_replicas = 1.65859
    Predicted value: 6.8249

  Surface optimum (via L-BFGS-B, linear model):
    materialization_interval_m = 60
    cache_ttl_s = 300
    batch_size = 10000
    online_replicas = 6
    Predicted value: -1.9681

  Model quality: Weak fit — consider adding center points or using a different design.

Factor importance:
  1. materialization_interval_m  (effect: 10.9, contribution: 25.0%)
  2. cache_ttl_s  (effect: 10.9, contribution: 25.0%)
  3. batch_size  (effect: 10.9, contribution: 25.0%)
  4. online_replicas  (effect: 10.9, contribution: 25.0%)
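
The surface optima reported above follow directly from the linear models. A linear function over a box is minimized at a corner, with each factor pushed to the bound that opposes the sign of its coefficient. The sketch below assumes the coefficients are expressed in coded units from -1 to +1; that assumption is not stated in the output but is consistent with the reported predictions. `minimize_linear` is a hypothetical helper name.

```python
# Linear RSM coefficients reported by `doe optimize`, assumed to be in coded units (-1 .. +1).
MODELS = {
    "serving_latency_ms": {
        "intercept": 5.7965,
        "materialization_interval_m": 1.2850,
        "cache_ttl_s": -2.2339,
        "batch_size": 1.1635,
        "online_replicas": -6.1432,
    },
    "freshness_lag_min": {
        "intercept": 4.1202,
        "materialization_interval_m": -0.4132,
        "cache_ttl_s": -1.2667,
        "batch_size": -2.3067,
        "online_replicas": -2.1018,
    },
}

# Natural-unit factor ranges from the Factors table.
BOUNDS = {
    "materialization_interval_m": (1, 60),
    "cache_ttl_s": (10, 300),
    "batch_size": (100, 10000),
    "online_replicas": (1, 6),
}

def minimize_linear(model, bounds):
    """Minimize a linear model over a box: each factor goes to the bound opposing its coefficient sign."""
    settings = {}
    predicted = model["intercept"]
    for name, (low, high) in bounds.items():
        coef = model[name]
        coded = -1.0 if coef > 0 else 1.0  # positive slope -> low bound, negative slope -> high bound
        settings[name] = low if coded < 0 else high
        predicted += coef * coded
    return settings, predicted

results = {name: minimize_linear(model, BOUNDS) for name, model in MODELS.items()}
for name, (settings, predicted) in results.items():
    print(name, settings, round(predicted, 4))
```

Under these assumptions the sketch selects the same corner settings as the tool for both responses and matches the reported predictions to within rounding of the printed coefficients. The negative predicted latency and lag come from extrapolating weak linear fits to corners that were never run, so they are best read as a direction to explore, not as achievable values.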