Fix data pipelines once and for all

Andras Schmelczer 2026-06-10 21:27:32 +01:00
parent 08560476c5
commit 4012e4e047
46 changed files with 4508 additions and 855 deletions

@@ -36,6 +36,20 @@ SHRINKAGE_K = 50
# noisy year) without flattening genuine multi-year trends.
TEMPORAL_SMOOTHNESS_LAMBDA = 0.05
# Per-year support scaling for the temporal smoothness penalty. A flat lambda
# is too weak for years with very few repeat-sale pairs: a sector can have
# hundreds of pairs overall (so cell-level n/(n+k) shrinkage barely moves it)
# yet have individual years estimated from 1-2 pairs, producing 2-7x
# single-year index spikes. Each curvature row is therefore scaled by the
# local pair support of its year triple:
# lambda_eff = lambda0 * (1 + SMOOTHNESS_SUPPORT_PAIRS / s)
# where s is the minimum cross-year pair count among the triple's years.
# Well-supported years (s >> SMOOTHNESS_SUPPORT_PAIRS) keep lambda_eff ~
# lambda0 (the previous flat-lambda behaviour); a year identified by a single
# pair gets 41x lambda0, pulling its beta strongly toward the local trend
# through its neighbours. Same-year pairs cancel in the design and are not
# counted.
SMOOTHNESS_SUPPORT_PAIRS = 40
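# Illustrative sketch only, not part of this commit: the scaling rule in the
# comment above can be checked with a small helper. The name effective_lambda
# and its parameters are hypothetical; the defaults mirror
# TEMPORAL_SMOOTHNESS_LAMBDA and SMOOTHNESS_SUPPORT_PAIRS. How the real code
# handles s == 0 is not stated in the source, so rejecting it is an assumption.

```python
def effective_lambda(min_pair_support, lambda0=0.05, support_pairs=40):
    """Return lambda0 * (1 + support_pairs / s), with s = min_pair_support.

    s is the minimum cross-year pair count among the years of one
    curvature triple, as described in the comment above.
    """
    if min_pair_support <= 0:
        # Assumption: a triple with no cross-year pairs is rejected here.
        raise ValueError("min_pair_support must be positive")
    return lambda0 * (1 + support_pairs / min_pair_support)


# Multiplier relative to lambda0 for sparse, moderate and dense support:
# about 41x at s = 1, 2x at s = 40, and close to 1x at s = 4000.
for s in (1, 40, 4000):
    print(s, effective_lambda(s) / 0.05)
```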
def type_group_expr():
"""Polars expression: Property type -> type_group."""