Cardpulse grades its own homework in public.
Every Cardpulse forecast carries an 80% prediction band. This page shows what fraction of those bands actually contained the realized comp price after the horizon elapsed. No incumbent publishes this. That is the wedge.
How the backtest is computed.
Every Cardpulse forecast carries an 80% prediction band [p10, p90]. Once a forecast reaches its target date (generated_at + horizon_days), the nightly job joins it to the realized comp price — the median of all ticks in a ±15-day window around the target date.
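A minimal sketch of the target-date and realized-price step, assuming each tick is a (sale_date, price) pair and that the ±15-day window includes both endpoints. The function names and the tick shape are hypothetical, chosen for illustration only.

```python
from datetime import date, timedelta
from statistics import median
from typing import Iterable, Optional, Tuple

WINDOW_DAYS = 15  # half-width of the realized-price window described above


def target_date(generated_at: date, horizon_days: int) -> date:
    # Target date = generated_at + horizon_days.
    return generated_at + timedelta(days=horizon_days)


def realized_price(
    ticks: Iterable[Tuple[date, float]],
    target: date,
    window_days: int = WINDOW_DAYS,
) -> Optional[float]:
    # Assumption: the window is inclusive on both ends.
    lo = target - timedelta(days=window_days)
    hi = target + timedelta(days=window_days)
    prices = [price for sale_date, price in ticks if lo <= sale_date <= hi]
    # No comps in the window means there is nothing to evaluate against.
    return median(prices) if prices else None
```

Using the median rather than the mean keeps a single outlier sale from dragging the realized price around.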
Each prediction is bucketed: in band (realized inside [p10, p90]), above (realized above p90), below (realized below p10), or no data (no comps in the target window). A perfectly calibrated 80% band lands in-band 80% of the time.
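The bucketing rule and the resulting coverage fraction can be sketched as follows. One assumption is not stated on this page: "no data" rows are left out of the coverage denominator, since they carry no verdict. The function names are hypothetical.

```python
from collections import Counter
from typing import Iterable, Optional


def bucket(p10: float, p90: float, realized: Optional[float]) -> str:
    if realized is None:
        return "no_data"  # no comps in the target window
    if realized > p90:
        return "above"
    if realized < p10:
        return "below"
    return "in_band"  # p10 <= realized <= p90, endpoints count as in band


def coverage(buckets: Iterable[str]) -> Optional[float]:
    counts = Counter(buckets)
    # Assumption: only predictions with a verdict enter the denominator.
    scored = counts["in_band"] + counts["above"] + counts["below"]
    return counts["in_band"] / scored if scored else None
```

A well-calibrated 80% band should give a coverage close to 0.8, with misses split roughly evenly between "above" and "below".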
Re-evaluation is append-only: when new comps arrive in an old prediction's target window, a fresh row lands in the forecast_eval hypertable. The latest evaluation per prediction is what this page reports.
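Because the table is append-only, "latest evaluation per prediction" is a reduction over rows. The sketch below assumes hypothetical columns prediction_id, evaluated_at and verdict. The actual forecast_eval schema is not shown here.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict, Iterable


@dataclass(frozen=True)
class EvalRow:
    prediction_id: int  # hypothetical column name
    evaluated_at: datetime  # hypothetical column name
    verdict: str  # e.g. "in_band", "above", "below", "no_data"


def latest_per_prediction(rows: Iterable[EvalRow]) -> Dict[int, EvalRow]:
    latest: Dict[int, EvalRow] = {}
    for row in rows:
        current = latest.get(row.prediction_id)
        # Keep only the most recent evaluation; older rows stay in the table untouched.
        if current is None or row.evaluated_at > current.evaluated_at:
            latest[row.prediction_id] = row
    return latest
```

In Postgres the same reduction is usually written with SELECT DISTINCT ON (prediction_id) ... ORDER BY prediction_id, evaluated_at DESC.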
Collectibles markets move slowly: for thin variants, consecutive comps can be months apart. The ±15-day window is a deliberate compromise, tight enough to keep the realized price honest and wide enough to evaluate cards that trade only monthly.
proper scoring rules
Pinball loss at q=0.10 and q=0.90 — the proper scoring rule for quantile forecasts. Lower is better.
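For reference, here is a minimal sketch of the pinball (quantile) loss in its standard form. The helper name is hypothetical, and averaging over predictions is assumed to be how the page aggregates it.

```python
from typing import Iterable, Tuple


def pinball_loss(q: float, forecast: float, realized: float) -> float:
    diff = realized - forecast
    # Under-forecasts are weighted by q, over-forecasts by (1 - q).
    return q * diff if diff >= 0 else (q - 1) * diff


def mean_pinball(q: float, pairs: Iterable[Tuple[float, float]]) -> float:
    # pairs: (quantile forecast, realized price)
    losses = [pinball_loss(q, f, y) for f, y in pairs]
    return sum(losses) / len(losses)
```

At q=0.90 a realized price above p90 costs nine times more per dollar than one below it, which is what pushes p90 upward until roughly 10% of outcomes exceed it.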
Winkler interval score (Bentzien & Friederichs 2014) — width plus miss penalty in one number. Lower is better.
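A sketch of the Winkler score for a central (1 − α) band, with α = 0.2 for the 80% band. The function name is hypothetical. The miss penalty is 2/α times the distance outside the band, so each dollar of miss costs 10 for an 80% band.

```python
def winkler_score(p10: float, p90: float, realized: float, alpha: float = 0.2) -> float:
    width = p90 - p10
    penalty = 2.0 / alpha  # 10 for an 80% band
    if realized < p10:
        return width + penalty * (p10 - realized)
    if realized > p90:
        return width + penalty * (realized - p90)
    return width  # inside the band: only the width counts
```

For α = 0.2 this equals 10 × (pinball loss at q=0.10 on p10 + pinball loss at q=0.90 on p90), so the two metrics agree on direction. The Winkler score simply packs both quantiles into one number.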
Mean signed error: the average of (realized − p50) / p50. Positive means the predictor leaned conservative (realized prices came in above the median forecast); negative means it leaned aggressive. Negative values are set in italics throughout the site, never colored red or green.
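A minimal sketch of the relative mean signed error. It assumes that p50 is positive and that predictions without a realized price ("no data") are skipped. Both are assumptions, and the function name is hypothetical.

```python
from typing import Iterable, Optional, Tuple


def mean_signed_error(pairs: Iterable[Tuple[float, Optional[float]]]) -> Optional[float]:
    # pairs: (p50 forecast, realized price or None when there were no comps)
    errors = [
        (realized - p50) / p50
        for p50, realized in pairs
        if realized is not None  # assumption: no-data rows are excluded
    ]
    return sum(errors) / len(errors) if errors else None
```

Because errors are signed, an overshoot on one card can cancel an undershoot on another. That is why the score is read alongside the pinball and Winkler numbers rather than on its own.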
nightly cadence
The Arq cron job at 02:00 UTC re-evaluates every elapsed prediction made in the last 730 days. It runs unconditionally, regardless of what the verdicts look like. Willingness to publish bad numbers is the moat.
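The selection the nightly job performs can be sketched as a pure function. It assumes predictions are dicts with hypothetical id, generated_at and horizon_days keys, and that "made in the last 730 days" is measured on generated_at. The Arq registration appears only as a comment because the page does not show the real job.

```python
from datetime import datetime, timedelta, timezone
from typing import Dict, List

LOOKBACK_DAYS = 730

# Hypothetical Arq wiring, not the actual Cardpulse job:
#   from arq import cron
#   class WorkerSettings:
#       cron_jobs = [cron(nightly_eval, hour=2, minute=0)]


def due_for_evaluation(predictions: List[Dict], now: datetime) -> List[int]:
    cutoff = now - timedelta(days=LOOKBACK_DAYS)
    due = []
    for pred in predictions:
        target = pred["generated_at"] + timedelta(days=pred["horizon_days"])
        # Made within the lookback window and already past its target date.
        if pred["generated_at"] >= cutoff and target <= now:
            due.append(pred["id"])
    return due
```

Nothing in the selection looks at past verdicts, which is the point: every eligible prediction is re-scored each night whether the result flatters the model or not.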