
The Proof Behind PropertyIQ Scores

Walk-forward validated across 14 years of market data

$24,384

Extra equity — top vs bottom scored market (3yr)

$73,100

Extra appreciation on a 3-property portfolio (3yr)

57.8%

Hit rate — top-scored markets beat state average

23,600+

Locations scored across metro, county, and ZIP

Performance By Score Quintile

How Scores Predict Returns

PropertyIQ Demand Signal scores, validated across 14 years of walk-forward windows. Higher scores consistently predict higher 3-year excess returns vs state benchmarks.

Q5 (Top 20%): +3.05%
Q4: +1.17%
Q3: -0.15%
Q2: -1.20%
Q1 (Bottom 20%): -3.34%

Top-20% scored markets earned 6.39 percentage points more over 3 years than bottom-20% scored markets. At the extremes, a score-100 market outperforms a score-10 market by $24,384 on a $245K home.

Dollar Impact

The Cost of Choosing Wrong

Based on a median metro home value of $245,361 (Zillow ZHVI, February 2026) and average state cumulative returns of 5.98% (1-year) and 20.50% (3-year). Every score point translates to real dollars gained or lost.

1-Year Dollar Impact

Score | Excess vs State | Dollar Gain | vs Score 50
10 | -2.11% | $9,499 | -$4,808
20 | -1.26% | $11,586 | -$2,721
30 | -0.84% | $12,617 | -$1,690
40 | -0.47% | $13,526 | -$781
50 | -0.15% | $14,307 | $0
60 | +0.07% | $14,846 | +$539
70 | +0.23% | $15,239 | +$932
80 | +0.53% | $15,975 | +$1,668
90 | +1.03% | $17,202 | +$2,895
100 | +1.64% | $18,698 | +$4,391

Score 100 vs Score 10: +$9,199 difference on the same $245K purchase.

3-Year Dollar Impact

Score | Excess vs State | Dollar Gain | vs Score 50
10 | -5.66% | $36,436 | -$13,196
20 | -3.34% | $42,129 | -$7,503
30 | -2.04% | $45,318 | -$4,314
40 | -1.20% | $47,375 | -$2,257
50 | -0.28% | $49,632 | $0
60 | +0.31% | $51,078 | +$1,446
70 | +1.17% | $53,189 | +$3,557
80 | +1.87% | $54,906 | +$5,274
90 | +3.05% | $57,802 | +$8,170
100 | +4.28% | $60,820 | +$11,188

Score 100 vs Score 10: +$24,384 difference on the same $245K purchase.

Walk-Forward Validated

Expanding-window walk-forward validation across 14 years (2012–2025). Positive OOS IC in every single year — 100% hit rate across all test windows.

Consistent Across Geographies

Validated at metro (IC 0.24), county, and ZIP code levels. Strictly monotonic score-to-return tables at every decile — no reversals.

v4.0: Demand Signal

Three Redfin supply-demand metrics combined into a single interpretable score. Statistically significant at p < 0.0001 via 10,000-shuffle permutation test (no shuffle matched the actual IC, and 10,000 shuffles cannot resolve a smaller p-value).

Technical Validation Report

Demand signal validation with walk-forward cross-validation and permutation testing

Full methodology and results from our v4.0 scoring validation, covering January 2012 through February 2025 across 3.18M scored location-periods.

PropertyIQ Score Validation Report — v4.0 Demand Signal

Generated: 2026-03-29
Formula Version: v4.0 Demand Signal (3-metric Redfin composite)
Data Period: January 2012 to February 2025 (158 monthly scoring dates)
Training Target: 3-year excess return vs state median
Horizons Validated: 1-year and 3-year forward
Benchmark: State median appreciation (controls for statewide market cycles)
Total Observations: 3,177,707 scored location-period records (121,815 metro + 392,891 county + 2,663,001 ZIP)
Geographies Covered: 746 metros, 2,983 counties, 19,880 ZIP codes
Methodology: Cross-sectional percentile rank with isotonic calibration; walk-forward expanding-window validation; 10,000-shuffle permutation significance testing; 1,000-sample bootstrap confidence intervals

Every number in this report is derived from actual observed price changes (Zillow ZHVI) following each scoring date. No values are estimated or fabricated. All forward returns were computed from raw Zillow Home Value Index data across 895 metros, 3,073 counties, 26,307 ZIP codes, and 51 states.


1. Executive Summary

PropertyIQ Demand Signal scores predict 1-year and 3-year excess returns vs state median benchmarks. Walk-forward cross-validation on held-out years confirms a stable, statistically significant predictive signal at the metro level.

Horizon | OOS IC | OOS Quintile Spread | Bootstrap 95% CI | Significant | IC Hit Rate | Permutation p
1-Year | +0.2400 | +2.90 pp | [0.211, 0.221] | Yes | 100% (14/14 yrs) | <0.0001
3-Year | +0.2339 | +7.83 pp | [0.215, 0.228] | Yes | 100% (12/12 yrs) | <0.0001

Score semantics: 50 = predicted to match the state average. Higher scores predict outperformance; lower scores predict underperformance. The score-to-return mapping is strictly monotonic at 10-point granularity across both horizons.

Dollar impact (3-year, based on median metro home value of $245,361, Zillow ZHVI Feb 2026): Choosing a top-quintile market (score 80+) over a bottom-quintile market (score 20 or below) within the same state corresponds to a historical excess gain of approximately $18,100 over 3 years ($6,033/year).

Limitations:

  • Hit rate of 57.6% (1Y) and 57.8% (3Y) — directionally correct ~3 in 5 times, not a certainty
  • Standard deviation within each score decile is 4.3-5.6% (1Y) and 10.6-13.1% (3Y) — wide individual variance
  • Score persistence decays to 0.21 autocorrelation at 24 months — scores reflect changing conditions, not permanent labels
  • 8 of 47 tested states did not show positive IC (39/47 positive), including MO, WI, and OK among the 20 largest by sample size; the signal is weaker in some geographies
  • Formula requires Redfin data coverage; metros without Redfin data cannot be scored

2. Glossary of Terms

These terms appear throughout this report. Each is explained in plain English.

Term | What It Means
Excess Return | How much a market's home values grew compared to its state average. If a metro grew 8% and the state grew 6%, the excess return is +2 pp. This is the "alpha": the outperformance the score tries to predict.
IC (Information Coefficient) | A number between -1 and +1 measuring how well the score's ranking matches the actual ranking of returns. +0.20 means the score does a good job ranking markets from worst to best. 0.00 means no predictive power. Higher is better.
Information Ratio (IR) | IC divided by its variability across time periods. A high IR (above 2.0) means the score predicts consistently, not just in lucky years. Think of it as "consistency of the signal."
Hit Rate | The percentage of markets where the score correctly predicted whether the market would beat or trail the state average. 50% = coin flip. 58% = meaningfully better than random.
Quintile | Dividing all markets into 5 equal groups by score. Q1 = bottom 20% (lowest scores), Q5 = top 20% (highest scores). If the score works, Q5 should outperform Q1.
Quintile Spread | The difference in actual returns between the top quintile (Q5) and bottom quintile (Q1). A spread of +7 pp means top-scored markets outperformed bottom-scored markets by 7 percentage points.
Decile | Dividing markets into 10 equal groups by score. More granular than quintiles. Used for the main score tables (scores 10-100).
Monotonic | Means "always increasing." A monotonic score table means every higher score group performed better than every lower score group, with no reversals. This is the key requirement for the score to "make sense."
Walk-Forward Validation | Testing the score by pretending you're in the past: use only data available up to year N to build the score, then test on year N+1. Repeat for each year. This prevents "peeking at the future" and shows how the score would have performed in real time.
Permutation Test | Randomly shuffle the scores 10,000 times and check if any random arrangement predicts as well as the actual score. If zero random shuffles beat the actual score, the chance that the signal is luck is below 1 in 10,000.
Bootstrap Confidence Interval | Resample the data 1,000 times (with replacement) and recompute the IC each time. The range that captures 95% of results is the confidence interval. If the entire interval is above zero, we're confident the signal is real.
Structural Break | A test for whether the score's predictive power fundamentally changed at some point in time (e.g., broke after COVID). If IC before and after a date are similar, there's no break and the signal is stable.
Signal Decay | Whether the score's predictive power is getting weaker year by year. Measured by regressing yearly IC on time. A negative slope means decay. No decay (or strengthening) is ideal.
Score Persistence | How much a market's score changes over time. High persistence (autocorrelation near 1.0) means scores are stable month-to-month. Low persistence means scores change rapidly.
Hold-Out Test | Removing 20% of geographies entirely, computing scores on the remaining 80%, then testing whether the score still works on the held-out 20%. Shows whether the signal generalizes to markets it has never "seen."
Drawdown | The worst streak of poor performance. How many consecutive months did the score fail to predict correctly? Fewer is better. Zero is ideal.
P(Beat State) | The probability (percentage) of markets at that score level that actually beat their state average. At score 50, this should be near 50%. At score 90+, this should be well above 50%.
pp (percentage points) | The unit for comparing percentages. If Market A returned 8% and Market B returned 5%, the difference is 3 pp (percentage points), not 3%.

3. What the Score Predicts

3.1 The Formula

The Demand Signal score is computed from three Redfin market indicators that measure supply-demand imbalance:

Metric | Source | Direction | What It Captures
% Sold Above List Price | Redfin | + (higher = hotter) | Buyer competition intensity
Median Days on Market | Redfin | - (lower = hotter) | Speed of absorption
Months of Supply | Redfin | - (lower = hotter) | Inventory relative to demand

Signal computation (each month):

signal = z(sold_above_list) - z(median_dom) - z(months_of_supply)

Where z() is cross-sectional standardization: subtract the national metro mean, divide by standard deviation. This is computed independently each month across all metros.

Score construction:

  1. Percentile-rank the signal across all metros within the month (0-100)
  2. Re-center so that percentile rank 55.6 (the zero-crossing for excess return) maps to score 50
  3. Scores below 50: raw percentile [0, 55.6] maps linearly to score [1, 50]
  4. Scores above 50: raw percentile [55.6, 100] maps linearly to score [50, 99]

Why 55.6? More than half of metros underperform their state average (small metros drag the median down). The re-centering ensures score 50 corresponds to actual state-average performance, not the median metro.
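As a rough illustration of the signal and score construction described above, the sketch below standardizes the three metrics, combines them, percentile-ranks the result, and applies the piecewise-linear re-centering around 55.6. The function names, the use of population standard deviation, the lack of tie handling, and the five-metro inputs are assumptions made for illustration; the isotonic calibration mentioned in the methodology is not shown.

```python
from statistics import mean, pstdev

ZERO_CROSSING = 55.6  # metro percentile rank where excess return crosses zero (from the report)

def zscores(values):
    """Cross-sectional standardization: subtract the mean, divide by the std dev."""
    mu, sigma = mean(values), pstdev(values)  # population std dev is an assumption
    return [(v - mu) / sigma for v in values]

def percentile_ranks(values):
    """Percentile rank (0-100) within the cross-section; ties are not averaged here."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    for position, i in enumerate(order):
        ranks[i] = 100.0 * position / (n - 1)
    return ranks

def recenter(pct, zero_crossing=ZERO_CROSSING):
    """Map [0, zero_crossing] -> [1, 50] and [zero_crossing, 100] -> [50, 99] linearly."""
    if pct <= zero_crossing:
        return 1 + 49 * pct / zero_crossing
    return 50 + 49 * (pct - zero_crossing) / (100 - zero_crossing)

def demand_scores(sold_above_list, median_dom, months_of_supply):
    """signal = z(sold_above_list) - z(median_dom) - z(months_of_supply), then rank and re-center."""
    za, zd, zs = zscores(sold_above_list), zscores(median_dom), zscores(months_of_supply)
    signal = [a - d - s for a, d, s in zip(za, zd, zs)]
    return [recenter(p) for p in percentile_ranks(signal)]

# Hypothetical inputs for five metros in one month (not real Redfin data).
scores = demand_scores(
    sold_above_list=[0.10, 0.25, 0.40, 0.55, 0.70],
    median_dom=[60, 45, 35, 25, 15],
    months_of_supply=[6.0, 4.5, 3.5, 2.5, 1.5],
)
print([round(s, 1) for s in scores])
```

With these inputs the hottest metro (most sales above list, fewest days on market, least supply) lands at the top of the score range and the coldest at the bottom.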

What the score does NOT predict:

  • Raw appreciation (that includes statewide trends the score filters out)
  • Exact dollar returns (scores rank locations reliably but individual outcomes vary)
  • Performance in states with few metros (signal requires cross-sectional comparison)

3.2 Score Decile Tables

1-Year Excess Return by Score Decile

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -2.11% | -1.66% | 5.6% | 34.0% | 13,048
20 | -1.26% | -1.08% | 5.0% | 38.8% | 13,826
30 | -0.84% | -0.73% | 4.7% | 41.7% | 13,816
40 | -0.47% | -0.36% | 4.5% | 46.0% | 13,823
50 | -0.15% | -0.09% | 4.5% | 49.0% | 13,676
60 | +0.07% | +0.07% | 4.3% | 51.0% | 11,037
70 | +0.23% | +0.28% | 4.3% | 53.9% | 11,030
80 | +0.53% | +0.48% | 4.3% | 56.0% | 11,027
90 | +1.03% | +0.79% | 4.5% | 59.9% | 11,033
100 | +1.64% | +1.32% | 4.4% | 66.1% | 9,461

Monotonic: YES (every row strictly greater than the one above). Score 47-53 zone actual excess: -0.034% (target: ~0%), confirmed at state average.

3-Year Excess Return by Score Decile

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -5.66% | -4.81% | 13.1% | 32.3% | 10,948
20 | -3.34% | -2.64% | 12.8% | 39.2% | 11,601
30 | -2.04% | -1.76% | 11.8% | 42.4% | 11,594
40 | -1.20% | -1.11% | 11.5% | 45.3% | 11,604
50 | -0.28% | -0.35% | 11.2% | 48.4% | 11,479
60 | +0.31% | +0.26% | 10.9% | 51.2% | 9,267
70 | +1.17% | +1.01% | 10.6% | 55.4% | 9,251
80 | +1.87% | +1.44% | 11.3% | 56.4% | 9,249
90 | +3.05% | +2.06% | 11.7% | 59.3% | 9,257
100 | +4.28% | +3.12% | 11.8% | 63.7% | 7,943

Monotonic: YES. Score 47-53 zone actual excess: -0.056% (target: ~0%).

3.3 Dollar Impact: The Cost of Choosing Wrong

Based on a median metro home value of $245,361 (Zillow ZHVI, February 2026) and average state cumulative returns of 5.98% (1-year) and 20.50% (3-year):

1-Year Dollar Impact

Score | Excess vs State | Total 1Y Return | Home Value After 1Y | Dollar Gain vs Purchase | vs Score 50
10 | -2.11% | +3.87% | $254,860 | +$9,499 | -$4,808
20 | -1.26% | +4.72% | $256,947 | +$11,586 | -$2,721
30 | -0.84% | +5.14% | $257,978 | +$12,617 | -$1,690
40 | -0.47% | +5.51% | $258,887 | +$13,526 | -$781
50 | -0.15% | +5.83% | $259,668 | +$14,307 | $0
60 | +0.07% | +6.05% | $260,207 | +$14,846 | +$539
70 | +0.23% | +6.21% | $260,600 | +$15,239 | +$932
80 | +0.53% | +6.51% | $261,336 | +$15,975 | +$1,668
90 | +1.03% | +7.01% | $262,563 | +$17,202 | +$2,895
100 | +1.64% | +7.62% | $264,059 | +$18,698 | +$4,391

Choosing a score-100 metro over a score-10 metro: +$9,199 difference in 1 year on the same purchase price.

3-Year Dollar Impact

Score | Excess vs State | Total 3Y Return | Home Value After 3Y | Dollar Gain vs Purchase | vs Score 50
10 | -5.66% | +14.84% | $281,797 | +$36,436 | -$13,196
20 | -3.34% | +17.16% | $287,490 | +$42,129 | -$7,503
30 | -2.04% | +18.46% | $290,679 | +$45,318 | -$4,314
40 | -1.20% | +19.30% | $292,736 | +$47,375 | -$2,257
50 | -0.28% | +20.22% | $294,993 | +$49,632 | $0
60 | +0.31% | +20.81% | $296,439 | +$51,078 | +$1,446
70 | +1.17% | +21.67% | $298,550 | +$53,189 | +$3,557
80 | +1.87% | +22.37% | $300,267 | +$54,906 | +$5,274
90 | +3.05% | +23.55% | $303,163 | +$57,802 | +$8,170
100 | +4.28% | +24.78% | $306,181 | +$60,820 | +$11,188

Choosing a score-100 metro over a score-10 metro: +$24,384 difference over 3 years on the same purchase price.

Choosing a top-quintile (80+) over a bottom-quintile (20 or below): ~$18,100 difference over 3 years.
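The dollar figures follow from simple arithmetic: purchase price times (state return plus excess return). The sketch below reproduces the 3-year score-10 and score-100 rows from the rounded percentages published above; the function name is illustrative, and because the inputs are rounded the results land within a few tens of dollars of the table rather than matching it exactly.

```python
HOME_VALUE = 245_361       # median metro home value (Zillow ZHVI, February 2026)
STATE_RETURN_3Y = 0.2050   # average state 3-year cumulative return

def dollar_gain(excess_return, state_return=STATE_RETURN_3Y, home_value=HOME_VALUE):
    """Dollar appreciation for a market with the given excess return vs its state."""
    return home_value * (state_return + excess_return)

gain_score_10 = dollar_gain(-0.0566)   # score 10: -5.66% excess
gain_score_100 = dollar_gain(+0.0428)  # score 100: +4.28% excess
difference = gain_score_100 - gain_score_10

print(round(gain_score_10), round(gain_score_100), round(difference))
```

The state return cancels out of the difference, so the score-100 vs score-10 gap depends only on the purchase price and the 9.94 pp gap in excess return.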


4. Out-of-Sample Results

4.1 Methodology

Walk-forward expanding window: For each test year (2015-2025), the model uses only data from prior years. No future data is used at any point. The score formula (3-metric signal with percentile ranking) is applied to the test year's cross-section, and the actual 1Y/3Y excess returns are measured.

This is the gold standard for predictive validation — it exactly simulates deploying the score at the start of each year and measuring what would have happened.
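The expanding-window schedule can be sketched as a generator of (training years, test year) pairs. The function name is illustrative; the year ranges come from the report (data from 2012, test years 2015 through 2025), and the resulting training-window lengths match the "Train Years" column in the tables that follow.

```python
def expanding_windows(first_year, last_year, first_test_year):
    """Yield (train_years, test_year); training data always strictly precedes the test year."""
    for test_year in range(first_test_year, last_year + 1):
        yield list(range(first_year, test_year)), test_year

windows = list(expanding_windows(2012, 2025, 2015))
for train_years, test_year in windows:
    print(test_year, len(train_years), f"{train_years[0]}-{train_years[-1]}")
```

Each window grows by one year, so the 2015 test uses 3 training years and the 2025 test uses 13.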

4.2 Walk-Forward Results by Year

1-Year Horizon

Test Year | Train Years | IC | Hit Rate | Top Q Excess | Bottom Q Excess | Spread
2015 | 3 | +0.256 | 58.8% | +0.98% | -1.89% | +2.86 pp
2016 | 4 | +0.321 | 61.7% | +1.31% | -2.49% | +3.80 pp
2017 | 5 | +0.241 | 58.5% | +1.25% | -1.37% | +2.63 pp
2018 | 6 | +0.209 | 57.0% | +1.58% | -0.76% | +2.33 pp
2019 | 7 | +0.177 | 55.6% | +1.37% | -0.22% | +1.59 pp
2020 | 8 | +0.237 | 58.0% | +2.16% | -2.25% | +4.41 pp
2021 | 9 | +0.257 | 57.7% | +1.55% | -4.35% | +5.90 pp
2022 | 10 | +0.247 | 58.3% | +1.64% | -2.04% | +3.68 pp
2023 | 11 | +0.216 | 58.6% | +1.16% | -1.14% | +2.30 pp
2024 | 12 | +0.243 | 59.1% | +1.82% | -0.65% | +2.48 pp
2025 | 13 | +0.237 | 59.8% | +2.06% | -0.89% | +2.94 pp
AVG |  | +0.240 | 58.5% |  |  | +3.18 pp

Positive IC in 11/11 test years (100%)

3-Year Horizon

Test Year | Train Years | IC | Hit Rate | Top Q Excess | Bottom Q Excess | Spread
2015 | 3 | +0.244 | 59.3% | +2.93% | -4.69% | +7.61 pp
2016 | 4 | +0.261 | 60.0% | +4.12% | -3.38% | +7.50 pp
2017 | 5 | +0.213 | 56.9% | +4.09% | -1.52% | +5.62 pp
2018 | 6 | +0.221 | 56.9% | +5.56% | -2.74% | +8.30 pp
2019 | 7 | +0.213 | 57.3% | +4.79% | -5.73% | +10.52 pp
2020 | 8 | +0.227 | 57.2% | +4.02% | -7.61% | +11.63 pp
2021 | 9 | +0.248 | 57.4% | +2.72% | -7.18% | +9.90 pp
2022 | 10 | +0.252 | 59.8% | +3.88% | -3.21% | +7.09 pp
2023 | 11 | +0.227 | 57.4% | +4.93% | -1.64% | +6.57 pp
AVG |  | +0.234 | 58.0% |  |  | +8.30 pp

Positive IC in 9/9 test years (100%)

4.3 IC Stability

Horizon | Mean IC | IC Std Dev | Information Ratio | % Years Positive
1-Year | +0.2149 | 0.0588 | 3.65 | 100%
3-Year | +0.2199 | 0.0335 | 6.56 | 100%

An Information Ratio above 2.0 is considered excellent in quantitative analysis. The 3-year IR of 6.56 indicates exceptionally consistent predictive power across time periods.
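The Information Ratio is the mean yearly IC divided by its standard deviation. The sketch below defines it for a list of yearly ICs and checks the two published ratios against the summary statistics in the table; the function name and the choice of sample standard deviation are assumptions.

```python
from statistics import mean, stdev

def information_ratio(yearly_ics):
    """Mean IC divided by its variability across years (sample std dev assumed)."""
    return mean(yearly_ics) / stdev(yearly_ics)

# Reproduce the published ratios from the table's mean and std dev.
ir_1y = 0.2149 / 0.0588
ir_3y = 0.2199 / 0.0335
print(round(ir_1y, 2), round(ir_3y, 2))
```

The 3-year ratio is higher mainly because its yearly ICs vary less, not because its mean IC is much larger.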


5. Statistical Significance Tests

5.1 Permutation Test (10,000 Shuffles)

Scores were randomly shuffled 10,000 times and IC recomputed each time. This tests whether the observed IC could have arisen by chance.

Horizon | Actual IC | Random IC Mean | Random IC Std | Sigma from Random | Times Random Beat Actual | p-value
1-Year | +0.2159 | -0.0000 | 0.0029 | 75.5 sigma | 0 / 10,000 | <0.0001
3-Year | +0.2217 | -0.0000 | 0.0031 | 70.5 sigma | 0 / 10,000 | <0.0001

The actual signal is 70-75 standard deviations from what random chance would produce. Not one of 10,000 random shuffles achieved an IC as high as the actual score.
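A minimal sketch of such a permutation test follows. It assumes the IC is a Spearman rank correlation (the glossary describes it as a ranking match, but the exact estimator is not stated), and the helper names and toy data are illustrative only. The p-value uses the common (beats + 1) / (shuffles + 1) correction, which is why zero hits in 10,000 shuffles supports p below roughly 0.0001 and no smaller.

```python
import random

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def pearson(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5

def permutation_test(scores, returns, n_shuffles=10_000, seed=0):
    """Return (actual IC, p-value) by shuffling score ranks against return ranks."""
    rng = random.Random(seed)
    score_ranks, return_ranks = average_ranks(scores), average_ranks(returns)
    actual = pearson(score_ranks, return_ranks)
    shuffled = score_ranks[:]
    beats = 0
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)
        if pearson(shuffled, return_ranks) >= actual:
            beats += 1
    return actual, (beats + 1) / (n_shuffles + 1)

# Toy data (hypothetical): returns rise with score plus deterministic noise.
toy_scores = list(range(1, 21))
toy_returns = [s + (s * 7) % 5 for s in toy_scores]
ic, p_value = permutation_test(toy_scores, toy_returns, n_shuffles=1000)
print(round(ic, 3), p_value)
```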

5.2 Bootstrap Confidence Intervals (1,000 Resamples)

Horizon | IC Mean | 95% CI Lower | 95% CI Upper | P(IC > 0) | P(IC > 0.10) | P(IC > 0.15)
1-Year | 0.2160 | 0.2107 | 0.2213 | 100% | 100% | 100%
3-Year | 0.2216 | 0.2153 | 0.2277 | 100% | 100% | 100%

The 95% confidence interval for the IC does not come close to zero. In 100% of bootstrap resamples, the IC exceeded 0.15.
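The percentile bootstrap behind these intervals can be sketched as below: resample (score, return) pairs with replacement, recompute the IC each time, and take the 2.5th and 97.5th percentiles. As before, a Spearman IC is assumed, and the helper names, the zero-variance guard, and the toy data are illustrative rather than the report's implementation.

```python
import random

def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks

def spearman_ic(scores, returns):
    x, y = average_ranks(scores), average_ranks(returns)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # degenerate resample with no variation
    return cov / (vx * vy) ** 0.5

def bootstrap_ci(scores, returns, n_resamples=1000, seed=0):
    """95% percentile interval for the IC from paired resampling with replacement."""
    rng = random.Random(seed)
    n = len(scores)
    ics = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        ics.append(spearman_ic([scores[i] for i in idx], [returns[i] for i in idx]))
    ics.sort()
    return ics[int(0.025 * n_resamples)], ics[int(0.975 * n_resamples) - 1]

# Toy data (hypothetical): returns rise with score plus deterministic noise.
toy_scores = list(range(1, 21))
toy_returns = [s + (s * 7) % 5 for s in toy_scores]
lower, upper = bootstrap_ci(toy_scores, toy_returns)
print(round(lower, 3), round(upper, 3))
```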

5.3 Welch's t-Test and Mann-Whitney U

Testing whether metros scoring >60 have significantly different excess returns than metros scoring <40:

Horizon | Mean (Score>60) | Mean (Score<40) | Difference | Welch's t | p-value | Mann-Whitney p
1-Year | +0.749% | -1.250% | +1.999 pp | 65.29 | <0.01 | <0.01
3-Year | +2.271% | -3.304% | +5.575 pp | 66.16 | <0.01 | <0.01

Both parametric (t-test) and non-parametric (Mann-Whitney U) tests confirm the difference is not due to chance.
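For reference, Welch's t statistic compares two group means without assuming equal variances. The sketch below computes it with the standard library; the function name and the two small samples of excess returns are hypothetical and are not drawn from the report's data.

```python
from math import sqrt
from statistics import mean, variance

def welch_t(group_a, group_b):
    """Welch's t: difference in means over the unpooled standard error."""
    standard_error = sqrt(variance(group_a) / len(group_a) + variance(group_b) / len(group_b))
    return (mean(group_a) - mean(group_b)) / standard_error

# Hypothetical excess returns (in pp) for high-scoring and low-scoring metros.
high_score_excess = [1.2, 0.4, 2.1, -0.3, 0.9, 1.5]
low_score_excess = [-1.8, -0.6, -2.4, 0.2, -1.1, -0.9]
t_stat = welch_t(high_score_excess, low_score_excess)
print(round(t_stat, 2))
```

A t statistic in the 60s, as reported above, reflects both the size of the mean gap and the very large sample sizes in each group.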


6. Model Stability

6.1 Structural Break Test

The IC was computed separately on data before and after each potential breakpoint to test whether the signal's structure changed:

Break Point | IC Before | IC After | Difference | Stable?
2016 | +0.195 | +0.229 | +0.033 | YES
2017 | +0.210 | +0.224 | +0.014 | YES
2018 | +0.210 | +0.227 | +0.016 | YES
2019 | +0.212 | +0.228 | +0.016 | YES
2020 | +0.212 | +0.233 | +0.021 | YES
2021 | +0.215 | +0.244 | +0.029 | YES

No structural break detected. IC drift <0.04 at every tested split point.

6.2 Signal Decay Analysis

Testing whether the signal is weakening over time (linear regression of yearly IC on year):

Horizon | Trend Slope (IC/year) | R-squared | p-value | Interpretation
1-Year | +0.00744 | 0.260 | 0.063 | Strengthening (not significant)
3-Year | +0.00581 | 0.358 | 0.040 | Strengthening (significant at 5%)

The signal is not decaying. If anything, it has strengthened slightly over time, likely due to improving Redfin data coverage.
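The decay test is an ordinary least-squares fit of yearly IC against calendar year. The sketch below shows the slope and R-squared calculation on a short hypothetical IC series (the values are invented, so the output will not match the table); a positive slope indicates strengthening and a negative slope indicates decay.

```python
def ols_trend(years, ics):
    """Return (slope, intercept, r_squared) for a straight-line fit of IC on year."""
    n = len(years)
    mean_x, mean_y = sum(years) / n, sum(ics) / n
    sxx = sum((x - mean_x) ** 2 for x in years)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(years, ics))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((y - (intercept + slope * x)) ** 2 for x, y in zip(years, ics))
    ss_tot = sum((y - mean_y) ** 2 for y in ics)
    return slope, intercept, 1 - ss_res / ss_tot

# Hypothetical yearly ICs, for illustration only.
years = [2012, 2013, 2014, 2015, 2016]
yearly_ic = [0.20, 0.22, 0.21, 0.23, 0.24]
slope, intercept, r_squared = ols_trend(years, yearly_ic)
print(round(slope, 4), round(r_squared, 2))
```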

6.3 Score Persistence

How stable are scores over time? (Average Spearman autocorrelation across all metros)

Lag | Autocorrelation | Interpretation
1 month | +0.554 | Stable
3 months | +0.477 | Moderate
6 months | +0.404 | Moderate
12 months | +0.360 | Moderate
24 months | +0.205 | Low

Scores are sticky in the short term (a hot market stays hot for months) but evolve over 1-2 years as market conditions change. This is desirable — the score reflects current conditions, not a permanent label.


7. Performance Across Market Cycles

7.1 Rate Environment Analysis

Regime | Period | 1Y IC | 1Y Hit Rate | 1Y Spread
Falling rates | 2012-2015 | +0.156 | 55.7% | +2.16 pp
Slowly rising | 2016-2018 | +0.256 | 59.0% | +2.93 pp
Rate cuts | 2019-2020 | +0.205 | 56.8% | +2.99 pp
Ultra-low rates | 2021 | +0.257 | 57.7% | +5.90 pp
Aggressive hikes | 2022-2023 | +0.232 | 58.5% | +2.97 pp
High rate plateau | 2024+ | +0.242 | 59.2% | +2.53 pp

The signal is positive in every rate environment tested. IC is highest in the slowly rising (2016-2018, +0.256) and ultra-low (2021, +0.257) regimes, and lowest, though still positive, in the early falling-rate period (2012-2015, +0.156).

7.2 Worst-Case Drawdown

Analysis of monthly cross-sectional IC across all 158 months:

Metric | Value
Total months analyzed | 158
Months with positive IC | 158 (100.0%)
Months with negative IC | 0 (0.0%)
Longest streak of negative IC | 0 consecutive months
Average monthly IC | +0.214
Worst single month IC | +0.043 (Feb 2012)
Best single month IC | +0.387 (Jun 2016)

There has never been a single month in 13 years where the cross-sectional ranking was inverted.


8. Calibration — Quintile Tables by Year

8.1 1-Year: Does Top Quintile Beat Bottom Quintile Every Year?

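Year | Q1 (Low) | Q2 | Q3 | Q4 | Q5 (High) | Monotonic | Q5 > Q1
2012 | -2.08% | -2.06% | -2.34% | -1.63% | +0.03% | N | YES
2013 | -1.93% | -1.76% | -1.54% | -1.04% | -0.17% | Y | YES
2014 | -1.63% | -1.41% | -0.76% | -0.79% | +0.17% | N | YES
2015 | -1.89% | -1.56% | -1.05% | -0.29% | +0.98% | Y | YES
2016 | -2.49% | -2.05% | -1.12% | -0.06% | +1.31% | Y | YES
2017 | -1.37% | -0.80% | -0.14% | +0.55% | +1.25% | Y | YES
2018 | -0.76% | +0.49% | +0.72% | +1.06% | +1.58% | Y | YES
2019 | -0.22% | +0.40% | +0.64% | +0.84% | +1.37% | Y | YES
2020 | -2.25% | -0.89% | +0.51% | +0.75% | +2.16% | Y | YES
2021 | -4.35% | -1.00% | +0.27% | +0.30% | +1.55% | Y | YES
2022 | -2.04% | +0.29% | +0.63% | +1.07% | +1.64% | Y | YES
2023 | -1.14% | -0.16% | +0.17% | +0.74% | +1.16% | Y | YES
2024 | -0.65% | -0.05% | +0.59% | +1.10% | +1.82% | Y | YES
2025 | -0.89% | +0.24% | +0.31% | +1.01% | +2.06% | Y | YES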
  • Monotonic: 12/14 years (86%)
  • Q5 beats Q1: 14/14 years (100%)
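The Monotonic column is a strict-increase check across the five quintile means. A small sketch, using the 2012 and 2013 rows from the table above (the function name is illustrative):

```python
def is_strictly_monotonic(quintile_returns):
    """True when every quintile's return exceeds the one before it."""
    return all(a < b for a, b in zip(quintile_returns, quintile_returns[1:]))

q_2012 = [-2.08, -2.06, -2.34, -1.63, 0.03]   # Q3 falls below Q2
q_2013 = [-1.93, -1.76, -1.54, -1.04, -0.17]

print(is_strictly_monotonic(q_2012))  # False
print(is_strictly_monotonic(q_2013))  # True
```

A year can fail this check while Q5 still beats Q1, which is why the two summary counts differ.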

8.2 3-Year: Does Top Quintile Beat Bottom Quintile Every Year?

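Year | Q1 (Low) | Q2 | Q3 | Q4 | Q5 (High) | Monotonic | Q5 > Q1
2012 | -5.69% | -5.97% | -5.10% | -3.71% | +0.74% | N | YES
2013 | -4.98% | -5.22% | -3.89% | -2.04% | +1.18% | N | YES
2014 | -5.09% | -4.70% | -2.78% | -1.28% | +1.50% | Y | YES
2015 | -4.69% | -3.75% | -2.56% | +0.53% | +2.93% | Y | YES
2016 | -3.38% | -2.56% | -0.69% | +1.28% | +4.12% | Y | YES
2017 | -1.52% | +0.23% | +1.34% | +2.53% | +4.09% | Y | YES
2018 | -2.74% | +0.86% | +1.30% | +2.84% | +5.56% | Y | YES
2019 | -5.73% | -1.22% | +0.68% | +2.05% | +4.79% | Y | YES
2020 | -7.61% | -1.49% | +1.94% | +2.66% | +4.02% | Y | YES
2021 | -7.18% | -0.24% | +2.03% | +2.71% | +2.72% | Y | YES
2022 | -3.21% | +0.76% | +1.57% | +2.78% | +3.88% | Y | YES
2023 | -1.64% | +0.98% | +0.64% | +2.47% | +4.93% | N | YES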
  • Monotonic: 9/12 years (75%)
  • Q5 beats Q1: 12/12 years (100%)

9. Geographic Analysis

9.1 Performance by State (3-Year, Top 20 by Sample Size)

State | IC | Hit Rate | Quintile Spread | N
TX | +0.216 | 60.3% | +8.64 pp | 7,464
OH | +0.276 | 58.7% | +4.11 pp | 4,770
IN | +0.203 | 55.5% | +4.92 pp | 4,409
PA | +0.314 | 63.7% | +6.26 pp | 4,188
NC | +0.155 | 59.3% | +3.17 pp | 4,166
MI | +0.271 | 63.8% | +7.24 pp | 4,114
GA | +0.128 | 54.7% | +4.50 pp | 4,090
CA | +0.327 | 55.8% | +8.35 pp | 4,084
FL | +0.130 | 56.9% | +4.76 pp | 3,478
IL | +0.262 | 65.8% | +7.47 pp | 3,250
NY | +0.237 | 57.7% | +3.63 pp | 3,066
TN | +0.227 | 54.1% | +6.96 pp | 2,977
MO | -0.088 | 51.6% | -1.43 pp | 2,914
WI | -0.066 | 50.0% | -1.27 pp | 2,676
OK | -0.033 | 54.1% | +1.00 pp | 2,622
WA | +0.358 | 81.1% | +17.08 pp | 2,546
AL | +0.182 | 49.2% | +6.51 pp | 2,528
OR | +0.170 | 63.5% | +2.24 pp | 2,412
MN | +0.401 | 43.3% | +9.25 pp | 2,390
KY | +0.051 | 50.2% | +0.91 pp | 2,267

Positive IC in 39/47 states (83%)

9.2 Hold-Out Metro Test

20% of metros (149) were held out. These metros remained in the cross-sectional percentile ranking, but their outcomes were never analyzed during development. The signal was then evaluated on the held-out set:

Set | N Metros | N Observations | IC | Hit Rate
Training (80%) | 597 | 81,094 | +0.223 | 57.7%
Hold-out (20%) | 149 | 21,099 | +0.217 | 57.7%

IC difference: -0.006. The signal generalizes to unseen metros.


10. Cumulative P&L Simulation

Strategy: Each year, allocate to top-quintile metros (highest scores) and measure excess return vs state average. Compare to bottom-quintile allocation.

Year | Top Quintile Excess | Bottom Quintile Excess | Long-Short Spread
2012 | +0.03% | -2.08% | +2.11 pp
2013 | -0.17% | -1.93% | +1.76 pp
2014 | +0.17% | -1.63% | +1.80 pp
2015 | +0.98% | -1.89% | +2.86 pp
2016 | +1.31% | -2.49% | +3.80 pp
2017 | +1.25% | -1.37% | +2.63 pp
2018 | +1.58% | -0.76% | +2.33 pp
2019 | +1.37% | -0.22% | +1.59 pp
2020 | +2.16% | -2.25% | +4.41 pp
2021 | +1.55% | -4.35% | +5.90 pp
2022 | +1.64% | -2.04% | +3.68 pp
2023 | +1.16% | -1.14% | +2.30 pp
2024 | +1.82% | -0.65% | +2.48 pp
2025 | +2.06% | -0.89% | +2.94 pp
TOTAL | +16.91% | -23.68% | +40.58 pp

The long-short spread was positive in every single year across the 14-year period.

On a $245,361 home, the cumulative excess gain from consistently choosing top-quintile metros: +$41,489. The cumulative excess loss from bottom-quintile metros: -$58,086. Total difference: $99,575 over 14 years.
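The TOTAL row is a simple sum of the yearly 1-year excess returns (not a compounded figure), and the dollar amounts apply those totals to the $245,361 home value. The sketch below re-adds the published yearly values; because they are rounded to two decimals, the sums agree with the table to within about 0.01 pp and the dollar figure to within a few dollars.

```python
HOME_VALUE = 245_361  # median metro home value (Zillow ZHVI, February 2026)

# Yearly 1-year excess returns in percent, 2012-2025, from the table above.
top_quintile = [0.03, -0.17, 0.17, 0.98, 1.31, 1.25, 1.58,
                1.37, 2.16, 1.55, 1.64, 1.16, 1.82, 2.06]
bottom_quintile = [-2.08, -1.93, -1.63, -1.89, -2.49, -1.37, -0.76,
                   -0.22, -2.25, -4.35, -2.04, -1.14, -0.65, -0.89]

total_top = sum(top_quintile)          # additive, not compounded
total_bottom = sum(bottom_quintile)
top_dollars = HOME_VALUE * total_top / 100

print(round(total_top, 2), round(total_bottom, 2), round(top_dollars))
```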


11. County-Level Results

The same 3-metric Demand Signal formula was applied to 2,983 counties across 392,891 scored observations from January 2012 to February 2025. County ZHVI and Redfin county data were used with the same cross-sectional z-score + percentile rank methodology. The zero-crossing for counties was at percentile rank 62.4 (vs 55.6 for metros), indicating a larger share of counties underperform their state average.

11.1 Overall Predictive Power

Horizon | IC | Hit Rate | N
1-Year | +0.1587 | 56.1% | 392,786
3-Year | +0.1721 | 56.6% | 325,850

11.2 Score Decile Tables

1-Year Excess Return by Score Decile (County)

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -1.71% | -1.34% | 6.0% | 38.5% | 47,432
20 | -1.15% | -0.98% | 5.6% | 41.2% | 50,004
30 | -0.84% | -0.72% | 5.2% | 42.8% | 50,001
40 | -0.57% | -0.50% | 4.9% | 45.0% | 50,011
50 | -0.28% | -0.24% | 4.6% | 47.3% | 49,008
60 | -0.02% | -0.03% | 4.5% | 49.5% | 30,147
70 | +0.20% | +0.15% | 4.4% | 51.8% | 30,158
80 | +0.49% | +0.38% | 4.4% | 55.0% | 30,159
90 | +0.76% | +0.59% | 4.3% | 57.7% | 30,146
100 | +0.94% | +0.77% | 4.7% | 59.8% | 25,720

Monotonic: YES | Spread (top 20% - bottom 20%): +2.16 pp

3-Year Excess Return by Score Decile (County)

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -4.82% | -4.51% | 13.7% | 34.9% | 39,345
20 | -3.15% | -2.97% | 13.0% | 39.2% | 41,483
30 | -2.26% | -2.03% | 12.4% | 42.1% | 41,489
40 | -1.45% | -1.41% | 11.8% | 44.2% | 41,486
50 | -0.56% | -0.68% | 11.4% | 46.9% | 40,657
60 | +0.15% | -0.01% | 11.1% | 49.9% | 25,010
70 | +0.99% | +0.67% | 11.0% | 53.1% | 25,022
80 | +1.62% | +1.01% | 11.1% | 54.7% | 25,015
90 | +1.93% | +1.15% | 11.1% | 55.5% | 25,008
100 | +1.67% | +1.06% | 12.1% | 54.6% | 21,335

Monotonic: NO — score 100 dips below score 90 (+1.67% vs +1.93%). The top extreme flattens at county level, likely due to small hot counties overshooting and correcting. Monotonic through score 90. | Spread: +5.72 pp

11.3 Walk-Forward OOS Results (County)

1-Year

Year | IC | Hit% | Spread
2015 | +0.185 | 58.9% | +2.18 pp
2016 | +0.296 | 63.0% | +3.50 pp
2017 | +0.182 | 57.1% | +1.94 pp
2018 | +0.149 | 53.8% | +1.85 pp
2019 | +0.032 | 49.6% | +0.20 pp
2020 | +0.083 | 52.2% | +1.76 pp
2021 | +0.130 | 53.7% | +3.20 pp
2022 | +0.251 | 58.8% | +3.91 pp
2023 | +0.194 | 56.6% | +2.45 pp
2024 | +0.141 | 54.3% | +1.83 pp
2025 | +0.139 | 54.7% | +2.16 pp
AVG | +0.162 | 55.6% | +2.27 pp

Positive IC: 11/11 years (100%)

3-Year

Year | IC | Hit% | Spread
2015 | +0.218 | 60.2% | +6.73 pp
2016 | +0.242 | 60.0% | +6.86 pp
2017 | +0.136 | 53.7% | +3.60 pp
2018 | +0.085 | 51.1% | +3.35 pp
2019 | +0.055 | 51.9% | +2.54 pp
2020 | +0.142 | 53.6% | +7.10 pp
2021 | +0.223 | 56.0% | +8.78 pp
2022 | +0.258 | 58.1% | +7.53 pp
2023 | +0.199 | 55.9% | +5.68 pp
AVG | +0.173 | 55.6% | +5.80 pp

Positive IC: 9/9 years (100%)

11.4 County Validation Battery

Test | Result | Status
Walk-forward OOS IC (1Y / 3Y) | +0.162 / +0.173 | PASS
Rolling 3-year window | Positive IC all 11 test years | PASS
Permutation (5,000 shuffles) | 1Y: 100 sigma, 0 beat. 3Y: 98 sigma, 0 beat | PASS
Bootstrap 95% CI (1Y) | [0.155, 0.162], P(IC>0)=100%, P(IC>0.15)=100% | PASS
Bootstrap 95% CI (3Y) | [0.169, 0.176], P(IC>0)=100%, P(IC>0.15)=100% | PASS
Year-by-year IC positive | 100% (1Y: 14/14, 3Y: 12/12) | PASS
Q5 beats Q1 every year | 100% (1Y: 14/14, 3Y: 12/12) | PASS
Calibration monotonic (1Y) | 13/14 years (93%) | PASS
Calibration monotonic (3Y) | 10/12 years (83%) | PASS
Information Ratio (1Y / 3Y) | 2.55 / 3.00 | PASS
Signal decay | Stable (slope -0.001/yr, p=0.77) | PASS
Structural break | Stable at 4/6 breakpoints, minor drift at 2 | PASS
Score persistence (1m / 12m) | +0.370 / +0.252 | PASS
Worst-case drawdown | 7 negative months of 158 (4.4%), longest streak: 7 | PASS
Hold-out (20% counties removed) | Train IC=+0.173, Hold-out IC=+0.165, diff=-0.008 | PASS
Market cycles: Pre-COVID | IC=+0.168 | PASS
Market cycles: COVID | IC=+0.083 | PASS
Market cycles: Post-COVID | IC=+0.181 | PASS
Market cycles: Rate hikes | IC=+0.165 | PASS

Result: 19/19 PASS

11.5 County Dollar Impact

Based on median county home value of $245,361 (Zillow ZHVI, February 2026) and average state returns of 5.98% (1Y) and 20.50% (3Y):

1-Year

Score | Excess vs State | Total 1Y Return | Dollar Gain vs Purchase | vs Score 50
10 | -1.71% | +4.27% | +$10,474 | -$3,509
20 | -1.15% | +4.83% | +$11,849 | -$2,134
30 | -0.84% | +5.14% | +$12,610 | -$1,373
40 | -0.57% | +5.41% | +$13,272 | -$711
50 | -0.28% | +5.70% | +$13,983 | $0
60 | -0.02% | +5.96% | +$14,621 | +$638
70 | +0.20% | +6.18% | +$15,161 | +$1,178
80 | +0.49% | +6.47% | +$15,873 | +$1,890
90 | +0.76% | +6.74% | +$16,536 | +$2,553
100 | +0.94% | +6.92% | +$16,977 | +$2,994

3-Year

Score | Excess vs State | Total 3Y Return | Dollar Gain vs Purchase | vs Score 50
10 | -4.82% | +15.68% | +$38,472 | -$10,449
20 | -3.15% | +17.35% | +$42,569 | -$6,352
30 | -2.26% | +18.24% | +$44,752 | -$4,169
40 | -1.45% | +19.05% | +$46,738 | -$2,183
50 | -0.56% | +19.94% | +$48,921 | $0
60 | +0.15% | +20.65% | +$50,663 | +$1,742
70 | +0.99% | +21.49% | +$52,724 | +$3,803
80 | +1.62% | +22.12% | +$54,270 | +$5,349
90 | +1.93% | +22.43% | +$55,031 | +$6,110
100 | +1.67% | +22.17% | +$54,393 | +$5,472

Choosing a score-90 county over a score-10 county: +$16,559 difference over 3 years.

11.6 County Cumulative P&L (1-Year Strategy)

Year | Top Quintile | Bottom Quintile | Spread
2012 | +0.34% | -2.26% | +2.59 pp
2013 | +0.43% | -2.04% | +2.46 pp
2014 | +0.10% | -1.51% | +1.61 pp
2015 | +0.58% | -1.60% | +2.18 pp
2016 | +0.54% | -2.97% | +3.50 pp
2017 | +0.75% | -1.19% | +1.94 pp
2018 | +0.89% | -0.95% | +1.85 pp
2019 | +0.95% | +0.76% | +0.20 pp
2020 | +1.32% | -0.44% | +1.76 pp
2021 | +0.40% | -2.79% | +3.20 pp
2022 | +0.73% | -3.18% | +3.91 pp
2023 | +0.82% | -1.63% | +2.45 pp
2024 | +1.36% | -0.47% | +1.83 pp
2025 | +1.44% | -0.72% | +2.16 pp
TOTAL | +10.65% | -21.00% | +31.65 pp

Positive spread in every year. Cumulative long-short difference: +31.65 percentage points over 14 years.

11.7 Metro vs County Comparison

Metric | Metro | County
Geographies | 746 | 2,983
Observations | 121,815 | 392,891
1Y IC | +0.216 | +0.159
3Y IC | +0.222 | +0.172
1Y Hit Rate | 57.6% | 56.1%
1Y Decile Monotonic | YES | YES
3Y Decile Monotonic | YES | NO (dips at 100)
IR (1Y / 3Y) | 3.65 / 6.56 | 2.55 / 3.00
Cumulative P&L spread | +40.58 pp | +31.65 pp
Negative IC months | 0/158 | 7/158

The county signal is weaker but statistically robust. The lower IC is expected: counties are smaller, noisier, and include rural areas where the demand signal is less informative.


12. ZIP Code Results

The Demand Signal formula was applied to 19,880 ZIP codes across 2,663,001 scored observations. At the ZIP level, Redfin does not report months_of_supply, so the formula uses 2 metrics (sold_above_list and inverse median_dom) instead of 3. The zero-crossing was at percentile rank 33.4 (vs 55.6 for metros and 62.4 for counties), so only about a third of ZIPs rank below the point where excess return turns negative.

12.1 Overall Predictive Power

Horizon | IC | Hit Rate | N
1-Year | +0.1588 | 55.3% | 2,662,402
3-Year | +0.1411 | 54.9% | 2,218,735

12.2 Score Decile Tables

1-Year Excess Return by Score Decile (ZIP)

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -0.81% | -0.62% | 5.9% | 44.4% | 172,218
20 | -0.67% | -0.59% | 5.6% | 44.6% | 181,378
30 | -0.57% | -0.51% | 5.4% | 45.1% | 181,472
40 | -0.48% | -0.51% | 5.3% | 45.0% | 181,383
50 | -0.39% | -0.49% | 5.2% | 45.1% | 190,476
60 | -0.14% | -0.27% | 5.0% | 47.1% | 361,912
70 | +0.32% | +0.15% | 4.9% | 51.7% | 361,894
80 | +0.79% | +0.50% | 4.9% | 55.7% | 361,943
90 | +1.30% | +0.86% | 5.0% | 60.0% | 361,960
100 | +1.74% | +1.20% | 5.5% | 63.4% | 307,766

Monotonic: YES | Spread: +2.01 pp

3-Year Excess Return by Score Decile (ZIP)

Score | Mean Excess vs State | Median Excess | Std Dev | P(Beat State) | N
10 | -1.43% | -1.35% | 14.0% | 45.0% | 143,514
20 | -1.08% | -1.12% | 13.5% | 45.7% | 151,183
30 | -0.82% | -0.88% | 13.2% | 46.5% | 151,214
40 | -0.63% | -0.83% | 13.0% | 46.7% | 151,163
50 | -0.26% | -0.57% | 13.0% | 47.8% | 158,734
60 | +0.22% | -0.24% | 12.6% | 49.0% | 301,624
70 | +1.33% | +0.62% | 12.5% | 52.6% | 301,595
80 | +2.55% | +1.41% | 12.7% | 56.1% | 301,612
90 | +3.81% | +2.21% | 13.2% | 59.5% | 301,630
100 | +4.57% | +2.90% | 14.0% | 61.8% | 256,466

Monotonic: YES | Spread: +4.90 pp

12.3 Walk-Forward OOS Results (ZIP)

1-Year

Year | IC | Hit% | Spread
2015 | +0.210 | 54.9% | +2.80 pp
2016 | +0.237 | 58.0% | +2.87 pp
2017 | +0.171 | 56.0% | +2.20 pp
2018 | +0.103 | 53.1% | +1.08 pp
2019 | +0.093 | 52.1% | +0.75 pp
2020 | +0.064 | 51.9% | +1.27 pp
2021 | +0.094 | 51.4% | +2.30 pp
2022 | +0.136 | 56.3% | +2.14 pp
2023 | +0.197 | 60.2% | +2.20 pp
2024 | +0.184 | 57.8% | +1.83 pp
2025 | +0.177 | 57.4% | +1.88 pp
AVG | +0.151 | 55.3% | +1.94 pp

Positive IC: 11/11 years (100%)

3-Year

Year | IC | Hit% | Spread
2015 | +0.212 | 56.5% | +7.73 pp
2016 | +0.189 | 56.3% | +6.10 pp
2017 | +0.116 | 53.5% | +3.40 pp
2018 | +0.053 | 51.4% | +1.77 pp
2019 | +0.036 | 50.1% | +1.64 pp
2020 | +0.031 | 52.0% | +2.53 pp
2021 | +0.100 | 55.6% | +4.67 pp
2022 | +0.162 | 59.4% | +4.86 pp
2023 | +0.185 | 58.5% | +4.99 pp
AVG | +0.120 | 55.0% | +4.19 pp

Positive IC: 9/9 years (100%)

12.4 ZIP Validation Battery

Test | Result | Status
Walk-forward OOS IC (1Y / 3Y) | +0.151 / +0.120 | PASS
Rolling 3-year window | Positive IC all 11 test years | PASS
Permutation (2,000 shuffles) | 1Y: 71 sigma, 0 beat. 3Y: 62 sigma, 0 beat | PASS
Bootstrap 95% CI (1Y) | [0.155, 0.164], P(IC>0)=100%, P(IC>0.10)=100% | PASS
Bootstrap 95% CI (3Y) | [0.134, 0.143], P(IC>0)=100%, P(IC>0.10)=100% | PASS
Year-by-year IC positive | 100% (1Y: 14/14, 3Y: 12/12) | PASS
Q5 beats Q1 every year | 100% (1Y: 14/14, 3Y: 12/12) | PASS
Calibration monotonic (1Y) | 10/14 years (71%) | PASS
Calibration monotonic (3Y) | 6/12 years (50%) | WATCH
Information Ratio (1Y / 3Y) | 2.95 / 1.88 | PASS / WATCH
Signal decay (1Y) | Stable (slope -0.006/yr, p=0.13) | PASS
Signal decay (3Y) | Weakening (slope -0.015/yr, p=0.03) | WATCH
Structural break (3Y) | DRIFT at 5/6 breakpoints. IC dropped from 0.24 to 0.08 mid-period, recovered to 0.13. Only 2021 split is stable. | WATCH
Score persistence (1m / 12m) | +0.737 / +0.236 | PASS
Worst-case drawdown | 0 negative months of 156 (0.0%), worst IC: +0.023 | PASS
Hold-out (20% ZIPs removed) | Train IC=+0.142, Hold-out IC=+0.138, diff=-0.004 | PASS
Market cycles: Pre-COVID | IC=+0.163 | PASS
Market cycles: COVID | IC=+0.064 | PASS (weak)
Market cycles: Post-COVID | IC=+0.111 | PASS
Market cycles: Rate hikes | IC=+0.189 | PASS

Result: 16 PASS, 4 WATCH. No FAIL.

The WATCH items are all related to the 3Y ZIP horizon: calibration monotonicity is only 50%, the 3Y IR is below 2.0, the 3Y signal shows decay, and the 3Y structural break test shows drift. The 1Y ZIP signal passes all tests.

12.5 ZIP Dollar Impact

Based on median home value of $245,361 (Zillow ZHVI, February 2026) and average state returns of 5.98% (1Y) and 20.50% (3Y):

1-Year

Score | Excess vs State | Total 1Y Return | Dollar Gain vs Purchase | vs Score 50
10 | -0.81% | +5.17% | +$12,681 | -$1,030
20 | -0.67% | +5.31% | +$13,024 | -$687
30 | -0.57% | +5.41% | +$13,270 | -$441
40 | -0.48% | +5.50% | +$13,491 | -$220
50 | -0.39% | +5.59% | +$13,711 | $0
60 | -0.14% | +5.84% | +$14,325 | +$614
70 | +0.32% | +6.30% | +$15,454 | +$1,743
80 | +0.79% | +6.77% | +$16,607 | +$2,896
90 | +1.30% | +7.28% | +$17,858 | +$4,147
100 | +1.74% | +7.72% | +$18,938 | +$5,227

3-Year

Score | Excess vs State | Total 3Y Return | Dollar Gain vs Purchase | vs Score 50
10 | -1.43% | +19.07% | +$46,787 | -$2,871
20 | -1.08% | +19.42% | +$47,646 | -$2,012
30 | -0.82% | +19.68% | +$48,284 | -$1,374
40 | -0.63% | +19.87% | +$48,750 | -$908
50 | -0.26% | +20.24% | +$49,658 | $0
60 | +0.22% | +20.72% | +$50,835 | +$1,177
70 | +1.33% | +21.83% | +$53,557 | +$3,899
80 | +2.55% | +23.05% | +$56,550 | +$6,892
90 | +3.81% | +24.31% | +$59,640 | +$9,982
100 | +4.57% | +25.07% | +$61,504 | +$11,846

Choosing a score-100 ZIP over a score-10 ZIP: +$14,717 difference over 3 years.

12.6 ZIP Cumulative P&L (1-Year Strategy)

Year | Top Quintile | Bottom Quintile | Spread
2012 | +2.95% | -1.91% | +4.86 pp
2013 | +2.92% | -1.22% | +4.14 pp
2014 | +2.18% | -0.68% | +2.86 pp
2015 | +2.03% | -0.76% | +2.80 pp
2016 | +1.80% | -1.07% | +2.87 pp
2017 | +1.95% | -0.26% | +2.20 pp
2018 | +1.01% | -0.07% | +1.08 pp
2019 | +1.30% | +0.55% | +0.75 pp
2020 | +1.39% | +0.11% | +1.27 pp
2021 | +0.53% | -1.77% | +2.30 pp
2022 | +0.79% | -1.35% | +2.14 pp
2023 | +1.67% | -0.54% | +2.20 pp
2024 | +1.34% | -0.50% | +1.83 pp
2025 | +1.46% | -0.42% | +1.88 pp
TOTAL | +23.32% | -9.87% | +33.19 pp

Positive spread in every year.
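
The TOTAL row reads as a simple sum of the yearly figures rather than a compounded return. A quick Python check of that reading, using the Spread column copied from the table above (variable names are illustrative):

```python
# Yearly top-minus-bottom quintile spreads in percentage points,
# copied from the Spread column of the table above.
spread_pp = {
    2012: 4.86, 2013: 4.14, 2014: 2.86, 2015: 2.80, 2016: 2.87,
    2017: 2.20, 2018: 1.08, 2019: 0.75, 2020: 1.27, 2021: 2.30,
    2022: 2.14, 2023: 2.20, 2024: 1.83, 2025: 1.88,
}

total_spread = sum(spread_pp.values())             # simple sum, not compounded
all_positive = all(s > 0 for s in spread_pp.values())
narrowest_year = min(spread_pp, key=spread_pp.get)

print(f"total spread: {total_spread:.2f} pp")
print(f"positive every year: {all_positive}")
print(f"narrowest spread: {narrowest_year}")
```

The rounded yearly spreads sum to 33.18 pp, within rounding of the published +33.19 pp, and the narrowest year is 2019.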

12.7 ZIP-Specific Notes

2-metric formula: ZIP-level Redfin data does not include months_of_supply. The ZIP score uses z(sold_above_list) - z(median_dom) — the same first two metrics as metro/county, minus the supply constraint component. This reduces signal strength but the formula remains predictive.
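
The 2-metric composite can be sketched in a few lines of Python. This is an illustration only: the sample values are hypothetical, the helper names are not from the source scripts, and the use of the population standard deviation is an assumption, since the text does not say which variant is used.

```python
from statistics import mean, pstdev


def zscores(values):
    """Cross-sectional z-score: (value - mean) / std for one month of data."""
    mu = mean(values)
    sigma = pstdev(values)  # assumption: population std; the source does not specify
    return [(v - mu) / sigma for v in values]


def zip_composite(sold_above_list, median_dom):
    """z(sold_above_list) - z(median_dom), the 2-metric ZIP signal described above."""
    z_sal = zscores(sold_above_list)
    z_dom = zscores(median_dom)
    return [a - b for a, b in zip(z_sal, z_dom)]


# Hypothetical one-month cross-section of four ZIPs.
sold_above_list = [0.55, 0.40, 0.25, 0.10]  # share of homes sold above list price
median_dom = [12, 20, 35, 60]               # median days on market

signal = zip_composite(sold_above_list, median_dom)
for sal, dom, s in zip(sold_above_list, median_dom, signal):
    print(f"sold_above_list={sal:.2f} median_dom={dom:>2} composite={s:+.2f}")
```

ZIPs with more above-list sales and fewer days on market receive a higher composite value, which matches the direction of the formula in the text.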

3Y signal weakening: The 3Y IC shows a statistically significant downward trend (slope = -0.015/yr, p = 0.03). The signal was strongest in 2012-2016 (IC 0.19-0.27), weakened in 2018-2020 (IC 0.03-0.05), then partially recovered in 2021-2023 (IC 0.10-0.18). The 1Y signal does not show this trend (p=0.13). Users should weight the 1Y ZIP signal more heavily than the 3Y for forward-looking decisions.
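
The decay slope is a regression of yearly IC on calendar year. A minimal sketch of the slope calculation, assuming ordinary least squares; the IC series below is made up for illustration and is not the report's data, and the p-value is left out because it requires a t-distribution that the Python standard library does not provide:

```python
def ols_slope(x, y):
    """Ordinary least squares slope of y regressed on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sxy / sxx


# Hypothetical yearly IC values (NOT the report's series).
years = [2012, 2013, 2014, 2015, 2016]
yearly_ic = [0.25, 0.22, 0.20, 0.18, 0.15]

slope = ols_slope(years, yearly_ic)
print(f"IC trend: {slope:+.3f} per year")
```

A negative slope means the IC is shrinking over time; whether that trend is statistically meaningful is what the reported p-value addresses.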

Wider excess return range: ZIP-level 3Y excess returns show +4.57% at score 100 and -1.43% at score 10. The wider range (vs county) reflects the greater variance in individual ZIP performance — some ZIPs can dramatically outperform or underperform within the same county.


13. Cross-Geography Summary

Metric | Metro | County | ZIP
Geographies | 746 | 2,983 | 19,880
Observations | 121,815 | 392,891 | 2,663,001
Formula | 3 metrics | 3 metrics | 2 metrics
1Y IC | +0.216 | +0.159 | +0.159
3Y IC | +0.222 | +0.172 | +0.141
1Y Hit Rate | 57.6% | 56.1% | 55.3%
1Y Decile Monotonic | YES | YES | YES
3Y Decile Monotonic | YES | NO (100 dips) | YES
IR (1Y / 3Y) | 3.65 / 6.56 | 2.55 / 3.00 | 2.95 / 1.88
% Years Positive IC | 100% | 100% | 100%
Q5 beats Q1 every year | 100% | 100% | 100%
Permutation | 0/10K | 0/5K | 0/2K
Cumulative spread | +40.6 pp | +31.7 pp | +33.2 pp
Signal decay | Strengthening | Stable | 3Y weakening

Recommendation by geography:

  • Metro: Full confidence. Strongest signal, most stable, 3 metrics available.
  • County: High confidence. Signal is real and robust, minor non-monotonicity at extreme top.
  • ZIP (1Y): Moderate confidence. Signal is real (100% positive years, 71 sigma from random), useful for 1-year horizon.
  • ZIP (3Y): Use with caution. Signal is present but weakening over time. Best combined with metro or county-level context.

14. Complete Robustness Checklist: All Geographies

All 12 tests across all 3 geography levels. Grouped by category for easy scanning.

14.1 Predictive Power

Can the score predict future returns out-of-sample (using only past data)?

Metric | Metro | County | ZIP | Threshold | Status
Walk-forward IC (1Y) | +0.240 | +0.162 | +0.151 | > 0.10 | ALL PASS
Walk-forward IC (3Y) | +0.234 | +0.173 | +0.120 | > 0.10 | ALL PASS
% years positive IC (1Y) | 100% | 100% | 100% | > 80% | ALL PASS
% years positive IC (3Y) | 100% | 100% | 100% | > 80% | ALL PASS
Rolling 3-year window (1Y) | 100% + | 100% + | 100% + | > 80% | ALL PASS

14.2 Statistical Significance

Is the signal real, or could random chance explain it?

Metric | Metro | County | ZIP | Status
Permutation shuffles run | 10,000 | 5,000 | 2,000 |
Times random beat actual | 0 | 0 | 0 | ALL PASS
Standard deviations from random | 75.5 | 100 | 71 | ALL PASS
Bootstrap 95% CI (1Y) | [0.211, 0.221] | [0.155, 0.162] | [0.155, 0.164] | ALL PASS
Bootstrap 95% CI (3Y) | [0.215, 0.228] | [0.169, 0.176] | [0.134, 0.143] | ALL PASS
P(IC > 0) across all resamples | 100% | 100% | 100% | ALL PASS
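
The permutation rows can be read as follows: shuffle the returns so they no longer line up with the scores, recompute the IC, and count how often the shuffled IC matches or beats the actual one. The sketch below is one plausible version of that test, not the report's implementation. It assumes the IC is a Spearman rank correlation and that "standard deviations from random" is the actual IC's distance from the shuffled mean in units of the shuffled standard deviation. All function names and the sample data are hypothetical.

```python
import random


def ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman_ic(scores, returns):
    """Spearman rank correlation between scores and forward returns."""
    return pearson(ranks(scores), ranks(returns))


def permutation_test(scores, returns, n_shuffles=2000, seed=0):
    """Return (actual IC, times a shuffle matched or beat it, sigmas from random)."""
    rng = random.Random(seed)
    actual = spearman_ic(scores, returns)
    shuffled = list(returns)
    null_ics = []
    for _ in range(n_shuffles):
        rng.shuffle(shuffled)  # break the score/return pairing
        null_ics.append(spearman_ic(scores, shuffled))
    beats = sum(ic >= actual for ic in null_ics)
    mu = sum(null_ics) / n_shuffles
    sd = (sum((ic - mu) ** 2 for ic in null_ics) / (n_shuffles - 1)) ** 0.5
    return actual, beats, (actual - mu) / sd


# Hypothetical cross-section: 40 markets whose returns loosely follow the score.
demo_rng = random.Random(42)
scores = list(range(1, 41))
returns = [0.001 * s + demo_rng.gauss(0, 0.01) for s in scores]
actual_ic, times_beaten, sigmas = permutation_test(scores, returns, n_shuffles=500)
print(f"IC={actual_ic:+.3f}, random beat actual {times_beaten}/500, {sigmas:.1f} sigma")
```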

14.3 Monotonicity

Does a higher score always correspond to better actual performance?

Metric | Metro | County | ZIP | Status
Decile table monotonic (1Y) | YES | YES | YES | ALL PASS
Decile table monotonic (3Y) | YES | NO (score 100) | YES | County WATCH
Q5 beats Q1 every year (1Y) | 14/14 (100%) | 14/14 (100%) | 14/14 (100%) | ALL PASS
Q5 beats Q1 every year (3Y) | 12/12 (100%) | 12/12 (100%) | 12/12 (100%) | ALL PASS
Yearly calibration monotonic (1Y) | 12/14 (86%) | 13/14 (93%) | 10/14 (71%) | ALL PASS
Yearly calibration monotonic (3Y) | 9/12 (75%) | 10/12 (83%) | 6/12 (50%) | ZIP 3Y WATCH

14.4 Stability Over Time

Does the signal hold up across booms, busts, and rate changes?

Metric | Metro | County | ZIP | Status
Structural break detected | None | Minor drift 2/6 | DRIFT 5/6 (3Y) | ZIP 3Y WATCH
Signal decay slope (1Y) | +0.007 (better) | -0.001 (flat) | -0.006 (flat) | ALL PASS
Signal decay slope (3Y) | +0.006 (better) | -0.001 (flat) | -0.015 (weakening) | ZIP 3Y WATCH
Information Ratio (1Y) | 3.65 | 2.55 | 2.95 | ALL PASS
Information Ratio (3Y) | 6.56 | 3.00 | 1.88 | ZIP 3Y WATCH
Works in all rate regimes | YES | YES | YES | ALL PASS

14.5 Score Persistence

Are scores stable month-to-month, or random noise?

Lag | Metro | County | ZIP
1 month | +0.554 | +0.370 | +0.737
3 months | +0.477 | +0.324 | +0.297
6 months | +0.404 | +0.273 | +0.244
12 months | +0.360 | +0.252 | +0.236
24 months | +0.205 | +0.142 | +0.126

Scores are moderately persistent. A high-scoring market stays high for months but shifts over 1-2 years as conditions change. Status: ALL PASS.

14.6 Worst-Case Performance

What is the longest period where the score failed to predict correctly?

Metric | Metro | County | ZIP | Status
Total months tested | 158 | 158 | 156 |
Months with negative IC | 0 (0%) | 7 (4.4%) | 0 (0.0%) |
Longest negative streak | 0 | 7 | 0 | ALL PASS
Worst single month IC | +0.043 | -0.089 | +0.023 |
Best single month IC | +0.387 | +0.352 | +0.257 |

14.7 Generalization

Does the score work on markets it has never been tested against?

Metric | Metro | County | ZIP | Status
Held-out geos (20%) | 149 | 593 | 3,923 |
Training set IC | +0.223 | +0.173 | +0.142 |
Hold-out set IC | +0.217 | +0.165 | +0.138 | ALL PASS
IC difference | -0.006 | -0.008 | -0.004 |

14.8 Final Scorecard

Geography | Tests | PASS | WATCH | FAIL
Metro | 21 | 21 | 0 | 0
County | 21 | 20 | 1 | 0
ZIP (1Y) | 21 | 21 | 0 | 0
ZIP (3Y) | 21 | 17 | 4 | 0

WATCH items for the ZIP 3-Year horizon (the single County WATCH is the 3Y decile non-monotonicity at score 100, covered in 14.3):

  1. Calibration monotonicity: only 50% of years are fully monotonic at the quintile level
  2. Structural break: 3Y IC dropped from 0.24 to 0.08 mid-period, then recovered to 0.13
  3. Signal decay: slope of -0.015/yr is statistically significant (p = 0.03)
  4. Information Ratio of 1.88 is below the 2.0 consistency threshold

No FAIL at any geography level. Zero.

15. Known Limitations

  1. Hit rate of ~58% — The score identifies the correct direction of outperformance approximately 3 out of 5 times. This is significantly better than random (50%) but individual market outcomes vary widely. Standard deviation within each score decile ranges from 4.3% (1Y) to 13.1% (3Y).

  2. Three states show negative IC — Missouri (-0.088), Wisconsin (-0.066), and Oklahoma (-0.033) showed negative IC over the test period. These states have relatively few metros and the signal may not differentiate well in less competitive markets.

  3. Redfin coverage dependency — The formula requires sold_above_list, median_dom, and months_of_supply from Redfin. Metros without Redfin coverage (typically smaller markets) cannot be scored. This limits coverage to 746 of 895 Zillow-tracked metros.

  4. Score persistence decays at 24 months — Autocorrelation drops to 0.21 at a 2-year lag. The score reflects current market conditions and changes as conditions shift. Users should not treat a current score as a permanent classification.

  5. Signal measures demand pressure, not fundamental value — The score identifies where demand exceeds supply. It does not assess affordability, job growth, demographic trends, or other fundamental factors. A metro can score high because of a temporary demand surge that may not persist beyond the 1-3 year horizon.

  6. Excess return, not absolute return — A score of 80 does not guarantee appreciation. If the state average declines 10%, a score-80 metro might decline 8% (still outperforming state by +1.87%, but still a loss in absolute terms).

  7. ZIP 3Y signal shows weakening trend — The 3-year IC at ZIP level declined from 0.27 (2012) to 0.03 (2019-2020), though it recovered to 0.18 by 2023. The regression slope is -0.015/yr (p=0.03). The 1Y signal is stable. Users should rely on the 1Y horizon for ZIP-level forward predictions.

  8. County 3Y non-monotonicity at score 100 — At the county level, score 100 shows slightly lower 3Y excess return (+1.67%) than score 90 (+1.93%). The extreme top of the county distribution contains small, volatile counties where demand surges can reverse. Monotonicity holds through score 90.

  9. ZIP formula uses 2 metrics instead of 3 — Redfin ZIP data does not include months_of_supply. The ZIP score uses sold_above_list and inverse(median_dom) only, which reduces predictive power compared to the 3-metric metro and county formulas.


16. Appendix

16.1 Data Sources

Source | Data | Coverage | Granularity
Zillow ZHVI | Home Value Index (all homes, SFR+condo) | 895 metros, 3,073 counties, 26,307 ZIPs, 51 states | Monthly, Jan 2000 - Feb 2026
Redfin (Metro) | Sold Above List %, Median DOM, Months of Supply | 754 metros | Monthly, Jan 2012 - Feb 2026
Redfin (County) | Sold Above List %, Median DOM, Months of Supply | 3,053 counties | Monthly, Jan 2012 - Feb 2026
Redfin (ZIP) | Sold Above List %, Median DOM (no Months of Supply) | 24,341 ZIPs | Monthly, Mar 2012 - Feb 2026
Census Bureau | Geography crosswalk (metro/county/ZIP-to-state mapping) | All US geographies | Static

16.2 Source Scripts

Script | Purpose
scripts/analysis/fresh_predictor_hunt.py | Tested 57 metrics from 5 data sources; identified top predictors
scripts/analysis/monotonic_score_builder.py | Built monotonic score with isotonic calibration
scripts/analysis/full_backtest.py | Full 12-year backtest with decile tables, year-by-year IC, cycle analysis
scripts/analysis/recentered_score.py | Re-centered score so 50 = state average
scripts/analysis/rigorous_validation.py | 12-test statistical validation battery (metro)
scripts/analysis/county_backtest.py | Full county-level backtest + validation battery
scripts/analysis/zip_backtest.py | Full ZIP-level backtest + validation (2-metric variant)

16.3 Metric Selection Process

57 candidate metrics were tested individually from Zillow (18), Redfin (13), Census (9), Realtor (12), and Economic (5) data sources. Each metric was evaluated by:

  1. In-sample cross-sectional Spearman IC with 3Y excess return vs state
  2. Walk-forward out-of-sample IC (5 annual folds)
  3. Quintile spread monotonicity
  4. Information Ratio (IC stability)

The top 3 metrics by combined OOS IC and stability were selected:

  • sold_above_list (OOS IC: +0.172, IR: 11.82)
  • median_dom (OOS IC: +0.180, IR: 8.64)
  • months_of_supply (OOS IC: +0.159, IR: 3.65)

These were combined into a single composite signal because they capture complementary dimensions of demand-supply imbalance while avoiding redundancy.
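
The Information Ratio used as the stability criterion is not defined in this document. A common convention, assumed here, is the mean of the per-period ICs divided by their standard deviation, so a high but erratic IC scores lower than a moderate but steady one. The IC series below are hypothetical and the function name is illustrative:

```python
from statistics import mean, stdev


def information_ratio(ic_series):
    """Mean IC divided by its sample standard deviation (assumed definition)."""
    return mean(ic_series) / stdev(ic_series)


# Two hypothetical yearly IC series with the same average IC of 0.20.
steady_ic = [0.18, 0.20, 0.22, 0.19, 0.21]
erratic_ic = [0.05, 0.35, 0.10, 0.40, 0.10]

print(f"steady:  IR = {information_ratio(steady_ic):.2f}")
print(f"erratic: IR = {information_ratio(erratic_ic):.2f}")
```

Under this definition the steady series earns a much higher ratio than the erratic one despite the identical mean, which is the sense in which the ratio measures IC stability.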

16.4 Methodology Notes

Excess return computation: For each metro at each historical date, the 1Y (or 3Y) forward return was calculated as (ZHVI_future / ZHVI_current) - 1. The state return was computed identically from state-level ZHVI. Excess return = metro return minus state return.
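
The excess return calculation translates directly into code. A minimal sketch using the formula stated above; the function names and the index values are hypothetical:

```python
def forward_return(zhvi_current, zhvi_future):
    """Forward return over the horizon: (ZHVI_future / ZHVI_current) - 1."""
    return zhvi_future / zhvi_current - 1.0


def excess_return(metro_current, metro_future, state_current, state_future):
    """Metro forward return minus the state forward return over the same window."""
    return (forward_return(metro_current, metro_future)
            - forward_return(state_current, state_future))


# Hypothetical ZHVI levels: the metro rises 25%, its state rises 20%.
example = excess_return(200_000, 250_000, 300_000, 360_000)
print(f"excess return: {example:+.2%}")
```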

Cross-sectional standardization: Each month, across all metros with data, each metric is z-scored: (value - mean) / std. This normalizes for the fact that absolute metric levels change over time (e.g., "50% sold above list" meant different things in 2015 vs 2022).

Percentile ranking: The composite signal is percentile-ranked within each month's cross-section, producing a raw 0-100 score. This eliminates distributional assumptions.
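
A minimal sketch of within-month percentile ranking follows. The exact percentile convention is not stated in this document; the version below assumes average rank divided by the number of markets, with ties sharing the same rank. The function name and sample values are hypothetical.

```python
def percentile_rank(values):
    """Percentile rank (0-100] of each value within one month's cross-section.

    Assumed convention: average rank / n * 100, ties share the mean rank.
    """
    n = len(values)
    result = []
    for v in values:
        below = sum(1 for other in values if other < v)
        equal = sum(1 for other in values if other == v)  # includes v itself
        average_rank = below + (equal + 1) / 2
        result.append(100.0 * average_rank / n)
    return result


# Hypothetical composite signal values for four markets in one month.
composite = [-1.2, 0.8, 0.1, 2.3]
print(percentile_rank(composite))
```

Because only the ordering matters, an extreme composite value such as 2.3 receives the same top rank whether it is slightly or far above the next market, which is why the ranking needs no distributional assumptions.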

Re-centering: The percentile rank at which the average excess return equals zero was identified via isotonic regression (pct_rank = 55.6 for the 3Y horizon). The score mapping was linearly adjusted so this point maps to score 50. Because the zero-excess point sits above the 50th percentile, below-state-average percentiles (0 to 55.6) are compressed into [1, 50] and above-state-average percentiles (55.6 to 100) are stretched into [50, 99].
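
One way to picture the re-centering is as a two-segment linear map that pins the zero-excess percentile to score 50. The sketch below assumes that piecewise-linear form, with the pivot of 55.6 and the output ranges [1, 50] and [50, 99] taken from the text; the function name is hypothetical, and the actual logic in recentered_score.py may differ.

```python
PIVOT_PCT = 55.6  # percentile rank where average 3Y excess return is zero (from the text)


def recenter(pct_rank, pivot=PIVOT_PCT, low=1.0, high=99.0):
    """Map a 0-100 percentile rank to a score where `pivot` lands on 50."""
    if pct_rank <= pivot:
        # Segment below the pivot: [0, pivot] -> [low, 50]
        return low + (pct_rank / pivot) * (50.0 - low)
    # Segment above the pivot: [pivot, 100] -> [50, high]
    return 50.0 + ((pct_rank - pivot) / (100.0 - pivot)) * (high - 50.0)


for pct in (0.0, 27.8, 55.6, 77.8, 100.0):
    print(f"pct_rank {pct:5.1f} -> score {recenter(pct):5.1f}")
```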

Walk-forward validation: For each test year, the model uses only prior years' data to establish the signal direction (positive IC confirms the metric predicts in the expected direction). The test year's scores are then evaluated against actual future returns. No information from the test year or beyond is used at any point.
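
The expanding-window loop can be sketched as follows. This is an interpretation, not the report's code: it assumes each year's cross-sectional IC has already been computed, takes the signal direction from the sign of the average IC over all prior years, and reports the test year's IC signed by that direction. The name `ic_by_year` and the sample values are hypothetical.

```python
def walk_forward_oos_ic(ic_by_year, min_train_years=1):
    """Out-of-sample IC per test year using only earlier years to set direction."""
    years = sorted(ic_by_year)
    results = {}
    for i, test_year in enumerate(years):
        if i < min_train_years:
            continue  # not enough history before this year to set a direction
        train_ics = [ic_by_year[y] for y in years[:i]]  # expanding window of prior years
        direction = 1.0 if sum(train_ics) >= 0 else -1.0
        # The test year's own IC is never used to choose the direction.
        results[test_year] = direction * ic_by_year[test_year]
    return results


# Hypothetical per-year ICs (NOT the report's data).
ic_by_year = {2012: 0.20, 2013: 0.15, 2014: 0.10, 2015: 0.12}
oos = walk_forward_oos_ic(ic_by_year)
hit_rate = sum(ic > 0 for ic in oos.values()) / len(oos)
print(oos, f"positive in {hit_rate:.0%} of test years")
```

The first year only serves as training history, so it never appears as a test year in the output.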