openavmkit.sales_chasing
Sales-chasing detection.
Sales chasing is the practice -- sometimes deliberate, sometimes an unintended side effect of a valuation methodology -- of moving a parcel's appraised value toward its observed sale price. It makes a valuation look very strong on sold parcels (ratios collapse onto 1.0 and COD drops) without improving, and sometimes while worsening, uniformity among comparable unsold parcels. Because a ratio study can only be scored against parcels that actually sold, a roll affected by sales chasing can post statistics that look better than the same values would achieve out-of-sample. openavmkit's own models are evaluated out-of-sample, so a naive head-to-head comparison can understate openavmkit's performance relative to such a roll.
This is an information-gap problem, not an accusation: the point is simply that scoring a roll on the same sales that may have informed it is not the same test our held-out models face. This module turns that asymmetry into measurable signals so a very tight assessor ratio study can be interpreted with the right context rather than taken at face value. Three signals are computed; each degrades gracefully when its inputs are unavailable:
- Ratio spike at 1.0 -- the share of sold parcels whose `value / sale_price` lands within `eps` of 1.0. A large mass of ratios sitting exactly on 1.0 is a strong indicator that values were taken directly from sale prices.
- COD-CHD divergence -- mirrors the model utility score's `sales_chase_score` (`(1 / COD) * CHD`; see `openavmkit.modeling.compute_utility_score`): a suspiciously low ratio-COD on sold parcels paired with high within-cluster dispersion (CHD) of the values themselves. A genuinely accurate roll has both low COD and low CHD; a chased roll buys low COD on sold parcels without the matching uniformity. Requires a horizontal-equity cluster column.
- In/out-of-sample COD gap -- COD on pre-valuation sold parcels vs. post-valuation sold parcels. A large jump once we include sales the assessor could not have seen at roll-close measures the in-sample advantage directly. Requires `sale_age_days` and an aligned `valuation_date` (see the ratio study docs).
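The ratio-spike and in/out-of-sample signals can be illustrated with a minimal, standard-library sketch. This is not the module's implementation: the helper names (`ratios`, `spike_share`, `cod`, `pre_post_cod`) and the toy data are hypothetical, COD is computed with the conventional definition (mean absolute deviation from the median ratio, as a percentage of the median), and treating a sale age of zero as post-valuation is an assumption. CHD is left out because its exact definition is not given here.

```python
from statistics import median


def ratios(values, sale_prices):
    """Assessment ratios value / sale_price, skipping non-positive prices."""
    return [v / p for v, p in zip(values, sale_prices) if p > 0]


def spike_share(rs, eps=0.02):
    """Share of ratios within eps of 1.0, or None when there are no ratios."""
    if not rs:
        return None
    return sum(1 for r in rs if abs(r - 1.0) <= eps) / len(rs)


def cod(rs):
    """Coefficient of dispersion: mean absolute deviation from the median
    ratio, expressed as a percentage of the median. None when undefined."""
    if not rs:
        return None
    med = median(rs)
    if med == 0:
        return None
    return 100.0 * sum(abs(r - med) for r in rs) / len(rs) / med


def pre_post_cod(values, sale_prices, sale_age_days):
    """COD on pre-valuation sales vs. post-valuation sales.

    Assumption: positive sale_age_days means the sale happened before the
    valuation date; zero or negative is treated as post-valuation here.
    """
    rows = list(zip(values, sale_prices, sale_age_days))
    pre = ratios([v for v, _, a in rows if a > 0],
                 [p for _, p, a in rows if a > 0])
    post = ratios([v for v, _, a in rows if a <= 0],
                  [p for _, p, a in rows if a <= 0])
    return cod(pre), cod(post)


# Toy data (hypothetical): the three pre-valuation sales match their prices
# exactly, while the three post-valuation sales do not.
values = [100, 200, 300, 150, 260, 90]
sale_prices = [100, 200, 300, 120, 300, 100]
sale_age_days = [30, 60, 10, -20, -40, -5]

share = spike_share(ratios(values, sale_prices))
pre_cod, post_cod = pre_post_cod(values, sale_prices, sale_age_days)
print(share, pre_cod, post_cod)
```

In this toy roll, half of all ratios sit on 1.0 and the COD is much lower on the sales the assessor could have seen than on later sales, which is the pattern the signals are meant to surface.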
SalesChasingResult
dataclass
SalesChasingResult(field, signals=list())
Combined sales-chasing verdict for one valuation field.
verdict
property
verdict
"likely" (>=2 signals), "possible" (1 signal), or "no signal".
to_markdown
to_markdown()
Render the result as a small Markdown table plus a verdict line.
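The `verdict` rule described for `SalesChasingResult` ("likely" for two or more signals, "possible" for one, otherwise "no signal") can be sketched as a small function. The name `combined_verdict` is hypothetical, and the sketch assumes that "signals" means signals whose `flagged` attribute is true.

```python
def combined_verdict(flags):
    """Map per-signal flags to a combined verdict string.

    Assumption: only flagged signals count toward the verdict.
    """
    n_flagged = sum(1 for f in flags if f)
    if n_flagged >= 2:
        return "likely"
    if n_flagged == 1:
        return "possible"
    return "no signal"


print(combined_verdict([True, False, True]))
print(combined_verdict([False, True, False]))
print(combined_verdict([]))
```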
Source code in openavmkit/sales_chasing.py
SalesChasingSignal
dataclass
SalesChasingSignal(name, value, reference, flagged, detail)
One sales-chasing signal and its verdict.
Attributes:
| Name | Type | Description |
|---|---|---|
| `name` | str | Human-readable signal name. |
| `value` | float or None | The signal value for the field under suspicion (None if it could not be computed). |
| `reference` | float or None | The same signal computed for the reference field (e.g. our own model), for context. None when no reference field was supplied or it could not be computed. |
| `flagged` | bool | Whether this signal indicates sales chasing. |
| `detail` | str | Short explanation of the verdict. |
detect_sales_chasing
detect_sales_chasing(df, suspect_field, sale_price_field='sale_price', reference_field=None, cluster_field=None, sale_age_field='sale_age_days', spike_eps=0.02, spike_min_share=0.1, spike_ratio_vs_ref=1.5, cod_ratio_max=0.7, chd_ratio_min=0.9, oos_cod_jump=1.5)
Run the sales-chasing signals on suspect_field.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | DataFrame | Sold parcels. Must contain … | required |
| `suspect_field` | str | Valuation column under examination (e.g. …). | required |
| `sale_price_field` | str | Sale-price column used as ground truth. Defaults to `'sale_price'`. | `'sale_price'` |
| `reference_field` | str | A second valuation column (e.g. our own …). | `None` |
| `cluster_field` | str | Horizontal-equity cluster column. If omitted, the first of … | `None` |
| `sale_age_field` | str | Column of days between sale and valuation date (positive = before valuation). Used by the in/out-of-sample signal; skipped if absent. | `'sale_age_days'` |
| `spike_eps` | float | Spike-at-1.0 threshold: bucket half-width around 1.0 (see module docstring). | `0.02` |
| `spike_min_share` | float | Spike-at-1.0 threshold: minimum suspect share (see module docstring). | `0.1` |
| `spike_ratio_vs_ref` | float | Spike-at-1.0 threshold: the factor by which the suspect share must exceed the reference share (see module docstring). | `1.5` |
| `cod_ratio_max` | float | COD-CHD divergence threshold. Sales chasing is flagged when the suspect's ratio-COD on sold parcels is at most … | `0.7` |
| `chd_ratio_min` | float | COD-CHD divergence threshold. Sales chasing is flagged when the suspect's ratio-COD on sold parcels is at most … | `0.9` |
| `oos_cod_jump` | float | In/out-of-sample threshold: flag when post-valuation COD is at least this multiple of pre-valuation COD. | `1.5` |
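How the spike and in/out-of-sample thresholds might turn a computed value into a flag can be sketched as follows. This is an illustration under stated assumptions, not the library's code: the function names `spike_flagged` and `oos_flagged` are hypothetical, the defaults are copied from the `detect_sales_chasing` signature, and requiring both spike conditions at once (minimum share and factor over the reference) is an assumed reading of the parameter descriptions.

```python
def spike_flagged(suspect_share, reference_share=None,
                  spike_min_share=0.1, spike_ratio_vs_ref=1.5):
    """Flag a ratio spike at 1.0.

    Assumptions: the suspect share must reach spike_min_share, and when a
    reference share is available it must also be at least
    spike_ratio_vs_ref times that reference share.
    """
    if suspect_share is None:
        return False  # signal could not be computed
    if suspect_share < spike_min_share:
        return False
    if reference_share is None:
        return True
    return suspect_share >= spike_ratio_vs_ref * reference_share


def oos_flagged(pre_cod, post_cod, oos_cod_jump=1.5):
    """Flag when post-valuation COD is at least oos_cod_jump times
    pre-valuation COD."""
    if pre_cod is None or post_cod is None:
        return False  # signal could not be computed
    if pre_cod <= 0:
        # Assumption: with a zero pre-valuation COD, any positive
        # post-valuation COD counts as a jump.
        return post_cod > 0
    return post_cod >= oos_cod_jump * pre_cod


print(spike_flagged(0.5, reference_share=0.05))
print(spike_flagged(0.12, reference_share=0.10))
print(oos_flagged(5.0, 9.0))
print(oos_flagged(5.0, 6.0))
```

Returning `False` when an input is `None` mirrors the stated behaviour that each signal degrades gracefully when its inputs are unavailable.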
Returns:
| Type | Description |
|---|---|
| SalesChasingResult | The per-signal verdicts and a combined verdict. |
Source code in openavmkit/sales_chasing.py