openavmkit.sales_chasing

Sales-chasing detection.

Sales chasing is the practice -- sometimes deliberate, sometimes an unintended side effect of a valuation methodology -- of moving a parcel's appraised value toward its observed sale price. It makes a valuation look very strong on sold parcels (ratios collapse onto 1.0 and COD drops) while not improving (and sometimes worsening) uniformity among comparable unsold parcels. Because a ratio study can only be scored against parcels that actually sold, a roll affected by sales chasing can post numbers that look better than the values would out-of-sample. openavmkit's own models are evaluated out-of-sample, so a naive head-to-head can understate openavmkit relative to such a roll.

This is an information-gap problem, not an accusation: the point is simply that scoring a roll on the same sales that may have informed it is not the same test our held-out models face. This module turns that asymmetry into measurable signals so a very tight assessor ratio study can be interpreted with the right context rather than taken at face value. Three signals are computed; each degrades gracefully when its inputs are unavailable:

  1. Ratio spike at 1.0 -- the share of sold parcels whose value / sale_price lands within eps of 1.0. A large mass of ratios sitting on or within eps of 1.0 is a strong indicator that values were taken directly from sale prices.
  2. COD-CHD divergence -- mirrors the model utility score's sales_chase_score ((1 / COD) * CHD; see openavmkit.modeling.compute_utility_score): a suspiciously low ratio-COD on sold parcels paired with high within-cluster dispersion (CHD) of the values themselves. A genuinely accurate roll has both low COD and low CHD; a chased roll buys low COD on sold parcels without the matching uniformity. Requires a horizontal-equity cluster column and a reference field.
  3. In/out-of-sample COD gap -- COD on pre-valuation sold parcels vs. post-valuation sold parcels. A large jump once we include sales the assessor could not have seen at roll-close measures the in-sample advantage directly. Requires sale_age_days and an aligned valuation_date (see the ratio study docs).
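The first and third signals both rest on two simple statistics of the value / sale_price ratios. The following is a minimal sketch of those statistics, not the module's own helpers: the function names `spike_share` and `cod` are hypothetical, the inclusive boundary on `eps` is an assumption, and COD is written in the conventional ratio-study form (mean absolute deviation from the median ratio, divided by the median ratio, in percent).

```python
from statistics import median


def _ratios(values, sale_prices):
    """Value / sale_price ratios, skipping non-positive sale prices."""
    return [v / p for v, p in zip(values, sale_prices) if p > 0]


def spike_share(values, sale_prices, eps=0.02):
    """Share of sold parcels whose ratio lies within eps of 1.0 (illustrative)."""
    ratios = _ratios(values, sale_prices)
    if not ratios:
        return float("nan")
    return sum(abs(r - 1.0) <= eps for r in ratios) / len(ratios)


def cod(values, sale_prices):
    """Coefficient of dispersion of the ratios, in percent (illustrative)."""
    ratios = _ratios(values, sale_prices)
    if not ratios:
        return float("nan")
    med = median(ratios)
    if med == 0:
        return float("nan")
    mean_abs_dev = sum(abs(r - med) for r in ratios) / len(ratios)
    return 100.0 * mean_abs_dev / med


# Made-up data: three of four "chased" values equal the sale price exactly.
sale_prices = [100, 200, 300, 400]
chased_values = [100, 200, 300, 440]
model_values = [90, 210, 285, 420]

chased_share = spike_share(chased_values, sale_prices)  # 0.75
chased_cod = cod(chased_values, sale_prices)            # about 2.5
model_share = spike_share(model_values, sale_prices)    # 0.0
model_cod = cod(model_values, sale_prices)              # about 6.25
```

On this toy data the chased values post the tighter COD and a large spike at 1.0, which is the pattern the signals are meant to surface.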

SalesChasingResult dataclass

SalesChasingResult(field, signals=list())

Combined sales-chasing verdict for one valuation field.

verdict property

verdict

"likely" (>=2 signals), "possible" (1 signal), or "no signal".

to_markdown

to_markdown()

Render the result as a small Markdown table plus a verdict line.

Source code in openavmkit/sales_chasing.py
def to_markdown(self) -> str:
    """Render the result as a small Markdown table plus a verdict line."""
    header = "| Signal | Value | Reference | Flag |\n| --- | --- | --- | --- |"
    rows = []
    for s in self.signals:
        val = "n/a" if s.value is None or not np.isfinite(s.value) else f"{s.value:.2f}"
        ref = (
            "n/a"
            if s.reference is None or not np.isfinite(s.reference)
            else f"{s.reference:.2f}"
        )
        flag = "**yes**" if s.flagged else "no"
        rows.append(f"| {s.name} | {val} | {ref} | {flag} |")
    table = header + "\n" + "\n".join(rows)
    verdict = {
        "likely": (
            "**Multiple sales-chasing signals fired.** Interpret the assessor's results "
            "on sold parcels with this context; it is a prompt to look closer, not a verdict."
        ),
        "possible": (
            "**One sales-chasing signal fired.** Worth a closer look, but not conclusive."
        ),
        "no signal": "No sales-chasing signals detected.",
    }[self.verdict]
    details = "\n".join(f"- *{s.name}*: {s.detail}" for s in self.signals)
    return f"{verdict}\n\n{table}\n\n{details}\n"

SalesChasingSignal dataclass

SalesChasingSignal(name, value, reference, flagged, detail)

One sales-chasing signal and its verdict.

Attributes:

Name Type Description
name str

Human-readable signal name.

value float or None

The signal value for the field under suspicion (None if it could not be computed).

reference float or None

The same signal computed for the reference field (e.g. our own model), for context. None when no reference field was supplied or it could not be computed.

flagged bool

Whether this signal indicates sales chasing.

detail str

Short explanation of the verdict.
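To make the relationship between the two dataclasses concrete, here is a minimal sketch using stand-in classes. `Signal` and `Result` are hypothetical names that only mirror the documented fields, and the sketch assumes the combined verdict counts signals whose `flagged` attribute is true; the real property may be implemented differently.

```python
import dataclasses
from typing import List, Optional


@dataclasses.dataclass
class Signal:
    """Stand-in for SalesChasingSignal (illustrative only)."""
    name: str
    value: Optional[float]
    reference: Optional[float]
    flagged: bool
    detail: str


@dataclasses.dataclass
class Result:
    """Stand-in for SalesChasingResult (illustrative only)."""
    field: str
    signals: List[Signal] = dataclasses.field(default_factory=list)

    @property
    def verdict(self) -> str:
        # Assumption: only flagged signals count toward the verdict.
        fired = sum(1 for s in self.signals if s.flagged)
        if fired >= 2:
            return "likely"
        if fired == 1:
            return "possible"
        return "no signal"


result = Result(field="assr_market_value")
empty_verdict = result.verdict

result.signals.append(Signal("Ratio spike at 1.0", 31.0, 4.0, True, "made-up numbers"))
one_verdict = result.verdict

result.signals.append(Signal("In/out-of-sample COD gap", 14.0, 6.0, True, "made-up numbers"))
two_verdict = result.verdict
```

A signal that could not be computed (value of None) would carry `flagged=False` in this sketch, so it never pushes the verdict upward.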

detect_sales_chasing

detect_sales_chasing(df, suspect_field, sale_price_field='sale_price', reference_field=None, cluster_field=None, sale_age_field='sale_age_days', spike_eps=0.02, spike_min_share=0.1, spike_ratio_vs_ref=1.5, cod_ratio_max=0.7, chd_ratio_min=0.9, oos_cod_jump=1.5)

Run the sales-chasing signals on suspect_field.

Parameters:

Name Type Description Default
df DataFrame

Sold parcels. Must contain suspect_field and sale_price_field. May contain a horizontal-equity cluster column and sale_age_field for the optional signals.

required
suspect_field str

Valuation column under examination (e.g. "assr_market_value").

required
sale_price_field str

Sale-price column used as ground truth. Defaults to "sale_price".

'sale_price'
reference_field str

A second valuation column (e.g. our own "prediction") used as a sanity baseline for the relative thresholds. If omitted, the relative comparisons are skipped and the spike signal falls back to its absolute threshold only.

None
cluster_field str

Horizontal-equity cluster column. If omitted, the first of he_id/impr_he_id/land_he_id present is used; if none are present, the COD-CHD signal is skipped.

None
sale_age_field str

Column of days between sale and valuation date (positive = sale occurred before valuation). Used by the in/out-of-sample signal, which is skipped if the column is absent.

'sale_age_days'
spike_eps float

Spike-at-1.0 bucket half-width: a ratio counts toward the spike when it lies within spike_eps of 1.0.

0.02
spike_min_share float

Spike-at-1.0 absolute threshold: the minimum share of the suspect's sold ratios that must fall in the bucket for the signal to fire.

0.1
spike_ratio_vs_ref float

Spike-at-1.0 relative threshold: the factor by which the suspect's spike share must exceed the reference's share. Applied only when a reference field is supplied.

1.5
cod_ratio_max float

COD-CHD divergence threshold. The signal can fire only when the suspect's ratio-COD on sold parcels is at most cod_ratio_max times the reference's (suspiciously better on sold parcels). Requires a reference field.

0.7
chd_ratio_min float

COD-CHD divergence threshold. The signal can fire only when the suspect's within-cluster dispersion (CHD) is still at least chd_ratio_min times the reference's (no matching gain in uniformity). Requires a reference field.

0.9
oos_cod_jump float

In/out-of-sample threshold: flag when post-valuation COD is at least this multiple of pre-valuation COD.

1.5

Returns:

Type Description
SalesChasingResult

The per-signal verdicts and a combined verdict.
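The way the thresholds combine is easier to see in isolation. The sketch below restates the three flag rules as standalone functions that take precomputed statistics; the function names are hypothetical, and the comparisons follow the parameter descriptions above rather than any guaranteed internal API.

```python
import math


def spike_flag(spike, ref_spike=None, min_share=0.10, ratio_vs_ref=1.5):
    """Absolute threshold first; a usable reference adds a relative one."""
    flagged = math.isfinite(spike) and spike >= min_share
    if flagged and ref_spike is not None and math.isfinite(ref_spike) and ref_spike > 0:
        flagged = spike >= ref_spike * ratio_vs_ref
    return bool(flagged)


def divergence_flag(cod_s, chd_s, cod_r, chd_r, cod_ratio_max=0.7, chd_ratio_min=0.9):
    """Much tighter COD than the reference without a matching drop in CHD."""
    computable = all(math.isfinite(x) for x in (cod_s, chd_s, cod_r, chd_r)) and cod_r > 0
    return bool(
        computable
        and cod_s <= cod_r * cod_ratio_max
        and chd_s >= chd_r * chd_ratio_min
    )


def oos_gap_flag(cod_seen, cod_unseen, jump=1.5):
    """Post-valuation COD at least `jump` times the pre-valuation COD."""
    return bool(
        math.isfinite(cod_seen)
        and cod_seen > 0
        and math.isfinite(cod_unseen)
        and cod_unseen >= cod_seen * jump
    )


# Made-up statistics, for illustration only.
flags = [
    spike_flag(0.30, ref_spike=0.05),        # 30% on 1.0 vs 5% for the reference
    divergence_flag(4.0, 20.0, 10.0, 18.0),  # COD 4 vs 10, CHD 20 vs 18
    oos_gap_flag(5.0, 6.0),                  # COD 5 -> 6 is below the 1.5x jump
]
```

With these inputs the first two rules fire and the third does not, which would correspond to a "likely" combined verdict.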

Source code in openavmkit/sales_chasing.py
def detect_sales_chasing(
    df: pd.DataFrame,
    suspect_field: str,
    sale_price_field: str = "sale_price",
    reference_field: str | None = None,
    cluster_field: str | None = None,
    sale_age_field: str = "sale_age_days",
    spike_eps: float = 0.02,
    spike_min_share: float = 0.10,
    spike_ratio_vs_ref: float = 1.5,
    cod_ratio_max: float = 0.7,
    chd_ratio_min: float = 0.9,
    oos_cod_jump: float = 1.5,
) -> SalesChasingResult:
    """Run the sales-chasing signals on ``suspect_field``.

    Parameters
    ----------
    df : pandas.DataFrame
        Sold parcels. Must contain ``suspect_field`` and ``sale_price_field``. May contain a
        horizontal-equity cluster column and ``sale_age_field`` for the optional signals.
    suspect_field : str
        Valuation column under examination (e.g. ``"assr_market_value"``).
    sale_price_field : str, optional
        Sale-price column used as ground truth. Defaults to ``"sale_price"``.
    reference_field : str, optional
        A second valuation column (e.g. our own ``"prediction"``) used as a sanity baseline
        for the relative thresholds. If omitted, the relative comparisons are skipped and the
        spike signal falls back to its absolute threshold only.
    cluster_field : str, optional
        Horizontal-equity cluster column. If omitted, the first of ``he_id``/``impr_he_id``/
        ``land_he_id`` present is used; if none are present, the COD-CHD signal is skipped.
    sale_age_field : str, optional
        Column of days between sale and valuation date (positive = before valuation). Used by
        the in/out-of-sample signal; skipped if absent.
    spike_eps, spike_min_share, spike_ratio_vs_ref
        Spike-at-1.0 thresholds: bucket half-width, minimum suspect share, and the factor by
        which the suspect must exceed the reference share (see module docstring).
    cod_ratio_max, chd_ratio_min
        COD-CHD divergence thresholds. Sales chasing is flagged when the suspect's ratio-COD on
        sold parcels is at most ``cod_ratio_max`` times the reference's (suspiciously *better*
        on sold parcels) while its within-cluster dispersion (CHD) is still at least
        ``chd_ratio_min`` times the reference's (no matching gain in uniformity). Requires a
        reference field.
    oos_cod_jump : float
        In/out-of-sample threshold: flag when post-valuation COD is at least this multiple of
        pre-valuation COD.

    Returns
    -------
    SalesChasingResult
        The per-signal verdicts and a combined verdict.
    """
    result = SalesChasingResult(field=suspect_field)

    cluster = _resolve_cluster_field(df, cluster_field)
    have_ref = reference_field is not None and reference_field in df.columns

    # --- Signal 1: ratio spike at 1.0 ---------------------------------------------------
    spike = _spike_share(df, suspect_field, sale_price_field, spike_eps)
    ref_spike = (
        _spike_share(df, reference_field, sale_price_field, spike_eps) if have_ref else None
    )
    spike_flag = bool(np.isfinite(spike) and spike >= spike_min_share)
    if spike_flag and ref_spike is not None and np.isfinite(ref_spike) and ref_spike > 0:
        # If we have a baseline, require the suspect to spike notably more than it.
        spike_flag = spike >= ref_spike * spike_ratio_vs_ref
    result.signals.append(
        SalesChasingSignal(
            name=f"Ratio spike at 1.0 (±{spike_eps:g})",
            value=None if not np.isfinite(spike) else spike * 100.0,
            reference=None if ref_spike is None or not np.isfinite(ref_spike) else ref_spike * 100.0,
            flagged=spike_flag,
            detail=(
                f"{spike * 100:.1f}% of sold ratios sit within ±{spike_eps:g} of 1.0"
                if np.isfinite(spike)
                else "could not compute ratios"
            ),
        )
    )

    # --- Signal 2: COD-CHD divergence ---------------------------------------------------
    # Sales chasing buys a low ratio-COD on *sold* parcels without the matching uniformity
    # (CHD) among *similar* parcels. An honestly-better valuation lowers both COD and CHD;
    # a chaser lowers COD while CHD stays put (or worsens). We compare against the reference
    # rather than the absolute (1/COD)*CHD score (the model utility scorer's `sales_chase_score`,
    # modeling.py:1712), which avoids dividing by a near-zero reference. Needs a reference.
    if cluster is not None and have_ref:
        cod_s = _cod(df, suspect_field, sale_price_field)
        chd_s = _median_chd(df, suspect_field, cluster)
        cod_r = _cod(df, reference_field, sale_price_field)
        chd_r = _median_chd(df, reference_field, cluster)
        computable = all(np.isfinite(x) for x in (cod_s, chd_s, cod_r, chd_r)) and cod_r > 0
        # Suspiciously better COD on sold parcels...
        better_cod = computable and cod_s <= cod_r * cod_ratio_max
        # ...with no matching improvement in within-cluster uniformity.
        not_more_uniform = computable and chd_s >= chd_r * chd_ratio_min
        div_flag = bool(better_cod and not_more_uniform)
        # Display the (1/COD)*CHD score for suspect and reference for context.
        score = (1.0 / cod_s) * chd_s if computable else float("nan")
        ref_score = (1.0 / cod_r) * chd_r if computable else float("nan")
        result.signals.append(
            SalesChasingSignal(
                name="COD-CHD divergence (1/COD × CHD)",
                value=None if not np.isfinite(score) else score,
                reference=None if not np.isfinite(ref_score) else ref_score,
                flagged=div_flag,
                detail=(
                    f"sold-COD {cod_s:.1f} vs ours {cod_r:.1f} (much tighter), but CHD "
                    f"{chd_s:.1f} vs ours {chd_r:.1f} (no better) — clustered on '{cluster}'"
                    if div_flag
                    else f"sold-COD {cod_s:.1f} vs ours {cod_r:.1f}, CHD {chd_s:.1f} vs "
                    f"ours {chd_r:.1f} (clustered on '{cluster}')"
                    if computable
                    else "could not compute COD/CHD"
                ),
            )
        )

    # --- Signal 3: in/out-of-sample COD gap ---------------------------------------------
    if sale_age_field in df.columns:
        seen = df[df[sale_age_field] >= 0]  # at/before valuation: assessor could have seen
        unseen = df[df[sale_age_field] < 0]  # after valuation: out-of-sample for the roll
        cod_seen = _cod(seen, suspect_field, sale_price_field)
        cod_unseen = _cod(unseen, suspect_field, sale_price_field)
        gap_flag = bool(
            np.isfinite(cod_seen)
            and cod_seen > 0
            and np.isfinite(cod_unseen)
            and cod_unseen >= cod_seen * oos_cod_jump
        )
        result.signals.append(
            SalesChasingSignal(
                name="In/out-of-sample COD gap",
                value=None if not np.isfinite(cod_unseen) else cod_unseen,
                reference=None if not np.isfinite(cod_seen) else cod_seen,
                flagged=gap_flag,
                detail=(
                    f"COD jumps from {cod_seen:.1f} (pre-valuation sold) to "
                    f"{cod_unseen:.1f} (post-valuation sold)"
                    if np.isfinite(cod_seen) and np.isfinite(cod_unseen)
                    else "not enough pre/post-valuation sold parcels"
                ),
            )
        )

    return result