openavmkit.ratio_study

RatioStudy

RatioStudy(predictions, ground_truth, max_trim)

Performs an IAAO-standard Ratio Study, generating all the relevant statistics.

Attributes:

Name          Type     Description
predictions   ndarray  Series representing predicted values
ground_truth  ndarray  Series representing ground truth values (typically observed sale prices)
count         int      The number of observations
median_ratio  float    The median value of all prediction/ground_truth ratios
mean_ratio    float    The mean value of all prediction/ground_truth ratios
cod           float    The coefficient of dispersion, a measure of variability (lower is better)
cod_trim      float    The coefficient of dispersion, after outlier ratios outside the interquartile range have been trimmed
prd           float    The price-related differential, a measure of vertical equity
prb           float    The price-related bias, a measure of vertical equity
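The statistics listed above can be illustrated with the textbook IAAO definitions. The following is a minimal sketch in plain NumPy, not openavmkit's own implementation: the function name `ratio_stats` is hypothetical, it assumes no zero sale prices, and the library's `stats.calc_cod`, `stats.calc_prd`, and `stats.calc_prb` may differ in details such as rounding or zero handling.

```python
import numpy as np


def ratio_stats(predictions, ground_truth) -> dict:
    """Textbook ratio-study statistics (hypothetical helper, not openavmkit code)."""
    predictions = np.asarray(predictions, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    ratios = predictions / ground_truth  # assumes ground_truth contains no zeros

    median_ratio = float(np.median(ratios))
    mean_ratio = float(np.mean(ratios))

    # COD: mean absolute deviation from the median ratio, as a percentage of the median
    cod = float(100.0 * np.mean(np.abs(ratios - median_ratio)) / median_ratio)

    # PRD: mean ratio divided by the value-weighted mean ratio
    weighted_mean_ratio = predictions.sum() / ground_truth.sum()
    prd = float(mean_ratio / weighted_mean_ratio)

    # PRB: slope of the regression of relative ratio deviation on log2 of a value proxy
    proxy = 0.5 * (predictions / median_ratio + ground_truth)
    x = np.log(proxy) / np.log(2.0)
    y = (ratios - median_ratio) / median_ratio
    slope, _intercept = np.polyfit(x, y, 1)

    return {
        "median_ratio": median_ratio,
        "mean_ratio": mean_ratio,
        "cod": cod,
        "prd": prd,
        "prb": float(slope),
    }


example = ratio_stats([90.0, 100.0, 110.0], [100.0, 100.0, 100.0])
print(example)
```

A PRD above 1 or a negative PRB suggests regressivity (higher-value properties are under-predicted relative to lower-value ones); the reverse suggests progressivity.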

Initialize a ratio study object

Parameters:

Name          Type     Description                                                               Default
predictions   ndarray  Series representing predicted values                                      required
ground_truth  ndarray  Series representing ground truth values (typically observed sale prices)  required
max_trim      float    The maximum amount of records allowed to be trimmed in a ratio study      required
Source code in openavmkit/ratio_study.py
def __init__(self, predictions: np.ndarray, ground_truth: np.ndarray, max_trim: float):
    """
    Initialize a ratio study object

    Parameters
    ----------
    predictions : np.ndarray
        Series representing predicted values
    ground_truth : np.ndarray
        Series representing ground truth values (typically observed sale prices)
    max_trim : float
        The maximum amount of records allowed to be trimmed in a ratio study
    """
    if len(predictions) != len(ground_truth):
        raise ValueError("predictions and ground_truth must have the same length")

    if len(predictions) == 0:
        self.count = 0
        self.count_trim = 0
        self.predictions = np.array([])
        self.ground_truth = np.array([])
        self.median_ratio = float("nan")
        self.cod = float("nan")
        self.cod_trim = float("nan")
        self.prd = float("nan")
        self.prb = float("nan")
        self.prd_trim = float("nan")
        self.prb_trim = float("nan")
        self.median_ratio_trim = float("nan")
        self.mean_ratio = float("nan")
        self.mean_ratio_trim = float("nan")
        return

    self.count = len(predictions)
    self.predictions = predictions
    self.ground_truth = ground_truth

    ratios = div_series_z_safe(predictions, ground_truth).astype(float)
    if len(ratios) > 0:
        median_ratio = float(np.median(ratios))
    else:
        median_ratio = float("nan")

    # trim the ratios to remove outliers -- trim to the interquartile range
    trim_predictions, trim_ground_truth = stats.trim_outlier_ratios(predictions, ground_truth, max_trim)
    trim_ratios = div_series_z_safe(trim_predictions, trim_ground_truth).astype(float)

    self.count_trim = len(trim_ratios)

    cod = stats.calc_cod(ratios)
    cod_trim = stats.calc_cod(trim_ratios)

    prd = stats.calc_prd(predictions, ground_truth)
    prd_trim = stats.calc_prd(trim_predictions, trim_ground_truth)

    prb, _, _ = stats.calc_prb(predictions, ground_truth)
    prb_trim, _, _ = stats.calc_prb(trim_predictions, trim_ground_truth)

    self.median_ratio = median_ratio

    if len(ratios) == 0:
        self.mean_ratio = float("nan")
    else:
        self.mean_ratio = float(np.mean(ratios))

    if len(trim_ratios) == 0:
        self.mean_ratio_trim = float("nan")
        self.median_ratio_trim = float("nan")
    else:
        self.mean_ratio_trim = float(np.mean(trim_ratios))
        self.median_ratio_trim = float(np.median(trim_ratios))

    self.cod = cod
    self.cod_trim = cod_trim

    self.prd = prd
    self.prd_trim = prd_trim

    self.prb = prb
    self.prb_trim = prb_trim

RatioStudyBootstrapped

RatioStudyBootstrapped(predictions, ground_truth, max_trim, confidence_interval=0.95, iterations=1000)

Performs an IAAO-standard Ratio Study, generating all the relevant statistics. This version adds confidence intervals.

Attributes:

Name                 Type            Description
iterations           int             Number of bootstrap iterations
confidence_interval  float           The confidence level of the interval (e.g. 0.95 for 95% confidence)
median_ratio         ConfidenceStat  The median value of all prediction/ground_truth ratios
mean_ratio           ConfidenceStat  The mean value of all prediction/ground_truth ratios
cod                  ConfidenceStat  The coefficient of dispersion, a measure of variability (lower is better)
prd                  ConfidenceStat  The price-related differential, a measure of vertical equity
median_ratio_trim    ConfidenceStat  The median value of trimmed prediction/ground_truth ratios
mean_ratio_trim      ConfidenceStat  The mean value of trimmed prediction/ground_truth ratios
cod_trim             ConfidenceStat  The coefficient of dispersion, a measure of variability (lower is better), of the trimmed set
prd_trim             ConfidenceStat  The price-related differential, a measure of vertical equity, of the trimmed set

Initialize a Bootstrapped ratio study object

Parameters:

Name                 Type     Description                                                               Default
predictions          ndarray  Series representing predicted values                                      required
ground_truth         ndarray  Series representing ground truth values (typically observed sale prices)  required
max_trim             float    The maximum amount of records allowed to be trimmed in a ratio study      required
confidence_interval  float    Desired confidence interval (default is 0.95, indicating 95% confidence)  0.95
iterations           int      How many bootstrap iterations to perform                                  1000
Source code in openavmkit/ratio_study.py
def __init__(
    self,
    predictions: np.ndarray,
    ground_truth: np.ndarray,
    max_trim: float,
    confidence_interval: float = 0.95,
    iterations: int = 1000,
):
    """
    Initialize a Bootstrapped ratio study object

    Parameters
    ----------
    predictions : np.ndarray
        Series representing predicted values
    ground_truth : np.ndarray
        Series representing ground truth values (typically observed sale prices)
    max_trim : float
        The maximum amount of records allowed to be trimmed in a ratio study
    confidence_interval : float
        Desired confidence interval (default is 0.95, indicating 95% confidence)
    iterations : int
        How many bootstrap iterations to perform
    """
    if len(predictions) == 0:
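        self.count = 0
        self.count_trim = 0
        self.iterations = 0
        self.confidence_interval = confidence_interval
        self.median_ratio = None
        self.mean_ratio = None
        self.cod = None
        self.prd = None
        self.median_ratio_trim = None
        self.mean_ratio_trim = None
        self.cod_trim = None
        self.prd_trim = None
        # Nothing to bootstrap: return early instead of falling through to the code below
        return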

    self.count = len(ground_truth)
    self.iterations = iterations
    self.confidence_interval = confidence_interval

    results = stats.calc_ratio_stats_bootstrap(predictions, ground_truth)

    self.cod = results["cod"]
    self.median_ratio = results["median_ratio"]
    self.mean_ratio = results["mean_ratio"]
    self.prd = results["prd"]

    trim_predictions, trim_ground_truth = stats.trim_outlier_ratios(predictions, ground_truth, max_trim)

    self.count_trim = len(trim_ground_truth)

    results = stats.calc_ratio_stats_bootstrap(trim_predictions, trim_ground_truth)

    self.cod_trim = results["cod"]
    self.median_ratio_trim = results["median_ratio"]
    self.mean_ratio_trim = results["mean_ratio"]
    self.prd_trim = results["prd"]
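The bootstrap itself happens inside `stats.calc_ratio_stats_bootstrap`, which is not shown above. As a rough illustration of the general technique, here is a minimal percentile-bootstrap sketch for the median ratio only. The function name `bootstrap_median_ratio` and the `seed` parameter are hypothetical, and since the fields of `ConfidenceStat` are not documented here, the sketch returns a plain tuple instead.

```python
import numpy as np


def bootstrap_median_ratio(predictions, ground_truth,
                           confidence_interval=0.95, iterations=1000, seed=0):
    """Percentile-bootstrap interval for the median ratio (hypothetical helper).

    Returns (point_estimate, lower_bound, upper_bound).
    """
    predictions = np.asarray(predictions, dtype=float)
    ground_truth = np.asarray(ground_truth, dtype=float)
    ratios = predictions / ground_truth  # assumes no zero sale prices

    rng = np.random.default_rng(seed)
    n = len(ratios)

    # Each row is one resample of the n ratios, drawn with replacement
    idx = rng.integers(0, n, size=(iterations, n))
    medians = np.median(ratios[idx], axis=1)

    # Take the central share of the bootstrap distribution as the interval
    alpha = 1.0 - confidence_interval
    low, high = np.percentile(medians, [100 * alpha / 2, 100 * (1 - alpha / 2)])
    return float(np.median(ratios)), float(low), float(high)


preds = np.linspace(80.0, 120.0, 21)
sales = np.full(21, 100.0)
value, low, high = bootstrap_median_ratio(preds, sales)
print(f"median ratio {value:.3f}, 95% interval [{low:.3f}, {high:.3f}]")
```

The same resampled index matrix could be reused for the mean ratio, COD, and PRD so that all statistics are computed on identical resamples; whether openavmkit does so is not stated in this listing.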

run_and_write_ratio_study_breakdowns

run_and_write_ratio_study_breakdowns(settings)

Runs ratio studies, with breakdowns, and writes them to disk.

Parameters:

Name      Type  Description          Default
settings  dict  Settings dictionary  required
Source code in openavmkit/ratio_study.py
def run_and_write_ratio_study_breakdowns(settings: dict):
    """Runs ratio studies, with breakdowns, and writes them to disk.

    Parameters
    ----------
    settings : dict
        Settings dictionary
    """
    model_groups = get_model_group_ids(settings)
    rs = settings.get("analysis", {}).get("ratio_study", {})
    skip = rs.get("skip", [])
    for model_group in model_groups:
        if model_group in skip:
            print(f"Skipping {model_group}...")
            continue
        print(f"Generating report for {model_group}")
        path = f"out/models/{model_group}/main/model_ensemble.pickle"
        if os.path.exists(path):
            os.makedirs(f"out/models/{model_group}", exist_ok=True)
            ensemble_results = read_pickle(path)
            df_sales = ensemble_results.df_sales
            _run_and_write_ratio_study_breakdowns(
                settings, df_sales, model_group, f"out/models/{model_group}"
            )
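The function reads only one key path from `settings`: `analysis.ratio_study.skip`. The sketch below shows a settings fragment with that shape and replays the skip filter from the listing above. The model group names are made up for illustration, and the list stands in for whatever `get_model_group_ids(settings)` would return in a real project.

```python
# Settings fragment; only the keys read by run_and_write_ratio_study_breakdowns are shown
settings = {
    "analysis": {
        "ratio_study": {
            "skip": ["vacant_land"],  # hypothetical model group id to skip
        }
    }
}

# Hypothetical stand-in for get_model_group_ids(settings)
model_groups = ["residential_sf", "vacant_land", "commercial"]

# Same lookup chain as in the source: missing keys fall back to an empty skip list
rs = settings.get("analysis", {}).get("ratio_study", {})
skip = rs.get("skip", [])

to_run = [group for group in model_groups if group not in skip]

# Each remaining group is processed only if its ensemble pickle exists at this path
expected_paths = [f"out/models/{group}/main/model_ensemble.pickle" for group in to_run]

for path in expected_paths:
    print(path)
```

Note that a model group whose pickle file is missing is silently passed over after the "Generating report" message is printed, so an absent report does not necessarily mean the group was listed in `skip`.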