openavmkit.vertical_equity_study

VerticalEquityStudy

VerticalEquityStudy(df_sales_in, field_sales, field_prediction, confidence_interval=0.95, iterations=10000, seed=777)

Perform vertical equity analysis and summarize the results.

Attributes:

rows (int): Total number of rows in the input DataFrame.

confidence_interval (float): The confidence level used for all intervals (e.g. 0.95 for 95% confidence).

prd (ConfidenceStat): The price-related differential, with its confidence interval.

prb (ConfidenceStat): The price-related bias, with its confidence interval.

quantiles (DataFrame): The median ratio, with lower and upper confidence bounds, for each of the ten price quantile tiers (columns: quantile, ratio, ratio_low, ratio_high).
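
For readers unfamiliar with the two headline statistics, the sketch below shows how PRD and PRB point estimates are conventionally defined in ratio studies, plus a percentile bootstrap for the PRD interval. It is an illustration only: the function names `prd`, `prb`, and `prd_bootstrap_ci` are hypothetical, and the library's own `calc_ratio_stats_bootstrap` and `calc_prb` may differ in detail (for example in how they resample or how they derive the PRB interval).

```python
import numpy as np


def prd(predictions, sales):
    """Price-related differential: mean ratio / sales-weighted mean ratio."""
    predictions = np.asarray(predictions, dtype=float)
    sales = np.asarray(sales, dtype=float)
    ratios = predictions / sales
    weighted_mean = predictions.sum() / sales.sum()
    return ratios.mean() / weighted_mean


def prb(predictions, sales):
    """Price-related bias: slope of the conventional ratio-study regression."""
    predictions = np.asarray(predictions, dtype=float)
    sales = np.asarray(sales, dtype=float)
    ratios = predictions / sales
    med = np.median(ratios)
    # Dependent variable: relative deviation of each ratio from the median.
    y = (ratios - med) / med
    # Independent variable: log2 of a value proxy that averages the sale
    # price with the prediction rescaled by the median ratio.
    x = np.log2(0.5 * (predictions / med + sales))
    slope, _intercept = np.polyfit(x, y, 1)
    return slope


def prd_bootstrap_ci(predictions, sales, confidence=0.95, iterations=2000, seed=777):
    """Percentile bootstrap interval for PRD (an assumed method, see lead-in)."""
    predictions = np.asarray(predictions, dtype=float)
    sales = np.asarray(sales, dtype=float)
    rng = np.random.default_rng(seed)
    n = len(sales)
    estimates = np.empty(iterations)
    for i in range(iterations):
        idx = rng.integers(0, n, size=n)  # resample prediction/sale pairs together
        estimates[i] = prd(predictions[idx], sales[idx])
    alpha = (1.0 - confidence) / 2.0
    low, high = np.quantile(estimates, [alpha, 1.0 - alpha])
    return prd(predictions, sales), low, high
```

Under these definitions, a PRD above 1 or a negative PRB indicates regressivity (higher-priced properties have lower ratios), while uniform ratios give a PRD of 1 and a PRB of 0.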

Source code in openavmkit/vertical_equity_study.py
def __init__(
    self,
    df_sales_in: pd.DataFrame,
    field_sales: str,
    field_prediction: str,
    confidence_interval: float = 0.95,
    iterations: int = 10000,
    seed: int = 777
):
    # Work on a copy so the "quantile" column added below does not mutate the caller's DataFrame
    df_sales = df_sales_in.copy()

    n = len(df_sales)
    self.rows = n
    self.confidence_interval = confidence_interval

    # Calculate PRD and PRB
    #----------------------

    predictions = df_sales[field_prediction].to_numpy()
    sales = df_sales[field_sales].to_numpy()

    # Bootstrap the ratio statistics on the full sample; only the PRD is kept here
    results = calc_ratio_stats_bootstrap(predictions, sales, confidence_interval, iterations=iterations, seed=seed)
    self.prd = results["prd"]

    # PRB point estimate with its lower and upper confidence bounds
    prb_point, prb_low, prb_high = calc_prb(predictions, sales, confidence_interval)

    self.prb = ConfidenceStat(prb_point, confidence_interval, prb_low, prb_high)

    # Calculate quantiles
    #--------------------

    df_sales["quantile"] = _calc_quantiles(df_sales, field_sales)

    data = {
        "quantile":[],
        "ratio":[],
        "ratio_low":[],
        "ratio_high":[]
    }
    labels = df_sales["quantile"].unique()
    for label in labels:
        df_sub = df_sales[df_sales["quantile"].eq(label)]
        # Convert to NumPy arrays, matching the full-sample call above
        predictions = df_sub[field_prediction].to_numpy()
        sales = df_sub[field_sales].to_numpy()

        results = calc_ratio_stats_bootstrap(predictions, sales, confidence_interval, iterations=iterations, seed=seed)
        med_ratio = results["median_ratio"]

        ratio = med_ratio.value
        low = med_ratio.low
        high = med_ratio.high

        data["ratio"].append(ratio)
        data["ratio_low"].append(low)
        data["ratio_high"].append(high)
        data["quantile"].append(label)

    df = pd.DataFrame(data=data)
    df = df.sort_values(by="quantile", key=lambda col: col.astype(int))
    self.quantiles = df
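
The quantile step above relies on `_calc_quantiles`, which is not shown on this page. As a rough stand-in, the sketch below assigns each sale to an equal-count price tier with `pd.qcut` and reports the median ratio per tier. The function name `median_ratio_by_tier` and the use of `pd.qcut` are assumptions made for illustration; the library's tiering rule may differ, and this version omits the bootstrap confidence bounds.

```python
import numpy as np
import pandas as pd


def median_ratio_by_tier(df, field_sales, field_prediction, tiers=10):
    """Median prediction/sale ratio for each sale-price tier (1 = cheapest)."""
    out = df[[field_sales, field_prediction]].copy()
    # Assumption: equal-count tiers by sale price, labelled 1..tiers.
    codes = pd.qcut(out[field_sales], q=tiers, labels=False, duplicates="drop")
    out["quantile"] = codes + 1
    out["ratio"] = out[field_prediction] / out[field_sales]
    return (
        out.groupby("quantile")["ratio"]
        .median()
        .reset_index()
        .sort_values("quantile")
    )


# Toy data: 100 sales with uniformly assessed predictions at 95% of price.
toy = pd.DataFrame({"sale_price": np.arange(1, 101) * 1000.0})
toy["prediction"] = 0.95 * toy["sale_price"]
tier_table = median_ratio_by_tier(toy, "sale_price", "prediction")
```

A flat profile across tiers, as in this toy data, suggests no vertical inequity; ratios that fall from the lowest tier to the highest suggest regressivity.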