
openavmkit.horizontal_equity_study

HorizontalEquityClusterSummary

HorizontalEquityClusterSummary(id, count, chd, min, max, median)

Summary for an individual horizontal equity cluster.

Attributes:

id (str): Identifier of the cluster.
count (int): Number of records in the cluster.
chd (float): CHD value for the cluster.
min (float): Minimum value in the cluster.
max (float): Maximum value in the cluster.
median (float): Median value in the cluster.

Initialize a HorizontalEquityClusterSummary instance.

Parameters:

id (str, required): Cluster identifier.
count (int, required): Number of records in the cluster.
chd (float, required): CHD value for the cluster.
min (float, required): Minimum value in the cluster.
max (float, required): Maximum value in the cluster.
median (float, required): Median value in the cluster.
Source code in openavmkit/horizontal_equity_study.py
def __init__(
    self, id: str, count: int, chd: float, min: float, max: float, median: float
):
    """
    Initialize a HorizontalEquityClusterSummary instance.

    Parameters
    ----------
    id : str
        Cluster identifier.
    count : int
        Number of records in the cluster.
    chd : float
        CHD value for the cluster.
    min : float
        Minimum value in the cluster.
    max : float
        Maximum value in the cluster.
    median : float
        Median value in the cluster.
    """
    self.id = id
    self.count = count
    self.chd = chd
    self.min = min
    self.max = max
    self.median = median
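The class is a plain container, so a minimal sketch of how its six fields relate to a cluster's values may help. It assumes that `stats.calc_cod` (which `HorizontalEquityStudy` uses to fill `chd`) implements the standard coefficient of dispersion, 100 × mean absolute deviation from the median ÷ median; this page does not show `calc_cod`, and `chd()` and `summarize_cluster()` are hypothetical helpers, not part of openavmkit.

```python
import numpy as np


def chd(values) -> float:
    """Dispersion of values around their median, in percent.

    Assumption: standard COD formula, 100 * mean(|x - median|) / median.
    """
    x = np.asarray(values, dtype=float)
    median = np.median(x)
    return float(100.0 * np.mean(np.abs(x - median)) / median)


def summarize_cluster(cluster_id: str, values) -> dict:
    """Return the six fields that HorizontalEquityClusterSummary stores."""
    x = np.asarray(values, dtype=float)
    if x.size == 0:
        # Empty clusters get NaN statistics, as in HorizontalEquityStudy
        nan = float("nan")
        return {"id": cluster_id, "count": 0, "chd": nan,
                "min": nan, "max": nan, "median": nan}
    return {
        "id": cluster_id,
        "count": int(x.size),
        "chd": chd(x),
        "min": float(x.min()),
        "max": float(x.max()),
        "median": float(np.median(x)),
    }


summary = summarize_cluster("A", [90_000, 100_000, 110_000])
print(summary)  # chd is 100 * (20_000 / 3) / 100_000 = 20 / 3, about 6.67
```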

HorizontalEquityStudy

HorizontalEquityStudy(df, field_cluster, field_value)

Perform horizontal equity analysis and summarize the results.

Attributes:

summary (HorizontalEquitySummary): Overall summary statistics.
cluster_summaries (dict[str, HorizontalEquityClusterSummary]): Dictionary mapping cluster IDs to their summaries.

Initialize a HorizontalEquityStudy instance by computing cluster summaries.

Parameters:

df (DataFrame, required): Input DataFrame containing data for horizontal equity analysis.
field_cluster (str, required): Column name indicating cluster membership.
field_value (str, required): Column name of the values to analyze.
Source code in openavmkit/horizontal_equity_study.py
def __init__(self, df: pd.DataFrame, field_cluster: str, field_value: str):
    """
    Initialize a HorizontalEquityStudy instance by computing cluster summaries.

    Parameters
    ----------
    df : pandas.DataFrame
        Input DataFrame containing data for horizontal equity analysis.
    field_cluster : str
        Column name indicating cluster membership.
    field_value : str
        Column name of the values to analyze.
    """

    clusters = df[field_cluster].unique()
    self.cluster_summaries = {}

    chds = np.array([])
    for cluster in clusters:
        df_cluster = df[df[field_cluster].eq(cluster)]
        count = len(df_cluster)
        if count > 0:
            chd = stats.calc_cod(df_cluster[field_value].values)
            min_value = df_cluster[field_value].min()
            max_value = df_cluster[field_value].max()
            median_value = df_cluster[field_value].median()
        else:
            chd = float("nan")
            min_value = float("nan")
            max_value = float("nan")
            median_value = float("nan")
        summary = HorizontalEquityClusterSummary(
            cluster, count, chd, min_value, max_value, median_value
        )
        self.cluster_summaries[cluster] = summary
        chds = np.append(chds, chd)

    if len(chds) > 0:
        min_chd = np.min(chds)
        max_chd = np.max(chds)
        med_chd = float(np.median(chds))
    else:
        min_chd = float("nan")
        max_chd = float("nan")
        med_chd = float("nan")

    self.summary = HorizontalEquitySummary(
        len(df), len(clusters), min_chd, max_chd, med_chd
    )
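The loop above masks the whole DataFrame once per cluster. A vectorized sketch of the same statistics using pandas `groupby` follows, again assuming the standard COD formula for CHD; `horizontal_equity_sketch()` and the column names `he_id` and `value` are hypothetical. It returns a per-cluster table plus a dict shaped like `HorizontalEquitySummary`.

```python
import pandas as pd


def horizontal_equity_sketch(df: pd.DataFrame, field_cluster: str, field_value: str):
    """Per-cluster and overall statistics without a Python-level loop.

    Assumes CHD = 100 * mean(|x - cluster median|) / cluster median.
    """
    grouped = df.groupby(field_cluster)[field_value]
    # Broadcast each cluster's median back to its rows, then take |x - median|
    abs_dev = (df[field_value] - grouped.transform("median")).abs()
    per_cluster = pd.DataFrame(
        {
            "count": grouped.size(),
            "chd": 100.0 * abs_dev.groupby(df[field_cluster]).mean() / grouped.median(),
            "min": grouped.min(),
            "max": grouped.max(),
            "median": grouped.median(),
        }
    )
    overall = {
        "rows": len(df),
        "clusters": len(per_cluster),
        "min_chd": float(per_cluster["chd"].min()),
        "max_chd": float(per_cluster["chd"].max()),
        "median_chd": float(per_cluster["chd"].median()),
    }
    return per_cluster, overall


df = pd.DataFrame(
    {
        "he_id": ["A", "A", "A", "B", "B"],
        "value": [90_000, 100_000, 110_000, 200_000, 200_000],
    }
)
per_cluster, overall = horizontal_equity_sketch(df, "he_id", "value")
print(per_cluster)
print(overall)
```

One behavioral difference: `groupby` drops rows whose cluster key is missing (NaN) by default, and pandas reductions skip NaN CHDs, whereas the loop keeps a missing-key cluster with count 0 and NaN statistics, which then propagates through `np.min`/`np.max`.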

HorizontalEquitySummary

HorizontalEquitySummary(rows, clusters, min_chd, max_chd, median_chd)

Summary statistics for horizontal equity analysis.

Attributes:

rows (int): Total number of rows in the input DataFrame.
clusters (int): Total number of clusters identified.
min_chd (float): Minimum CHD (Coefficient of Horizontal Dispersion) value of any cluster.
max_chd (float): Maximum CHD value of any cluster.
median_chd (float): Median CHD value of all clusters.
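`HorizontalEquityStudy` computes each cluster's CHD with `stats.calc_cod`, so CHD is a coefficient of dispersion measured within one cluster. Assuming `calc_cod` follows the standard COD definition (this page does not show its body), for a cluster with values $x_1, \dots, x_n$ and median $\tilde{x}$:

```latex
\mathrm{CHD} = \frac{100}{n\,\tilde{x}} \sum_{i=1}^{n} \left| x_i - \tilde{x} \right|
```

`min_chd`, `max_chd` and `median_chd` are then the minimum, maximum and median of this quantity taken over all clusters.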

Initialize a HorizontalEquitySummary instance.

Parameters:

rows (int, required): Total number of rows in the DataFrame.
clusters (int, required): Total number of clusters.
min_chd (float, required): Minimum CHD value.
max_chd (float, required): Maximum CHD value.
median_chd (float, required): Median CHD value.
Source code in openavmkit/horizontal_equity_study.py
def __init__(
    self,
    rows: int,
    clusters: int,
    min_chd: float,
    max_chd: float,
    median_chd: float,
):
    """
    Initialize a HorizontalEquitySummary instance.

    Parameters
    ----------
    rows : int
        Total number of rows in the DataFrame.
    clusters : int
        Total number of clusters.
    min_chd : float
        Minimum CHD value.
    max_chd : float
        Maximum CHD value.
    median_chd : float
        Median CHD value.
    """
    self.rows = rows
    self.clusters = clusters
    self.min_chd = min_chd
    self.max_chd = max_chd
    self.median_chd = median_chd

mark_horizontal_equity_clusters

mark_horizontal_equity_clusters(df, settings, verbose=False, settings_object='horizontal_equity', id_name='he_id', output_folder='', t=None)

Compute and mark horizontal equity clusters in the DataFrame.

Reads a location field (location), categorical fields (fields_categorical), and numeric fields (fields_numeric) from settings["analysis"][settings_object], passes them to make_clusters, and stores the resulting horizontal equity cluster ID in the id_name column.

Parameters:

df (DataFrame, required): Input DataFrame.
settings (dict, required): Settings dictionary.
verbose (bool, default False): If True, prints progress information.
settings_object (str, default 'horizontal_equity'): Key under settings["analysis"] holding the horizontal equity settings to use.
id_name (str, default 'he_id'): Name of the column to store the horizontal equity cluster ID.
output_folder (str, default ''): Output folder path where information about the clusters is stored for later use.
t (TimingData, default None): TimingData object to record performance metrics. The current function body does not reference it.

Returns:

DataFrame: The input DataFrame, modified in place, with a new cluster ID column (id_name).

Source code in openavmkit/horizontal_equity_study.py
def mark_horizontal_equity_clusters(
    df: pd.DataFrame,
    settings: dict,
    verbose: bool = False,
    settings_object: str = "horizontal_equity",
    id_name: str = "he_id",
    output_folder: str = "",
    t: TimingData = None,
) -> pd.DataFrame:
    """
    Compute and mark horizontal equity clusters in the DataFrame.

    Uses clustering (via `make_clusters`) based on a location field and categorical/numeric
    fields specified in settings to generate a horizontal equity cluster ID which is stored
    in the specified `id_name` column.

    Parameters
    ----------
    df : pandas.DataFrame
        Input DataFrame.
    settings : dict
        Settings dictionary.
    verbose : bool, optional
        If True, prints progress information.
    settings_object : str, optional
        The settings object to use for horizontal equity analysis.
    id_name : str, optional
        Name of the column to store the horizontal equity cluster ID.
    output_folder : str, optional
        Output folder path (stores information about the clusters for later use).
    t : TimingData, optional
        TimingData object to record performance metrics.

    Returns
    -------
    pandas.DataFrame
        DataFrame with a new cluster ID column (`id_name`).
    """

    he = settings.get("analysis", {}).get(settings_object, {})
    location = he.get("location", None)
    fields_categorical = he.get("fields_categorical", [])
    fields_numeric = he.get("fields_numeric", None)
    df[id_name], _, _ = make_clusters(
        df,
        location,
        fields_categorical,
        fields_numeric,
        verbose=verbose,
        output_folder=output_folder,
    )
    return df
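The function reads three keys from `settings["analysis"][settings_object]`. The fragment below sketches that shape: the key names come from the source, while the column names (`neighborhood`, `bldg_type`, and so on) are invented for illustration. `read_equity_settings()` is a hypothetical helper that mirrors the lookup, including its defaults for missing keys.

```python
# Hypothetical settings fragment; only the key names are taken from the source.
settings = {
    "analysis": {
        "horizontal_equity": {
            "location": "neighborhood",
            "fields_categorical": ["bldg_type", "bldg_quality"],
            "fields_numeric": ["bldg_area_finished_sqft", "bldg_age_years"],
        },
        "land_equity": {
            "location": "neighborhood",
            "fields_categorical": ["zoning"],
            "fields_numeric": ["land_area_sqft"],
        },
    }
}


def read_equity_settings(settings: dict, settings_object: str = "horizontal_equity"):
    """Mirror the lookup done before make_clusters is called."""
    he = settings.get("analysis", {}).get(settings_object, {})
    location = he.get("location", None)
    fields_categorical = he.get("fields_categorical", [])
    fields_numeric = he.get("fields_numeric", None)
    return location, fields_categorical, fields_numeric


print(read_equity_settings(settings))
print(read_equity_settings(settings, "impr_equity"))  # (None, [], None)
```

A missing settings block silently falls back to `None`, `[]`, `None`; how `make_clusters` treats those values is not shown on this page.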

mark_horizontal_equity_clusters_per_model_group_sup

mark_horizontal_equity_clusters_per_model_group_sup(sup, settings, verbose=False, use_cache=True, do_land_clusters=True, do_impr_clusters=True)

Mark horizontal equity clusters on the 'universe' DataFrame of a SalesUniversePair.

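Updates the 'universe' DataFrame with horizontal equity clusters, applying mark_horizontal_equity_clusters through the internal helper _mark_horizontal_equity_clusters_per_model_group. The general clusters are always marked; land clusters (settings object land_equity, column land_he_id) and improvement clusters (settings object impr_equity, column impr_he_id) are marked depending on the flags. The updated DataFrame is then set back in sup.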

Parameters:

sup (SalesUniversePair, required): SalesUniversePair containing sales and universe data.
settings (dict, required): Settings dictionary.
verbose (bool, default False): If True, prints progress information.
use_cache (bool, default True): If True, uses a cached DataFrame if one is available.
do_land_clusters (bool, default True): If True, marks land horizontal equity clusters (column land_he_id).
do_impr_clusters (bool, default True): If True, marks improvement horizontal equity clusters (column impr_he_id).

Returns:

SalesUniversePair: Updated SalesUniversePair with marked horizontal equity clusters.

Source code in openavmkit/horizontal_equity_study.py
def mark_horizontal_equity_clusters_per_model_group_sup(
    sup: SalesUniversePair,
    settings: dict,
    verbose: bool = False,
    use_cache: bool = True,
    do_land_clusters: bool = True,
    do_impr_clusters: bool = True,
) -> SalesUniversePair:
    """
    Mark horizontal equity clusters on the 'universe' DataFrame of a SalesUniversePair.

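    Updates the 'universe' DataFrame with horizontal equity clusters by calling
    `_mark_horizontal_equity_clusters_per_model_group` for the general clusters and,
    depending on the flags, for land and improvement clusters, then sets the updated
    DataFrame in `sup`.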

    Parameters
    ----------
    sup : SalesUniversePair
        SalesUniversePair containing sales and universe data.
    settings : dict
        Settings dictionary.
    verbose : bool, optional
        If True, prints progress information.
    use_cache : bool, optional
        If True, uses cached DataFrame if available.
    do_land_clusters : bool, optional
        If True, marks land horizontal equity clusters.
    do_impr_clusters : bool, optional
        If True, marks improvement horizontal equity clusters.

    Returns
    -------
    SalesUniversePair
        Updated SalesUniversePair with marked horizontal equity clusters.
    """

    df_universe = sup["universe"]
    if verbose:
        print("")
        print("Marking horizontal equity clusters...")
    df_universe = _mark_horizontal_equity_clusters_per_model_group(
        df_universe,
        settings,
        verbose,
        output_folder="horizontal_equity/general",
        use_cache=use_cache,
    )
    if do_land_clusters:
        if verbose:
            print("")
            print("Marking LAND horizontal equity clusters...")
        df_universe = _mark_horizontal_equity_clusters_per_model_group(
            df_universe,
            settings,
            verbose,
            settings_object="land_equity",
            id_name="land_he_id",
            output_folder="horizontal_equity/land",
            use_cache=use_cache,
        )
    if do_impr_clusters:
        if verbose:
            print("")
            print("Marking IMPROVEMENT horizontal equity clusters...")
        df_universe = _mark_horizontal_equity_clusters_per_model_group(
            df_universe,
            settings,
            verbose,
            settings_object="impr_equity",
            id_name="impr_he_id",
            output_folder="horizontal_equity/improvement",
            use_cache=use_cache,
        )
    sup.set("universe", df_universe)
    return sup
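The pass structure can be sketched with stand-ins to show where the write-back belongs. `ToyPair` (a minimal substitute for `SalesUniversePair`), `fake_mark` (a substitute for the real clustering helper), and `mark_all` are all hypothetical, and `he_id` for the general pass is assumed to be the helper's default column name. The write-back to the pair happens once, after all optional passes, so each flag controls only its own pass.

```python
import pandas as pd


class ToyPair:
    """Hypothetical stand-in for SalesUniversePair, exposing only the
    item access and set() calls that the real function uses."""

    def __init__(self, universe: pd.DataFrame) -> None:
        self._frames = {"universe": universe}

    def __getitem__(self, key: str) -> pd.DataFrame:
        return self._frames[key]

    def set(self, key: str, df: pd.DataFrame) -> None:
        self._frames[key] = df


def fake_mark(df: pd.DataFrame, id_name: str) -> pd.DataFrame:
    """Hypothetical marker: puts every row in cluster "0" under id_name."""
    out = df.copy()
    out[id_name] = "0"
    return out


def mark_all(sup: ToyPair, do_land: bool = True, do_impr: bool = True) -> ToyPair:
    df = sup["universe"]
    # General pass always runs ("he_id" assumed as the default column name)
    df = fake_mark(df, "he_id")
    if do_land:
        df = fake_mark(df, "land_he_id")
    if do_impr:
        df = fake_mark(df, "impr_he_id")
    # Write back outside every flag check, so switching off a later pass
    # never discards the columns added by earlier passes.
    sup.set("universe", df)
    return sup


sup = mark_all(ToyPair(pd.DataFrame({"key": [1, 2]})), do_impr=False)
print(list(sup["universe"].columns))  # ['key', 'he_id', 'land_he_id']
```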