
openavmkit.utilities.modeling

AverageModel

AverageModel(type, sales_chase)

An intentionally bad predictive model, to use as a sort of control. Produces predictions equal to the average of observed sale prices.

Attributes:

Name Type Description
type str

The type of average to use

sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

Initialize an AverageModel

Parameters:

Name Type Description Default
type str

The type of average to use

required
sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

required
Source code in openavmkit/utilities/modeling.py
def __init__(self, type: str, sales_chase: float):
    """Initialize an AverageModel

    Parameters
    ----------
    type : str
        The type of average to use
    sales_chase : float
        Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold
        parcels chase (copy) the observed sale price, with random noise scaled by the value of
        ``sales_chase``. For example, ``sales_chase=0.05`` copies each sale price with 5% random noise.
        **NOTE**: This is for analytical purposes only; do not intentionally chase sales in actual production work.
    """
    self.type = type
    self.sales_chase = sales_chase
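
The behavior described above can be sketched in a few lines of standard-library Python. This is only an illustration, not the library's implementation: the function name `average_predictions`, the `"mean"`/`"median"` values for the average type, the use of `None` to mark unsold parcels, and the choice of uniform noise within ±`sales_chase` are all assumptions made for the example.

```python
import random
import statistics


def average_predictions(sale_prices, avg_type="mean", sales_chase=0.0, seed=None):
    """Hypothetical stand-in for an average-based control model.

    sale_prices holds one entry per parcel; None marks an unsold parcel.
    Assumptions: avg_type is "mean" or "median", and sales-chasing noise is
    uniform within +/- sales_chase of the observed sale price.
    """
    observed = [p for p in sale_prices if p is not None]
    if avg_type == "median":
        baseline = statistics.median(observed)
    else:
        baseline = statistics.mean(observed)

    rng = random.Random(seed)
    predictions = []
    for price in sale_prices:
        if sales_chase != 0.0 and price is not None:
            # Sold parcel: copy the sale price, perturbed by random noise.
            noise = rng.uniform(-sales_chase, sales_chase)
            predictions.append(price * (1.0 + noise))
        else:
            # Unsold parcel, or sales chasing disabled: use the average.
            predictions.append(baseline)
    return predictions


sale_prices = [150_000.0, None, 310_000.0, 200_000.0]
plain = average_predictions(sale_prices)  # every parcel gets the mean
chased = average_predictions(sale_prices, sales_chase=0.05, seed=0)
```

With `sales_chase=0.05`, each sold parcel's prediction lands within 5% of its own sale price, while the unsold parcel still receives the plain average. That gap is what makes a sales-chased model look deceptively accurate on sold parcels.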

GWRModel

GWRModel(coords_train, X_train, y_train, gwr_bw)

Geographically Weighted Regression (GWR) Model

Attributes:

Name Type Description
coords_train list[tuple[float, float]]

list of geospatial coordinates corresponding to each observation in the training set

X_train ndarray

2D array of independent variables' values from the training set

y_train ndarray

1D array of dependent variable's values from the training set

gwr_bw float

Bandwidth for GWR calculation

Parameters:

Name Type Description Default
coords_train list[tuple[float, float]]

list of geospatial coordinates corresponding to each observation in the training set

required
X_train ndarray

2D array of independent variables' values from the training set

required
y_train ndarray

1D array of dependent variable's values from the training set

required
gwr_bw float

Bandwidth for GWR calculation

required
Source code in openavmkit/utilities/modeling.py
def __init__(
    self,
    coords_train: list[tuple[float, float]],
    X_train: np.ndarray,
    y_train: np.ndarray,
    gwr_bw: float,
):
    """
    Parameters
    ----------
    coords_train : list[tuple[float, float]]
        list of geospatial coordinates corresponding to each observation in the training set
    X_train : np.ndarray
        2D array of independent variables' values from the training set
    y_train : np.ndarray
        1D array of dependent variable's values from the training set
    gwr_bw : float
        Bandwidth for GWR calculation
    """
    self.coords_train = coords_train
    self.X_train = X_train
    self.y_train = y_train
    self.gwr_bw = gwr_bw

GarbageModel

GarbageModel(min_value, max_value, sales_chase, normal)

An intentionally bad predictive model, to use as a sort of control. Produces random predictions.

Attributes:

Name Type Description
min_value float

The minimum value to "predict"

max_value float

The maximum value to "predict"

sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

normal bool

If True, the randomly generated predictions follow a normal distribution based on the standard deviation of the observed sale prices. If False, they follow a uniform distribution between min_value and max_value.

Initialize a GarbageModel

Parameters:

Name Type Description Default
min_value float

The minimum value to "predict"

required
max_value float

The maximum value to "predict"

required
sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

required
normal bool

If True, the randomly generated predictions follow a normal distribution based on the standard deviation of the observed sale prices. If False, they follow a uniform distribution between min_value and max_value.

required
Source code in openavmkit/utilities/modeling.py
def __init__(
    self, min_value: float, max_value: float, sales_chase: float, normal: bool
):
    """Initialize a GarbageModel

    Parameters
    ----------
    min_value : float
        The minimum value to "predict"
    max_value : float
        The maximum value to "predict"
    sales_chase : float
        Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold
        parcels chase (copy) the observed sale price, with random noise scaled by the value of
        ``sales_chase``. For example, ``sales_chase=0.05`` copies each sale price with 5% random noise.
        **NOTE**: This is for analytical purposes only; do not intentionally chase sales in actual production work.
    normal : bool
        If True, the randomly generated predictions follow a normal distribution based on the standard deviation
        of the observed sale prices. If False, they follow a uniform distribution between ``min_value`` and ``max_value``.
    """
    self.min_value = min_value
    self.max_value = max_value
    self.sales_chase = sales_chase
    self.normal = normal
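
As a rough illustration of the two distributions described above, the following standard-library sketch draws random "predictions" either uniformly or from a normal distribution. It is not the library's code: the function name `garbage_predictions` and its `n`, `sale_prices`, and `seed` arguments are hypothetical, and centering the normal draw on the mean of the observed sale prices is an assumption (the documentation only mentions their standard deviation).

```python
import random
import statistics


def garbage_predictions(n, min_value, max_value, normal, sale_prices=None, seed=None):
    """Hypothetical stand-in that draws n random "predictions".

    Assumption: in the normal case the draw is centered on the mean of the
    observed sale prices and uses their sample standard deviation.
    """
    rng = random.Random(seed)
    if normal:
        mu = statistics.mean(sale_prices)
        sigma = statistics.stdev(sale_prices)
        return [rng.gauss(mu, sigma) for _ in range(n)]
    # Uniform case: every value between min_value and max_value is equally likely.
    return [rng.uniform(min_value, max_value) for _ in range(n)]


observed = [150_000.0, 310_000.0, 200_000.0, 240_000.0]
uniform_preds = garbage_predictions(5, 50_000.0, 500_000.0, normal=False, seed=1)
normal_preds = garbage_predictions(
    5, 50_000.0, 500_000.0, normal=True, sale_prices=observed, seed=1
)
```

Note that in this sketch the normal draw ignores `min_value` and `max_value`, so it can produce values outside that range; whether the real model clips them is not stated in the documentation.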

GroundTruthModel

GroundTruthModel(observed_field, ground_truth_field)

Used mostly with synthetic models, where you want to compare against the simulation's ground truth instead of the observed sale price, something you can never do in real life.

Attributes:

Name Type Description
observed_field str

The field that represents observed sale prices

ground_truth_field str

The field that represents platonic ground truth

Initialize a GroundTruthModel object

Parameters:

Name Type Description Default
observed_field str

The field that represents observed sale prices

required
ground_truth_field str

The field that represents platonic ground truth

required
Source code in openavmkit/utilities/modeling.py
def __init__(self, observed_field: str, ground_truth_field: str):
    """Initialize a GroundTruthModel object

    Parameters
    ----------
    observed_field : str
        The field that represents observed sale prices
    ground_truth_field : str
        The field that represents platonic ground truth
    """
    self.observed_field = observed_field
    self.ground_truth_field = ground_truth_field

LocalSqftModel

LocalSqftModel(loc_map, location_fields, overall_per_impr_sqft, overall_per_land_sqft, sales_chase)

Produces predictions equal to the localized average price/sqft of land or building, multiplied by the observed size of the parcel's land or building, depending on whether it's vacant or improved.

Unlike NaiveSqftModel, this model is sensitive to location, based on user-specified locations, and might actually result in decent predictions.

Attributes:

Name Type Description
loc_map dict[str, tuple[DataFrame, DataFrame]]

A dictionary that maps location field names to localized per-sqft values. The dictionary itself is keyed by the names of the location fields themselves (e.g. "neighborhood", "market_region", "census_tract", etc.) or whatever the user specifies.

Each entry is a tuple containing two DataFrames:

  • Values per improved square foot
  • Values per land square foot

Each DataFrame is keyed by the unique values for the given location (e.g. "River heights", "Meadowbrook", etc., if the location field in question is "neighborhood"). The other field in each DataFrame will be {location_field}_per_impr_sqft or {location_field}_per_land_sqft.

location_fields list

List of location fields used (e.g. "neighborhood", "market_region", "census_tract", etc.)

overall_per_impr_sqft float

Fallback value per improved square foot, to use for parcels of unspecified location. Based on the overall average value for the dataset.

overall_per_land_sqft float

Fallback value per land square foot, to use for parcels of unspecified location. Based on the overall average value for the dataset.

sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

Initialize a LocalSqftModel

Parameters:

Name Type Description Default
loc_map dict[str, tuple[DataFrame, DataFrame]]

A dictionary that maps location field names to localized per-sqft values. The dictionary itself is keyed by the names of the location fields themselves (e.g. "neighborhood", "market_region", "census_tract", etc.) or whatever the user specifies.

Each entry is a tuple containing two DataFrames:

  • Values per improved square foot
  • Values per land square foot

Each DataFrame is keyed by the unique values for the given location (e.g. "River heights", "Meadowbrook", etc., if the location field in question is "neighborhood"). The other field in each DataFrame will be {location_field}_per_impr_sqft or {location_field}_per_land_sqft.

required
location_fields list

List of location fields used (e.g. "neighborhood", "market_region", "census_tract", etc.)

required
overall_per_impr_sqft float

Fallback value per improved square foot, to use for parcels of unspecified location. Based on the overall average value for the dataset.

required
overall_per_land_sqft float

Fallback value per land square foot, to use for parcels of unspecified location. Based on the overall average value for the dataset.

required
sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

required
Source code in openavmkit/utilities/modeling.py
def __init__(
    self,
    loc_map: dict,
    location_fields: list,
    overall_per_impr_sqft: float,
    overall_per_land_sqft: float,
    sales_chase: float,
):
    """Initialize a LocalSqftModel

    Parameters
    ----------
    loc_map : dict[str, tuple[DataFrame, DataFrame]]
        A dictionary that maps location field names to localized per-sqft values. The dictionary itself is keyed by the
        names of the location fields themselves (e.g. "neighborhood", "market_region", "census_tract", etc.) or whatever
        the user specifies.

        Each entry is a tuple containing two DataFrames:

          - Values per improved square foot
          - Values per land square foot

        Each DataFrame is keyed by the unique *values* for the given location (e.g. "River heights", "Meadowbrook",
        etc., if the location field in question is "neighborhood"). The other field in each DataFrame will be
        ``{location_field}_per_impr_sqft`` or ``{location_field}_per_land_sqft``.
    location_fields : list
        List of location fields used (e.g. "neighborhood", "market_region", "census_tract", etc.)
    overall_per_impr_sqft : float
        Fallback value per improved square foot, to use for parcels of unspecified location. Based on the
        overall average value for the dataset.
    overall_per_land_sqft : float
        Fallback value per land square foot, to use for parcels of unspecified location. Based on the overall average
        value for the dataset.
    sales_chase : float
        Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold
        parcels chase (copy) the observed sale price, with random noise scaled by the value of
        ``sales_chase``. For example, ``sales_chase=0.05`` copies each sale price with 5% random noise.
        **NOTE**: This is for analytical purposes only; do not intentionally chase sales in actual production work.
    """
    self.loc_map = loc_map
    self.location_fields = location_fields
    self.overall_per_impr_sqft = overall_per_impr_sqft
    self.overall_per_land_sqft = overall_per_land_sqft
    self.sales_chase = sales_chase
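
The lookup-with-fallback idea behind this model can be sketched as follows. This is a simplified illustration under several assumptions, not the library's implementation: plain dicts stand in for the two DataFrames in each `loc_map` entry, the location fields are tried in list order with the first match winning, a parcel with zero building area is treated as vacant, and the function name and the parcel keys `bldg_area_sqft` and `land_area_sqft` are hypothetical.

```python
def local_sqft_prediction(parcel, loc_map, location_fields,
                          overall_per_impr_sqft, overall_per_land_sqft):
    """Hypothetical sketch: value one parcel from localized per-sqft rates.

    Each loc_map entry is a pair of dicts here: (improved rates, land rates),
    keyed by location value. Real entries are described as DataFrames.
    """
    is_vacant = parcel["bldg_area_sqft"] == 0
    index = 1 if is_vacant else 0  # 0 = improved rates, 1 = land rates
    size = parcel["land_area_sqft"] if is_vacant else parcel["bldg_area_sqft"]

    # Start from the dataset-wide fallback, then look for a local rate.
    rate = overall_per_land_sqft if is_vacant else overall_per_impr_sqft
    for field in location_fields:
        rates = loc_map[field][index]
        value = parcel.get(field)
        if value in rates:
            rate = rates[value]
            break
    return rate * size


loc_map = {"neighborhood": ({"Meadowbrook": 180.0}, {"Meadowbrook": 12.0})}
improved = {"neighborhood": "Meadowbrook", "bldg_area_sqft": 1500, "land_area_sqft": 6000}
vacant_unknown = {"neighborhood": None, "bldg_area_sqft": 0, "land_area_sqft": 8000}

improved_value = local_sqft_prediction(improved, loc_map, ["neighborhood"], 150.0, 10.0)
vacant_value = local_sqft_prediction(vacant_unknown, loc_map, ["neighborhood"], 150.0, 10.0)
```

The improved parcel uses the Meadowbrook rate per improved square foot, while the vacant parcel with no recorded neighborhood falls back to the overall land rate.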

MRAModel

MRAModel(fitted_model, intercept)

Multiple Regression Analysis Model

Plain ol' (multiple) linear regression

Attributes:

Name Type Description
fitted_model RegressionResults

Fitted model from running the regression

intercept bool

Whether the model was fit with an intercept or not.

Source code in openavmkit/utilities/modeling.py
def __init__(self, fitted_model: RegressionResults, intercept: bool):
    self.fitted_model = fitted_model
    self.intercept = intercept

NaiveSqftModel

NaiveSqftModel(dep_per_built_sqft, dep_per_land_sqft, sales_chase)

An intentionally bad predictive model, to use as a sort of control. Produces predictions equal to the prevailing average price/sqft of land or building, multiplied by the observed size of the parcel's land or building, depending on whether it's vacant or improved.

Attributes:

Name Type Description
dep_per_built_sqft float

Dependent variable value divided by improved square footage

dep_per_land_sqft float

Dependent variable value divided by land square footage

sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

Initialize a NaiveSqftModel

Parameters:

Name Type Description Default
dep_per_built_sqft float

Dependent variable value divided by improved square footage

required
dep_per_land_sqft float

Dependent variable value divided by land square footage

required
sales_chase float

Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold parcels chase (copy) the observed sale price, with random noise scaled by the value of sales_chase. For example, sales_chase=0.05 copies each sale price with 5% random noise. NOTE: This is for analytical purposes only; do not intentionally chase sales in actual production work.

required
Source code in openavmkit/utilities/modeling.py
def __init__(
    self, dep_per_built_sqft: float, dep_per_land_sqft: float, sales_chase: float
):
    """Initialize a NaiveSqftModel

    Parameters
    ----------
    dep_per_built_sqft : float
        Dependent variable value divided by improved square footage
    dep_per_land_sqft : float
        Dependent variable value divided by land square footage
    sales_chase : float
        Simulates sales chasing. If 0.0, no sales chasing occurs. For any other value, predictions for sold
        parcels chase (copy) the observed sale price, with random noise scaled by the value of
        ``sales_chase``. For example, ``sales_chase=0.05`` copies each sale price with 5% random noise.
        **NOTE**: This is for analytical purposes only; do not intentionally chase sales in actual production work.
    """
    self.dep_per_built_sqft = dep_per_built_sqft
    self.dep_per_land_sqft = dep_per_land_sqft
    self.sales_chase = sales_chase
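
The arithmetic described above amounts to one rate multiplied by one size. The sketch below shows it under the assumption that a parcel with zero building area counts as vacant; the function name and its `bldg_sqft` and `land_sqft` arguments are hypothetical and not part of the library.

```python
def naive_sqft_prediction(bldg_sqft, land_sqft, dep_per_built_sqft, dep_per_land_sqft):
    """Hypothetical sketch of a single dataset-wide price-per-sqft prediction.

    Assumption: a parcel with no building area is vacant and is valued on
    its land; any other parcel is valued on its building area alone.
    """
    if bldg_sqft > 0:
        return dep_per_built_sqft * bldg_sqft
    return dep_per_land_sqft * land_sqft


# An improved parcel: 2,000 sqft building at $150 per built sqft.
improved_value = naive_sqft_prediction(2000, 7500, 150.0, 8.0)
# A vacant parcel: 10,000 sqft of land at $8 per land sqft.
vacant_value = naive_sqft_prediction(0, 10000, 150.0, 8.0)
```

Because the same two rates apply everywhere in the dataset, location has no effect on the result, which is why the model is useful only as a control.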

PassThroughModel

PassThroughModel(field)

Mostly used to represent existing valuations to compare against, such as the assessor's values.

Attributes:

Name Type Description
field str

The field that holds the values you want to pass through as predictions

Initialize a PassThroughModel

Parameters:

Name Type Description Default
field str

The field that holds the values you want to pass through as predictions

required
Source code in openavmkit/utilities/modeling.py
def __init__(
    self,
    field: str,
):
    """Initialize a PassThroughModel

    Parameters
    ----------
    field : str
        The field that holds the values you want to pass through as predictions
    """
    self.field = field

SpatialLagModel

SpatialLagModel(per_sqft)

Use a spatial lag field as your prediction

Attributes:

Name Type Description
per_sqft bool

If True, normalize by square foot. If False, use the direct value of the spatial lag field.

Initialize a SpatialLagModel

Parameters:

Name Type Description Default
per_sqft bool

If True, normalize by square foot. If False, use the direct value of the spatial lag field.

required
Source code in openavmkit/utilities/modeling.py
def __init__(self, per_sqft: bool):
    """Initialize a SpatialLagModel

    Parameters
    ----------
    per_sqft : bool
        If True, normalize by square foot. If False, use the direct value of the spatial lag field.
    """
    self.per_sqft = per_sqft
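
One plausible reading of the `per_sqft` flag is sketched below: when it is True, the spatial lag field is assumed to hold a per-square-foot value that must be multiplied by the parcel's size, and when it is False the lag value is used directly. This interpretation, along with the function name and its arguments, is an assumption made for illustration and is not confirmed by the documentation above.

```python
def spatial_lag_prediction(lag_value, sqft, per_sqft):
    """Hypothetical sketch of turning a spatial lag value into a prediction.

    Assumption: with per_sqft=True, lag_value is a price per square foot.
    """
    if per_sqft:
        return lag_value * sqft
    return lag_value


# Lag field holds a per-sqft price: 175.0 per sqft on a 1,200 sqft parcel.
scaled = spatial_lag_prediction(175.0, 1200, per_sqft=True)
# Lag field already holds a full price.
direct = spatial_lag_prediction(250_000.0, 1200, per_sqft=False)
```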