Configuration
Configuring cloud storage
openavmkit includes a module for working with remote storage services. At this time the library supports three cloud storage methods:
- Microsoft Azure
- Hugging Face
- SFTP
To configure cloud storage, you will need to create a file that stores your connection credentials (such as API keys or passwords). This file should be named .env and should be placed in the notebooks/ directory.
This file is already ignored by git, but take care never to commit it to the repository or share it with others, as it contains your sensitive login information!
This file should be a plain text file formatted like this:
SOME_VARIABLE=some_value
ANOTHER_VARIABLE=another_value
YET_ANOTHER_VARIABLE=123
That's just an example of the format; here are the actual variables that it recognizes:
| Variable Name | Description |
|---|---|
| AZURE_ACCESS | The type of access your Azure account has. Legal values are: read_only, read_write. |
| AZURE_STORAGE_CONNECTION_STRING | The connection string for the Azure storage account |
| HF_ACCESS | The type of access your Hugging Face account has. Legal values are: read_only, read_write. |
| HF_TOKEN | The Hugging Face API token |
| SFTP_ACCESS | The type of access your SFTP account has. Legal values are: read_only, read_write. |
| SFTP_HOSTNAME | The hostname of the SFTP server |
| SFTP_USERNAME | The username for the SFTP server |
| SFTP_PASSWORD | The password for the SFTP server |
| SFTP_PORT | The port number for the SFTP server |
You only need to provide values for the service that you're actually using. For instance, here's what the file might look like if you are using Azure:
AZURE_ACCESS=read_write
AZURE_STORAGE_CONNECTION_STRING=<YOUR_AZURE_CONNECTION_STRING>
If you're just getting started, you can use read-only anonymous access to a public Azure container:
AZURE_ACCESS=read_only
This will let you download the inputs for any of the public datasets provided by The Center for Land Economics, which are hosted in an Azure container at https://landeconomics.blob.core.windows.net/localities-public. Point a locality's cloud.json at that URL (see below) and you can pull the data without any credentials. Note that you will be unable to upload your changes and outputs to containers that you have read-only access to.
If you want to sync with your own cloud storage, you will need to set up your own hosting account and then provide the appropriate credentials in the .env file.
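Before syncing, it can help to sanity-check the .env file. The following is a minimal sketch, not part of openavmkit: the helper names parse_env_text and check_access_values are hypothetical, and the parser assumes simple unquoted KEY=value lines with optional # comment lines. It only checks that any *_ACCESS variable you set uses one of the two legal values listed above.

```python
LEGAL_ACCESS = {"read_only", "read_write"}
ACCESS_VARS = ("AZURE_ACCESS", "HF_ACCESS", "SFTP_ACCESS")


def parse_env_text(text):
    """Parse KEY=value lines; skip blank lines and lines starting with '#'.

    Assumption: values are unquoted. Only the first '=' splits the line,
    so values that themselves contain '=' (such as Azure connection
    strings) are kept whole.
    """
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def check_access_values(env):
    """Return a list of problems with the *_ACCESS variables that are set."""
    problems = []
    for name in ACCESS_VARS:
        if name in env and env[name] not in LEGAL_ACCESS:
            problems.append(
                f"{name} must be one of {sorted(LEGAL_ACCESS)}, got {env[name]!r}"
            )
    return problems


# Placeholder content; in practice you would read notebooks/.env instead.
sample = (
    "AZURE_ACCESS=read_write\n"
    "AZURE_STORAGE_CONNECTION_STRING=AccountName=example;AccountKey=placeholder\n"
)
env = parse_env_text(sample)
print(check_access_values(env))  # prints []
```

To check a real file, pass the result of `pathlib.Path("notebooks/.env").read_text()` to parse_env_text. Avoid printing the parsed values themselves, since they include your credentials.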
Per-locality cloud.json
Cloud destinations are configured per locality in a cloud.json file alongside the locality's settings.json, at notebooks/pipeline/data/<locality>/cloud.json. This file selects which cloud service the locality syncs with and supplies the (non-sensitive) destination identifiers. Sensitive credentials still live in .env — never put them here, since cloud.json may itself be synced to the cloud.
| Key | Cloud type | Required for | Description |
|---|---|---|---|
| type | (any) | always | Which cloud service this locality uses. One of azure, huggingface, sftp. |
| azure_storage_container_name | azure | read_write | Name of the Azure storage container. |
| azure_storage_container_url | azure | read_only | URL of the Azure storage container (used for anonymous read-only access). |
| hf_repo_id | huggingface | always | The Hugging Face repository ID, e.g. <your-org>/<your-repo>. |
| hf_revision | huggingface | optional | A specific revision/branch to pull from. Defaults to the repo's default branch. |
Example cloud.json for a locality syncing with the public Center for Land Economics Azure container (read-only access, no credentials needed):
{
"type": "azure",
"azure_storage_container_url": "https://landeconomics.blob.core.windows.net/localities-public"
}
Example for Azure read-write to your own container:
{
"type": "azure",
"azure_storage_container_name": "<your-container-name>"
}
Example for Hugging Face:
{
"type": "huggingface",
"hf_repo_id": "<your-org>/<your-repo>"
}
Because the cloud destination is stored per locality in cloud.json, you can have different localities pointing at different services (one at Azure, another at Hugging Face, etc.) — .env just needs the credentials for whichever services you actually use.
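The required keys in the table above can be checked mechanically. This is a minimal sketch under stated assumptions: check_cloud_config is a hypothetical helper, not an openavmkit function, and the access argument stands in for the matching *_ACCESS value from .env (how the library itself combines the two files is not shown here).

```python
import json

CLOUD_TYPES = {"azure", "huggingface", "sftp"}


def check_cloud_config(config, access="read_only"):
    """Return a list of problems found in a parsed cloud.json dict."""
    problems = []
    cloud_type = config.get("type")
    if cloud_type not in CLOUD_TYPES:
        problems.append(
            f"type must be one of {sorted(CLOUD_TYPES)}, got {cloud_type!r}"
        )
        return problems
    if cloud_type == "azure":
        # Per the table: container name for read_write, container URL for read_only.
        if access == "read_write":
            key = "azure_storage_container_name"
        else:
            key = "azure_storage_container_url"
        if not config.get(key):
            problems.append(f"{key} is required for azure with {access} access")
    elif cloud_type == "huggingface":
        if not config.get("hf_repo_id"):
            problems.append("hf_repo_id is required for huggingface")
    return problems


public = json.loads(
    '{"type": "azure", "azure_storage_container_url": '
    '"https://landeconomics.blob.core.windows.net/localities-public"}'
)
print(check_cloud_config(public, access="read_only"))  # prints []
```

The table lists no sftp-specific keys, so the sketch only checks the type value in that case.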
Configuring PDF report generation
openavmkit includes a module for generating PDF reports. This module uses the pdfkit library, which is a Python wrapper for the wkhtmltopdf command line tool. Although pdfkit will be installed automatically along with the rest of the dependencies, you will need to install wkhtmltopdf manually on your system to use this feature. If you skip this step, you'll still be able to use the rest of the library; it just won't generate PDF reports.
Installing wkhtmltopdf:
Manual installation
Visit the wkhtmltopdf download page and download the appropriate installer for your operating system.
Windows
1. Download the installer linked above
2. Run the installer and follow the instructions
3. Add the wkhtmltopdf installation directory to your system's PATH environment variable
If you don't know what step 3 means:
The idea is that you want the wkhtmltopdf executable to be available from any command prompt, so you can run it from anywhere. For that to work, you need to make sure that the folder that the wkhtmltopdf executable is in is listed in your system's PATH environment variable.
Here's how to do that:
1. Find the folder where wkhtmltopdf was installed. On Windows it's probably in C:\Program Files\wkhtmltopdf\bin, but it could be somewhere else. Pay attention when you install it.
2. Follow this tutorial to edit your PATH environment variable. You want to add the folder from step 1 to the PATH variable.
3. Open a new command prompt and type wkhtmltopdf --version. If you see a version number, you're all set!
Linux
On Debian/Ubuntu, run:
sudo apt-get update
sudo apt-get install wkhtmltopdf
macOS
Ensure you have Homebrew installed. Then run:
brew install wkhtmltopdf
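On any platform, you can confirm from Python that the executable is reachable on your PATH. This is a small sketch using only the standard library; find_wkhtmltopdf is a hypothetical helper name, not part of openavmkit or pdfkit.

```python
import shutil
import subprocess


def find_wkhtmltopdf(executable="wkhtmltopdf"):
    """Return the full path of the executable if it is on PATH, else None."""
    return shutil.which(executable)


path = find_wkhtmltopdf()
if path is None:
    print("wkhtmltopdf was not found on PATH; PDF reports will be unavailable")
else:
    # Same check as typing `wkhtmltopdf --version` at a command prompt.
    result = subprocess.run(
        [path, "--version"], capture_output=True, text=True, timeout=30
    )
    print(result.stdout.strip())
```

If this reports that the tool was not found right after installing it, open a new terminal (or restart your notebook kernel) so the updated PATH is picked up.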
Configuring US Census API Access
OpenAVMKit can enrich your data with US Census information using the Census API. To use this feature, you'll need to:
- Get a US Census API key from api.census.gov/data/key_signup.html
- Add your US Census API key to the .env file in the notebooks/ directory
Getting a US Census API Key
- Visit api.census.gov/data/key_signup.html
- Fill out the form with your information
- Agree to the US Census terms of service
- You will receive your API key via email
Configuring the US Census API Key
Add your US Census API key to the .env file in the notebooks/ directory:
CENSUS_API_KEY=your_api_key_here
Using US Census Enrichment
To enable US Census enrichment in your locality settings, add the following to your settings.json:
{
"process": {
"enrich": {
"census": {
"enabled": true,
"year": 2022,
"fips": "24510",
"fields": [
"median_income",
"total_pop"
]
}
}
}
}
Key settings:
- enabled: Set to true to enable Census enrichment
- year: The Census year to query (default: 2022)
- fips: The 5-digit FIPS code for your locality (state + county)
- fields: List of Census fields to include
The Census enrichment will automatically join Census block group data to your parcels using spatial joins, adding geographic and demographic information to your dataset.
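The fips value is written as a string because a county FIPS code combines a 2-digit state code with a 3-digit county code, and either part can start with zero. A minimal sketch of validating and splitting it follows; split_county_fips is a hypothetical helper for illustration, not an openavmkit function.

```python
def split_county_fips(fips):
    """Split a 5-digit county FIPS code into (state, county) parts."""
    is_valid = (
        isinstance(fips, str)
        and len(fips) == 5
        and fips.isascii()
        and fips.isdigit()
    )
    if not is_valid:
        raise ValueError(f"expected a 5-digit string, got {fips!r}")
    # First two digits identify the state, last three the county.
    return fips[:2], fips[2:]


state, county = split_county_fips("24510")
print(state, county)  # prints: 24 510
```

Storing the code as a number would drop any leading zero, which is why the settings example quotes it.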
Configuring OpenStreetMap Enrichment
OpenAVMKit can enrich your data with geographic features from OpenStreetMap, such as water bodies, parks, educational institutions, transportation networks, and golf courses. This enrichment adds distance-based features to your dataset, which can be valuable for property valuation.
Using OpenStreetMap Enrichment
To enable OpenStreetMap enrichment in your locality settings, add something like the following to your settings.json (this is just an example; OpenStreetMap enrichment is highly configurable):
{
"process": {
"enrich": {
"universe": {
"openstreetmap": {
"enabled": true,
"water_bodies": {
"enabled": true,
"min_area": 10000,
"top_n": 5,
"sort_by": "area"
},
"transportation": {
"enabled": true,
"min_length": 1000,
"top_n": 5,
"sort_by": "length"
},
"educational": {
"enabled": true,
"min_area": 1000,
"top_n": 5,
"sort_by": "area"
},
"parks": {
"enabled": true,
"min_area": 2000,
"top_n": 5,
"sort_by": "area"
},
"golf_courses": {
"enabled": true,
"min_area": 10000,
"top_n": 3,
"sort_by": "area"
}
},
"distances": [
{
"id": "water_bodies",
"max_distance": 1500,
"unit": "m"
},
{
"id": "water_bodies_top",
"field": "name",
"max_distance": 1500,
"unit": "m"
},
{
"id": "parks",
"max_distance": 800,
"unit": "m"
},
{
"id": "parks_top",
"field": "name",
"max_distance": 800,
"unit": "m"
},
{
"id": "golf_courses",
"max_distance": 1500,
"unit": "m"
},
{
"id": "golf_courses_top",
"field": "name",
"max_distance": 1500,
"unit": "m"
},
{
"id": "educational",
"max_distance": 1500,
"unit": "m"
},
{
"id": "educational_top",
"field": "name",
"max_distance": 1500,
"unit": "m"
},
{
"id": "transportation",
"max_distance": 1500,
"unit": "m"
},
{
"id": "transportation_top",
"field": "name",
"max_distance": 1500,
"unit": "m"
}
]
}
}
}
}
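The distances list in this example follows a regular pattern: each feature type appears once under its own id and once as an id ending in _top with "field": "name". If you prefer to generate that list rather than type it by hand, a sketch like the following could work. build_distances is a hypothetical helper, not part of openavmkit, and it simply reproduces the pattern seen in the example.

```python
import json


def build_distances(max_distance_by_feature, unit="m"):
    """Build the repetitive 'distances' entries for a settings.json file."""
    entries = []
    for feature_id, max_distance in max_distance_by_feature.items():
        # Plain entry for the feature type as a whole.
        entries.append(
            {"id": feature_id, "max_distance": max_distance, "unit": unit}
        )
        # Matching "_top" entry keyed on the feature's name field.
        entries.append(
            {
                "id": f"{feature_id}_top",
                "field": "name",
                "max_distance": max_distance,
                "unit": unit,
            }
        )
    return entries


distances = build_distances({"water_bodies": 1500, "parks": 800})
print(json.dumps(distances, indent=2))
```

Paste the printed JSON into the "distances" key of your settings.json.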
Feature Types and Settings
The OpenStreetMap enrichment supports the following feature types:
- Water Bodies: Rivers, lakes, reservoirs, etc.
  - min_area: Minimum area in square meters (default: 10000)
  - top_n: Number of largest water bodies to track individually (default: 5)
- Transportation: Major roads, railways, etc.
  - min_length: Minimum length in meters (default: 1000)
  - top_n: Number of longest transportation routes to track individually (default: 5)
- Educational Institutions: Universities, colleges, etc.
  - min_area: Minimum area in square meters (default: 1000)
  - top_n: Number of largest institutions to track individually (default: 5)
- Parks: Public parks, gardens, playgrounds, etc.
  - min_area: Minimum area in square meters (default: 2000)
  - top_n: Number of largest parks to track individually (default: 5)
- Golf Courses: Golf courses and related facilities
  - min_area: Minimum area in square meters (default: 10000)
  - top_n: Number of largest golf courses to track individually (default: 3)
Distance Calculations
For each feature type, the enrichment calculates:
- Aggregate distances: Distance to the nearest feature of that type
  - Output variable: dist_to_[feature_type]_any (in meters)
- Individual distances: Distance to each of the top N largest features
  - Output variable: dist_to_[feature_type]_[feature_name] (in meters)
  - Example: dist_to_parks_central_park
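To anticipate which columns to expect, the naming pattern above can be sketched as follows. distance_column is a hypothetical helper, and the way feature names are normalized (lowercased, with runs of non-alphanumeric characters replaced by underscores) is an assumption inferred from the dist_to_parks_central_park example; the library's actual normalization may differ.

```python
import re
from typing import Optional


def distance_column(feature_type: str, feature_name: Optional[str] = None) -> str:
    """Build an output column name following the documented pattern."""
    if feature_name is None:
        # Aggregate distance to the nearest feature of this type.
        return f"dist_to_{feature_type}_any"
    # Assumed normalization: lowercase, non-alphanumerics become underscores.
    slug = re.sub(r"[^a-z0-9]+", "_", feature_name.lower()).strip("_")
    return f"dist_to_{feature_type}_{slug}"


print(distance_column("parks"))                  # prints dist_to_parks_any
print(distance_column("parks", "Central Park"))  # prints dist_to_parks_central_park
```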
Configuration Options
- enabled: Set to true to enable OpenStreetMap enrichment
- min_area / min_length: Filter out features smaller than this threshold
- top_n: Number of largest features to track individually
- sort_by: Property to use for sorting features (area or length)
- max_distance: Maximum distance to calculate (in meters)
- unit: Unit of measurement for distances (m for meters)
The OpenStreetMap enrichment will automatically join geographic feature data to your parcels using spatial joins, adding distance-based features to your dataset.
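To illustrate how the threshold, sort_by, and top_n options interact, here is a minimal sketch on plain dictionaries. It is not the library's implementation: select_top_features and the sample park data are made up for illustration, and it assumes features at or above the threshold are kept.

```python
def select_top_features(features, sort_by="area", min_size=0, top_n=5):
    """Drop features below min_size, then keep the top_n largest by sort_by."""
    kept = [f for f in features if f[sort_by] >= min_size]
    kept.sort(key=lambda f: f[sort_by], reverse=True)
    return kept[:top_n]


# Made-up sample data; areas are in square meters.
parks = [
    {"name": "Pocket Park", "area": 500},
    {"name": "Riverside Park", "area": 12000},
    {"name": "Central Park", "area": 40000},
]
top = select_top_features(parks, sort_by="area", min_size=2000, top_n=5)
print([p["name"] for p in top])  # prints ['Central Park', 'Riverside Park']
```

With the parks defaults (min_area of 2000, top_n of 5), the 500 square meter park is filtered out and the remaining two are ranked largest first.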