The calc expression language
OpenAVMKit's settings.json lets you derive new columns, normalize messy values, and define filter expressions without writing Python. This is done through calc blocks, which use a small Lisp-style expression language: every expression is a JSON list whose first element is an operator and whose remaining elements are operands.
This page is the authoritative reference for the operator set. For where calc is used in the broader settings tree, see Advanced settings reference. For an introduction to the calc system in context, see the tutorial § B.4.
1. Where calc is used
calc blocks appear in several places in settings.json. They all use the same expression language documented here, but the timing and the data they operate on differ:
| Location | Runs | Operates on | Typical use |
|---|---|---|---|
data.load.<id>.calc |
At file-load time, immediately after column renames and dtype coercion | The just-loaded file's columns (renamed) | Convert units (acres → sqft), build composite keys, normalize categorical values, mark valid sales |
data.process.enrich.calc.universe / .sales |
After all enrichments are complete | The fully-enriched universe / sales DataFrames | Derive features that depend on enriched columns (Census fields, distance fields, GIS area, etc.) |
Key point:
data.load.<id>.calconly sees columns that exist in the file you just loaded.data.process.enrich.calc.*sees the full enriched picture. If a calc you wrote indata.loadsays "field not found," it's almost always because the field comes from enrichment — move the calc todata.process.enrich.calc.
A sibling system, tweak, handles targeted per-row overrides — see § 5 Tweaks below.
2. Expression syntax
Every calc entry is a JSON list of the form:
["<operator>", <operand1>, <operand2>, ...]
A calc block is a dictionary mapping new column names to expressions:
{
"calc": {
"land_area_sqft": ["*", "TOTAL_ACRES", 43560],
"valid_sale": ["?", ["==", "SALE_STATUS", "str:QUALIFIED"]]
}
}
Each entry produces a column with the given name, evaluated row-by-row.
Operand types
How each operand is interpreted:
| Operand | Meaning | Example |
|---|---|---|
| Bare string (no prefix) | Column name — resolves to the named column's values for each row | "TOTAL_ACRES" |
"str:..." |
String literal (everything after the prefix) | "str:QUALIFIED" |
| Number (int or float) | Numeric literal | 43560 |
| Nested list | Recursive sub-expression — the result of evaluating it | ["+", "BLDG_AREA_1", "BLDG_AREA_2"] |
"$$path.to.value" |
Settings reference resolved by the preprocessor before calc runs (see advanced_settings.md § 1.2) |
"$$ref.calc.qual_to_perc_map" |
Why the
str:prefix exists. A bare string is interpreted as a column name. If you want a literal string, you must say so explicitly:"str:QUALIFIED", not"QUALIFIED". Without the prefix the calc engine would look for a column calledQUALIFIEDand raise an error.
3. Operator reference
3.1 Control flow
? — selection filter
["?", <filter_expression>]
Returns a boolean Series — True for rows where the filter expression matches, False otherwise. Used to derive boolean fields like valid_sale.
"valid_sale": ["?", ["==", "SALE_STATUS", "str:QUALIFIED"]]
where — conditional
["where", <filter_expression>, <value_if_true>, <value_if_false>]
Returns a Series whose values come from <value_if_true> for matching rows and <value_if_false> for non-matching rows.
"sale_price_clean": ["where",
["==", "DEED_TYPE", "str:WARRANTY"],
"SALE_PRICE",
0
]
values — multi-column packing
["values", <col1>, <col2>, ...]
Returns a sub-DataFrame containing the specified columns. Mostly used as input to operators that take multiple columns at once (notably join).
3.2 Type coercion (unary)
| Operator | Effect | Example |
|---|---|---|
asint |
Cast to nullable int64 | ["asint", "RAW_COUNT"] |
asfloat |
Cast to float | ["asfloat", "RAW_PRICE"] |
asstr |
Cast to string | ["asstr", "PARCEL_ID"] |
3.3 Math (unary)
| Operator | Effect | Example |
|---|---|---|
floor |
np.floor (round down) |
["floor", "AREA"] |
ceil |
np.ceil (round up) |
["ceil", "AREA"] |
round |
np.round (round half-to-even) |
["round", "AREA"] |
abs |
np.abs (absolute value) |
["abs", "DELTA"] |
set |
Pass-through (returns the operand unchanged) | ["set", "OTHER_FIELD"] |
3.4 Logic (unary)
| Operator | Effect | Example |
|---|---|---|
not |
Bitwise NOT (flip booleans) | ["not", "is_vacant"] |
3.5 String (unary)
| Operator | Effect | Example |
|---|---|---|
strip |
Remove leading and trailing whitespace | ["strip", "ADDRESS"] |
striplzero |
Remove leading zeros | ["striplzero", "PARCEL_ID"] |
stripkey |
Remove all whitespace and leading zeros — for normalizing parcel keys | ["stripkey", "PARCEL_ID"] |
3.6 Comparison (binary)
| Operator | Effect | Notes |
|---|---|---|
== |
Equals | When the right side is a string, both sides are stripped before comparison |
!= |
Not equals | Same string-stripping behavior |
["==", "SALE_STATUS", "str:QUALIFIED"]
["!=", "ZONING", "str:EXEMPT"]
3.7 Arithmetic (binary)
| Operator | Effect | Example |
|---|---|---|
+ |
Add | ["+", "BLDG_AREA_1", "BLDG_AREA_2"] |
- |
Subtract | ["-", "TOTAL", "PORTION"] |
* |
Multiply | ["*", "TOTAL_ACRES", 43560] |
/ |
Divide (raises on divide-by-zero) | ["/", "PRICE", "AREA"] |
// |
Integer divide (cast to int) | ["//", "AGE_YEARS", 10] |
/0 |
Z-safe divide (returns NaN on divide-by-zero instead of erroring) |
["/0", "PRICE", "AREA"] |
min |
Element-wise minimum | ["min", "VALUE_A", "VALUE_B"] |
max |
Element-wise maximum | ["max", "VALUE_A", "VALUE_B"] |
round_nearest |
Round LHS to the nearest multiple of RHS | ["round_nearest", "bldg_condition_num", 5] |
3.8 Logic (binary)
| Operator | Effect | Example |
|---|---|---|
and |
Bitwise AND on boolean Series | ["and", "is_residential", "is_owner_occupied"] |
or |
Bitwise OR on boolean Series | ["or", "is_townhome", "is_condo"] |
3.9 Set membership
| Operator | Effect | Example |
|---|---|---|
isin |
True for rows whose LHS appears in the RHS list | ["isin", "ZONING", ["str:R1", "str:R2", "str:R3"]] |
3.10 Mapping and gap-filling
map — dictionary lookup
["map", <column_or_expr>, <dict>]
Looks each value up in the dict and substitutes. Values not in the dict are passed through unchanged.
["map", "GRADE", {
"A+": "Excellent",
"A": "Excellent",
"B": "Good",
"C": "Fair"
}]
Dictionaries are commonly defined once in a ref block and referenced via $$ref.calc.<name> to keep settings.json DRY.
fillempty — replace blanks
["fillempty", <column>, <fill_value>]
Treats NaN, empty strings, and the case-insensitive strings "none", "null", "n/a", "na", "nan", "<na>" as missing, and replaces them.
fillna — replace pandas NaN
["fillna", <column>, <fill_value>]
Stricter than fillempty — only replaces actual NaN. Use fillempty for messy real-world data.
3.11 String replace and search
| Operator | Effect | Example |
|---|---|---|
replace |
Apply a dict of literal find/replace pairs to a string column | ["replace", "GRADE", {"+": "", "-": ""}] |
replace_regex |
Same as replace, but the keys are regex patterns |
["replace_regex", "ADDRESS", {"\\s+": " "}] |
contains |
True if LHS string contains RHS substring | ["contains", "DESC", "str:CONDO"] |
contains_case_insensitive |
Case-insensitive variant of contains. RHS may also be a list of needles, in which case the result is true if LHS contains any of them. |
Single: ["contains_case_insensitive", "DESC", "str:condo"]List: ["contains_case_insensitive", "DESC", ["str:condo", "str:townhome", "str:apartment"]] |
3.12 String split, join, and slice
| Operator | Effect | Example |
|---|---|---|
split_before |
Take everything before the first occurrence of the separator | ["split_before", "GRADE", "str:-"] |
split_after |
Take everything after the first occurrence of the separator | ["split_after", "GRADE", "str: "] |
join |
Concatenate the columns from a values expression with a separator |
see below |
substr |
Slice a string by index | ["substr", "PARCEL_ID", {"left": 0, "right": 6}] |
join operates on multiple columns at once, so the LHS must be a values expression:
["join",
["values", "REID", ["asstr", ["datetimestr", "SALE_DATE", "str:%m/%d/%Y"]]],
"str:---"
]
This produces a sale key like "123456---2024-03-15".
substr accepts a dict for the right-hand side: {"left": <start>, "right": <end>}. Either bound is optional.
3.13 Date
| Operator | Effect | Example |
|---|---|---|
datetime |
Parse a string column as a datetime using the given format | ["datetime", "SALE_DATE", "str:%m/%d/%Y"] |
datetimestr |
Parse and reformat as ISO date string (YYYY-MM-DD) |
["datetimestr", "SALE_DATE", "str:%m/%d/%Y %H:%M"] |
3.14 Geometry
geo_area — compute area from geometry
["geo_area", <unit>]
Computes the area of the row's geometry in the specified unit. Requires a GeoDataFrame with a geometry column. Valid units: sqft, sqm, acres, sqkm, hectares. The DataFrame is reprojected to an equal-area CRS before computing area.
"land_area_sqft": ["geo_area", "str:sqft"]
4. Worked examples
Convert acres to square feet
"land_area_sqft": ["*", "TOTAL_ACRES", 43560]
Build a composite sale key
"key_sale": ["join",
["values",
"REID",
["asstr", ["datetimestr", "SALE_DATE", "str:%m/%d/%Y %H:%M"]]
],
"str:---"
]
Normalize building grade text
"bldg_quality_txt": ["split_before",
["split_before",
["split_before", ["map", "GRADE", "$$ref.calc.qual_to_perc_map"], "str: "],
"str:-"
],
"str:+"
]
Mark valid sales
"valid_sale": ["?", ["==", "SALE_STATUS", "str:QUALIFIED"]]
Compose multiple selection criteria
"valid_residential_sale": ["?",
["and",
["==", "SALE_STATUS", "str:QUALIFIED"],
["isin", "ZONING", ["str:R1", "str:R2", "str:R3"]]
]
]
Convert grade letter to a numeric quality score
"bldg_quality_num": ["/",
["asfloat",
["replace",
["split_after", ["map", "GRADE", "$$ref.calc.qual_to_perc_map"], "str: "],
{"%": ""}
]
],
100
]
This example, drawn from the us-nc-guilford settings, shows nested operations: take a string like "B+ 85%", look it up in a reference map, split off the percentage, strip the %, cast to float, divide by 100.
Compute GIS area at load time
"land_area_sqft": ["geo_area", "str:sqft"]
(For most use cases, the basic-geo enrichment step in data.process.enrich.basic already produces land_area_gis_<unit> automatically — see advanced_settings.md § 4.1. Use geo_area in a calc only when you need it before enrichment runs.)
5. Tweaks
tweak is a sibling system that handles targeted per-row overrides. It's simpler than calc — no expression language, just key-value substitutions.
A tweak block is a list of entries, each with field, key, and values:
{
"tweak": [
{
"field": "bldg_quality_num",
"key": "PARCEL_ID",
"values": {
"12345": 0.85,
"67890": 0.90
}
}
]
}
This says: "for the row where PARCEL_ID == 12345, set bldg_quality_num to 0.85; for the row where PARCEL_ID == 67890, set it to 0.90." All other rows are unchanged.
Use tweaks for: jurisdiction-specific manual overrides — known data errors on specific parcels, hand-corrected values that don't fit a general rule.
Don't use tweaks for: rule-based transformations. Those belong in calc or data.process.fill.
6. Common pitfalls
- "Field not found" errors. A bare string operand is interpreted as a column name. If you meant a string literal, prefix it with
str:. If you meant a reference to a value defined elsewhere insettings.json, prefix it with$$. - Calc references an enriched field.
data.load.<id>.calconly sees the just-loaded file. If your calc references a Census or distance column, move it todata.process.enrich.calc. - Type errors in arithmetic. A column loaded as a string (because of a non-numeric character somewhere in the data) won't multiply or divide cleanly. Use
asfloatorasintfirst, or usereplaceto scrub the offending characters. - Division by zero.
/raises an error on zeros. Use/0for z-safe division that returnsNaNinstead. - Map values that don't match.
mappasses through unmatched values unchanged. If you need to error or fall back to a default, add a follow-upfillnaorfillempty.
7. See also
- The tutorial § B.4 — calc in the context of authoring your first
settings.json - Advanced settings reference — settings.json structure, preprocessor, enrichment, fill rules, caching
- openavmkit/calculations.py — implementation
- Canonical examples that use calc heavily: