Geocoding, reverse geocoding, and IP-based location lookup for the Axion platform.
ObjGeoLocation wraps three location resolution methods behind a consistent
GeoPointType return value:
| Input | Method | Backend |
|---|---|---|
| Address string (noisy/clean) | geocode() | OpenStreetMap Nominatim |
| Lat / Lng coordinates | reverse_geocode() | OpenStreetMap Nominatim |
| IP address | geocode_ip() | ip-api.com |
When Ollama is reachable, geocode() optionally sends the raw input
through an AI model first to clean up misspellings, abbreviations, and
ambiguous formats before hitting Nominatim.
All methods return a GeoPointType dict (or None on failure):
```python
{
    "lat": float,               # decimal degrees
    "lng": float,               # decimal degrees
    "address": str,             # human-readable display address
    "source": str,              # GeoSource enum value
    "confidence": float,        # 0.0 – 1.0 (Nominatim importance score)
    "raw": dict,                # full API response
    "original_address": str,    # present only when AI cleaned the input
}
```
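For static type checking, the shape above can be written as a TypedDict. This is only a sketch. The real alias is ObjTypes.GeoPointType, and its actual definition may differ. The `is_geo_point` helper is hypothetical.

```python
from typing import Any, TypedDict


class _GeoPointRequired(TypedDict):
    # Keys present on every successful result
    lat: float           # decimal degrees
    lng: float           # decimal degrees
    address: str         # human-readable display address
    source: str          # GeoSource enum value
    confidence: float    # 0.0 to 1.0
    raw: dict[str, Any]  # full API response


class GeoPointType(_GeoPointRequired, total=False):
    # Optional key: present only when AI cleaning rewrote the input
    original_address: str


def is_geo_point(value: object) -> bool:
    """Loose runtime check that a value carries every required GeoPointType key."""
    if not isinstance(value, dict):
        return False
    return all(key in value for key in GeoPointType.__required_keys__)
```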
```python
from ObjGeoLocation import ObjGeoLocation

geo = ObjGeoLocation()

# Noisy address → coordinates (AI cleaning enabled by default)
point = geo.geocode("123 mian rd capeTown")
# → {"lat": -33.924, "lng": 18.424, "address": "...", ...}

# Disable AI preprocessing
point = geo.geocode("1600 Amphitheatre Pkwy, Mountain View, CA", use_ai=False)

# Coordinates → address
info = geo.reverse_geocode(-33.9249, 18.4241)

# IP → approximate location
loc = geo.geocode_ip("8.8.8.8")
# → {"lat": 37.386, "lng": -122.084, "address": "Mountain View, California, US", ...}
```
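In the IP example, the address takes the form "city, region, country code". Below is a minimal sketch of mapping an ip-api.com JSON response onto the same dict shape. It assumes ip-api.com's standard JSON fields (status, city, regionName, countryCode, lat, lon). The function name and the "ip_api" source string are hypothetical stand-ins, not the module's code. The fixed confidence of 0.5 follows the module's description of geocode_ip().

```python
from typing import Any, Optional


def ip_api_to_geo_point(payload: dict[str, Any]) -> Optional[dict[str, Any]]:
    """Map an ip-api.com JSON response to a GeoPointType-shaped dict (sketch)."""
    # ip-api.com reports failures in-band with status == "fail"
    if payload.get("status") != "success":
        return None
    parts = [payload.get("city"), payload.get("regionName"), payload.get("countryCode")]
    return {
        "lat": float(payload["lat"]),
        "lng": float(payload["lon"]),  # ip-api.com uses "lon", not "lng"
        "address": ", ".join(p for p in parts if p),  # skip missing components
        "source": "ip_api",  # hypothetical stand-in for the GeoSource enum value
        "confidence": 0.5,  # IP geolocation is coarse, so confidence is fixed
        "raw": payload,
    }
```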
When use_ai=True (default) and Ollama is reachable, the raw address is
sent to the configured LLM with a short cleaning prompt before geocoding.
This improves match quality for inputs such as:

- "capetown" → "Cape Town, South Africa"
- "st" → "Street"
- "jhn" → likely corrected by context

If Ollama is not available, the call is silently skipped and the original
address is sent directly to Nominatim.
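The clean-then-fall-back behaviour can be sketched as follows. The sketch assumes Ollama's default local endpoint (http://localhost:11434/api/generate) and a non-streaming request. The prompt text, the model name llama3, and the helper names are all hypothetical; the module's actual prompt and configuration are not shown here.

```python
import json
import urllib.request
from typing import Callable, Optional

OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint
CLEAN_PROMPT = (  # hypothetical prompt wording
    "Rewrite this address in standard postal form. Fix spelling, expand "
    "abbreviations, add the country if obvious. Reply with the address only.\n\n"
)


def ollama_generate(prompt: str, model: str = "llama3", timeout: float = 10.0) -> str:
    """Call Ollama's /api/generate once (non-streaming) and return the generated text."""
    body = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode()
    req = urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read())["response"]


def clean_address(raw: str, generate: Optional[Callable[[str], str]] = None) -> str:
    """Return an AI-cleaned address, or the raw input if the model is unavailable."""
    generate = generate or ollama_generate
    try:
        cleaned = generate(CLEAN_PROMPT + raw).strip()
    except (OSError, ValueError, KeyError):
        # URLError and timeouts are OSErrors: fall back silently to the raw input
        return raw
    # An empty model reply is treated the same as a failure
    return cleaned or raw
```

Injecting `generate` keeps the fallback path testable without a running model.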
Nominatim's fair-use policy requires ≥ 1 second between requests.
All Nominatim calls (geocode and reverse) enforce this automatically via
_rate_limit().
For high-volume use, consider running a local Nominatim instance or using
a commercial geocoding provider.
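The internals of _rate_limit() are not documented here. One plausible shape is a minimum-interval limiter built on a monotonic clock. The name MinIntervalLimiter is hypothetical.

```python
import threading
import time


class MinIntervalLimiter:
    """Block until at least `min_interval` seconds have passed since the previous call."""

    def __init__(self, min_interval: float = 1.0) -> None:
        self.min_interval = min_interval
        self._last = float("-inf")  # first call never waits
        self._lock = threading.Lock()

    def wait(self) -> None:
        # Hold the lock while sleeping so concurrent callers are serialised
        with self._lock:
            delay = self._last + self.min_interval - time.monotonic()
            if delay > 0:
                time.sleep(delay)
            self._last = time.monotonic()
```

Calling `limiter.wait()` immediately before each Nominatim request keeps requests at least one second apart. A monotonic clock is used so that wall-clock adjustments cannot shorten the gap.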
geocode_ip() uses the free tier of ip-api.com:

- confidence fixed at 0.5
- None

ObjGeoLocation.yaml defines an optional cache_geo_location table for
caching results and avoiding repeat API calls for the same address.
Caching is not implemented in the base class; extend it if needed.
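One way to add caching without modifying the base class is to wrap its geocode method. This sketch keeps results in memory only; it does not use the cache_geo_location table. It also normalises case and whitespace in the cache key. CachedGeocoder and normalise_key are hypothetical names.

```python
from typing import Any, Callable, Optional

GeoPoint = dict[str, Any]


def normalise_key(address: str) -> str:
    """Collapse case and whitespace so trivially different inputs share a cache entry."""
    return " ".join(address.lower().split())


class CachedGeocoder:
    """Wrap any geocode(address) callable with an in-memory cache."""

    def __init__(self, geocode: Callable[[str], Optional[GeoPoint]]) -> None:
        self._geocode = geocode
        self._cache: dict[str, Optional[GeoPoint]] = {}

    def geocode(self, address: str) -> Optional[GeoPoint]:
        key = normalise_key(address)
        if key not in self._cache:
            # None results are cached too, so unresolvable input is not retried
            self._cache[key] = self._geocode(address)
        return self._cache[key]
```

With the real class this would be `CachedGeocoder(geo.geocode)`. Caching None avoids re-querying addresses that cannot be resolved, but it also pins transient failures such as timeouts. Drop that behaviour if retries matter.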
- requests (already in requirements.txt)
- geopy: optional; not used by the base class but available for

External-bureau responses (e.g. Experian DoNormalEnquiry) store address data
inside a JSON blob column. Two methods extract and geocode those addresses:
`geocode_excoded_payload(payload, encoding, address_type="R", use_ai=True)`

Parses a single payload string or dict and geocodes its address records.
```python
from ObjEnum import ExcodedEncoding

# payload_json is the EnvelopePayload value from bloom_donormalenquiry
results = geo.geocode_excoded_payload(
    payload_json,
    encoding=ExcodedEncoding.EXPERIAN,
    address_type="R",  # "R" residential, "W" work, "P" postal, "" = all
)
# → list of GeoPointType (one per matching address record)
```
`geocode_sql_excoded(sql, encoding, payload_column=0, address_type="R", use_ai=True)`

Runs a SQL query, reads the payload blob from each row, and geocodes all
matching address records per row.
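```python
# Runs inside an object method that provides escape_sql() (hence self);
# escaping the value guards the f-string query against SQL injection
results = geo.geocode_sql_excoded(
    f"""SELECT EnvelopePayload
        FROM bloom_donormalenquiry
        WHERE idnumber = {self.escape_sql(id_number)}""",
    encoding=ExcodedEncoding.EXPERIAN,
    address_type="R",
)
# → list[list[GeoPointType | None]]
#   outer list: one entry per SQL row
#   inner list: one entry per address record in that row's payload
```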
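The inner lists hold None for address records that failed to geocode, so callers usually flatten the result. The sketch below keeps the row and address positions, so each point can be traced back to its source. iter_points is a hypothetical helper.

```python
from typing import Any, Iterator, Optional

GeoPoint = dict[str, Any]


def iter_points(
    results: list[list[Optional[GeoPoint]]],
) -> Iterator[tuple[int, int, GeoPoint]]:
    """Yield (row_index, address_index, point), skipping addresses that failed to geocode."""
    for row_idx, row in enumerate(results):
        for addr_idx, point in enumerate(row):
            if point is not None:
                yield row_idx, addr_idx, point
```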
Address record fields used when building the address string:

| Field | Content |
|---|---|
| ADDRESS_TYPE | R Residential / W Work / P Postal |
| LINE_1 | Street number and name |
| LINE_2 – LINE_4 | Additional address lines |
| POSTAL_CODE | Postal code |
Null and blank line fields are silently skipped when building the address
string. The resulting string goes through AI cleaning and Nominatim in the
same way as any other geocode() call.
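The address-string assembly described above can be sketched as follows. Only the field names come from the table. The ", " separator and the helper names are assumptions.

```python
from typing import Any, Mapping

LINE_FIELDS = ("LINE_1", "LINE_2", "LINE_3", "LINE_4", "POSTAL_CODE")


def build_address(record: Mapping[str, Any]) -> str:
    """Join the address lines and postal code, skipping null and blank fields."""
    parts = []
    for field in LINE_FIELDS:
        value = record.get(field)
        if value is None:
            continue  # null field
        text = str(value).strip()
        if text:  # blank field after trimming
            parts.append(text)
    return ", ".join(parts)


def matches_type(record: Mapping[str, Any], address_type: str) -> bool:
    """address_type "" selects every record; otherwise compare ADDRESS_TYPE."""
    return address_type == "" or record.get("ADDRESS_TYPE") == address_type
```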
data_postcode is a shared, cross-dataset table that maps a raw postal
code string to a GeoPointType. It is populated automatically when
geomap_excoded is called with use_postcode=True.
South Africa has roughly 3,500 unique postal codes. Without this table each
batch would re-geocode the same codes via Nominatim. With it, the first batch
for any source table fills the cache; every subsequent batch (for any source
table) reads from data_postcode instead, reducing Nominatim traffic to near
zero once the table is saturated.
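Below is a minimal read-through sketch of this cache. It substitutes an in-memory SQLite database for the platform's database and uses a reduced column set. INSERT OR IGNORE is SQLite's counterpart to MySQL's INSERT IGNORE. The function names are hypothetical and only loosely mirror the module's private postcode-cache helpers.

```python
import sqlite3
from typing import Any, Callable, Optional

GeoPoint = dict[str, Any]


def ensure_postcode_table(conn: sqlite3.Connection) -> None:
    # Reduced schema: Guid, GeoSource, Package/Module and Createdate omitted
    conn.execute(
        """CREATE TABLE IF NOT EXISTS data_postcode (
               Postcode   TEXT PRIMARY KEY,
               Address    TEXT,
               Lat        REAL,
               Lng        REAL,
               Confidence REAL
           )"""
    )


def lookup_postcode(conn: sqlite3.Connection, postcode: str) -> Optional[GeoPoint]:
    row = conn.execute(
        "SELECT Address, Lat, Lng, Confidence FROM data_postcode WHERE Postcode = ?",
        (postcode,),
    ).fetchone()
    if row is None:
        return None
    return {"address": row[0], "lat": row[1], "lng": row[2], "confidence": row[3]}


def store_postcode(conn: sqlite3.Connection, postcode: str, point: GeoPoint) -> None:
    # A concurrent writer that already stored this postcode wins; no error is raised
    conn.execute(
        "INSERT OR IGNORE INTO data_postcode VALUES (?, ?, ?, ?, ?)",
        (postcode, point["address"], point["lat"], point["lng"], point["confidence"]),
    )
    conn.commit()


def geocode_postcode(
    conn: sqlite3.Connection,
    postcode: str,
    geocode: Callable[[str], Optional[GeoPoint]],
) -> Optional[GeoPoint]:
    """Serve from data_postcode if present; otherwise geocode once and store the result."""
    cached = lookup_postcode(conn, postcode)
    if cached is not None:
        return cached
    point = geocode(postcode)
    if point is not None:
        store_postcode(conn, postcode, point)
    return point
```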
- data_postcode: cross-dataset persistent cache.
- {source_table}_geomap: dataset-specific persistent cache.
- data_postcode.

data_postcode schema

| Column | Type | Notes |
|---|---|---|
| Guid | VARCHAR(36) PK | shortuuid per row |
| Postcode | VARCHAR(20) UNIQUE | raw postal code string |
| Address | TEXT | Nominatim display name |
| Lat / Lng | DECIMAL(10,7) | |
| GeoSource | VARCHAR(50) | GeoSource enum value |
| Confidence | DECIMAL(4,3) | 0.0–1.0 |
| Package / Module | VARCHAR(255) | Standard Axion columns |
| Createdate | DATETIME | |
- `_ensure_postcode_table()`: CREATE TABLE IF NOT EXISTS
- `_lookup_postcode_cache(postcode)`: SELECT by Postcode
- `_store_postcode_cache(postcode, point, package)`: INSERT IGNORE

geomap_excoded() scans a source table containing encoded payload blobs,
geocodes every address record found, and writes the results to a companion
{source_table}_geomap table, which is created automatically.
Processing is incremental — rows whose (SourceGuid, AddressType) pair
is already present in the geomap table are skipped.
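The skip rule amounts to a set-membership test on (SourceGuid, AddressType). The Python sketch below makes two assumptions: the already-mapped pairs have been loaded into a set, and limit counts only rows that still need processing. The module may do this in SQL instead. pending_rows is a hypothetical helper.

```python
from typing import Iterable, Iterator


def pending_rows(
    source_guids: Iterable[str],
    done_pairs: set[tuple[str, str]],
    address_type: str,
    limit: int = 0,
) -> Iterator[str]:
    """Yield source GUIDs whose (SourceGuid, AddressType) pair is not yet mapped."""
    emitted = 0
    for guid in source_guids:
        if (guid, address_type) in done_pairs:
            continue  # already geocoded for this address type
        yield guid
        emitted += 1
        if limit and emitted >= limit:  # limit == 0 means no limit
            return
```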
```python
from ObjEnum import ExcodedEncoding
from ObjGeoLocation import ObjGeoLocation

geo = ObjGeoLocation()

# Map all residential addresses from bloom_donormalenquiry
# that have not yet been geocoded
inserted = geo.geomap_excoded(
    source_table="bloom_donormalenquiry",
    encoding=ExcodedEncoding.EXPERIAN,
    address_type="R",  # "R" residential / "W" work / "P" postal / "" all
    use_ai=True,
    limit=500,         # process at most 500 rows per call; 0 = no limit
)
print(f"Inserted {inserted} rows into bloom_donormalenquiry_geomap")
```
`scan_all_address_types(source_table, encoding, address_types=None, **kwargs)`

Convenience wrapper that calls geomap_excoded for each address type in one
shot. Defaults to ["R", "W", "P"]. All keyword arguments are forwarded
(e.g. use_postcode, limit, min_confidence).
```python
counts = geo.scan_all_address_types(
    "bloom_donormalenquiry",
    ExcodedEncoding.EXPERIAN,
    use_postcode=True,
    use_ai=True,
    limit=1000,
    min_confidence=0.05,
)
# → {"R": 847, "W": 312, "P": 201}
```
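Below is a sketch of what such a wrapper does. It is written as a free function that receives the per-type geomap callable. The real method lives on ObjGeoLocation, and its body is not shown in this document.

```python
from typing import Any, Callable, Optional, Sequence

DEFAULT_ADDRESS_TYPES = ("R", "W", "P")


def scan_all_address_types(
    geomap: Callable[..., int],
    source_table: str,
    encoding: Any,
    address_types: Optional[Sequence[str]] = None,
    **kwargs: Any,
) -> dict[str, int]:
    """Run `geomap` once per address type and collect the inserted-row counts."""
    types = DEFAULT_ADDRESS_TYPES if address_types is None else address_types
    return {
        t: geomap(source_table=source_table, encoding=encoding, address_type=t, **kwargs)
        for t in types
    }
```

Here address_types defaults to None rather than a list literal, so the default is never a shared mutable object.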
min_confidence filter

The min_confidence parameter (default 0.0) discards any Nominatim result
whose importance score falls below the threshold before it is stored or cached.
This is useful for filtering out broad administrative-boundary matches that
happen to cover the queried postcode but give no meaningful location precision.
A recommended starting value is 0.05.
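The filter can be sketched as a single comparison applied before a result is stored or cached. The helper name is hypothetical. Treating a missing confidence as 0.0 is also an assumption.

```python
from typing import Any, Optional

GeoPoint = dict[str, Any]


def apply_min_confidence(
    point: Optional[GeoPoint], min_confidence: float = 0.0
) -> Optional[GeoPoint]:
    """Return the point if its confidence meets the threshold, else None (discarded)."""
    if point is None:
        return None
    # A missing confidence counts as 0.0, so any positive threshold drops it
    if float(point.get("confidence", 0.0)) < min_confidence:
        return None
    return point
```

With the default threshold of 0.0, every result passes, which matches the parameter's documented default.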
geomap_excoded emits a self.debug() progress line every 100 source rows,
showing rows processed, inserted, cache hits, geocoded, and low-confidence
discards. The output is visible when DO_DEBUG = True or via the MQTT debug stream.
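Below is a sketch of the counters behind such a progress line. Here `emit` stands in for self.debug(). The field names and output format are assumptions, not the module's actual log text.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class GeomapStats:
    """Counters mirroring the fields reported in the progress line (sketch)."""

    processed: int = 0
    inserted: int = 0
    cache_hits: int = 0
    geocoded: int = 0
    low_confidence: int = 0

    def summary(self) -> str:
        return (
            f"processed={self.processed} inserted={self.inserted} "
            f"cache_hits={self.cache_hits} geocoded={self.geocoded} "
            f"low_conf={self.low_confidence}"
        )

    def row_done(self, emit: Callable[[str], None], every: int = 100) -> None:
        """Count one source row and emit a progress line every `every` rows."""
        self.processed += 1
        if self.processed % every == 0:
            emit(self.summary())
```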
bloom_donormalenquiry_geomap schema

| Column | Type | Notes |
|---|---|---|
| Guid | VARCHAR(36) PK | shortuuid per geomap row |
| SourceGuid | VARCHAR(36) | → source table PK |
| IDnumber | CHAR(255) | Denormalised for lookup |
| AddressType | CHAR(1) | R / W / P |
| AddressSeq | INT | Position in address array |
| OriginalAddress | TEXT | Raw LINE_1..4 + POSTAL_CODE |
| CleanedAddress | TEXT | Nominatim display name |
| Lat / Lng | DECIMAL(10,7) | |
| GeoSource | VARCHAR(50) | GeoSource enum value |
| Confidence | DECIMAL(4,3) | 0.0–1.0 |
| Package / Module | VARCHAR(255) | Standard Axion columns |
| Createdate | DATETIME | |
See also:

- ObjGeoZone: define geographic zones and perform point-in-zone tests
- ObjTypes.GeoPointType: return type alias
- ObjEnum.GeoSource: source identifier constants
- ObjEnum.ExcodedEncoding: payload encoding constants