Urban delivery routing breaks down at the intersection level. Edge weights capture travel time between points, but the real compliance surface — loading-bay windows, low-emission zone entry flags, curb-stopping rules, and traffic-signal dwell penalties — lives on nodes. Enriching graph nodes with this operational metadata is the decisive step that separates a theoretical street network from a logistics-ready routing graph, and it sits squarely within the broader OSM Graph Architecture & Network Modeling discipline. When node attributes are missing or inconsistent, solvers generate routes that violate municipal access rules, underestimate dwell time at restricted curbs, and fail compliance audits — regardless of how carefully the edge weights have been tuned.
This page walks through a production-tested pipeline for extracting, overlaying, and validating node attributes inside defined delivery zones. The workflow assumes you have already built a topologically sound directed graph; if you are starting from raw data, the building directed graphs from OSM PBF files guide covers edge directionality and topology cleaning that must precede node enrichment.
Prerequisites
Python and library versions
# pip install "osmnx>=1.8,<2.0" "networkx>=3.0" "geopandas>=0.13" "shapely>=2.0" "pandas>=2.0" "pyarrow>=14" opening-hours
import osmnx as ox
import networkx as nx
import geopandas as gpd
import pandas as pd
from shapely.geometry import Point
System requirements
- At least 8 GB RAM for metropolitan-scale graphs (London, Paris, NYC bounding boxes)
- A local .osm.pbf extract or direct Overpass API access for the target area
- Delivery zone boundaries as GeoJSON or Shapefile (municipal open-data portals publish these for most EU and North American cities)
- A projected CRS chosen before any spatial operation — do not run joins on WGS84 coordinates
Install the conditional-access parser:
pip install opening-hours
Conceptual architecture
Node enrichment runs as a three-stage pipeline that attaches municipal geometry, OSM tag data, and external operational databases to each graph node before the graph is handed to a solver.
What each stage does:
- Spatial overlay: projects the node GeoDataFrame to a local metric CRS, spatially joins zone polygon attributes, and resolves boundary conflicts deterministically.
- Attribute mapping: parses OSM tags and external registries into normalized, solver-ready fields such as traffic_control, curb_hgv_accessible, loading_bay_capacity, and delivery_windows (time-windowed access restrictions).
- Solver handoff: validates the enriched attribute table against a strict schema, serializes node metadata separately from graph topology, and writes formats compatible with OR-Tools, OSRM, and Valhalla constraint inputs.
Node enrichment targets a different cost surface from the one covered in configuring edge weights for freight logistics, which handles segment traversal. Node attributes govern dwell time, stop eligibility, and access compliance. These are additive costs that live on vertices, not edges.
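The sketch below makes the vertex-cost idea concrete. It folds each node's penalty into a NetworkX weight function, so a shortest-path query pays for the intersection it enters. Both the penalty values and the function name node_aware_weight are hypothetical. Edges are assumed to already carry a travel_time attribute; OSMnx 1.x can add one with ox.add_edge_speeds followed by ox.add_edge_travel_times.

```python
import networkx as nx

# Hypothetical per-node delays in seconds; calibrate against observed dwell data.
NODE_PENALTY_S = {
    "signalised": 30.0,
    "stop_sign": 10.0,
    "yield": 5.0,
    "crossing": 5.0,
    "roundabout": 8.0,
    "uncontrolled": 0.0,
}


def node_aware_weight(G):
    """Build a weight function that charges each edge its travel_time plus the
    penalty of the node it enters (the head node)."""

    def weight(u, v, data):
        if G.is_multigraph():
            # For multigraphs NetworkX passes {key: attrs}; use the fastest parallel edge.
            base = min(attrs["travel_time"] for attrs in data.values())
        else:
            base = data["travel_time"]
        control = G.nodes[v].get("traffic_control", "uncontrolled")
        return base + NODE_PENALTY_S.get(control, 0.0)

    return weight
```

Usage: `nx.shortest_path(G_proj, orig, dest, weight=node_aware_weight(G_proj))`. The penalty is charged on the head node only, so the origin's own penalty is never counted. That suits routes that start at a depot.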
Step-by-step implementation
Step 1 — Graph extraction and topology validation
Extract the drivable network and isolate the node layer. Use the drive or drive_service network type to exclude pedestrian paths that cannot support commercial vehicles.
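# osmnx>=1.8,<2.0 (osmnx 2.0 replaced north/south/east/west with bbox=(left, bottom, right, top))
# networkx>=3.0
import osmnx as ox
import networkx as nx
# OSMnx keeps only the node tags listed in useful_tags_node (default: ref, highway).
# Add the tags Step 3 reads before downloading, otherwise those columns never exist.
ox.settings.useful_tags_node = ["ref", "highway", "kerb", "delivery:conditional"]
# Replace with your target area bounding box (minlat, minlon, maxlat, maxlon)
BBOX = (51.48, -0.12, 51.53, -0.06) # Central London example
G = ox.graph_from_bbox(
    north=BBOX[2], south=BBOX[0],
    east=BBOX[3], west=BBOX[1],
    network_type="drive",
    retain_all=False,  # keep only the largest weakly connected component
    simplify=True,
)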
# Validate strong connectivity before enrichment — enriching an unconnected graph
# produces node attributes that never influence any route
if not nx.is_strongly_connected(G):
    # Keep the largest strongly connected component only
    largest_scc = max(nx.strongly_connected_components(G), key=len)
    G = G.subgraph(largest_scc).copy()
# Project to local metric CRS for accurate spatial operations
G_proj = ox.project_graph(G) # auto-detects UTM zone from graph centroid
nodes_gdf, edges_gdf = ox.graph_to_gdfs(G_proj)
print(f"Projected CRS: {nodes_gdf.crs}")
print(f"Nodes: {len(nodes_gdf):,} | Edges: {len(edges_gdf):,}")
Calibration note: retain_all=False keeps only the largest weakly connected component. The strong-connectivity filter then removes nodes that cannot both reach and be reached from the main component, such as one-way stubs. For metropolitan areas with many isolated service roads this can remove 5–15 % of nodes. Inspect the count before and after, and confirm the losses are genuinely unreachable stubs, not data errors.
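The before-and-after inspection recommended above can be built into the filter itself. This sketch uses keep_largest_scc, a hypothetical helper rather than an OSMnx function. It returns the dropped node ids alongside the filtered graph, so they can be logged and spot-checked on the OSM map.

```python
import networkx as nx


def keep_largest_scc(G):
    """Return (subgraph, dropped_node_ids) for the largest strongly connected component."""
    largest = max(nx.strongly_connected_components(G), key=len)
    dropped = sorted(set(G.nodes) - largest)
    return G.subgraph(largest).copy(), dropped
```

Record G.number_of_nodes() before the call, then log len(dropped) as a share of that count. If the share is higher than expected, open a handful of the dropped ids on the map.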
Step 2 — Spatial overlay with delivery zone polygons
Load municipal zone boundaries and join their attributes to the node GeoDataFrame with a vectorized spatial join.
# geopandas>=0.13, shapely>=2.0
import geopandas as gpd
zones_gdf = gpd.read_file("delivery_zones.geojson")
zones_gdf = zones_gdf.to_crs(nodes_gdf.crs) # align to projected CRS
# Buffer zone polygons by 8 m to absorb floating-point boundary imprecision:
# nodes that fall marginally outside a polygon edge would otherwise join to no zone.
zones_buffered = zones_gdf.copy()
zones_buffered["geometry"] = zones_gdf.geometry.buffer(8)
nodes_enriched = gpd.sjoin(
nodes_gdf.reset_index(), # bring osmid into columns
zones_buffered[["geometry", "zone_type", "time_window_start",
"time_window_end", "lez_flag", "max_weight_t"]],
how="left",
predicate="intersects", # 'op' is deprecated since geopandas 0.10
)
# A node touching two overlapping zones gets one row per zone.
# Sort by strictness and deduplicate: 'restricted' > 'commercial' > 'residential'.
ZONE_RANK = {"restricted": 0, "commercial": 1, "residential": 2, "industrial": 3}
nodes_enriched["_rank"] = nodes_enriched["zone_type"].map(ZONE_RANK).fillna(99)
nodes_enriched = (
nodes_enriched
.sort_values("_rank")
.drop_duplicates(subset=["osmid"], keep="first")
.drop(columns=["_rank", "index_right"])
.set_index("osmid")
)
# Nodes outside all zones → transit-only
nodes_enriched["zone_type"] = nodes_enriched["zone_type"].fillna("transit_only")
nodes_enriched["lez_flag"] = nodes_enriched["lez_flag"].fillna(False).astype(bool)
Calibration note: The 8 m buffer is appropriate for dense urban street networks at 1:5,000 scale. Increase to 15–20 m for rural networks or lower-accuracy municipal polygon data. Buffers that are too large incorrectly pull nodes from adjacent zones.
Step 3 — Attribute mapping and enrichment
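Parse OSM tags from the node GeoDataFrame and merge them with external registries. OSM tag values in node data appear as strings or NaN, so normalise them before writing to the attribute table. Tag columns other than highway and ref (here kerb and delivery:conditional) exist only if those tags were added to ox.settings.useful_tags_node before the graph was downloaded. Otherwise the .get() fallbacks below silently assign default values to every node.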
# osmnx>=1.8, pandas>=2.0, opening-hours
import pandas as pd
from opening_hours import OpeningHours
# --- 3a: traffic control type ---
SIGNAL_MAP = {
"traffic_signals": "signalised",
"stop": "stop_sign",
"give_way": "yield",
"crossing": "crossing",
"mini_roundabout": "roundabout",
}
nodes_enriched["traffic_control"] = (
nodes_gdf.get("highway", pd.Series(dtype=str))
.map(SIGNAL_MAP)
.reindex(nodes_enriched.index)
.fillna("uncontrolled")
)
# --- 3b: curb accessibility for heavy vehicles ---
KERB_HGV = {"lowered": True, "flush": True, "raised": False, "rolled": None}
raw_kerb = nodes_gdf.get("kerb", pd.Series(dtype=str)).reindex(nodes_enriched.index)
nodes_enriched["curb_hgv_accessible"] = raw_kerb.map(KERB_HGV)
# --- 3c: parse conditional delivery restrictions ---
def parse_delivery_window(cond_tag: str | None) -> list[dict]:
    """Return [] if unrestricted, else a list of window dicts.

    Simplified: the condition is only checked for parseability and the raw tag
    is kept; extend it to emit {days, start_min, end_min} dicts in production.
    """
    if not isinstance(cond_tag, str) or not cond_tag.strip():
        return []
    try:
        # "no @ (Mo-Fr 08:00-18:00)" -> "Mo-Fr 08:00-18:00"
        OpeningHours(cond_tag.split("@", 1)[-1].strip(" ()"))
        return [{"raw": cond_tag}]
    except Exception:
        return [{"raw": cond_tag, "parse_error": True}]
raw_cond = nodes_gdf.get("delivery:conditional", pd.Series(dtype=str)).reindex(nodes_enriched.index)
nodes_enriched["delivery_windows"] = raw_cond.apply(parse_delivery_window)
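# --- 3d: loading bay capacity from external registry ---
# Aggregate per node first: several bays snapped to one node would otherwise
# produce duplicate index labels, and reindex() below raises on duplicates.
loading_bays = (
    pd.read_csv("loading_bay_registry.csv")
    .groupby("nearest_osmid")
    .agg(bay_count=("bay_count", "sum"), max_weight_t=("max_weight_t", "max"))
)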
nodes_enriched["loading_bay_capacity"] = (
loading_bays["bay_count"]
.reindex(nodes_enriched.index)
.fillna(0)
.astype(int)
)
nodes_enriched["loading_bay_max_weight_t"] = (
loading_bays["max_weight_t"]
.reindex(nodes_enriched.index)
.fillna(0.0)
)
Calibration note: delivery:conditional uses OpenStreetMap’s conditional restriction syntax, e.g. no @ (Mo-Fr 08:00-18:00). Not all cities tag this consistently — cross-validate against the municipal loading-bay register where available, as OSM coverage of loading restrictions in dense urban cores is typically 40–70 % complete.
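The Step 3 parser only confirms that a condition is parseable and keeps the raw tag. For the common single-rule form shown in this note, a helper like the one below can produce the {days, start_min, end_min} windows a time-window solver needs. parse_simple_conditional is hypothetical and works only under that narrow assumption. It rejects multi-rule values separated by ";", public-holiday or month clauses, and anything else outside the pattern. Those cases should fall back to the opening-hours library or to manual review.

```python
import re

DAYS = ["Mo", "Tu", "We", "Th", "Fr", "Sa", "Su"]
_DAY = "|".join(DAYS)
# "value @ (condition)", e.g. "no @ (Mo-Fr 08:00-18:00)"
_CONDITIONAL = re.compile(r"^\s*(?P<value>[\w:]+)\s*@\s*\((?P<cond>[^)]*)\)\s*$")
# A single day or day range, then one or more comma-separated time ranges
_RULE = re.compile(rf"^(?P<d1>{_DAY})(?:-(?P<d2>{_DAY}))?\s+(?P<times>.+)$")
_TIME = re.compile(r"(\d{2}):(\d{2})-(\d{2}):(\d{2})")


def parse_simple_conditional(tag: str) -> list[dict]:
    """Hypothetical parser for single-rule conditional tags; raises ValueError otherwise."""
    cm = _CONDITIONAL.match(tag)
    if cm is None:
        raise ValueError(f"not a conditional restriction: {tag!r}")
    rule = _RULE.match(cm["cond"].strip())
    if rule is None:
        raise ValueError(f"unsupported condition: {cm['cond']!r}")
    i = DAYS.index(rule["d1"])
    j = DAYS.index(rule["d2"] or rule["d1"])
    # Day ranges may wrap the week, e.g. Fr-Mo
    days = DAYS[i:j + 1] if i <= j else DAYS[i:] + DAYS[:j + 1]
    windows = [
        {
            "value": cm["value"],
            "days": days,
            "start_min": int(h1) * 60 + int(m1),
            "end_min": int(h2) * 60 + int(m2),
        }
        for h1, m1, h2, m2 in _TIME.findall(rule["times"])
    ]
    if not windows:
        raise ValueError(f"no time range in: {tag!r}")
    return windows
```

Windows that cross midnight, such as Fr-Mo 22:00-06:00, come out with end_min smaller than start_min. The consuming solver has to treat those windows as wrapping.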
Step 4 — Validation and consistency checks
Validate schema integrity before writing attributes back to the graph. Silent NaN values in solver inputs produce infeasible routes that are extremely hard to debug.
# pandas>=2.0
import pandas as pd
# curb_hgv_accessible is deliberately not required: None ("unknown") is a valid
# value, and most nodes carry no kerb tag at all.
REQUIRED_COLS = [
    "zone_type", "lez_flag", "traffic_control", "loading_bay_capacity",
]
# 4a: Check for unexpected nulls in mandatory columns
null_counts = nodes_enriched[REQUIRED_COLS].isnull().sum()
if null_counts.any():
    raise ValueError(f"Null values remain after enrichment:\n{null_counts[null_counts > 0]}")
# 4b: Zone consistency — loading bays must not appear in restricted zones
# where stopping is prohibited entirely
bad_bays = (
nodes_enriched
.query("zone_type == 'restricted' and loading_bay_capacity > 0")
)
if len(bad_bays):
    print(f"Warning: {len(bad_bays)} nodes have loading bays in restricted zones; review")
# 4c: Signal node summary. Compare these counts with what you expect for the
# area and flag implausible totals for manual review
sig_nodes = nodes_enriched[nodes_enriched["traffic_control"] == "signalised"]
print(f"Signalised intersections: {len(sig_nodes):,}")
print(f" With HGV-accessible curb: {sig_nodes['curb_hgv_accessible'].eq(True).sum():,}")
print(f" In LEZ: {sig_nodes['lez_flag'].sum():,}")
Step 5 — Serialization and solver integration
Export node metadata separately from graph topology to allow hot-swapping zone definitions without a full graph rebuild.
# pyarrow>=14, networkx>=3.0
import pyarrow as pa
import networkx as nx
# Serialize delivery_windows list column to JSON string for Parquet compatibility
import json
nodes_enriched["delivery_windows_json"] = (
nodes_enriched["delivery_windows"].apply(json.dumps)
)
export_cols = [
"zone_type", "lez_flag", "traffic_control", "curb_hgv_accessible",
"loading_bay_capacity", "loading_bay_max_weight_t", "delivery_windows_json",
]
nodes_enriched[export_cols].to_parquet(
"node_attributes.parquet",
engine="pyarrow",
compression="snappy",
)
# Write attributes back to the NetworkX graph for in-process routing
attr_dict = nodes_enriched[export_cols].to_dict(orient="index")
nx.set_node_attributes(G_proj, attr_dict)
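# Optionally export the full graph as GraphML for Valhalla or OR-Tools pre-processing.
# ox.save_graphml stringifies geometries, lists and None values; nx.write_graphml
# raises on these attribute types, which simplified OSMnx graphs contain.
ox.save_graphml(G_proj, filepath="enriched_graph.graphml")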
Configuration reference
| Parameter | Recommended value | Notes |
|---|---|---|
| sjoin predicate | intersects | op is deprecated since geopandas 0.10 |
| Boundary buffer | 5–10 m urban / 15–20 m rural | Prevents missed assignments from floating-point polygon edges |
| Default traffic_control | uncontrolled | Never NaN: solvers treat null as no penalty and may ignore the node |
| Default curb_hgv_accessible | None (unknown) | Distinct from False; solvers can apply a conservative penalty to None |
| Zone conflict resolution | Strictest zone wins | Prevents compliance violations at boundaries between residential and commercial zones |
| Node attribute storage | Parquet sidecar | Keeps graph topology small; zone updates don't require a full graph rebuild |
| Conditional tag parser | opening-hours library | Handles the OpenStreetMap conditional restriction syntax reliably |
| CRS for spatial join | UTM zone matching the city | Use osmnx.project_graph(); in WGS84 the metre-based buffer distance would be interpreted as degrees |
Production optimization and scaling
Vectorize all spatial operations. Avoid row-wise loops when joining nodes to zones. gpd.sjoin runs one bulk query against an STRtree spatial index (the index exposed as .sindex). By contrast, iterating with iterrows() tests nodes one at a time in Python. For 500k-node metropolitan graphs, vectorized joins run in under 30 seconds; row-wise equivalents take 20+ minutes.
Tile large cities. Process metro areas in overlapping 2 km tiles with 200 m overlap buffers. This prevents memory spikes during spatial joins and lets you parallelise zone assignments with concurrent.futures.ProcessPoolExecutor. After parallel joins, merge tiles and resolve boundary-node duplicates with the same strictest-zone rule used inside each tile.
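Here is a minimal sketch of the tiling step, assuming coordinates in the projected metric CRS from Step 1. make_tiles is a hypothetical helper. Each tuple it returns holds a tile's bounds, already expanded by the overlap.

```python
def make_tiles(minx, miny, maxx, maxy, size=2000.0, overlap=200.0):
    """Split a projected (metre) bounding box into square tiles of `size`,
    each expanded by `overlap` on every side. Returns (minx, miny, maxx, maxy) tuples."""
    tiles = []
    y = miny
    while y < maxy:
        x = minx
        while x < maxx:
            tiles.append((
                x - overlap,
                y - overlap,
                min(x + size, maxx) + overlap,
                min(y + size, maxy) + overlap,
            ))
            x += size
        y += size
    return tiles
```

You can select each tile's nodes with geopandas' coordinate indexer, for example nodes_gdf.cx[tminx:tmaxx, tminy:tmaxy], and submit the per-tile joins to a ProcessPoolExecutor. Nodes in the overlap bands appear in more than one tile. These are exactly the duplicates the strictest-zone rule resolves after the merge.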
Cache projected coordinates. Without a cache, the pipeline re-projects the graph on every run. Write the projected node GeoDataFrame to Parquet after Step 1 and reload it from disk on subsequent runs; GeoDataFrame.to_parquet and gpd.read_parquet preserve geometry and CRS. Reprojection of a 300k-node graph takes 8–12 seconds, which is meaningful in a 15-minute pipeline.
Avoid attribute bloat. OSM nodes carry dozens of tags. Only map attributes that the solver consumes. Each additional column inflates the Parquet sidecar and degrades Parquet row-group scan speed. Strip tags unrelated to routing (e.g. name, wikidata, source) before writing.
Schema versioning. Municipal delivery rules change every quarter. Stamp each attribute export with schema_version and data_timestamp metadata, and register them in a simple schema_registry.json. This enables rollback when a zone boundary update introduces unexpected constraint violations and is essential for audit compliance in regulated last-mile operations.
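One way to implement the registry is a single JSON file holding an append-only list of exports, as sketched below. The file layout, the names register_export and rollback_target, and the field names are all assumptions, not an established format. You can also embed the same two values in the Parquet file itself as key-value schema metadata, using pyarrow's Table.replace_schema_metadata.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def register_export(registry_path, export_path, schema_version, columns, data_timestamp=None):
    """Append an export record to a JSON registry and return it (hypothetical layout)."""
    registry_path = Path(registry_path)
    if registry_path.exists():
        registry = json.loads(registry_path.read_text())
    else:
        registry = {"exports": []}
    record = {
        "file": str(export_path),
        "schema_version": schema_version,
        "data_timestamp": data_timestamp or datetime.now(timezone.utc).isoformat(),
        "columns": list(columns),
    }
    registry["exports"].append(record)
    registry_path.write_text(json.dumps(registry, indent=2))
    return record


def rollback_target(registry_path):
    """Return the export registered before the latest one, or None."""
    exports = json.loads(Path(registry_path).read_text())["exports"]
    return exports[-2] if len(exports) >= 2 else None
```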
For solver-specific constraint tuning once attributes are in place, see optimizing node attributes for last-mile routing.
Validation and testing
Run these checks after every pipeline execution before handing the graph to a routing engine:
1. Attribute coverage rate. The fraction of nodes with a non-default zone assignment should match the proportion of the bounding box covered by zone polygons. A 40 % zone-covered city where only 20 % of nodes have zone assignments indicates a CRS mismatch or buffer that is too narrow.
coverage = (nodes_enriched["zone_type"] != "transit_only").mean()
print(f"Zone coverage: {coverage:.1%}")
# Expected: within ±5 pp of zone polygon area / bounding box area
2. Signal node plausibility. Spot-check ten traffic_control='signalised' nodes against Google Maps or OSM. Intersections that appear as unmarked junctions in satellite imagery indicate a tag-mapping error.
3. Loading bay cross-validation. Compare loading_bay_capacity totals per zone against the municipal loading-bay register. Totals within 10 % confirm the registry join is correct; larger gaps point to osmid-snapping errors in the registry file.
4. Conditional window parse rate. Log the fraction of delivery:conditional tags that parse without error. Below 90 % parse rate indicates non-standard tag syntax that needs an explicit normalisation rule.
5. Graph attribute round-trip. After nx.set_node_attributes(), sample 100 nodes and confirm G_proj.nodes[osmid]['zone_type'] matches the Parquet export for the same osmid.
6. Regression on a fixed test bbox. Freeze a small test bounding box (e.g. 500 × 500 m around a known loading bay cluster), run the full pipeline, and assert exact node counts and attribute distributions. Run this as part of your CI pipeline on every zone-data update.
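Checks 1, 3, 4 and 5 each reduce to a few lines. The helpers below are a hypothetical sketch that takes plain Python inputs, so you can feed them from pandas or directly from the graph:

- zone_types comes from nodes_enriched["zone_type"].
- graph_totals comes from nodes_enriched.groupby("zone_type")["loading_bay_capacity"].sum().to_dict().
- expected comes from pd.read_parquet("node_attributes.parquet")["zone_type"].to_dict(), with graph_nodes=G_proj.nodes.

The zone area is assumed to be the union of zone polygons clipped to the bounding box, in square metres of the projected CRS.

```python
import random


def zone_coverage_ok(zone_types, zone_area_m2, bbox_area_m2, tol_pp=0.05):
    """Check 1: node zone coverage vs. the zone-area share of the bounding box."""
    zone_types = list(zone_types)
    coverage = sum(z != "transit_only" for z in zone_types) / len(zone_types)
    expected = zone_area_m2 / bbox_area_m2
    return abs(coverage - expected) <= tol_pp, coverage, expected


def bay_totals_ok(graph_totals, register_totals, tol=0.10):
    """Check 3: per-zone loading-bay totals within tol of the municipal register."""
    result = {}
    for zone, reg in register_totals.items():
        got = graph_totals.get(zone, 0)
        # A zero register total only passes if the graph also has zero bays
        gap = abs(got - reg) / reg if reg else float(got != 0)
        result[zone] = gap <= tol
    return result


def conditional_parse_rate(delivery_windows):
    """Check 4: share of tagged nodes whose conditional tag parsed cleanly."""
    tagged = [w for w in delivery_windows if w]  # [] means the node is untagged
    if not tagged:
        return None
    clean = sum(not any(d.get("parse_error") for d in w) for w in tagged)
    return clean / len(tagged)


def round_trip_mismatches(graph_nodes, expected, attr="zone_type", n=100, seed=0):
    """Check 5: sample node ids and compare graph attributes with the export."""
    ids = sorted(expected)
    sample = random.Random(seed).sample(ids, min(n, len(ids)))
    return [i for i in sample if graph_nodes[i].get(attr) != expected[i]]
```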
Troubleshooting
Boundary nodes assigned to wrong zone — or not assigned at all
Symptom: Nodes on polygon edges receive transit_only classification despite clearly falling inside a zone visually.
Root cause: Floating-point imprecision in the polygon coordinates places the node geometry marginally outside the polygon in the projected CRS.
Fix: Apply an 8 m buffer to zone polygons before sjoin. If this over-captures nodes from adjacent zones, reduce the buffer and add an explicit priority rule: the zone with the smaller area wins for nodes in the buffer overlap band.
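Below is a sketch of the smaller-area priority rule, applied to the raw sjoin output before deduplication. It assumes a hypothetical zone_area_m2 column. Compute it on the unbuffered polygons in the projected CRS (zones_gdf["zone_area_m2"] = zones_gdf.geometry.area) and include it in the column list passed to sjoin. To keep the strictness rank from Step 2 as well, sort on the rank first and use area as the tie-breaker.

```python
import pandas as pd


def resolve_by_smallest_zone(joined: pd.DataFrame) -> pd.DataFrame:
    """Keep one row per osmid: the matched zone with the smallest area.

    `joined` is the raw sjoin output (one row per node/zone match) carrying a
    hypothetical `zone_area_m2` column. Unmatched nodes have NaN area, which
    sorts last, so they are kept unchanged.
    """
    return (
        joined.sort_values(["osmid", "zone_area_m2"], na_position="last")
        .drop_duplicates(subset=["osmid"], keep="first")
        .reset_index(drop=True)
    )
```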
sjoin returns duplicate node rows — node count inflated after join
Symptom: len(nodes_enriched) after sjoin exceeds len(nodes_gdf).
Root cause: Nodes touching two or more overlapping zone polygons produce one row per matching zone.
Fix: After the join, sort by zone priority rank and call drop_duplicates(subset=['osmid'], keep='first') as shown in step 2. Confirm the final count equals len(nodes_gdf).
Conditional delivery window tags fail to parse
Symptom: parse_error: True in delivery_windows for a significant fraction of nodes; delivery_windows is [] for most nodes in a city that has known restrictions.
Root cause: Non-standard tag phrasing (e.g. "No @ (weekdays)" instead of "no @ (Mo-Fr)") or encoding issues in the OSM extract.
Fix: Log all raw delivery:conditional values before parsing, build a normalisation dictionary for local variants, and run it as a preprocessing step before calling the opening-hours parser. Re-check OSM data freshness — conditional tags are frequently updated but inconsistently styled.
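A minimal normalisation pass might look like the sketch below. The mapping entries are hypothetical examples of local variants. Build the real dictionary from the logged raw values for your city, then pass the output to the opening-hours parser as before.

```python
import re

# Hypothetical local variants collected from the raw-tag log; extend per city.
CONDITIONAL_NORMALISATION = {
    r"\bweekdays\b": "Mo-Fr",
    r"\bweekends?\b": "Sa-Su",
    r"\bdaily\b": "Mo-Su",
}


def normalise_conditional(tag: str) -> str:
    """Rewrite local phrasing into standard conditional syntax before parsing."""
    out = " ".join(tag.split())  # collapse stray whitespace
    if "@" in out:
        value, condition = out.split("@", 1)
        # Restriction values are lowercase in OSM ("no", "yes", "delivery")
        out = f"{value.strip().lower()} @ {condition.strip()}"
    for pattern, replacement in CONDITIONAL_NORMALISATION.items():
        out = re.sub(pattern, replacement, out, flags=re.IGNORECASE)
    return out
```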
nx.set_node_attributes is slow on large graphs
Symptom: Writing attributes back to a 500k-node NetworkX MultiDiGraph takes several minutes.
Root cause: nx.set_node_attributes with a large dict iterates Python-side and triggers per-node dict updates.
Fix: Call nx.set_node_attributes once per attribute column rather than passing the full nested dict. Better still, for graphs above 200k nodes, keep all attributes in the Parquet sidecar and access them via Pandas during routing rather than embedding them in NetworkX node dicts.
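The per-column pattern described in the fix looks like this. set_attributes_by_column is a hypothetical wrapper around nx.set_node_attributes(G, values, name=...), which sets one named attribute from a {node: value} dict. You can build the input from the enriched table with {col: nodes_enriched[col].to_dict() for col in export_cols}.

```python
import networkx as nx


def set_attributes_by_column(G, columns):
    """Write node attributes one column at a time.

    `columns` maps attribute name -> {node_id: value}.
    """
    for name, values in columns.items():
        nx.set_node_attributes(G, values, name=name)
```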
Exported GraphML cannot be loaded by OR-Tools or Valhalla
Symptom: Solver throws a parse error on the delivery_windows_json attribute or rejects None-valued fields.
Root cause: GraphML encodes all attributes as strings; None becomes the literal string "None", and nested JSON strings need escaping that some parsers reject.
Fix: Strip complex columns (delivery_windows_json, list-type fields) from the GraphML export. Keep GraphML for topology only. Pass node constraints as a separate JSON or Parquet file referenced in the solver configuration.
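Here is a sketch of the topology-only export. It copies the graph, drops the enrichment attributes that live in the Parquet sidecar, and returns the copy for saving. The attribute list mirrors export_cols from Step 5, and topology_only_copy is a hypothetical helper.

```python
import networkx as nx

# Enrichment columns from Step 5; they are served from the Parquet sidecar instead.
SIDECAR_ATTRS = (
    "zone_type", "lez_flag", "traffic_control", "curb_hgv_accessible",
    "loading_bay_capacity", "loading_bay_max_weight_t", "delivery_windows_json",
)


def topology_only_copy(G, drop=SIDECAR_ATTRS):
    """Return a copy of G with the listed node attributes removed.

    G.copy() gives each node a new attribute dict, so G itself is untouched.
    """
    H = G.copy()
    for _, data in H.nodes(data=True):
        for key in drop:
            data.pop(key, None)
    return H
```

A call such as ox.save_graphml(topology_only_copy(G_proj), filepath="topology.graphml") then writes GraphML without the problematic columns. G_proj keeps them for in-process routing.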
Related
- Building directed graphs from OSM PBF files — topology construction that must precede node enrichment
- Configuring edge weights for freight logistics — segment-level cost functions that complement node-level constraints
- Handling turn restrictions in routing graphs — relation-based constraints that interact with node-level access rules
- Optimizing node attributes for last-mile routing — solver-specific constraint tuning once attributes are mapped
- Graph fragmentation prevention in OSM data — ensuring the topology is connected before enrichment adds operational meaning