Integrating Custom Traffic Weights into OSRM

Injecting custom traffic weights into OSRM replaces the static, OpenStreetMap-derived speeds baked into your road profile with real congestion data: historical duty-cycle averages, live probe feeds, or time-of-day speed tables. This technique is a focused extension of the OSRM Docker deployment and container orchestration workflow and slots directly into the broader Python routing engines and isochrone mapping discipline. The mechanism is OSRM’s segment-speed file: a headerless CSV of from_node_id,to_node_id,speed_kmh rows, where both IDs are OSM node IDs of directly connected nodes. osrm-customize uses these rows to overwrite edge travel costs in the Multi-Level Dijkstra (MLD) partitioned graph without requiring a full re-extraction.

Algorithm constraint: fast traffic weight injection is only practical with MLD (osrm-partition + osrm-customize). With Contraction Hierarchies, weights are baked in at osrm-contract time; osrm-contract also accepts --segment-speed-file, but every update means re-running the much slower contraction step. Every step below assumes MLD.

When to use this approach

Custom traffic weight injection is the right technique when:

Your Lua profile speeds are stale. The default car.lua profile assigns speeds from maxspeed tags and road-class heuristics. Real-world duty-cycle speeds on urban arterials can run 30–60 % below those values during peak hours.
You have time-of-day speed tables. Historical probe data (from fleet telematics, HERE, TomTom, or open sources like Uber Movement) gives you reliable average speeds by hour-of-day and day-of-week per road segment. A segment-speed file lets you parameterize osrm-customize against the correct time window for each dispatch batch.
You need sub-hourly updates without full graph rebuilds. osrm-customize on a city-scale MLD graph takes under 60 seconds, making near-real-time congestion updates practical within a Docker Compose service loop. A full re-extraction from the PBF source takes tens of minutes and is impractical at that cadence.
Your fleet operates in high-congestion corridors. For logistics engineers building travel-time matrices for last-mile delivery or fleet dispatch, a 15 % route-time error from stale speeds compounds across hundreds of stops. Injecting measured speeds cuts matrix error substantially.

This technique is not appropriate for scenarios where the network topology itself changes (new roads, reclassified highway= tags) — those changes require a fresh osrm-extract and osrm-partition pass first.
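
To make the time-of-day case above concrete, here is a minimal sketch of choosing the speed bin for a dispatch time. The table layout (a dict keyed by weekday and hour) and the names HISTORICAL_BINS, speeds_for_departure, and to_csv_lines are illustrative assumptions, not part of OSRM; the node IDs and speeds are made-up sample values.

```python
from datetime import datetime

# Hypothetical historical table (sample values only):
# (weekday, hour) -> {(from_osm_node, to_osm_node): speed_kmh}
# weekday follows datetime.weekday(), so Monday is 0.
HISTORICAL_BINS = {
    (0, 8): {(1001, 1002): 18.5, (1002, 1003): 22.0},
    (0, 14): {(1001, 1002): 41.0, (1002, 1003): 44.5},
}


def speeds_for_departure(departure: datetime, bins: dict) -> dict:
    """Return the segment speeds for the hour-of-week bin containing `departure`.

    Returns an empty dict when no bin exists, so the caller can write no rows
    and let the profile speeds stand for that window.
    """
    return bins.get((departure.weekday(), departure.hour), {})


def to_csv_lines(segment_speeds: dict) -> list:
    """Render segments as headerless 'from,to,speed' lines in sorted order."""
    return [
        f"{src},{dst},{speed:.1f}"
        for (src, dst), speed in sorted(segment_speeds.items())
    ]


# Monday 2026-06-22 08:30 falls in the (0, 8) bin.
peak = speeds_for_departure(datetime(2026, 6, 22, 8, 30), HISTORICAL_BINS)
print(to_csv_lines(peak))
```

Writing one CSV per bin ahead of time and pointing osrm-customize at the file for the current window keeps the per-update work to a file selection.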

Implementation

The pipeline has three discrete stages: running the MLD preprocessing pipeline once, converting your traffic feed into a compliant segment-speed CSV, and injecting that CSV with osrm-customize inside Docker before serving.

Stage 1 — Run the MLD preprocessing pipeline

osrm-customize operates on a graph that has already been extracted and partitioned. Run the full MLD sequence once on your source PBF file:

# Standard MLD pipeline — run once; re-run osrm-customize only on traffic updates
docker run -t -v "$(pwd):/data" osrm/osrm-backend \
  osrm-extract -p /opt/car.lua /data/region-latest.osm.pbf

docker run -t -v "$(pwd):/data" osrm/osrm-backend \
  osrm-partition /data/region-latest.osrm

# Initial customize with no custom speeds (uses profile defaults)
docker run -t -v "$(pwd):/data" osrm/osrm-backend \
  osrm-customize /data/region-latest.osrm

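osrm-extract keeps the original OSM node IDs alongside its internal graph representation, and osrm-customize matches segment-speed rows against those OSM IDs. Your traffic feed therefore needs to identify each segment by the OSM IDs of its two end nodes, taken from the same OSM snapshot as the PBF you extracted.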

Stage 2 — Build a compliant segment-speed CSV

osrm-customize reads a headerless CSV with three columns: from_node_id, to_node_id, and speed_kmh. Both node ID columns must contain integer OSM node IDs for two nodes that are directly connected in the road network; the speed column must be positive. The following script ingests a raw traffic DataFrame, applies fallback logic, clamps outliers, and writes a compliant file:

# requires: pandas>=1.4, numpy>=1.21
import pandas as pd
import numpy as np


def prepare_osrm_segment_speed_csv(
    raw_traffic_df: pd.DataFrame,
    output_path: str = "segment_speeds.csv",
    fallback_speed_kmh: float = 45.0,
    min_speed_kmh: float = 5.0,
    max_speed_kmh: float = 130.0,
) -> pd.DataFrame:
    """
    Convert a raw traffic DataFrame to an OSRM segment-speed CSV.

    Args:
        raw_traffic_df: DataFrame with columns:
            'from_node_id' (int): OSM node ID at the start of the segment
            'to_node_id'   (int): OSM node ID at the end of the segment
            'speed_kmh'    (float): observed or historical speed
        output_path: Destination path read by osrm-customize
        fallback_speed_kmh: Applied when speed is missing or zero
        min_speed_kmh: Lower clamp bound (rejects near-zero probe outliers)
        max_speed_kmh: Upper clamp bound (rejects GPS-derived spikes)

    Returns:
        Cleaned DataFrame (for audit / join against ground truth)
    """
    df = raw_traffic_df[["from_node_id", "to_node_id", "speed_kmh"]].copy()

    # Drop rows where either node ID is null — cannot produce a valid edge
    df = df.dropna(subset=["from_node_id", "to_node_id"])
    df["from_node_id"] = df["from_node_id"].astype(np.int64)
    df["to_node_id"] = df["to_node_id"].astype(np.int64)

    # Coerce speed to numeric; replace non-positive values with fallback
    df["speed_kmh"] = pd.to_numeric(df["speed_kmh"], errors="coerce")
    mask_invalid = df["speed_kmh"].isna() | (df["speed_kmh"] <= 0)
    df.loc[mask_invalid, "speed_kmh"] = fallback_speed_kmh

    # Vectorised clamp — faster than apply() for large feeds
    df["speed_kmh"] = df["speed_kmh"].clip(lower=min_speed_kmh, upper=max_speed_kmh)

    # De-duplicate: keep the mean speed for any repeated directed edge
    df = (
        df.groupby(["from_node_id", "to_node_id"], as_index=False)["speed_kmh"]
        .mean()
    )

    # Write headerless CSV — osrm-customize treats every line as data
    df.to_csv(output_path, index=False, header=False, float_format="%.1f")
    print(f"Wrote {len(df):,} directed edges to {output_path}")
    return df


# Example: load from a Parquet traffic snapshot, then build the CSV
# raw = pd.read_parquet("traffic_2026_06_23_08h.parquet")
# clean = prepare_osrm_segment_speed_csv(raw, "segment_speeds.csv")

If your traffic feed is keyed by GPS coordinates rather than OSM node IDs, snap points to their nearest OSM nodes using a spatial join before building the CSV. A typical approach is to load the node coordinates from your source PBF into a GeoDataFrame and use geopandas.sjoin_nearest; see the generating isochrones with PySAL and GeoPandas workflow for spatial join patterns you can adapt. Each output row must still reference two nodes that are directly connected along a way, so snap both ends of a segment and discard pairs that are not adjacent.
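
As a dependency-free illustration of the snapping idea, the sketch below finds the nearest node by brute-force haversine distance. It is not the geopandas approach itself: the function names, the 25 m cut-off, and the node dictionary layout are assumptions for the example, and a real feed would want a spatial index rather than a linear scan.

```python
import math


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two WGS84 points."""
    r = 6_371_000.0  # mean Earth radius in metres
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (
        math.sin(dphi / 2) ** 2
        + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    )
    return 2 * r * math.asin(math.sqrt(a))


def snap_to_nearest_node(lat, lon, nodes, max_distance_m=25.0):
    """Return the OSM node ID closest to (lat, lon), or None if none is in range.

    `nodes` maps OSM node ID -> (lat, lon). The scan is O(n) per point.
    """
    best_id, best_dist = None, float("inf")
    for node_id, (nlat, nlon) in nodes.items():
        dist = haversine_m(lat, lon, nlat, nlon)
        if dist < best_dist:
            best_id, best_dist = node_id, dist
    return best_id if best_dist <= max_distance_m else None
```

Rejecting points beyond the cut-off keeps probes from car parks or parallel service roads out of the speed file; tune the threshold to your GPS accuracy.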

Stage 3 — Inject and serve

With the CSV written, patch the partitioned graph and restart the router:

# Patch edge costs — fast; does not require re-extraction
docker run -t -v "$(pwd):/data" osrm/osrm-backend \
  osrm-customize /data/region-latest.osrm \
  --segment-speed-file /data/segment_speeds.csv

# Start the MLD routing daemon
docker run -d --name osrm-traffic \
  -p 5000:5000 \
  -v "$(pwd):/data" \
  osrm/osrm-backend \
  osrm-routed --algorithm mld /data/region-latest.osrm

Multiple speed files are accepted with repeated --segment-speed-file flags — useful for stacking a base historical layer with a live incident override layer.
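
When the update is driven from Python, the repeated flags can be assembled programmatically. This is a minimal sketch: build_customize_command is a hypothetical helper, the paths are the container-side paths used in the commands above, and which layer wins for a segment present in more than one file is something to verify on your OSRM version.

```python
def build_customize_command(graph_path, speed_files, data_dir,
                            image="osrm/osrm-backend"):
    """Build the docker argv for osrm-customize, one flag per speed file.

    `graph_path` and `speed_files` are paths as seen inside the container
    (under /data); `data_dir` is the host directory mounted there.
    """
    cmd = [
        "docker", "run", "-t",
        "-v", f"{data_dir}:/data",
        image,
        "osrm-customize", graph_path,
    ]
    for path in speed_files:
        # Repeat the flag for every layer, preserving the caller's order.
        cmd += ["--segment-speed-file", path]
    return cmd


cmd = build_customize_command(
    "/data/region-latest.osrm",
    ["/data/historical_base.csv", "/data/live_incidents.csv"],
    data_dir="/srv/osrm",
)
print(" ".join(cmd))
```

Pass the resulting list to subprocess.run(cmd, check=True) to execute it; keeping it as a list avoids shell-quoting problems with paths.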

Key parameters and tuning

Parameter	Recommended value	Sensitivity notes
`fallback_speed_kmh`	35–50 km/h for urban; 70–90 km/h for motorway	Too high → over-optimistic routes; too low → routes detour away from segments that merely lack data
`min_speed_kmh`	5 km/h	Prevents divide-by-zero in travel-time cost; captures crawl conditions
`max_speed_kmh`	130 km/h (motorway cap)	GPS probe feeds often contain spike readings of 200+ km/h from bad fixes
De-duplication strategy	`mean` over duplicate directed edges	Consider `median` if your feed contains outlier probes that skew the mean
Update cadence	5–15 min for live feeds; hourly for historical bins	Below 5 min, `osrm-customize` I/O can overlap with the routing daemon’s mmap reads — use a blue/green volume swap
Speed file line count	No hard limit; 10 M+ rows tested	Large files: pre-sort by `from_node_id` for sequential disk reads

Integration points

The segment-speed CSV connects your traffic data pipeline to OSRM’s edge cost layer without touching graph topology. Downstream integration points:

Travel-time matrix generation. After customization, the /table/v1/driving endpoint returns travel-time matrices that now reflect your injected speeds. Feed these matrices into VRP solvers or fleet dispatch systems. If you are using custom cost functions for routing solvers, the OSRM matrix becomes the primary cost input.
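
A minimal sketch of preparing a table request and post-processing the result for a solver follows. The helper names are hypothetical and the base URL is the local daemon started above; the request path and the durations field follow OSRM's table service, where durations are in seconds and unreachable pairs are null.

```python
def build_table_url(base_url, coords, profile="driving"):
    """Build a /table/v1 request URL from a list of (lon, lat) pairs."""
    coord_str = ";".join(f"{lon:.6f},{lat:.6f}" for lon, lat in coords)
    return f"{base_url}/table/v1/{profile}/{coord_str}?annotations=duration"


def durations_minutes(table_response):
    """Convert the 'durations' matrix from seconds to minutes.

    Unreachable pairs (None in the decoded JSON) are passed through unchanged.
    """
    return [
        [None if d is None else d / 60.0 for d in row]
        for row in table_response["durations"]
    ]


url = build_table_url(
    "http://localhost:5000",
    [(13.388860, 52.517037), (13.397634, 52.529407)],
)
print(url)
```

Fetch the URL with any HTTP client, decode the JSON body, and hand the matrix from durations_minutes to the solver as its cost input.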

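Blue/green volume swap for zero-downtime updates. Run osrm-customize against a staging copy of the .osrm files while the current routing daemon keeps serving from the live volume. When customization finishes, start a new osrm-routed container on the staging volume, shift traffic to it, and stop the old container. If you need an in-place reload instead, OSRM's shared-memory mode (osrm-datastore with osrm-routed --shared-memory) is the documented hot-swap mechanism. Either way, this avoids request failures during the customization write window.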

Automated cron update loop. Wrap the Python CSV generation and the docker run osrm-customize call in a scheduler (APScheduler, Airflow, or a simple cron job) keyed to your traffic data refresh interval. The OSRM Docker container setup and volume orchestration reference covers the Compose configuration for persistent volume mounts needed to make this loop reliable.
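
The loop itself can be sketched in a few lines. This is an illustration under stated assumptions rather than a production scheduler: run_update_cycle and update_loop are hypothetical names, build_csv is any callable that writes the CSV and returns its container-side path, and customize_cmd is the docker argv for osrm-customize without the speed-file flag.

```python
import subprocess
import time


def run_update_cycle(build_csv, customize_cmd, runner=subprocess.run):
    """Run one refresh: rebuild the CSV, then invoke osrm-customize.

    Returns True when the customize command exits with status 0.
    `runner` is injectable so the cycle can be exercised without Docker.
    """
    csv_path = build_csv()
    result = runner(
        customize_cmd + ["--segment-speed-file", csv_path], check=False
    )
    return result.returncode == 0


def update_loop(build_csv, customize_cmd, interval_s=600, max_cycles=None,
                runner=subprocess.run, sleep=time.sleep):
    """Repeat the cycle every `interval_s` seconds; return the failure count.

    A failed cycle is only counted: the previously customized graph stays
    in place and the next cycle retries with fresh data.
    """
    failures = 0
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        if not run_update_cycle(build_csv, customize_cmd, runner=runner):
            failures += 1
        cycles += 1
        sleep(interval_s)
    return failures
```

Leaving max_cycles as None runs indefinitely; under cron or Airflow you would call run_update_cycle once per trigger instead and let the scheduler own the timing.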

Lua profile interaction. Injected speeds apply only to the segments listed in the CSV; every other edge keeps the speed the Lua profile assigned during osrm-extract. Whether a profile-level speed cap is also enforced on injected values is worth confirming on your OSRM version: inject a deliberately high speed on one test segment and check the resulting route duration before relying on speeds above the profile's defaults.

Validation checklist

Run these checks after every osrm-customize pass before promoting the graph to production traffic:

Coverage sanity. Compare the number of rows in your segment-speed CSV against the total directed edge count in the graph. Coverage below 5 % on an urban area suggests the node-ID mapping went wrong.
Sample route comparison. Pull 20–50 routes from both the pre-customization and post-customization endpoints for corridors with known congestion. Expected outcome: post-customization travel times on congested segments are 15–40 % higher during peak hours.
Non-positive speed audit. Confirm that no zero or negative speeds reached the CSV. Run awk -F',' '$3 <= 0 {print NR": "$0}' segment_speeds.csv; the output should be empty after your Python preprocessing step.
/table/v1/driving matrix spot-check. Generate a 10×10 travel-time matrix for known origin–destination pairs. Cross-reference against historical GPS trip logs or a commercial API. Acceptable MAPE (mean absolute percentage error) for a well-tuned feed is under 12 % on arterials.
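Datasource verification. Query curl 'http://localhost:5000/route/v1/driving/lon1,lat1;lon2,lat2?annotations=true' and confirm that each leg's annotation.metadata.datasource_names array lists your speed file alongside "lua profile", and that annotation.datasources contains non-zero indices on the congested segments. If only "lua profile" appears, the speed file was not applied. The response does not report the routing algorithm, so check the container command for --algorithm mld separately.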
Daemon restart confirmation. After a blue/green swap, verify the new container is serving by checking docker logs osrm-traffic --tail 20 for the [info] running and waiting for requests startup message.
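
The coverage, non-positive speed, and MAPE checks from this list are simple enough to script. The helpers below are a minimal sketch with hypothetical names; the thresholds quoted in the checklist are left to the caller.

```python
def coverage_ratio(csv_rows, total_directed_edges):
    """Share of directed edges that received an injected speed (0.0 to 1.0)."""
    if total_directed_edges <= 0:
        return 0.0
    return csv_rows / total_directed_edges


def nonpositive_speed_lines(lines):
    """Return 1-based line numbers whose third field is missing, non-numeric, or <= 0."""
    bad = []
    for number, raw in enumerate(lines, start=1):
        fields = raw.strip().split(",")
        try:
            speed = float(fields[2])
        except (IndexError, ValueError):
            bad.append(number)
            continue
        if not speed > 0:  # also catches NaN
            bad.append(number)
    return bad


def mape(predicted, observed):
    """Mean absolute percentage error, in percent.

    Pairs with a non-positive observed value are skipped, since the
    percentage error is undefined for them.
    """
    errors = [
        abs(p - o) / o
        for p, o in zip(predicted, observed)
        if o > 0
    ]
    if not errors:
        raise ValueError("no valid observed values to compare against")
    return 100.0 * sum(errors) / len(errors)
```

For the matrix spot-check, flatten the OSRM durations and the reference trip times into two equal-length lists of the same origin–destination pairs before calling mape.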

Why does osrm-customize silently ignore my speed file?

The most common cause is a header row. osrm-customize parses every line as from_node_id,to_node_id,speed_kmh data, so a header line of column names cannot be parsed as numbers. Write with header=False. The second cause is mismatched node IDs: the file must use OSM node IDs for pairs of directly connected nodes, and rows that match no edge in the extracted graph change nothing. This typically happens when the feed was keyed against a different OSM snapshot than the PBF you extracted, or when the IDs are not OSM node IDs at all.
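
A quick pre-flight check catches the header case before osrm-customize runs. The sketch below is an assumption-labelled helper, not an OSRM tool: first_malformed_line is a hypothetical name, and it validates only the first three fields of each non-blank line.

```python
def first_malformed_line(lines):
    """Return (line_number, text) for the first line that is not int,int,positive-number.

    Returns None when every non-blank line passes. Fields beyond the third
    are not inspected.
    """
    for number, raw in enumerate(lines, start=1):
        text = raw.strip()
        if not text:
            continue  # ignore blank lines such as a trailing newline
        fields = text.split(",")
        try:
            if len(fields) < 3:
                raise ValueError("too few fields")
            int(fields[0])
            int(fields[1])
            if not float(fields[2]) > 0:
                raise ValueError("speed must be positive")
        except ValueError:
            return number, text
    return None
```

Usage is a two-liner: open segment_speeds.csv, pass the file object to first_malformed_line, and abort the update if the result is not None.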

Routes are unchanged after osrm-customize — what went wrong?

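Check the algorithm flag first: docker inspect --format '{{.Config.Cmd}}' osrm-traffic. If you see --algorithm ch, or no --algorithm flag at all (CH is the default), the container used Contraction Hierarchies. Rebuild with osrm-partition + osrm-customize (not osrm-contract) and restart with --algorithm mld. If the algorithm is correct, verify that osrm-customize actually rewrote its outputs, such as .osrm.mldgr and, depending on your OSRM version, .osrm.cell_metrics or .osrm.cells (check modification timestamps after osrm-customize completes), and that the routing daemon was restarted afterwards so it loads the new files.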
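
If both of those look right, the route response itself can show whether injected speeds are in use. The sketch below assumes a response requested with annotations=true, where each leg's annotation carries a datasources index array and metadata.datasource_names; datasource_usage is a hypothetical helper, and the name "segment_speeds" in the sample is an assumed label for the speed file.

```python
from collections import Counter


def datasource_usage(route_response):
    """Count route segments per datasource name across all routes and legs."""
    usage = Counter()
    for route in route_response.get("routes", []):
        for leg in route.get("legs", []):
            annotation = leg.get("annotation", {})
            names = annotation.get("metadata", {}).get("datasource_names", [])
            for index in annotation.get("datasources", []):
                # Index 0 is the Lua profile; higher indices are speed files.
                label = names[index] if index < len(names) else f"unknown({index})"
                usage[label] += 1
    return usage


# Hand-written sample shaped like a decoded route response (not real output).
sample = {
    "routes": [{
        "legs": [{
            "annotation": {
                "datasources": [0, 1, 1],
                "metadata": {"datasource_names": ["lua profile", "segment_speeds"]},
            }
        }]
    }]
}
print(datasource_usage(sample))
```

A result containing only "lua profile" on a corridor you know is covered by the CSV points to a node-ID mismatch rather than a serving problem.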

How often can I re-run osrm-customize without degrading routing quality?

Each osrm-customize pass is independent and idempotent: the previous custom weights are fully replaced. Cadence is limited by I/O, not graph integrity. For city-scale graphs (5–15 M edges) the step typically completes in about a minute. Run it as frequently as your traffic feed updates; use the blue/green volume pattern above so the daemon never reads half-written files during the write window.
