Integrating Custom Traffic Weights into OSRM

Integrating custom traffic weights into OSRM requires three steps. First, map your external traffic dataset to directed road segments, which OSRM identifies by pairs of consecutive OpenStreetMap node IDs. Second, write those segments and their speeds to a headerless CSV. Third, run the osrm-customize binary with --segment-speed-file to apply the speeds to the Multi-Level Dijkstra (MLD) partitioned graph. The updated .osrm files are then served by osrm-routed, which uses your congestion-adjusted speeds instead of the profile's default OpenStreetMap-derived speeds during route calculation. When traffic changes, only the customization step has to run again, not extraction or partitioning. This means the same pipeline can apply refreshed or historical traffic data, so logistics engineers and GIS developers can generate realistic travel-time matrices for fleet dispatch and Python Routing Engines & Isochrone Mapping workflows.

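1. Map Traffic Data to OSM Node Pairs

OSRM's routing graph uses internal integer identifiers for its edges, but these IDs are internal to each build and are not what the traffic update step reads. Instead, the segment speed file identifies each directed segment by the OSM IDs of its two endpoint nodes. A way with nodes A, B, C yields the segments A→B and B→C, plus C→B and B→A if the way is two-way. To apply custom weights, you therefore need a lookup table that maps your traffic records (OSM way IDs, link IDs, or GPS traces) to these node pairs.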
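The expansion of one way into directed node pairs can be sketched as follows. This assumes you already have each way's ordered node IDs and its oneway tag, for example from a PBF reader. The function name and the simplified oneway handling are illustrative. OSRM's car profile applies additional rules that this ignores, such as implied oneways on roundabouts and access restrictions.

```python
from typing import List, Sequence, Tuple

NodePair = Tuple[int, int]


def way_to_directed_pairs(node_ids: Sequence[int], oneway: str = "no") -> List[NodePair]:
    """Expand one OSM way's ordered node list into directed (from, to) pairs.

    Simplified oneway handling (an assumption, not OSRM's full profile logic):
    "yes"/"true"/"1" -> forward only, "-1"/"reverse" -> backward only,
    anything else -> both directions.
    """
    forward = [(a, b) for a, b in zip(node_ids, node_ids[1:])]
    # Reverse direction: walk the way backwards, flipping each pair
    backward = [(b, a) for a, b in reversed(forward)]
    tag = oneway.strip().lower()
    if tag in ("yes", "true", "1"):
        return forward
    if tag in ("-1", "reverse"):
        return backward
    return forward + backward
```

For a two-way street with nodes 1, 2, 3, `way_to_directed_pairs([1, 2, 3])` returns `[(1, 2), (2, 3), (3, 2), (2, 1)]`, one entry per line you may later write to the segment speed file.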

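Extract and partition the same .osm.pbf that you will use to build the lookup, so that the node IDs match the graph:

```shell
osrm-extract -p profiles/car.lua region-latest.osm.pbf
osrm-partition region-latest.osrm
```

osrm-extract writes binary .osrm.* files and does not export a CSV of its edges. Build the node-pair lookup directly from the source extract instead: read each way's ordered node IDs and oneway tag with a PBF reader such as pyosmium, and emit one row per directed segment.

If your traffic feed uses raw GPS coordinates instead of OSM references, map-match the traces first. OSRM's own /match service returns the OSM node IDs along the matched path when called with annotations=nodes. A nearest-segment spatial join such as geopandas.sjoin_nearest() is a cruder alternative, because it cannot distinguish the two directions of a two-way street. Always verify that your coordinates are longitude/latitude in EPSG:4326, the reference system of OSM data.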
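A minimal sketch of turning a map-matching result into node pairs follows. It assumes you called OSRM's /match service with annotations=nodes (for example `GET /match/v1/driving/{lon,lat;lon,lat;...}?annotations=nodes`) and parsed the JSON response. The function name is hypothetical, and attaching each trace's observed speed to the resulting pairs is left to your feed logic.

```python
from typing import Any, Dict, List, Tuple


def matched_node_pairs(match_response: Dict[str, Any]) -> List[Tuple[int, int]]:
    """Collect directed OSM node pairs from a parsed OSRM /match response.

    Assumes the request used annotations=nodes, so each leg carries
    leg["annotation"]["nodes"]: the OSM node IDs along the matched path.
    """
    pairs: List[Tuple[int, int]] = []
    seen = set()
    for matching in match_response.get("matchings", []):
        for leg in matching.get("legs", []):
            nodes = leg.get("annotation", {}).get("nodes", [])
            for pair in zip(nodes, nodes[1:]):
                # Consecutive legs can share a segment; keep the first occurrence
                if pair not in seen:
                    seen.add(pair)
                    pairs.append(pair)
    return pairs
```

Because the pairs follow the matched direction of travel, they can be written to the segment speed file as-is, without guessing which side of a two-way road the vehicle used.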

2. Format & Validate the Traffic CSV

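osrm-customize reads traffic updates from a CSV passed with --segment-speed-file. The file has no header row and lists one directed segment per line, in the form from_osm_node_id,to_osm_node_id,speed_kmh. Each line may optionally include a fourth rate column, which only matters for profiles whose weight is not plain duration. For example, a line 1001,1002,32 (hypothetical node IDs) sets the segment from node 1001 to node 1002 to 32 km/h. The reverse direction keeps its profile speed unless it has its own line.

Critical unit requirements:

  • speed must be in km/h. osrm-customize converts it into a duration using each segment's length, so you do not supply durations or weights yourself.
  • OSRM stores durations and weights internally in deciseconds (tenths of a second), but the HTTP API reports durations in seconds. Compare API output against ground truth in seconds.
  • from_osm_node_id and to_osm_node_id must be OSM node IDs from the same extract passed to osrm-extract, listed in the direction of travel.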
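OSRM performs the km/h-to-duration conversion itself. Still, the arithmetic is useful for sanity-checking the speeds you write: a segment of length d metres travelled at v km/h takes d / (v / 3.6) seconds, or 10 · d · 3.6 / v deciseconds. The helper below is a sketch of that formula only. Its name and the floor of 1 decisecond are illustrative choices and do not reproduce OSRM's exact rounding rules.

```python
def segment_duration_ds(length_m: float, speed_kmh: float) -> int:
    """Approximate traversal time of a segment in deciseconds (0.1 s)."""
    if speed_kmh <= 0:
        raise ValueError("speed must be positive")
    seconds = length_m / (speed_kmh / 3.6)  # km/h -> m/s, then time = distance / speed
    # Illustrative floor: never report a segment as taking zero time
    return max(1, round(seconds * 10))
```

For example, a 100 m segment at 36 km/h (10 m/s) takes 10 s, which is 100 deciseconds.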

A malformed line, such as a header row or a non-numeric field, can make the update step abort. Node pairs that do not correspond to any segment in the graph are simply not applied, so check coverage yourself. Clamp unrealistic speeds (for example above 200 km/h, or zero and negative values), drop rows with missing node IDs, and keep only one line per directed pair. For exact parsing behavior and schema validation rules, consult the official OSRM traffic documentation.
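Those validation rules can be sketched with the standard library alone. The function name, the 5 to 200 km/h bounds, and the "last value wins" policy for repeated pairs are assumptions of this example, and speeds are rounded to whole km/h for simplicity. The function returns the CSV text so you can inspect it before writing it to disk.

```python
import csv
import io
import math
from typing import Dict, Iterable, Tuple

Row = Tuple[int, int, float]


def write_segment_speeds(rows: Iterable[Row], min_kmh: float = 5.0, max_kmh: float = 200.0) -> str:
    """Return headerless from_osm_node_id,to_osm_node_id,speed CSV text.

    Rows with non-positive node IDs or non-finite speeds are dropped,
    speeds are clamped to [min_kmh, max_kmh], and a repeated directed
    pair keeps its last value.
    """
    latest: Dict[Tuple[int, int], float] = {}
    for from_id, to_id, speed in rows:
        if from_id <= 0 or to_id <= 0 or not math.isfinite(speed):
            continue
        latest[(from_id, to_id)] = min(max(speed, min_kmh), max_kmh)
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    for (from_id, to_id), speed in latest.items():
        writer.writerow([from_id, to_id, round(speed)])
    return buf.getvalue()
```

Given rows `(1, 2, 32.4)`, `(2, 1, 250.0)`, a NaN speed, a later `(1, 2, 40.0)`, and a row with node ID 0, the result is two lines: `1,2,40` and `2,1,200`.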

3. Python Preprocessing Pipeline

The following script ingests a raw traffic feed keyed by OSM way ID and expands it onto the directed node pairs from step 1. It can optionally fill gaps with a fallback speed, then clamps speeds to safe bounds and writes a headerless segment speed file. The script assumes the lookup is a CSV with the columns osm_way_id, from_osm_node_id, and to_osm_node_id, one row per directed segment. That file layout is a convention of this guide, not something OSRM produces.

```python
from typing import Optional

import pandas as pd


def prepare_osrm_traffic_csv(
    raw_traffic_df: pd.DataFrame,
    node_pair_lookup_path: str,
    output_path: str = "traffic.csv",
    fallback_speed_kmh: Optional[float] = None,
    min_speed_kmh: float = 5.0,
    max_speed_kmh: float = 180.0,
) -> None:
    """
    Converts raw traffic data to an OSRM segment speed file.

    Args:
        raw_traffic_df: DataFrame with columns ['osm_way_id', 'speed_kmh'] and,
            optionally, 'congestion_factor' (travel-time multiplier, 1.0 = free flow)
        node_pair_lookup_path: CSV with columns ['osm_way_id', 'from_osm_node_id',
            'to_osm_node_id'], one row per directed segment (built in step 1)
        output_path: Destination CSV path for osrm-customize --segment-speed-file
        fallback_speed_kmh: Speed for segments without traffic data; None omits
            them so they keep the profile's speeds
        min_speed_kmh / max_speed_kmh: Safety bounds for speed clamping
    """
    id_cols = ['osm_way_id', 'from_osm_node_id', 'to_osm_node_id']
    # Nullable Int64 keeps 64-bit OSM IDs exact (no float casting on missing values)
    segments = pd.read_csv(node_pair_lookup_path, dtype={c: 'Int64' for c in id_cols})
    traffic = raw_traffic_df.astype({'osm_way_id': 'Int64'})

    # Attach traffic to every directed segment of its way;
    # validate= raises if the feed has more than one row per way
    merged = segments.merge(traffic, on='osm_way_id', how='left', validate='many_to_one')

    if fallback_speed_kmh is None:
        # Segments without traffic are left out and keep their profile speed
        merged = merged.dropna(subset=['speed_kmh'])
    else:
        merged['speed_kmh'] = merged['speed_kmh'].fillna(fallback_speed_kmh)

    # Optional congestion multiplier: travel time x factor == speed / factor
    if 'congestion_factor' in merged.columns:
        factor = merged['congestion_factor'].fillna(1.0).clip(lower=1e-3)
        merged['speed_kmh'] = merged['speed_kmh'] / factor

    # Clamp speeds to realistic bounds (also lifts zero or negative speeds)
    merged['speed_kmh'] = merged['speed_kmh'].clip(lower=min_speed_kmh, upper=max_speed_kmh)

    # Drop rows with missing node IDs and keep one line per directed pair
    output_df = merged.dropna(subset=['from_osm_node_id', 'to_osm_node_id'])
    output_df = output_df.drop_duplicates(subset=['from_osm_node_id', 'to_osm_node_id'], keep='last')
    output_df = output_df[['from_osm_node_id', 'to_osm_node_id', 'speed_kmh']].copy()
    output_df['speed_kmh'] = output_df['speed_kmh'].round().astype(int)

    # OSRM expects no header row and no index column
    output_df.to_csv(output_path, index=False, header=False)
    print(f"✅ Exported {len(output_df)} segments to {output_path}")

# Example usage ("region-node-pairs.csv" is a hypothetical lookup file from step 1):
# prepare_osrm_traffic_csv(raw_df, "region-node-pairs.csv", "traffic.csv")
```

4. Inject Weights & Serve the Graph

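Once the CSV is generated, run osrm-customize to apply it to the partitioned graph. This step reads the partition produced by osrm-partition and converts each listed speed into a segment duration and weight. It then recomputes the per-cell metrics that MLD uses at query time, so neither osrm-extract nor osrm-partition has to be rerun.

```shell
osrm-customize region-latest.osrm --segment-speed-file traffic.csv
```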

After customization, the .osrm files contain your traffic weights. Start the routing daemon to serve requests:

```shell
osrm-routed --algorithm=MLD region-latest.osrm
```
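If you drive these steps from Python, a small wrapper keeps the order and flags in one place. This sketch assumes the OSRM binaries are on PATH and the car profile lives at profiles/car.lua. The function names are hypothetical. osrm-routed is left out because it is a long-running daemon. Start or restart it after the wrapper finishes, since it loads the .osrm files at startup.

```python
import subprocess
from typing import List


def mld_traffic_commands(base: str, speed_file: str, profile: str = "profiles/car.lua") -> List[List[str]]:
    """Command lines for the MLD traffic pipeline, in execution order.

    base is the extract path without its extension, e.g. "region-latest".
    """
    return [
        ["osrm-extract", "-p", profile, f"{base}.osm.pbf"],
        ["osrm-partition", f"{base}.osrm"],
        ["osrm-customize", f"{base}.osrm", "--segment-speed-file", speed_file],
    ]


def run_pipeline(base: str, speed_file: str) -> None:
    """Run each step in order, stopping at the first failure."""
    for cmd in mld_traffic_commands(base, speed_file):
        subprocess.run(cmd, check=True)  # raises CalledProcessError on a non-zero exit
```

Passing arguments as lists instead of a shell string avoids quoting problems with paths that contain spaces.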

If you are containerizing this workflow, make sure your docker-compose.yml mounts the directory containing the .osrm files as a persistent volume. It should also run osrm-customize before osrm-routed starts and pass the --algorithm=MLD flag to osrm-routed. For a complete reference on container orchestration, volume mapping, and health checks, see Deploying OSRM with Docker for Local Routing.

Validation & Troubleshooting

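Troubleshooting by symptom:

  • Routes ignore the traffic data even though osrm-customize exits with status 0. Root cause: the node pairs do not match any segment in the graph (IDs taken from a different extract, or listed against the direction of travel), or osrm-routed is still serving files it loaded before customization. Fix: build the lookup from the same .osm.pbf passed to osrm-extract, include both directions of two-way roads, and restart osrm-routed after every customization run.
  • Many NoRoute responses. Root cause: files from different runs are mixed, for example a partition or customization produced from an older extract. Fix: re-run osrm-extract, osrm-partition, and osrm-customize on the exact same .osrm dataset.
  • Travel times are much longer than expected after customization. Root cause: a low fallback speed applied to every segment that lacks traffic data, overriding the profile's speeds. Fix: raise fallback_speed_kmh or stop applying a fallback, and verify that your lookup covers the full bounding box.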

Always validate your output against ground-truth travel times. Use the /table/v1/driving endpoint to generate a sample travel-time (duration) matrix, adding annotations=duration,distance if you also need distances. Compare the results against historical GPS logs or commercial routing APIs. For programmatic validation, use the validate and indicator parameters of pandas DataFrame.merge to audit join cardinality and confirm that no duplicate segment mappings inflate your results.
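The comparison can be sketched as follows. This assumes you have already fetched `/table/v1/driving/{coordinates}` and parsed the JSON, whose "durations" matrix is in seconds and holds null for unreachable pairs. It also assumes you have observed travel times, in seconds, for the same origin-destination pairs in the same order. The function name and the choice of signed and absolute percentage error are illustrative.

```python
import math
from typing import Dict, List, Optional, Sequence


def duration_errors(osrm_table: Dict, observed_s: Sequence[Sequence[Optional[float]]]) -> Dict[str, float]:
    """Compare an OSRM /table response with observed travel times in seconds."""
    errors: List[float] = []
    for row_model, row_obs in zip(osrm_table["durations"], observed_s):
        for model, obs in zip(row_model, row_obs):
            # Skip unreachable pairs (null), missing observations, and zero
            # observations such as the matrix diagonal
            if model is None or obs is None or obs <= 0:
                continue
            errors.append((model - obs) / obs)
    if not errors:
        return {"n": 0, "mean_pct": math.nan, "mean_abs_pct": math.nan}
    return {
        "n": len(errors),
        "mean_pct": 100 * sum(errors) / len(errors),  # signed: > 0 means OSRM is slower
        "mean_abs_pct": 100 * sum(abs(e) for e in errors) / len(errors),
    }
```

The signed mean shows systematic bias, for example congestion factors that are too aggressive. The absolute mean shows how far individual origin-destination pairs are off.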