How to Extract OSM Road Networks with Osmium
To extract OSM road networks with Osmium, clip the .pbf file with the osmium extract CLI using a spatial filter (bounding box or GeoJSON polygon), then apply tag-based filtering with osmium tags-filter to keep routing-relevant highway classes. For automated pipelines, use the pyosmium Python bindings to implement a SimpleHandler that filters ways, extracts directional attributes (oneway, maxspeed, lanes), and exports to GeoJSON or routing-ready PBF without loading the full dataset into RAM. This streaming architecture is critical when Building Directed Graphs from OSM PBF Files for logistics routing, fleet optimization, or urban mobility simulations.
Environment & Compatibility Requirements
Osmium is a C++ library (libosmium) with Python bindings (pyosmium) that ship as compiled extensions. Compatibility varies by platform: when pip cannot use a prebuilt wheel and falls back to a source build, missing system dependencies will cause pip install osmium to fail during compilation.
| Component | Minimum Version | Notes |
|---|---|---|
| OS | Linux (glibc 2.17+), macOS 11+, WSL2 (Windows) | On Windows, WSL2 or conda-forge is the simplest route |
| Python | 3.9–3.12 | 3.8 lacks required C-API features for pyosmium |
| libosmium | 2.x | Required for PBF streaming and geometry factories |
| System Deps | zlib, protobuf, bzip2, expat | Install via package manager before a source build with pip |
Recommended installation:
conda install -c conda-forge pyosmium osmium-tool
If pip has to build from source, install the headers first:
# Ubuntu/Debian
sudo apt install libosmium2-dev zlib1g-dev libprotobuf-dev libbz2-dev libexpat1-dev
pip install osmium
Refer to the official Osmium Tool documentation for platform-specific troubleshooting and advanced build flags.
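Before installing anything, it can help to check what is already present. The following is a minimal sketch that uses only the standard library; the helper name check_environment is hypothetical, and the Python version range is simply the one listed in the table above.

```python
import importlib.util
import shutil
import sys


def check_environment(min_py=(3, 9), max_py=(3, 12)):
    """Report whether the interpreter, pyosmium, and the osmium CLI look usable."""
    major_minor = sys.version_info[:2]
    return {
        # Version range copied from the compatibility table (an assumption of this sketch)
        "python_in_range": min_py <= major_minor <= max_py,
        # pyosmium is imported under the module name "osmium"
        "pyosmium_installed": importlib.util.find_spec("osmium") is not None,
        # osmium-tool provides the "osmium" executable
        "osmium_cli_on_path": shutil.which("osmium") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{name}: {'ok' if ok else 'missing'}")
```

A missing entry only tells you which of the installation steps above still needs to run; it does not verify that the installed versions work together.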
CLI Extraction Workflow
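The fastest path for regional extraction chains two commands: osmium extract cuts the file down to a bounding box, and osmium tags-filter keeps only the ways that carry a highway tag. Both commands stream the .pbf rather than loading it into memory. Note that the default extract strategy (complete_ways) reads the input twice and keeps ways that cross the boundary intact instead of cutting them.
# Step 1: spatial cut to the bounding box
osmium extract -b -122.5,37.7,-122.3,37.8 \
  north-america-latest.osm.pbf -o sf_extract.pbf
# Step 2: keep ways tagged highway=* and the nodes they reference
osmium tags-filter sf_extract.pbf w/highway -o sf_roads.pbf
Key parameters:
- -b min_lon,min_lat,max_lon,max_lat: Defines the bounding box in WGS84 coordinates, longitude before latitude.
- w/highway: A tags-filter expression that matches ways with any highway value. Narrow it with a value list such as w/highway=motorway,trunk,primary. Matching ways keep all of their tags, so oneway, maxspeed, lanes, bridge, tunnel, and surface remain available for routing cost functions.
- -o: Output path. The format is inferred from the file extension, for example .osm.pbf, .osm (XML), or .opl.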
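For polygon-based clipping, replace -b with --polygon region.geojson. The file must contain a single polygon or multipolygon boundary. With the default complete_ways strategy, ways that cross the boundary are kept whole together with all of their nodes, so road connectivity at the edge of the region is preserved. The resulting .pbf can be ingested directly into graph builders or converted to GeoJSON using osmium export.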
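A pipeline often has to generate the boundary file and the command line programmatically. The sketch below builds a rectangular GeoJSON polygon and assembles the osmium extract arguments as a list suitable for subprocess.run. The helper names bbox_to_geojson and build_extract_command are hypothetical, and the command is only constructed here, not executed.

```python
import json


def bbox_to_geojson(min_lon, min_lat, max_lon, max_lat):
    """Return a GeoJSON Feature whose polygon covers the bounding box."""
    if not (-180 <= min_lon < max_lon <= 180 and -90 <= min_lat < max_lat <= 90):
        raise ValueError("expected min_lon < max_lon and min_lat < max_lat in WGS84 degrees")
    ring = [
        [min_lon, min_lat], [max_lon, min_lat],
        [max_lon, max_lat], [min_lon, max_lat],
        [min_lon, min_lat],  # a GeoJSON ring repeats its first point
    ]
    return {
        "type": "Feature",
        "properties": {},
        "geometry": {"type": "Polygon", "coordinates": [ring]},
    }


def build_extract_command(source, output, polygon_path):
    """Assemble the argument list for a polygon-based osmium extract call."""
    return ["osmium", "extract", "--polygon", polygon_path, source, "-o", output]


if __name__ == "__main__":
    # Same area as the bounding-box example, expressed as a polygon file
    with open("region.geojson", "w") as f:
        json.dump(bbox_to_geojson(-122.5, 37.7, -122.3, 37.8), f)
    print(" ".join(build_extract_command(
        "north-america-latest.osm.pbf", "sf_extract.pbf", "region.geojson")))
```

The range check also catches the common mistake of passing latitude before longitude for areas like this one, because a longitude of -122 is not a valid latitude.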
Python Automation Script
Backend developers and GIS engineers typically require programmatic extraction to integrate with CI/CD pipelines or custom graph builders. The following script uses osmium.SimpleHandler to filter roads, reconstruct geometries, and export to GeoJSON:
import json
import sys

import osmium


class RoadNetworkHandler(osmium.SimpleHandler):
    # Routing-relevant highway classes per OSM tagging conventions
    HIGHWAY_FILTER = {
        'motorway', 'trunk', 'primary', 'secondary', 'tertiary',
        'residential', 'unclassified', 'service', 'motorway_link',
        'trunk_link', 'primary_link', 'secondary_link', 'tertiary_link'
    }

    def __init__(self, output_path):
        super().__init__()
        self.output_path = output_path
        self.features = []

    def way(self, w):
        highway = w.tags.get('highway')
        if highway not in self.HIGHWAY_FILTER:
            return
        coords = []
        for nd in w.nodes:
            # locations=True attaches coordinates to every node reference
            if not nd.location.valid():
                return  # Skip incomplete geometries
            coords.append((nd.location.lon, nd.location.lat))
        if len(coords) < 2:
            return  # A LineString needs at least two points
        feature = {
            "type": "Feature",
            "properties": {
                "highway": highway,
                "oneway": w.tags.get("oneway", "no"),
                "maxspeed": w.tags.get("maxspeed", ""),
                "lanes": w.tags.get("lanes", ""),
                "surface": w.tags.get("surface", ""),
                "osm_id": w.id
            },
            "geometry": {
                "type": "LineString",
                "coordinates": coords
            }
        }
        self.features.append(feature)

    def write_geojson(self):
        # SimpleHandler has no end-of-file callback, so this is called explicitly
        with open(self.output_path, 'w') as f:
            json.dump({"type": "FeatureCollection", "features": self.features}, f, indent=2)


if __name__ == "__main__":
    if len(sys.argv) != 3:
        print("Usage: python extract_roads.py <input.pbf> <output.geojson>")
        sys.exit(1)
    handler = RoadNetworkHandler(sys.argv[2])
    handler.apply_file(sys.argv[1], locations=True)
    handler.write_geojson()
Execution:
python extract_roads.py sf_roads.pbf sf_roads.geojson
The handler relies on the node location index that locations=True enables, so it does not need its own coordinate cache. It filters ways against a strict highway allowlist, extracts routing attributes, and writes the collection once parsing has finished. Run it on a regional extract rather than a continental file, because all matching features are held in memory until the write. For a complete breakdown of tag semantics and edge-case handling, consult the OSM Wiki highway key documentation.
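After a run, a quick sanity check of the export helps catch an empty or mis-filtered result. The sketch below is a standard-library helper (the name summarize_features is hypothetical) that counts features per highway class and reports how many lack a maxspeed value. It assumes the property names written by the script above.

```python
import json
import sys
from collections import Counter


def summarize_features(collection):
    """Count features per highway class and tally missing maxspeed values."""
    by_class = Counter()
    missing_maxspeed = 0
    for feature in collection.get("features", []):
        props = feature.get("properties", {})
        by_class[props.get("highway", "unknown")] += 1
        if not props.get("maxspeed"):
            missing_maxspeed += 1
    return {
        "total": sum(by_class.values()),
        "by_class": dict(by_class),
        "missing_maxspeed": missing_maxspeed,
    }


if __name__ == "__main__":
    # Usage: python summarize.py sf_roads.geojson
    with open(sys.argv[1]) as f:
        print(json.dumps(summarize_features(json.load(f)), indent=2))
```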
Routing & Graph Integration
Extracted road networks rarely feed directly into production routing engines. Before ingestion, you must normalize directional constraints, resolve disconnected components, and assign traversal costs. This preprocessing stage is foundational to OSM Graph Architecture & Network Modeling, where raw geometries are transformed into adjacency structures optimized for Dijkstra, A*, or contraction hierarchies.
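To make the target structure concrete, here is a minimal sketch of turning extracted LineString features into a directed adjacency list. It assumes oneway has already been normalized to yes, no, or -1, uses coordinate tuples as node keys, and weights each edge by its haversine length in meters. The function names are hypothetical, and production graph builders typically key nodes by OSM node ID and split ways at intersections instead.

```python
import math
from collections import defaultdict

EARTH_RADIUS_M = 6371000.0


def haversine_m(a, b):
    """Great-circle distance in meters between two (lon, lat) points."""
    lon1, lat1, lon2, lat2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def build_adjacency(features):
    """Map each node to a list of (neighbor, length_m) directed edges."""
    graph = defaultdict(list)
    for feature in features:
        coords = [tuple(c) for c in feature["geometry"]["coordinates"]]
        oneway = feature["properties"].get("oneway", "no")
        for u, v in zip(coords, coords[1:]):
            length = haversine_m(u, v)
            if oneway != "-1":
                graph[u].append((v, length))  # travel in digitized direction
            if oneway != "yes":
                graph[v].append((u, length))  # travel against digitized direction
    return dict(graph)
```

A two-way street therefore contributes two edges per segment, while oneway=yes and oneway=-1 contribute a single edge in the allowed direction.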
When preparing data for graph builders:
- Normalize oneway values: Convert yes, 1, -1, and reversible into boolean or directional flags.
- Parse maxspeed: Strip units (km/h, mph) and convert to a consistent numeric baseline.
- Handle lanes: Split multi-lane highways into parallel edges when modeling capacity or toll routing.
- Validate geometry: Remove self-intersecting ways and snap endpoints to ensure graph connectivity.
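The tag normalization steps above can be sketched as three small parsers. This is an illustrative sketch, not a complete treatment of OSM tagging: the function names and the returned direction labels are hypothetical, km/h is used as the numeric baseline, and values that do not parse cleanly (for example country-coded defaults such as DE:urban, or lane lists such as 2;3) are returned as None so the caller can decide on a fallback.

```python
import re

MPH_TO_KMH = 1.609344

_MAXSPEED_RE = re.compile(r"\s*(\d+(?:\.\d+)?)\s*(km/h|kmh|kph|mph)?\s*")


def normalize_oneway(value):
    """Return 'forward', 'backward', 'reversible', or 'both'."""
    text = (value or "").strip().lower()
    if text in {"yes", "true", "1"}:
        return "forward"
    if text in {"-1", "reverse"}:
        return "backward"
    if text == "reversible":
        return "reversible"
    return "both"


def parse_maxspeed(value):
    """Return the speed limit in km/h as a float, or None if unparseable."""
    if not value:
        return None
    match = _MAXSPEED_RE.fullmatch(value.lower())
    if match is None:
        return None  # non-numeric values such as 'signals' or 'DE:urban'
    speed = float(match.group(1))
    # A bare number is interpreted as km/h; only an explicit mph suffix converts
    return speed * MPH_TO_KMH if match.group(2) == "mph" else speed


def parse_lanes(value):
    """Return the lane count as a positive int, or None if unparseable."""
    try:
        lanes = int(value)
    except (TypeError, ValueError):
        return None
    return lanes if lanes > 0 else None
```

Returning None instead of guessing keeps the fallback policy, such as a default speed per highway class, in one place in the graph builder.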
Performance & Best Practices
- Stream over load: Never parse a full .pbf into memory. Use CLI extraction for regional cuts, and reserve pyosmium for attribute transformation or custom filtering.
- Pre-filter at source: Run osmium tags-filter right after extraction to reduce downstream I/O and parsing overhead.
- Use locations=True: In apply_file(), this flag builds the node location index that gives way nodes their coordinates. Without it, node references carry invalid locations and geometry reconstruction fails.
- Batch writes: For large extracts, buffer features and flush to disk in chunks rather than accumulating a single JSON array in memory.
- Validate output: Run osmium fileinfo -e to check object counts and spatial bounds, and osmium tags-filter to confirm that the expected tags were retained.
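The batch-write advice can be sketched as a small writer that buffers features and flushes them as newline-delimited GeoJSON, one feature per line, so that no single array has to be held in memory. The class name FeatureWriter and the default chunk size are hypothetical choices for illustration, and the newline-delimited layout is a swapped-in format rather than the FeatureCollection produced by the script above.

```python
import io
import json


class FeatureWriter:
    """Buffer GeoJSON features and flush them to a text stream in chunks."""

    def __init__(self, stream, chunk_size=1000):
        self.stream = stream
        self.chunk_size = chunk_size
        self.buffer = []
        self.written = 0

    def add(self, feature):
        # Serialize immediately so the buffer holds compact strings only
        self.buffer.append(json.dumps(feature, separators=(",", ":")))
        if len(self.buffer) >= self.chunk_size:
            self.flush()

    def flush(self):
        if self.buffer:
            self.stream.write("\n".join(self.buffer) + "\n")
            self.written += len(self.buffer)
            self.buffer.clear()

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.flush()  # write whatever is left in the final partial chunk
        return False


if __name__ == "__main__":
    out = io.StringIO()  # stands in for open("roads.geojsonl", "w")
    with FeatureWriter(out, chunk_size=2) as writer:
        for osm_id in range(5):
            writer.add({"type": "Feature", "properties": {"osm_id": osm_id}, "geometry": None})
    print(out.getvalue(), end="")
```

In a handler, calling writer.add(feature) inside way() in place of appending to a list keeps memory use bounded by the chunk size.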
By combining Osmium’s streaming parser with strict tag filtering, you can reliably produce lightweight, routing-ready datasets that scale from municipal boundaries to continental networks.