Online Data Downloads
This chapter focuses on downloading vector data directly from online providers into Whitebox Workflows. The initial implementation supports OpenStreetMap (OSM), queried through the Overpass API with download_osm_vector.
Scope and Current Provider
Current tool:
wbe.vector.online_data.download_osm_vector(...)
The tool downloads OSM features within a longitude/latitude bounding box (EPSG:4326), optionally filters them by theme, and writes the result in any supported vector format, chosen from the output file extension.
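Because the extent is given as plain longitude/latitude, a quick pre-flight check can catch swapped or out-of-range coordinates before any request is sent. The sketch below uses a hypothetical helper, validate_bbox, which is not part of Whitebox Workflows, and assumes the box does not cross the antimeridian (west < east).

```python
def validate_bbox(west: float, south: float, east: float, north: float) -> None:
    """Hypothetical pre-flight check for an EPSG:4326 bounding box."""
    if not (-180.0 <= west <= 180.0 and -180.0 <= east <= 180.0):
        raise ValueError("longitudes must lie in [-180, 180]")
    if not (-90.0 <= south <= 90.0 and -90.0 <= north <= 90.0):
        raise ValueError("latitudes must lie in [-90, 90]")
    # Assumption: antimeridian-crossing boxes (west > east) are not handled here.
    if west >= east:
        raise ValueError("west must be less than east")
    if south >= north:
        raise ValueError("south must be less than north")


validate_bbox(-80.54, 43.41, -80.47, 43.47)  # the Kitchener box used in this chapter passes
```

Swapping west and east (or south and north) raises a ValueError instead of producing an empty or rejected query.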
Quick Start
import whitebox_workflows as wbw
wbe = wbw.WbEnvironment()
roads = wbe.vector.online_data.download_osm_vector(
west=-80.54,
south=43.41,
east=-80.47,
north=43.47,
filter_preset="roads",
include_points=False,
include_lines=True,
include_polygons=False,
)
wbe.write_vector(roads, "kitchener_roads.geojson")
Presets and Filters
Preset classes:
all, roads, buildings, water, landuse, trails, parks, rail, amenities, boundaries, transit, poi
Optional custom filters:
- filter_key="amenity"
- filter_key_value="amenity=school"
If custom filters are supplied, they take precedence over preset filtering.
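To see what such filters correspond to on the Overpass side, the sketch below turns a preset or custom filter into an Overpass QL tag selector and wraps it in a bbox query. It is an illustration only, not the query download_osm_vector actually emits: the preset-to-tag table is hypothetical, and checking filter_key_value before filter_key is an assumption. The Overpass QL syntax itself (nwr, the (south, west, north, east) bbox order, out geom) is standard.

```python
from typing import Optional

# Hypothetical preset-to-tag table for illustration; the tool's real presets may use other tags.
PRESET_SELECTORS = {
    "all": "",
    "roads": '["highway"]',
    "buildings": '["building"]',
}


def tag_selector(filter_preset: Optional[str] = None,
                 filter_key: Optional[str] = None,
                 filter_key_value: Optional[str] = None) -> str:
    """Return an Overpass QL tag filter; custom filters win over the preset."""
    if filter_key_value:  # assumption: the key=value form is checked first
        key, _, value = filter_key_value.partition("=")
        return f'["{key}"="{value}"]'
    if filter_key:
        return f'["{filter_key}"]'
    if filter_preset:
        return PRESET_SELECTORS[filter_preset]
    return ""


def build_query(west: float, south: float, east: float, north: float,
                selector: str, timeout: int = 60) -> str:
    """Build an Overpass QL query; Overpass orders the bbox as (south, west, north, east)."""
    return (f"[out:json][timeout:{timeout}];"
            f"nwr{selector}({south},{west},{north},{east});"
            "out geom;")


print(build_query(-80.54, 43.41, -80.47, 43.47,
                  tag_selector("roads", filter_key_value="amenity=school")))
```

Here the custom amenity=school filter replaces the roads preset, mirroring the precedence rule above.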
Geometry Controls
Use geometry toggles to reduce result size and parsing overhead:
include_points, include_lines, include_polygons
Typical examples:
- Road centerlines only: points off, lines on, polygons off
- Building footprints only: points off, lines off, polygons on
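The toggles act as a filter on geometry type. As a rough illustration, the same selection applied after the fact to GeoJSON-style feature dictionaries looks like this; the tool itself applies the toggles during the download, which is what saves parsing work. filter_by_geometry and the type table are hypothetical names used only for this sketch.

```python
# Map GeoJSON geometry types onto the three toggle classes.
GEOMETRY_CLASSES = {
    "Point": "points", "MultiPoint": "points",
    "LineString": "lines", "MultiLineString": "lines",
    "Polygon": "polygons", "MultiPolygon": "polygons",
}


def filter_by_geometry(features, include_points=True, include_lines=True, include_polygons=True):
    """Keep only features whose geometry class is switched on (hypothetical helper)."""
    keep = {"points": include_points, "lines": include_lines, "polygons": include_polygons}
    selected = []
    for feature in features:
        geometry = feature.get("geometry") or {}  # GeoJSON allows null geometries
        kind = GEOMETRY_CLASSES.get(geometry.get("type"))
        if kind is not None and keep[kind]:
            selected.append(feature)
    return selected


# "Road centerlines only": points off, lines on, polygons off.
features = [{"geometry": {"type": t}} for t in ("Point", "LineString", "Polygon")]
centerlines = filter_by_geometry(features, include_points=False, include_lines=True, include_polygons=False)
```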
Additional output option:
- split_output_by_geometry=True writes separate files with _points, _lines, and _polygons suffixes.
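When scripting around split output, it helps to know which file names to expect. This sketch assumes the suffix is inserted between the file stem and the extension (for example kitchener_trails.geojson becoming kitchener_trails_lines.geojson); that naming pattern is an assumption, and split_output_paths is a hypothetical helper.

```python
from pathlib import Path


def split_output_paths(output: str) -> dict:
    """Return the assumed per-geometry file paths for a split output (hypothetical helper)."""
    path = Path(output)
    # Assumption: "_points", "_lines", "_polygons" go before the extension.
    return {
        kind: path.with_name(f"{path.stem}_{kind}{path.suffix}")
        for kind in ("points", "lines", "polygons")
    }


for kind, candidate in split_output_paths("kitchener_trails.geojson").items():
    # Geometry classes that were switched off may not produce a file at all.
    print(kind, candidate, candidate.exists())
```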
Caching and Provenance
Use optional caching when iterating on the same AOI/filter query repeatedly:
- cache_dir=".wbw_cache/osm"
- cache_ttl_hours=24 (set 0 to disable TTL checks)
Use provenance_output to write a JSON sidecar with endpoint, bbox, filters, feature counts, and cache usage metadata.
trails = wbe.vector.online_data.download_osm_vector(
west=-80.54,
south=43.41,
east=-80.47,
north=43.47,
filter_preset="trails",
include_points=False,
include_lines=True,
include_polygons=False,
split_output_by_geometry=True,
cache_dir=".wbw_cache/osm",
cache_ttl_hours=24,
provenance_output="kitchener_trails_provenance.json",
output="kitchener_trails.geojson",
)
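The cache decision boils down to two questions: is this the same query, and is the cached copy still young enough? The sketch below models both with a stable hash of the query parameters and a TTL check in which 0 disables expiry, so any cached entry is reused. How the tool actually keys and timestamps its cache entries is not described here; cache_key and is_cache_fresh are hypothetical.

```python
import hashlib
import json
import time
from typing import Optional


def cache_key(query_params: dict) -> str:
    """Stable hash of the query parameters; sorted keys make argument order irrelevant."""
    payload = json.dumps(query_params, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def is_cache_fresh(written_at: float, ttl_hours: float, now: Optional[float] = None) -> bool:
    """Return True if an entry written at `written_at` (epoch seconds) may be reused."""
    if ttl_hours == 0:  # TTL checks disabled: reuse regardless of age
        return True
    now = time.time() if now is None else now
    return (now - written_at) <= ttl_hours * 3600.0


params = {"west": -80.54, "south": 43.41, "east": -80.47, "north": 43.47, "filter_preset": "trails"}
print(cache_key(params)[:16])
```

Keying on the full parameter set means changing the AOI or any filter produces a new entry rather than a stale hit.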
Projection and Output
Rules:
- By default, the query extent is interpreted as EPSG:4326 (lon/lat).
- Set input_extent_epsg to provide west/south/east/north in another CRS (the bbox is transformed to EPSG:4326 before querying Overpass).
- Output stays in EPSG:4326 unless output_epsg is provided.
- Output format is inferred from the filename extension (.shp, .gpkg, .geojson, .topojson, ...).
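To make the input_extent_epsg step concrete, the sketch below performs the conversion for Web Mercator (EPSG:3857), where the inverse is closed-form. It is a standalone illustration, not the tool's transformation code. Because Mercator is axis-aligned, transforming two opposite corners gives the exact lon/lat box; for general projections a box's edges curve in lon/lat, so a library such as pyproj with densified edges is the safer route.

```python
import math

R = 6378137.0  # sphere radius used by EPSG:3857 (WGS 84 semi-major axis)


def mercator_to_lonlat(x: float, y: float) -> tuple:
    """Inverse spherical (Web) Mercator: metres to degrees."""
    lon = math.degrees(x / R)
    lat = math.degrees(2.0 * math.atan(math.exp(y / R)) - math.pi / 2.0)
    return lon, lat


def bbox_3857_to_4326(west: float, south: float, east: float, north: float) -> tuple:
    # Longitude depends only on x and latitude only on y, so corners suffice here.
    w, s = mercator_to_lonlat(west, south)
    e, n = mercator_to_lonlat(east, north)
    return w, s, e, n
```

In a real call you would instead pass the Mercator coordinates directly with input_extent_epsg=3857 and let download_osm_vector perform this conversion.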
Endpoint selection:
- overpass_profile supports: main, kumi, fr, custom.
- overpass_url overrides the selected profile URL when provided.
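The resolution order can be sketched as: an explicit URL wins, otherwise the profile is looked up in a table. The URLs below are the well-known public Overpass instances that the profile names suggest, but the tool's own table, its default profile, and whether custom strictly requires overpass_url are assumptions in this hypothetical resolve_endpoint helper.

```python
from typing import Optional

# Assumed profile table (public Overpass instances); the tool's own mapping may differ.
OVERPASS_PROFILES = {
    "main": "https://overpass-api.de/api/interpreter",
    "kumi": "https://overpass.kumi.systems/api/interpreter",
    "fr": "https://overpass.openstreetmap.fr/api/interpreter",
}


def resolve_endpoint(overpass_profile: str = "main", overpass_url: Optional[str] = None) -> str:
    """Pick the Overpass endpoint URL; an explicit overpass_url always takes priority."""
    if overpass_url:
        return overpass_url
    if overpass_profile == "custom":  # assumption: custom needs an explicit URL
        raise ValueError("the 'custom' profile requires overpass_url")
    try:
        return OVERPASS_PROFILES[overpass_profile]
    except KeyError:
        raise ValueError(f"unknown overpass_profile: {overpass_profile!r}") from None
```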
Large-AOI chunking:
- chunk_large_aoi=True (default) automatically tiles large query extents.
- chunk_max_area_deg2=4.0 controls the maximum area per chunk.
- max_chunk_count=64 caps the number of generated chunk requests.
- chunk_parallel_requests=1 (default) controls bounded parallel chunk fetching; set it above 1 to fetch chunks concurrently.
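One way to picture the chunking is a regular grid of cells, each no larger than chunk_max_area_deg2, fetched through a bounded worker pool. The grid strategy, the decision to raise when the cap is exceeded, and the helper names chunk_bbox and fetch_chunks are assumptions for this sketch; the tool's own tiling and cap enforcement may differ.

```python
import math
from concurrent.futures import ThreadPoolExecutor


def chunk_bbox(west, south, east, north, chunk_max_area_deg2=4.0, max_chunk_count=64):
    """Split a lon/lat box into grid cells of at most chunk_max_area_deg2 (hypothetical)."""
    width, height = east - west, north - south
    if width * height <= chunk_max_area_deg2:
        return [(west, south, east, north)]
    side = math.sqrt(chunk_max_area_deg2)  # square cells of at most the allowed area
    nx, ny = math.ceil(width / side), math.ceil(height / side)
    if nx * ny > max_chunk_count:  # this sketch refuses rather than truncating
        raise ValueError(f"{nx * ny} chunks exceeds max_chunk_count={max_chunk_count}")
    dx, dy = width / nx, height / ny  # equal cells that exactly tile the box
    return [
        (west + i * dx, south + j * dy, west + (i + 1) * dx, south + (j + 1) * dy)
        for j in range(ny)
        for i in range(nx)
    ]


def fetch_chunks(chunks, fetch, chunk_parallel_requests=1):
    """Call fetch(chunk) for each chunk with at most chunk_parallel_requests in flight."""
    with ThreadPoolExecutor(max_workers=chunk_parallel_requests) as pool:
        return list(pool.map(fetch, chunks))  # results keep the chunk order
```

The Kitchener box used in this chapter covers about 0.0042 square degrees, so under the default 4.0 limit it stays a single request.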
buildings = wbe.vector.online_data.download_osm_vector(
west=-80.54,
south=43.41,
east=-80.47,
north=43.47,
filter_preset="buildings",
include_points=False,
include_lines=False,
include_polygons=True,
output_epsg=32617,
)
wbe.write_vector(buildings, "kitchener_buildings_utm17n.gpkg")
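The example uses EPSG:32617 (WGS 84 / UTM zone 17N) because Kitchener lies in UTM zone 17. For other AOIs, the zone can be derived from the bbox centre with the standard 6-degree formula. utm_epsg_for is a hypothetical helper, and it ignores the special zone widths around Norway and Svalbard.

```python
import math


def utm_epsg_for(lon: float, lat: float) -> int:
    """WGS 84 / UTM EPSG code for a point (ignores Norway/Svalbard zone exceptions)."""
    zone = int(math.floor((lon + 180.0) / 6.0)) % 60 + 1
    return (32600 if lat >= 0 else 32700) + zone  # 326xx north, 327xx south


west, south, east, north = -80.54, 43.41, -80.47, 43.47
epsg = utm_epsg_for((west + east) / 2, (south + north) / 2)  # 32617 for this AOI
```

The result can then be passed as output_epsg=epsg in the download call.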
Operational Guidance
Overpass public endpoints enforce rate limits. Prefer smaller AOIs and bounded requests.
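Keeping AOIs compact is easier when you know how big a box really is, since a degree of longitude shrinks toward the poles. The hypothetical helper below computes the area of a lon/lat box on a sphere as R^2 * dlon * (sin(north) - sin(south)), using the mean Earth radius; it is an estimate, not a value the tool reports.

```python
import math

EARTH_RADIUS_KM = 6371.0088  # mean Earth radius


def bbox_area_km2(west: float, south: float, east: float, north: float) -> float:
    """Spherical area of a lon/lat box in square kilometres (hypothetical helper)."""
    dlon = math.radians(east - west)
    return EARTH_RADIUS_KM ** 2 * dlon * (math.sin(math.radians(north)) - math.sin(math.radians(south)))


print(round(bbox_area_km2(-80.54, 43.41, -80.47, 43.47), 1))  # roughly 38 km^2 for the Kitchener box
```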
Recommended practice:
- Keep AOIs compact
- Use thematic filters
- Set max_elements defensively
- Increase timeout_seconds for denser urban queries
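When a public endpoint is busy, waiting and retrying with growing delays is gentler than immediate retries. Whether download_osm_vector retries internally, and which exception it raises on rate limiting, is not stated here, so this hypothetical with_backoff wrapper retries on any exception; narrow that in practice.

```python
import random
import time


def with_backoff(call, max_attempts=4, base_delay_s=5.0, sleep=time.sleep):
    """Run call(); on failure wait base_delay_s * 2**attempt (plus jitter) and retry."""
    for attempt in range(max_attempts):
        try:
            return call()
        except Exception:  # assumption: the tool's rate-limit exception type is unknown
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay_s * 2 ** attempt + random.uniform(0.0, 1.0))
```

Usage would wrap the documented call in a lambda, for example with_backoff(lambda: wbe.vector.online_data.download_osm_vector(west=-80.54, south=43.41, east=-80.47, north=43.47, filter_preset="roads")).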
Attribution and Licensing
OSM data are provided under the Open Database License (ODbL). When distributing derived datasets or maps, ensure proper OpenStreetMap attribution and verify downstream licensing obligations for your use case.
Companion Example
See:
crates/wbw_python/examples/osm_download_vector.py