Overview

This manual is the long-form user guide for the WbW-Py API.

WbW-Py (Whitebox Workflows for Python) is the Python frontend to Whitebox Next Gen, exposing a high-performance geospatial engine for raster, vector, lidar, and sensor-bundle workflows through a Pythonic API.

Whitebox is a long-running geospatial project that began in academia and has evolved into a broad analysis platform with particular strength in geomorphometry, hydrological analysis, lidar processing, and remote sensing. Whitebox Next Gen is the current Rust-based major iteration, designed to pair high performance with modern APIs.

Whitebox Next Gen is intentionally full-stack: core geospatial capabilities that are often delegated to external C/C++ dependencies in other GIS platforms (for example raster I/O, projections, geometry/topology operations, and lidar handling) are implemented in the Whitebox codebase itself. This architecture is unusual in GIS and provides practical benefits for users: consistent behavior across platforms, tighter control over correctness and performance, fewer system-level dependency issues during installation, and faster iteration when fixing bugs or introducing new capabilities.

Within that architecture, WbW-Py is the user-facing orchestration layer for Python users. It is intended for three common use cases:

  • exploratory research notebooks and small scripts,
  • reproducible batch workflows in production pipelines,
  • integration with the broader Python geospatial ecosystem.

This manual balances beginner accessibility with reference-grade detail:

  • beginner friendly: clear chapter flow and runnable examples,
  • canonical reference: explicit patterns, boundary conditions, and validation guidance aligned with backend capabilities.

How to Use This Manual

This guide is designed for practitioners who want to move from exploratory scripts to reliable, repeatable geospatial pipelines. The examples are intentionally script-first, but each chapter also communicates design intent: when to keep data in memory, when to persist to disk, and how to validate each processing stage.

Readers get the best results by following chapters in order the first time, then using the script index as a task-oriented lookup once the core patterns are familiar.

When adapting examples, treat each script as a template with four parts:

  1. session or environment setup,
  2. data intake,
  3. transformation chain,
  4. validation and persistence.

This structure helps keep your own scripts consistent as they grow.

Goals:

  • Comprehensive API coverage.
  • Practical, script-first learning path.
  • Consistent chapter flow aligned with the project README.

Documentation style rules:

  • Every major concept must include executable code snippets.
  • Each chapter should include at least one complete end-to-end script.
  • Tool-specific parameter details are linked to tool reference docs where needed.

Common Pitfalls and Validation

  • Confirm inputs exist and have the expected CRS/schema/metadata before running long workflows.
  • Prefer explicit output names and formats for reproducibility, especially in batch scripts.
  • Re-open and inspect outputs after major steps to validate assumptions before chaining more tools.
  • For performance-sensitive runs, start with a small representative subset, then scale to full data.

Write Option References

For quick access to output option tables:

Installation and Setup

Installation is intentionally separated from workflow chapters so failures are detected early and diagnosed in isolation. The smoke tests here are not just "does import work" checks; they confirm that bindings, core runtime components, and representative interop pathways are all healthy on your machine.

For team environments, treat this chapter as a baseline validation checklist before onboarding scripts or CI jobs.

Development Install

Use this path for local development or source-based testing. It installs the package and links the Python layer with the current workspace backend.

./scripts/dev_python_install.sh

Pro Build

Use this path when testing environments that include optional pro-enabled capabilities.

./scripts/dev_python_install.sh --pro

Smoke Tests

Run both smoke tests before starting deeper debugging. The first validates import and startup; the second checks an interop roundtrip so I/O boundaries are confirmed.

python3 crates/wbw_python/examples/python_import_smoke_test.py
python3 crates/wbw_python/examples/interop_roundtrip_smoke_test.py

Build and Preview

This manual is structured as an mdBook project.

Treat documentation builds as part of normal engineering hygiene. A clean build confirms that chapter links, code fences, and chapter ordering remain coherent as examples evolve. Live preview is especially useful when refining conceptual text, because it helps ensure headings and narrative pacing stay readable.

Install mdBook

cargo install mdbook

Build the Manual

Run a full build when editing chapter structure, links, or long conceptual sections.

From repository root:

cd crates/wbw_python/manual
mdbook build

Generated HTML is written to book/.

Live Preview

Use live preview while refining wording and section flow in narrative-heavy chapters.

cd crates/wbw_python/manual
mdbook serve --open

Authoring Notes

  • Keep chapter order aligned with src/SUMMARY.md.
  • Prefer runnable snippets over pseudo-code.
  • Link chapter recipes to concrete scripts listed in the script index.

Quick Start

This chapter gives you the shortest path to a successful first run. Think of it as a minimal "vertical slice": create an environment, load a dataset, run a tool, and write output. The goal is to verify that your runtime, paths, and basic mental model are all working before you invest in larger workflows.

After this first pass, the rest of the manual progressively explains why each step matters and how to make the same pattern robust for production scripts.

The example below demonstrates the default object-first workflow pattern. It reads a DEM, runs a hydrological tool, and writes an output raster. If this runs successfully, your environment and core processing path are functioning.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster("dem.tif")
filled = wbe.hydrology.depressions_storage.fill_depressions(dem)
wbe.write_raster(filled, "dem_filled.tif")

These chapters deepen the same lifecycle shown above: discovery before execution, explicit progress handling, and script catalogs for adaptation.

Quick-Start Conventions

  • Prefer object-first workflows (read_* -> transform -> write_*).
  • Validate tool visibility for scripted pipelines.
  • Persist only outputs you need to keep.

Environment and Discovery

This chapter covers environment lifecycle and tool discovery patterns.

Environment setup is where workflow reliability begins. By explicitly creating and configuring a runtime environment, you make script behavior predictable across machines and sessions. Discovery APIs then let you verify capability before execution, which avoids long-running failures caused by missing tools, unexpected categories, or version mismatches.

Create and Configure Environment

This example establishes an explicit runtime configuration. In production scripts, this is where you set working directory and verbosity policy.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
wbe.working_directory = '/path/to/data'
wbe.verbose = True

print(wbe.version())
print(wbe.license_type())

Namespace Discovery

Use discovery to understand available capability before writing long workflows or generating dynamic tooling UIs.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

print('categories:', wbe.categories())
print('domains:', wbe.domain_namespaces())
print('terrain tools sample:', wbe.terrain.list_tools()[:15])

Search and Describe Tools

This pattern is useful when you know a task but not the exact tool identifier.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

matches = wbe.search_tools('slope')
for m in matches[:5]:
	print(m.get('tool_id', m))

desc = wbe.describe_tool('slope')
print(desc)

Discovering Parameter Values with describe_tool

describe_tool returns a dictionary containing structured parameter metadata. Each entry in the params list includes:

KeyAlways presentDescription
nameyesParameter name as used in tool calls
descriptionyesHuman-readable description
requiredyesTrue if the parameter must be supplied
typewhen setSemantic type hint: "string", "float", "int", "bool", "path", "array[int]"
choiceswhen setList of valid string values for constrained parameters
default_valuewhen setDefault value as a string, for display purposes

Use this to enumerate valid values before calling a tool:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

desc = wbe.describe_tool('lidar_tin_gridding')
for p in desc['params']:
    name = p['name']
    choices = p.get('choices')
    default = p.get('default_value')
    if choices:
        print(f"{name}: choices={choices}, default={default!r}")
    else:
        print(f"{name}: type={p.get('type', 'any')}, default={default!r}")

Example output (selected parameters):

interpolation_parameter: choices=['elevation', 'intensity', 'class', 'return_number', 'number_of_returns', 'scan_angle', 'time', 'rgb', 'user_data'], default='elevation'
returns_included: choices=['all', 'first', 'last'], default='all'
triangulation_backend: choices=['auto', 'delaunator', 'wbtopology'], default='auto'
triangulation_thin_method: choices=['nearest_center', 'min_value', 'max_value'], default='nearest_center'
triangulation_thin_cell_size: type=float, default='0.0'

This is especially useful for building dynamic UIs, generating documentation, or writing validation helpers that do not hard-code allowed values.

Validate Tool Availability

Use hard checks like this in batch scripts so failures occur immediately, before expensive processing begins.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
available = set(wbe.list_tools())

required = {'fill_depressions', 'd8_flow_accum'}
missing = sorted(required - available)
if missing:
	raise RuntimeError(f'Missing required tools: {missing}')

Category Access Patterns

Prefer direct properties when available:

  • wbe.hydrology
  • wbe.terrain
  • wbe.raster
  • wbe.vector
  • wbe.lidar
  • wbe.remote_sensing
  • wbe.conversion
  • wbe.streams
  • wbe.precision_agriculture
  • wbe.other

Use generic accessors for dynamic workflows:

  • wbe.category(name)
  • wbe.domain(name)

Projection/georeferencing tools are cataloged under the canonical category key projection_georeferencing and can be queried with:

proj_tools = wbe.category('projection_georeferencing').list_tools()
print(proj_tools[:10])

WbEnvironment Method Reference

This reference lists callable WbEnvironment methods with brief descriptions. Special Python dunder methods are intentionally omitted.

Constructors

MethodDescription
from_floating_license_idCreate an environment using a floating license identifier and provider settings.
from_signed_entitlement_fileCreate an environment from a signed entitlement file and public-key metadata.
from_signed_entitlement_jsonCreate an environment from signed entitlement JSON and public-key metadata.

Runtime and Licensing

MethodDescription
versionReturn the Whitebox runtime version string.
license_typeReturn the active license mode (for example open, floating, or entitlement-backed).
license_infoReturn detailed license status information for diagnostics.

Discovery and Introspection

MethodDescription
list_toolsReturn all available tool IDs visible in the current environment.
categoriesReturn the list of top-level tool categories.
categoryReturn a category namespace object by name.
domain_namespacesReturn available domain namespace names.
domainReturn a domain namespace object by name.
describe_toolReturn metadata and parameter details for a specific tool ID.
search_toolsSearch tools by keyword or phrase.
list_tools_detailedReturn expanded metadata for all visible tools.

Core Data Readers

MethodDescription
read_rasterRead one raster into a Raster object.
read_vectorRead one vector dataset into a Vector object.
read_lidarRead one lidar dataset into a Lidar object.
read_rastersRead multiple rasters, optionally in parallel.
read_vectorsRead multiple vectors, optionally in parallel.
read_lidarsRead multiple lidar datasets, optionally in parallel.

Sensor Bundle Readers and Composites

MethodDescription
read_bundleAuto-detect and read a supported sensor bundle.
read_landsatRead a Landsat bundle with family-specific parsing.
read_sentinel2Read a Sentinel-2 SAFE bundle.
read_sentinel1Read a Sentinel-1 SAFE bundle.
read_planetscopeRead a PlanetScope bundle.
read_iceyeRead an ICEYE bundle.
read_dimapRead a DIMAP bundle.
read_maxar_worldviewRead a Maxar/WorldView bundle.
read_radarsat2Read a RADARSAT-2 bundle.
read_rcmRead a RADARSAT Constellation Mission (RCM) bundle.
true_colour_compositeBuild a true-colour composite raster from a bundle source.
false_colour_compositeBuild a false-colour composite raster from a bundle source.

Reprojection Helpers

MethodDescription
reproject_rasterReproject one raster with explicit resampling and grid controls.
reproject_vectorReproject one vector dataset with policy controls.
reproject_lidarReproject one lidar dataset with transform and failure controls.
reproject_rastersBatch reproject multiple rasters.
reproject_vectorsBatch reproject multiple vector datasets.
reproject_lidarsBatch reproject multiple lidar datasets.

Writers and Raster-Memory Management

MethodDescription
write_rasterWrite one raster to disk with optional format options.
write_rastersWrite multiple rasters to disk in one call.
remove_raster_from_memoryDrop a specific memory-backed raster from the environment cache.
clear_raster_memoryClear all memory-backed rasters tracked by the environment.
clear_memoryClear all memory-backed rasters, vectors, and LiDAR objects tracked by the environment.
raster_memory_countReturn the count of memory-backed rasters currently tracked.
raster_memory_bytesReturn the estimated bytes used by tracked memory-backed rasters.
write_vectorWrite one vector dataset to disk with optional format options.
write_lidarWrite one lidar dataset to disk with optional format options.
write_textWrite plain text content to a file path.

Key Environment Properties

PropertyDescription
working_directoryDefault working directory used for relative paths.
verboseControls environment/runtime status output emitted by the bindings.
max_procsMaximum process count used by eligible parallel operations.
projectionNamespace for CRS and coordinate transformation helper methods.
topologyNamespace for geometry-topology helper methods.
hydrology, terrain, raster, vector, lidar, remote_sensingPrimary category namespaces for tool discovery and execution.
precision_agriculturePro-tier precision agriculture tools (yield zoning, irrigation, crop stress, trafficability).
conversion, streams, otherAdditional category namespaces available in the environment.

Tool Execution and Progress

This chapter documents execution styles and progress handling.

Execution style affects both correctness and maintainability. Object-first calls are concise for in-memory workflows, while path-first calls can be clearer in batch pipelines and audit logs. Progress callbacks are operational tools, not just UI niceties: they support observability, timeout policies, and better failure triage in long geoprocessing runs.

Object-First Execution

Use object-first execution when your script is primarily in-memory and you want concise dataflow between steps.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')

filled = wbe.hydrology.depressions_storage.fill_depressions(dem)
accum = wbe.hydrology.flow_routing.d8_flow_accum(filled)
wbe.write_raster(accum, 'accum.tif')

Path-First Execution

Use path-first execution when you need explicit artifact paths for auditability, handoffs to external tools, or checkpointed batch runs.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

result = wbe.hydrology.depressions_storage.fill_depressions(
	dem='dem.tif',
	output='filled.tif',
)
print(result)

Basic Progress Callback

Use a simple callback for interactive runs where immediate feedback is enough.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

filled = wbe.hydrology.depressions_storage.fill_depressions(
	dem='dem.tif',
	output='filled.tif',
	callback=wb.print_progress,
)

Custom Progress Callback

Use a custom callback when integrating with logs, dashboards, job schedulers, or failure retry logic.

import json
import re

PERCENT_RE = re.compile(r'(-?\d+(?:\.\d+)?)\s*%')

def on_progress(event):
	payload = event
	if isinstance(event, str):
		try:
			payload = json.loads(event)
		except json.JSONDecodeError:
			payload = {'message': event}

	if isinstance(payload, dict):
		pct = payload.get('percent')
		msg = payload.get('message', '')
		if pct is None and msg:
			m = PERCENT_RE.search(str(msg))
			pct = float(m.group(1)) if m else None
		if pct is not None:
			if pct <= 1.0:
				pct *= 100.0
			print(f'[{int(max(0, min(100, pct))):3d}%] {msg}')

# Use: callback=on_progress

Treat this as the default operational template for robust scripts.

  1. Validate required tools.
  2. Run tool chain memory-first.
  3. Emit progress for long operations.
  4. Persist only key outputs.

Working with Rasters

This chapter documents practical raster workflows in WbW-Py with emphasis on inspection, iteration, modification, and persistence.

Raster workflows usually alternate between high-performance tool operations and targeted custom logic. The important concept is to choose the lowest-cost path for each step: use backend tools for heavy transformations, and reserve NumPy-level iteration for domain-specific adjustments that tools do not expose directly.

Raster Lifecycle

This lifecycle helps you separate inspection from transformation so assumptions about CRS, resolution, and nodata are explicit before heavy operations.

Typical lifecycle:

  1. Read raster.
  2. Inspect metadata.
  3. Transform values (tool-driven or array-driven).
  4. Persist outputs with explicit options when needed.
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')
meta = dem.metadata()

print(meta.rows, meta.columns)
print('EPSG:', meta.epsg_code)
print('NoData:', meta.nodata)

Memory-Backed Rasters for Pipeline Efficiency

For workflows that chain multiple tool operations, memory-backed rasters eliminate disk I/O between steps. This is especially valuable when processing large rasters in complex pipelines. Rasters remain in process memory, accessible to subsequent tools without writing intermediate results to disk.

Load a raster into memory with file_mode="m":

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Read directly into memory; no disk I/O for subsequent operations
dem = wbe.read_raster('dem.tif', file_mode='m')
slope = wbe.read_raster('slope.tif', file_mode='m')

# Both rasters are now memory-backed. Chain operations without disk:
result = dem.add(slope)
print(result.file_path)  # prints: memory://raster/...

Memory-backed paths are compatible with all downstream raster operations:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif', file_mode='m')

# Inspect metadata
meta = dem.metadata()
print(f"Rows: {meta.rows}, Cols: {meta.columns}")

# Chain tool operations
scaled = wbe.run_tool('multiply', {
    'input': dem.file_path,
    'multiplier': 1.5
})

# Export to disk when ready
wbe.write_raster(scaled, 'dem_scaled_1p5x.tif')

Memory Lifecycle and Cleanup

Memory-backed rasters persist in the process store until explicitly removed or cleared. For long-running jobs, manage memory explicitly to avoid accumulation:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Check current memory usage
count_before = wbe.raster_memory_count()
bytes_before = wbe.raster_memory_bytes()
print(f"Rasters in memory: {count_before}")
print(f"Bytes used: {bytes_before}")

# Read two rasters
dem1 = wbe.read_raster('large_dem1.tif', file_mode='m')
dem2 = wbe.read_raster('large_dem2.tif', file_mode='m')

print(f"After reads: {wbe.raster_memory_count()}")

# Remove one raster when done
wbe.remove_raster_from_memory(dem1)
print(f"After remove: {wbe.raster_memory_count()}")

# Or clear all rasters at once
wbe.clear_raster_memory()
print(f"After clear: {wbe.raster_memory_count()}")

Best practices:

  • Use file_mode='m' for intermediate results in tool chains.
  • Export memory-backed rasters to disk with write_raster() when persisting results.
  • Call remove_raster_from_memory() after a raster is no longer needed.
  • Use clear_raster_memory() between independent job phases.
  • Use clear_memory() when resetting all in-process raster/vector/lidar stores together.
  • Monitor raster_memory_count() and raster_memory_bytes() in large pipelines.

Iterating Through Grid Cells

Use this pattern only when tool methods or vectorized operations cannot express your custom rule directly.

For cell-level logic, convert to NumPy and iterate safely.

import numpy as np
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')
a = r.to_numpy(all_bands=False, dtype='float64')
meta = r.metadata()

rows, cols = a.shape
for row in range(rows):
    for col in range(cols):
        z = a[row, col]
        if np.isfinite(z) and z != meta.nodata:
            # Example transform: clamp negatives.
            if z < 0:
                a[row, col] = 0.0

Fast Random Cell Access with Pinning

For custom neighbourhood logic, flow-path traversal, or other pseudo-random cell access, use pinned raster views. Pinning avoids repeated per-access lookup and lock-routing overhead by holding a direct in-memory view during the loop.

Single-raster pattern:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
pointer = wbe.read_raster("D8Pointer.tif")

with pointer.pin() as p:
    v = p[100, 200]
    p[100, 200] = v + 1.0

Multi-raster scan-loop pattern (read one raster, write another):

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
src = wbe.read_raster("D8Pointer.tif")
dst = wb.Raster.new_from_other(src, data_type="int32")

with wb.pin_rasters(src, dst) as (srcp, dstp):
    meta = src.metadata()
    for row in range(meta.rows):
        for col in range(meta.columns):
            value = srcp[row, col]
            # Replace this with your custom per-cell rule.
            dstp[row, col] = value

Notes:

  • Use pinning for loops dominated by scalar r[row, col] accesses.
  • with scope exit flushes any pinned writes safely.
  • For pure row-wise transforms, get_row_data/set_row_data is still efficient.

Writing Modified Data Back

This example shows the common pattern of deriving a new raster while preserving georeferencing context from a base raster.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
base = wbe.read_raster('dem.tif')
a = base.to_numpy(all_bands=False)
a = a * 1.05

out = wb.Raster.from_numpy(a, base, output_path='dem_scaled.tif')
wbe.write_raster(out, 'dem_scaled_cog.tif', options={
    'compress': True,
    'strict_format_options': True,
    'geotiff': {'layout': 'cog', 'tile_size': 512},
})

Supported raster write keys and valid values are documented in Output Controls.

Multi-Band Iteration

Use this structure when per-band logic differs or when your transform depends on band-specific rules.

import numpy as np
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
rgb = wbe.read_raster('multiband.tif')
arr = rgb.to_numpy(all_bands=True, dtype='float32')
# arr shape is typically (bands, rows, cols)

bands, rows, cols = arr.shape
for b in range(bands):
    for row in range(rows):
        for col in range(cols):
            v = arr[b, row, col]
            if np.isfinite(v):
                arr[b, row, col] = max(v, 0.0)

wb.Raster.from_numpy(arr, rgb, output_path='multiband_clamped.tif')

Tool-First Raster Processing

This is the preferred pattern for scale: run optimized tools first, then apply targeted custom fixes only where necessary.

Use tools for most heavy computation, then iterate only where custom logic is needed.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')
filled = wbe.hydrology.depressions_storage.fill_depressions(dem)
slope = wbe.terrain.derivatives.slope(filled)

# Optional post-processing pass in NumPy.
a = slope.to_numpy(all_bands=False)
a[a < 0] = 0
slope_fixed = wb.Raster.from_numpy(a, slope)

wbe.write_raster(slope_fixed, 'slope_fixed.tif')

NoData and Performance Guidance

  • Always check metadata().nodata when doing per-cell iteration.
  • Prefer vectorized NumPy operations over nested Python loops when possible.
  • Use tool methods (wbe.hydrology.*, wbe.raster.*, wbe.terrain.*) for large transforms.
  • Persist only final outputs where possible to keep memory-first workflows efficient.

Raster Object Method Reference

The tables below focus on callable Raster methods. Common simple properties such as file_path, file_name, active_band, and band_count are omitted from the tables to keep the reference readable.

Construction, Conversion, and Summary

MethodDescription
from_numpyCreate a raster from a NumPy array while inheriting grid geometry from a base raster.
bandReturn a band-specific raster view for multiband data.
to_numpyExport the active band or all bands to NumPy for custom numeric work.
deep_copyWrite a full raster copy to a derived or explicit output path.
new_from_other, new_from_other_with_dataCreate a new raster that inherits geometry from another raster, optionally with a new data buffer.
metadataReturn the RasterConfigs summary for rows, columns, resolution, bounds, and nodata.
calculate_clip_valuesCalculate percentile-based lower and upper clip values.
calculate_mean, calculate_mean_and_stdevCompute simple raster summary statistics.
normalizeProduce a normalized raster suitable for display or downstream scaling steps.
reinitialize_valuesReset all cells to a single constant value.
update_min_maxRecompute cached minimum and maximum values after edits.
num_cells, num_valid_cellsReport total cell count and valid-cell count.
size_of, get_data_size_in_bytesReport approximate in-memory or backing-data size.
is_memory_backedIndicate whether the raster is currently memory-backed.
get_short_filename, get_file_extensionReturn convenience filename information for reporting or output naming.

Grid Navigation and Direct Editing

MethodDescription
get_value, set_valueRead or write an individual cell value.
pinReturn a pinned raster view for low-overhead random read/write access in a with scope.
get_row_data, set_row_dataRead or replace an entire row of raster values.
increment, decrementAdd to or subtract from a single cell value.
increment_row_data, decrement_row_dataAdd to or subtract from every value in a row.
is_cell_nodataCheck whether a specific cell is nodata.
get_row_from_y, get_y_from_rowConvert between world Y coordinates and raster row indices.
get_column_from_x, get_x_from_columnConvert between world X coordinates and raster column indices.

CRS and Reprojection

MethodDescription
crs_wkt, crs_epsgInspect the raster CRS as WKT text or inferred/declared EPSG code.
set_crs_wkt, set_crs_epsgAssign CRS metadata without moving the raster grid.
clear_crsRemove CRS metadata when it is known to be wrong or must be re-assigned.
reprojectReproject the raster with explicit control over resampling, extent, resolution, and grid policies.
reproject_nearest, reproject_bilinearReproject with a fixed resampling method for common nearest-neighbour or bilinear cases.
reproject_to_match_gridReproject onto the exact grid geometry of a target raster.
reproject_to_match_resolutionReproject while matching the cell resolution of a reference raster.
reproject_to_match_resolution_in_epsgReproject to a target EPSG while borrowing cell resolution from a reference raster.

Unary Math and Numeric Transforms

Unary transform calls return a single Raster object. Band selection (band_mode='all'|'active'|'list' and bands=[...]) controls which bands are changed within that returned raster.

MethodDescription
absAbsolute value transform.
acos, arccosInverse cosine transform.
acoshInverse hyperbolic cosine transform.
asin, arcsinInverse sine transform.
asinhInverse hyperbolic sine transform.
atan, arctanInverse tangent transform.
atanhInverse hyperbolic tangent transform.
cbrtCube-root transform.
ceil, floor, round, truncStandard rounding-family transforms.
clampLimit values to a minimum and maximum range.
cos, cosh, sin, sinh, tan, tanhTrigonometric and hyperbolic transforms.
degrees, to_degrees, radians, to_radiansConvert angular units between radians and degrees.
exp, exp2, expm1Exponential transforms.
ln, log10, log1p, log2Natural-log and common logarithmic transforms.
neg, signum, sqrt, square, recipNegation, sign extraction, square root, squaring, and reciprocal transforms.
is_finite, is_infinite, is_nan, is_nodataBuild masks from numeric validity tests.
logical_not, logical_not_in_place, not_Logical-not style mask inversion.

Binary Arithmetic and Comparisons

MethodDescription
add, add_in_placeAdd another raster or scalar, either to a new raster or in place.
sub, subtract, sub_in_placeSubtract another raster or scalar.
mul, multiply, mul_in_placeMultiply by another raster or scalar.
div, divide, div_in_placeDivide by another raster or scalar.
pow, power, pow_in_placeRaise values to a raster/scalar power.
atan2Compute two-argument arctangent from paired raster/scalar inputs.
min, maxCompute cellwise minima or maxima.
eq, eq_in_placeEquality comparison.
ne, ne_in_placeInequality comparison.
gt, gt_in_placeGreater-than comparison.
ge, ge_in_placeGreater-than-or-equal comparison.
lt, lt_in_placeLess-than comparison.
le, le_in_placeLess-than-or-equal comparison.

Logical Combination

MethodDescription
and_Bitwise-style cellwise AND combination.
or_Bitwise-style cellwise OR combination.
xor_Bitwise-style cellwise XOR combination.
logical_and, logical_and_in_placeLogical AND for boolean or mask rasters.
logical_or, logical_or_in_placeLogical OR for boolean or mask rasters.
logical_xor, logical_xor_in_placeLogical XOR for boolean or mask rasters.

Working with Vectors

This chapter covers schema inspection, feature iteration, attribute reads/writes, and persistence workflows.

Vector processing is often more about data contracts than geometry mechanics. Schemas, field types, and attribute consistency determine whether downstream analysis remains trustworthy. The patterns below emphasize validating structure first, then applying deterministic edits, then persisting to stable interchange formats for downstream tools.

See Also: Online Sources

If you need to acquire vectors directly from web providers (starting with OSM Overpass), see the dedicated chapter:

Read and Inspect

This step establishes the schema contract your downstream edits depend on.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

schema = roads.schema()
print(schema)
print('features:', roads.feature_count())

Memory-Backed Vectors for Pipeline Efficiency

For workflows that chain multiple vector operations, memory-backed vectors eliminate disk I/O between steps. This is valuable for complex pipelines where intermediate results are passed between spatial operations.

Load a vector into memory with file_mode='m':

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Read directly into memory
roads = wbe.read_vector('roads.gpkg', file_mode='m')
rivers = wbe.read_vector('rivers.gpkg', file_mode='m')

print(roads.file_path)  # prints: memory://vector/...

Memory-backed vectors are compatible with all downstream operations:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('polygons.gpkg', file_mode='m')

# Inspect schema and metadata
schema = v.schema()
meta = v.metadata()

# Pass to spatial tools
centroids = wbe.vector.geometry_processing.centroid_vector(v)

# Export to disk when ready
wbe.write_vector(centroids, 'centroids_final.gpkg')

Vector Memory Lifecycle

Memory-backed vectors persist until explicitly removed or cleared. For long-running vector pipelines, manage memory explicitly:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Check current memory
print(f"Vectors in memory: {wbe.vector_memory_count()}")

# Read vectors
v1 = wbe.read_vector('large1.gpkg', file_mode='m')
v2 = wbe.read_vector('large2.gpkg', file_mode='m')

print(f"After reads: {wbe.vector_memory_count()}")

# Remove when done
wbe.remove_vector_from_memory(v1)
print(f"After remove: {wbe.vector_memory_count()}")

# Or clear all
wbe.clear_vector_memory()
print(f"After clear: {wbe.vector_memory_count()}")

Implicit Memory Output from Tools

All vector-output tools store their result in memory automatically when the output parameter is omitted. You do not need to pass file_mode='m' or choose a temporary path — simply leave output out and the returned Vector object is already memory-backed:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

# No output path — result is stored in memory automatically
centroids = wbe.vector.geometry_processing.centroid_vector(roads)
print(centroids.file_path)  # prints: memory://vector/...

# Chain operations without any intermediate files
clipped = wbe.vector.overlay_analysis.clip(centroids, 'boundary.gpkg')
print(clipped.file_path)  # also memory://vector/...

# Persist the final result only
wbe.write_vector(clipped, 'result.gpkg')

This applies to all tool categories — GIS, hydrology, geomorphometry, and stream network tools all follow the same rule. Providing an explicit output path writes to disk as before.

Best practices:

  • Use file_mode='m' for intermediate spatial analysis results.
  • Export memory-backed vectors to disk with write_vector() when persisting final outputs.
  • Call remove_vector_from_memory() after a vector is no longer needed.
  • Use clear_vector_memory() between independent analysis phases.
  • Use clear_memory() when resetting all in-process raster/vector/lidar stores together.

Iterate Through Features

Use feature iteration for inspections, QA checks, or bespoke attribute rules.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')

n = v.feature_count()
for i in range(n):
    attrs = v.attributes(i)
    # attrs is dict-like; process values
    print(i, attrs)

Read and Update Attribute Table

This example demonstrates single-field updates, grouped updates, and schema extension in one controlled sequence.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')

# Read one field value
name0 = v.attribute(0, 'name')
print('name[0]=', name0)

# Update one field
v.update_attribute(0, 'name', 'Main Street')

# Update multiple fields
v.update_attributes(1, {'speed': 50, 'class': 'collector'})

# Add a new field
v.add_field('reviewed', field_type='bool', default_value=False)

Persist Vector Outputs

This pattern shows both default extension behavior and explicit format control for reproducibility.

For complete write-option keys and allowed values, see Output Controls.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')
centroids = wbe.vector.geometry_processing.centroid_vector(roads)

# Extensionless output defaults to GeoPackage
wbe.write_vector(centroids, 'roads_centroids')

# Explicit output format
wbe.write_vector(buffered, 'roads_buffer.parquet', options={
    'strict_format_options': True,
    'geoparquet': {'compression': 'zstd'},
})

Practical Notes

  • Use schema() first to validate field names and types.
  • Prefer update_attributes() for grouped edits to a feature.
  • Re-read and validate after major writes, especially when switching formats.

Vector Object Method Reference

Common simple properties such as file_path and file_name are omitted here so the tables stay focused on callable Vector methods.

Schema and Attribute Access

MethodDescription
schemaReturn the vector schema, including field structure and geometry information.
feature_countReport how many features are present.
attribute_fields, attribute_field_namesInspect available attribute fields by full definition or by field name list.
attributeRead a single field value from one feature.
attributesRead all attribute values for one feature as a grouped record.
add_fieldAdd a new attribute field to the dataset schema.
update_attributeUpdate one field in one feature.
update_attributesUpdate multiple fields in one feature at once.

File, Metadata, and Copying

MethodDescription
metadataReturn VectorMetadata describing file state, CRS, and feature count.
absolute_pathResolve the vector to an absolute file path string.
parent_directoryReturn the containing directory path.
existsCheck whether the backing dataset exists on disk.
get_short_filename, get_file_extensionReturn convenience filename information.
get_file_size_in_bytes, get_last_modified_unix_secondsInspect filesystem metadata for reporting or audit logs.
deep_copyWrite a copied vector dataset to a derived or explicit output path.

CRS and Geometry-Safe Persistence

MethodDescription
crs_wkt, crs_epsgInspect CRS metadata as WKT text or EPSG code.
set_crs_wkt, set_crs_epsgAssign CRS metadata without moving feature coordinates.
clear_crsRemove CRS metadata so it can be assigned again explicitly.
reprojectReproject the vector dataset with explicit failure, topology, and antimeridian policies.

Working with Lidar

This chapter documents lidar workflows in WbW-Py, including file-backed processing, metadata checks, and output controls.

Lidar datasets are typically large enough that memory strategy becomes a primary design concern. The core pattern in this chapter is to combine file-backed objects, vectorized column operations, and chunked processing so scripts scale from small validation tiles to production-scale point clouds without rewriting your workflow model.

Baseline Workflow

This baseline confirms read, inspect, transform, and write in a single lidar path before introducing chunking complexity.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
las = wbe.read_lidar('survey.las')
meta = las.metadata()

print(meta.file_path)
print('epsg:', meta.crs_epsg)

normals = wbe.lidar.calculate_point_normals(las)
wbe.write_lidar(normals, 'survey_normals.copc.laz')

Memory-Backed Lidar for Pipeline Efficiency

For workflows that chain multiple lidar operations, memory-backed lidar objects eliminate disk I/O between steps. This is valuable when processing large point clouds through sequential filtering or classification steps.

Load a point cloud into memory with file_mode='m':

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Read directly into memory
survey = wbe.read_lidar('survey.las', file_mode='m')
print(survey.file_path)  # prints: memory://lidar/...

Memory-backed lidar objects support the full NumPy API and all downstream operations:

import whitebox_workflows as wb
import numpy as np

wbe = wb.WbEnvironment()

# Read into memory
survey = wbe.read_lidar('survey.las', file_mode='m')

# Inspect and process
meta = survey.metadata()
print(f'Points: {meta.point_count}')

# Extract and edit points
arr = survey.to_numpy(cols=['x', 'y', 'z', 'classification'])
high_mask = arr[:, 2] > 250
arr[high_mask, 3] = 6

# Write edits back to disk
edited = wb.Lidar.from_numpy(
    arr,
    base=survey,
    cols=['x', 'y', 'z', 'classification'],
    output_path='survey_filtered.laz',
)

Lidar Memory Lifecycle

Memory-backed lidar objects persist until explicitly removed or cleared. For long-running lidar pipelines, manage memory explicitly:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Check current memory
print(f"Lidar objects in memory: {wbe.lidar_memory_count()}")

# Read point clouds
survey1 = wbe.read_lidar('large1.las', file_mode='m')
survey2 = wbe.read_lidar('large2.las', file_mode='m')

print(f"After reads: {wbe.lidar_memory_count()}")

# Remove when done
wbe.remove_lidar_from_memory(survey1)
print(f"After remove: {wbe.lidar_memory_count()}")

# Or clear all
wbe.clear_lidar_memory()
print(f"After clear: {wbe.lidar_memory_count()}")

Implicit Memory Output from Tools

All lidar-output tools store their result in memory automatically when the output parameter is omitted. You do not need to pass file_mode='m' or choose a temporary path — simply leave output out and the returned Lidar object is already memory-backed:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
survey = wbe.read_lidar('survey.laz')

# No output path — result is stored in memory automatically
filtered = wbe.lidar.filtering_classification.filter_lidar_classes(survey, excluded_classes=[7])
print(filtered.file_path)  # prints: memory://lidar/...

# Chain operations without any intermediate files
thinned = wbe.lidar.interpolation_gridding.lidar_thin(filtered, resolution=0.5)
print(thinned.file_path)  # also memory://lidar/...

# Persist the final result only
wbe.write_lidar(thinned, 'survey_clean.copc.laz')

This applies to all lidar tools. Providing an explicit output path writes to disk as before.

Best practices:

  • Use file_mode='m' for intermediate point cloud processing.
  • Export memory-backed lidar to disk with write_lidar() when persisting final outputs.
  • Call remove_lidar_from_memory() after a point cloud is no longer needed.
  • Use clear_lidar_memory() between independent analysis phases.
  • Use clear_memory() when resetting all in-process raster/vector/lidar stores together.
  • Monitor lidar_memory_count() for large processing jobs.

Iterating Through Lidar Points

Stable WbW-Py lidar objects are file-backed and tool-oriented, with explicit columnar point access through NumPy. Direct point-by-point Python iterators are still not the primary API path.

Recommended point-level workflow:

  1. Use Lidar.to_numpy() with selected point fields.
  2. Perform vectorized filtering/classification edits in NumPy.
  3. Write updates with Lidar.from_numpy(...).

The example below reclassifies points using a simple mask to illustrate the pattern without domain-specific classification logic.

import whitebox_workflows as wb
import numpy as np

wbe = wb.WbEnvironment()
las = wbe.read_lidar('survey.las')

arr = las.to_numpy(cols=['x', 'y', 'z', 'classification'])
ground_mask = arr[:, 3] == 2
arr[ground_mask, 3] = 6

edited = wb.Lidar.from_numpy(
    arr,
    base=las,
    cols=['x', 'y', 'z', 'classification'],
    output_path='survey_reclassified.laz',
)

print('points:', edited.point_count)

Output Controls

Use these options when you need to tune compression and structure for storage, cloud access, or downstream compatibility.

For complete lidar write-option keys and allowed values, see Output Controls.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
las = wbe.read_lidar('survey.las')

wbe.write_lidar(las, 'survey_out.laz', options={
    'laz': {
        'chunk_size': 25000,
        'compression_level': 7,
    },
})

wbe.write_lidar(las, 'survey_out.copc.laz', options={
    'copc': {
        'max_points_per_node': 75000,
        'max_depth': 8,
        'node_point_ordering': 'hilbert',
    },
})

Chunked Numpy Streaming

For very large point clouds, use chunked column streaming to avoid holding a full point matrix in memory.

Recommended chunked workflow:

  1. Read chunks with Lidar.to_numpy_chunks(...).
  2. Apply vectorized edits per chunk.
  3. Write edited chunks with Lidar.from_numpy_chunks(...).

Use this pattern whenever full-matrix reads risk exhausting memory.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
lidar = wbe.read_lidar('survey.las')

cols = ['x', 'y', 'z', 'classification']
chunks = lidar.to_numpy_chunks(chunk_size=200_000, cols=cols)

for chunk in chunks:
    # Reclassify non-ground points above a simple elevation threshold.
    mask = chunk[:, 2] > 250.0
    chunk[mask, 3] = 6

edited = wb.Lidar.from_numpy_chunks(
    chunks,
    base=lidar,
    cols=cols,
    output_path='survey_chunked_reclassified.laz',
)

print('points:', edited.point_count)

For callback-driven processing, pass callback= to to_numpy_chunks(...) to process each chunk as it is decoded. In callback mode, no list is returned.

Notes:

  • LAS/LAZ chunk outputs use the shared core streaming rewrite path.
  • Other output formats currently use the existing non-streaming write path.

Best Practices

  • Validate CRS before and after transformations.
  • Prefer COPC output for large datasets and cloud streaming scenarios.
  • Keep raw source lidar immutable; write derived products to new files.

Lidar Object Method Reference

Common simple properties such as file_path, file_name, and point_count are omitted here so the tables stay focused on callable Lidar methods and the most important array-processing workflows.

Array and Chunk Workflows

MethodDescription
to_numpyExport selected point fields as a 2D NumPy array.
to_numpy_chunksStream selected point fields as chunked NumPy arrays for large clouds.
from_numpyBuild a new lidar file by applying a full 2D NumPy array back onto a base cloud.
from_numpy_chunksBuild a new lidar file from an iterable of chunked NumPy arrays.

File, Metadata, and Copying

MethodDescription
metadataReturn LidarMetadata for path, file state, and CRS information.
absolute_pathResolve the cloud to an absolute file path string.
parent_directoryReturn the containing directory path.
existsCheck whether the backing lidar file exists.
get_short_filename, get_file_extensionReturn convenience filename information.
get_file_size_in_bytes, get_last_modified_unix_secondsInspect filesystem metadata for reporting or audit logs.
deep_copyWrite a copied lidar dataset to a new path.
copy_to_pathCopy the lidar dataset to an explicit destination.
write_to_pathPersist the lidar dataset with optional format-specific write options.

CRS and Coordinate Management

MethodDescription
crs_wkt, crs_epsgInspect CRS metadata as WKT text or EPSG code.
set_crs_wkt, set_crs_epsgAssign CRS metadata without moving point coordinates.
clear_crsRemove CRS metadata when it is wrong or unknown.
reprojectReproject the point cloud with explicit 3D-transform and failure-policy controls.

Working with Sensor Bundles

This chapter documents supported sensor bundle families and common workflows.

Bundle APIs abstract away family-specific packaging details so you can focus on analysis intent: identify useful channels, inspect quality assets, and produce consistent derived outputs. The conceptual goal is to normalize early-stage ingestion across heterogeneous providers while preserving enough provenance to keep downstream interpretation defensible.

Supported Families

  • Generic auto-detect: read_bundle(...)
  • Landsat: read_landsat(...)
  • Sentinel-1 SAFE: read_sentinel1(...)
  • Sentinel-2 SAFE: read_sentinel2(...)
  • PlanetScope: read_planetscope(...)
  • ICEYE: read_iceye(...)
  • DIMAP: read_dimap(...)
  • Maxar/WorldView: read_maxar_worldview(...)
  • RADARSAT-2: read_radarsat2(...)
  • RCM: read_rcm(...)

Common Inspection Pattern

Start with this inspection sequence to quickly understand what content is available before choosing analysis-specific channels.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
b = wbe.read_bundle('BUNDLE_ROOT')

print('family:', b.family)
print('bands:', b.list_band_keys())
print('measurements:', b.list_measurement_keys())
print('qa:', b.list_qa_keys())
print('assets:', b.list_asset_keys())

Family-Specific Examples

These examples show the same workflow shape across different providers: load, inspect keys, read representative channels, then generate a preview or derived artifact.

Sentinel-2

Use this when building optical workflows from known band identifiers.

s2 = wbe.read_sentinel2('S2_SCENE.SAFE')
red = s2.read_band('B04')
nir = s2.read_band('B08')
rgb = s2.true_colour_composite(wbe, output_path='s2_true_colour.tif')

Landsat

Use this when scene metadata and quality indicators are part of ingestion logic.

ls = wbe.read_landsat('LC09_SCENE')
print(ls.processing_level(), ls.cloud_cover_percent())
preview = ls.read_band(ls.list_band_keys()[0])

Sentinel-1 / SAR Families

Use this for SAR measurement inspection and quick visual QA composites.

s1 = wbe.read_sentinel1('S1_SCENE.SAFE')
print(s1.polarizations())
meas = s1.read_measurement(s1.list_measurement_keys()[0])
false_colour = s1.false_colour_composite(wbe, output_path='s1_false_colour.tif')

PlanetScope / ICEYE / DIMAP / Maxar-WorldView / RADARSAT-2 / RCM

This loop pattern is useful for multi-provider pipelines that need consistent intake checks.

for loader, path in [
    (wbe.read_planetscope, 'PLANETSCOPE_SCENE'),
    (wbe.read_iceye, 'ICEYE_SCENE'),
    (wbe.read_dimap, 'DIMAP_SCENE'),
    (wbe.read_maxar_worldview, 'MAXAR_SCENE'),
    (wbe.read_radarsat2, 'RADARSAT2_SCENE'),
    (wbe.read_rcm, 'RCM_SCENE'),
]:
    bundle = loader(path)
    print(bundle.family, bundle.bundle_root)
  1. Open with family-specific reader when known.
  2. Inspect key sets (list_*_keys).
  3. Read representative channels.
  4. Build previews/composites for QA.
  5. Persist derived rasters and document source family + key choices.

Calibration Contract Notes (Landsat/Sentinel-2)

Remote-sensing radiometric and thermal tools in WbW-Python now rely on a standardized sensor-bundle calibration contract in wbraster.

  • Sentinel-2 bundle metadata provides quantification values used for DN to TOA reflectance scaling.
  • Landsat bundles provide typed per-band reflectance and thermal constants used in TOA reflectance and LST workflows.

For best results, keep original SAFE/MTL metadata with scene rasters when moving or staging bundle data for analysis.

Sensor Bundle Object Method Reference

Common bundle properties such as family and bundle_root are omitted here so the tables stay focused on callable Bundle methods.

Key Discovery and Data Access

MethodDescription
list_band_keysList spectral or image-band keys exposed by the bundle.
list_measurement_keysList measurement-layer keys, commonly used in SAR families.
list_qa_keysList quality-assurance layer keys.
list_asset_keysList auxiliary assets packaged with the bundle.
list_aux_keysList auxiliary layers that are not primary bands or measurements.
read_bandRead a band by key as a raster object.
read_measurementRead a measurement layer by key as a raster object.
read_qa_layerRead a QA layer by key as a raster object.
read_assetRead a named asset from the bundle.
read_aux_layerRead an auxiliary layer by key.

Scene and Platform Metadata

MethodDescription
metadata_jsonReturn the bundle metadata payload as JSON text.
mission, product_type, processing_level, processing_baselineInspect the provider, product family, and processing lineage.
scene_id, tile_id, collection_number, path_rowInspect scene identifiers used for cataloging and lineage.
acquisition_datetime_utc, acquisition_modeInspect when and how the scene was acquired.
cloud_cover_percentReport cloud cover when the bundle family exposes it.
polarization, polarizationsInspect single-polarization or multi-polarization SAR metadata.
look_direction, orbit_directionInspect platform look and orbit geometry.
incidence_angle_near_deg, incidence_angle_far_deg, off_nadir_angle_deg, view_angle_degInspect sensor/view geometry angles.
sun_azimuth_deg, sun_elevation_deg, sun_zenith_degInspect solar geometry for optical scenes.
pixel_spacing_range_m, pixel_spacing_azimuth_mInspect SAR-oriented pixel spacing metadata.
spatial_boundsReturn spatial footprint bounds for the scene.

Derived Preview Products

MethodDescription
true_colour_compositeBuild a true-colour preview raster when the bundle family supports it.
false_colour_compositeBuild a false-colour preview raster for quick QA or analysis setup.

Output Controls

This chapter documents output controls for raster, vector, and lidar writes in WbW-Py.

Output settings are where reproducibility becomes explicit. Defaults are useful for fast iteration, but production workflows should pin format, compression, and layout choices so outputs remain comparable across runs and environments. Treat this chapter as the policy layer for how artifacts are written, named, and validated.

General Principles

  • Start with default output behavior unless you need strict reproducibility.
  • Use strict_format_options=True when you want invalid option/format combinations to fail instead of silently ignoring options.
  • Prefer extensionless outputs for sensible defaults when prototyping.

Raster Output Controls

write_raster(...) and write_rasters(...) support an options dictionary.

Use this when output layout and compression are part of a reproducibility or distribution requirement.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')

wbe.write_raster(r, 'out_default.tif')

wbe.write_raster(
    r,
    'out_cog.tif',
    options={
        'compress': True,
        'strict_format_options': True,
        'geotiff': {
            'layout': 'cog',
            'tile_size': 512,
            'compression': 'deflate',
            'bigtiff': False,
        },
    },
)

Raster Write Option Reference

The raster options dictionary currently supports these top-level keys:

  • compress: boolean convenience switch for GeoTIFF-family outputs.
  • strict_format_options: boolean validation switch.
  • geotiff: nested dictionary for GeoTIFF / BigTIFF / COG-specific controls.
  • jpeg2000: nested dictionary for JPEG2000 (.jp2) controls.

compress

compress is a convenience flag. It applies only to GeoTIFF-family outputs.

  • True: maps to GeoTIFF deflate compression.
  • False: maps to uncompressed GeoTIFF output.

Default behavior:

  • compress is unset by default (not True and not False).
  • For GeoTIFF-family outputs, an unset compress falls through to the GeoTIFF writer default compression, which is deflate.
  • For non-GeoTIFF outputs, compress has no effect.

If geotiff.compression is also supplied, the explicit geotiff.compression value takes precedence over compress.

strict_format_options

strict_format_options accepts True or False.

  • False (default): GeoTIFF-specific options are ignored when writing a non-GeoTIFF output such as .jp2, .gpkg, or .zarr; JPEG2000-specific options are ignored when writing non-.jp2 outputs.
  • True: using GeoTIFF-specific options with a non-GeoTIFF output path, or JPEG2000-specific options with a non-.jp2 output path, raises an error.

jpeg2000

The jpeg2000 dictionary supports these keys:

  • compression: string compression mode.
  • quality_db: numeric quality target in dB used when compression='lossy'.
  • decomp_levels: non-negative integer decode-resolution level hint (0 to 255).
  • color_space: string color-space override.

Default values when keys are omitted:

  • compression: lossy.
  • quality_db: 35.0 when compression='lossy' and no explicit quality_db is provided.
  • decomp_levels: writer default (unset).
  • color_space: writer default (unset); inferred from band layout when possible.

Supported jpeg2000.compression values:

  • lossless
  • lossy

Supported jpeg2000.color_space values:

  • greyscale (aliases: grayscale, gray, grey)
  • srgb (alias: rgb)
  • ycbcr (alias: ycb)
  • multiband (aliases: multi_band, multi)

Notes:

  • quality_db is optional; when omitted with compression='lossy', the writer default quality is used.

geotiff

The geotiff dictionary supports these keys:

  • compression: string compression codec.
  • bigtiff: boolean.
  • layout: string layout name.
  • rows_per_strip: positive integer used when layout='stripped'.
  • tile_width: positive integer used when layout='tiled'.
  • tile_height: positive integer used when layout='tiled'.
  • tile_size: positive integer shortcut used by layout='cog', and also accepted as a shortcut for both tile width and tile height when layout='tiled'.
  • cog_tile_size: positive integer alias for tile_size when layout='cog'.

Default values when keys are omitted:

  • compression: deflate.
  • bigtiff: False.
  • layout: standard.
  • rows_per_strip: 1 when layout='stripped'.
  • tile_width: 512 when layout='tiled'.
  • tile_height: defaults to tile_width when layout='tiled'.
  • tile_size / cog_tile_size: 512 when layout='cog'.

Supported geotiff.compression values:

  • none
  • off
  • uncompressed
  • deflate
  • zip
  • lzw
  • packbits
  • pack_bits
  • jpeg
  • webp
  • web_p
  • jpegxl
  • jpeg_xl
  • jxl

These names are aliases for the same underlying codecs:

  • none, off, and uncompressed
  • deflate and zip
  • packbits and pack_bits
  • webp and web_p
  • jpegxl, jpeg_xl, and jxl

Supported geotiff.layout values:

  • standard: default GeoTIFF writer behavior.
  • stripped: strip-organized GeoTIFF.
  • striped: alias for stripped.
  • tiled: tiled GeoTIFF.
  • cog: Cloud-Optimized GeoTIFF.

Layout-specific parameter behavior:

  • layout='standard': ignores strip/tile size keys.
  • layout='stripped' or layout='striped': uses rows_per_strip. Default is 1 if omitted.
  • layout='tiled': uses tile_width and tile_height. If omitted, tile_width defaults to 512. tile_height defaults to tile_width. If tile_size is supplied, it is accepted as a shortcut for both dimensions.
  • layout='cog': uses tile_size or cog_tile_size. Default is 512 if neither is supplied.

Extensionless Raster Outputs

If you omit the output extension, raster writes default to .tif. In that case the wrapper also defaults the GeoTIFF layout to COG unless you explicitly set a different layout.

# Writes my_surface.tif and defaults to COG-style layout.
wbe.write_raster(r, 'my_surface')

Practical Patterns

# Explicit uncompressed standard GeoTIFF
wbe.write_raster(r, 'out_standard_uncompressed.tif', options={
        'compress': False,
        'geotiff': {'layout': 'standard'},
})

# Stripped GeoTIFF with 64 rows per strip
wbe.write_raster(r, 'out_stripped.tif', options={
        'geotiff': {'layout': 'stripped', 'rows_per_strip': 64},
})

# Tiled GeoTIFF using a single tile_size shortcut
wbe.write_raster(r, 'out_tiled.tif', options={
        'geotiff': {'layout': 'tiled', 'tile_size': 256},
})

# COG with explicit codec and BigTIFF toggle
wbe.write_raster(r, 'out_cog.tif', options={
        'strict_format_options': True,
        'geotiff': {
                'layout': 'cog',
                'tile_size': 512,
                'compression': 'deflate',
                'bigtiff': False,
        },
})

Common Raster Profiles

These profiles illustrate common tradeoffs between compatibility and read performance.

# Stripped GeoTIFF
wbe.write_raster(r, 'out_stripped.tif', options={
    'geotiff': {'layout': 'stripped', 'rows_per_strip': 32},
})

# Tiled GeoTIFF
wbe.write_raster(r, 'out_tiled.tif', options={
    'geotiff': {'layout': 'tiled', 'tile_width': 256, 'tile_height': 256},
})

Vector Output Controls

write_vector(...) supports format-specific options.

Use strict format options in production so incompatible settings fail fast.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')

# Extensionless output defaults to GeoPackage
wbe.write_vector(v, 'roads_copy')

# GeoParquet with strict options
wbe.write_vector(
    v,
    'roads.parquet',
    options={
        'strict_format_options': True,
        'geoparquet': {
            'compression': 'zstd',
            'max_rows_per_group': 250000,
            'write_batch_size': 8192,
        },
    },
)

Vector Write Option Reference

For vector writes, the options dictionary supports these keys:

  • strict_format_options: boolean validation switch.
  • geoparquet: nested dictionary used for GeoParquet-specific controls.

strict_format_options

  • False (default): format-specific options are ignored when they do not apply to the selected output format.
  • True: format-specific options on an incompatible output path raise an error.

geoparquet

The geoparquet dictionary supports:

  • max_rows_per_group: positive integer.
  • data_page_size_limit: positive integer.
  • write_batch_size: positive integer.
  • data_page_row_count_limit: positive integer.
  • compression: string codec name.

Default values when keys are omitted:

  • max_rows_per_group: 1_048_576.
  • data_page_size_limit: Parquet library default page size.
  • write_batch_size: Parquet library default write batch size.
  • data_page_row_count_limit: Parquet library default row-count limit.
  • compression: Parquet library default compression codec.

Supported geoparquet.compression values:

  • none
  • snappy
  • gzip
  • lz4
  • zstd
  • brotli

Notes:

  • GeoParquet controls are only applied for .parquet outputs.
  • If no output extension is provided, vector writes default to .gpkg.
# GeoParquet with explicit row-group and compression controls
wbe.write_vector(v, 'roads.parquet', options={
        'strict_format_options': True,
        'geoparquet': {
                'compression': 'zstd',
                'max_rows_per_group': 250000,
                'data_page_size_limit': 1048576,
                'write_batch_size': 8192,
                'data_page_row_count_limit': 20000,
        },
})

Lidar Output Controls

write_lidar(...) supports LAZ and COPC option blocks.

Choose LAZ for compact archives and COPC when cloud-native spatial access is important.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
l = wbe.read_lidar('survey.las')

# LAZ controls
wbe.write_lidar(l, 'survey_out.laz', options={
    'laz': {'chunk_size': 25000, 'compression_level': 7},
})

# COPC controls
wbe.write_lidar(l, 'survey_out.copc.laz', options={
    'copc': {
        'max_points_per_node': 75000,
        'max_depth': 8,
        'node_point_ordering': 'hilbert',
    },
})

Lidar Write Option Reference

For lidar writes, the options dictionary supports these top-level keys:

  • laz: nested dictionary with LAZ writer controls.
  • copc: nested dictionary with COPC hierarchy controls.

laz

Supported keys:

  • chunk_size: positive integer.
  • compression_level: integer compression level.

Default values when keys are omitted:

  • chunk_size: 50000.
  • compression_level: 6.

compression_level is typically used in the 0-9 range for LAZ workflows.

copc

Supported keys:

  • max_points_per_node: positive integer.
  • max_depth: positive integer.
  • node_point_ordering: string ordering mode.

Default values when keys are omitted:

  • max_points_per_node: 100000.
  • max_depth: 8.
  • node_point_ordering: auto.

Supported copc.node_point_ordering values:

  • auto
  • morton
  • hilbert

Notes:

  • If no output extension is provided, lidar writes default to .copc.laz.
  • COPC options are relevant when output format is COPC (.copc.laz or .copc.las).
# Extensionless output defaults to COPC
wbe.write_lidar(l, 'survey_out', options={
    'copc': {
        'max_points_per_node': 75000,
        'max_depth': 8,
        'node_point_ordering': 'hilbert',
    },
})

# Explicit LAZ controls
wbe.write_lidar(l, 'survey_out.laz', options={
    'laz': {
        'chunk_size': 25000,
        'compression_level': 7,
    },
})

Memory Lifecycle and Cleanup

For workflows that use memory-backed rasters, vectors, and lidar objects (file_mode='m'), explicit lifecycle management prevents unbounded memory growth in long-running jobs.

When to use memory mode

Memory mode is most valuable when:

  • Chaining multiple operations on the same data without disk I/O.
  • Processing intermediate results that are never persisted to disk.
  • Running batched analysis where you load, process, and clear per batch.
  • Working with smaller datasets where memory is not a constraint.

Avoid memory mode when:

  • Working with data larger than available RAM.
  • Processing single operations on large files.
  • Running unattended long-running jobs without explicit cleanup.

Explicit cleanup in long-running pipelines

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

# Long-running batch analysis
for tile_id in range(1, 1001):
    print(f'Processing tile {tile_id}')
    
    # Read data into memory for this tile
    dem = wbe.read_raster(f'tile_{tile_id}.tif', file_mode='m')
    bounds = wbe.read_vector(f'bounds_{tile_id}.gpkg', file_mode='m')
    
    # Process
    result = wbe.run_tool('clip_raster_by_polygon', {
        'input': dem.file_path,
        'polygon': bounds.file_path,
        'output': f'clipped_{tile_id}.tif'
    })
    
    # Explicit cleanup before next iteration
    wbe.remove_raster_from_memory(dem)
    wbe.remove_vector_from_memory(bounds)

Monitoring memory usage

For production scripts, track memory explicitly:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

print(f"Initial raster memory: {wbe.raster_memory_bytes() / 1e6:.1f} MB")
print(f"Initial vector memory: {wbe.vector_memory_bytes() / 1e6:.1f} MB")

# ... run operations ...

# Before returning or starting new phase
print(f"Final raster count: {wbe.raster_memory_count()}")
print(f"Final raster memory: {wbe.raster_memory_bytes() / 1e6:.1f} MB")

# Explicit reset if needed
wbe.clear_memory()
print(f"After clear: {wbe.raster_memory_count()}")

Extensionless Defaults

Extensionless writes are useful in prototyping, but pin extensions in production for deterministic artifact naming.

When no extension is provided:

  • raster -> .tif (COG-style GeoTIFF default)
  • vector -> .gpkg
  • lidar -> .copc.laz
wbe.write_raster(r, 'my_raster')
wbe.write_vector(v, 'my_vector')
wbe.write_lidar(l, 'my_lidar')

Reproducibility Checklist

  1. Pin output extension explicitly.
  2. Set strict_format_options=True when format mismatches must error.
  3. Pin codec/layout/tile parameters for raster outputs.
  4. For vector/lidar, pin format-specific compression and partitioning options.
  5. Re-open output and validate metadata before downstream analysis.

Supported Data Formats

This chapter summarizes practical format support exposed through WbW-Py.

Format support influences interoperability, performance, and long-term archive strategy. Read/write capability is only one part of the decision; you should also consider ecosystem compatibility, compression behavior, and whether a given format is best used as an interchange artifact or as an internal working format.

Backend support comes from core crates:

  • Raster: wbraster
  • Vector: wbvector
  • Lidar: wblidar

Raster Formats

Exhaustive raster format support in the current WbW-Py build:

FormatRead (read_raster)Write (write_raster)Common extensions / path rules
DTEDYesYes.dt0, .dt1, .dt2 (DTED 0, 1, 2 elevation data; WGS-84 geographic only)
Esri ASCII GridYesYes.asc (and .grd when detected as ASCII)
Esri Binary Grid workspaceYesBackend-onlyEsri Binary workspace directory (hdr.adf + w001001.adf) or .adf
Esri Float GridYesYes.flt, .hdr (single-band float grid with header file)
ERDAS IMAGINE (HFA)YesNo.img - read-only MVP; RLC (run-length) compression supported
GRASS ASCII RasterYesYes.txt / .asc when GRASS header keys are detected
Surfer GRDYesYes.grd (DSAA / DSRB signatures)
PCRasterYesYes.map (CSF signature)
SAGA Binary GridYesYes.sdat, .sgrd
Idrisi / TerrSet RasterYesYes.rst, .rdc
ER MapperYesYes.ers
ENVI HDR-labelled rasterYesYes.hdr, or data files (.img, .dat, .bin, .raw, .bil, .bsq, .bip) with .hdr sidecar
GeoTIFF / BigTIFF / COGYesYes.tif, .tiff
GeoPackage rasterYesYes.gpkg
JPEG2000 / GeoJP2YesYes.jp2
JPEG + World FileYesYes.jpg, .jpeg with .jgw, .jpgw, .jpegw, or .wld world file
PNG + World FileYesYes.png with .pgw, .pngw, or .wld world file
ZarrYesYes.zarr store (directory / suffix)
XYZ ASCII GridYesYes.xyz (whitespace or comma-delimited X Y Z points)

Typical pattern:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')
wbe.write_raster(r, 'dem_out.tif')

Vector Formats

Exhaustive vector format support in the current WbW-Py build:

FormatRead (read_vector)Write (write_vector)Extensions / notes
FlatGeobufYesYes.fgb
GeoJSONYesYes.geojson, .json
TopoJSONYesYes.topojson
GeoPackageYesYes.gpkg
GeoParquetYesYes.parquet
GMLYesYes.gml
GPXYesYes.gpx
KMLYesYes.kml
KMZYesYes.kmz
MapInfo InterchangeYesYes.mif with .mid sidecar
OSM PBFYesNo.osm.pbf (read workflows only)
ESRI ShapefileYesYes.shp plus dataset sidecars

When write_vector(...) is called without an extension, WbW-Py defaults output to .gpkg.

Typical pattern:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')
wbe.write_vector(v, 'roads_out.gpkg')

Lidar Formats

Exhaustive lidar format support in the current WbW-Py build:

FormatRead (read_lidar)Write (write_lidar)Extensions / notes
LASYesYes.las
LAZYesYes.laz
COPCYesYes.copc.las, .copc.laz
PLYYesYes.ply
E57YesYes.e57

When write_lidar(...) is called without an extension, WbW-Py defaults output to .copc.laz.

Typical pattern:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
l = wbe.read_lidar('survey.las')
wbe.write_lidar(l, 'survey_out.copc.laz')

Sensor Bundle Families

Supported bundle readers include:

  • read_bundle(...) (auto-detect)
  • read_landsat(...)
  • read_sentinel1(...)
  • read_sentinel2(...)
  • read_planetscope(...)
  • read_iceye(...)
  • read_dimap(...)
  • read_maxar_worldview(...)
  • read_radarsat2(...)
  • read_rcm(...)

Bundle inputs may be either extracted directories or supported archives:

  • .zip
  • .tar
  • .tar.gz
  • .tgz

See Working with Sensor Bundles for family-specific examples.

Validation Guidance

  1. Prefer stable interchange formats (.tif, .gpkg, .copc.laz) for production pipelines.
  2. Re-open outputs and verify metadata after write operations.
  3. Use explicit options where format behavior must be reproducible.

Reprojection and CRS

This chapter covers raster, vector, and lidar reprojection in WbW-Py.

CRS handling is a semantic correctness issue, not just a metadata preference. When coordinate assumptions are wrong, analyses may still run but yield invalid spatial conclusions. The patterns here emphasize explicit source/destination CRS checks, controlled reprojection settings, and immediate post-transform verification before any downstream computation.

CRS Inspection

Run this first to verify assumptions before any reprojection call.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')
roads = wbe.read_vector('roads.gpkg')
las = wbe.read_lidar('survey.las')

print('raster epsg:', dem.metadata().epsg_code)
print('vector epsg:', roads.metadata().crs_epsg)
print('lidar epsg:', las.metadata().crs_epsg)

Assigning Projection Metadata

Use CRS assignment only when the coordinates are already in the correct coordinate system but the file metadata is missing or wrong. Assignment does not move coordinates; it only changes the declared CRS. If you need to change the coordinate values themselves, use the reprojection methods shown later in this chapter.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem_without_crs.tif')
roads = wbe.read_vector('roads_without_crs.gpkg')
las = wbe.read_lidar('survey_without_crs.las')

# Assign by EPSG when you know the coordinates already use that CRS.
dem.set_crs_epsg(26917)
roads.set_crs_epsg(26917)

# Assign by WKT when that is the CRS information you have available.
utm_wkt = wbe.projection.to_ogc_wkt(26917)
las.set_crs_wkt(utm_wkt)

print('raster epsg after assignment:', dem.crs_epsg())
print('vector epsg after assignment:', roads.crs_epsg())
print('lidar epsg after assignment:', las.crs_epsg())

# If metadata is wrong rather than missing, clear it before reassigning.
roads.clear_crs()
roads.set_crs_epsg(26917)

This pattern is especially useful for legacy rasters, sidecar-free vector exports, and lidar files that have correct coordinates but incomplete CRS metadata.

Raster Reprojection

WbW-Py exposes six raster-object reprojection methods. Use the one that matches your grid-control needs:

  1. raster.reproject(...): General method with full control over extent, rows/cols, resolution, snap origin, nodata policy, antimeridian policy, grid-size policy, and destination footprint.
  2. raster.reproject_nearest(dst_epsg, ...): Convenience wrapper for nearest-neighbour reprojection.
  3. raster.reproject_bilinear(dst_epsg, ...): Convenience wrapper for bilinear reprojection.
  4. raster.reproject_to_match_grid(target_grid, ...): Reprojects and snaps exactly to another raster's grid geometry (extent, rows, cols, resolution).
  5. raster.reproject_to_match_resolution(reference_grid, ...): Reprojects while matching the reference raster's resolution and snap behavior.
  6. raster.reproject_to_match_resolution_in_epsg(dst_epsg, reference_grid, ...): Reprojects to a specified EPSG while deriving output resolution controls from a reference raster.

Available resampling methods (wbraster)

Use these values anywhere a reprojection method accepts resample:

  • nearest: category-safe nearest-neighbour.
  • bilinear: smooth linear interpolation.
  • cubic: bicubic interpolation.
  • lanczos: high-quality sinc-window interpolation.
  • average: 3x3 mean statistic.
  • min: 3x3 minimum statistic.
  • max: 3x3 maximum statistic.
  • mode: 3x3 modal statistic.
  • median: 3x3 median statistic.
  • stddev: 3x3 standard deviation statistic.

Practical defaults:

  • Categorical/class rasters: nearest (or mode for smoothing by majority).
  • Continuous surfaces (DEM, reflectance, temperature): bilinear, cubic, or lanczos.
  • Thematic/statistical resamples: average, min, max, median, stddev.

Example: full-control reprojection

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')

dem_utm = dem.reproject(
    dst_epsg=32618,
    resample='bilinear',
    x_res=10.0,
    y_res=10.0,
    nodata_policy='partial_kernel',
    antimeridian_policy='auto',
    grid_size_policy='expand',
    destination_footprint='none',
)

wbe.write_raster(dem_utm, 'dem_utm_10m.tif')

Example: grid-matching reprojection

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
src = wbe.read_raster('landcover_4326.tif')
target = wbe.read_raster('dem_utm_10m.tif')

# Categorical raster: nearest is typically required.
aligned = src.reproject_to_match_grid(target, resample='nearest')
wbe.write_raster(aligned, 'landcover_aligned.tif')

Automatic reprojection in raster-stack tools

Several stack-based tools now support automatic stack alignment with explicit controls:

  • auto_reproject (default true)
  • auto_reproject_method (optional override)

Current behavior:

  1. inputs[0] is treated as the reference raster.
  2. Any stack raster with mismatched CRS is auto-reprojected to match the reference grid when auto_reproject=true.
  3. If auto_reproject_method is unset:
    • categorical rasters infer nearest
    • continuous rasters infer bilinear
  4. If extents do not overlap after alignment, tools raise a hard error.

This is especially important for stack workflows (input_rasters/inputs) such as overlay operations, weighted sums, PCA, inverse PCA, raster calculator, image segmentation, and position-based stack selection.

Example: stack tool with automatic reprojection

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

result = wbe.run_tool(
    'weighted_sum',
    {
        'input_rasters': ['slope_utm.tif', 'landcover_4326.tif', 'distance_utm.tif'],
        'weights': [0.4, 0.35, 0.25],
        'auto_reproject': True,
        'auto_reproject_method': '',  # empty -> infer nearest/bilinear per raster
        'output': 'weighted_sum.tif',
    },
)
print(result)

Vector Reprojection

This pattern is appropriate when geometry validity and failure policy need to be explicit.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

roads_utm = wbe.projection_georeferencing.general.reproject_vector(
    roads,
    dst_epsg=32618,
    failure_policy='error',
    topology_policy='none',
)

wbe.write_vector(roads_utm, 'roads_utm.gpkg')

Lidar Reprojection

Use explicit lidar reprojection settings to avoid silent dimensional or policy defaults.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
las = wbe.read_lidar('survey.las')

las_utm = wbe.projection_georeferencing.general.reproject_lidar(
    las,
    dst_epsg=32618,
    use_3d_transform=False,
    failure_policy='error',
)

wbe.write_lidar(las_utm, 'survey_utm.copc.laz')

Georeference Raster from Control Points

Use this tool when an image/raster has no reliable georeferencing and you have ground-control points (GCPs) linking image pixel coordinates to map coordinates.

Required CSV fields:

  • source_col
  • source_row
  • target_x
  • target_y
import whitebox_workflows as wb

wbe = wb.WbEnvironment()

result = wbe.projection_georeferencing.general.georeference_raster_from_control_points(
    input_raster='historical_scan.tif',
    control_points_csv='historical_scan_gcps.csv',
    epsg=32618,
    resample='bilinear',
    output='historical_scan_georef.tif',
    report='historical_scan_georef_report.json',  # optional diagnostics JSON
)

print(result)

If you need raw runtime invocation style, the equivalent tool ID is georeference_raster_from_control_points.

Projection Utility Namespace

This namespace is useful for CRS diagnostics and point-level coordinate transform tasks outside full dataset reprojection.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

wkt_3857 = wbe.projection.to_ogc_wkt(3857)
print('epsg from wkt:', wbe.projection.identify_epsg(wkt_3857))

pts = [{'x': -79.3832, 'y': 43.6532}]
pts_utm = wbe.projection.reproject_points(pts, src_epsg=4326, dst_epsg=32618)
print(pts_utm)

Parse a PROJ string

Use projection.from_proj_string when you have a PROJ4-style string (e.g., read from a legacy file header or third-party metadata) and need to identify the corresponding EPSG code or obtain an OGC WKT representation.

The method returns a dict with exactly one of these keys:

  • {'epsg': int} — EPSG code identified
  • {'wkt': str} — no EPSG match, WKT representation available
  • {'unknown': True} — PROJ string parsed but CRS could not be identified further
import whitebox_workflows as wb

wbe = wb.WbEnvironment()

proj_str = '+proj=utm +zone=17 +datum=NAD83 +units=m +no_defs'
result = wbe.projection.from_proj_string(proj_str)

if 'epsg' in result:
    print('identified EPSG:', result['epsg'])  # e.g. 26917
elif 'wkt' in result:
    print('WKT:', result['wkt'])
else:
    print('CRS unknown')

This is the recommended fallback for legacy data sources that carry only a PROJ4 metadata string. WbW-Py itself uses this path internally when reprojecting rasters whose CRS metadata does not include an EPSG code.

Area-of-use bounding box

Use projection.area_of_use to retrieve the geographic bounding box of valid use for an EPSG code. This is useful for validating that your data actually falls within the intended CRS domain before or after reprojection.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

bbox = wbe.projection.area_of_use(32618)  # UTM Zone 18N
if bbox is not None:
    print(f"valid lon: {bbox['lon_min']} to {bbox['lon_max']}")
    print(f"valid lat: {bbox['lat_min']} to {bbox['lat_max']}")

# Returns None for codes with no registered bounding box.
print(wbe.projection.area_of_use(9999))  # None

You can also pass the bounding box check as a pre-reprojection guard:

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
dem = wbe.read_raster('dem.tif')

dst_epsg = 32618
bbox = wbe.projection.area_of_use(dst_epsg)
if bbox is not None:
    ext = dem.metadata().extent
    # Quick geographic sanity check before committing to full reprojection.
    in_range = (
        ext.min_x >= bbox['lon_min'] and ext.max_x <= bbox['lon_max'] and
        ext.min_y >= bbox['lat_min'] and ext.max_y <= bbox['lat_max']
    )
    if not in_range:
        print('WARNING: DEM extent may fall outside area of use for EPSG:', dst_epsg)

dem_utm = dem.reproject(dst_epsg=dst_epsg, resample='bilinear')
wbe.write_raster(dem_utm, 'dem_utm.tif')

Best Practices

  • Confirm source CRS before any reprojection.
  • Use nearest for categorical raster data, bilinear/cubic for continuous data.
  • Re-open outputs and verify CRS metadata post-transform.
  • Keep transform options explicit for reproducible pipelines.

Topology Utilities

This chapter covers topology helper methods available under wbe.topology.

Topology utilities are best used as fast, script-level geometry checks and diagnostics. They answer questions like:

  • Do these two geometries intersect, touch, or overlap?
  • Is this polygon valid before a downstream operation?
  • What is the exact DE-9IM relationship between two geometries?

These methods are lightweight and ideal for validation gates. For large-scale overlay workflows and heavy production geometry processing, prefer dedicated vector tools in wbe.vector.*.

Topology Workflow Pattern

A robust pattern for topology-aware scripts is:

  1. Validate CRS compatibility first.
  2. Run quick topology predicates to detect obvious incompatibilities.
  3. Repair invalid geometries if required.
  4. Re-check relationships after repairs.
  5. Continue into heavier vector analysis only after topology checks pass.

This keeps failures early and localized, which is usually easier to debug than finding topology issues deep into a long processing chain.

WKT Predicate Checks

Use predicate checks for fast boolean decisions in filters and QA gates.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

poly = 'POLYGON((0 0,10 0,10 10,0 10,0 0))'
pt = 'POINT(5 5)'

print('contains:', wbe.topology.contains_wkt(poly, pt))
print('intersects:', wbe.topology.intersects_wkt(poly, pt))
print('within:', wbe.topology.within_wkt(pt, poly))
print('touches:', wbe.topology.touches_wkt(poly, pt))
print('disjoint:', wbe.topology.disjoint_wkt(poly, pt))

Relationship and Distance

Use relate_wkt when a simple predicate is not enough and you need the full topological relationship signature.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

a = 'LINESTRING(0 0, 10 0)'
b = 'LINESTRING(0 1, 10 1)'

print('distance:', wbe.topology.distance_wkt(a, b))
print('relate:', wbe.topology.relate_wkt(a, b))

The DE-9IM string returned by relate_wkt is useful for rule-based quality checks in advanced pipelines, especially when business logic depends on exact boundary/interior behavior.

Geometry Validation and Repair

Validation and repair are especially useful before overlays, dissolves, and buffer-heavy workflows where invalid rings can cause downstream failures.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()

invalid = 'POLYGON((0 0,4 4,4 0,0 4,0 0))'
print('is_valid:', wbe.topology.is_valid_polygon_wkt(invalid))

fixed = wbe.topology.make_valid_polygon_wkt(invalid)
print('fixed:', fixed)

buf = wbe.topology.buffer_wkt('LINESTRING(0 0, 10 0)', 1.5)
print('buffer:', buf)

When repairing, prefer writing repaired geometries to new outputs and keeping source data immutable for auditability.

Feature-to-Feature Relation

For vector feature comparisons, use vector_feature_relation(...).

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
roads = wbe.read_vector('roads.gpkg')

rel = wbe.topology.vector_feature_relation(roads, 0, roads, 1)
print(rel)

For batch validation across many feature pairs, call this in a controlled loop and log results with feature IDs for traceable QA reports.

Topology Namespace Method Reference

The table below summarizes the core wbe.topology methods.

MethodDescription
intersects_wktReturn True when two geometries share any portion of space.
contains_wktReturn True when geometry A strictly contains geometry B.
within_wktReturn True when geometry A is strictly within geometry B.
touches_wktReturn True when geometries meet at boundaries but interiors do not overlap.
disjoint_wktReturn True when geometries have no spatial intersection.
crosses_wktReturn True when geometries cross with dimensional reduction behavior.
overlaps_wktReturn True when same-dimension geometries partially overlap.
covers_wktReturn True when geometry A covers geometry B including boundary cases.
covered_by_wktReturn True when geometry A is covered by geometry B including boundary cases.
relate_wktReturn DE-9IM relationship text for exact topology-rule evaluation.
distance_wktReturn shortest distance between two geometries.
is_valid_polygon_wktCheck polygon validity before topology-sensitive workflows.
make_valid_polygon_wktRepair an invalid polygon WKT representation.
buffer_wktBuild a buffered geometry from WKT and distance.
vector_feature_relationEvaluate topology relation between indexed features in vector datasets.

Practical Guidance

  • Use topology utilities for fast geometry checks in validation pipelines.
  • For complex overlays and production transforms, prefer vector tool workflows.
  • Keep CRS explicit before topology checks; mismatched CRS can produce plausible but wrong relations.
  • Use relate_wkt when QA rules depend on exact boundary/interior semantics.

Interoperability

This chapter provides practical exchange patterns between WbW-Py and common Python geospatial tooling.

Interoperability is best thought of as controlled boundary crossing. Each conversion introduces potential differences in metadata conventions, numeric precision, CRS representation, and schema typing. The workflows in this chapter focus on explicit handoff points and roundtrip validation so multi-library pipelines remain trustworthy.

Copy-Boundary Model

  • to_numpy() / from_numpy() are explicit in-memory exchange boundaries.
  • Rasterio/GeoPandas/rioxarray flows are file-based boundaries.
  • Always validate metadata after roundtrip (metadata(), CRS, dimensions, schema).

NumPy Roundtrip

Use this when you need direct numeric control for custom raster math.

import numpy as np
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')
a = r.to_numpy(dtype='float64')
a = np.where(np.isfinite(a), a + 1.0, a)

r2 = wb.Raster.from_numpy(a, r, output_path='dem_plus1.tif')

Rasterio Roundtrip

Use this for compatibility with rasterio-centric ecosystems and workflows.

import rasterio
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')
wbe.write_raster(r, 'dem_for_rasterio.tif')

with rasterio.open('dem_for_rasterio.tif') as src:
    arr = src.read(1)
    profile = src.profile

arr = arr * 1.02
profile.update(dtype='float32', count=1)
with rasterio.open('dem_rio_out.tif', 'w', **profile) as dst:
    dst.write(arr.astype('float32'), 1)

r_back = wbe.read_raster('dem_rio_out.tif')
print(r_back.metadata())

GeoPandas and Shapely Roundtrip

Use this pattern for vector enrichment and geometry filtering in the broader Python geospatial stack.

import geopandas as gpd
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
v = wbe.read_vector('roads.gpkg')
wbe.write_vector(v, 'roads_for_gpd.gpkg')

gdf = gpd.read_file('roads_for_gpd.gpkg')
gdf['length_m'] = gdf.length
gdf = gdf[gdf['length_m'] > 20.0]
gdf.to_file('roads_filtered.gpkg', driver='GPKG')

v_back = wbe.read_vector('roads_filtered.gpkg')
print(v_back.schema())

xarray/rioxarray Roundtrip

Use this when you need labeled-array operations or rolling-window processing.

import rioxarray as rxr
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')
wbe.write_raster(r, 'dem_for_xarray.tif')

da = rxr.open_rasterio('dem_for_xarray.tif').squeeze(drop=True)
da_smoothed = da.rolling(x=3, y=3, center=True).mean()
da_smoothed.rio.to_raster('dem_xarray_smoothed.tif')

r_back = wbe.read_raster('dem_xarray_smoothed.tif')
print(r_back.metadata())

pyproj CRS Workflow

Use this for explicit CRS introspection and conversion checks outside full file I/O steps.

from pyproj import CRS
import whitebox_workflows as wb

wbe = wb.WbEnvironment()
r = wbe.read_raster('dem.tif')

src = CRS.from_epsg(r.metadata().epsg_code)
dst = CRS.from_epsg(32618)
print(src.to_string(), '->', dst.to_string())

r_utm = wbe.projection_georeferencing.general.reproject_raster(r, dst_epsg=dst.to_epsg(), resample='bilinear')

Validation Checklist

  1. Check CRS after every roundtrip.
  2. Check dimensions and nodata for raster flows.
  3. Check schema and representative attributes for vector flows.
  4. Prefer stable formats (.tif, .gpkg) for routine exchange.

Licensing

This chapter documents licensing modes, tier models, and interactive activation workflows.

Licensing mode is a runtime dependency and should be treated like any other infrastructure requirement. The objective is to make startup behavior explicit, auditable, and resilient: scripts should clearly communicate which mode they expect, how fallback is handled, and what operational evidence is logged when activation succeeds or fails.

License Tiers

Whitebox NG is available in two tiers:

  • Open Tier (open): Governed by MIT/Apache 2.0 dual licensing. All Open-tier tools are free and open-source with no entitlement required.

  • Pro Tier (pro): Proprietary commercial software. Pro-tier tools require activation with a valid license key and are subject to End-User License Agreement (EULA) terms.

Open Mode

Use this for OSS-only workflows or environments where entitlement is not required. No activation needed.

import whitebox_workflows as wb

wbe = wb.WbEnvironment()
print('license type:', wbe.license_type())  # open

Interactive License Management

For end users and interactive workflows, activate and manage Pro licenses directly within Python. Activation persists local license state so subsequent runs automatically load the license without re-entry.

Activating a License

import whitebox_workflows as wb

# User calls activate with their license key
result = wb.activate_license(
    key='YOUR_LICENSE_KEY',
    firstname='John',
    lastname='Smith',
    email='john@example.com',
    agree_to_license_terms=True,
    # Optional: provider_url, machine_id, customer_id
)
print(result)  # Activation confirmation message

What happens:

  • Server validates the key and issues a signed entitlement.
  • Entitlement is verified and persisted to ~/.whitebox/wbw_ng_license_state.json.
  • Subsequent WbEnvironment() calls automatically load local state.
  • If local state is expired or invalid, fallback to open tier.

Checking License Status

import whitebox_workflows as wb

info = wb.license_info()
print(info)
# {
#   "valid": true,
#   "effective_tier": "pro",
#   "seconds_remaining": 2592000,
#   "expires_at_unix": 1234567890,
#   "now_unix": 1234567890
# }

Querying Time Remaining Directly

Use the dedicated helper when you only need remaining time values:

import whitebox_workflows as wb

remaining = wb.license_time_remaining()
print(remaining)
# {
#   "active": true,
#   "valid": true,
#   "seconds_remaining": 2592000,
#   "days_remaining": 30,
#   "expires_at_unix": 1234567890,
#   "now_unix": 1234567890
# }

If no local license exists, the call returns active: false with seconds_remaining: 0 and days_remaining: 0.

Transferring a License

To move a license to another machine, call transfer which returns a portable payload and clears local state:

import whitebox_workflows as wb

payload = wb.transfer_license()
print(payload)  # Contains activation credentials for the destination machine
# Local state is now cleared on this machine

Deactivating a License

import whitebox_workflows as wb

result = wb.deactivate_license()
print(result)  # Confirmation message
# Local state is cleared; future runs fall back to open tier

Local State Persistence

Active license state is stored at ~/.whitebox/wbw_ng_license_state.json (or override via WBW_LICENSE_STATE_PATH environment variable). On each startup:

  • If local state exists and is valid, it is automatically loaded.
  • If local state is expired or missing, the runtime falls back to open tier.
  • You do not need to re-authenticate on every run once activated.

Programmatic Modes: Signed Entitlement and Floating License

For automated or managed deployments, use programmatic licensing modes where entitlements are supplied directly at runtime.

Signed Entitlement (JSON)

Use this in managed deployments where entitlement payloads are supplied at runtime.

import whitebox_workflows as wb

signed_entitlement_json = '...'
public_key_kid = 'k1'
public_key_b64url = 'REPLACE_WITH_PROVIDER_KEY'

wbe = wb.WbEnvironment.from_signed_entitlement_json(
    signed_entitlement_json=signed_entitlement_json,
    public_key_kid=public_key_kid,
    public_key_b64url=public_key_b64url,
    include_pro=True,
    fallback_tier='open',
)

print(wbe.license_type())

Signed Entitlement (File)

Use this when entitlement bundles are provisioned as files by platform tooling.

import whitebox_workflows as wb

wbe = wb.WbEnvironment.from_signed_entitlement_file(
    entitlement_file='entitlement.json',
    public_key_kid='k1',
    public_key_b64url='REPLACE_WITH_PROVIDER_KEY',
    include_pro=True,
    fallback_tier='open',
)

Floating License

Use this when license allocation is centrally managed and leased per machine or session.

import whitebox_workflows as wb

wbe = wb.WbEnvironment.from_floating_license_id(
    floating_license_id='fl_12345',
    include_pro=True,
    fallback_tier='open',
    provider_url='https://license.example.com',
    machine_id='machine-01',
    customer_id='customer-abc',
)

print(wbe.license_info())

Failure Handling Guidance

  • Validate startup at script entry and fail early on licensing errors.
  • For entitlement/floating modes, keep diagnostics from thrown exceptions.
  • Use open mode fallback only where policy permits.

Security and Operations Notes

  • Do not hardcode real license secrets in source control.
  • Prefer environment variables or secure secret stores.
  • Log license mode and runtime version for reproducibility.

API Reference Strategy

Manual chapters provide narrative and recipes.

This split between narrative manual content and detailed tool references is intentional. Conceptual chapters answer workflow questions (what to do and why), while reference docs answer contract questions (exact parameters and return shape). Keeping these concerns separate improves readability without sacrificing precision.

Tool-by-tool parameter references remain in:

  • TOOLS.md
  • docs/tools_*.md

Reference Boundaries

  • This manual explains concepts, patterns, and end-to-end examples.
  • Tool parameter completeness lives in TOOLS.md and category tool docs.
  • Type signatures and IntelliSense details are reflected in package typing assets.

Terrain Analysis and Geomorphometry

Terrain analysis — or geomorphometry — is the quantitative characterization of land-surface form from digital elevation models (DEMs). It is one of the original strengths of the Whitebox platform and covers a broad spectrum: from simple first-order derivatives like slope and aspect, through classification of terrain form, to advanced multiscale and visibility analyses.

This chapter moves from foundational concepts through intermediate and advanced workflows, with example scripts throughout.


What is a DEM?

A digital elevation model (DEM) is a raster where each cell stores the elevation of the land surface at that location. DEMs underpin nearly every terrain analysis: slope gradients, drainage patterns, landform classification, solar radiation modelling, and viewshed analysis all begin with a DEM.

Common DEM sources include:

  • LiDAR-derived bare-earth surfaces (highest resolution and accuracy)
  • Photogrammetric DEMs from drone or aerial surveys
  • SRTM (global 30 m coverage)
  • ASTER GDEM and Copernicus DEM (global 30 m or 90 m)

DEM quality — vertical accuracy, spatial resolution, systematic artefacts from interpolation or acquisition — directly limits the quality of derivative products, so always inspect your DEM before computing derivatives.


Core Concepts

Terrain analysis relies on these fundamental concepts:

  • Slope: The gradient of the land surface, typically expressed in degrees (0–90) or percent. High values indicate steep terrain; low values indicate flat terrain. Critical for erosion, landslide, and hydrology models.
  • Aspect: Compass direction a slope faces (0–360°, measured clockwise from north). Flat cells are typically -1 or nodata. Controls solar radiation, drainage direction, and habitat quality.
  • Curvature: Rate of change of slope (usually planform and profile separately). Positive profile curvature indicates acceleration zones (ridges); negative indicates deceleration zones (valleys).
  • Flow direction: The steepest downslope direction from each cell. Forms the basis for flow accumulation, drainage network delineation, and watershed boundaries.
  • Flow accumulation: Number of upslope cells draining to each cell, proportional to contributing area. High values indicate stream channels and valley bottoms.
  • Topographic Wetness Index (TWI): Ln(upslope area / tan(slope)), predicts persistent moisture. High TWI indicates probable saturated zones; used in soil moisture and flood risk mapping.
  • Landform classification: Categorizing terrain into types (e.g. summits, ridges, valleys, footslopes) via multivariate analysis. Provides interpretable terrain structure without tuning thresholds.
  • Multiscale analysis: Deriving terrain metrics at multiple scales. Single-scale derivatives can miss important structure; multiscale approaches reveal process-relevant scales.
  • Viewshed: Set of cells visible from a vantage point. Used in landscape perception, military analysis, and wind farm siting.

First-Order Terrain Derivatives

First-order derivatives measure the rate of change of elevation over space.

Slope

Slope is the magnitude of the first derivative of elevation: the maximum rate of elevation change per unit horizontal distance. It is typically expressed in degrees (0–90) or as a percentage (rise over run × 100).

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem = wbe.read_raster('dem.tif')

# Degrees is the default units
slope_deg = wbe.terrain.derivatives.slope(dem, units='degrees')
wbe.write_raster(slope_deg, 'slope_degrees.tif')

# Percentage slope useful for agricultural and road applications
slope_pct = wbe.terrain.derivatives.slope(dem, units='percent')
wbe.write_raster(slope_pct, 'slope_percent.tif')

High values indicate steep terrain; low values indicate flat terrain. Slope is widely used in landslide hazard mapping, erosion modelling, habitat suitability analysis, and infrastructure routing.

Aspect

Aspect is the compass direction that a slope faces, measured clockwise from north (0°–360°). Flat cells conventionally receive a value of -1 or nodata.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem = wbe.read_raster('dem.tif')
aspect = wbe.terrain.derivatives.aspect(dem)
wbe.write_raster(aspect, 'aspect.tif')

Aspect controls solar insolation, snow persistence, soil moisture, and vegetation structure. South-facing slopes in the northern hemisphere receive more direct solar radiation and tend to be warmer and drier than north-facing slopes.

Hillshade

Hillshading simulates how the terrain would appear under a directional light source and is primarily a visualization aid rather than an analytical derivative.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem = wbe.read_raster('dem.tif')

# Single directional hillshade — azimuth and altitude in degrees
hs = wbe.terrain.general.hillshade(dem, azimuth=315.0, altitude=45.0)
wbe.write_raster(hs, 'hillshade.tif')

# Multidirectional hillshade (more even illumination, no shadow artefacts)
mdhs = wbe.terrain.general.multidirectional_hillshade(dem)
wbe.write_raster(mdhs, 'hillshade_multidirectional.tif')

# Hypsometrically tinted hillshade blends elevation colour and shading
hyp = wbe.terrain.general.hypsometrically_tinted_hillshade(dem)
wbe.write_raster(hyp, 'hillshade_hypsometric.tif')

Curvature

Curvature characterizes how the terrain surface bends in space. While slope measures the inclination of a surface, curvature measures how that inclination is changing — the "shape" of the surface rather than its steepness alone.

Whitebox provides a range of curvature types, each capturing a different geometric property.

Profile Curvature

Profile curvature is the curvature in the direction of steepest descent. Positive values correspond to convex terrain (accelerating flow); negative values to concave terrain (decelerating flow, area of convergence).

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

profile_curv = wbe.terrain.derivatives.profile_curvature(dem)
wbe.write_raster(profile_curv, 'profile_curvature.tif')

Plan Curvature

Plan curvature is measured in the horizontal plane, perpendicular to the direction of slope. It indicates flow convergence (negative values) or flow divergence (positive values) and is a useful predictor of soil moisture, run-on areas, and erosion.

plan_curv = wbe.terrain.derivatives.plan_curvature(dem)
wbe.write_raster(plan_curv, 'plan_curvature.tif')

Full Curvature Suite

For detailed geomorphometric characterisation compute the full family. Each captures a slightly different geometric facet of the surface:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

curv_types = {
    'mean_curvature':       wbe.terrain.derivatives.mean_curvature(dem),
    'gaussian_curvature':   wbe.terrain.derivatives.gaussian_curvature(dem),
    'minimal_curvature':    wbe.terrain.derivatives.minimal_curvature(dem),
    'maximal_curvature':    wbe.terrain.derivatives.maximal_curvature(dem),
    'casorati_curvature':   wbe.terrain.derivatives.casorati_curvature(dem),
    'accumulation_curvature': wbe.terrain.derivatives.accumulation_curvature(dem),
    'difference_curvature': wbe.terrain.derivatives.difference_curvature(dem),
    'generating_function':  wbe.terrain.derivatives.generating_function(dem),
    'curvedness':           wbe.terrain.derivatives.curvedness(dem),
}

for name, raster in curv_types.items():
    wbe.write_raster(raster, f'{name}.tif')

When to use each:

  • Mean curvature: general shape descriptor; used in hydrological and geomorphological studies.
  • Gaussian curvature: distinguishes synclinal, anteclinal, and saddle-point forms; positive on entirely convex surfaces, negative on saddle shapes.
  • Minimal/maximal curvature: principal curvatures; describe the extremes of bending in orthogonal directions.
  • Casorati curvature: root-mean-square of principal curvatures; a rotationally invariant roughness descriptor.
  • Accumulation curvature: amplifies high-curvature terrain features; useful for ridge–valley extraction.

Terrain Position and Landform Classification

Terrain position describes where a location sits relative to its surrounding landscape — ridge crest, upper slope, flat, valley floor, and so on. It forms the backbone of many landform classification workflows.

Topographic Position Index (TPI)

TPI compares the elevation of a cell to the mean elevation of its neighbourhood. Positive values indicate elevated positions (ridges, hilltops); negative values indicate depressed positions (valleys, channels); values near zero indicate planar slopes or mid-slope positions.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

# Inner and outer radii in cells
tpi = wbe.terrain.landform_indices.relative_topographic_position(dem, filterx=11, filtery=11)
wbe.write_raster(tpi, 'tpi.tif')

Deviation from Mean Elevation

Similar to TPI but expressed as a Z-score: how many standard deviations the local elevation is from the neighbourhood mean.

dev = wbe.terrain.general.deviation_from_mean_elevation(dem, filterx=11, filtery=11)
wbe.write_raster(dev, 'deviation_from_mean_elev.tif')

Geomorphons

Geomorphons classify the local terrain into ten fundamental landform elements by analysing the directional horizon profile in eight compass directions: flat, peak, ridge, shoulder, spur, slope, hollow, footslope, valley, and pit. The approach is descriptive and does not require thresholds for individual derivatives.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

geons = wbe.terrain.landform_indices.geomorphons(dem, search=50, threshold=1.0, flat=1.0, forms=True)
wbe.write_raster(geons, 'geomorphons.tif')

The search parameter sets the look-ahead distance in cells. Increasing it produces a more generalised classification that emphasises regional landform context; smaller values capture finer-scale topographic features.


Multiscale Terrain Analysis

Many geomorphic properties are scale-dependent: a feature that appears as a ridge at one scale is part of a larger plateau at another. Multiscale analysis explores how terrain derivatives vary with the scale (neighbourhood size) of computation.

Multiscale Roughness

Roughness quantifies the local complexity of the terrain surface. The multiscale variant computes roughness at a range of neighbourhood sizes and returns both the maximum roughness and the scale at which it occurs.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

roughness, roughness_scale = wbe.terrain.multiscale_signatures.multiscale_roughness(
    dem,
    min_scale=1,
    max_scale=100,
    step=1
)

wbe.write_raster(roughness, 'multiscale_roughness.tif')
wbe.write_raster(roughness_scale, 'roughness_scale_of_max.tif')

The scale-of-maximum raster describes the spatial grain of the terrain texture: alluvial fans and smooth glacial valleys show large dominant scales; deeply dissected badlands and karst terrain show small scales.

Multiscale Elevation Percentile

Measures how often a cell's elevation is higher than nearby cells across a range of scales, identifying persistent topographic prominence regardless of local noise.

mep = wbe.terrain.multiscale_signatures.multiscale_elevation_percentile(
    dem,
    min_scale=1,
    num_steps=10,
    step_size=2,
    step_nonlinearity=1.0,
    sig_digits=3
)
wbe.write_raster(mep, 'multiscale_elevation_percentile.tif')

Multiscale Curvatures

Computes a suite of curvature metrics at multiple scales and returns the value at the scale of maximum local curvature for each cell.

multiscale_mean_curv = wbe.terrain.multiscale_signatures.multiscale_curvatures(
    dem,
    min_scale=1,
    max_scale=30,
    step=1
)
wbe.write_raster(multiscale_mean_curv, 'multiscale_curvatures.tif')

Multiscale Topographic Position Image

Produces an RGB composite where colour encodes topographic position at three nested scales, providing an intuitive visual summary of multi-level terrain structure.

mtp = wbe.terrain.multiscale_signatures.multiscale_topographic_position_image(
    dem,
    local=1,
    meso=11,
    broad=101
)
wbe.write_raster(mtp, 'multiscale_topo_position.tif')

Visibility Analysis

Visibility analysis determines which parts of the landscape can be seen from a given viewpoint or set of viewpoints.

Viewshed

A viewshed identifies the set of cells visible from an observer location.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem = wbe.read_raster('dem.tif')
observer_points = wbe.read_vector('observer_locations.shp')

viewshed = wbe.terrain.visibility.viewshed(dem, observer_points, height=1.8)
wbe.write_raster(viewshed, 'viewshed.tif')

Horizon Angle and Openness

Horizon angle measures the elevation angle to the skyline in a given direction, useful for solar modelling and local climate studies. Terrain openness (positive and negative) quantifies how open or enclosed a location is relative to its surroundings.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

# Positive openness: how "exposed" each cell is
openness_pos = wbe.terrain.visibility.openness(dem, dist=50, pos_openness=True)
wbe.write_raster(openness_pos, 'openness_positive.tif')

# Negative openness: how "enclosed" each cell is
openness_neg = wbe.terrain.visibility.openness(dem, dist=50, pos_openness=False)
wbe.write_raster(openness_neg, 'openness_negative.tif')

Sky View Factor

Sky view factor measures the fraction of the sky hemisphere that is visible from a point on the surface, accounting for topographic obstruction. Values range from 0 (fully enclosed depression) to 1 (completely open flat surface). It is used in urban heat island modelling, long-wave radiation estimation, and snowmelt modelling.

svf = wbe.terrain.visibility.sky_view_factor(dem, num_directions=16, max_dist=200.0)
wbe.write_raster(svf, 'sky_view_factor.tif')

Terrain Smoothing

Raw DEMs often contain artefacts from acquisition and interpolation: striping, pitting, noisy micro-relief. Smoothing prior to derivative computation reduces these effects while ideally preserving genuine terrain features.

Feature-Preserving Smoothing (Multiscale)

Whitebox Next Gen now includes a newer multiscale feature-preserving smoother that is better suited to terrain-analysis workflows than the earlier single-scale smoother when you need to suppress short-wavelength DEM noise without flattening broader terrain form. The multiscale method works through a coarse-to-fine pyramid, smoothing at larger scales first and then refining the surface back toward the source DEM with explicit fidelity and edge-preservation controls.

At the moment this tool is most directly accessed through the generic tool-execution surface in WbW-Py.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('raw_dem.tif')

# Coarse-to-fine multiscale smoothing with explicit controls.
dem_smooth = wbe.run_tool(
  'feature_preserving_smoothing_multiscale',
  {
    'input': dem,
    'smoothing_amount': 0.65,
    'edge_preservation': 0.80,
    'scale_levels': 4,
    'fidelity': 0.45,
    'z_factor': 1.0,
  }
)
wbe.write_raster(dem_smooth, 'dem_smooth_multiscale.tif')

Use this before curvature, terrain-position, and landform-classification workflows, especially where acquisition artefacts or interpolation roughness would otherwise dominate second-derivative products.


WbW-Pro Spotlight: Terrain Constraint and Conflict Analysis

  • Problem: Screen terrain constraints early for siting and corridor decisions.
  • Tool: terrain_constraint_and_conflict_analysis
  • Typical inputs: DEM, optional wetness, optional flood-risk surface, optional land-cover penalty, slope threshold.
  • Typical outputs: Terrain-conflict score raster, conflict classes, and summary outputs.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
  'terrain_constraint_and_conflict_analysis',
  {
    'dem': 'dem.tif',
    'wetness': 'wetness_index_norm.tif',
    'flood_risk': 'flood_risk_norm.tif',
    'landcover_penalty': 'landcover_penalty_norm.tif',
    'slope_limit_deg': 15.0,
    'output_prefix': 'terrain_conflict_corridor_a'
  }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.

Pro Sweep Diagnostics for Siting Workflows

For scenario testing in Pro siting workflows, wind_turbine_siting and solar_site_suitability_analysis accept sweep_spec_json and produce additional sweep outputs:

  • run_matrix_summary (CSV)
  • sensitivity_report (JSON)
  • sensitivity_report_html (HTML)
  • stability_map (GeoTIFF; 3=high, 2=medium, 1=low)

The JSON sensitivity report includes a normalized primary metric span and stability class fields for quick robustness screening:

  • metrics.primary_metric
  • metrics.primary_relative_span
  • metrics.stability_class with values high, medium, or low

Example:

import json
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment(include_pro=True, tier="pro")

sweep_spec = {
  "schema_version": "1.0.0",
  "sweep_mode": "grid",
  "parameters": [
    {"name": "candidate_threshold", "values": [0.65, 0.7, 0.75]},
  ],
}

score, conf, summary = wbe.wind_turbine_siting(
  dem="dem.tif",
  settlements="settlements.gpkg",
  sweep_spec_json=json.dumps(sweep_spec),
  output_prefix="wind_sweep",
)

Complete Terrain Analysis Pipeline

The following script assembles a typical geomorphometric analysis workflow moving from raw DEM conditioning through a suite of first- and second-order derivatives and terrain classification outputs.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.verbose = True

# --- 1. Read and inspect input ---
dem = wbe.read_raster('dem_raw.tif')
print(dem.metadata())

# --- 2. Fill missing data (voids from nodata gaps) ---
dem_filled = wbe.terrain.general.fill_missing_data(dem, filter_size=11)

# --- 3. Multiscale feature-preserving smoothing ---
dem_smooth = wbe.run_tool(
  'feature_preserving_smoothing_multiscale',
  {
    'input': dem_filled,
    'smoothing_amount': 0.65,
    'edge_preservation': 0.80,
    'scale_levels': 4,
    'fidelity': 0.45,
  }
)
wbe.write_raster(dem_smooth, 'dem_smooth_multiscale.tif')

# --- 4. First-order derivatives ---
slope   = wbe.terrain.derivatives.slope(dem_smooth, units='degrees')
aspect  = wbe.terrain.derivatives.aspect(dem_smooth)
hillshade = wbe.terrain.general.multidirectional_hillshade(dem_smooth)

wbe.write_raster(slope,     'slope.tif')
wbe.write_raster(aspect,    'aspect.tif')
wbe.write_raster(hillshade, 'hillshade.tif')

# --- 5. Curvatures ---
prof_curv = wbe.terrain.derivatives.profile_curvature(dem_smooth)
plan_curv = wbe.terrain.derivatives.plan_curvature(dem_smooth)
mean_curv = wbe.terrain.derivatives.mean_curvature(dem_smooth)

wbe.write_raster(prof_curv, 'curvature_profile.tif')
wbe.write_raster(plan_curv, 'curvature_plan.tif')
wbe.write_raster(mean_curv, 'curvature_mean.tif')

# --- 6. Terrain position ---
dev = wbe.terrain.general.deviation_from_mean_elevation(dem_smooth, filterx=11, filtery=11)
wbe.write_raster(dev, 'deviation_from_mean_elev.tif')

# --- 7. Landform classification ---
geomorphons = wbe.terrain.landform_indices.geomorphons(dem_smooth, search=50, threshold=1.0, flat=1.0)
wbe.write_raster(geomorphons, 'geomorphons.tif')

# --- 8. Multiscale roughness ---
roughness, roughness_scale = wbe.terrain.multiscale_signatures.multiscale_roughness(
    dem_smooth, min_scale=1, max_scale=50, step=1
)
wbe.write_raster(roughness, 'multiscale_roughness.tif')

# --- 9. Visibility ---
svf = wbe.terrain.visibility.sky_view_factor(dem_smooth, num_directions=16, max_dist=200.0)
wbe.write_raster(svf, 'sky_view_factor.tif')

print("Terrain analysis complete.")

Ridges and Valleys

Topographic ridges and channels are fundamental landform elements. Whitebox provides tools to extract them directly as vector features.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem_smooth.tif')

# Extract ridges
ridges = wbe.terrain.general.find_ridges(dem, line_thin=True)
wbe.write_raster(ridges, 'ridges.tif')

# Extract ridge and valley lines as vectors
ridge_lines = wbe.terrain.general.ridge_and_valley_vectors(dem)
wbe.write_vector(ridge_lines, 'ridge_valley_lines.gpkg')

For hydrographic applications, channel extraction is usually better served through the hydrology toolset (flow accumulation-based extraction), but ridge-line extraction complements watershed delineation by explicitly mapping the divide network.


Embankment and Road Mapping

In agricultural, infrastructure, and floodplain contexts, anthropogenic embankments (dykes, levees, road fills) appear as linear elevated features on DEMs. Whitebox provides dedicated tools for their detection:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem_lidar.tif')

# Map elevated linear features such as road embankments and dykes
embankments = wbe.terrain.general.embankment_mapping(
    dem,
    road_vec='roads.shp',
    search_dist=2.5,
    min_road_width=6.0,
    typical_width=30.0,
    max_height=2.5,
    spillout_slope=4.0,
    max_remove_fs_slope=0.1,
    min_bench_width=6.0,
    clean_up=True,
    output_type='filtered DEM'
)
wbe.write_raster(embankments, 'dem_embankments_filtered.tif')

Summary

Terrain analysis in WbW-Py follows a layered depth model:

  • First-order work: slope, aspect, hillshade are fast and interpretable.
  • Curvature analysis: adds information about surface bending and is critical for hydrological modelling, mass movement susceptibility, and landform process inference.
  • Classification: geomorphons and multiscale position methods provide stable, interpretable landform maps without manually tuned thresholds.
  • Multiscale methods: reveal scale-dependent structure that single-scale derivatives miss.
  • Visibility and openness: support solar, ecological, and landscape planning applications.

For most projects the right progression is: smooth → first-order derivatives → curvatures → classification → application-specific outputs.


Tips

  • Pre-process your DEM: Remove spikes, pits, and fill sinks before computing derivatives. Use breach_depressions_least_cost() (preserves real depressions) or fill_depressions() (infills) depending on your application.
  • Resolution matters: Coarse DEMs (e.g. 30 m) are smoothed and may miss local process features. Fine DEMs (e.g. 1 m LiDAR) can be noisy. Choose resolution to match your process scale.
  • Flow direction algorithms: D8 (8-directional) is faster but can cause artificial flow alignments. D-infinity and Dinf-Rho distribute flow more naturally and are preferred for continuous analyses.
  • Curvature is scale-dependent: Always compute curvature at the scale matching your DEM resolution. A 10 m window on a 1 m DEM can overinterpret noise; a 1 m window on a 30 m DEM misses important structure.
  • Multiscale position classification: Run classification at 3–5 scales (e.g. local, neighbourhood, regional) and examine layer coherence. Inconsistent multi-scale patterns suggest model overfitting.
  • Viewshed validation: Viewshed results are sensitive to DEM quality and observer height assumptions. Always validate against ground observation or high-res ortho imagery.
  • Hydrological thresholds are empirical: Contributing area thresholds for stream initiation vary by geology and climate (typically 0.5–5 km²). Calibrate against observed stream networks.
  • Openness and exposure: Sky-view factor (SVF) and terrain openness support better hillshading and visibility assessment than raw slope or aspect. Use for visual interpretation and photogrammetry.

Spatial Hydrology

Spatial hydrology covers the analysis of water movement and accumulation across the land surface using DEMs. It is one of Whitebox's deepest specializations and spans DEM conditioning, flow routing, watershed delineation, stream network extraction, and advanced connectivity and storage modelling.

This chapter progresses from fundamental concepts through a full watershed analysis pipeline, including advanced topics such as probabilistic depression analysis and hydrologic connectivity.


Core Concepts

The Hydrologic DEM Problem

Most raw DEMs contain surface depressions — cells or groups of cells enclosed by higher neighbours — that are real topographic basins, artefacts of acquisition noise, or pits introduced by interpolation. Standard single-flow- direction routing (D8) cannot route water out of these depressions. Before flow routing, the DEM must be conditioned to remove spurious depressions while preserving genuine ones (such as lakes and wetlands).

The two principal conditioning strategies are:

Filling: raises cells within each depression up to the spill elevation. This guarantees flat, drained surfaces but can significantly alter elevations and produce large flat areas that require secondary gradient enforcement.

Breaching: cuts a narrow channel of minimum cost through the barrier surrounding each depression, routing water to the nearest outside outlet. This preserves more of the original elevation field and is usually preferred when breachable barriers exist (roads, levees, embankments).

In practice, a hybrid approach — breach where possible, fill remaining depressions — is often optimal.

Flow Direction Algorithms

After DEM conditioning, a flow direction raster encodes which direction water flows from each cell. Whitebox supports multiple algorithms, each with different properties:

AlgorithmFlow typeBest use
D8Single (deterministic)Channel delineation, simple watersheds
Rho8Single (stochastic)Reduces D8 directional bias
D-infinity (DInf)Multiple (proportional split)Hillslope flux, dispersive flow
FD8Multiple (proportional)Shallow overland flow modelling
MD-infinityMultipleSubsurface/soil moisture modelling
Quinn et al. (FD8 variant)MultipleWetness index, slope-area workflows
Minimal Dispersion Flow AlgorithmAdaptiveBalance between D8 and DInf

DEM Conditioning

Breach Depressions

breach_depressions_least_cost is the recommended first-pass conditioning tool. It minimises the total vertical cost of breaching while constraining each breach path to a maximum length and depth.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem = wbe.read_raster('dem_raw.tif')

dem_breached = wbe.hydrology.depressions_storage.breach_depressions_least_cost(
    dem,
    dist=50,           # maximum breach channel length in cells
    max_cost=None,     # no cost ceiling (breach wherever needed)
    min_dist=True,     # prefer shorter breach paths
    flat_increment=None,
    fill_deps=True     # fill any remaining unbreachable depressions
)
wbe.write_raster(dem_breached, 'dem_conditioned.tif')

If your study area contains roads or embankments that act as real barriers to flow, consider mapping them as embankments first and then burning them into the DEM:

dem_burned = wbe.hydrology.depressions_storage.topological_breach_burn(dem, streams='streams_mapped.shp')
dem_conditioned = wbe.hydrology.depressions_storage.breach_depressions_least_cost(dem_burned, dist=50, fill_deps=True)

Fill Depressions

When you need guaranteed flat-free surfaces or are processing coarse-resolution DEMs where breaching introduces artefacts, full filling is appropriate:

dem_filled = wbe.hydrology.depressions_storage.fill_depressions(dem, flat_increment=0.001)
wbe.write_raster(dem_filled, 'dem_filled.tif')

The flat_increment adds a tiny gradient across flat areas to enable downstream flow routing.

Single-Cell Pit Removal

For DEMs that are mostly clean but contain isolated single-cell pits (from radiometric noise in LiDAR interpolation), a lightweight pre-pass avoids unnecessary full conditioning:

dem_pitless = wbe.hydrology.depressions_storage.fill_pits(dem)
dem_conditioned = wbe.hydrology.depressions_storage.breach_depressions_least_cost(dem_pitless, dist=50, fill_deps=True)

Flow Direction

D8 Flow Pointer

D8 assigns each cell a direction code to its steepest neighbour. The result is an integer pointer raster used by many downstream tools.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem_conditioned.tif')

d8_pntr = wbe.hydrology.flow_routing.d8_pointer(dem)
wbe.write_raster(d8_pntr, 'd8_pointer.tif')

D-Infinity Flow Pointer

DInf partitions flow between two downslope neighbours according to the slope angle, producing a continuous flow direction that reduces the directional bias of D8:

dinf_pntr = wbe.hydrology.flow_routing.dinf_pointer(dem)
wbe.write_raster(dinf_pntr, 'dinf_pointer.tif')

Flow Accumulation

Flow accumulation counts (or accumulates weighted values of) the upstream contributing area draining to each cell. It is the primary tool for identifying stream channels, delineating watersheds, and computing topographic wetness indices.

Basic D8 Flow Accumulation

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem_conditioned.tif')

d8_pntr = wbe.hydrology.flow_routing.d8_pointer(dem)

# Cell-count accumulation
flow_accum = wbe.hydrology.flow_routing.d8_flow_accum(d8_pntr, out_type='cells')
wbe.write_raster(flow_accum, 'flow_accum_d8.tif')

# Log-transform for visualisation (high dynamic range)
import math
flow_accum_log = flow_accum.log2()   # WbW raster operator
wbe.write_raster(flow_accum_log, 'flow_accum_d8_log.tif')

Specific Contributing Area (D-Infinity)

Specific contributing area normalises by cell width and is used in wetness index and erosion models:

dinf_pntr = wbe.hydrology.flow_routing.dinf_pointer(dem)
sca = wbe.hydrology.flow_routing.dinf_flow_accum(dinf_pntr, input_is_pointer=True, out_type='sca')
wbe.write_raster(sca, 'specific_contributing_area.tif')

Topographic Wetness Index (TWI)

TWI = ln(SCA / tan(slope)) is a widely used proxy for soil moisture and saturated area:

import whitebox_workflows as wbw
import math

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem_conditioned.tif')

dinf_pntr = wbe.hydrology.flow_routing.dinf_pointer(dem)
sca       = wbe.hydrology.flow_routing.dinf_flow_accum(dinf_pntr, out_type='sca')
slope_rad = wbe.terrain.derivatives.slope(dem, units='radians')

# Avoid log(0) by clamping minimum SCA
sca_clamped = sca.max(0.001)
slope_clamped = slope_rad.max(0.001)

twi = (sca_clamped / slope_clamped.tan()).log()
wbe.write_raster(twi, 'twi.tif')

Stream Network Extraction

Stream channels appear in the flow accumulation raster as cells with very large contributing areas. Thresholding the accumulation surface reveals the channel network.

Simple Threshold Extraction

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

d8_pntr   = wbe.read_raster('d8_pointer.tif')
flow_accum = wbe.read_raster('flow_accum_d8.tif')

# Extract stream cells above threshold (contributing area in cells)
streams = wbe.streams.network_extraction.extract_streams(flow_accum, threshold=1000.0)
wbe.write_raster(streams, 'streams_raster.tif')

# Convert to vector lines
stream_vec = wbe.streams.network_extraction.raster_streams_to_vector(streams, d8_pntr)
wbe.write_vector(stream_vec, 'streams.gpkg')

Stream links divide the network into segments between junctions, enabling per-segment statistics and Strahler order computation:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
streams   = wbe.read_raster('streams_raster.tif')
d8_pntr   = wbe.read_raster('d8_pointer.tif')

# Assign Strahler order to each channel cell
strahler = wbe.streams.ordering_metrics.strahler_stream_order(d8_pntr, streams)
wbe.write_raster(strahler, 'strahler_order.tif')

# Hack stream order (less sensitive to stubs)
hack = wbe.streams.ordering_metrics.hack_stream_order(d8_pntr, streams)
wbe.write_raster(hack, 'hack_order.tif')

# Horton's drainage composition ratios
horton_ratios = wbe.streams.ordering_metrics.horton_ratios(d8_pntr, streams)
print(horton_ratios)   # returns bifurcation ratio, length ratio, area ratio

Shreve and Topological Orders

For large network analysis requiring consistent junction-based numbering:

shreve = wbe.streams.ordering_metrics.shreve_stream_magnitude(d8_pntr, streams)
wbe.write_raster(shreve, 'shreve_magnitude.tif')

topo_order = wbe.streams.ordering_metrics.topological_stream_order(d8_pntr, streams)
wbe.write_raster(topo_order, 'topological_order.tif')

Watershed Delineation

Single-Outlet Watershed

Delineate the complete contributing area upstream of a single outlet point:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

d8_pntr   = wbe.read_raster('d8_pointer.tif')
outlet_pts = wbe.read_vector('outlet.shp')

watershed = wbe.hydrology.watersheds_basins.watershed(d8_pntr, outlet_pts)
wbe.write_raster(watershed, 'watershed.tif')

Multiple Outlet Points / Nested Basins

unnest_basins handles overlapping contributing areas when working with multiple outlets in a nested configuration:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
d8_pntr   = wbe.read_raster('d8_pointer.tif')
outlets   = wbe.read_vector('outlets_multiple.shp')

# Each outlet gets a unique basin ID; nested basins properly accounted for
nested = wbe.hydrology.watersheds_basins.unnest_basins(d8_pntr, outlets)
wbe.write_raster(nested, 'nested_watersheds.tif')

Subbasin Delineation

Delineate subbasins for all channel junctions simultaneously:

subbasins = wbe.hydrology.watersheds_basins.subbasins(d8_pntr, streams)
wbe.write_raster(subbasins, 'subbasins.tif')

Terrain Hydrologic Indices

Elevation Above Stream

Measures the vertical distance of each cell above the nearest channel, useful for floodplain mapping and valley-bottom delineation:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dem     = wbe.read_raster('dem_conditioned.tif')
streams = wbe.read_raster('streams_raster.tif')
d8_pntr = wbe.read_raster('d8_pointer.tif')

height_above_stream = wbe.hydrology.hydrologic_indices.elevation_above_stream(dem, streams)
wbe.write_raster(height_above_stream, 'height_above_stream.tif')

Cells with values below 1.0 or 2.0 metres are candidate floodplain cells under typical storm recurrence intervals.

Downslope Distance to Stream

How far (path-following the flow network) is each cell from the nearest channel?

dist_to_stream = wbe.hydrology.hydrologic_indices.downslope_distance_to_stream(
    d8_pntr, streams, dist_type='path'
)
wbe.write_raster(dist_to_stream, 'distance_to_stream.tif')

Advanced: Stochastic Depression Analysis

Real DEMs have vertical uncertainty. A depression that appears in one DEM may not appear once that uncertainty is accounted for. stochastic_depression_ analysis runs a Monte Carlo simulation across an ensemble of stochastically perturbed DEMs to estimate the probability that each cell is part of a real depression rather than a noise artefact.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem = wbe.read_raster('dem.tif')

# rmse: vertical uncertainty of the DEM (e.g. 0.15 m for good LiDAR)
# range: spatial autocorrelation range of the error field in map units
# iterations: number of Monte Carlo realisations
depression_prob = wbe.hydrology.depressions_storage.stochastic_depression_analysis(
    dem,
    rmse=0.15,
    range=5.0,
    iterations=100
)
wbe.write_raster(depression_prob, 'depression_probability.tif')

Cells with probability > 0.5 are more likely to be real basins than artefacts. This output can drive a probability-weighted conditioned DEM or inform uncertainty quantification in flood inundation modelling.


Advanced: Hydrologic Connectivity

hydrologic_connectivity computes the probability (or frequency) that each cell contributes runoff to the watershed outlet, accounting for temporary storage and variable source areas:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem     = wbe.read_raster('dem_conditioned.tif')
streams = wbe.read_raster('streams_raster.tif')

connectivity = wbe.hydrology.hydrologic_indices.hydrologic_connectivity(dem, streams)
wbe.write_raster(connectivity, 'hydrologic_connectivity.tif')

Highly connected areas (near streams, with steep contributing slopes) respond quickly to rainfall; low-connectivity areas (flats, depressions, wetlands) buffer the hydrological response.


Advanced: Impoundment Analysis

Impoundment modelling simulates the water surface extent and volume that would result from a dam or weir at a specified location on a stream:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
dem     = wbe.read_raster('dem_conditioned.tif')
streams = wbe.read_raster('streams_raster.tif')

# impoundment height above stream bed in metres
impoundment_area = wbe.hydrology.depressions_storage.impoundment_size_index(
    dem, streams, damlength=1000.0
)
wbe.write_raster(impoundment_area, 'impoundment_size_index.tif')

Full Watershed Analysis Script

This end-to-end script conditions a DEM, routes flow, extracts a stream network, delineates a watershed, and produces a suite of hydrologic derivatives in a single automated workflow.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.verbose = True

# --- Inputs ---
dem_path    = 'dem_raw.tif'
outlet_path = 'outlet_point.shp'
streams_threshold = 2000   # cells (adjust to DEM resolution)

# --- 1. Read DEM ---
dem = wbe.read_raster(dem_path)

# --- 2. Fill missing data ---
dem = wbe.terrain.general.fill_missing_data(dem, filter_size=11)

# --- 3. Smooth ---
dem = wbe.terrain.general.feature_preserving_smoothing(dem, filter_size=11, num_iter=2)
wbe.write_raster(dem, 'dem_smooth.tif')

# --- 4. Condition DEM ---
dem_cond = wbe.hydrology.depressions_storage.breach_depressions_least_cost(dem, dist=50, fill_deps=True)
wbe.write_raster(dem_cond, 'dem_conditioned.tif')

# --- 5. Flow direction ---
d8_pntr  = wbe.hydrology.flow_routing.d8_pointer(dem_cond)
dinf_pntr = wbe.hydrology.flow_routing.dinf_pointer(dem_cond)
wbe.write_raster(d8_pntr, 'd8_pointer.tif')

# --- 6. Flow accumulation ---
flow_accum = wbe.hydrology.flow_routing.d8_flow_accum(d8_pntr, out_type='cells')
wbe.write_raster(flow_accum, 'flow_accum.tif')

# --- 7. Streams ---
streams = wbe.streams.network_extraction.extract_streams(flow_accum, threshold=streams_threshold)
wbe.write_raster(streams, 'streams.tif')
stream_vec = wbe.streams.network_extraction.raster_streams_to_vector(streams, d8_pntr)
wbe.write_vector(stream_vec, 'streams.gpkg')

# --- 8. Stream order ---
strahler = wbe.streams.ordering_metrics.strahler_stream_order(d8_pntr, streams)
wbe.write_raster(strahler, 'strahler_order.tif')

# --- 9. Watershed ---
outlet_pts = wbe.read_vector(outlet_path)
watershed  = wbe.hydrology.watersheds_basins.watershed(d8_pntr, outlet_pts)
wbe.write_raster(watershed, 'watershed.tif')

# --- 10. TWI ---
sca         = wbe.hydrology.flow_routing.dinf_flow_accum(dinf_pntr, out_type='sca')
slope_rad   = wbe.terrain.derivatives.slope(dem_cond, units='radians')
sca_c       = sca.max(0.001)
slope_c     = slope_rad.max(0.001)
twi         = (sca_c / slope_c.tan()).log()
wbe.write_raster(twi, 'twi.tif')

# --- 11. Height above stream ---
height_above = wbe.hydrology.hydrologic_indices.elevation_above_stream(dem_cond, streams)
wbe.write_raster(height_above, 'height_above_stream.tif')

print("Watershed analysis pipeline complete.")

Summary

Hydrological analysis in WbW-Py follows a clear preparation → routing → network → delineation progression:

  1. Condition the DEM (breach + fill) to ensure topologically correct flow.
  2. Route flow using the algorithm appropriate to your application (D8 for channels, DInf for hillslope flux).
  3. Accumulate flow and extract the stream network by thresholding.
  4. Delineate watersheds and subbasins from outlets or channel junctions.
  5. Compute terrain hydrologic indices (TWI, height above stream, connectivity).
  6. Advance to probabilistic or connectivity analysis for uncertainty quantification and dynamic source area modelling.

LiDAR Processing

LiDAR (Light Detection and Ranging) produces dense 3D point clouds from airborne or terrestrial laser scanning. Each point has a position (X, Y, Z), return number, intensity, classification, and optionally colour, waveform, and timestamp. Whitebox has one of the most comprehensive LiDAR processing pipelines available in any open or commercial GIS platform, covering the full chain from raw point cloud inspection through ground filtering, gridding, structural analysis, and individual tree segmentation.


LiDAR Data Model

A LiDAR file (LAS or LAZ) is a structured binary format with:

  • Header: file-level metadata — bounding box, CRS, point count, record format version, generating software, and global statistics.
  • Point records: per-point X/Y/Z, return metadata, intensity, classification (ground, low vegetation, buildings, water, etc.).
  • Variable length records (VLRs): CRS definitions, waveform lookups, extra byte descriptions, and custom metadata.

Modern point clouds can also be stored in COPC (Cloud-Optimized Point Cloud) format — an indexed, HTTP-range-request-friendly LAS-in-COPC wrapper — and in E57 (the ASTM exchange format) or PLY. Whitebox supports all of these through wblidar.


Core Concepts

Before processing LiDAR data, understand these foundational terms:

  • Return number: Which reflection from a single laser pulse. Pulse 1 is the first (often canopy top); pulse 2–5 capture midstory and ground returns.
  • Point classification: ASPRS standard categories — ground (2), low veg (3), medium veg (4), high veg (5), buildings (6), noise (7), overlap (12), and others.
  • Intensity: Reflectance value (0–65535) proportional to target brightness. Useful for vegetation density estimation and water detection.
  • Ground filtering: Separating terrain points (classification 2) from vegetation and buildings; critical for accurate digital terrain models (DTMs).
  • Digital terrain model (DTM): Raster surface of bare earth, computed from ground returns only. Used for hydrology, geomorphometry, and flood modelling.
  • Digital surface model (DSM): Raster surface of highest returns (canopy top). Used for building detection and volume calculations.
  • Canopy height model (CHM): DSM minus DTM; represents vegetation height above ground. Standard input for tree detection and segmentation.
  • Point density: Points per square unit (typically points/m²). Higher density enables finer segmentation; lower density requires smoothing.
  • Normalization: Converting raw Z-values to height-above-ground by subtracting DTM, creating a normalized point cloud for structural analysis.

Reading and Inspecting LiDAR Data

Reading a Single File

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

lidar = wbe.read_lidar('survey_tile.laz')
header = lidar.metadata()

print(f"Points:     {header.number_of_points}")
print(f"Bounds:     {header.min_x:.2f}, {header.min_y:.2f} to {header.max_x:.2f}, {header.max_y:.2f}")
print(f"Z range:    {header.min_z:.2f} – {header.max_z:.2f}")
print(f"Point fmt:  {header.point_format}")
print(f"CRS:        {header.wkt}")

Inspecting Individual Point Records

You can iterate over point records for custom filtering or analysis. For large files prefer the NumPy bridge or chunked streaming (see below).

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('tile.laz')

ground_count = 0
for i in range(lidar.header.number_of_points):
    pd, time, colour, waveform = lidar.get_point_record(i)
    if pd.classification == 2:   # 2 = ground in ASPRS classification
        ground_count += 1

print(f"Ground points: {ground_count}")

NumPy Bridge

For vectorised analysis, convert the entire point cloud to a structured NumPy array:

import whitebox_workflows as wbw
import numpy as np

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('tile.laz')

arr = lidar.to_numpy(['x', 'y', 'z', 'intensity', 'classification'])

# Z statistics for ground points
ground_z = arr['z'][arr['classification'] == 2]
print(f"Ground Z mean: {ground_z.mean():.2f}")
print(f"Ground Z std:  {ground_z.std():.2f}")

Tile Footprints

When working with a tiled survey, produce a vector index of tile footprints before processing:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

footprints = wbe.lidar.interpolation_gridding.lidar_tile_footprint(
    input_directory='lidar_tiles/',
    output='tile_index.gpkg'
)
wbe.write_vector(footprints, 'tile_index.gpkg')

Ground Point Filtering

Separating ground returns from vegetation, buildings, and other objects is the most critical pre-processing step for DEM generation. Whitebox provides several algorithms.

Progressive Morphological Filter (PMF) / IMProved Ground Point Filter

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('raw_tile.laz')

# Classify ground returns using the improved ground point filter
lidar_classified = wbe.lidar.filtering_classification.improved_ground_point_filter(
    lidar,
    radius=0.5,       # initial search radius in metres
    min_z_range=0.1,  # minimum Z variation to be non-ground
    max_z_range=30.0  # maximum above-ground feature height
)

wbe.write_lidar(lidar_classified, 'tile_classified.laz')

Filtering by Percentile

Extract the subset of points that fall below a chosen height percentile within each local neighbourhood — useful for identifying near-ground returns without full classification:

near_ground = wbe.lidar.filtering_classification.filter_lidar_by_percentile(
    lidar,
    radius=2.0,
    percentile=5.0   # lowest 5% of heights locally
)
wbe.write_lidar(near_ground, 'near_ground_returns.laz')

Filtering by Reference Surface

Remove points above (or below) a reference raster surface by a tolerance:

ground_ref = wbe.read_raster('dtm_existing.tif')

filtered = wbe.lidar.filtering_classification.filter_lidar_by_reference_surface(
    lidar,
    reference=ground_ref,
    threshold=0.5,   # metres above reference surface
    criterion='above'
)
wbe.write_lidar(filtered, 'below_reference.laz')

Gridding: From Point Cloud to Raster

Gridding (or interpolation) converts the discrete point cloud into a continuous raster surface.

DTM — Digital Terrain Model

Interpolate from ground-classified returns to produce the bare-earth surface:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('tile_classified.laz')

dtm = wbe.lidar.interpolation_gridding.lidar_tin_gridding(
    lidar,
    parameter='elevation',
    returns_included='last',   # last returns best represent ground
    resolution=1.0,            # output cell size in metres
    exclude_cls='0,1,3,4,5,6,7,8,9'  # exclude all non-ground classes
)
wbe.write_raster(dtm, 'dtm_1m.tif')

TIN gridding triangulates the ground returns and interpolates elevations to the output raster grid, preserving break-lines and reducing artefacts relative to simple nearest-neighbour methods.

DSM — Digital Surface Model

The DSM captures the highest return in each cell, representing the top of the vegetation and building canopy:

dsm = wbe.lidar_pointdensity(
    lidar,
    parameter='elevation',
    returns_included='first',
    resolution=1.0
)
# Or use the high-point gridding:
dsm = wbe.lidar.interpolation_gridding.lidar_tin_gridding(
    lidar,
    parameter='elevation',
    returns_included='first',
    resolution=1.0
)
wbe.write_raster(dsm, 'dsm_1m.tif')

Canopy Height Model (CHM)

CHM = DSM − DTM, expressing the height of above-ground objects:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

dtm = wbe.read_raster('dtm_1m.tif')
dsm = wbe.read_raster('dsm_1m.tif')

chm = dsm - dtm
chm = chm.max(0.0)    # Prevent negative CHM values from DEM mismatch

wbe.write_raster(chm, 'chm_1m.tif')

Intensity Surface

Intensity records the strength of the laser return and is related to surface reflectance. Gridding intensity produces a pseudo-imagery product useful for feature detection:

intensity = wbe.lidar.interpolation_gridding.lidar_tin_gridding(
    lidar,
    parameter='intensity',
    returns_included='all',
    resolution=1.0
)
wbe.write_raster(intensity, 'intensity_1m.tif')

Point Density

Understanding point density helps identify data quality issues (sparse areas, flight overlap gaps) before gridding:

density = wbe.lidar_pointdensity(lidar, resolution=1.0)
wbe.write_raster(density, 'point_density.tif')

Multiple-Tile Workflows

Large surveys are tiled and must be processed together. Whitebox supports batch processing and tile joining.

Reading Multiple Tiles

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

tiles = wbe.read_lidars([
    'tiles/tile_001.laz',
    'tiles/tile_002.laz',
    'tiles/tile_003.laz',
])

Joining Tiles

merged = wbe.lidar.io_management.lidar_join(tiles)
wbe.write_lidar(merged, 'merged.laz')

Tiling Output

After processing a large cloud, re-tile for downstream storage:

wbe.lidar.io_management.lidar_tile(
    lidar,
    width=500.0,   # tile width in map units
    height=500.0,
    origin_x=0.0,
    origin_y=0.0,
    output_directory='retiled/'
)

Normalisation (Height Above Ground)

Normalised point clouds express each point's Z as height above the ground surface rather than absolute elevation — essential for vegetation metrics.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

lidar = wbe.read_lidar('tile_classified.laz')
dtm   = wbe.read_raster('dtm_1m.tif')

normalised = wbe.normalise_lidar(lidar, dtm)
wbe.write_lidar(normalised, 'tile_normalised.laz')

After normalisation: ground returns are at Z ≈ 0, understorey vegetation at 1–5 m, mid-storey at 5–20 m, and canopy top at 20+ m (depending on forest type).


Individual Tree Segmentation

Individual tree segmentation identifies and delineates individual tree crowns from the CHM or normalised point cloud. This is used in forestry inventory, carbon stock estimation, and urban tree management.

CHM-Based Segmentation

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
chm = wbe.read_raster('chm_smooth.tif')

tree_polygons = wbe.lidar.analysis_metrics.individual_tree_detection(
    chm,
    min_search_radius=1.0,   # minimum crown radius in metres
    min_height=2.0,           # ignore vegetation below this height
    max_search_radius=10.0
)
wbe.write_vector(tree_polygons, 'tree_crowns.gpkg')

A smooth CHM substantially improves segmentation quality — apply feature_preserving_smoothing or a moderate Gaussian to the raw CHM first:

chm_raw    = wbe.read_raster('chm_1m.tif')
chm_smooth = wbe.remote_sensing.filters.gaussian_filter(chm_raw, sigma=0.75)
wbe.write_raster(chm_smooth, 'chm_smooth.tif')

Point-Cloud-Based Segmentation

For higher-detail work, segment directly in 3D:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
lidar_norm = wbe.read_lidar('tile_normalised.laz')

tree_clouds = wbe.lidar.filtering_classification.lidar_segmentation(
    lidar_norm,
    radius=1.5,
    min_z_range=2.0
)
wbe.write_lidar(tree_clouds, 'trees_segmented.laz')

Structural Metrics

Canopy Cover and Closure

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('tile_normalised.laz')

# Canopy cover: fraction of cells with returns above height threshold
cover = wbe.lidar_canopy_cover(lidar, threshold=2.0, resolution=20.0)
wbe.write_raster(cover, 'canopy_cover_20m.tif')

Height Percentiles and Mean Height

p75 = wbe.lidar.filtering_classification.lidar_elevation_slice(lidar, min_h=0.0, max_h=None, cls=1)
wbe.write_raster(p75, 'canopy_p75.tif')

Building Detection

LiDAR height and planarity cues allow detection of building footprints:

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('urban_tile.laz')
dtm   = wbe.read_raster('dtm_urban.tif')

buildings = wbe.lidar_building_detection(
    lidar,
    dtm,
    min_height=3.0,
    min_area=25.0
)
wbe.write_vector(buildings, 'detected_buildings.gpkg')

Chunked Lidar Streaming

For very large LiDAR datasets that exceed available RAM, WbW-Py supports chunked streaming via the NumPy bridge. Each chunk is a fixed-size window over the point array:

import whitebox_workflows as wbw
import numpy as np

wbe = wbw.WbEnvironment()
lidar = wbe.read_lidar('large_survey.copc.laz')

first_return_z = []

for chunk in lidar.to_numpy_chunks(['x', 'y', 'z', 'return_number'], chunk_size=500_000):
    mask = chunk['return_number'] == 1
    first_return_z.append(chunk['z'][mask])

all_first_z = np.concatenate(first_return_z)
print(f"First-return Z mean: {all_first_z.mean():.2f}")

Full Ground-Filter-to-DTM Pipeline

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.verbose = True

# --- 1. Read raw tile ---
lidar = wbe.read_lidar('raw_flight.laz')
print(lidar.metadata())

# --- 2. Remove outlier points ---
lidar_clean = wbe.lidar.filtering_classification.lidar_elevation_slice(lidar, min_h=-2.0, max_h=3000.0)

# --- 3. Ground point filtering ---
lidar_classified = wbe.lidar.filtering_classification.improved_ground_point_filter(lidar_clean, radius=0.5)
wbe.write_lidar(lidar_classified, 'classified.laz')

# --- 4. DTM ---
dtm = wbe.lidar.interpolation_gridding.lidar_tin_gridding(
    lidar_classified,
    parameter='elevation',
    returns_included='last',
    resolution=0.5,
    exclude_cls='0,1,3,4,5,6,7,8,9'
)
dtm = wbe.terrain.general.fill_missing_data(dtm, filter_size=11)
wbe.write_raster(dtm, 'dtm_0.5m.tif')

# --- 5. DSM ---
dsm = wbe.lidar.interpolation_gridding.lidar_tin_gridding(
    lidar_classified,
    parameter='elevation',
    returns_included='first',
    resolution=0.5
)
wbe.write_raster(dsm, 'dsm_0.5m.tif')

# --- 6. CHM ---
chm = (dsm - dtm).max(0.0)
wbe.write_raster(chm, 'chm_0.5m.tif')

# --- 7. Normalise ---
lidar_norm = wbe.normalise_lidar(lidar_classified, dtm)
wbe.write_lidar(lidar_norm, 'normalised.laz')

# --- 8. Tree segmentation ---
chm_smooth = wbe.remote_sensing.filters.gaussian_filter(chm, sigma=0.75)
trees = wbe.lidar.analysis_metrics.individual_tree_detection(chm_smooth, min_search_radius=1.0, min_height=2.0)
wbe.write_vector(trees, 'tree_crowns.gpkg')

print(f"Detected {trees.num_features()} tree crowns.")

WbW-Pro Spotlight: LiDAR Change and Disturbance Analysis

  • Problem: Compare repeat LiDAR epochs to detect disturbance in a repeatable way.
  • Tool: lidar_change_and_disturbance_analysis
  • Typical inputs: Baseline tile set, monitoring tile set, output resolution, minimum change threshold.
  • Typical outputs: Change rasters plus summary metrics for affected area, hotspot intensity, and QA review.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
    'lidar_change_and_disturbance_analysis',
    {
        'baseline_tiles': '/data/lidar_epoch_2022/',
        'monitor_tiles':  '/data/lidar_epoch_2025/',
        'resolution': 1.0,
        'min_change_m': 0.5,
        'output_prefix': 'disturbance_2025_vs_2022'
    }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.


Summary

LiDAR processing in WbW-Py covers the full pipeline:

  1. Inspect raw point cloud headers and per-file statistics.
  2. Filter outliers and classify ground returns.
  3. Grid ground, first returns, and intensity to produce DTM, DSM, and CHM.
  4. Normalise for vegetation structure analysis.
  5. Segment individual trees from the CHM or point cloud.
  6. Compute structural metrics at plot or landscape scale.
  7. Stream chunked data for very large surveys.

The high-performance wblidar backend supports LAS 1.0–1.5, LAZ, COPC, E57, and PLY natively, with no external dependencies or codec wrappers.


Tips

  • Choose your format wisely: LAS is universal and compact; LAZ adds compression and is ideal for archival or transmission. COPC is cloud-optimized and best for remote HTTP range-request access. Use LAZ or LAS for terrestrial/airborne surveys; COPC for cloud-native workflows.
  • Always validate classifications: Use lidar_histogram() and lidar_info() to inspect point distributions by return and classification. Misclassified ground points silently corrupt DTMs and downstream hydrology.
  • DTM vs. DSM vs. CHM: Generate all three at the same resolution so derivatives align. A common pitfall is mixing DTM and DSM resolution.
  • Ground filtering is critical: Outliers and noise (classification 7) should be excluded before gridding. Use lidar_filter_for_ground() to remove spikes and erratic points.
  • Normalization enables vegetation analysis: Always normalize point clouds (subtract DTM) before individual tree detection or canopy structural metrics.
  • Monitor memory for large surveys: Point clouds are memory-intensive. For datasets > 1 GB, use streaming APIs (lidar_read_chunked()) rather than loading the entire tile at once.
  • Coordinate reference systems matter: LAS headers carry CRS as WKT. Verify WKT matches your project CRS before gridding or merging tiles.
  • Density and grid resolution: If point density is < 0.5 pts/m², consider upsampling or smoothing the output grid to avoid isolated pits or peaks.

Remote Sensing Analysis

Remote sensing uses satellite imagery, aerial photography, and drone captures to characterise the Earth's surface. Whitebox Workflows for Python (WbW-Py) provides a comprehensive toolkit for image processing, spectral analysis, dimensionality reduction, supervised and unsupervised classification, change detection, and sensor-specific workflows. Because all tools operate directly on in-memory raster objects, complex multi-step image analysis pipelines can be written as concise Python scripts.

Sensor Bundle First: When working with industry-standard satellite products (Sentinel-2, Landsat, PlanetScope, SAR, and others), open the scene as a sensor bundle rather than loading individual band files by hand. Bundles provide automatic metadata discovery, key-based band access, and one-call true/false-colour composites that work across sensor families without hardcoding file names.


Core Concepts

Remote sensing image analysis requires familiarity with these core ideas:

  • Spectral bands: Distinct wavelength ranges (e.g. blue 450–510 nm, red 620–750 nm, NIR 750–900 nm). Different materials reflect different bands, enabling material discrimination.
  • Spectral indices: Normalized ratios of bands that isolate phenomena. NDVI (Normalized Difference Vegetation Index) uses NIR and red to measure greenness; NDWI uses NIR and SWIR for water content.
  • Spatial resolution: Pixel size in meters (Sentinel-2: 10 m for visible/NIR, 20 m for SWIR; Landsat: 30 m; SAR: 5–30 m depending on mode).
  • Temporal resolution: Revisit interval (Sentinel-2: 5 days; Landsat: 16 days; PlanetScope: daily or higher).
  • Atmospheric effects: Raw radiance is affected by aerosols, water vapour, and ozone. Surface reflectance products are atmospherically corrected; still require relative normalization for multi-date analysis.
  • Cloud masking: Cloud and cloud shadow pixels must be identified and excluded before classification or change detection using quality/QA bands.
  • Supervised classification: Training polygons with known labels teach a model to assign classes to unlabelled pixels. Training quality directly controls classification accuracy.
  • Unsupervised classification: Clustering algorithms (K-means, ISODATA) discover spectral clusters without training labels; useful for exploratory analysis.
  • Change detection: Comparing indices or classifications across dates to identify land-use changes, disturbance, or phenological shifts.

Sensor Bundles — Cross-Family Scene Ingestion

WbW-Py's Bundle class wraps a product folder (zip archive or extracted directory) and exposes a consistent API regardless of sensor family. This is the preferred starting point for any scene-level workflow.

Opening a Bundle

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

# Works with Sentinel-2, Landsat 8/9, PlanetScope, SPOT, SAR, etc.
bundle = wbe.read_bundle('/data/S2B_MSIL2A_20240601T105619_N0510_R094_T32TPT_20240601T142845.SAFE')

# Or from a zip archive — WbW extracts it automatically
bundle = wbe.read_bundle('/data/LC09_L2SP_017030_20240610_20240612_02_T1.tar')

Discovering Available Layers

Bundle surfaces keys for bands, derived measurements, QA/cloud masks, auxiliary layers, and additional assets:

print(bundle.family)               # e.g. 'sentinel2', 'landsat_c2', 'planetscope'
print(bundle.mission)              # e.g. 'Sentinel-2B'
print(bundle.tile_id())            # e.g. '32TPT'
print(bundle.processing_level())   # e.g. 'L2A', 'L2SP'
print(bundle.cloud_cover_percent())
print(bundle.acquisition_datetime_utc())

print(bundle.list_band_keys())        # spectral bands, e.g. ['B02', 'B03', 'B04', 'B08', ...]
print(bundle.list_measurement_keys()) # derived measurements, e.g. ['ndvi', 'ndwi']
print(bundle.list_qa_keys())          # quality/cloud layers, e.g. ['SCL', 'QA_PIXEL']
print(bundle.list_aux_keys())         # auxiliary rasters, e.g. ['AOT', 'WVP']
print(bundle.list_asset_keys())       # additional assets (angles, browse images, etc.)

Reading Bands by Key

# Keys follow sensor-native naming — no need to memorise file paths
blue  = bundle.read_band('B02')   # Sentinel-2 blue
green = bundle.read_band('B03')
red   = bundle.read_band('B04')
nir   = bundle.read_band('B08')
swir1 = bundle.read_band('B11')
swir2 = bundle.read_band('B12')

# Landsat 9 uses different key names
# blue  = bundle.read_band('SR_B2')
# green = bundle.read_band('SR_B3')
# red   = bundle.read_band('SR_B4')
# nir   = bundle.read_band('SR_B5')

# QA / cloud mask
cloud_mask = bundle.read_qa_layer('SCL')      # Sentinel-2 scene classification
# cloud_mask = bundle.read_qa_layer('QA_PIXEL')  # Landsat Collection 2

Quick-Look Composites

Bundles know which keys correspond to red/green/blue/NIR bands for their sensor family, so composites require no band-order bookkeeping:

# Write a true-colour GeoTIFF (auto-enhanced by default)
bundle.true_colour_composite(wbe, output_path='true_colour.tif')

# False-colour (NIR-Red-Green) composite
bundle.false_colour_composite(wbe, output_path='false_colour.tif')

Multi-Sensor Workflow Without Hardcoded Paths

The bundle API is deliberately family-agnostic. The same code body works on Sentinel-2 and Landsat scenes because both expose their bands by semantic key:

def compute_ndvi_from_bundle(wbe, bundle_path, output_path):
    b = wbe.read_bundle(bundle_path)
    if 'ndvi' in b.list_measurement_keys():
        ndvi = b.read_measurement('ndvi')
    else:
        # Fall back to band-based NDVI — works for any optical sensor
        red_band = b.read_band('B04') if b.family == 'sentinel2' else b.read_band('SR_B4')
        nir_band = b.read_band('B08') if b.family == 'sentinel2' else b.read_band('SR_B5')
        ndvi = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir_band, red_band)
    wbe.write_raster(ndvi, output_path, compress=True)

compute_ndvi_from_bundle(wbe, '/data/S2B_MSIL2A_20240601.SAFE', 'ndvi_s2.tif')
compute_ndvi_from_bundle(wbe, '/data/LC09_L2SP_017030.tar',     'ndvi_l9.tif')

Working with Individual Band Files

When bundles are not available (e.g., reprojected mosaics, custom composites, legacy archives), load individual band files in the conventional way. All raster objects returned by read_band() and read_raster() are interchangeable.

wbe.working_directory = '/data/sentinel2'

blue   = wbe.read_raster('B02_10m.tif')
green  = wbe.read_raster('B03_10m.tif')
red    = wbe.read_raster('B04_10m.tif')
nir    = wbe.read_raster('B08_10m.tif')

# Sentinel-2 bands 11/12 are 20 m; resample to match 10 m bands before analysis
swir1 = wbe.remote_sensing.enhancement_contrast.resample(wbe.read_raster('B11_20m.tif'), base_raster=red, method='bilinear')
swir2 = wbe.remote_sensing.enhancement_contrast.resample(wbe.read_raster('B12_20m.tif'), base_raster=red, method='bilinear')

Cloud and No-Data Masking

Before any analysis, mask clouds, cloud shadows, and saturated pixels.

# Sentinel-2 SCL classes: 3=cloud shadow, 8=medium cloud, 9=high cloud, 10=thin cirrus
scl = bundle.read_qa_layer('SCL')
cloud_free_red = wbe.raster.general.raster_calculator(
    "if('scl' == 3 or 'scl' == 8 or 'scl' == 9 or 'scl' == 10, nodata, 'red')",
    [scl, red]
)

# Landsat Collection 2 QA_PIXEL — dilated cloud = bit 1, cloud = bit 3, shadow = bit 4
# qa = bundle.read_qa_layer('QA_PIXEL')
# Use raster_calculator with bitwise masking expressions to isolate cloud/shadow pixels

Spectral Indices

Spectral indices are band-ratio transformations that suppress illumination variation and amplify specific surface properties. WbW-Py's normalized_difference_index() is a general-purpose tool for any normalised ratio, while raster_calculator() accommodates arbitrary expressions.

Normalised Difference Vegetation Index (NDVI)

NDVI measures photosynthetically active green biomass:

$$NDVI = \frac{NIR - Red}{NIR + Red}$$

Values range from −1 to +1; healthy vegetation typically exceeds 0.3.

ndvi = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir, red)
wbe.write_raster(ndvi, 'ndvi.tif', compress=True)

Common Water, Snow, and Urban Indices

# NDWI (McFeeters 1996) — open water; Green/NIR
ndwi  = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(green, nir)

# MNDWI — better for urban areas; Green/SWIR1
mndwi = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(green, swir1)

# NBR — fire severity and post-fire recovery; NIR/SWIR2
nbr   = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir, swir2)

# NDSI — snow and ice; Green/SWIR1
ndsi  = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(green, swir1)

# NDBI — normalised built-up index; SWIR1/NIR
ndbi  = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(swir1, nir)

Enhanced Vegetation Index (EVI)

EVI reduces soil background noise and atmospheric effects that affect NDVI in dense canopies:

$$EVI = 2.5 \cdot \frac{NIR - Red}{NIR + 6 \cdot Red - 7.5 \cdot Blue + 1}$$

evi = wbe.raster.general.raster_calculator(
    expression="2.5 * ('nir' - 'red') / ('nir' + 6.0 * 'red' - 7.5 * 'blue' + 1.0)",
    input_rasters=[nir, red, blue]
)
wbe.write_raster(evi, 'evi.tif', compress=True)

Soil Adjusted Vegetation Index (SAVI)

SAVI introduces a soil brightness correction factor L (commonly 0.5):

$$SAVI = \frac{(NIR - Red) \cdot (1 + L)}{NIR + Red + L}$$

L = 0.5
savi = wbe.raster.general.raster_calculator(
    expression="('nir' - 'red') * (1.0 + 0.5) / ('nir' + 'red' + 0.5)",
    input_rasters=[nir, red]
)

Image Enhancement and Contrast Stretching

Raw digital numbers often have narrow histogram ranges that make visualisation difficult. WbW-Py provides several enhancement methods:

# Percentage contrast stretch — clips top and bottom 2% of values
enhanced = wbe.remote_sensing.enhancement_contrast.percentage_contrast_stretch(red, clip_tail=2.0)

# Standard deviation stretch — maps ±2σ to display range
sd_stretched = wbe.remote_sensing.enhancement_contrast.standard_deviation_contrast_stretch(red, num_std_dev=2.0)

# Gaussian contrast stretch
gauss = wbe.remote_sensing.enhancement_contrast.gaussian_contrast_stretch(red, num_std_dev=2.0)

# Sigmoidal stretch — soft S-curve useful for optical imagery
sig = wbe.remote_sensing.enhancement_contrast.sigmoidal_contrast_stretch(red, cutoff=0.4, gain=4.0)

# Min-max linear stretch
linear = wbe.remote_sensing.enhancement_contrast.min_max_contrast_stretch(red, min_val=0.0, max_val=255.0)

# Gamma correction — lighten (gamma<1) or darken (gamma>1) an image
gamma = wbe.remote_sensing.enhancement_contrast.gamma_correction(red, gamma=0.6)

Histogram matching normalises a source image to match the distribution of a reference image — invaluable when mosaicking scenes acquired on different dates or under different atmospheric conditions:

# Match the histogram of a target scene to a reference scene
matched = wbe.remote_sensing.enhancement_contrast.histogram_matching(source_scene, reference_scene)

# You can also match between two images from the same sensor
matched2 = wbe.remote_sensing.enhancement_contrast.histogram_matching_two_images(img1, img2)

IHS Colour Space Transformations

Intensity-Hue-Saturation (IHS) space separates brightness from colour, enabling selective sharpening or enhancement without hue distortion:

# Convert RGB composite to IHS
(intensity, hue, saturation) = wbe.remote_sensing.enhancement_contrast.rgb_to_ihs(red, green, blue)

# Enhance the intensity channel
enhanced_i = wbe.remote_sensing.enhancement_contrast.standard_deviation_contrast_stretch(intensity, num_std_dev=2.0)

# Convert back to RGB
(r2, g2, b2) = wbe.remote_sensing.enhancement_contrast.ihs_to_rgb(enhanced_i, hue, saturation)

Panchromatic Sharpening (Pan-Sharpening)

When a high-resolution panchromatic band accompanies lower-resolution multispectral data, IHS pan-sharpening fuses the two:

panchromatic = bundle.read_band('B8A')   # Sentinel-2 red-edge as panchromatic proxy
# Landsat: panchromatic = bundle.read_band('PAN')   # 15 m Landsat panchromatic

(sharp_r, sharp_g, sharp_b) = wbe.remote_sensing.enhancement_contrast.panchromatic_sharpening(
    pan=panchromatic,
    red=red,
    green=green,
    blue=blue
)
wbe.write_raster(sharp_r, 'pan_sharp_r.tif')
wbe.write_raster(sharp_g, 'pan_sharp_g.tif')
wbe.write_raster(sharp_b, 'pan_sharp_b.tif')

Image Filtering

Spatial filters are used to suppress noise, enhance edges, or smooth imagery before classification.

# Gaussian smoothing — reduces sensor noise
smoothed = wbe.remote_sensing.filters.gaussian_filter(red, sigma=1.0)

# Bilateral filter — edge-preserving smoothing
bilateral = wbe.remote_sensing.filters.bilateral_filter(red, sigma_dist=2.0, sigma_int=25.0)

# High-pass filter — enhances fine texture
texture = wbe.remote_sensing.filters.high_pass_filter(red, filter_size_x=5, filter_size_y=5)

# Unsharp masking — sharpens blurred imagery
sharp = wbe.remote_sensing.filters.unsharp_masking(red, sigma=3.0, amount=1.0, threshold=0)

# Edge detection — Sobel gradient magnitude
edges_h = wbe.remote_sensing.edge_feature_detection.sobel_filter(red, variant='3x3')

# Canny edge detector — more precise edge localisation
canny_edges = wbe.remote_sensing.edge_feature_detection.canny_edge_detection(red, sigma=0.5,
                                        low_threshold=5.0, high_threshold=15.0)

# General-purpose GLCM texture (multiband output)
# Band names are recorded in raster metadata as band_1_name, band_2_name, ...
glcm = wbe.remote_sensing.filters.glcm_texture(
    red,
    window_size=9,
    distance=1,
    angles="0,45,90,135",
    features="contrast,homogeneity,entropy",
    direction_aggregation="mean",
    levels=32,
    output_path="glcm_texture.tif",
)

Principal Component Analysis (PCA)

PCA is an essential step in multispectral and hyperspectral analysis. It rotates correlated bands into decorrelated principal components (PCs) ordered by descending variance. The first few PCs typically capture most scene variance and are used to reduce data dimensionality before classification.

# Run PCA on all six bands
bands = [blue, green, red, nir, swir1]
pca_result = wbe.raster.general.principal_component_analysis(
    input_rasters=bands,
    output_html_file='pca_report.html',
    num_comp=4,
    standardize=True   # unit-variance standardisation recommended when bands
                       # have very different value ranges
)
# pca_result is a list of component rasters ordered PC1, PC2, ...
pc1, pc2, pc3, pc4 = pca_result[0], pca_result[1], pca_result[2], pca_result[3]
wbe.write_raster(pc1, 'pc1.tif')
wbe.write_raster(pc2, 'pc2.tif')

The HTML report includes the eigenvalue table, percentage of variance explained per component, and the component loadings matrix. PC1 often correlates strongly with overall brightness. PC2 frequently captures vegetation/non-vegetation contrast. Higher PCs capture moisture, soil, and urban differences.

Inverse PCA

Apply classification or enhancements to individual PCs and then reconstruct back to original band space:

# Modify pc1, then reconstruct
pc1_modified = wbe.remote_sensing.enhancement_contrast.standard_deviation_contrast_stretch(pc1, num_std_dev=2.0)
modified_components = [pc1_modified, pc2, pc3, pc4]

reconstructed_bands = wbe.raster.general.inverse_pca(
    modified_components,
    original_rasters=bands
)

Image Segmentation and Object-Based Analysis

Object-based image analysis (OBIA) groups pixels into meaningful spatial units (segments) before classifying them. This is superior to pixel-based classification for high-resolution imagery where individual objects span many pixels.

# Inspect the dedicated OBIA grouping under remote sensing.
print(wbe.remote_sensing.obia.list_tools())

# 1) Create baseline segments (SLIC-like open-core baseline).
segments = wbe.remote_sensing.obia.segment_slic_superpixels(
    inputs=[red, green, nir],
    region_size=18,
    compactness=12.0,
    output='segments_slic.tif'
)

# 2) Merge small regions for cleaner object topology.
segments_clean = wbe.remote_sensing.obia.segments_merge_small_regions(
    segments=segments,
    min_size=12,
    method='longest',
    output='segments_clean.tif'
)

# 3) Extract object-level features.
spectral_csv = wbe.remote_sensing.obia.object_features_spectral_basic(
    segments=segments_clean,
    inputs=[red, green, nir],
    output='object_features_spectral.csv'
)

shape_csv = wbe.remote_sensing.obia.object_features_shape_basic(
    segments=segments_clean,
    output='object_features_shape.csv'
)

texture_csv = wbe.remote_sensing.obia.object_features_texture_glcm_basic(
    segments=segments_clean,
    input=nir,
    levels=16,
    output='object_features_texture.csv'
)

# 4) Train/apply object RF model using segment-level training labels.
pred_csv = wbe.remote_sensing.obia.classify_objects_random_forest(
    features='object_features_all.csv',
    training='training_segments.csv',
    class_field='class',
    output='object_predictions.csv'
)

# 5) Evaluate object-level accuracy.
report = wbe.remote_sensing.obia.evaluate_object_classification_accuracy(
    predictions=pred_csv,
    reference='validation_segments.csv',
    output='object_accuracy.json'
)

# Optional one-call baseline pipeline.
outputs = wbe.remote_sensing.obia.obia_pipeline_basic(
    inputs=[red, green, nir],
    training='training_segments.csv',
    output_prefix='obia_field01',
    segment_method='slic',
)

This baseline stack is designed to be reproducible and script-friendly. For many projects, the one-call obia_pipeline_basic run is the fastest path to a validated Phase 1 workflow.

Advanced OBIA Capabilities (Open Tier)

The OBIA stack now includes advanced capabilities in open tier. These tools are available under wbe.remote_sensing.obia.* and are grouped here by workflow purpose.

Segmentation and scale control:

  • segment_watershed_markers
  • segment_multiresolution_hierarchical
  • segment_scale_parameter_optimizer
  • segments_split_low_cohesion

Object conversion and interoperability:

  • segments_to_polygons
  • polygons_to_segments

Advanced feature engineering:

  • object_features_context_neighbors
  • object_features_topology_relations

Advanced object classification:

  • classify_objects_svm
  • classify_objects_ensemble_pro
  • classify_objects_rules_basic
  • classify_objects_rules_hierarchical
  • object_class_probability_maps
  • object_uncertainty_diagnostics_pro

Hierarchy management and propagation:

  • build_object_hierarchy_multiscale
  • propagate_labels_across_hierarchy

Post-processing and quality:

  • objects_enforce_min_mapping_unit
  • objects_boundary_refinement_pro
  • evaluate_segmentation_quality_pro

Workflow orchestration and reporting:

  • obia_batch_orchestrator_pro
  • obia_audit_report_pro
# Build multi-scale objects and hierarchy links
hier = wbe.remote_sensing.obia.segment_multiresolution_hierarchical(
    inputs=[red, green, nir],
    coarse_k=900.0,
    fine_k=280.0,
    output_prefix='site01_hier'
)

# Add neighborhood + topology context features for difficult classes
context_csv = wbe.remote_sensing.obia.object_features_context_neighbors(
    segments=hier['segments_fine'],
    output='site01_context.csv'
)
topology_csv = wbe.remote_sensing.obia.object_features_topology_relations(
    segments=hier['segments_fine'],
    output='site01_topology.csv'
)

# Ensemble and rule-hierarchy options
ensemble_pred = wbe.remote_sensing.obia.classify_objects_ensemble_pro(
    features='site01_features_all.csv',
    training='site01_training_segments.csv',
    output='site01_pred_ensemble.csv'
)
rule_pred = wbe.remote_sensing.obia.classify_objects_rules_hierarchical(
    features='site01_features_all.csv',
    rules='site01_rules.csv',
    output='site01_pred_rules.csv'
)

# Probability and uncertainty diagnostics
prob_csv = wbe.remote_sensing.obia.object_class_probability_maps(
    predictions=ensemble_pred,
    output='site01_probabilities.csv'
)
unc_json = wbe.remote_sensing.obia.object_uncertainty_diagnostics_pro(
    probabilities=prob_csv,
    low_conf_threshold=0.7,
    output='site01_uncertainty.json'
)
# Batch orchestration + audit report for multi-scene production runs
batch = wbe.remote_sensing.obia.obia_batch_orchestrator_pro(
    jobs=[
        {
            'inputs': ['s1_red.tif', 's1_green.tif', 's1_nir.tif'],
            'training': 's1_training.csv',
            'output_prefix': 'prod/s1',
            'segment_method': 'graph',
        },
        {
            'inputs': ['s2_red.tif', 's2_green.tif', 's2_nir.tif'],
            'training': 's2_training.csv',
            'output_prefix': 'prod/s2',
            'segment_method': 'slic',
        },
    ],
    output='prod/obia_batch_report.json'
)

audit = wbe.remote_sensing.obia.obia_audit_report_pro(
    artifacts=[
        'prod/s1_object_predictions.csv',
        'prod/s2_object_predictions.csv',
        'prod/obia_batch_report.json',
    ],
    output='prod/obia_audit.json'
)

The segments_to_polygons and polygons_to_segments conversion tools are also valuable in edit-and-return workflows where analysts refine object boundaries in vector space and then rasterize updated objects back to segment grids.


Unsupervised Classification

Unsupervised classification groups pixels by spectral similarity without training data, making it useful for exploratory analysis.

K-Means Clustering

bands = [blue, green, red, nir, swir1]

kmeans_result = wbe.remote_sensing.classification.k_means_clustering(
    input_rasters=bands,
    num_classes=10,
    max_iterations=25,
    class_change_threshold=2.0,
    initialize='random'
)
wbe.write_raster(kmeans_result, 'kmeans_10class.tif')

K-means with 10–15 classes followed by manual merging of semantically similar classes is a common workflow. Visualise the clusters against known reference areas to assign land-cover labels.

Modified K-Means

Modified k-means iteratively merges clusters that are too small or spectrally indistinct, and splits clusters that are too dispersed:

mod_kmeans = wbe.remote_sensing.classification.modified_k_means_clustering(
    input_rasters=bands,
    start_num_classes=20,
    merge_distance=2.0,
    max_iterations=25
)
wbe.write_raster(mod_kmeans, 'modified_kmeans.tif')

DBSCAN Clustering

DBSCAN is a density-based algorithm that does not require specifying the number of clusters and handles irregularly shaped spectral clusters well:

dbscan_result = wbe.raster.general.dbscan(
    input_rasters=bands,
    scaling_method='min_max',
    search_distance=0.5,
    min_points=5
)
wbe.write_raster(dbscan_result, 'dbscan_clusters.tif')

Supervised Classification

Supervised classification uses labelled training areas to build a model that is then applied across the full image. WbW-Py supports several classifiers.

Evaluating Training Sites

Before training, check that each class is spectrally separable. Poor separability leads to confused outputs regardless of classifier choice:

training_polys = wbe.read_vector('training_polygons.shp')
wbe.remote_sensing.classification.evaluate_training_sites(
    input_rasters=bands,
    training_polygons=training_polys,
    class_field_name='CLASS',
    output_html_file='separability_report.html'
)

The HTML report shows histograms and box-plots per band per class. Classes whose histograms overlap substantially should be merged or split by adding more nuanced training polygons.

Minimum Distance Classification

The simplest linear classifier assigns each pixel to the nearest class mean in feature space:

classified_md = wbe.remote_sensing.classification.min_dist_classification(
    input_rasters=bands,
    polys=training_polys,
    class_field=True,
    class_field_name='CLASS',
    threshold=5.0  # optional: do not classify if z-score distance > 5
)
wbe.write_raster(classified_md, 'classified_min_dist.tif')

Parallelepiped Classification

The parallelepiped classifier assigns pixels that fall within all class-mean ± n×σ boxes. It is fast but can leave pixels unclassified when they fall outside all boxes:

classified_pp = wbe.remote_sensing.classification.parallelepiped_classification(
    input_rasters=bands,
    polys=training_polys,
    class_field_name='CLASS'
)
wbe.write_raster(classified_pp, 'classified_parallelepiped.tif')

K-Nearest Neighbour Classification

KNN classification is a non-parametric method well-suited to non-linear class boundaries. The k parameter is the number of neighbours to consider:

training_pts = wbe.read_vector('training_points.shp')

knn_model = wbe.remote_sensing.classification.knn_classification(
    input_rasters=bands,
    training_data=training_pts,
    field_name='CLASS',
    scaling_method='z_score',
    k=7,
    distance_weighting=True,
    test_proportion=0.2,
    create_output=True
)
wbe.write_raster(knn_model, 'classified_knn.tif')

Random Forest Classification

Random forest is an ensemble tree classifier that is robust to noise, handles multi-class problems, and produces out-of-bag accuracy estimates. All formerly Pro-tier classifier tools are now open-source:

# Step 1: fit the model and save it
rf_model = wbe.raster.general.random_forest_classification_fit(
    input_rasters=bands,
    training_data=training_polys,
    class_field_name='CLASS',
    n_trees=500,
    min_samples_leaf=1,
    test_proportion=0.2
)
# rf_model is a path to the saved model file

# Step 2: predict on the full image (can be a different image/date)
classified_rf = wbe.raster.general.random_forest_classification_predict(
    input_rasters=bands,
    model=rf_model
)
wbe.write_raster(classified_rf, 'classified_rf.tif')

The fit step prints an accuracy report including the confusion matrix, Kappa coefficient, and overall accuracy.

Support Vector Machine (SVM) Classification

SVMs find the maximum-margin hyperplane separating classes and are particularly effective with high-dimensional data and small training sets:

classified_svm = wbe.remote_sensing.classification.svm_classification(
    input_rasters=bands,
    training=training_polys,
    field='CLASS',
    scaling_method='z_score',
    test_proportion=0.2,
    create_output=True
)
wbe.write_raster(classified_svm, 'classified_svm.tif')

Logistic Regression Classification

Logistic regression is a fast and interpretable baseline classifier, useful as a benchmark before applying more complex models:

classified_lr = wbe.remote_sensing.classification.logistic_regression(
    input_rasters=bands,
    training_data=training_polys,
    class_field_name='CLASS',
    scaling_method='z_score',
    test_proportion=0.2,
    create_output=True
)

Accuracy Assessment

Compare a classified raster against a validation point dataset:

# Kappa index of agreement between classified image and reference data
accuracy = wbe.raster.general.kappa_index(
    classified=classified_rf,
    reference=wbe.read_raster('reference_classification.tif'),
    output_html_file='accuracy_report.html'
)

Generalising Classification Outputs

Classified rasters often contain salt-and-pepper noise — isolated pixels assigned to a class inconsistent with their neighbours. Two tools remove this:

# Remove small patches by merging them into surrounding majority class
generalised = wbe.remote_sensing.classification.generalize_classified_raster(
    classified=classified_rf,
    min_class_size=5  # minimum patch size in cells
)

# Alternative: merge small patches into most spectrally similar neighbour
generalised2 = wbe.remote_sensing.classification.generalize_with_similarity(
    input_rasters=bands,
    classified=classified_rf,
    min_class_size=5
)

# Remove holes (background pixels encircled by foreground)
no_holes = wbe.conversion.raster_vector_conversion.remove_raster_polygon_holes(
    input_raster=classified_rf,
    threshold=25          # cells
)
wbe.write_raster(generalised, 'classified_rf_clean.tif')

Change Detection

Multi-temporal analysis detects land-cover change between two acquisitions.

Image Differencing

The simplest approach differences co-registered images. NDVI differencing, for example, highlights vegetation gain and loss:

ndvi_t1 = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir_t1, red_t1)
ndvi_t2 = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir_t2, red_t2)
ndvi_change = wbe.raster.general.raster_calculator("'t2' - 't1'", [ndvi_t2, ndvi_t1])
wbe.write_raster(ndvi_change, 'ndvi_change.tif', compress=True)

Change Vector Analysis (CVA)

CVA computes the magnitude and direction of change in multidimensional feature space between two dates. The output direction angle indicates the type of change (e.g., vegetation gain vs. urban growth) while magnitude reflects its intensity:

magnitude, direction = wbe.remote_sensing.change_detection.change_vector_analysis(
    input_rasters_t1=[red_t1, nir_t1, swir1_t1],
    input_rasters_t2=[red_t2, nir_t2, swir1_t2]
)
wbe.write_raster(magnitude, 'cva_magnitude.tif')
wbe.write_raster(direction, 'cva_direction.tif')

Post-Classification Comparison

Classify both dates independently and difference the class maps. This preserves the semantics of each date's land-cover map:

# Train on t1 and apply to t2 using the same saved RF model
classified_t1 = wbe.raster.general.random_forest_classification_predict(bands_t1, rf_model)
classified_t2 = wbe.raster.general.random_forest_classification_predict(bands_t2, rf_model)

# Produce a change raster (unique code for each from-to class transition)
change_map = wbe.raster.general.raster_calculator(
    "'t1' * 100 + 't2'",
    [classified_t1, classified_t2]
)
wbe.write_raster(change_map, 'change_map.tif')

Advanced Change Detection Tools

The sprint additions expose dedicated change tools that return richer diagnostics than simple raster subtraction:

# Multiband image differencing with optional signed and mask outputs
diff_result = wbe.remote_sensing.change_detection.image_difference_change_detection(
    t1_inputs=[red_t1, nir_t1, swir1_t1],
    t2_inputs=[red_t2, nir_t2, swir1_t2],
    output='img_diff_mag.tif',
    options={
        'mode': 'magnitude',
        'threshold_sigma': 2.0,
        'output_signed': 'img_diff_signed.tif',
        'output_mask': 'img_diff_mask.tif'
    }
)

# Post-classification transition table + remap support
post_class_result = wbe.remote_sensing.change_detection.post_classification_change(
    t1_classified=classified_t1,
    t2_classified=classified_t2,
    output='post_class_transition.tif',
    options={
        'transition_scale': 1000,
        't1_class_remap': {'11': 1, '12': 1},
        't2_class_remap': {'41': 4, '42': 4}
    }
)

# PCA-based change with optional mask/report outputs
pca_change_result = wbe.remote_sensing.change_detection.pca_based_change_detection(
    t1_inputs=[red_t1, nir_t1, swir1_t1],
    t2_inputs=[red_t2, nir_t2, swir1_t2],
    output='pca_change_pc1.tif',
    options={
        'component': 1,
        'threshold_sigma': 2.0,
        'output_mask': 'pca_change_mask.tif',
        'output_report': 'pca_change_report.json'
    }
)

Radiometric and Thermal Emissivity Tools

For physically grounded thermal workflows, run radiometric correction and emissivity estimation before LST:

# Atmospheric haze reduction
dos_result = wbe.remote_sensing.radiometric_correction.dark_object_subtraction(
    inputs=[blue, green, red, nir],
    output='dos_stack.tif',
    options={'percentile': 1.0, 'output_diagnostic_offsets': 'dos_offsets.tif'}
)

# DN to TOA reflectance using bundle metadata where available
toa_result = wbe.remote_sensing.radiometric_correction.dn_to_toa_reflectance(
    inputs=[blue, green, red, nir],
    output='toa_stack.tif',
    options={'sensor_bundle_root': '/data/LC09_L1TP_017030_20240420_20240426_02_T1'}
)

# NDVI-based emissivity + LST products
emiss_result = wbe.remote_sensing.thermal_emissivity.ndvi_based_emissivity(
    red_input=red,
    nir_input=nir,
    output='emissivity.tif'
)

lst_sc_result = wbe.remote_sensing.thermal_emissivity.land_surface_temperature_single_channel(
    thermal_input=wbe.read_raster('LC09_B10.TIF'),
    output='lst_single_channel.tif',
    options={'sensor_bundle_root': '/data/LC09_L1TP_017030_20240420_20240426_02_T1'}
)

lst_sw_result = wbe.remote_sensing.thermal_emissivity.land_surface_temperature_split_window(
    thermal1_input=wbe.read_raster('LC09_B10.TIF'),
    thermal2_input=wbe.read_raster('LC09_B11.TIF'),
    output='lst_split_window.tif',
    options={'emissivity_mean_constant': 0.98, 'emissivity_delta_constant': 0.0}
)

Spectral Analytics and PolSAR Decomposition

The new spectral analytics subcategory covers endmember-driven and denoising workflows:

sam_result = wbe.remote_sensing.spectral_analytics.spectral_angle_mapper(
    inputs=[blue, green, red, nir],
    output='sam_classes.tif',
    options={
        'endmembers': [
            {'name': 'water', 'values': [0.03, 0.02, 0.01, 0.00]},
            {'name': 'veg', 'values': [0.05, 0.10, 0.06, 0.40]}
        ],
        'output_angle': 'sam_angle.tif'
    }
)

cont_result = wbe.remote_sensing.spectral_analytics.continuum_removal(
    inputs=[wbe.read_raster('hyp_b1.tif'), wbe.read_raster('hyp_b2.tif'), wbe.read_raster('hyp_b3.tif')],
    output='continuum_removed.tif',
    options={'wavelengths': [450.0, 550.0, 650.0]}
)

unmix_result = wbe.remote_sensing.spectral_analytics.linear_spectral_unmixing(
    inputs=[blue, green, red, nir],
    output='unmix_frac.tif',
    options={
        'endmembers': [
            {'name': 'soil', 'values': [0.18, 0.20, 0.22, 0.24]},
            {'name': 'veg', 'values': [0.05, 0.09, 0.06, 0.40]}
        ],
        'output_residual': 'unmix_residual.tif'
    }
)

mnf_result = wbe.remote_sensing.spectral_analytics.minimum_noise_fraction(
    inputs=[blue, green, red, nir],
    output='mnf_components.tif',
    options={'num_components': 3, 'output_inverse': 'mnf_inverse.tif'}
)

lib_result = wbe.remote_sensing.spectral_analytics.spectral_library_matching(
    inputs=[blue, green, red, nir],
    output='lib_class.tif',
    options={
        'metric': 'sam',
        'library': [
            {'name': 'water', 'values': [0.03, 0.02, 0.01, 0.00]},
            {'name': 'soil', 'values': [0.18, 0.20, 0.22, 0.24]}
        ],
        'output_score': 'lib_score.tif'
    }
)

# SAR decomposition tools are available under remote_sensing.sar
cp_result = wbe.remote_sensing.sar.cloude_pottier_decomposition(
    inputs=[wbe.read_raster('t11.tif'), wbe.read_raster('t22.tif'), wbe.read_raster('t33.tif')],
    output='cloude_pottier_haa.tif',
    options={'matrix_format': 'diag3'}
)

fd_result = wbe.remote_sensing.sar.freeman_durden_decomposition(
    inputs=[wbe.read_raster('c11.tif'), wbe.read_raster('c22.tif'), wbe.read_raster('c33.tif')],
    output='freeman_durden.tif',
    options={'matrix_format': 'diag3', 'output_clip_mask': 'freeman_clip.tif'}
)

Water Extraction and River Mapping

Combining NDWI thresholding with morphological post-processing extracts water bodies:

# Threshold MNDWI — water pixels where mndwi > 0
water = wbe.raster.general.raster_calculator("if('mndwi' > 0.0, 1.0, nodata)", [mndwi])

# Remove small spurious water patches
water_clean = wbe.raster.general.sieve(water, threshold=50)

# Remove islands in water polygons
water_filled = wbe.conversion.raster_vector_conversion.remove_raster_polygon_holes(water_clean, threshold=100)

# Extract river centrelines from the water raster
river_lines = wbe.streams.network_extraction.river_centerlines(water_filled, min_length=5, search_radius=9)
wbe.write_vector(river_lines, 'river_centerlines.shp')

Image Correlation and Regression

WbW-Py supports several image correlation and regression tools useful for sensor cross-calibration, quality control, and change analysis:

# Pearson correlation matrix between all bands
wbe.raster.general.image_correlation(bands, output_html_file='correlation_matrix.html')

# Spatial autocorrelation (Moran's I) per band
wbe.raster.general.image_autocorrelation(bands, contiguity='Rooks', output_html_file='autocorr.html')

# Neighbourhood correlation analysis between two images
wbe.raster.local_neighborhood.image_correlation_neighbourhood_analysis(
    img1=ndvi_t1, img2=ndvi_t2,
    filter_size=11,
    output_html_file='neighbourhood_corr.html'
)

# Simple linear regression of one image on another
slope, intercept, r2 = wbe.raster.general.image_regression(img1=ndvi_t1, img2=ndvi_t2,
                                              output_html_file='regression.html')

WbW-Pro Spotlight: In-Season Crop Stress Intervention Planning

  • Problem: Turn in-season crop stress signals into actionable intervention priorities.
  • Tool: in_season_crop_stress_intervention_planning
  • Typical inputs: NDVI raster, canopy-temperature raster, soil-moisture raster.
  • Typical outputs: Intervention-priority and intervention-class products with summary reporting.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
    'in_season_crop_stress_intervention_planning',
    {
        'ndvi': 'ndvi_latest.tif',
        'canopy_temperature': 'lst_latest.tif',
        'soil_moisture': 'soil_moisture_latest.tif',
        'output_prefix': 'field_07_stress'
    }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.


Complete Land-Cover Classification Workflow

The following end-to-end example uses a sensor bundle for ingestion and a random-forest classifier for land-cover mapping:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.verbose = True

# 1. Open scene as a bundle — works with S2, Landsat 9, PlanetScope, etc.
bundle = wbe.read_bundle('/data/S2B_MSIL2A_20240601T105619_N0510_R094_T32TPT.SAFE')
print(f"Family: {bundle.family}, tile: {bundle.tile_id()}, "
      f"clouds: {bundle.cloud_cover_percent():.1f}%")

# Skip scenes with excessive cloud cover
if bundle.cloud_cover_percent() > 20.0:
    raise RuntimeError("Too cloudy for classification")

# 2. Read bands by key (family-agnostic)
blue  = bundle.read_band('B02')
green = bundle.read_band('B03')
red   = bundle.read_band('B04')
nir   = bundle.read_band('B08')
swir1 = wbe.remote_sensing.enhancement_contrast.resample(bundle.read_band('B11'), base_raster=red, method='bilinear')
swir2 = wbe.remote_sensing.enhancement_contrast.resample(bundle.read_band('B12'), base_raster=red, method='bilinear')
bands = [blue, green, red, nir, swir1, swir2]

# 3. Mask clouds using SCL
scl = bundle.read_qa_layer('SCL')
cloud_mask = wbe.raster.general.raster_calculator(
    "if('scl' == 3 or 'scl' == 8 or 'scl' == 9 or 'scl' == 10, 1.0, 0.0)", [scl]
)

# 4. Compute indices
ndvi  = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir, red)
mndwi = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(green, swir1)
nbr   = wbe.remote_sensing.enhancement_contrast.normalized_difference_index(nir, swir2)

# 5. Build PCA decorrelation on 6 spectral bands
pc_rasters = wbe.raster.general.principal_component_analysis(
    input_rasters=bands,
    output_html_file='pca.html',
    num_comp=4,
    standardize=True
)

# 6. Combine spectral bands + indices + PCs into feature stack
feature_stack = bands + [ndvi, mndwi, nbr] + pc_rasters

# 7. Evaluate training separability
training = wbe.read_vector('training_polygons.shp')
wbe.remote_sensing.classification.evaluate_training_sites(feature_stack, training, 'CLASS', 'separability.html')

# 8. Train random forest
rf_model_path = wbe.raster.general.random_forest_classification_fit(
    input_rasters=feature_stack,
    training_data=training,
    class_field_name='CLASS',
    n_trees=500,
    test_proportion=0.25
)

# 9. Predict
classified = wbe.raster.general.random_forest_classification_predict(feature_stack, rf_model_path)

# 10. Generalise
classified_clean = wbe.remote_sensing.classification.generalize_classified_raster(classified, min_class_size=9)
classified_clean = wbe.conversion.raster_vector_conversion.remove_raster_polygon_holes(classified_clean, threshold=25)

# 11. Save and assess accuracy
wbe.write_raster(classified_clean, 'classified_final.tif', compress=True)
wbe.raster.general.kappa_index(classified_clean,
                wbe.read_raster('validation_reference.tif'),
                'accuracy_report.html')

# 12. Save a quick-look composite for QC
bundle.true_colour_composite(wbe, output_path='quicklook_truecolour.tif')

print('Land-cover classification complete.')

Tips for Effective Remote Sensing Workflows

Tips

  • Use bundles for scene-level ingestion: wbe.read_bundle() eliminates hardcoded band filenames and works identically across Sentinel-2, Landsat, PlanetScope, and other supported families.
  • Screen cloud cover before processing: Check bundle.cloud_cover_percent() before loading all bands to skip unusable scenes early in a batch workflow.
  • Atmospheric correction is iterative: Raw digital numbers and even surface reflectance products may still carry illumination and atmospheric artefacts. Histogram matching across scenes is a simple relative normalisation that works well when no reference spectrum is available.
  • Align band resolution before stacking: Sentinel-2 delivers 10 m (Blue, Green, Red, NIR) and 20 m (Red Edge, SWIR) bands. Upscale the 20 m bands to 10 m using bilinear resampling before combining them in a feature stack.
  • Always mask clouds and cloud shadows: Use quality bands (Sentinel-2 SCL or Landsat QA_PIXEL) and apply bitwise tests via raster_calculator() — e.g. "('qa' & 0b1000) != 0" — to isolate valid pixels.
  • Balance training data: Training datasets often over-represent dominant classes (e.g., forest) relative to rare classes (e.g., wetland). Sample training polygons proportional to expected class area, or oversample rare classes to improve model accuracy.
  • Check spectral separability: If the evaluate_training_sites report shows Jeffries-Matusita distance < 1.5 between any two classes, consider merging classes or collecting more spectrally diverse training polygons.
  • Use time-series techniques for change detection: Multi-date classifications can show spurious change due to atmospheric variation. Spectral indices are more robust; consider computing NDVI or NDWI time series and detecting change directly in index space.
  • Memory is your bottleneck: Large multispectral stacks fill RAM quickly. Use read_raster_band_by_band() or out-of-memory processing for scenes > 1 GB.

Raster Analysis

Raster data are the backbone of most geospatial analysis workflows. They encode continuous or categorical phenomena as regular grids of cell values, allowing mathematical, statistical, and spatial operations to be applied efficiently across an entire landscape. Whitebox Workflows for Python (WbW-Py) provides an extensive raster processing toolkit covering basic arithmetic, focal and zonal statistics, reclassification, resampling, interpolation, proximity analysis, morphological operations, and raster-to-vector conversion.


Core Concepts

Raster analysis requires understanding these fundamental terms:

  • Cell (pixel): The smallest unit of a raster. Each cell stores a single value (integer or floating-point) and has a uniform spatial extent (e.g. 10 m × 10 m).
  • Data type: Integer (Int8, Int16, Int32) for categorical data; Float32 or Float64 for continuous measurements. Data type affects precision, file size, and computation speed.
  • NoData value: Sentinel value representing missing or invalid data (e.g. -9999 or NaN). Critical for masking water, clouds, or invalid measurements in focal operations.
  • Spatial reference (CRS): Coordinate system and projection. Mismatched CRS between rasters causes silent misalignment; always verify before overlay operations.
  • Extent: The bounding box (xmin, ymin, xmax, ymax) of the raster in real-world coordinates.
  • Cell size (resolution): Cell width and height in map units. Coarser resolution is faster but loses detail; finer resolution requires more memory and computation.
  • Focal operation: Uses neighbourhood values (e.g. 3×3 kernel) to compute output. Examples: moving average, Sobel edge detection, local extrema.
  • Zonal operation: Aggregates grid values by zone (polygon or categorical layer). Examples: mean by land-cover class, sum by administrative boundary.
  • Reclassification: Reassigning cell values according to lookup rules. Common for categorizing continuous data (e.g. slope classes) or remapping land-cover codes.
  • Resampling: Changing cell size or alignment. Methods: nearest-neighbour (preserves categories), bilinear (smooth), cubic (smoother).

Reading and Writing Rasters

Every raster workflow begins with reading data into memory:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/rasters'
wbe.verbose = True

dem = wbe.read_raster('dem.tif')
landcover = wbe.read_raster('landcover.tif')

# Read multiple files at once — returns a list
[slope_r, aspect_r, curvature_r] = wbe.read_rasters('slope.tif',
                                                      'aspect.tif',
                                                      'curvature.tif')

Write results with optional LZW compression (recommended for floating-point grids):

wbe.write_raster(dem, 'dem_processed.tif', compress=True)

Printing GeoTIFF Metadata

Inspect the spatial reference and encoding of any GeoTIFF:

wbe.raster.general.print_geotiff_tags(dem)

Creating New Rasters

Use new_raster() or new_raster_from_base_raster() to create blank grids for accumulation or custom calculations:

# Create a raster with the same extent and resolution as the DEM, filled with 0
output = wbe.conversion.raster_vector_conversion.new_raster_from_base_raster(base=dem, initial_value=0.0)

# Iterate cell-by-cell (illustrative; use raster_calculator for large grids)
for row in range(dem.configs.rows):
    for col in range(dem.configs.columns):
        val = dem[row, col]
        if val != dem.configs.nodata:
            output[row, col] = val * 0.001  # convert mm to m

For most cell-by-cell transforms, raster_calculator() is many times faster because it operates in compiled Rust without Python loop overhead.


Raster Calculator

raster_calculator() evaluates an expression string against one or more input rasters. Rasters are referenced by quoted variable names whose order must match the input_rasters list:

# Scale DEM from centimetres to metres
dem_m = wbe.raster.general.raster_calculator("'dem' / 100.0", [dem])

# Compute HAND (height above nearest drainage) from DEM and stream mask
hand_expr = "if('stream' > 0.0, 0.0, 'dem' - 'stream_elev')"
hand = wbe.raster.general.raster_calculator(hand_expr, [stream_raster, dem, stream_elev])

The expression supports:

  • Arithmetic operators: +, -, *, /, ^ (power), % (modulo)
  • Comparison operators: <, >, <=, >=, ==, !=
  • Logical operators: &&, ||, !
  • Mathematical functions: sqrt(), abs(), ln(), log(), exp(), sin(), cos(), tan(), atan(), atan2()
  • Conditional expression: if(condition, true_value, false_value)
  • Special tokens: nodata, null, rows, columns, row, column, cellsize, north, south, east, west

Practical Examples

# Mask out NoData and values below sea level
masked = wbe.raster.general.raster_calculator(
    "if('dem' == nodata || 'dem' < 0.0, nodata, 'dem')",
    [dem]
)

# Compute normalised slope (0–1)
slope = wbe.terrain.derivatives.slope(dem)
slope_norm = wbe.raster.general.raster_calculator(
    "'slope' / maxvalue",
    [slope]
)

# Create binary mask for steep slopes (>30°)
steep = wbe.raster.general.raster_calculator("if('slope' > 30.0, 1.0, 0.0)", [slope])

Reclassification

Reclassification converts continuous values into categorical classes, or remaps existing category codes.

Manual Reclassification

# reclass() takes new_value, min, max triplets
# Format: [[new_val, min_exclusive, max_inclusive], ...]
# Reclassify slope into five morphology classes
slope_class = wbe.raster.reclass_mask.reclass(
    raster=slope,
    reclass_vals=[
        [1, 0, 5],      # flat: 0–5°
        [2, 5, 15],     # gentle: 5–15°
        [3, 15, 30],    # moderate: 15–30°
        [4, 30, 45],    # steep: 30–45°
        [5, 45, 90],    # very steep: >45°
    ]
)
wbe.write_raster(slope_class, 'slope_classes.tif')

Equal-Interval Reclassification

# Automatically divide the value range into n equal-width intervals
dem_classes = wbe.raster.reclass_mask.reclass_equal_interval(dem, num_intervals=10)
wbe.write_raster(dem_classes, 'dem_equal_interval.tif')

Set NoData Values

# Convert all pixels equal to -9999 to NoData
corrected = wbe.conversion.raster_vector_conversion.set_nodata_value(dem, back_value=-9999.0)

# Convert NoData back to zero (useful for accumulation grids)
zero_bg = wbe.conversion.raster_vector_conversion.convert_nodata_to_zero(dem)

Focal (Neighbourhood) Statistics

Focal statistics compute a statistic within a moving window centred on each cell, producing a smoothed or derivative surface.

Filter Operations

# Mean filter (low-pass smoothing)
dem_smooth = wbe.remote_sensing.filters.mean_filter(dem, filter_size_x=5, filter_size_y=5)

# Gaussian filter — weighted mean using Gaussian kernel
dem_gauss = wbe.remote_sensing.filters.gaussian_filter(dem, sigma=2.0)

# Median filter — robust against outliers
dem_median = wbe.remote_sensing.filters.median_filter(dem, filter_size_x=5, filter_size_y=5)

# Feature-preserving smoothing — preserves ridges and channels
dem_fp = wbe.terrain.general.feature_preserving_smoothing(dem, filter_size=11,
                                           normal_diff_threshold=30.0)

# Standard deviation filter — highlights textural variation
sd_texture = wbe.remote_sensing.filters.standard_deviation_filter(dem, filter_size_x=9, filter_size_y=9)

# Range filter — local elevation range (micro-roughness)
local_range = wbe.remote_sensing.filters.range_filter(dem, filter_size_x=9, filter_size_y=9)

# Percentile filter — local quantile (robust position index)
ep = wbe.remote_sensing.filters.percentile_filter(dem, filter_size_x=11, filter_size_y=11)

Diversity and Majority Filters

# Count the number of unique values in a neighbourhood (for categorical rasters)
diversity = wbe.remote_sensing.filters.diversity_filter(landcover, filter_size_x=7, filter_size_y=7)

# Majority filter — replace each cell with the most frequent class in neighbourhood
smoothed_lc = wbe.remote_sensing.filters.majority_filter(landcover, filter_size_x=5, filter_size_y=5)

Morphological Filters (Binary/Raster Objects)

Morphological operations on raster objects (connected foreground regions) are essential for post-classification cleanup.

# Dilation — grow foreground by one cell in each direction
dilated = wbe.remote_sensing.filters.closing(binary_raster, filter_size_x=3, filter_size_y=3)

# Erosion — shrink foreground
eroded = wbe.remote_sensing.filters.opening(binary_raster, filter_size_x=3, filter_size_y=3)

# Top-hat transform — isolates bright or dark features within a local window
tophat_white = wbe.remote_sensing.filters.tophat_transform(dem, filter_size_x=21, filter_size_y=21,
                                    variant='white')  # bright features above background
tophat_black = wbe.remote_sensing.filters.tophat_transform(dem, filter_size_x=21, filter_size_y=21,
                                    variant='black')  # dark depressions below background

Summary and Zonal Statistics

Global Summary Statistics

wbe.raster.general.raster_summary_stats(dem)  # prints count, mean, std, min, max, range to console

Histogram

wbe.raster.general.raster_histogram(dem, output_html_file='dem_histogram.html')

Zonal Statistics

Zonal statistics compute per-zone summary metrics for one raster given a second raster that defines zones:

# Compute mean, St. dev., min, max of the DEM within each land-cover class
zonal_stats = wbe.raster.general.zonal_statistics(
    raster=dem,
    zones=landcover,
    stat='mean',           # 'mean', 'min', 'max', 'std', 'var', 'count', 'sum'
    cell_size_is_area=False
)
wbe.write_raster(zonal_stats, 'mean_elevation_per_class.tif')

Percent Overlays

# Fraction of neighbourhood cells below/above/equal to focal cell value
pct_less  = wbe.raster.overlay_math.percent_less_than(dem, filter_size_x=9, filter_size_y=9)
pct_great = wbe.raster.overlay_math.percent_greater_than(dem, filter_size_x=9, filter_size_y=9)
pct_equal = wbe.raster.overlay_math.percent_equal_to(dem, filter_size_x=9, filter_size_y=9)

Unique-Value Enumeration

wbe.raster.general.list_unique_values_raster(landcover)  # prints class codes and pixel counts

Overlay Operations

Overlay operations combine multiple rasters into a single output, handling conflicting values with a defined rule.

# Sum overlay — add all values at each cell
sum_result = wbe.raster.overlay_math.sum_overlay(rasters=[r1, r2, r3])

# Average overlay — mean of all rasters
avg = wbe.raster.overlay_math.average_overlay(rasters=[r1, r2, r3])

# Maximum overlay — highest value among all rasters
max_r = wbe.raster.overlay_math.max_overlay(rasters=[r1, r2, r3])

# Pick from a list based on an index raster (values 1..N index which raster to use)
selected = wbe.raster.overlay_math.pick_from_list(index=index_raster, rasters=[r1, r2, r3])

# Weighted sum — apply a weight to each contributing raster
weighted = wbe.raster.overlay_math.weighted_sum(rasters=[r1, r2], weights=[0.6, 0.4])

Multi-Criteria Weighted Overlay

weighted_overlay() normalises each factor raster, applies class weights, and sums the products — a standard MCE technique in site suitability analysis:

# Suitability analysis: each raster already reclassified 1–5 scale
suitability = wbe.raster.overlay_math.weighted_overlay(
    factor_rasters=[slope_suitability, aspect_suitability, soil_suitability],
    weights=[0.4, 0.2, 0.4],
    scale_max=5.0,
    cost_factors=[False, False, False]   # True = cost (lower = better)
)
wbe.write_raster(suitability, 'suitability.tif')

Aggregation and Resampling

Aggregate Raster

Reduce resolution by a specified factor using a summary statistic. Useful for multi-scale analyses:

# Aggregate to 5× coarser resolution using mean
coarse_dem = wbe.raster.general.aggregate_raster(dem, agg_factor=5, type='mean')

# Other types: 'sum', 'max', 'min'
coarse_lc = wbe.raster.general.aggregate_raster(landcover, agg_factor=5, type='majority')

Resample

Resample a raster to match the grid of another raster or to a specified cell size. Supports several interpolation methods:

# Bilinear interpolation for continuous data
resampled = wbe.remote_sensing.enhancement_contrast.resample(source_raster, base_raster=dem, method='bilinear')

# Nearest neighbour — preserves discrete categories
resampled_cat = wbe.remote_sensing.enhancement_contrast.resample(landcover, base_raster=dem, method='nn')

# Cubic convolution — smooth surfaces
resampled_cc = wbe.remote_sensing.enhancement_contrast.resample(dem, base_raster=fine_base, method='cc')

Proximity Analysis

Proximity tools compute distances, directions, and allocations based on the locations of source cells.

Euclidean Distance and Direction

# Distance of every cell from the nearest stream cell
streams_raster = wbe.streams.network_extraction.rasterize_streams(streams_vector, base_raster=dem)
dist_to_streams = wbe.raster.distance_cost.euclidean_distance(streams_raster)
wbe.write_raster(dist_to_streams, 'dist_to_streams.tif')

# Which stream cell is each cell nearest to?
allocated = wbe.raster.distance_cost.euclidean_allocation(streams_raster)

Cost-Distance Analysis

Cost-distance analysis finds the least-cost path across a landscape where movement is not uniform. The cost raster encodes the cost of traversing one cell:

# Build cost and source layers
cost_surface = wbe.raster.general.raster_calculator(
    "'slope' * 0.1 + 'landcover_cost'",
    [slope, landcover_cost]
)
# Accumulate cost from source points
cost_dist = wbe.raster.distance_cost.cost_distance(source_raster, cost_surface)
wbe.write_raster(cost_dist, 'cost_distance.tif')

# Allocate each cell to the nearest source by cost
cost_alloc = wbe.raster.distance_cost.cost_allocation(source_raster, cost_distance=cost_dist)

# Trace the optimal path between two points
cost_path = wbe.raster.distance_cost.cost_pathway(source=start_raster, destination=end_raster,
                              cost_surface=cost_surface)
wbe.write_raster(cost_path, 'optimal_path.tif')

Buffer Distance

Create a buffer zone around all features in a raster at a specified distance:

buffered = wbe.raster.distance_cost.buffer_raster(streams_raster, size=100.0, gridcells=False)
wbe.write_raster(buffered, 'stream_buffer_100m.tif')

Raster Object Analysis

The clump tool identifies connected groups of cells with the same value and assigns each group a unique identifier. This is analogous to spatial dissolve on a raster:

# Identify connected patches in the land-cover raster
clumped = wbe.raster.general.clump(landcover, diag=True, zero_back=True)

# Filter out small patches (<10 cells) from a binary mask
large_patches = wbe.raster.general.filter_raster_features_by_area(binary_mask,
                                                     threshold=10,
                                                     background_val=0)

# Shape complexity of each raster object
complexity = wbe.raster.general.shape_complexity_index_raster(clumped)

# Raster area per clump
clump_area = wbe.raster.general.raster_area(clumped, out_text=False)

# Highest or lowest position among several rasters
highest = wbe.raster.overlay_math.highest_position(rasters=[r1, r2, r3])   # which raster has max value?
lowest  = wbe.raster.overlay_math.lowest_position(rasters=[r1, r2, r3])

Raster-to-Vector Conversion

# Convert raster polygons to vector polygons
polygons = wbe.conversion.raster_vector_conversion.raster_to_vector_polygons(landcover)
wbe.write_vector(polygons, 'landcover_polygons.shp')

# Convert raster lines to vector lines
lines = wbe.conversion.raster_vector_conversion.raster_to_vector_lines(stream_raster)
wbe.write_vector(lines, 'stream_lines.shp')

# Convert raster points (non-zero cells) to vector points
pts = wbe.conversion.raster_vector_conversion.raster_to_vector_points(peak_cells)
wbe.write_vector(pts, 'peaks.shp')

Interpolation from Point Clouds

When field observation points or vector point layers represent a surface, interpolate them to a raster:

sample_pts = wbe.read_vector('soil_sample_points.shp')

# Inverse Distance Weighting (IDW)
idw = wbe.raster.general.idw_interpolation(points=sample_pts, field='OM_PCT',
                             cell_size=10.0, radius=250.0, weight=2.0)
wbe.write_raster(idw, 'soil_om_idw.tif')

# Natural Neighbour (Sibson) interpolation
nn = wbe.raster.local_neighborhood.natural_neighbour_interpolation(points=sample_pts, field='OM_PCT',
                                          cell_size=10.0)
wbe.write_raster(nn, 'soil_om_nn.tif')

# Radial Basis Function interpolation
rbf = wbe.raster.general.radial_basis_function_interpolation(points=sample_pts, field='OM_PCT',
                                               cell_size=10.0, radius=250.0)
wbe.write_raster(rbf, 'soil_om_rbf.tif')

# TIN interpolation from vector points
tin = wbe.raster.general.tin_interpolation(points=sample_pts, field_name='OM_PCT', cell_size=10.0)
wbe.write_raster(tin, 'soil_om_tin.tif')

Geostatistical Simulation

The turning-bands simulation creates spatially autocorrelated random fields — useful for uncertainty analysis and Monte Carlo simulation of landscape processes:

sim = wbe.raster.general.turning_bands_simulation(
    base_raster=dem,
    range=500.0,       # autocorrelation range (metres)
    sill_height=1.0,   # sill variance
    num_lines=1000
)
wbe.write_raster(sim, 'random_field_simulated.tif')

Statistical Tests on Raster Distributions

# Test whether a raster's values are Gaussian
wbe.raster.general.ks_normality_test(dem, output_html_file='normality_test.html', num_samples=5000)

# Paired-sample t-test to compare two rasters cell-by-cell
wbe.raster.general.paired_sample_t_test(dem, dem_lidar,
                          output_html_file='ttest.html', num_samples=5000)

# Two-sample KS test for distributional differences
wbe.raster.general.two_sample_ks_test(dem, dem_lidar,
                        output_html_file='ks_test.html', num_samples=5000)

# Wilcoxon signed-rank test (non-parametric alternative to t-test)
wbe.raster.general.wilcoxon_signed_rank_test(dem, dem_lidar,
                               output_html_file='wilcoxon.html', num_samples=5000)

Contour Generation

# Contours from DEM raster
contours = wbe.vector.sampling_gridding.contours_from_raster(dem, contour_interval=5.0, base_contour=0.0,
                                     smooth=9, tolerance=5.0)
wbe.write_vector(contours, 'contours_5m.shp')

WbW-Pro Spotlight: Field Trafficability and Operation Planning

  • Problem: Plan equipment timing and field access under variable moisture and weather conditions.
  • Tool: field_trafficability_and_operation_planning
  • Typical inputs: DEM, normalized soil-moisture raster, optional rainfall-risk raster.
  • Typical outputs: Trafficability score raster, operation-class raster, and summary outputs.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
    'field_trafficability_and_operation_planning',
    {
        'dem': 'field_dem.tif',
        'soil_moisture': 'soil_moisture_norm.tif',
        'rainfall_forecast': 'rainfall_risk_norm.tif',
        'output_prefix': 'field_12_trafficability'
    }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.


Complete Raster Analysis Pipeline

The following script demonstrates a typical DEM-derived terrain model workflow including correction, classification, and proximity analysis:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/analysis'
wbe.verbose = True
wbe.max_procs = -1  # use all available cores

# 1. Load raw DEM and correct NoData encoding
dem = wbe.read_raster('raw_dem.tif')
dem = wbe.conversion.raster_vector_conversion.set_nodata_value(dem, back_value=-9999.0)
dem = wbe.terrain.general.fill_missing_data(dem, filter_size=11)

# 2. Smooth to suppress sensor noise
dem_smooth = wbe.terrain.general.feature_preserving_smoothing(dem, filter_size=11,
                                               normal_diff_threshold=30.0,
                                               iterations=3)

# 3. Derive primary terrain attributes
slope      = wbe.terrain.derivatives.slope(dem_smooth, units='degrees')
aspect     = wbe.terrain.derivatives.aspect(dem_smooth)
tpi        = wbe.terrain.general.deviation_from_mean_elevation(dem_smooth, filter_size_x=21,
                                               filter_size_y=21)
relrough   = wbe.terrain.multiscale_signatures.multiscale_roughness(dem_smooth, min_scale=1, max_scale=50, step=2)

# 4. Classify slope into stability classes
slope_class = wbe.raster.reclass_mask.reclass(slope, reclass_vals=[
    [1, 0, 10],   # low risk
    [2, 10, 25],  # moderate risk
    [3, 25, 90],  # high risk
])

# 5. Compute Euclidean distance from streams
streams = wbe.read_raster('streams.tif')
dist_streams = wbe.raster.distance_cost.euclidean_distance(streams)

# 6. Site suitability overlay
# Normalise all factors to a 1–5 scale first ...
suitability = wbe.raster.overlay_math.weighted_overlay(
    factor_rasters=[slope_class, dist_streams_class],
    weights=[0.7, 0.3],
    scale_max=3.0
)

wbe.write_raster(suitability, 'site_suitability.tif', compress=True)
print('Raster analysis pipeline complete.')

Tips

  • Choose your data type: Use integers for categorical data (land cover, classifications) to minimize file size and computation time. Use floating-point (Float32 or Float64) only for continuous measurements (elevation, temperature, probability).
  • Set NoData explicitly: Ensure your source rasters carry a valid NoData value. Missing NoData declarations can corrupt statistics and focal operations by including invalid pixels as zeros or false elevations.
  • Compress carefully: LZW and DEFLATE compression work well for most data; avoid if you need rapid random access to interior tiles. Use COMPRESS=JPEG for photographic data only (lossy, unsuitable for analysis).
  • Focal operations require buffering: Cells at raster edges cannot compute full neighbourhoods. Use expand() or accept edge effects; never assume borders are valid in derivative rasters.
  • Zonal statistics are only as good as your zones: Ensure zone boundaries are topologically clean (no overlaps, no gaps). Overlapping zones cause double-counting; gaps cause NoData regions in output.
  • Reclassification is fast but risky: Always validate output distributions (histogram) after reclassification. Off-by-one errors in class boundaries can silently produce wrong land-cover or suitability classes.
  • Memory is the constraint for large rasters: Tiles > 2 GB require out-of-core or streaming processing. Use read_by_block() for large files; avoid loading entire rasters into memory if they exceed available RAM.
  • Upsampling introduces artifacts: Never upsample (finer resolution) without a valid interpolation method. Nearest-neighbour upsampling creates blocky artefacts; bilinear is smoother but may violate data range (e.g. probability values > 1.0).

Vector Analysis

Vector data are the primary format for discrete geographic features — points, lines, and polygons representing everything from sample locations and road networks to property parcels and watershed boundaries. Whitebox Workflows for Python (WbW-Py) provides a comprehensive set of vector analysis tools covering attribute management, geometric measurement, spatial overlay, proximity analysis, topology repair, shape analysis, and vector-to-raster conversion.


Core Concepts

Vector analysis depends on understanding these core concepts:

  • Feature geometry: Points (single coordinate pairs), lines (ordered sequences of coordinate pairs), and polygons (rings of coordinates forming closed boundaries). Each feature type supports different analyses.
  • Topology: The spatial relationships between features (adjacency, containment, intersection). Topological errors (overshoots, undershoots, self-intersections) corrupt overlay operations and topology queries.
  • Attribute table: The database associated with each feature layer, carrying descriptive fields and values. Attribute queries filter features; joins link external tables.
  • Spatial index: Internal index structure (R-tree or quadtree) enabling fast spatial queries. Used by intersection, containment, and proximity operations. Always build spatial indices on frequently queries layers.
  • Envelope (bounding box): The minimum rectangular boundary of a feature or layer; used for quick spatial culling before geometry tests.
  • Buffer: A polygon created at a fixed distance around a feature. Buffers model proximity zones and are fundamental to distance-based analysis.
  • Overlay (intersection, union, difference): Combining two polygon layers to create a new layer. Union merges boundaries; intersection keeps overlapping area only; difference removes one layer from another.
  • Dissolve (aggregation): Merging adjacent features with identical attribute values, creating larger aggregate features. Reduces feature count and simplifies geometry.
  • Spatial join: Associating features from one layer with features from another based on spatial relationship (overlap, containment, proximity). Assigns attributes across layers.
  • Proximity analysis: Finding nearest features, distances, or connectivity. Foundation for network analysis, market analysis, and accessibility studies.

Reading and Writing Vectors

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/vectors'

# Read a single vector
watersheds = wbe.read_vector('watersheds.shp')
streams    = wbe.read_vector('streams.shp')
outlets    = wbe.read_vector('outlets.shp')

# Read multiple at once
[roads, buildings, parks] = wbe.read_vectors('roads.shp', 'buildings.shp', 'parks.shp')

# Write results
wbe.write_vector(watersheds, 'watersheds_processed.shp')

Attribute Table Management

Adding and Removing Fields

# Add a new numeric field
watersheds = wbe.vector.attribute_analysis.add_field(watersheds, field_name='AREA_KM2', field_type='Float')

# Rename a field
watersheds = wbe.vector.attribute_analysis.rename_field(watersheds,
                               old_field_name='OBJECTID',
                               new_field_name='WS_ID')

# Delete a field
watersheds = wbe.vector.attribute_analysis.delete_field(watersheds, field_name='TEMP_FIELD')

# Reset the entire attribute table to only an FID column
watersheds = wbe.conversion.vector_table_io.reinitialize_attribute_table(watersheds)

Filtering by Attribute

# Select features where upstream area exceeds 50 km²
large_ws = wbe.vector.attribute_analysis.extract_by_attribute(watersheds,
                                     field_name='AREA_KM2',
                                     operator='>',
                                     value=50.0)
wbe.write_vector(large_ws, 'large_watersheds.shp')

Joining Tables

# Merge a CSV attribute table into a vector by a shared key field
import csv
merged = wbe.conversion.vector_table_io.merge_table_with_csv(watersheds,
                                   csv_file='watershed_stats.csv',
                                   join_field='WS_ID')

Exporting the Attribute Table

wbe.conversion.vector_table_io.export_table_to_csv(watersheds, 'watershed_attributes.csv')

Listing Unique Values

wbe.vector.attribute_analysis.list_unique_values(watersheds, field_name='REGION')

Geometric Measurement

Polygon Area and Perimeter

# Compute polygon area (adds AREA field to attribute table)
watersheds = wbe.vector.shape_metrics.polygon_area(watersheds)

# Compute perimeter (adds PERIMETER field)
watersheds = wbe.vector.shape_metrics.polygon_perimeter(watersheds)

Shape Indices

Shape indices quantify the geometric complexity and elongation of polygon features. They are widely used in ecology (patch metrics), hydrology (watershed form), and urban analysis:

# Compactness ratio — measures how closely a polygon approximates a circle
# Perfectly circular = 1.0; lower values = more elongated
watersheds = wbe.vector.shape_metrics.compactness_ratio(watersheds)

# Elongation ratio — based on minimum bounding box dimensions
watersheds = wbe.vector.shape_metrics.elongation_ratio(watersheds)

# Linearity index — R² of an RMA regression through hull vertices
# Higher values indicate long, narrow linear shapes
watersheds = wbe.vector.shape_metrics.linearity_index(watersheds)

# Related circumscribing circle
watersheds = wbe.vector.shape_metrics.related_circumscribing_circle(watersheds)

# Boundary shape complexity
watersheds = wbe.raster.general.boundary_shape_complexity(watersheds)

# Hole proportion — fraction of polygon area that is holes
watersheds = wbe.vector.shape_metrics.hole_proportion(watersheds)

# Shape complexity (vector)
watersheds = wbe.vector.shape_metrics.shape_complexity_index_vector(watersheds)

# Patch orientation (degrees from north of long axis)
watersheds = wbe.vector.shape_metrics.patch_orientation(watersheds)

# Narrowness index
watersheds = wbe.vector.shape_metrics.narrowness_index(watersheds)

# Radius of gyration (area-weighted centroid distance)
watersheds = wbe.raster.general.radius_of_gyration(watersheds)

Point Coordinate Addition

# Add X, Y (and optionally Z) coordinate columns to a point vector
sample_pts = wbe.conversion.vector_table_io.add_point_coordinates_to_table(sample_pts)

Centroids, Bounding Boxes, and Convex Hulls

# Point centroid of each polygon
centroids = wbe.vector.geometry_processing.centroid_vector(watersheds)
wbe.write_vector(centroids, 'watershed_centroids.shp')

# Minimum bounding box for each polygon
bboxes = wbe.vector.geometry_processing.minimum_bounding_box(watersheds)

# Minimum bounding circle
circles = wbe.vector.geometry_processing.minimum_bounding_circle(watersheds)

# Minimum bounding envelope (overall)
envelope = wbe.vector.geometry_processing.minimum_bounding_envelope(watersheds)

# Minimum convex hull
hull = wbe.vector.geometry_processing.minimum_convex_hull(watersheds)

# Layer footprint (bounding box of entire layer)
footprint = wbe.vector.sampling_gridding.layer_footprint_vector(watersheds)

# Long axis and short axis lines
long_axis  = wbe.vector.shape_metrics.polygon_long_axis(watersheds)
short_axis = wbe.vector.shape_metrics.polygon_short_axis(watersheds)

Smoothing, Simplification, and Geometry Operations

# Smooth vertices by averaging — reduces digitising artefacts
smooth = wbe.vector.geometry_processing.smooth_vectors(streams, filter_size=5)

# Douglas-Peucker line simplification
simplified = wbe.vector.geometry_processing.simplify_features(streams, snap_distance=10.0)

# Split long lines into segments of maximum length
segmented = wbe.vector.geometry_processing.split_vector_lines(streams, segment_length=1000.0)

# Extend line endpoints by a specified distance
extended = wbe.vector.geometry_processing.extend_vector_lines(streams, dist=50.0, extend_type='both')

# Split polygons or lines using another line layer
split_polys = wbe.vector.geometry_processing.split_with_lines(watersheds, split_lines=roads)

# Lines to polygon conversion (close and fill each polyline)
polys_from_lines = wbe.conversion.geometry_topology.lines_to_polygons(outline_lines)

# Polygons to lines (extract boundary lines)
lines_from_polys = wbe.conversion.geometry_topology.polygons_to_lines(watersheds)

# Convert multipart features to singlepart
single = wbe.conversion.geometry_topology.multipart_to_singlepart(watersheds)

# Convert singlepart to multipart by shared attribute value
multi = wbe.conversion.geometry_topology.singlepart_to_multipart(parcels, field_name='OWNER_ID')

# Merge all features in two or more files into one layer
merged_streams = wbe.conversion.vector_table_io.merge_vectors(streams_a, streams_b)

# Clean topology (remove duplicate vertices and degenerate features)
cleaned = wbe.conversion.vector_table_io.clean_vector(streams)

# Remove polygon holes smaller than a threshold
no_holes = wbe.conversion.geometry_topology.remove_polygon_holes(watersheds)

Spatial Overlay

WbW-Py supports the full suite of vector set-theoretic overlay operations:

Clip

Cuts one layer to the extent of another, retaining only features within the clip polygon:

# Clip roads to a study area polygon
roads_clipped = wbe.vector.overlay_analysis.clip(input=roads, clip=study_area)
wbe.write_vector(roads_clipped, 'roads_study_area.shp')

Intersect

Returns the geometric intersection of two layers, keeping the portions where they overlap and combining attributes from both:

soil_in_watershed = wbe.vector.overlay_analysis.intersect(input=soil_polygons, overlay=watershed_boundary,
                                   snap_tolerance=1e-6)
wbe.write_vector(soil_in_watershed, 'soil_in_watershed.shp')

Erase (Difference)

Removes the area of one layer from another:

# Remove urban areas from the vegetation layer
rural_veg = wbe.vector.overlay_analysis.erase(input=vegetation, erase_layer=urban_boundaries)

Union

Combines two polygon layers and divides overlapping areas, retaining all features from both:

combined = wbe.vector.overlay_analysis.union(input=zoning, overlay=flood_zones)
wbe.write_vector(combined, 'zoning_flood_overlay.shp')

Symmetrical Difference

Returns only the non-overlapping portions of each layer:

sym_diff = wbe.vector.overlay_analysis.symmetrical_difference(input=year1_polygons, overlay=year2_polygons)

Dissolve

Merges features that share a common attribute value:

# Merge all polygons of the same land-cover class
dissolved = wbe.vector.overlay_analysis.dissolve(input=landcover_polygons, field_name='CLASS')
wbe.write_vector(dissolved, 'landcover_dissolved.shp')

Proximity and Near Analysis

Euclidean Distance to Nearest Feature

# Find the distance from each sample point to the nearest road
near_result = wbe.vector.overlay_analysis.near(input=sample_pts, feature=roads)
# Adds NEAR_DIST and NEAR_FID fields to sample_pts
wbe.write_vector(near_result, 'samples_near_roads.shp')

Voronoi Diagram (Thiessen Polygons)

Thiessen polygons partition space so every location is assigned to the nearest source point:

voronoi = wbe.vector.sampling_gridding.voronoi_diagram(sample_pts)
wbe.write_vector(voronoi, 'thiessen_polygons.shp')

Convex Hull and Medoid

hull_pts = wbe.vector.geometry_processing.minimum_convex_hull(sample_pts)  # minimum convex hull of point set
med      = wbe.vector.sampling_gridding.medoid(sample_pts)               # geometric median of a set of points

Select by Location

Spatial queries allow selection of features based on their geometric relationship to a second layer:

# Select all stream segments that intersect wetland polygons
streams_in_wetlands = wbe.vector.overlay_analysis.select_by_location(
    input=streams,
    comparison=wetlands,
    geometry_type='intersects'
)
wbe.write_vector(streams_in_wetlands, 'streams_in_wetlands.shp')

Spatial Join

Spatial join transfers attributes from a join layer to an input layer based on spatial proximity or overlap:

# Join soil class to sample points based on the polygon they fall within
pts_with_soil = wbe.vector.overlay_analysis.spatial_join(
    input=sample_pts,
    join_layer=soil_polygons,
    join_type='within',       # 'within', 'intersects', 'nearest'
    strategy='first',         # 'first', 'last', 'count', 'sum', 'mean', 'min', 'max'
    field_name='SOIL_CLASS'
)
wbe.write_vector(pts_with_soil, 'samples_with_soil.shp')

Vector Grids

Create regular grids of vector polygons covering a raster or vector extent. Useful for stratified sampling and landscape analysis at fixed spatial scales:

# Hexagonal grid with resolution based on an existing raster
hex_grid = wbe.vector.sampling_gridding.hexagonal_grid_from_raster_base(dem)
wbe.write_vector(hex_grid, 'hexgrid.shp')

# Rectangular grid
rec_grid = wbe.vector.sampling_gridding.rectangular_grid_from_raster_base(dem)
wbe.write_vector(rec_grid, 'recgrid.shp')

# Hexagonal grid with resolution based on a vector extent
hex_v = wbe.vector.sampling_gridding.hexagonal_grid_from_vector_base(watersheds, width=500.0)

Vector-to-Raster Conversion

# Rasterize polygon layer (burn polygon attribute value into grid cells)
lc_raster = wbe.conversion.raster_vector_conversion.vector_polygons_to_raster(
    input=landcover_polygons,
    field_name='CLASS_ID',
    cell_size=30.0
)
wbe.write_raster(lc_raster, 'landcover_raster.tif')

# Rasterize line layer
roads_raster = wbe.conversion.raster_vector_conversion.vector_lines_to_raster(
    input=roads,
    field_name='FID',
    cell_size=10.0
)

# Rasterize point layer
pts_raster = wbe.conversion.raster_vector_conversion.vector_points_to_raster(
    input=sample_pts,
    field_name='YIELD',
    assign_op='mean',  # 'first', 'last', 'min', 'max', 'sum', 'mean', 'number'
    cell_size=5.0
)

Field Calculator

The field calculator supports SQL-style and expression-style updates for vector attributes, including:

  • searched CASE WHEN ... THEN ... ELSE ... END
  • simple CASE field WHEN value THEN ... END
  • UPDATE ... SET ... [WHERE ...] wrapper syntax
  • SQL operators (=, <>, AND, OR, NOT), plus IS NULL / IS NOT NULL
  • CAST(... AS integer|float|text|boolean)
  • preview-first execution with preview_rows (no output write required)

Use it to create or update a field from existing attributes and geometry variables ($area, $length, $perimeter, centroid coordinates).

# Compute/update SPEED from TYPE using SQL-style CASE
watersheds = wbe.vector.attribute_analysis.field_calculator(
    input=watersheds,
    field='SPEED',
    field_type='integer',
    expression="CASE WHEN TYPE == 'motorway' THEN 100 WHEN TYPE == 'primary' THEN 80 ELSE 60 END",
    overwrite=True,
    output='watersheds_speed.gpkg'
)

# Preview-only evaluation (returns preview payload, omits output write)
preview = wbe.vector.attribute_analysis.field_calculator(
    input=watersheds,
    field='SPEED',
    field_type='integer',
    expression="CASE TYPE WHEN 'motorway' THEN 100 ELSE 60 END",
    overwrite=True,
    preview_rows=10
)

When preview_rows > 0, the tool returns preview records and normalized expression details that can be surfaced in UI workflows before committing final output writes.


Topological Utilities

Repair and Validation

# Snap nearby line endpoints to within a tolerance distance
streams_clean = wbe.streams.network_extraction.repair_stream_vector_topology(streams, snap_distance=1.0)

# Fix dangling arcs (lines that overshoot or undershoot intersections)
fixed = wbe.conversion.geometry_topology.fix_dangling_arcs(streams, snap_distance=1.0)

Line Intersections

# Find all intersection points between two line layers
intersections = wbe.vector.overlay_analysis.line_intersections(roads, rivers)
wbe.write_vector(intersections, 'road_river_crossings.shp')

Extract Nodes

# Extract all vertices of a line layer as points
nodes = wbe.vector.sampling_gridding.extract_nodes(streams)
wbe.write_vector(nodes, 'stream_nodes.shp')

Polygon Topology

# Polygonise a raster — convert raster regions to vector polygons
polys = wbe.vector.geometry_processing.polygonize(classified_raster)
wbe.write_vector(polys, 'class_polygons.shp')

Point Cluster Analysis

# Heat map (kernel density estimation)
density = wbe.raster.general.heat_map(sample_pts, bandwidth=500.0)
wbe.write_raster(density, 'point_density.tif')

# Vector hexagonal binning — count points per hexagon
hex_counts = wbe.vector.sampling_gridding.vector_hex_binning(sample_pts, width=1000.0, orientation='vertical')
wbe.write_vector(hex_counts, 'hex_counts.shp')

WbW-Pro Spotlight: Market Access and Site Intelligence

  • Problem: Rank candidate sites using repeatable network-access and demand logic.
  • Tool: market_access_and_site_intelligence_workflow
  • Typical inputs: Network, existing sites, candidate sites, demand surface, drive-time rings.
  • Typical outputs: Catchment polygons, competitive-overlap layer, candidate-ranking CSV, executive summary JSON.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
    'market_access_and_site_intelligence_workflow',
    {
        'network': 'street_network.shp',
        'sites_existing': 'existing_sites.shp',
        'sites_candidates': 'candidate_sites.shp',
        'demand_surface': 'demand_points.shp',
        'ring_costs': [5.0, 10.0, 15.0],
        'catchments_output': 'candidate_catchments.shp',
        'overlap_analysis_output': 'competitive_overlap.shp',
        'candidate_rank_csv': 'candidate_rankings.csv',
        'executive_summary_json': 'market_summary.json'
    }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.


Complete Vector Analysis Workflow

The following script illustrates a full workflow from raw survey points to a clipped, dissolved, and enriched polygon layer:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/vector_analysis'
wbe.verbose = True

# 1. Load layers
parcels       = wbe.read_vector('parcels.shp')
study_boundary = wbe.read_vector('study_boundary.shp')
soil_map      = wbe.read_vector('soil_types.shp')
sample_pts    = wbe.read_vector('sample_points.shp')

# 2. Clip parcels to study boundary
parcels_clip = wbe.vector.overlay_analysis.clip(input=parcels, clip=study_boundary)

# 3. Compute geometric attributes
parcels_clip = wbe.vector.shape_metrics.polygon_area(parcels_clip)
parcels_clip = wbe.vector.shape_metrics.polygon_perimeter(parcels_clip)
parcels_clip = wbe.vector.shape_metrics.compactness_ratio(parcels_clip)

# 4. Spatial join — assign soil type to each parcel
parcels_with_soil = wbe.vector.overlay_analysis.spatial_join(
    input=parcels_clip,
    join_layer=soil_map,
    join_type='intersects',
    strategy='first',
    field_name='SOIL_CODE'
)

# 5. Dissolve by soil code to get soil extents within study area
soil_dissolved = wbe.vector.overlay_analysis.dissolve(input=parcels_with_soil, field_name='SOIL_CODE')
wbe.write_vector(soil_dissolved, 'soil_study_area.shp')

# 6. Spatial join sample points with soil polygons
samples_enriched = wbe.vector.overlay_analysis.spatial_join(
    input=sample_pts,
    join_layer=soil_dissolved,
    join_type='within',
    strategy='first',
    field_name='SOIL_CODE'
)
samples_enriched = wbe.conversion.vector_table_io.add_point_coordinates_to_table(samples_enriched)

# 7. Export for external analysis
wbe.conversion.vector_table_io.export_table_to_csv(samples_enriched, 'samples_with_soil.csv')
print('Vector analysis pipeline complete.')

Tips

  • Always validate topology before analysis: Run check_vector_topology() to detect overshoots, undershoots, self-intersections, and sliver polygons. Topological errors propagate through overlay and spatial join operations.
  • Build spatial indices on large layers: Large datasets (> 10,000 features) benefit from spatial indexing. Use build_spatial_index() explicitly before repeated spatial queries; operations like containment or proximity are fast with indices.
  • Choose your overlay operation carefully: Union retains all boundaries and combines attributes (can create many small slivers). Intersection keeps only overlapping regions. Difference retains Polygon A minus Polygon B. Test on small subsets first.
  • Dissolve reduces feature count and file size: After overlay, dissolve by ownership or category to collapse unnecessary edges. Dissolved layers render faster and are cleaner for publication.
  • Spatial joins are sensitive to alignment: Ensure both input layers use the same CRS and are free of topology errors. Reproject to equal-area projection before computing buffer distances or areas for analysis.
  • Buffer distance and units matter: Buffer distances are in map units (meters, feet, degrees). Use an equal-area projection if precise areas or distances are critical. Negative buffers can collapse small polygons (inset); test with small buffer values first.
  • Attribute table size is a memory constraint: Attribute tables with millions of rows and dozens of fields consume RAM. Export to CSV or database for large tables; work with summaries or samples when memory is limited.
  • Point-in-polygon operations scale with complexity: Containment tests are O(n) per point; on large datasets (> 1 million points), consider spatial index binning or vector-to-raster conversion for speed.

Online Data Downloads

This chapter focuses on downloading vector data directly from online providers into Whitebox workflows. The initial implementation is OpenStreetMap (OSM) via Overpass API using download_osm_vector.

Scope and Current Provider

Current tool:

  • wbe.vector.online_data.download_osm_vector(...)

The tool downloads OSM features within a longitude/latitude bounding box (EPSG:4326), optionally filters by theme, and writes output in any supported vector format based on output extension.

Quick Start

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

roads = wbe.vector.online_data.download_osm_vector(
    west=-80.54,
    south=43.41,
    east=-80.47,
    north=43.47,
    filter_preset="roads",
    include_points=False,
    include_lines=True,
    include_polygons=False,
)

wbe.write_vector(roads, "kitchener_roads.geojson")

Presets and Filters

Preset classes:

  • all
  • roads
  • buildings
  • water
  • landuse
  • trails
  • parks
  • rail
  • amenities
  • boundaries
  • transit
  • poi

Optional custom filters:

  • filter_key="amenity"
  • filter_key_value="amenity=school"

If custom filters are supplied, they take precedence over preset filtering.

Geometry Controls

Use geometry toggles to reduce result size and parsing overhead:

  • include_points
  • include_lines
  • include_polygons

Typical examples:

  • Road centerlines only: points off, lines on, polygons off
  • Building footprints only: points off, lines off, polygons on

Phase 2 options:

  • split_output_by_geometry=True writes separate files with _points, _lines, and _polygons suffixes.

Caching and Provenance

Use optional caching when iterating on the same AOI/filter query repeatedly:

  • cache_dir=".wbw_cache/osm"
  • cache_ttl_hours=24 (set 0 to disable TTL checks)

Use provenance_output to write a JSON sidecar with endpoint, bbox, filters, feature counts, and cache usage metadata.

roads = wbe.vector.online_data.download_osm_vector(
    west=-80.54,
    south=43.41,
    east=-80.47,
    north=43.47,
    filter_preset="trails",
    include_points=False,
    include_lines=True,
    include_polygons=False,
    split_output_by_geometry=True,
    cache_dir=".wbw_cache/osm",
    cache_ttl_hours=24,
    provenance_output="kitchener_trails_provenance.json",
    output="kitchener_trails.geojson",
)

Projection and Output

Rules:

  • Query extent is interpreted as EPSG:4326 (lon/lat).
  • Set input_extent_epsg to provide west/south/east/north in another CRS (the bbox is transformed to EPSG:4326 before querying Overpass).
  • Output stays EPSG:4326 unless output_epsg is provided.
  • Output format is inferred from filename extension (.shp, .gpkg, .geojson, .topojson, ...).

Endpoint selection:

  • overpass_profile supports: main, kumi, fr, custom.
  • overpass_url overrides the selected profile URL when provided.

Large-AOI chunking:

  • chunk_large_aoi=True (default) automatically tiles large query extents.
  • chunk_max_area_deg2=4.0 controls maximum area per chunk.
  • max_chunk_count=64 caps the number of generated chunk requests.
  • chunk_parallel_requests=1 (default) controls bounded parallel chunk fetch; set >1 to fetch chunks concurrently.
buildings = wbe.vector.online_data.download_osm_vector(
    west=-80.54,
    south=43.41,
    east=-80.47,
    north=43.47,
    filter_preset="buildings",
    include_points=False,
    include_lines=False,
    include_polygons=True,
    output_epsg=32617,
)

wbe.write_vector(buildings, "kitchener_buildings_utm17n.gpkg")

Operational Guidance

Overpass public endpoints enforce rate limits. Prefer smaller AOIs and bounded requests.

Recommended practice:

  • Keep AOIs compact
  • Use thematic filters
  • Set max_elements defensively
  • Increase timeout_seconds for denser urban queries

Attribution and Licensing

OSM data are provided under ODbL. When distributing derived datasets or maps, ensure proper OpenStreetMap attribution and verify downstream licensing obligations for your use case.

Companion Example

See:

  • crates/wbw_python/examples/osm_download_vector.py

Network Analysis

Whitebox Next Gen has deep capabilities across the full network-analysis spectrum: topology auditing, point-to-point routing, service areas, closest facility, OD cost matrices, location-allocation, accessibility metrics, sensitivity analysis, multimodal transit modelling, map matching, and fleet dispatch optimization. This chapter walks through those capabilities in the order you would encounter them in a real project.

Capability Note (Open Tier)

The workflows in this chapter target the open-tier engine and include advanced network controls such as turn/u-turn penalties, node-entry costs, and optional time-dependent edge profiles (temporal_cost_profile + departure_time) for scenario and peak-period analysis.


Core Concepts You Should Know First

Before running tools, it helps to align on a few core terms used throughout this chapter.

  • Network: A graph made of edges (road or transit segments) and nodes (intersections, stops, junctions).
  • Cost / impedance: The value minimized during routing. This can be distance, travel time, generalized cost, or another friction metric.
  • Origin / destination (OD): Origins are trip start points; destinations are trip end points.
  • OD matrix: A table of costs from many origins to many destinations. This is the standard structure for accessibility, market access, and assignment analyses.
  • Shortest path: The minimum-cost path between one origin and one destination.
  • K-shortest paths: The best k distinct alternatives between the same OD pair, useful for resilience and choice modelling.
  • Service area (isochrone): The portion of the network reachable from an origin within a cost threshold (for example 10 minutes).
  • Closest facility: For each incident or demand point, the least-cost route to the nearest candidate facility on the network.
  • Location-allocation: Selecting facility sites that optimize a demand objective, such as minimizing total travel cost or maximizing coverage.
  • Connectivity: Whether all required origins and destinations are in the same connected component. Disconnected components cause failed routes.
  • Node degree: The number of edges touching a node. Degree supports basic network centrality interpretation and QA for odd junction structure.
  • Multimodal routing: Pathfinding across multiple transport modes (walk/bus/rail) with transfer penalties and mode constraints.
  • Map matching: Snapping GPS trajectories to the most plausible sequence of network edges.

If you keep these definitions in mind, each workflow step below becomes easier to interpret and validate.

Modeling Intersection Delay With Node Costs

Network tools in this chapter support optional node-entry cost modeling for intersections, gates, crossings, or turn-heavy junctions:

  • node_cost_points: point layer containing node-cost observations.
  • node_cost_field: numeric field in node_cost_points with non-negative entry cost values.
  • node_cost_snap_distance: optional max assignment distance from each node-cost point to the nearest network node.

Use node costs when edge impedance alone underestimates urban delay at intersections.


Step 1 — Prepare and Audit the Network

Every routing workflow should begin with a topology check. A single dangling endpoint or disconnected island can silently invalidate a shortest-path result.

Topology Audit

network_topology_audit() scans a line network for common errors — dead ends, pseudo-nodes, overshoots, and isolated islands — and writes each flagged location as a point feature. It also produces an optional text report.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/network'
wbe.verbose = True

roads = wbe.read_vector('roads.shp')

errors, report_path = wbe.vector.network_analysis.network_topology_audit(
    roads,
    snap_tolerance=0.5,
    one_way_field='ONEWAY',
    report='topology_report.txt'
)
wbe.write_vector(errors, 'topology_errors.shp')

Review topology_report.txt before continuing. Understanding the error count and class distribution will guide how much cleaning the network needs.

Connected Components

An isolated cluster of road segments that cannot reach the main network will cause any OD or closest-facility query to fail for origins or destinations on that cluster. network_connected_components() labels every edge with its component identifier.

roads_comps = wbe.vector.network_analysis.network_connected_components(roads, snap_tolerance=0.5)
wbe.write_vector(roads_comps, 'roads_components.shp')
# Edges not in the dominant component are candidates for removal or bridging.

Node Degree

network_node_degree() writes the degree (number of connected edges) of every node as a point layer. Degree-1 nodes are dead ends; unusually high-degree nodes may indicate duplicate arcs.

nodes = wbe.vector.network_analysis.network_node_degree(roads, snap_tolerance=0.5)
wbe.write_vector(nodes, 'node_degree.shp')

Building Network Topology

If your raw network lacks proper topology (node points, edge connectivity structure), use build_network_topology() to construct nodes and validate edges. This is essential before running advanced analysis like service areas or facility location.

edges, nodes = wbe.vector.network_analysis.build_network_topology(
    roads,
    snap_tolerance=0.5,
    output_nodes=True
)
wbe.write_vector(edges, 'roads_noded.shp')
wbe.write_vector(nodes, 'network_nodes.shp')

Snapping Facilities and Demand Points

Before routing from facilities or demand points, snap them to the nearest network location. This ensures routing queries don't fail on "off-network" origins.

facilities = wbe.read_vector('fire_stations.shp')

snapped = wbe.vector.network_analysis.snap_points_to_network(
    network=edges,
    points=facilities,
    snap_distance=50.0,  # meters
    search_by_nearest=True
)
wbe.write_vector(snapped, 'fire_stations_snapped.shp')
# Output includes SNAP_DIST (offset to network) and snapped geometry.

Step 2 — Shortest Path and Alternatives

Single Shortest Path

shortest_path_network() finds the minimum-cost path between two coordinates using Dijkstra's algorithm. Supply an edge_cost_field to use travel-time or impedance; omit it to route by Euclidean arc length.

path = wbe.vector.network_analysis.shortest_path_network(
    roads,
    start_x=454230.0, start_y=4823150.0,
    end_x=458900.0,   end_y=4819700.0,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
wbe.write_vector(path, 'route_shortest.shp')

Turn penalties model the real-world cost of left, right, and U-turns — these can substantially alter optimal routes in dense urban networks.

path_turns = wbe.vector.network_analysis.shortest_path_network(
    roads,
    start_x=454230.0, start_y=4823150.0,
    end_x=458900.0,   end_y=4819700.0,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    turn_penalty=0.5,
    u_turn_penalty=3.0,
    forbid_u_turns=True
)
wbe.write_vector(path_turns, 'route_with_turns.shp')

K-Shortest Alternative Paths

k_shortest_paths_network() returns the k least-cost distinct paths between the same endpoints. Use this for resilience analysis, route-choice modelling, or presenting alternatives to planners.

alt_paths = wbe.vector.network_analysis.k_shortest_paths_network(
    roads,
    start_x=454230.0, start_y=4823150.0,
    end_x=458900.0,   end_y=4819700.0,
    k=3,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
wbe.write_vector(alt_paths, 'routes_k3.shp')
# Each feature carries a PATH_RANK attribute (1 = shortest).

Step 3 — Service Areas

network_service_area() delineates every part of the network reachable within a cost threshold from one or more origins. Typical uses include drive-time catchments for emergency services, walking isochrones for transit stops, and delivery zones.

fire_stations = wbe.read_vector('fire_stations.shp')

catchment = wbe.vector.network_analysis.network_service_area(
    roads,
    origins=fire_stations,
    max_cost=5.0,               # 5 minutes
    snap_tolerance=20.0,
    output_mode='polygon',      # 'nodes', 'edges', or 'polygon'
    polygon_merge_origins=True, # dissolve overlapping catchments
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
wbe.write_vector(catchment, 'fire_catchment_5min.shp')

If your network encodes one-way streets, pass one_way_field and use FT/TF/B style values when available (FT = first-to-last only, TF = last-to-first only, B = bidirectional). Legacy boolean-style values are still accepted.

To model rush-hour conditions, pass a temporal speed profile and apply turn penalties. Edge speeds are scaled by the profile multipliers at the specified departure time.

catchment_peak = wbe.vector.network_analysis.network_service_area(
    roads,
    origins=fire_stations,
    max_cost=5.0,
    snap_tolerance=20.0,
    output_mode='polygon',
    polygon_merge_origins=True,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    turn_penalty=0.3,
    u_turn_penalty=2.0,
    forbid_u_turns=True,
    temporal_cost_profile='rush_hour_profiles.csv',
    temporal_edge_id_field='EDGE_ID',
    departure_time='2024-06-15T08:00:00Z',
    temporal_mode='multiplier',
    temporal_fallback='static_cost'
)
wbe.write_vector(catchment_peak, 'fire_catchment_5min_am_peak.shp')

Use output_mode='edges' to retain the actual road arcs inside the catchment rather than fill a polygon — more appropriate when the network is sparse or when the arc-level result is needed for reporting.


Step 4 — Closest Facility

closest_facility_network() routes each incident point to its nearest facility, measuring cost along the network rather than in straight-line distance. This is the core tool for emergency-response siting, healthcare access studies, and school-catchment delineation.

accidents   = wbe.read_vector('accidents.shp')
hospitals   = wbe.read_vector('hospitals.shp')

routes_to_hosp = wbe.vector.network_analysis.closest_facility_network(
    roads,
    incidents=accidents,
    facilities=hospitals,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
wbe.write_vector(routes_to_hosp, 'routes_to_hospital.shp')
# Output carries INCIDENT_FID, FACILITY_FID, and COST fields per route.

If your network uses explicit one-way encodings, closest_facility_network() accepts FT/TF/B values as well as legacy boolean-style one-way fields.

For peak-hour response-time analysis, combine turn penalties with a temporal speed profile.

routes_peak = wbe.vector.network_analysis.closest_facility_network(
    roads,
    incidents=accidents,
    facilities=hospitals,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    turn_penalty=0.5,
    u_turn_penalty=3.0,
    forbid_u_turns=True,
    temporal_cost_profile='rush_hour_profiles.csv',
    temporal_edge_id_field='EDGE_ID',
    departure_time='2024-06-15T08:00:00Z',
    temporal_mode='multiplier',
    temporal_fallback='static_cost'
)
wbe.write_vector(routes_peak, 'routes_to_hospital_am_peak.shp')

Step 5 — OD Cost Matrix and Batch Route Export

When you need costs between many origins and many destinations simultaneously, an OD matrix is far more efficient than looping over shortest_path_network().

OD Cost Matrix

network_od_cost_matrix() solves all pairwise paths and writes the results to a CSV. Each row contains an origin identifier, a destination identifier, and the network cost between them.

schools   = wbe.read_vector('schools.shp')
libraries = wbe.read_vector('libraries.shp')

cost_csv = wbe.vector.network_analysis.network_od_cost_matrix(
    roads,
    origins=schools,
    destinations=libraries,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
print('OD matrix written to:', cost_csv)

If your network uses explicit one-way encodings, network_od_cost_matrix() accepts FT/TF/B values as well as legacy boolean-style one-way fields.

The CSV is directly usable in pandas or any tabular analysis tool.

import pandas as pd
df = pd.read_csv(cost_csv)
print(df.groupby('ORIGIN_FID')['COST'].min().describe())

For a time-of-day comparison, run a second matrix at AM-peak departure and compare cost distributions.

cost_csv_am = wbe.vector.network_analysis.network_od_cost_matrix(
    roads,
    origins=schools,
    destinations=libraries,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    turn_penalty=0.5,
    temporal_cost_profile='am_peak_profiles.csv',
    temporal_edge_id_field='EDGE_ID',
    departure_time='2024-06-15T08:00:00Z',
    temporal_mode='multiplier',
    temporal_fallback='static_cost'
)
print('AM-peak OD matrix written to:', cost_csv_am)

Materializing OD Routes as Geometry

To visualize or spatially analyse the actual path lines between OD pairs, use network_routes_from_od().

od_routes = wbe.vector.network_analysis.network_routes_from_od(
    roads,
    origins=schools,
    destinations=libraries,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    one_way_field='ONEWAY'
)
wbe.write_vector(od_routes, 'od_routes_schools_to_libraries.shp')

If your network uses explicit one-way encodings, network_routes_from_od() accepts FT/TF/B values as well as legacy boolean-style one-way fields.


Step 6 — Location-Allocation

location_allocation_network() solves the classic p-median problem: given candidate facility locations and weighted demand points, which p facilities minimise total travel cost? Use this for clinic siting, school consolidation, warehouse network design, and similar strategic planning problems.

demand     = wbe.read_vector('demand_points.shp')  # population-weighted
candidates = wbe.read_vector('candidate_sites.shp')

sited = wbe.vector.network_analysis.location_allocation_network(
    roads,
    demand_points=demand,
    facilities=candidates,
    facility_count=4,
    solver_mode='minimize_impedance',
    demand_weight_field='POP',
    snap_tolerance=20.0,
    edge_cost_field='MINUTES'
)
wbe.write_vector(sited, 'selected_facilities.shp')
# SELECTED == 1 on the four chosen candidate sites.
# Demand points carry ASSIGNED_FID linking each to its nearest selected site.

To optimise facility placement under peak-hour travel times rather than free-flow speeds, pass a temporal profile.

sited_peak = wbe.vector.network_analysis.location_allocation_network(
    roads,
    demand_points=demand,
    facilities=candidates,
    facility_count=4,
    solver_mode='minimize_impedance',
    demand_weight_field='POP',
    snap_tolerance=20.0,
    edge_cost_field='MINUTES',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    temporal_cost_profile='am_peak_profiles.csv',
    temporal_edge_id_field='EDGE_ID',
    departure_time='2024-06-15T08:00:00Z',
    temporal_mode='multiplier',
    temporal_fallback='static_cost'
)
wbe.write_vector(sited_peak, 'selected_facilities_am_peak.shp')

Solver modes include minimize_impedance (p-median), maximize_coverage, and maximize_attendance. Required and forbidden facility flags let you fix certain sites open or closed before the solver runs.


Step 7 — Network Accessibility Metrics

compute_network_accessibility() measures how accessible a set of destinations is from each origin, applying a decay function that down-weights distant facilities. The result is a gravity-model or cumulative-opportunity accessibility score per origin — a standard indicator in transport equity analysis.

residents    = wbe.read_vector('resident_centroids.shp')
supermarkets = wbe.read_vector('supermarkets.shp')

accessibility = wbe.compute_network_accessibility(
    roads,
    origins=residents,
    destinations=supermarkets,
    edge_cost_field='MINUTES',
    one_way_field='DIR',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    impedance_cutoff=30.0,
    decay_function='negative_exponential',
    decay_parameter=0.1
)
wbe.write_vector(accessibility, 'food_accessibility.shp')
# Each origin point carries an ACCESS_SCORE field.

When a one_way_field is provided, one-way values can use FT/TF/B conventions (FT = first-to-last only, TF = last-to-first only, B = bidirectional), as well as legacy boolean-style encodings.


Step 8 — OD Sensitivity Analysis

analyze_od_cost_sensitivity() quantifies how stable OD costs are under stochastic perturbation of edge weights. Use it to stress-test a routing model against uncertainty in travel-time estimates, or to assess the impact of hypothetical congestion or road-closure scenarios.

sensitivity = wbe.analyze_od_cost_sensitivity(
    roads,
    origins=schools,
    destinations=libraries,
    edge_cost_field='MINUTES',
    node_cost_points='intersection_delay_points.shp',
    node_cost_field='DELAY_MIN',
    node_cost_snap_distance=25.0,
    impedance_disturbance_range=0.2,  # ±20 % perturbation
    monte_carlo_samples=500
)
wbe.write_vector(sensitivity, 'od_sensitivity.shp')

Step 9 — Multimodal Analysis

Whitebox Next Gen supports networks that carry a MODE field on each edge (e.g. walk, cycle, bus, rail). The multimodal tools honour mode-specific speeds, transfer penalties, and time-of-day profiles.

Multimodal OD Scenarios

analyze_multimodal_od_scenarios() runs a batch of named scenarios defined by a CSV, each specifying different mode allowances, speed overrides, or departure times. The output is a combined cost table across all scenarios for rapid before/after or modal-mix comparisons.

transit_net    = wbe.read_vector('transit_network.shp')
bus_stops      = wbe.read_vector('bus_stops.shp')
destinations   = wbe.read_vector('key_destinations.shp')

result = wbe.analyze_multimodal_od_scenarios(
    input=transit_net,
    origins=bus_stops,
    destinations=destinations,
    output='multimodal_od_scenarios.shp',
    mode_field='MODE',
    allowed_modes='walk,bus,rail',
    transfer_penalty=3.0,
    edge_cost_field='MINUTES',
    scenario_bundle_csv='scenarios.csv',
    departure_time='08:00',
    temporal_mode='scheduled'
)

Exporting Multimodal Route Geometry

export_multimodal_routes_for_od_pairs() materializes the optimal multimodal route for each OD pair as a line feature with per-segment mode attributes.

mm_routes = wbe.export_multimodal_routes_for_od_pairs(
    input=transit_net,
    origins=bus_stops,
    destinations=destinations,
    output='multimodal_routes.shp',
    mode_field='MODE',
    allowed_modes='walk,bus,rail',
    transfer_penalty=3.0,
    edge_cost_field='MINUTES'
)

Step 10 — Map Matching

map_matching_v1() snaps a raw GPS trajectory to the most plausible sequence of network edges using a hidden Markov model with candidate expansion. It is the first step in any floating-vehicle data or probe-data workflow.

gps_points = wbe.read_vector('gps_probe_points.shp')

matched_path, match_report = wbe.vector.network_analysis.map_matching_v1(
    roads,
    trajectory_points=gps_points,
    timestamp_field='TIMESTAMP',
    search_radius=30.0,
    candidate_k=5,
    snap_tolerance=10.0,
    edge_cost_field='MINUTES',
    matched_points_output='matched_points.shp',
    match_report='match_report.txt'
)
wbe.write_vector(matched_path, 'matched_route.shp')

The match report summarises per-point confidence scores and the percentage of trajectory points that were successfully snapped to the network.


Step 11 — Fleet and Vehicle Routing (Pro)

fleet_routing_and_dispatch_optimizer solves Capacitated Vehicle Routing Problems (CVRP) and Vehicle Routing with Time Windows (VRPTW): given a fleet of vehicles at one or more depots and a set of service or delivery stops, it assigns stops to vehicles and sequences each route to minimise total travel cost subject to capacity and time-window constraints.

result = wbe.run_tool(
    'fleet_routing_and_dispatch_optimizer',
    {
        'network':               'roads.shp',
        'depots':                'depots.shp',
        'stops':                 'delivery_stops.shp',
        'vehicles_csv':          'fleet.csv',
        'route_output':          'fleet_routes.shp',
        'route_kpis_csv_output': 'fleet_kpis.csv',
        'edge_cost_field':       'MINUTES',
        'one_way_field':         'ONEWAY',
        'vrp_mode':              'VRPTW'
    }
)
print(result)

The KPI CSV reports per-route capacity utilization, total distance, time, and stop count — ready for import into logistics dashboards.

Note: This tool requires a WbEnvironment initialised with a valid Pro licence.


Complete Workflow: Emergency Response Planning

The following example chains topology audit → service-area catchment → closest-facility → location-allocation into a single emergency-planning analysis.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/emergency_planning'

roads      = wbe.read_vector('roads.shp')
stations   = wbe.read_vector('fire_stations.shp')
candidates = wbe.read_vector('candidate_stations.shp')
incidents  = wbe.read_vector('historical_incidents.shp')

# 1. Audit topology before running any queries.
errors, _ = wbe.vector.network_analysis.network_topology_audit(
    roads, snap_tolerance=0.5, report='topology_report.txt'
)
wbe.write_vector(errors, 'topology_errors.shp')

# 2. Map 5-minute drive catchments from existing stations.
catchment = wbe.vector.network_analysis.network_service_area(
    roads,
    origins=stations,
    max_cost=5.0,
    output_mode='polygon',
    polygon_merge_origins=True,
    edge_cost_field='MINUTES',
    snap_tolerance=20.0
)
wbe.write_vector(catchment, 'existing_catchment_5min.shp')

# 3. Route each historical incident to its nearest station.
routes = wbe.vector.network_analysis.closest_facility_network(
    roads,
    incidents=incidents,
    facilities=stations,
    snap_tolerance=20.0,
    edge_cost_field='MINUTES'
)
wbe.write_vector(routes, 'incident_routes.shp')

# 4. Find two additional station locations that maximise coverage.
sited = wbe.vector.network_analysis.location_allocation_network(
    roads,
    demand_points=incidents,
    facilities=candidates,
    facility_count=2,
    solver_mode='maximize_coverage',
    snap_tolerance=20.0,
    edge_cost_field='MINUTES'
)
wbe.write_vector(sited, 'new_station_sites.shp')

Tips

  • Always run network_topology_audit() first — even one disconnected segment can cause a path query to return no result without an explicit error.
  • Use network_connected_components() to confirm that all origins and destinations belong to the same component before running OD queries.
  • Supply edge_cost_field pointing to a pre-computed travel-time field for realistic routing; omit it only for pure geometric distance problems.
  • For time-sensitive routing, use temporal_cost_profile and departure_time to load scheduled speeds at the time of travel.
  • For multimodal networks, store the mode identifier in a field called MODE and use allowed_modes to control which modes are permitted per query.
  • The fleet_routing_and_dispatch_optimizer Pro tool requires a WbEnvironment initialised with a valid Pro licence.

Linear Referencing

Linear referencing is the practice of locating observations, measurements, or events along a route using a distance-based measure rather than absolute coordinates. A road asset database might record a pothole at 2.4 km from the start of route R-12; a utility corridor might flag a pressure anomaly at 847 m along a pipeline. Whitebox Next Gen provides the tools to build measured routes, locate observations against them, place events from tables or spatial layers, and — with a Pro licence — audit the consistency and governance of large linear asset datasets.


Core Concepts

A linear-referencing workflow has three parts:

  1. Routes — line features that define the measurement axis. Each route has a unique identifier and carries M-values (measures) representing cumulative distance from its start point.
  2. Measures — the distance value used to locate a position along a route.
  3. Events — point or line observations located by (route_id, measure) or (route_id, from_measure, to_measure) pairs.

Common Whitebox Next Gen applications include:

  • road-pavement condition assessment by segment
  • pipeline corridor integrity monitoring
  • trail difficulty and visibility reporting
  • environmental sampling along survey transects
  • GPS probe data quality control and kilometrage reporting

Route Calibration and M-Value Management

Measures are only useful when anchored to real-world control points. If your raw routes lack calibration, or have been edited, use the calibration tools to establish stable, field-verified measures.

Initial Calibration from Control Points

route_calibrate() establishes measure values on routes using control points with known measures. For example, if you have kilometre posts at known distances along a highway, calibration ensures your event locations align with field reality.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/linear_referencing'

routes = wbe.read_vector('highway_centerlines.shp')
km_posts = wbe.read_vector('km_post_locations.shp')  # with ROUTE_ID and KNOWN_MEASURE fields

calibrated = wbe.vector.linear_referencing.route_calibrate(
    routes=routes,
    control_points=km_posts,
    control_measure_field='KNOWN_MEASURE',
    route_id_field='ROUTE_ID',
    snap_tolerance=10.0  # max control-point offset from route (meters)
)
wbe.write_vector(calibrated, 'highway_calibrated.shp')
# Output includes FROM_MEASURE and TO_MEASURE fields.

Recalibration After Edits

If you edit routes (split, merge, or regeometrize), use route_recalibrate() to scale measures proportionally and maintain event alignment.

edited_routes = wbe.read_vector('highway_edited.shp')  # after geometric changes

recalibrated = wbe.vector.linear_referencing.route_recalibrate(
    original_routes=calibrated,     # reference with valid measures
    edited_routes=edited_routes,
    route_id_field='ROUTE_ID'
)
wbe.write_vector(recalibrated, 'highway_recalibrated.shp')

Validating Event Snapping

Before placing events on routes, diagnose snapping issues with snap_events_to_routes(). This reports snap distance, offset, and any unmatched events.

obs_points = wbe.read_vector('field_observations.shp')

diag = wbe.vector.linear_referencing.snap_events_to_routes(
    routes=calibrated,
    events=obs_points,
    max_offset_distance=15.0
)
wbe.write_vector(diag, 'observations_snap_diagnostics.shp')
# Output includes ROUTE_ID, MEASURE, and OFFSET fields; unmatched features are excluded.

Step 1 — Understand Your Route Geometry

Routes must be single-part polylines with a consistent digitizing direction. Before dropping events, confirm:

  • Each route has a unique identifier stored in a field (e.g. ROUTE_ID).
  • No route self-intersects.
  • Routes that form a corridor are merged into one feature per route identifier.

Use snap_endnodes and merge_line_segments to clean ragged street-centreline inputs before treating them as routes.


Step 2 — Locate Points Along Routes

locate_points_along_routes() takes an existing point layer and finds the nearest position on each matching route, writing back the M-value (measure) for every point. This is the first tool to reach for when your field team has collected GPS observation points and you need to convert them to route-distance offsets for further analysis.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/linear_referencing'

routes      = wbe.read_vector('roads_measured.shp')
obs_points  = wbe.read_vector('field_observations.shp')

located = wbe.vector.linear_referencing.locate_points_along_routes(
    routes=routes,
    points=obs_points,
    max_offset_distance=15.0   # max perpendicular snap distance in map units
)
wbe.write_vector(located, 'observations_located.shp')
# Output adds ROUTE_ID, MEASURE, and OFFSET fields to every input point.

The MEASURE field in the output is the along-route distance from the route start. OFFSET is the perpendicular distance from the point to the route. Points beyond max_offset_distance are not matched and are excluded from the output.


Step 3 — Place Events from a Table

Point Events

route_event_points_from_table() reads a CSV (or other tabular file) where each row specifies a route identifier and a measure value, and places a point feature at that position along the matching route. This is the standard import path for lab results, inspection records, or maintenance logs stored in external databases.

# events.csv columns: ROUTE_ID, MEASURE, SEVERITY, NOTES
pts = wbe.vector.linear_referencing.route_event_points_from_table(
    routes=routes,
    events='pavement_defects.csv',
    event_route_field='ROUTE_ID',
    measure_field='MEASURE'
)
wbe.write_vector(pts, 'pavement_defects_located.shp')

Line (Interval) Events

route_event_lines_from_table() works the same way but reads FROM_MEASURE and TO_MEASURE columns to produce line segments — useful for pavement condition ratings, speed zones, or any attribute that applies to a stretch of route rather than a single point.

# condition.csv columns: ROUTE_ID, FROM_MEASURE, TO_MEASURE, IRI, CONDITION
segs = wbe.vector.linear_referencing.route_event_lines_from_table(
    routes=routes,
    events='pavement_condition.csv',
    event_route_field='ROUTE_ID',
    from_measure_field='FROM_MEASURE',
    to_measure_field='TO_MEASURE'
)
wbe.write_vector(segs, 'pavement_condition_segments.shp')

Step 4 — Place Events from a Spatial Layer

When your event data is already a vector layer rather than a plain table, use the _from_layer variants. These carry across all attributes of the source feature and can optionally write the feature's original FID and XY coordinates into the output.

Point Events from a Layer

inspections = wbe.read_vector('manhole_inspections.shp')

pts_from_layer = wbe.vector.linear_referencing.route_event_points_from_layer(
    routes=routes,
    events=inspections,
    event_route_field='ROUTE_ID',
    measure_field='MEASURE',
    write_event_fid=True,   # retain original feature ID in output
    write_event_xy=True     # also store original XY in output
)
wbe.write_vector(pts_from_layer, 'manholes_on_routes.shp')

Line Events from a Layer

speed_zones = wbe.read_vector('speed_zone_events.shp')

segs_from_layer = wbe.vector.linear_referencing.route_event_lines_from_layer(
    routes=routes,
    events=speed_zones,
    event_route_field='ROUTE_ID',
    from_measure_field='FROM_M',
    to_measure_field='TO_M',
    write_event_fid=True
)
wbe.write_vector(segs_from_layer, 'speed_zones_on_routes.shp')

Step 5 — Linear Asset Governance (Pro)

route_event_governance_for_linear_assets audits a complete linear asset dataset for measure gaps, overlaps, duplicate events, orphaned route references, and out-of-range measures. It produces a flagged event output and a structured report — essential for maintaining the integrity of a production road or utility asset database.

result = wbe.run_tool(
    'route_event_governance_for_linear_assets',
    {
        'routes':             'roads_measured.shp',
        'events':             'pavement_condition.shp',
        'route_id_field':     'ROUTE_ID',
        'from_measure_field': 'FROM_MEASURE',
        'to_measure_field':   'TO_MEASURE',
        'flagged_output':     'event_flags.shp',
        'report':             'governance_report.csv'
    }
)
print(result)

The flagged output marks each event with an error code and description. The report CSV summarises error counts by class and route, ready for integration into a maintenance management system.

Note: This tool requires a WbEnvironment initialised with a valid Pro licence.


Complete Workflow: Road Pavement Assessment

The following example takes a road network, locates inspection points collected in the field, overlays a condition rating table, and exports both a point layer and a segment layer ready for pavement management reporting.

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/pavement_assessment'

routes         = wbe.read_vector('roads_measured.shp')
field_gps      = wbe.read_vector('field_inspection_gps.shp')

# Step 1: Snap GPS observation points onto routes and extract M-values.
located = wbe.vector.linear_referencing.locate_points_along_routes(
    routes=routes,
    points=field_gps,
    max_offset_distance=10.0
)
wbe.write_vector(located, 'gps_on_routes.shp')

# Step 2: Place point defect records from the inspection database.
defects = wbe.vector.linear_referencing.route_event_points_from_table(
    routes=routes,
    events='defect_records.csv',
    event_route_field='ROUTE_ID',
    measure_field='MEASURE'
)
wbe.write_vector(defects, 'defects_located.shp')

# Step 3: Place condition rating intervals from the same database.
condition = wbe.vector.linear_referencing.route_event_lines_from_table(
    routes=routes,
    events='condition_ratings.csv',
    event_route_field='ROUTE_ID',
    from_measure_field='FROM_M',
    to_measure_field='TO_M'
)
wbe.write_vector(condition, 'condition_segments.shp')

# Step 4 (Pro): Audit the condition layer for gaps and overlaps.
result = wbe.run_tool(
    'route_event_governance_for_linear_assets',
    {
        'routes':             'roads_measured.shp',
        'events':             'condition_segments.shp',
        'route_id_field':     'ROUTE_ID',
        'from_measure_field': 'FROM_M',
        'to_measure_field':   'TO_M',
        'flagged_output':     'condition_flags.shp',
        'report':             'governance_report.csv'
    }
)
print('Governance report:', result)

Tips

  • Routes must have a consistent digitizing direction. If your source network was assembled from multiple editors, run snap_endnodes and check that all segments in a single route are digitized in the same direction before computing M-values.
  • locate_points_along_routes() excludes points that exceed max_offset_distance. Inspect unmatched points to identify GPS outliers or route coverage gaps.
  • Use route_event_points_from_table() and route_event_lines_from_table() for bulk imports from asset management databases where the event locations are already stored as route+measure pairs.
  • Use the _from_layer variants when you have existing vector event layers that already carry a route identifier and measure fields.
  • The route_event_governance_for_linear_assets Pro tool scales to production road-asset databases with millions of events and produces actionable error reports ready for integration into maintenance workflows.

Script Index

Use this index to map manual chapters to runnable examples.

Use this chapter as a bridge from concept to execution. When adapting examples, start from the script closest to your data type and operational goal, then copy its structure before changing parameters. This tends to produce safer edits than assembling a workflow from isolated snippets.

Quick Start and Core API

  • crates/wbw_python/examples/quickstart_harmonized_api.py - minimal end-to-end startup and first tool run
  • crates/wbw_python/examples/wbenvironment_example.py - environment configuration and discovery baseline
  • crates/wbw_python/examples/current_api_data_handling_demo.py - data object handling patterns across core types

Raster Workflows

  • crates/wbw_python/examples/raster_numpy_roundtrip.py - single-band array conversion and writeback
  • crates/wbw_python/examples/raster_numpy_multiband_roundtrip.py - multiband array access and persistence
  • crates/wbw_python/examples/unary_raster_tools_example.py - tool-driven raster transform chain

Vector Workflows

  • crates/wbw_python/examples/vector_attributes_harmonized_api.py - schema inspection and attribute updates
  • crates/wbw_python/examples/vector_multifile_write_demo.py - format-aware vector export behavior
  • crates/wbw_python/examples/vector_topojson_roundtrip.py - TopoJSON read/write roundtrip workflow
  • crates/wbw_python/examples/osm_download_vector.py - OpenStreetMap download workflow via Overpass API

Lidar Workflows

  • crates/wbw_python/examples/lidar_write_options.py - LAZ and COPC output option tuning
  • crates/wbw_python/examples/lidar_numpy_roundtrip_smoke_test.py - point-field numpy roundtrip validation
  • crates/wbw_python/examples/lidar_chunked_pipeline.py - chunked processing pattern for large clouds

Sensor Bundle Workflows

  • crates/wbw_python/examples/sensor_bundle_overview.py - cross-family inspection and key discovery

Interoperability and Validation

  • crates/wbw_python/examples/interop_roundtrip_smoke_test.py - cross-library roundtrip sanity checks

Licensing

  • crates/wbw_python/examples/licensing_offline_example.py - local entitlement startup pattern
  • crates/wbw_python/examples/licensing_floating_online_example.py - floating-license provider bootstrap pattern

Dynamic Output Smoke Tests

  • crates/wbw_python/examples/dynamic_single_output_smoke_test.py - runtime single-output behavior checks
  • crates/wbw_python/examples/dynamic_multi_output_smoke_test.py - runtime multi-output behavior checks