Vector Analysis

Vector data are the primary format for discrete geographic features — points, lines, and polygons representing everything from sample locations and road networks to property parcels and watershed boundaries. Whitebox Workflows for Python (WbW-Py) provides a comprehensive set of vector analysis tools covering attribute management, geometric measurement, spatial overlay, proximity analysis, topology repair, shape analysis, and vector-to-raster conversion.


Core Concepts

Vector analysis depends on understanding these core concepts:

  • Feature geometry: Points (single coordinate pairs), lines (ordered sequences of coordinate pairs), and polygons (rings of coordinates forming closed boundaries). Each feature type supports different analyses.
  • Topology: The spatial relationships between features (adjacency, containment, intersection). Topological errors (overshoots, undershoots, self-intersections) corrupt overlay operations and topology queries.
  • Attribute table: The database associated with each feature layer, carrying descriptive fields and values. Attribute queries filter features; joins link external tables.
  • Spatial index: Internal index structure (R-tree or quadtree) enabling fast spatial queries. Used by intersection, containment, and proximity operations. Always build spatial indices on frequently queries layers.
  • Envelope (bounding box): The minimum rectangular boundary of a feature or layer; used for quick spatial culling before geometry tests.
  • Buffer: A polygon created at a fixed distance around a feature. Buffers model proximity zones and are fundamental to distance-based analysis.
  • Overlay (intersection, union, difference): Combining two polygon layers to create a new layer. Union merges boundaries; intersection keeps overlapping area only; difference removes one layer from another.
  • Dissolve (aggregation): Merging adjacent features with identical attribute values, creating larger aggregate features. Reduces feature count and simplifies geometry.
  • Spatial join: Associating features from one layer with features from another based on spatial relationship (overlap, containment, proximity). Assigns attributes across layers.
  • Proximity analysis: Finding nearest features, distances, or connectivity. Foundation for network analysis, market analysis, and accessibility studies.

Reading and Writing Vectors

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/vectors'

# Read a single vector
watersheds = wbe.read_vector('watersheds.shp')
streams    = wbe.read_vector('streams.shp')
outlets    = wbe.read_vector('outlets.shp')

# Read multiple at once
[roads, buildings, parks] = wbe.read_vectors('roads.shp', 'buildings.shp', 'parks.shp')

# Write results
wbe.write_vector(watersheds, 'watersheds_processed.shp')

Attribute Table Management

Adding and Removing Fields

# Add a new numeric field
watersheds = wbe.vector.attribute_analysis.add_field(watersheds, field_name='AREA_KM2', field_type='Float')

# Rename a field
watersheds = wbe.vector.attribute_analysis.rename_field(watersheds,
                               old_field_name='OBJECTID',
                               new_field_name='WS_ID')

# Delete a field
watersheds = wbe.vector.attribute_analysis.delete_field(watersheds, field_name='TEMP_FIELD')

# Reset the entire attribute table to only an FID column
watersheds = wbe.conversion.vector_table_io.reinitialize_attribute_table(watersheds)

Filtering by Attribute

# Select features where upstream area exceeds 50 km²
large_ws = wbe.vector.attribute_analysis.extract_by_attribute(watersheds,
                                     field_name='AREA_KM2',
                                     operator='>',
                                     value=50.0)
wbe.write_vector(large_ws, 'large_watersheds.shp')

Joining Tables

# Merge a CSV attribute table into a vector by a shared key field
import csv
merged = wbe.conversion.vector_table_io.merge_table_with_csv(watersheds,
                                   csv_file='watershed_stats.csv',
                                   join_field='WS_ID')

Exporting the Attribute Table

wbe.conversion.vector_table_io.export_table_to_csv(watersheds, 'watershed_attributes.csv')

Listing Unique Values

wbe.vector.attribute_analysis.list_unique_values(watersheds, field_name='REGION')

Geometric Measurement

Polygon Area and Perimeter

# Compute polygon area (adds AREA field to attribute table)
watersheds = wbe.vector.shape_metrics.polygon_area(watersheds)

# Compute perimeter (adds PERIMETER field)
watersheds = wbe.vector.shape_metrics.polygon_perimeter(watersheds)

Shape Indices

Shape indices quantify the geometric complexity and elongation of polygon features. They are widely used in ecology (patch metrics), hydrology (watershed form), and urban analysis:

# Compactness ratio — measures how closely a polygon approximates a circle
# Perfectly circular = 1.0; lower values = more elongated
watersheds = wbe.vector.shape_metrics.compactness_ratio(watersheds)

# Elongation ratio — based on minimum bounding box dimensions
watersheds = wbe.vector.shape_metrics.elongation_ratio(watersheds)

# Linearity index — R² of an RMA regression through hull vertices
# Higher values indicate long, narrow linear shapes
watersheds = wbe.vector.shape_metrics.linearity_index(watersheds)

# Related circumscribing circle
watersheds = wbe.vector.shape_metrics.related_circumscribing_circle(watersheds)

# Boundary shape complexity
watersheds = wbe.raster.general.boundary_shape_complexity(watersheds)

# Hole proportion — fraction of polygon area that is holes
watersheds = wbe.vector.shape_metrics.hole_proportion(watersheds)

# Shape complexity (vector)
watersheds = wbe.vector.shape_metrics.shape_complexity_index_vector(watersheds)

# Patch orientation (degrees from north of long axis)
watersheds = wbe.vector.shape_metrics.patch_orientation(watersheds)

# Narrowness index
watersheds = wbe.vector.shape_metrics.narrowness_index(watersheds)

# Radius of gyration (area-weighted centroid distance)
watersheds = wbe.raster.general.radius_of_gyration(watersheds)

Point Coordinate Addition

# Add X, Y (and optionally Z) coordinate columns to a point vector
sample_pts = wbe.conversion.vector_table_io.add_point_coordinates_to_table(sample_pts)

Centroids, Bounding Boxes, and Convex Hulls

# Point centroid of each polygon
centroids = wbe.vector.geometry_processing.centroid_vector(watersheds)
wbe.write_vector(centroids, 'watershed_centroids.shp')

# Minimum bounding box for each polygon
bboxes = wbe.vector.geometry_processing.minimum_bounding_box(watersheds)

# Minimum bounding circle
circles = wbe.vector.geometry_processing.minimum_bounding_circle(watersheds)

# Minimum bounding envelope (overall)
envelope = wbe.vector.geometry_processing.minimum_bounding_envelope(watersheds)

# Minimum convex hull
hull = wbe.vector.geometry_processing.minimum_convex_hull(watersheds)

# Layer footprint (bounding box of entire layer)
footprint = wbe.vector.sampling_gridding.layer_footprint_vector(watersheds)

# Long axis and short axis lines
long_axis  = wbe.vector.shape_metrics.polygon_long_axis(watersheds)
short_axis = wbe.vector.shape_metrics.polygon_short_axis(watersheds)

Smoothing, Simplification, and Geometry Operations

# Smooth vertices by averaging — reduces digitising artefacts
smooth = wbe.vector.geometry_processing.smooth_vectors(streams, filter_size=5)

# Douglas-Peucker line simplification
simplified = wbe.vector.geometry_processing.simplify_features(streams, snap_distance=10.0)

# Split long lines into segments of maximum length
segmented = wbe.vector.geometry_processing.split_vector_lines(streams, segment_length=1000.0)

# Extend line endpoints by a specified distance
extended = wbe.vector.geometry_processing.extend_vector_lines(streams, dist=50.0, extend_type='both')

# Split polygons or lines using another line layer
split_polys = wbe.vector.geometry_processing.split_with_lines(watersheds, split_lines=roads)

# Lines to polygon conversion (close and fill each polyline)
polys_from_lines = wbe.conversion.geometry_topology.lines_to_polygons(outline_lines)

# Polygons to lines (extract boundary lines)
lines_from_polys = wbe.conversion.geometry_topology.polygons_to_lines(watersheds)

# Convert multipart features to singlepart
single = wbe.conversion.geometry_topology.multipart_to_singlepart(watersheds)

# Convert singlepart to multipart by shared attribute value
multi = wbe.conversion.geometry_topology.singlepart_to_multipart(parcels, field_name='OWNER_ID')

# Merge all features in two or more files into one layer
merged_streams = wbe.conversion.vector_table_io.merge_vectors(streams_a, streams_b)

# Clean topology (remove duplicate vertices and degenerate features)
cleaned = wbe.conversion.vector_table_io.clean_vector(streams)

# Remove polygon holes smaller than a threshold
no_holes = wbe.conversion.geometry_topology.remove_polygon_holes(watersheds)

Spatial Overlay

WbW-Py supports the full suite of vector set-theoretic overlay operations:

Clip

Cuts one layer to the extent of another, retaining only features within the clip polygon:

# Clip roads to a study area polygon
roads_clipped = wbe.vector.overlay_analysis.clip(input=roads, clip=study_area)
wbe.write_vector(roads_clipped, 'roads_study_area.shp')

Intersect

Returns the geometric intersection of two layers, keeping the portions where they overlap and combining attributes from both:

soil_in_watershed = wbe.vector.overlay_analysis.intersect(input=soil_polygons, overlay=watershed_boundary,
                                   snap_tolerance=1e-6)
wbe.write_vector(soil_in_watershed, 'soil_in_watershed.shp')

Erase (Difference)

Removes the area of one layer from another:

# Remove urban areas from the vegetation layer
rural_veg = wbe.vector.overlay_analysis.erase(input=vegetation, erase_layer=urban_boundaries)

Union

Combines two polygon layers and divides overlapping areas, retaining all features from both:

combined = wbe.vector.overlay_analysis.union(input=zoning, overlay=flood_zones)
wbe.write_vector(combined, 'zoning_flood_overlay.shp')

Symmetrical Difference

Returns only the non-overlapping portions of each layer:

sym_diff = wbe.vector.overlay_analysis.symmetrical_difference(input=year1_polygons, overlay=year2_polygons)

Dissolve

Merges features that share a common attribute value:

# Merge all polygons of the same land-cover class
dissolved = wbe.vector.overlay_analysis.dissolve(input=landcover_polygons, field_name='CLASS')
wbe.write_vector(dissolved, 'landcover_dissolved.shp')

Proximity and Near Analysis

Euclidean Distance to Nearest Feature

# Find the distance from each sample point to the nearest road
near_result = wbe.vector.overlay_analysis.near(input=sample_pts, feature=roads)
# Adds NEAR_DIST and NEAR_FID fields to sample_pts
wbe.write_vector(near_result, 'samples_near_roads.shp')

Voronoi Diagram (Thiessen Polygons)

Thiessen polygons partition space so every location is assigned to the nearest source point:

voronoi = wbe.vector.sampling_gridding.voronoi_diagram(sample_pts)
wbe.write_vector(voronoi, 'thiessen_polygons.shp')

Convex Hull and Medoid

hull_pts = wbe.vector.geometry_processing.minimum_convex_hull(sample_pts)  # minimum convex hull of point set
med      = wbe.vector.sampling_gridding.medoid(sample_pts)               # geometric median of a set of points

Select by Location

Spatial queries allow selection of features based on their geometric relationship to a second layer:

# Select all stream segments that intersect wetland polygons
streams_in_wetlands = wbe.vector.overlay_analysis.select_by_location(
    input=streams,
    comparison=wetlands,
    geometry_type='intersects'
)
wbe.write_vector(streams_in_wetlands, 'streams_in_wetlands.shp')

Spatial Join

Spatial join transfers attributes from a join layer to an input layer based on spatial proximity or overlap:

# Join soil class to sample points based on the polygon they fall within
pts_with_soil = wbe.vector.overlay_analysis.spatial_join(
    input=sample_pts,
    join_layer=soil_polygons,
    join_type='within',       # 'within', 'intersects', 'nearest'
    strategy='first',         # 'first', 'last', 'count', 'sum', 'mean', 'min', 'max'
    field_name='SOIL_CLASS'
)
wbe.write_vector(pts_with_soil, 'samples_with_soil.shp')

Vector Grids

Create regular grids of vector polygons covering a raster or vector extent. Useful for stratified sampling and landscape analysis at fixed spatial scales:

# Hexagonal grid with resolution based on an existing raster
hex_grid = wbe.vector.sampling_gridding.hexagonal_grid_from_raster_base(dem)
wbe.write_vector(hex_grid, 'hexgrid.shp')

# Rectangular grid
rec_grid = wbe.vector.sampling_gridding.rectangular_grid_from_raster_base(dem)
wbe.write_vector(rec_grid, 'recgrid.shp')

# Hexagonal grid with resolution based on a vector extent
hex_v = wbe.vector.sampling_gridding.hexagonal_grid_from_vector_base(watersheds, width=500.0)

Vector-to-Raster Conversion

# Rasterize polygon layer (burn polygon attribute value into grid cells)
lc_raster = wbe.conversion.raster_vector_conversion.vector_polygons_to_raster(
    input=landcover_polygons,
    field_name='CLASS_ID',
    cell_size=30.0
)
wbe.write_raster(lc_raster, 'landcover_raster.tif')

# Rasterize line layer
roads_raster = wbe.conversion.raster_vector_conversion.vector_lines_to_raster(
    input=roads,
    field_name='FID',
    cell_size=10.0
)

# Rasterize point layer
pts_raster = wbe.conversion.raster_vector_conversion.vector_points_to_raster(
    input=sample_pts,
    field_name='YIELD',
    assign_op='mean',  # 'first', 'last', 'min', 'max', 'sum', 'mean', 'number'
    cell_size=5.0
)

Field Calculator

The field calculator supports SQL-style and expression-style updates for vector attributes, including:

  • searched CASE WHEN ... THEN ... ELSE ... END
  • simple CASE field WHEN value THEN ... END
  • UPDATE ... SET ... [WHERE ...] wrapper syntax
  • SQL operators (=, <>, AND, OR, NOT), plus IS NULL / IS NOT NULL
  • CAST(... AS integer|float|text|boolean)
  • preview-first execution with preview_rows (no output write required)

Use it to create or update a field from existing attributes and geometry variables ($area, $length, $perimeter, centroid coordinates).

# Compute/update SPEED from TYPE using SQL-style CASE
watersheds = wbe.vector.attribute_analysis.field_calculator(
    input=watersheds,
    field='SPEED',
    field_type='integer',
    expression="CASE WHEN TYPE == 'motorway' THEN 100 WHEN TYPE == 'primary' THEN 80 ELSE 60 END",
    overwrite=True,
    output='watersheds_speed.gpkg'
)

# Preview-only evaluation (returns preview payload, omits output write)
preview = wbe.vector.attribute_analysis.field_calculator(
    input=watersheds,
    field='SPEED',
    field_type='integer',
    expression="CASE TYPE WHEN 'motorway' THEN 100 ELSE 60 END",
    overwrite=True,
    preview_rows=10
)

When preview_rows > 0, the tool returns preview records and normalized expression details that can be surfaced in UI workflows before committing final output writes.


Topological Utilities

Repair and Validation

# Snap nearby line endpoints to within a tolerance distance
streams_clean = wbe.streams.network_extraction.repair_stream_vector_topology(streams, snap_distance=1.0)

# Fix dangling arcs (lines that overshoot or undershoot intersections)
fixed = wbe.conversion.geometry_topology.fix_dangling_arcs(streams, snap_distance=1.0)

Line Intersections

# Find all intersection points between two line layers
intersections = wbe.vector.overlay_analysis.line_intersections(roads, rivers)
wbe.write_vector(intersections, 'road_river_crossings.shp')

Extract Nodes

# Extract all vertices of a line layer as points
nodes = wbe.vector.sampling_gridding.extract_nodes(streams)
wbe.write_vector(nodes, 'stream_nodes.shp')

Polygon Topology

# Polygonise a raster — convert raster regions to vector polygons
polys = wbe.vector.geometry_processing.polygonize(classified_raster)
wbe.write_vector(polys, 'class_polygons.shp')

Point Cluster Analysis

# Heat map (kernel density estimation)
density = wbe.raster.general.heat_map(sample_pts, bandwidth=500.0)
wbe.write_raster(density, 'point_density.tif')

# Vector hexagonal binning — count points per hexagon
hex_counts = wbe.vector.sampling_gridding.vector_hex_binning(sample_pts, width=1000.0, orientation='vertical')
wbe.write_vector(hex_counts, 'hex_counts.shp')

WbW-Pro Spotlight: Market Access and Site Intelligence

  • Problem: Rank candidate sites using repeatable network-access and demand logic.
  • Tool: market_access_and_site_intelligence_workflow
  • Typical inputs: Network, existing sites, candidate sites, demand surface, drive-time rings.
  • Typical outputs: Catchment polygons, competitive-overlap layer, candidate-ranking CSV, executive summary JSON.
import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()

result = wbe.run_tool(
    'market_access_and_site_intelligence_workflow',
    {
        'network': 'street_network.shp',
        'sites_existing': 'existing_sites.shp',
        'sites_candidates': 'candidate_sites.shp',
        'demand_surface': 'demand_points.shp',
        'ring_costs': [5.0, 10.0, 15.0],
        'catchments_output': 'candidate_catchments.shp',
        'overlap_analysis_output': 'competitive_overlap.shp',
        'candidate_rank_csv': 'candidate_rankings.csv',
        'executive_summary_json': 'market_summary.json'
    }
)
print(result)

Note: This workflow requires a WbEnvironment initialized with a valid Pro licence.


Complete Vector Analysis Workflow

The following script illustrates a full workflow from raw survey points to a clipped, dissolved, and enriched polygon layer:

import whitebox_workflows as wbw

wbe = wbw.WbEnvironment()
wbe.working_directory = '/data/vector_analysis'
wbe.verbose = True

# 1. Load layers
parcels       = wbe.read_vector('parcels.shp')
study_boundary = wbe.read_vector('study_boundary.shp')
soil_map      = wbe.read_vector('soil_types.shp')
sample_pts    = wbe.read_vector('sample_points.shp')

# 2. Clip parcels to study boundary
parcels_clip = wbe.vector.overlay_analysis.clip(input=parcels, clip=study_boundary)

# 3. Compute geometric attributes
parcels_clip = wbe.vector.shape_metrics.polygon_area(parcels_clip)
parcels_clip = wbe.vector.shape_metrics.polygon_perimeter(parcels_clip)
parcels_clip = wbe.vector.shape_metrics.compactness_ratio(parcels_clip)

# 4. Spatial join — assign soil type to each parcel
parcels_with_soil = wbe.vector.overlay_analysis.spatial_join(
    input=parcels_clip,
    join_layer=soil_map,
    join_type='intersects',
    strategy='first',
    field_name='SOIL_CODE'
)

# 5. Dissolve by soil code to get soil extents within study area
soil_dissolved = wbe.vector.overlay_analysis.dissolve(input=parcels_with_soil, field_name='SOIL_CODE')
wbe.write_vector(soil_dissolved, 'soil_study_area.shp')

# 6. Spatial join sample points with soil polygons
samples_enriched = wbe.vector.overlay_analysis.spatial_join(
    input=sample_pts,
    join_layer=soil_dissolved,
    join_type='within',
    strategy='first',
    field_name='SOIL_CODE'
)
samples_enriched = wbe.conversion.vector_table_io.add_point_coordinates_to_table(samples_enriched)

# 7. Export for external analysis
wbe.conversion.vector_table_io.export_table_to_csv(samples_enriched, 'samples_with_soil.csv')
print('Vector analysis pipeline complete.')

Tips

  • Always validate topology before analysis: Run check_vector_topology() to detect overshoots, undershoots, self-intersections, and sliver polygons. Topological errors propagate through overlay and spatial join operations.
  • Build spatial indices on large layers: Large datasets (> 10,000 features) benefit from spatial indexing. Use build_spatial_index() explicitly before repeated spatial queries; operations like containment or proximity are fast with indices.
  • Choose your overlay operation carefully: Union retains all boundaries and combines attributes (can create many small slivers). Intersection keeps only overlapping regions. Difference retains Polygon A minus Polygon B. Test on small subsets first.
  • Dissolve reduces feature count and file size: After overlay, dissolve by ownership or category to collapse unnecessary edges. Dissolved layers render faster and are cleaner for publication.
  • Spatial joins are sensitive to alignment: Ensure both input layers use the same CRS and are free of topology errors. Reproject to equal-area projection before computing buffer distances or areas for analysis.
  • Buffer distance and units matter: Buffer distances are in map units (meters, feet, degrees). Use an equal-area projection if precise areas or distances are critical. Negative buffers can collapse small polygons (inset); test with small buffer values first.
  • Attribute table size is a memory constraint: Attribute tables with millions of rows and dozens of fields consume RAM. Export to CSV or database for large tables; work with summaries or samples when memory is limited.
  • Point-in-polygon operations scale with complexity: Containment tests are O(n) per point; on large datasets (> 1 million points), consider spatial index binning or vector-to-raster conversion for speed.