Files
gh-k-dense-ai-claude-scient…/skills/geopandas/references/spatial-analysis.md
2025-11-30 08:30:10 +08:00

4.7 KiB

Spatial Analysis

Attribute Joins

Combine datasets based on common variables using standard pandas merge:

# Merge on common column
result = gdf.merge(df, on='common_column')

# Left join
result = gdf.merge(df, on='common_column', how='left')

# Important: Call merge on GeoDataFrame to preserve geometry
# This works: gdf.merge(df, ...)
# This doesn't: df.merge(gdf, ...) # Returns DataFrame, not GeoDataFrame

Spatial Joins

Combine datasets based on spatial relationships.

Binary Predicate Joins (sjoin)

Join based on geometric predicates:

# Intersects (default)
joined = gpd.sjoin(gdf1, gdf2, how='inner', predicate='intersects')

# Available predicates
joined = gpd.sjoin(gdf1, gdf2, predicate='contains')
joined = gpd.sjoin(gdf1, gdf2, predicate='within')
joined = gpd.sjoin(gdf1, gdf2, predicate='touches')
joined = gpd.sjoin(gdf1, gdf2, predicate='crosses')
joined = gpd.sjoin(gdf1, gdf2, predicate='overlaps')

# Join types
joined = gpd.sjoin(gdf1, gdf2, how='left')   # Keep all from left
joined = gpd.sjoin(gdf1, gdf2, how='right')  # Keep all from right
joined = gpd.sjoin(gdf1, gdf2, how='inner')  # Intersection only

The how parameter determines which geometries are retained:

  • left: Retains left GeoDataFrame's index and geometry
  • right: Retains right GeoDataFrame's index and geometry
  • inner: Uses intersection of indices, keeps left geometry

Nearest Joins (sjoin_nearest)

Join to nearest features:

# Find nearest neighbor
nearest = gpd.sjoin_nearest(gdf1, gdf2)

# Add distance column
nearest = gpd.sjoin_nearest(gdf1, gdf2, distance_col='distance')

# Limit search radius (significantly improves performance)
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)

# Find k nearest neighbors
nearest = gpd.sjoin_nearest(gdf1, gdf2, k=5)

Overlay Operations

Set-theoretic operations combining geometries from two GeoDataFrames:

# Intersection - keep areas where both overlap
intersection = gpd.overlay(gdf1, gdf2, how='intersection')

# Union - combine all areas
union = gpd.overlay(gdf1, gdf2, how='union')

# Difference - areas in first not in second
difference = gpd.overlay(gdf1, gdf2, how='difference')

# Symmetric difference - areas in either but not both
sym_diff = gpd.overlay(gdf1, gdf2, how='symmetric_difference')

# Identity - intersection + difference
identity = gpd.overlay(gdf1, gdf2, how='identity')

Result includes attributes from both input GeoDataFrames.

Dissolve (Aggregation)

Aggregate geometries based on attribute values:

# Dissolve by attribute
dissolved = gdf.dissolve(by='region')

# Dissolve with aggregation functions
dissolved = gdf.dissolve(by='region', aggfunc='sum')
dissolved = gdf.dissolve(by='region', aggfunc={'population': 'sum', 'area': 'mean'})

# Dissolve all into single geometry
dissolved = gdf.dissolve()

# Preserve internal boundaries
dissolved = gdf.dissolve(by='region', as_index=False)

Clipping

Clip geometries to boundary of another geometry:

# Clip to polygon boundary
clipped = gpd.clip(gdf, boundary_polygon)

# Clip to another GeoDataFrame
clipped = gpd.clip(gdf, boundary_gdf)

Appending

Combine multiple GeoDataFrames:

import pandas as pd

# Concatenate GeoDataFrames (CRS must match)
combined = pd.concat([gdf1, gdf2], ignore_index=True)

# With keys for identification
combined = pd.concat([gdf1, gdf2], keys=['source1', 'source2'])

Spatial Indexing

Improve performance for spatial operations:

# GeoPandas uses spatial index automatically for most operations
# Access the spatial index directly
sindex = gdf.sindex

# Query geometries intersecting a bounding box
possible_matches_index = list(sindex.intersection((xmin, ymin, xmax, ymax)))
possible_matches = gdf.iloc[possible_matches_index]

# Query geometries intersecting a polygon
possible_matches_index = list(sindex.query(polygon_geometry))
possible_matches = gdf.iloc[possible_matches_index]

Spatial indexing significantly speeds up:

  • Spatial joins
  • Overlay operations
  • Queries with geometric predicates

Distance Calculations

# Distance between geometries
distances = gdf1.geometry.distance(gdf2.geometry)

# Distance to single geometry
distances = gdf.geometry.distance(single_point)

# Minimum distance to any feature
min_dist = gdf.geometry.distance(point).min()

Area and Length Calculations

For accurate measurements, ensure proper CRS:

# Reproject to appropriate projected CRS for area/length calculations
gdf_projected = gdf.to_crs(epsg=3857)  # Or appropriate UTM zone

# Calculate area (in CRS units, typically square meters)
areas = gdf_projected.geometry.area

# Calculate length/perimeter (in CRS units)
lengths = gdf_projected.geometry.length