Spatial Analysis
Attribute Joins
Combine datasets based on common variables using standard pandas merge:
# Merge on common column
result = gdf.merge(df, on='common_column')
# Left join
result = gdf.merge(df, on='common_column', how='left')
# Important: Call merge on GeoDataFrame to preserve geometry
# This works: gdf.merge(df, ...)
# This doesn't: df.merge(gdf, ...) # Returns DataFrame, not GeoDataFrame
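The difference in return type can be checked directly. The following is a minimal sketch; the `cities` and `populations` tables and their column names are invented for illustration.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical data: two cities with geometry, plus a plain attribute table
cities = gpd.GeoDataFrame(
    {"city_id": [1, 2], "geometry": [Point(0, 0), Point(1, 1)]}
)
populations = pd.DataFrame({"city_id": [1, 2], "population": [500, 800]})

merged = cities.merge(populations, on="city_id")  # called on the GeoDataFrame
plain = populations.merge(cities, on="city_id")   # called on the DataFrame

# The first result keeps geospatial behavior, the second is a plain DataFrame
print(type(merged).__name__, type(plain).__name__)
```

If the merge has to start from the plain DataFrame, wrapping the result as `gpd.GeoDataFrame(plain, geometry="geometry")` should restore the geometry behavior.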
Spatial Joins
Combine datasets based on spatial relationships.
Binary Predicate Joins (sjoin)
Join based on geometric predicates:
# Intersects (default)
joined = gpd.sjoin(gdf1, gdf2, how='inner', predicate='intersects')
# Available predicates
joined = gpd.sjoin(gdf1, gdf2, predicate='contains')
joined = gpd.sjoin(gdf1, gdf2, predicate='within')
joined = gpd.sjoin(gdf1, gdf2, predicate='touches')
joined = gpd.sjoin(gdf1, gdf2, predicate='crosses')
joined = gpd.sjoin(gdf1, gdf2, predicate='overlaps')
# Join types
joined = gpd.sjoin(gdf1, gdf2, how='left') # Keep all from left
joined = gpd.sjoin(gdf1, gdf2, how='right') # Keep all from right
joined = gpd.sjoin(gdf1, gdf2, how='inner') # Keep only matching rows
The how parameter determines which geometries are retained:
- left: Retains left GeoDataFrame's index and geometry
- right: Retains right GeoDataFrame's index and geometry
- inner: Uses intersection of indices, keeps left geometry
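As a small illustration of how `how` changes the result, the sketch below joins three points to two polygons. The `stops` and `zones` frames and their columns are made up for the example.

```python
import geopandas as gpd
from shapely.geometry import Point, box

# Hypothetical inputs: two square zones and three stops, one outside both zones
zones = gpd.GeoDataFrame(
    {"zone": ["A", "B"], "geometry": [box(0, 0, 2, 2), box(3, 0, 5, 2)]}
)
stops = gpd.GeoDataFrame(
    {
        "stop": ["s1", "s2", "s3"],
        "geometry": [Point(1, 1), Point(4, 1), Point(9, 9)],
    }
)

# Inner join drops the stop that falls in no zone
inner = gpd.sjoin(stops, zones, how="inner", predicate="within")

# Left join keeps every stop; unmatched stops get missing zone attributes
left = gpd.sjoin(stops, zones, how="left", predicate="within")

print(inner[["stop", "zone"]])
print(left[["stop", "zone"]])
```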
Nearest Joins (sjoin_nearest)
Join to nearest features:
# Find nearest neighbor
nearest = gpd.sjoin_nearest(gdf1, gdf2)
# Add distance column
nearest = gpd.sjoin_nearest(gdf1, gdf2, distance_col='distance')
# Limit search radius, in CRS units (can significantly improve performance)
nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000)
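# sjoin_nearest has no k parameter: it returns only the nearest match
# per left feature (equidistant ties produce multiple rows)
nearest = gpd.sjoin_nearest(gdf1, gdf2, how='left')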
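A short sketch of `distance_col` and `max_distance` on toy data follows. The `houses` and `shops` frames are hypothetical, and the coordinates are unitless because no CRS is set.

```python
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical inputs laid out along the x axis
houses = gpd.GeoDataFrame(
    {"house": ["h1", "h2"], "geometry": [Point(0, 0), Point(10, 0)]}
)
shops = gpd.GeoDataFrame(
    {"shop": ["west", "east"], "geometry": [Point(1, 0), Point(8, 0)]}
)

# Each house is matched to its closest shop, with the distance recorded
nearest = gpd.sjoin_nearest(houses, shops, distance_col="distance")

# With a search radius, houses with no shop in range drop out of the inner join
in_range = gpd.sjoin_nearest(
    houses, shops, max_distance=1.5, distance_col="distance"
)

print(nearest[["house", "shop", "distance"]])
print(in_range[["house", "shop", "distance"]])
```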
Overlay Operations
Set-theoretic operations combining geometries from two GeoDataFrames:
# Intersection - keep areas where both overlap
intersection = gpd.overlay(gdf1, gdf2, how='intersection')
# Union - combine all areas
union = gpd.overlay(gdf1, gdf2, how='union')
# Difference - areas in first not in second
difference = gpd.overlay(gdf1, gdf2, how='difference')
# Symmetric difference - areas in either but not both
sym_diff = gpd.overlay(gdf1, gdf2, how='symmetric_difference')
# Identity - intersection + difference
identity = gpd.overlay(gdf1, gdf2, how='identity')
The result carries attributes from both input GeoDataFrames, except for difference, which keeps only those of the first.
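To make the five modes concrete, the sketch below overlays two unit-offset squares and compares the total area each mode returns. The frames `left` and `right` and their attribute columns `a` and `b` are illustrative only.

```python
import geopandas as gpd
from shapely.geometry import box

# Two 2x2 squares that share a 1x1 corner
left = gpd.GeoDataFrame({"a": [1], "geometry": [box(0, 0, 2, 2)]})
right = gpd.GeoDataFrame({"b": [2], "geometry": [box(1, 1, 3, 3)]})

modes = ["intersection", "union", "difference", "symmetric_difference", "identity"]

# Total area of the output for each overlay mode
areas = {how: gpd.overlay(left, right, how=how).area.sum() for how in modes}

for how, area in areas.items():
    print(f"{how}: {area:.1f}")
```

With these inputs the shared corner has area 1, so the totals follow from simple arithmetic on the two 4-unit squares.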
Dissolve (Aggregation)
Aggregate geometries based on attribute values:
# Dissolve by attribute
dissolved = gdf.dissolve(by='region')
# Dissolve with aggregation functions
dissolved = gdf.dissolve(by='region', aggfunc='sum')
dissolved = gdf.dissolve(by='region', aggfunc={'population': 'sum', 'area': 'mean'})
# Dissolve all into single geometry
dissolved = gdf.dissolve()
# Keep the grouping column as a regular column instead of the index
dissolved = gdf.dissolve(by='region', as_index=False)
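A minimal worked example of dissolving with an aggregation, using an invented `districts` frame with `region` and `population` columns:

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical districts: two adjacent squares in "north", one in "south"
districts = gpd.GeoDataFrame(
    {
        "region": ["north", "north", "south"],
        "population": [10, 20, 5],
        "geometry": [box(0, 1, 1, 2), box(1, 1, 2, 2), box(0, 0, 1, 1)],
    }
)

# Geometries are unioned per region and population is summed
by_region = districts.dissolve(by="region", aggfunc="sum")

# Same aggregation, but "region" stays a regular column
flat = districts.dissolve(by="region", aggfunc="sum", as_index=False)

print(by_region[["population"]])
print(flat.columns.tolist())
```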
Clipping
Clip geometries to boundary of another geometry:
# Clip to polygon boundary
clipped = gpd.clip(gdf, boundary_polygon)
# Clip to another GeoDataFrame
clipped = gpd.clip(gdf, boundary_gdf)
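As a quick illustration, the sketch below clips two line features to a rectangular mask. The `roads` frame and `study_area` polygon are hypothetical.

```python
import geopandas as gpd
from shapely.geometry import LineString, box

# Hypothetical roads: one crosses the study area, one lies entirely outside
roads = gpd.GeoDataFrame(
    {
        "road": ["r1", "r2"],
        "geometry": [
            LineString([(0, 0), (10, 0)]),
            LineString([(20, 0), (30, 0)]),
        ],
    }
)
study_area = box(2, -1, 5, 1)

# Features are cut at the mask boundary; features outside the mask are dropped
clipped = gpd.clip(roads, study_area)

print(clipped[["road"]])
print(clipped.length)
```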
Appending
Combine multiple GeoDataFrames:
import pandas as pd
# Concatenate GeoDataFrames (CRS must match)
combined = pd.concat([gdf1, gdf2], ignore_index=True)
# With keys for identification
combined = pd.concat([gdf1, gdf2], keys=['source1', 'source2'])
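When the inputs use different coordinate systems, one option is to reproject before concatenating. The sketch below assumes two single-row frames, `gdf_a` and `gdf_b`, invented for the example.

```python
import pandas as pd
import geopandas as gpd
from shapely.geometry import Point

# Hypothetical inputs describing the same location in two different CRSs
gdf_a = gpd.GeoDataFrame(
    {"name": ["a"], "geometry": [Point(10, 50)]}, crs="EPSG:4326"
)
gdf_b = gpd.GeoDataFrame(
    {"name": ["b"], "geometry": [Point(1113194.9, 6446275.8)]}, crs="EPSG:3857"
)

# Bring the second frame into the first frame's CRS before combining
if gdf_b.crs != gdf_a.crs:
    gdf_b = gdf_b.to_crs(gdf_a.crs)

combined = pd.concat([gdf_a, gdf_b], ignore_index=True)

print(combined.crs)
print(combined)
```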
Spatial Indexing
Improve performance for spatial operations:
# GeoPandas uses spatial index automatically for most operations
# Access the spatial index directly
sindex = gdf.sindex
# Query geometries intersecting a bounding box
possible_matches_index = list(sindex.intersection((xmin, ymin, xmax, ymax)))
possible_matches = gdf.iloc[possible_matches_index]
# Query geometries intersecting a polygon
# (without a predicate, only bounding boxes are compared)
possible_matches_index = list(sindex.query(polygon_geometry, predicate='intersects'))
possible_matches = gdf.iloc[possible_matches_index]
Spatial indexing significantly speeds up:
- Spatial joins
- Overlay operations
- Queries with geometric predicates
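The sketch below contrasts a bounding-box query with a predicate query on the spatial index. The `points` frame and `triangle` polygon are toy inputs chosen so that one point lies inside the triangle's bounding box but outside the triangle itself.

```python
import geopandas as gpd
from shapely.geometry import Point, Polygon

points = gpd.GeoDataFrame(
    geometry=[Point(1, 1), Point(3.5, 3.5), Point(10, 10)]
)
triangle = Polygon([(0, 0), (4, 0), (0, 4)])

# No predicate: candidates whose bounding boxes intersect the triangle's box
bbox_hits = points.sindex.query(triangle)

# With a predicate: candidates are then tested against the actual geometry
exact_hits = points.sindex.query(triangle, predicate="intersects")

# Both queries return integer positions, so use iloc to select rows
matches = points.iloc[exact_hits]

print(sorted(bbox_hits.tolist()), exact_hits.tolist())
```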
Distance Calculations
# Element-wise distance between geometries (rows are aligned by index)
distances = gdf1.geometry.distance(gdf2.geometry)
# Distance to single geometry
distances = gdf.geometry.distance(single_point)
# Minimum distance from the single geometry to any feature
min_dist = gdf.geometry.distance(single_point).min()
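A small worked example of the three patterns follows, with invented `sites`, `depots`, and `target` inputs. No CRS is set, so distances are in plain coordinate units.

```python
import geopandas as gpd
from shapely.geometry import Point

sites = gpd.GeoDataFrame(geometry=[Point(0, 0), Point(3, 4)])
depots = gpd.GeoDataFrame(geometry=[Point(0, 1), Point(3, 0)])
target = Point(6, 8)

# Row i of sites is compared with row i of depots (matched on index)
pairwise = sites.geometry.distance(depots.geometry)

# Every site is compared with the same single geometry
to_target = sites.geometry.distance(target)

# Smallest distance from the target to any site
closest = to_target.min()

print(pairwise.tolist(), to_target.tolist(), closest)
```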
Area and Length Calculations
For accurate measurements, ensure proper CRS:
# Reproject to a projected CRS suited to the data before measuring
gdf_projected = gdf.to_crs(gdf.estimate_utm_crs()) # Or an equal-area CRS; avoid EPSG:3857, which distorts area
# Calculate area (in CRS units, typically square meters)
areas = gdf_projected.geometry.area
# Calculate length/perimeter (in CRS units)
lengths = gdf_projected.geometry.length
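The choice of projection matters for the numbers you get. The sketch below measures a hypothetical 0.01 degree parcel near 50°N in a UTM zone picked by `estimate_utm_crs()` and, for contrast, in Web Mercator (EPSG:3857), which inflates areas away from the equator.

```python
import geopandas as gpd
from shapely.geometry import box

# Hypothetical parcel given in longitude/latitude
parcel = gpd.GeoDataFrame(
    geometry=[box(10.0, 50.0, 10.01, 50.01)], crs="EPSG:4326"
)

# Local UTM zone: low distortion for small areas
utm = parcel.to_crs(parcel.estimate_utm_crs())

# Web Mercator, shown only for comparison
mercator = parcel.to_crs(epsg=3857)

area_utm = utm.geometry.area.iloc[0]            # square meters
area_mercator = mercator.geometry.area.iloc[0]  # inflated at this latitude
perimeter_utm = utm.geometry.length.iloc[0]     # meters

print(f"UTM area: {area_utm:.0f}, Web Mercator area: {area_mercator:.0f}")
print(f"UTM perimeter: {perimeter_utm:.0f}")
```

At this latitude the Web Mercator figure comes out at more than twice the UTM figure, so a UTM zone or an equal-area projection is the safer choice for measurement.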