# Spatial Analysis ## Attribute Joins Combine datasets based on common variables using standard pandas merge: ```python # Merge on common column result = gdf.merge(df, on='common_column') # Left join result = gdf.merge(df, on='common_column', how='left') # Important: Call merge on GeoDataFrame to preserve geometry # This works: gdf.merge(df, ...) # This doesn't: df.merge(gdf, ...) # Returns DataFrame, not GeoDataFrame ``` ## Spatial Joins Combine datasets based on spatial relationships. ### Binary Predicate Joins (sjoin) Join based on geometric predicates: ```python # Intersects (default) joined = gpd.sjoin(gdf1, gdf2, how='inner', predicate='intersects') # Available predicates joined = gpd.sjoin(gdf1, gdf2, predicate='contains') joined = gpd.sjoin(gdf1, gdf2, predicate='within') joined = gpd.sjoin(gdf1, gdf2, predicate='touches') joined = gpd.sjoin(gdf1, gdf2, predicate='crosses') joined = gpd.sjoin(gdf1, gdf2, predicate='overlaps') # Join types joined = gpd.sjoin(gdf1, gdf2, how='left') # Keep all from left joined = gpd.sjoin(gdf1, gdf2, how='right') # Keep all from right joined = gpd.sjoin(gdf1, gdf2, how='inner') # Intersection only ``` The `how` parameter determines which geometries are retained: - **left**: Retains left GeoDataFrame's index and geometry - **right**: Retains right GeoDataFrame's index and geometry - **inner**: Uses intersection of indices, keeps left geometry ### Nearest Joins (sjoin_nearest) Join to nearest features: ```python # Find nearest neighbor nearest = gpd.sjoin_nearest(gdf1, gdf2) # Add distance column nearest = gpd.sjoin_nearest(gdf1, gdf2, distance_col='distance') # Limit search radius (significantly improves performance) nearest = gpd.sjoin_nearest(gdf1, gdf2, max_distance=1000) # Find k nearest neighbors nearest = gpd.sjoin_nearest(gdf1, gdf2, k=5) ``` ## Overlay Operations Set-theoretic operations combining geometries from two GeoDataFrames: ```python # Intersection - keep areas where both overlap intersection = gpd.overlay(gdf1, gdf2, how='intersection') # Union - combine all areas union = gpd.overlay(gdf1, gdf2, how='union') # Difference - areas in first not in second difference = gpd.overlay(gdf1, gdf2, how='difference') # Symmetric difference - areas in either but not both sym_diff = gpd.overlay(gdf1, gdf2, how='symmetric_difference') # Identity - intersection + difference identity = gpd.overlay(gdf1, gdf2, how='identity') ``` Result includes attributes from both input GeoDataFrames. ## Dissolve (Aggregation) Aggregate geometries based on attribute values: ```python # Dissolve by attribute dissolved = gdf.dissolve(by='region') # Dissolve with aggregation functions dissolved = gdf.dissolve(by='region', aggfunc='sum') dissolved = gdf.dissolve(by='region', aggfunc={'population': 'sum', 'area': 'mean'}) # Dissolve all into single geometry dissolved = gdf.dissolve() # Preserve internal boundaries dissolved = gdf.dissolve(by='region', as_index=False) ``` ## Clipping Clip geometries to boundary of another geometry: ```python # Clip to polygon boundary clipped = gpd.clip(gdf, boundary_polygon) # Clip to another GeoDataFrame clipped = gpd.clip(gdf, boundary_gdf) ``` ## Appending Combine multiple GeoDataFrames: ```python import pandas as pd # Concatenate GeoDataFrames (CRS must match) combined = pd.concat([gdf1, gdf2], ignore_index=True) # With keys for identification combined = pd.concat([gdf1, gdf2], keys=['source1', 'source2']) ``` ## Spatial Indexing Improve performance for spatial operations: ```python # GeoPandas uses spatial index automatically for most operations # Access the spatial index directly sindex = gdf.sindex # Query geometries intersecting a bounding box possible_matches_index = list(sindex.intersection((xmin, ymin, xmax, ymax))) possible_matches = gdf.iloc[possible_matches_index] # Query geometries intersecting a polygon possible_matches_index = list(sindex.query(polygon_geometry)) possible_matches = gdf.iloc[possible_matches_index] ``` Spatial indexing significantly speeds up: - Spatial joins - Overlay operations - Queries with geometric predicates ## Distance Calculations ```python # Distance between geometries distances = gdf1.geometry.distance(gdf2.geometry) # Distance to single geometry distances = gdf.geometry.distance(single_point) # Minimum distance to any feature min_dist = gdf.geometry.distance(point).min() ``` ## Area and Length Calculations For accurate measurements, ensure proper CRS: ```python # Reproject to appropriate projected CRS for area/length calculations gdf_projected = gdf.to_crs(epsg=3857) # Or appropriate UTM zone # Calculate area (in CRS units, typically square meters) areas = gdf_projected.geometry.area # Calculate length/perimeter (in CRS units) lengths = gdf_projected.geometry.length ```