aperta.geo_processing

Geometric and raster-sampling utilities used across aperta.

Two broad concerns:

  1. Geometric helpers — building hectare / H3 grids, generating point or polygon geometries from coordinate data, projecting between coordinate reference systems, computing per-cell spatial neighborhoods. Distinct from geo_mapping, which is specifically about mapping data between geo units.

  2. Raster samplingsample_raster_at_points reads values from a raster file at given point coordinates (used for elevation lookups in topography.py). Requires the optional topo extra (rasterio).

aperta.geo_processing.remove_duplicate_indices(df)[source]

Remove duplicate indices from DataFrame or Series. Often useful after spatial join.

Parameters:

df (DataFrame)

Return type:

DataFrame

aperta.geo_processing.line_bearings_deg(geoms)[source]

Compass bearing (degrees, [0, 360)) at each line’s midpoint.

For each (Multi)LineString, the polyline segment containing the half-arclength point is identified, and the bearing of that segment is returned — i.e. the local direction of travel at the midpoint, not the first→last chord. For straight single-segment lines the two are identical; for curved polylines they diverge. MultiLineString parts are treated as if concatenated end-to-end in their stored order.

= +y (North), increasing clockwise (90° = East, 180° = South, 270° = West). The bearing reflects the line’s stored vertex order — the line A→B has the opposite bearing of B→A. For undirected geometries take bearings % 180.

Lines must be in a projected CRS with x = easting, y = northing (e.g. LV95 / EPSG:2056); a geographic CRS would introduce longitude-convergence error. Degenerate or empty geometries (no length, missing, or fewer than 2 vertices) return NaN.

Parameters:

geoms (GeoSeries)

Return type:

Series

aperta.geo_processing.line_segment_bearing_at(line, dist_along)[source]

Compass bearing of the polyline segment containing an arbitrary dist_along.

Same convention as line_bearings_deg but at any arclength position, not just the midpoint. Useful after snapping a point to a line, when you want the line’s local direction at the snap point (e.g. to compare a traffic counter’s bearing against the local edge bearing).

dist_along is in the line’s CRS units (meters in LV95). Clamped to [0, line.length]. MultiLineString parts are treated as concatenated end-to-end. Empty / degenerate inputs return NaN.

Parameters:

dist_along (float)

Return type:

float

aperta.geo_processing.angular_diff_deg(a, b, undirected=False)[source]

Smallest unsigned difference (in degrees) between two compass bearings.

Default (undirected=False): treats inputs as directions of travel — returns the circular distance in [0, 180]. E.g. angular_diff_deg(45, 350) is 55, not 305; opposing directions (90° vs 270°) give 180.

undirected=True: treats inputs as orientations of an undirected line — additionally folds across 180°, so A→B and B→A are equivalent. Returns [0, 90]. Use this when matching e.g. a directed traffic counter to an undirected road segment that carries traffic both ways: a counter bearing 90° aligns with edges at both 90° and 270°.

Accepts scalars or aligned numpy/pandas arrays; return type follows np.minimum (scalar in → scalar-like out).

Parameters:

undirected (bool)

aperta.geo_processing.sum_within_radius(gdf, cols, radius, *, return_densities=False, add_filled_densities=False)[source]

Same-set neighbourhood sum: for each cell in gdf, sum the values of cols over all cells (including itself) within radius.

For the cross-set case (targets × sources with two different GeoDataFrames) use [[cross_sum_within_radius]] instead.

The regular sum, divided by the circular query area when return_densities=True, gives per-km² density. The “filled” variant additionally divides by the occupied area (cells where at least one value in cols is > 0) — useful where occupancy is sparse and the circular-area normaliser dilutes the signal (e.g. employment concentrated at an airport).

The CRS of gdf must represent meters if densities are returned. Densities are returned per km².

Implementation: builds a scipy.spatial.KDTree on cell centroids and constructs a sparse 0/1 adjacency matrix from query_ball_point(radius); each cell’s own row is implicitly included (since dist(p, p) = 0 ≤ radius). Per-cell sums are then adj @ values in one vectorised pass.

Parameters:
Return type:

DataFrame

aperta.geo_processing.cross_sum_within_radius(targets, sources, radius, *, weight_column=None, return_density=False, name='aggregate')[source]

Cross-set neighbourhood sum: for each target geometry, count source geometries (or sum a weight_column over them) within radius of the target’s centroid.

For the same-set case (one gdf, queried against itself with self-inclusion, multiple value columns at once, optional “filled-area” densities) use [[sum_within_radius]] instead. The two helpers are deliberately separate: same-set and cross-set queries differ in input shape (one gdf vs two) and in optimal backend (single sparse-matrix multiply vs per-target KDTree query).

Backend: scipy cKDTree on source coordinates — O(N_src log N_src) one- time build, O(log N_src + k) per query.

Both targets and sources must share a metric CRS for densities (and the radius) to be meaningful in real-world units. The function uses the centroid of each geometry, so non-point inputs (polygons, lines) work fine — their centroid is queried.

Parameters:
  • targets (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries. Centroid of each geometry is the query point; targets.index becomes the output Series index.

  • sources (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries. Centroid of each source is the point that’s counted / summed. If weight_column is given, sources must be a GeoDataFrame carrying that column.

  • radius (float) – query radius in CRS units (typically metres).

  • weight_column (str | None) – name of the column on sources to sum. None (default) counts source geometries within radius instead.

  • return_density (bool) – if True, divide aggregates by π · radius² (the circular query area). Caller is responsible for ensuring CRS units are metric.

  • name (str) – name for the returned Series.

Returns:

pd.Series indexed by targets.index. Targets with no sources in range get 0 (or 0.0 density).

Return type:

Series

aperta.geo_processing.sample_raster_at_points(points, raster_path, *, band=1, method='bilinear', name='raster_value')[source]

Sample raster values at point centroids — vectorized, single-pass.

Reads the raster once and indexes into the array with either bilinear interpolation (default — right for continuous fields like elevation; smooths pixel-boundary quantization noise) or nearest-neighbour (for categorical rasters like land cover or zoning, where blending values is meaningless).

For ~100k+ points this is orders of magnitude faster (and far lighter on memory) than rasterio.Dataset.sample, which does a GDAL RasterIO() call per point and is the bottleneck inside osmnx.elevation.add_node_elevations_raster.

Points whose (row, col) falls outside the raster — or whose pixel equals the raster’s nodata value — get NaN. (For ‘bilinear’, any of the 4 surrounding pixels being nodata propagates to NaN.) The caller is responsible for ensuring points.crs matches the raster CRS; the function does not reproject. Polygon / line inputs are sampled at their centroid.

Parameters:
  • points (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries; sampling location is each geometry’s centroid; points.index becomes the output index.

  • raster_path – path to a single-band raster (GeoTIFF, VRT, etc.).

  • band (int) – 1-based band index.

  • method (str) – ‘bilinear’ (default) interpolates the 4 surrounding pixels weighted by fractional distance; ‘nearest’ returns the value of the pixel containing the point.

  • name (str) – name for the returned Series.

Returns:

pd.Series of floats indexed by points.index.

Return type:

Series

aperta.geo_processing.build_h3_grid(polygon, resolution, *, polygon_crs='EPSG:4326', target_crs=None, id_column='cell_id')[source]

Build a GeoDataFrame of H3 hex cells covering polygon.

One row per H3 cell whose centre lies inside polygon. Geometry is the closed hex polygon. The H3 library operates in EPSG:4326 (WGS84); the input polygon is reprojected if polygon_crs != ‘EPSG:4326’, and the output is optionally reprojected to target_crs.

For nested tiers (e.g. cells at H3 res 10, zones at res 8), call once per resolution OR derive the coarser tier by h3.cell_to_parent on the finer one (cheaper, and the nesting is exact). Pure replacement of the private _h3_cells_for_polygon / _h3_cell_to_polygon helpers seen across aperta example notebooks.

Parameters:
  • polygon – input area as a shapely Polygon or MultiPolygon.

  • resolution (int) – H3 resolution (0–15). Lower = larger cells.

  • polygon_crs (str) – CRS of the input polygon (default ‘EPSG:4326’).

  • target_crs (str | None) – optional output CRS (cells are reprojected if given).

  • id_column (str) – name for the H3 cell-ID index (default ‘cell_id’).

Returns:

GeoDataFrame indexed by H3 cell ID (string), with a geometry column. CRS is target_crs if given, else ‘EPSG:4326’.

Return type:

GeoDataFrame