`aperta.geo_processing`¶

Geometric and raster-sampling utilities used across aperta.

Four broad concerns:

Geometric helpers — building hectare / H3 grids, generating point or polygon geometries from coordinate data, projecting between coordinate reference systems, computing per-cell spatial neighborhoods. Distinct from geo_mapping, which is specifically about mapping data between geo units.
Geometry math — line_bearings_deg, line_segment_bearing_at, angular_diff_deg, simplify_geometry. Operations whose inputs and outputs are geometries (or geometry-derived numerics like bearings).
Raster sampling — sample_raster_at_points reads values from a raster file at given point coordinates (used for elevation lookups). Requires the optional topo extra (rasterio).
DEM fetch — fetch_copernicus_dem downloads, mosaics, clips, and optionally reprojects Copernicus GLO-30 DEM tiles for a given polygon. Requires the optional topo extras (rasterio, requests).

For purely tabular helpers (column transforms, aggregation rules, metric discovery), see data_processing.

aperta.geo_processing.line_bearings_deg(geoms)[source]¶

Compass bearing (degrees, [0, 360)) at each line’s midpoint.

For each (Multi)LineString, the polyline segment containing the half-arclength point is identified, and the bearing of that segment is returned — i.e. the local direction of travel at the midpoint, not the first→last chord. For straight single-segment lines the two are identical; for curved polylines they diverge. MultiLineString parts are treated as if concatenated end-to-end in their stored order.

0° = +y (North), increasing clockwise (90° = East, 180° = South, 270° = West). The bearing reflects the line’s stored vertex order — the line A→B has the opposite bearing of B→A. For undirected geometries take bearings % 180.

Lines must be in a projected CRS with x = easting, y = northing (e.g. LV95 / EPSG:2056); a geographic CRS would introduce longitude-convergence error. Degenerate or empty geometries (no length, missing, or fewer than 2 vertices) return NaN.

Parameters:: geoms (GeoSeries)
Return type:: Series

aperta.geo_processing.line_segment_bearing_at(line, dist_along)[source]¶

Compass bearing of the polyline segment containing an arbitrary dist_along.

Same convention as line_bearings_deg but at any arclength position, not just the midpoint. Useful after snapping a point to a line, when you want the line’s local direction at the snap point (e.g. to compare a traffic counter’s bearing against the local edge bearing).

dist_along is in the line’s CRS units (meters in LV95). Clamped to [0, line.length]. MultiLineString parts are treated as concatenated end-to-end. Empty / degenerate inputs return NaN.

Parameters:: dist_along (float)
Return type:: float

aperta.geo_processing.angular_diff_deg(a, b, undirected=False)[source]¶

Smallest unsigned difference (in degrees) between two compass bearings.

Default (undirected=False): treats inputs as directions of travel — returns the circular distance in [0, 180]. E.g. angular_diff_deg(45, 350) is 55, not 305; opposing directions (90° vs 270°) give 180.

undirected=True: treats inputs as orientations of an undirected line — additionally folds across 180°, so A→B and B→A are equivalent. Returns [0, 90]. Use this when matching e.g. a directed traffic counter to an undirected road segment that carries traffic both ways: a counter bearing 90° aligns with edges at both 90° and 270°.

Accepts scalars or aligned numpy/pandas arrays; return type follows np.minimum (scalar in → scalar-like out).

Parameters:: undirected (bool)

aperta.geo_processing.sum_within_radius(gdf, cols, radius, *, return_densities=False, add_filled_densities=False)[source]¶

Same-set neighbourhood sum: for each cell in gdf, sum the values of cols over all cells (including itself) within radius.

For the cross-set case (targets × sources with two different GeoDataFrames) use [[cross_sum_within_radius]] instead.

The regular sum, divided by the circular query area when return_densities=True, gives per-km² density. The “filled” variant additionally divides by the occupied area (cells where at least one value in cols is > 0) — useful where occupancy is sparse and the circular-area normaliser dilutes the signal (e.g. employment concentrated at an airport).

The CRS of gdf must represent meters if densities are returned. Densities are returned per km².

Implementation: builds a scipy.spatial.KDTree on cell centroids and constructs a sparse 0/1 adjacency matrix from query_ball_point(radius); each cell’s own row is implicitly included (since dist(p, p) = 0 ≤ radius). Per-cell sums are then adj @ values in one vectorised pass.

Parameters:

gdf (GeoDataFrame)
cols (list[str])
radius (int | float)
return_densities (bool)
add_filled_densities (bool)

Return type:

DataFrame

aperta.geo_processing.count_within_radius(gdf, radius)[source]¶

Same-set neighbourhood count: for each cell in gdf, the number of cells (including itself) within radius. Useful as a denominator for neighbourhood-mean / variance computations and as a “neighbourhood crowding” feature in its own right.

Mirrors [[sum_within_radius]]’s machinery — builds a scipy.spatial.KDTree on cell centroids and queries query_ball_point(radius); the per-cell count is the length of each returned neighbour list. Each cell’s own row is always counted (since dist(p, p) = 0 ≤ radius), so the minimum returned value is 1.

Returns a pd.Series of int64 counts indexed by gdf.index, named count_r{radius}.

Parameters:

gdf (GeoDataFrame)
radius (int | float)

Return type:

Series

aperta.geo_processing.cross_sum_within_radius(targets, sources, radius, *, weight_column=None, return_density=False, name='aggregate')[source]¶

Cross-set neighbourhood sum: for each target geometry, count source geometries (or sum a weight_column over them) within radius of the target’s centroid.

For the same-set case (one gdf, queried against itself with self-inclusion, multiple value columns at once, optional “filled-area” densities) use [[sum_within_radius]] instead. The two helpers are deliberately separate: same-set and cross-set queries differ in input shape (one gdf vs two) and in optimal backend (single sparse-matrix multiply vs per-target KDTree query).

Backend: scipy cKDTree on source coordinates — O(N_src log N_src) one- time build, O(log N_src + k) per query.

Both targets and sources must share a metric CRS for densities (and the radius) to be meaningful in real-world units. The function uses the centroid of each geometry, so non-point inputs (polygons, lines) work fine — their centroid is queried.

Parameters:

targets (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries. Centroid of each geometry is the query point; targets.index becomes the output Series index.
sources (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries. Centroid of each source is the point that’s counted / summed. If weight_column is given, sources must be a GeoDataFrame carrying that column.
radius (float) – query radius in CRS units (typically metres).
weight_column (str | None) – name of the column on sources to sum. None (default) counts source geometries within radius instead.
return_density (bool) – if True, divide aggregates by π · radius² (the circular query area). Caller is responsible for ensuring CRS units are metric.
name (str) – name for the returned Series.

Returns:

pd.Series indexed by targets.index. Targets with no sources in range get 0 (or 0.0 density).

Return type:

Series

aperta.geo_processing.sample_raster_at_points(points, raster_path, *, band=1, method='bilinear', name='raster_value')[source]¶

Sample raster values at point centroids — vectorized, single-pass.

Reads the raster once and indexes into the array with either bilinear interpolation (default — right for continuous fields like elevation; smooths pixel-boundary quantization noise) or nearest-neighbour (for categorical rasters like land cover or zoning, where blending values is meaningless).

For ~100k+ points this is orders of magnitude faster (and far lighter on memory) than rasterio.Dataset.sample, which does a GDAL RasterIO() call per point and is the bottleneck inside osmnx.elevation.add_node_elevations_raster.

Points whose (row, col) falls outside the raster — or whose pixel equals the raster’s nodata value — get NaN. (For ‘bilinear’, any of the 4 surrounding pixels being nodata propagates to NaN.) The caller is responsible for ensuring points.crs matches the raster CRS; the function does not reproject. Polygon / line inputs are sampled at their centroid.

Parameters:

points (GeoDataFrame | GeoSeries) – GeoDataFrame or GeoSeries; sampling location is each geometry’s centroid; points.index becomes the output index.
raster_path – path to a single-band raster (GeoTIFF, VRT, etc.).
band (int) – 1-based band index.
method (str) – ‘bilinear’ (default) interpolates the 4 surrounding pixels weighted by fractional distance; ‘nearest’ returns the value of the pixel containing the point.
name (str) – name for the returned Series.

Returns:

pd.Series of floats indexed by points.index.

Return type:

Series

aperta.geo_processing.build_h3_grid(polygon, resolution, *, polygon_crs='EPSG:4326', target_crs=None, id_column='cell_id')[source]¶

Build a GeoDataFrame of H3 hex cells covering polygon.

One row per H3 cell whose centre lies inside polygon. Geometry is the closed hex polygon. The H3 library operates in EPSG:4326 (WGS84); the input polygon is reprojected if polygon_crs != ‘EPSG:4326’, and the output is optionally reprojected to target_crs.

For nested tiers (e.g. cells at H3 res 10, zones at res 8), call once per resolution OR derive the coarser tier by h3.cell_to_parent on the finer one (cheaper, and the nesting is exact). Pure replacement of the private _h3_cells_for_polygon / _h3_cell_to_polygon helpers seen across aperta example notebooks.

Parameters:

polygon – input area as a shapely Polygon or MultiPolygon.
resolution (int) – H3 resolution (0–15). Lower = larger cells.
polygon_crs (str) – CRS of the input polygon (default ‘EPSG:4326’).
target_crs (str | None) – optional output CRS (cells are reprojected if given).
id_column (str) – name for the H3 cell-ID index (default ‘cell_id’).

Returns:

GeoDataFrame indexed by H3 cell ID (string), with a geometry column. CRS is target_crs if given, else ‘EPSG:4326’.

Return type:

GeoDataFrame

aperta.geo_processing.get_hectare_geometries(df, crs)[source]¶

Get 100m hectare square geometries.

crs is the CRS of the (x, y) coordinates in df.index (e.g. ‘EPSG:2056’ for Swiss LV95). Library function — does not assume a particular country/CRS key.

Parameters:

df (DataFrame)
crs (str)

Return type:

GeoSeries

aperta.geo_processing.simplify_geometry(geometry, target_size, maximum_size, max_precision_m=10.0, min_precision_m=250.0)[source]¶

Simplify geometry toward target_size vertices using a per-line precision.

Returns (simplified, new_size, new_size / old_size). Precision is bounded between max_precision_m and min_precision_m. Logs a warning if the result still exceeds maximum_size.

Assumes input is in a lat/lon CRS (EPSG:4326). Uses the 0.00001-degree ≈ 1 m approximation appropriate for central-European latitudes.

Parameters:

geometry (LineString)
target_size (int)
maximum_size (int)
max_precision_m (float)
min_precision_m (float)

Return type:

tuple[LineString, int, float]

aperta.geo_processing.fetch_copernicus_dem(polygon, out_path, *, polygon_crs='EPSG:4326', target_crs=None, cache_tile_dir=None, cleanup_tiles=True, verbose=True)[source]¶

Download + mosaic + clip + reproject Copernicus GLO-30 DEM for a polygon.

Workflow:

Compute the 1° × 1° tile bounding box covering polygon in EPSG:4326.
Download each tile from AWS Open Data (skipped if already present in cache_tile_dir).
Mosaic the tiles with rasterio.merge.
Clip to polygon (in EPSG:4326).
Optionally reproject to target_crs.
Write the result as a compressed GeoTIFF to out_path.
Optionally clean up the raw tile files.

If out_path already exists, the function is a no-op and returns out_path immediately — caller is responsible for invalidating the cache (e.g., delete the file) when the underlying request changes.

Parameters:

polygon – shapely Polygon or MultiPolygon covering the area of interest.
out_path – destination GeoTIFF path. Parent directory must exist.
polygon_crs (str) – CRS of polygon (default ‘EPSG:4326’).
target_crs (str | None) – CRS of the output GeoTIFF (e.g. ‘EPSG:2056’ for Swiss LV95). None keeps EPSG:4326.
cache_tile_dir – directory for downloaded raw tiles. None defaults to the parent of out_path. Tiles already present here are not re-downloaded.
cleanup_tiles (bool) – if True (default), the raw per-tile .tif files are deleted after the mosaic / clip / reproject; set False to keep them for reuse.
verbose (bool) – print per-tile download progress + final summary.

Returns:

Path to the saved GeoTIFF (same as out_path).

Raises:

ImportError – if rasterio or requests is not installed.
requests.HTTPError – on tile-download failure.

Return type:

Path

aperta.geo_processing¶

`aperta.geo_processing`¶