`aperta.overhead`

Trip overheads — the first-mile and last-mile costs that aren’t on the routed path.

Aperta routes between network nodes, but real trips start and end at units (cells, buildings, etc.) that are typically NOT at network nodes, and may carry additional fixed costs at the origin or destination side (parking-find time, station-access time, etc.). The “overhead” is the extra cost between the actual unit and its assigned network node — at the origin (first mile) and at the destination (last mile).

Overheads apply on two sides: origin (first mile, added to every OD leaving that unit) and destination (last mile, added to every OD arriving at that unit). They are applied to a geo-keyed cost TieredODGeoPairs via add_geo_overheads, which auto-derives the zone-tier overhead from the cell-tier one when only the cell side is passed (see that function's docstring for the footgun this protects against).
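
Put together, the cost stored for an OD pair of units is the origin unit's overhead, plus the routed cost between the two assigned nodes, plus the destination unit's overhead. A minimal arithmetic sketch with plain dicts; `trip_cost` and its arguments are hypothetical names for illustration, not part of aperta.

```python
# Hypothetical illustration of first-mile + routed + last-mile composition.
# None of these names are part of aperta's API.

def trip_cost(origin_unit, dest_unit, unit_to_node, routed,
              origin_overhead, dest_overhead):
    """Total door-to-door cost for one OD pair of units.

    unit_to_node:    unit ID -> assigned network node
    routed:          (origin_node, dest_node) -> network cost between nodes
    origin_overhead: unit ID -> first-mile overhead (missing units cost 0)
    dest_overhead:   unit ID -> last-mile overhead (missing units cost 0)
    """
    o_node = unit_to_node[origin_unit]
    d_node = unit_to_node[dest_unit]
    return (origin_overhead.get(origin_unit, 0.0)   # first mile
            + routed[(o_node, d_node)]              # on-network path
            + dest_overhead.get(dest_unit, 0.0))    # last mile


unit_to_node = {"cell_a": 1, "cell_b": 2}
routed = {(1, 2): 300.0}                    # seconds between nodes 1 and 2
walk_s = {"cell_a": 60.0, "cell_b": 45.0}   # seconds, cell centroid <-> node

print(trip_cost("cell_a", "cell_b", unit_to_node, routed, walk_s, walk_s))  # 405.0
```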

Destination overhead is a modelling choice, most of all at the zone tier. Three canonical aggregations are provided:

aggregate_dest_overhead_per_node: mean per-cell overhead per snap node (cell tier).
aggregate_dest_overhead_per_group(distance='euclidean'): for road-network analyses, where "cost to a typical place in the zone" is best modelled as Euclidean centroid distance ÷ speed.
aggregate_dest_overhead_per_group(distance='routed'): for transit-style analyses, where users have to reach a specific stop node, so the last mile is a routed Dijkstra distance rather than a Euclidean one.

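Workflow (canonical geo-keyed pattern):

```python
from aperta import accessibility, overhead  # aperta.accessibility assumed from usage below

# 1. Per-cell first-mile (origin side), typically done in data prep.
#    dist_to_node (cell centroid -> assigned node) and WALK_SPEED_MS are user-supplied.
cells['walk_overhead_s'] = dist_to_node / WALK_SPEED_MS

# 2. Compute aggregated destination overheads.
#    Per-node aggregate (cell tier); not consumed by add_geo_overheads below.
node_overhead = overhead.aggregate_dest_overhead_per_node(
    cells, 'walk_overhead_s')
#    Per-zone aggregate: Euclidean mode needs cells as a GeoDataFrame in a
#    metric CRS and zones indexed by zone ID.
zones['walk_dest_overhead_s'] = overhead.aggregate_dest_overhead_per_group(
    cells, zones, distance='euclidean', speed=WALK_SPEED_MS,
    group_id_column='zone_id', cell_overhead_column='walk_overhead_s')

# 3. Bake overheads into the geo-keyed cost ODM. origin_zone is omitted, so it
#    is auto-derived from origin_cell + cell_to_zone.
times_geo = overhead.add_geo_overheads(
    times_geo, pairs_geo,
    origin_cell=cells['walk_overhead_s'],
    dest_cell=cells['walk_overhead_s'],
    dest_zone=zones['walk_dest_overhead_s'],
    cell_to_zone=cells['zone_id'],
)

# 4. Accessibility reads the pre-baked ODM directly (weights, decays user-supplied).
accessibility.gravity(times_geo, weights, cells['zone_id'], decays)
```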

aperta.overhead.aggregate_dest_overhead_per_node(cells, cell_overhead_column, *, node_column='node_id', weight_column=None)

Per-network-node destination overhead — (weighted) mean of per-cell overheads across cells sharing each node.

For cell-tier destinations: a destination network node typically represents one or more cells (any cell whose node_column value is that node). The “destination overhead” — the cost of getting from the node back to a representative cell — is approximated as the mean of those cells’ own first-mile overheads.

Parameters:

cells (DataFrame) – per-cell DataFrame.
cell_overhead_column (str) – column on cells with the per-cell overhead value (typically the first-mile cost, cell centroid → assigned network node, expressed as time; divide by speed first if cells carries a distance).
node_column (str) – column on cells mapping each cell to its network node.
weight_column (str | None) – optional column to weight the mean (e.g. ‘population’ or another size-of-cell column). None (default) = uniform.

Returns:

pd.Series indexed by network node ID, with one mean overhead per node. Nodes with no associated cells are absent from the result.

Return type:

Series
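
The aggregation described above can be sketched in plain Python. This is an illustrative re-implementation, assuming the overheads, node assignments and optional weights are given as dicts keyed by cell ID; `dest_overhead_per_node` is a hypothetical name, not the library function.

```python
from collections import defaultdict
import math


def dest_overhead_per_node(cell_overhead, cell_node, cell_weight=None):
    """(Weighted) mean of per-cell overheads for each network node.

    cell_overhead: cell ID -> overhead (e.g. seconds)
    cell_node:     cell ID -> assigned network node
    cell_weight:   optional cell ID -> weight; None means uniform
    Nodes that no cell maps to never appear in the result.
    """
    num = defaultdict(float)  # sum of weight * overhead per node
    den = defaultdict(float)  # sum of weights per node
    for cell, value in cell_overhead.items():
        w = 1.0 if cell_weight is None else cell_weight[cell]
        node = cell_node[cell]
        num[node] += w * value
        den[node] += w
    return {node: (num[node] / den[node] if den[node] else math.nan)
            for node in num}


overhead_s = {"c1": 60.0, "c2": 120.0, "c3": 30.0}
node_of = {"c1": 10, "c2": 10, "c3": 11}
population = {"c1": 100, "c2": 300, "c3": 50}

print(dest_overhead_per_node(overhead_s, node_of))              # {10: 90.0, 11: 30.0}
print(dest_overhead_per_node(overhead_s, node_of, population))  # {10: 105.0, 11: 30.0}
```

With a pandas DataFrame, the unweighted case corresponds roughly to `cells.groupby(node_column)[cell_overhead_column].mean()`.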

aperta.overhead.aggregate_dest_overhead_per_group(cells, target_groups, *, distance, group_id_column, cell_overhead_column=None, weight_column=None, speed=None, graph=None, weight=None, node_column='node_id', cutoff=None)

Per-group destination overhead — for zone-tier destinations.

Pair with add_geo_overheads(dest_zone=…), i.e. the zone-tier destination overhead (the fourth of its four overhead lookups). The same function shape handles any grouping via the group_id_column kwarg.

For each target group g (with representative point / network node):

overhead(g) = (weighted) mean over cells c in g of:
(cells[c, cell_overhead_column] if cell_overhead_column else 0) + last_mile_distance(c, g)

The last_mile_distance term depends on the distance mode:

‘euclidean’: euclidean(c_centroid, g_centroid) / speed. Appropriate for road-network analyses where users don’t actually have to pass through the group’s representative node, so the “geometric distance to a typical place in the group” is the more honest approximation. Requires speed (CRS units per time unit), and cells and target_groups must share the same metric CRS.
‘routed’: route(g_node → c_node, weight). Appropriate for transit-style analyses where users have to access a specific stop node. Direction is g_node → c_node (single-source Dijkstra from g_node). On undirected graphs this equals c_node → g_node by symmetry; on directed graphs (one-way streets etc.), g_node → c_node is the direction that matches egress at the destination. Requires graph, weight, and node_column on both cells and target_groups.
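
The ‘euclidean’ mode reduces to a (weighted) mean of base overhead plus straight-line distance over speed. A minimal sketch, assuming cells are given as a dict of (group ID, centroid coordinates, base overhead) tuples and representative points as a dict of coordinates; `euclidean_group_overhead` is a hypothetical helper, not part of aperta, and it skips any GeoDataFrame or CRS handling.

```python
import math


def euclidean_group_overhead(cells, group_centroids, speed, cell_weight=None):
    """Per-group destination overhead under the 'euclidean' last-mile model.

    cells:           cell ID -> (group ID, (x, y) cell centroid, base overhead)
    group_centroids: group ID -> (x, y) representative point of the group
    speed:           CRS units per time unit (coordinates in a metric CRS)
    cell_weight:     optional cell ID -> weight; None means uniform
    """
    if speed <= 0:
        raise ValueError("speed must be > 0")
    num = {g: 0.0 for g in group_centroids}
    den = {g: 0.0 for g in group_centroids}
    for cell, (group, xy, base) in cells.items():
        w = 1.0 if cell_weight is None else cell_weight[cell]
        last_mile = math.dist(xy, group_centroids[group]) / speed  # time units
        num[group] += w * (base + last_mile)
        den[group] += w
    # Groups without any constituent cell get NaN.
    return {g: (num[g] / den[g] if den[g] else math.nan) for g in group_centroids}


cells = {
    "c1": ("z1", (300.0, 400.0), 10.0),  # 500 m from the z1 centroid
    "c2": ("z1", (0.0, 100.0), 20.0),    # 100 m from the z1 centroid
}
centroids = {"z1": (0.0, 0.0), "z2": (1000.0, 0.0)}  # z2 has no cells

print(euclidean_group_overhead(cells, centroids, speed=1.25))
# {'z1': 255.0, 'z2': nan}
```

Here c1 contributes 10 + 500 / 1.25 = 410 and c2 contributes 20 + 100 / 1.25 = 100, so z1 averages to 255.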

The cell_overhead_column typically encodes the mode-specific constant plus any feature-based overhead (e.g. β · population_density) the user has precomputed per cell. The last-mile distance is added on top.

Parameters:

cells (DataFrame | GeoDataFrame) – per-cell DataFrame (routed mode) or GeoDataFrame with polygon / point geometry (Euclidean mode). Must have group_id_column linking to target_groups.index. In routed mode must also have node_column (network node ID).
target_groups (DataFrame | GeoDataFrame) – per-group DataFrame / GeoDataFrame, indexed by group ID. In Euclidean mode the polygon centroid is the “representative point”; in routed mode node_column gives the group’s representative network node.
distance (str) – ‘euclidean’ or ‘routed’. Picks which last-mile distance model applies and which mode-specific kwargs are consulted.
group_id_column (str) – column on cells linking to target_groups.index (typically ‘zone_id’ or ‘region_id’).
cell_overhead_column (str | None) – optional column on cells with per-cell base overhead (constant + feature-based), added on top of the last-mile distance.
weight_column (str | None) – optional column on cells to weight the mean (e.g. ‘population’). None = uniform.
speed (float | None) – ‘euclidean’ mode only — speed in CRS-units per time-unit, used to convert distance to time. Must be > 0.
graph (Graph | None) – ‘routed’ mode only — routable networkx (or osmnx) graph.
weight (str | None) – ‘routed’ mode only — edge attribute name used for routing (e.g. ‘walk_time_s’).
node_column (str) – ‘routed’ mode only — column name carrying the network node ID, on both cells and target_groups. Default ‘node_id’.
cutoff (float | None) – ‘routed’ mode only: optional search limit in weight units, passed as csg.dijkstra(limit=cutoff) (scipy.sparse.csgraph.dijkstra). Cells farther than this from g_node are treated as unreachable (they contribute NaN and are filtered from the mean). Setting it speeds up routing on large graphs; choose a value comfortably above the longest expected last mile in weight units (typical zone diameter ÷ slowest mode speed) so that no reachable cell is dropped.

Returns:

pd.Series indexed by target_groups.index, with one mean overhead per group. Groups with no constituent cells (or, in routed mode, with all cells unreachable from g_node) get NaN.

Return type:

Series
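
The ‘routed’ mode and its cutoff can be sketched as a plain single-source Dijkstra from each group's representative node, which also shows why direction matters on one-way edges. Assumptions: the graph is a dict adjacency list of directed (neighbour, cost) edges rather than a networkx graph, weighting is uniform, and `dijkstra_from` / `routed_group_overhead` are hypothetical helpers, not aperta functions.

```python
import heapq
import math


def dijkstra_from(adj, source, cutoff=math.inf):
    """Single-source shortest-path costs on a directed graph.

    adj: node -> list of (neighbour, edge_cost); edges are one-way.
    Nodes farther than cutoff are left out (treated as unreachable).
    """
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, cost in adj.get(u, ()):
            nd = d + cost
            if nd <= cutoff and nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist


def routed_group_overhead(cells, group_nodes, adj, cutoff=math.inf):
    """cells: cell ID -> (group ID, cell node, base overhead)
    group_nodes: group ID -> representative node (g_node)"""
    result = {}
    for g, g_node in group_nodes.items():
        egress = dijkstra_from(adj, g_node, cutoff)  # costs g_node -> c_node
        values = [base + egress[c_node]
                  for group, c_node, base in cells.values()
                  if group == g and c_node in egress]  # unreachable cells dropped
        result[g] = sum(values) / len(values) if values else math.nan
    return result


adj = {
    "S": [("A", 60.0)],  # stop S -> A
    "A": [("B", 40.0)],  # A -> B
    "B": [("S", 10.0)],  # one-way back to S; not usable for egress from S
}
cells = {
    "c1": ("z1", "S", 0.0),
    "c2": ("z1", "A", 5.0),
    "c3": ("z1", "B", 0.0),
}
groups = {"z1": "S", "z2": "B"}  # z2 has no cells

print(routed_group_overhead(cells, groups, adj))               # {'z1': 55.0, 'z2': nan}
print(routed_group_overhead(cells, groups, adj, cutoff=70.0))  # {'z1': 32.5, 'z2': nan}
```

Although B reaches S in 10, egress from S to B costs 100, so c3 contributes 100; with cutoff=70 it becomes unreachable and is filtered from the mean.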

aperta.overhead.add_geo_overheads(costs, pairs, *, origin_cell=None, origin_zone=None, dest_cell=None, dest_zone=None, cell_to_zone=None, zone_aggregator='mean')

Add per-geo-unit origin and destination overheads to a geo-keyed cost ODM.

Four independent overhead lookups, one per (side × tier-granularity) combination. Each kwarg is a per-unit lookup (pd.Series indexed by unit ID or dict[unit_id -> value]); units absent from a lookup contribute 0 overhead.

Origin (looked up by origin unit ID at each tier):

origin_cell: per-cell-id overhead, added to every cells_to_cells AND cells_to_zones OD cost (both tiers have cell-id origins). Use for per-cell first-mile (e.g. cell-centroid → assigned network node, mode-specific). Mode-specific origin overhead baked here propagates correctly through aggregate_across_modes.
origin_zone: per-zone-id overhead, added to every zones_to_zones OD cost. If origin_cell is given and origin_zone is not, the zone-tier version is auto-derived from origin_cell + cell_to_zone using zone_aggregator (default ‘mean’).

Destination (looked up by dest unit ID at each tier):

dest_cell: per-cell-id overhead, added to every cells_to_cells destination. Use for per-cell last-mile.
dest_zone: per-zone-id overhead, added to every cells_to_zones AND zones_to_zones destination (both tiers have zone-id dests). If dest_cell is given and dest_zone is not, the zone-tier version is auto-derived from dest_cell + cell_to_zone using zone_aggregator.
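
The lookup rules above amount to picking one origin table and one destination table per tier, with 0 for anything missing. A sketch, assuming lookups are dicts (a pd.Series also provides `.get`, so it would work here too); `od_overhead` and `TIER_TABLES` are hypothetical names, not aperta's implementation.

```python
# tier -> (origin lookup kwarg, destination lookup kwarg)
TIER_TABLES = {
    "cells_to_cells": ("origin_cell", "dest_cell"),
    "cells_to_zones": ("origin_cell", "dest_zone"),
    "zones_to_zones": ("origin_zone", "dest_zone"),
}


def od_overhead(tier, origin_id, dest_id, **lookups):
    """Overhead added to one OD cost on the given tier.

    lookups: any of origin_cell, origin_zone, dest_cell, dest_zone, each a
    mapping unit ID -> overhead. Absent lookups and absent IDs contribute 0.
    """
    origin_key, dest_key = TIER_TABLES[tier]
    origin_table = lookups.get(origin_key)
    dest_table = lookups.get(dest_key)
    o = 0.0 if origin_table is None else origin_table.get(origin_id, 0.0)
    d = 0.0 if dest_table is None else dest_table.get(dest_id, 0.0)
    return o + d


kw = dict(origin_cell={"c1": 60.0}, dest_cell={"c2": 45.0},
          origin_zone={"z1": 50.0}, dest_zone={"z9": 120.0})
print(od_overhead("cells_to_cells", "c1", "c2", **kw))  # 105.0
print(od_overhead("cells_to_zones", "c1", "z9", **kw))  # 180.0
print(od_overhead("zones_to_zones", "z1", "z9", **kw))  # 170.0
print(od_overhead("cells_to_cells", "c1", "c3", **kw))  # 60.0 (no dest_cell entry for c3)
```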

Tiers not present in costs pass through as None. The input is not mutated; a new TieredODGeoPairs is returned.

Why the auto-derivation matters. If a cell-tier overhead is set but the zone-tier overhead is left absent, accessibility is silently wrong: zones_to_zones (z2z) OD pairs end up with zero overhead while cells_to_cells pairs carry the full 2× cell overhead (origin plus destination). In nearest-k metrics this makes the z2z tier appear artificially cheap, and z2z routes (which use origin-zone representative nodes shared by all cells in a zone) produce visible origin-zone outlines in the output. Auto-derivation closes this footgun by default.
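
The auto-derivation is essentially a group-by over cell_to_zone. A minimal sketch with dicts, mirroring the documented ValueError for cells missing from the map; `derive_zone_overhead` is a hypothetical helper, and in pandas terms the same step is roughly `pd.Series(cell_overhead).groupby(cell_to_zone).agg(zone_aggregator)`.

```python
import statistics


def derive_zone_overhead(cell_overhead, cell_to_zone, aggregator=statistics.fmean):
    """Collapse per-cell overheads into one value per zone.

    cell_overhead: cell ID -> overhead
    cell_to_zone:  cell ID -> zone ID
    aggregator:    callable over a list of values (default: unweighted mean)
    """
    missing = [c for c in cell_overhead if c not in cell_to_zone]
    if missing:
        raise ValueError(f"cells missing from cell_to_zone: {missing}")
    by_zone = {}
    for cell, value in cell_overhead.items():
        by_zone.setdefault(cell_to_zone[cell], []).append(value)
    return {zone: aggregator(values) for zone, values in by_zone.items()}


cell_overhead = {"c1": 60.0, "c2": 120.0, "c3": 30.0}
cell_to_zone = {"c1": "z1", "c2": "z1", "c3": "z2"}

print(derive_zone_overhead(cell_overhead, cell_to_zone))       # {'z1': 90.0, 'z2': 30.0}
print(derive_zone_overhead(cell_overhead, cell_to_zone, max))  # {'z1': 120.0, 'z2': 30.0}
```

Without this step a zones_to_zones pair leaving z1 would carry 0 origin overhead, while every cells_to_cells pair leaving c1 carries 60.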

Parameters:

costs (TieredODGeoPairs) – geo-keyed cost ODM (typically from reindex_by_geo_unit).
pairs (TieredODGeoPairs) – matching geo-keyed pairs (for tier structure + dest lookups).
origin_cell (Series | dict | None) – see above.
origin_zone (Series | dict | None) – see above.
dest_cell (Series | dict | None) – see above.
dest_zone (Series | dict | None) – see above.
cell_to_zone (Series | dict | None) – cell-id → zone-id map (pd.Series, dict, or any cell-indexed series with zone values). Required when a cell-tier overhead is given but the corresponding zone-tier overhead is absent AND a coarser tier exists in costs. Cells absent from the map raise ValueError.
zone_aggregator (str | Callable) – how to collapse per-cell overheads into per-zone scalars during auto-derivation. Any pandas groupby-compatible string (‘mean’, ‘median’, etc.) or callable. Default ‘mean’ (unweighted mean over cells in each zone).

Raises:

ValueError – if a cell-tier overhead is given, the corresponding zone-tier overhead has to be derived for a tier present in costs, and cell_to_zone is missing; if cell_to_zone lacks zones referenced by costs; or if any cell in a cell-tier overhead is missing from cell_to_zone.

Return type:

TieredODGeoPairs
