aperta.overhead

Trip overheads — the first-mile and last-mile costs that aren’t on the routed path.

Aperta routes between network nodes, but real trips start and end at units (cells, buildings, etc.) that are typically NOT at network nodes, and may carry additional fixed costs at the origin or destination side (parking-find time, station-access time, etc.). The “overhead” is the extra cost between the actual unit and its assigned network node — at the origin (first mile) and at the destination (last mile).
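As a rough illustration of the decomposition above, the door-to-door cost of one trip can be written as first-mile plus routed cost plus last-mile. The helper name `door_to_door_cost`, the walking speed and all distances below are made up for illustration and are not part of aperta.

```python
def door_to_door_cost(first_mile_s, routed_s, last_mile_s):
    """Total trip cost in seconds: overheads on both sides plus the routed path."""
    return first_mile_s + routed_s + last_mile_s

WALK_SPEED_MS = 1.4  # assumed walking speed, metres per second

# Hypothetical trip: 140 m from the origin cell to its node, 600 s on the
# network, 70 m from the destination node to the destination cell.
first_mile_s = 140 / WALK_SPEED_MS
last_mile_s = 70 / WALK_SPEED_MS
total_s = door_to_door_cost(first_mile_s, 600.0, last_mile_s)
print(round(total_s, 1))
```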

Four kinds of overhead. Aperta supports four overhead categories, organised by which side (origin / destination) and what granularity (cell-specific within one node, or node-level for all cells at one node):

    1. origin × cell-specific: per-cell first-mile.

    2. origin × node-level: origin overhead at node.

    3. destination × node-level: destination overhead at node.

    4. destination × cell-specific: per-cell-tier-node aggregate.

Each category in detail:

  • (1) Origin overhead, cell → node — per-cell first-mile (cell centroid → assigned network node). Different cells at the same node have different values. Cannot be added to a TieredODPairs because TieredODPairs is keyed by node, not by cell. Applied at accessibility-computation time via the cell_overhead_column kwarg on gravity, cumulative_opportunities, etc.

  • (2) Origin overhead at node — per-node overhead independent of which cell at that node is the actual origin (e.g., parking-find time on departure). Added to a TieredODPairs of costs upfront via add_node_overheads(origin=…).

  • (3) Destination overhead at node — per-destination-node overhead, independent of geo unit (e.g., parking-find time on arrival). Added via add_node_overheads(dest_cell=…, dest_zone=…) per tier.

  • (4) Destination overhead, node → cell (aggregated) — for cell-tier destinations, the mean per-cell overhead across cells sharing the node; for zone-tier destinations, the (weighted) average across cells in the zone of the “intra-zone” access cost. Computed via aggregate_dest_overhead_per_node (cell tier), aggregate_dest_overhead_per_group_euclidean (zone tier — for road networks, where users don’t actually pass through a specific node), or aggregate_dest_overhead_per_group_routed (zone tier — for transit-style analyses where users do have to access a specific stop). Apply via add_node_overheads.

Recommended pattern: pick ONE granularity per side.

Mixing the two granularities on the same side (e.g., per-cell first-mile + per-node origin overhead) is technically supported but makes the analysis structure harder to reason about. We recommend picking ONE granularity per side:

  • Cell-granularity workflow: (1) + (4). Per-cell first-mile via cell_overhead_column; aggregated destination overhead per tier via this module. Natural for analyses where intra-node heterogeneity matters (e.g., walking accessibility with hectare cells).

  • Node-granularity workflow: (2) + (3). Per-node origin and destination overheads via add_node_overheads. Natural for analyses where the unit-of-interest IS the network node (e.g., transit-stop-to-stop accessibility).

The two patterns can be mixed when there’s a clear reason — but document the reason in your project.

Workflow (cell-granularity case):

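# Assumes `from aperta import overhead, accessibility`; `cells`, `zones`,
# `times`, `pairs`, `weights`, `c2z` and `decays` come from earlier data prep.

# 1. Per-cell first-mile (origin side, #1), typically done in data prep.
#    `dist_to_node` is the cell-centroid-to-node distance in metres.
cells['walk_overhead_s'] = dist_to_node / WALK_SPEED_MS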

# 2. Compute aggregated destination overheads (#4)
node_overhead = overhead.aggregate_dest_overhead_per_node(
    cells, 'walk_overhead_s')
zones['walk_dest_overhead_s'] = overhead.aggregate_dest_overhead_per_group_euclidean(
    cells, zones, speed=WALK_SPEED_MS,
    group_id_column='zone_id', cell_overhead_column='walk_overhead_s')

# 3. Apply destination overheads to costs
times_aug = overhead.add_node_overheads(
    times, pairs,
    dest_cell=node_overhead,
    dest_zone=zones.set_index('node_id')['walk_dest_overhead_s'],
)

# 4. Accessibility — origin first-mile still applied here via cell_overhead_column
accessibility.gravity(
    times_aug, weights, c2z, decays,
    cells=cells, node_column='node_id', cell_overhead_column='walk_overhead_s',
)
aperta.overhead.aggregate_dest_overhead_per_node(cells, cell_overhead_column, *, node_column='node_id', weight_column=None)[source]

Per-network-node destination overhead — (weighted) mean of per-cell overheads across cells sharing each node.

Use as dest_cell=… in add_node_overheads (overhead #4 at cell tier).

For cell-tier destinations: a destination network node typically represents one or more cells (any cell whose node_column value is that node). The “destination overhead” — the cost of getting from the node back to a representative cell — is approximated as the mean of those cells’ own first-mile overheads.

Parameters:
  • cells (DataFrame) – per-cell DataFrame.

  • cell_overhead_column (str) – column on cells with the per-cell overhead value (typically the first-mile cost — cell centroid → assigned network node — divided by speed if cells carries distance).

  • node_column (str) – column on cells mapping each cell to its network node.

  • weight_column (str | None) – optional column to weight the mean (e.g. ‘population’ or another size-of-cell column). None (default) = uniform.

Returns:

pd.Series indexed by network node ID, with one mean overhead per node. Nodes with no associated cells are absent from the result — callers using the result via add_node_overheads will receive 0 overhead for those nodes (the dict.get(…, 0) fallback).

Return type:

Series
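The aggregation described above can be sketched with plain pandas. This is a minimal sketch of a grouped (weighted) mean, not aperta's implementation; the helper name `mean_overhead_per_node` and the sample values are hypothetical.

```python
import pandas as pd

def mean_overhead_per_node(cells, overhead_col, node_col="node_id", weight_col=None):
    """(Weighted) mean of a per-cell column, grouped by network node."""
    if weight_col is None:
        return cells.groupby(node_col)[overhead_col].mean()
    products = cells[overhead_col] * cells[weight_col]
    sums = products.groupby(cells[node_col]).sum()
    weights = cells[weight_col].groupby(cells[node_col]).sum()
    return sums / weights

cells = pd.DataFrame({
    "node_id": [10, 10, 20],
    "walk_overhead_s": [60.0, 120.0, 30.0],
    "population": [300, 100, 50],
})
uniform = mean_overhead_per_node(cells, "walk_overhead_s")
weighted = mean_overhead_per_node(cells, "walk_overhead_s", weight_col="population")

# Node 30 has no cells, so it is absent from the result; a dict-style
# lookup with a default of 0 then yields zero overhead for it.
missing = uniform.to_dict().get(30, 0)
```

With population weights, node 10 is pulled towards the overhead of its more populous cell, which is the effect weight_column is meant to capture.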

aperta.overhead.aggregate_dest_overhead_per_group_euclidean(cells, target_groups, speed, *, group_id_column, cell_overhead_column=None, weight_column=None)[source]

Per-group destination overhead — Euclidean-distance-based, for zone-tier destinations.

Use as dest_zone=… in add_node_overheads (overhead #4 at zone tier). The same function handles any grouping level (e.g. zones or regions) via the group_id_column kwarg.

For each target group g (with polygon centroid g_centroid):

overhead(g) = (weighted) mean over cells c in g of:

(cells[c, cell_overhead_column] if cell_overhead_column else 0) + euclidean(c_centroid, g_centroid) / speed

The Euclidean variant is appropriate for road-network analyses where users don’t actually have to pass through the group’s representative node — the “geometric distance to a typical place in the group” is the more honest approximation. For transit-style analyses where users do have to access a specific stop, use aggregate_dest_overhead_per_group_routed instead.

The CRS of cells and target_groups must agree and be metric (Euclidean distance computations require it). speed is in CRS-units per time-unit (e.g. for walking with cells in metres: speed=1.4).

The cell_overhead_column typically encodes the mode-specific constant plus any feature-based overhead (e.g. β · population_density) the user has precomputed per cell. The Euclidean penalty is added on top.

Parameters:
  • cells (GeoDataFrame) – per-cell GeoDataFrame with polygon (or point) geometry. Must have group_id_column linking to target_groups.index.

  • target_groups (GeoDataFrame) – per-group GeoDataFrame with polygon (or point) geometry, indexed by group ID. Polygon centroid is used as the “representative point”.

  • speed (float) – speed in CRS-units per time-unit. Used to convert distance to time. Must be > 0.

  • group_id_column (str) – column on cells linking to target_groups.index (typically ‘zone_id’ or ‘region_id’).

  • cell_overhead_column (str | None) – optional column on cells with per-cell base overhead (constant + feature-based), added on top of the Euclidean penalty.

  • weight_column (str | None) – optional column on cells to weight the mean (e.g. ‘population’). None = uniform.

Returns:

pd.Series indexed by target_groups.index, with one mean overhead per group. Groups with no constituent cells get NaN.

Return type:

Series
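The per-group formula above can be sketched with NumPy on raw centroid coordinates. This is an illustrative sketch only: the real function takes GeoDataFrames, while the helper `euclidean_group_overhead` and its array-based arguments are assumptions made to keep the example self-contained.

```python
import numpy as np

def euclidean_group_overhead(cell_xy, cell_group, group_xy, speed,
                             cell_overhead=None, weights=None):
    """Per-group (weighted) mean of base overhead + centroid distance / speed.

    cell_xy: (n_cells, 2) cell centroids in a metric CRS.
    cell_group: (n_cells,) group ID of each cell.
    group_xy: dict {group_id: (x, y)} of group centroids.
    """
    if speed <= 0:
        raise ValueError("speed must be > 0")
    cell_xy = np.asarray(cell_xy, dtype=float)
    cell_group = np.asarray(cell_group)
    n = len(cell_xy)
    base = np.zeros(n) if cell_overhead is None else np.asarray(cell_overhead, dtype=float)
    w = np.ones(n) if weights is None else np.asarray(weights, dtype=float)
    out = {}
    for g, centroid in group_xy.items():
        mask = cell_group == g
        if not mask.any():
            out[g] = float("nan")  # group without constituent cells
            continue
        dist = np.linalg.norm(cell_xy[mask] - np.asarray(centroid, dtype=float), axis=1)
        out[g] = float(np.average(base[mask] + dist / speed, weights=w[mask]))
    return out

# Two cells 70 m either side of the centroid of group "A"; group "B" is empty.
result = euclidean_group_overhead(
    cell_xy=[(0.0, 0.0), (140.0, 0.0)],
    cell_group=["A", "A"],
    group_xy={"A": (70.0, 0.0), "B": (1000.0, 1000.0)},
    speed=1.4,
    cell_overhead=[10.0, 20.0],
)
```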

aperta.overhead.aggregate_dest_overhead_per_group_routed(cells, target_groups, graph, weight, *, group_id_column, node_column='node_id', cell_overhead_column=None, weight_column=None, cutoff=None)[source]

Per-group destination overhead via routing — for zone-tier destinations.

Use as dest_zone=… in add_node_overheads (overhead #4 at zone tier). The same function handles any grouping level (e.g. zones or regions) via the group_id_column kwarg.

For each target group g (with representative network node g_node):

overhead(g) = (weighted) mean over cells c in g of:

(cells[c, cell_overhead_column] if cell_overhead_column else 0) + route(g_node → c_node, weight)

Routing direction is g_node → c_node (single-source Dijkstra from g_node) — by symmetry on undirected graphs this equals c_node → g_node. The “egress at destination” semantic is the g_node → c_node direction; for directed graphs (one-way streets etc.), that’s the right direction.

Parameters:
  • cells (DataFrame) – per-cell DataFrame. Must have node_column (network node ID) and group_id_column (target-group ID linking to target_groups.index).

  • target_groups (DataFrame) – per-group DataFrame (e.g. zones), indexed by group ID, with node_column giving the group’s representative network node.

  • graph (Graph) – routable networkx (or osmnx) graph.

  • weight (str) – edge attribute name used for routing (e.g. ‘walk_time_s’).

  • group_id_column (str) – column on cells linking to target_groups.index (typically ‘zone_id’ or ‘region_id’).

  • node_column (str) – column name carrying the network node ID, on both cells and target_groups. Default ‘node_id’.

  • cell_overhead_column (str | None) – optional column on cells with per-cell first-mile overhead to add to the routed cost before averaging. None = routed cost only (zone-internal first-mile ignored).

  • weight_column (str | None) – optional column on cells to weight the mean (e.g. ‘population’). None = uniform.

  • cutoff (float | None) – optional routing cutoff in weight units, passed to the underlying Dijkstra call as csg.dijkstra(limit=cutoff). Cells farther than this from g_node are treated as unreachable (they contribute NaN and are filtered from the mean). Set it comfortably above the longest expected last-mile cost in weight units (for time weights, typical zone diameter ÷ slowest mode speed) to speed up routing on large graphs.

Returns:

pd.Series indexed by target_groups.index, with one mean overhead per group. Groups with no constituent cells (or with all cells unreachable from g_node) get NaN.

Return type:

Series
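The routed variant can be sketched with a small standard-library Dijkstra on a directed graph. This is a simplified illustration under stated assumptions, not the library's routing code: the adjacency-dict graph format and the helpers `dijkstra` and `routed_group_overhead` are hypothetical, and the mean is unweighted.

```python
import heapq
import math

def dijkstra(adj, source, cutoff=None):
    """Single-source shortest-path costs on a directed graph {u: {v: w}}."""
    dist = {source: 0.0}
    heap = [(0.0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, math.inf):
            continue  # stale heap entry
        for v, w in adj.get(u, {}).items():
            nd = d + w
            if cutoff is not None and nd > cutoff:
                continue  # beyond the cutoff: treated as unreachable
            if nd < dist.get(v, math.inf):
                dist[v] = nd
                heapq.heappush(heap, (nd, v))
    return dist

def routed_group_overhead(adj, g_node, cells, cutoff=None):
    """Mean over reachable cells of cell overhead + route(g_node -> c_node).

    cells: list of (c_node, cell_overhead) tuples for one group.
    """
    dist = dijkstra(adj, g_node, cutoff)
    costs = [overhead + dist[c] for c, overhead in cells if c in dist]
    return sum(costs) / len(costs) if costs else math.nan

# One-way loop g -> a -> b -> g, so direction matters.
adj = {"g": {"a": 30}, "a": {"b": 40}, "b": {"g": 10}}
cells = [("a", 5.0), ("b", 15.0)]
full = routed_group_overhead(adj, "g", cells)              # both cells reachable
near = routed_group_overhead(adj, "g", cells, cutoff=50)   # "b" is cut off
none = routed_group_overhead(adj, "g", cells, cutoff=10)   # nothing reachable
reverse_a_to_g = dijkstra(adj, "a")["g"]                   # differs from g -> a
```

On this one-way loop the g_node → c_node cost to "a" is 30 while the reverse is 50, which is why the egress direction is the one routed on directed graphs.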

aperta.overhead.add_node_overheads(costs, pairs, *, origin=None, dest_cell=None, dest_zone=None)[source]

Add per-node origin and destination overheads to a cost TieredODPairs.

Each kwarg is a per-node lookup (pd.Series indexed by node ID, or a dict[node_id -> overhead]). Nodes absent from a lookup contribute 0 overhead.

  • origin: added to every OD cost whose origin matches a key. Looked up by the origin node of each TieredODPairs entry. Applies to all tiers (cells_to_cells and cells_to_zones use cell-tier origin nodes; zones_to_zones uses zone-tier origin nodes).

  • dest_cell: added to cells_to_cells OD costs, looked up by destination cell-tier node. Use for overhead #3 (cell-tier dest, at-node) and / or #4 (cell-tier dest, aggregated — from aggregate_dest_overhead_per_node).

  • dest_zone: added to BOTH cells_to_zones and zones_to_zones OD costs, looked up by destination zone-tier node. Use for overhead #3 / #4 at zone tier (both middle and far tier have zone destinations).

Any kwarg can be None (no overhead applied at that side / tier). The returned TieredODPairs is a new object — the input is not mutated.

Note on origin overhead: the same Series is looked up at all tiers, but the origin nodes themselves differ by tier (cell-tier and middle-tier origins are cell-nodes; far-tier origins are zone-nodes). To apply a single per-cell-node origin overhead to all tiers, you would need to combine it with cell_overhead_column at accessibility time instead — see this module’s docstring on the per-cell-vs-per-node granularity choice.

Parameters:
  • costs (TieredODPairs) – TieredODPairs of routed costs.

  • pairs (TieredODPairs) – TieredODPairs of destination IDs (typically from od_pairs.get_pairs), position-aligned with costs.

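  • origin (Series | dict | None) – per-node origin overhead lookup, applied at all tiers.

  • dest_cell (Series | dict | None) – per-node destination overhead lookup for cells_to_cells costs.

  • dest_zone (Series | dict | None) – per-node destination overhead lookup for cells_to_zones and zones_to_zones costs.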

Returns:

New TieredODPairs of cost arrays with the requested overheads added. Tiers that are None in costs pass through as None.

Return type:

TieredODPairs
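The lookup-and-add behaviour described above can be sketched for a single tier with NumPy. The internal layout of TieredODPairs is not documented here, so the sketch assumes one 2-D cost array per tier, a position-aligned array of destination node IDs, and a 1-D array of origin node IDs; the helper `add_overheads_one_tier` is hypothetical.

```python
import numpy as np

def add_overheads_one_tier(costs, dest_nodes, origin_nodes, origin=None, dest=None):
    """Return a new cost array with per-node overheads added; inputs untouched."""
    out = np.array(costs, dtype=float)  # np.array copies, so `costs` is not mutated
    if origin is not None:
        per_origin = np.array([origin.get(n, 0.0)
                               for n in np.asarray(origin_nodes).tolist()])
        out += per_origin[:, None]  # same origin overhead along each row
    if dest is not None:
        per_dest = np.array([[dest.get(n, 0.0) for n in row]
                             for row in np.asarray(dest_nodes).tolist()])
        out += per_dest  # looked up per OD entry by destination node
    return out

costs = np.array([[100.0, 200.0], [150.0, 250.0]])   # rows: origins, cols: k dests
dest_nodes = np.array([[20, 30], [30, 40]])          # position-aligned with costs
origin_nodes = np.array([1, 2])

out = add_overheads_one_tier(
    costs, dest_nodes, origin_nodes,
    origin={1: 60.0},            # node 2 is absent, so it contributes 0
    dest={20: 10.0, 30: 5.0},    # node 40 is absent, so it contributes 0
)
```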

aperta.overhead.add_geo_overheads(costs, pairs, *, origin_cell=None, origin_zone=None, dest_cell=None, dest_zone=None, cell_to_zone=None, zone_aggregator='mean')[source]

Add per-geo-unit origin and destination overheads to a geo-keyed cost ODM.

Geo-keyed twin of add_node_overheads. Four independent overhead lookups, one per (side × tier-granularity) combination. Each kwarg is a per-unit lookup (pd.Series indexed by unit ID or dict[unit_id -> value]); units absent from a lookup contribute 0 overhead.

Origin (looked up by origin unit ID at each tier):

  • origin_cell: per-cell-id overhead, added to every cells_to_cells AND cells_to_zones OD cost (both tiers have cell-id origins). Use for per-cell first-mile (e.g. cell-centroid → assigned network node, mode-specific). Mode-specific origin overhead baked here propagates correctly through aggregate_across_modes.

  • origin_zone: per-zone-id overhead, added to every zones_to_zones OD cost. If origin_cell is given and origin_zone is not, the zone-tier version is auto-derived from origin_cell + cell_to_zone using zone_aggregator (default ‘mean’).

Destination (looked up by dest unit ID at each tier):

  • dest_cell: per-cell-id overhead, added to every cells_to_cells destination. Use for per-cell last-mile.

  • dest_zone: per-zone-id overhead, added to every cells_to_zones AND zones_to_zones destination (both tiers have zone-id dests). If dest_cell is given and dest_zone is not, the zone-tier version is auto-derived from dest_cell + cell_to_zone using zone_aggregator.

Tiers not present in costs pass through as None. The input is not mutated; a new TieredODGeoPairs is returned.

Why the auto-derivation matters. Leaving zone-tier overhead absent when cell-tier overhead is set produces silently-wrong accessibility: zones_to_zones OD pairs end up with zero overhead while cells_to_cells pairs carry the full 2× cell overhead. In nearest-k metrics this makes the z2z tier appear artificially cheap, and z2z routes (which use origin-zone rep-nodes shared by all cells in a zone) produce visible origin-zone outlines in the output. Auto-derivation closes this footgun by default.

Parameters:
  • costs (TieredODGeoPairs) – geo-keyed cost ODM (typically from reindex_by_geo_unit).

  • pairs (TieredODGeoPairs) – matching geo-keyed pairs (for tier structure + dest lookups).

  • origin_cell (Series | dict | None) – see above.

  • origin_zone (Series | dict | None) – see above.

  • dest_cell (Series | dict | None) – see above.

  • dest_zone (Series | dict | None) – see above.

  • cell_to_zone (Series | dict | None) – cell-id → zone-id map (pd.Series, dict, or any cell-indexed series with zone values). Required when a cell-tier overhead is given but the corresponding zone-tier overhead is absent AND a coarser tier exists in costs. Cells absent from the map raise ValueError.

  • zone_aggregator (str | Callable) – how to collapse per-cell overheads into per-zone scalars during auto-derivation. Any pandas groupby-compatible string (‘mean’, ‘median’, etc.) or callable. Default ‘mean’ (unweighted mean over cells in each zone).

Raises:

ValueError – if a cell-tier overhead is given but the corresponding zone-tier is needed for a present tier and cell_to_zone is missing; or if cell_to_zone lacks zones referenced by costs; or if any cell in a cell-tier overhead is missing from cell_to_zone.

Return type:

TieredODGeoPairs
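The auto-derivation of a zone-tier overhead from a cell-tier one can be sketched as a pandas groupby. This is a minimal sketch of the idea, assuming a plain cell-id → zone-id mapping; the helper `derive_zone_overhead` is hypothetical and not the library's code.

```python
import pandas as pd

def derive_zone_overhead(cell_overhead, cell_to_zone, zone_aggregator="mean"):
    """Collapse a per-cell overhead lookup to one value per zone."""
    cell_overhead = pd.Series(cell_overhead)
    cell_to_zone = pd.Series(cell_to_zone)
    missing = cell_overhead.index.difference(cell_to_zone.index)
    if len(missing) > 0:
        raise ValueError(f"cells missing from cell_to_zone: {list(missing)}")
    zones = cell_to_zone.loc[cell_overhead.index]
    return cell_overhead.groupby(zones).agg(zone_aggregator)

cell_overhead = {"c1": 40.0, "c2": 80.0, "c3": 30.0}
cell_to_zone = {"c1": "Z1", "c2": "Z1", "c3": "Z2"}

zone_mean = derive_zone_overhead(cell_overhead, cell_to_zone)
zone_max = derive_zone_overhead(cell_overhead, cell_to_zone, zone_aggregator="max")
```

Without this step the zone tier would carry zero overhead while the cell tier carries the full value, which is the mismatch the auto-derivation is designed to avoid.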

aperta.overhead.add_origin_cell_overhead(costs, pairs, cells, overhead_column, *, zone_id_column='zone_id', zone_aggregator='mean')[source]

Bake per-cell origin overhead into a geo-keyed cost ODM at all tiers.

Convenience wrapper around add_geo_overheads. The per-cell first-mile overhead is added directly at the cell tier; at the zone tier (where origins are zones, not cells), the per-zone aggregate of per-cell overheads is added — cells in the same zone share their zone-tier OD pair, so collapsing to a per-zone scalar is the natural granularity.

Why this matters for cross-modal accessibility: the per-cell first-mile is mode-specific (dist_to_node / WALK_SPEED_MS vs dist_to_node / CAR_SPEED_MS). Baking it into the per-mode cost ODM before aggregate_across_modes lets the cross-modal logsum see the right per-mode disutility, instead of conflating origin time across modes.

Parameters:
  • costs (TieredODGeoPairs) – geo-keyed cost ODM (typically from reindex_by_geo_unit).

  • pairs (TieredODGeoPairs) – matching geo-keyed pairs (for tier structure; not used for dest lookups here since this function only touches origins).

  • cells (DataFrame) – cell-level DataFrame indexed by cell_id. Must have overhead_column and (when zone-tier is populated) zone_id_column.

  • overhead_column (str) – per-cell overhead column on cells.

  • zone_id_column (str) – column on cells mapping each cell to its zone. Required iff costs.zones_to_zones is populated (only the far tier uses zone-level origin overhead — the cells_to_cells and cells_to_zones tiers both have cell-id origins and consume origin_cell directly).

  • zone_aggregator (str) – pandas-compatible string aggregator (default ‘mean’), applied to per-cell overhead values within each zone.

Returns:

New TieredODGeoPairs with the overhead added. costs is not mutated.

Return type:

TieredODGeoPairs
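To make the mode-specific point concrete, the same snap distance gives very different first-mile times per mode. The speeds and distances below are assumed values for illustration only.

```python
import pandas as pd

WALK_SPEED_MS = 1.4  # assumed walking speed, metres per second
CAR_SPEED_MS = 8.0   # assumed urban driving speed, metres per second

cells = pd.DataFrame({"dist_to_node": [70.0, 280.0]}, index=["c1", "c2"])

# One per-cell overhead Series per mode, each in seconds.
first_mile = {
    "walk": cells["dist_to_node"] / WALK_SPEED_MS,
    "car": cells["dist_to_node"] / CAR_SPEED_MS,
}
```

Each Series would be baked into its own per-mode cost ODM, so a later cross-modal aggregation sees the walk and car origin times separately.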

aperta.overhead.linear_per_cell_overhead(cells, constant, feature_coefficients)[source]

Canonical aperta linear per-cell trip overhead:

overhead(cell) = constant + Σ coef_i × cells[col_i]

One side (origin or destination) of the per-cell trip overhead. The classic decomposition is a constant (“door-to-curb” time), a snap-distance term (coef_i = seconds-per-metre, col_i = ‘snap_dist’), and a density term (coef_i = seconds-per-density-unit, col_i = ‘density_norm’), but any per-cell numeric column works; the formula is mode- and feature-agnostic.

NaN handling: missing values in a feature column are treated as 0 (the assumption is that “data not available” doesn’t add overhead). If a different convention is needed, pre-process cells before calling.

Returns a per-cell pd.Series indexed like cells, ready to pass as origin_cell= or dest_cell= to overhead.add_geo_overheads.

Parameters:
  • cells (DataFrame) – per-cell DataFrame; must contain every column named in feature_coefficients.

  • constant (float) – side constant (e.g. seconds of “door-to-curb” time).

  • feature_coefficients (dict[str, float]) – {column_name -> coefficient}. Empty dict is allowed (returns a constant Series).

Returns:

pd.Series of floats indexed by cells.index.

Return type:

Series
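The linear formula and its NaN convention can be sketched in a few lines of pandas. This is an illustrative reimplementation under the rules stated above, not the library source; the helper name `linear_overhead` and the coefficient values are hypothetical.

```python
import pandas as pd

def linear_overhead(cells, constant, feature_coefficients):
    """constant + sum(coef * column), with NaN feature values counted as 0."""
    total = pd.Series(float(constant), index=cells.index)
    for column, coef in feature_coefficients.items():
        total = total + coef * cells[column].fillna(0.0)
    return total

cells = pd.DataFrame(
    {"snap_dist": [100.0, float("nan")], "density_norm": [0.5, 1.0]},
    index=["c1", "c2"],
)

# 30 s constant, 0.5 s per metre of snap distance, 20 s per density unit.
out = linear_overhead(cells, 30.0, {"snap_dist": 0.5, "density_norm": 20.0})
constant_only = linear_overhead(cells, 30.0, {})
```

Cell c2 has no snap distance recorded, so only the constant and the density term contribute to its overhead.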

See also: add_geo_overheads for applying the result to a cost ODM, and aggregate_dest_overhead_per_group_euclidean for aggregating destination-side overheads to zones.