aperta.routing

Routing primitives backed by scipy.sparse.csgraph.dijkstra. Input graphs are networkx.Graph (or its multi/directed variants); aperta converts to a scipy CSR matrix internally per call. No backend choice — scipy throughout.

The library exposes two concerns separately:

  1. Edge weightingapply_edge_weights runs a user-supplied callable on each edge of a graph and writes the result to a named edge attribute. Mode-specific behavior (car vs. bike vs. walking, peak vs. off-peak, density-adjusted speeds, intersection penalties, etc.) lives in these callables, not in this module. combine_edge_weights sums multiple per-edge components into a single routing weight (e.g. edge travel time + intersection penalty -> total cost). mask_excluded_edges wraps a weight callable to return cost = ∞ for edges flagged by prepare_network as non-traversable for a given mode (e.g. motorways for walking).

  2. Routing primitives — covering the common query shapes:

    • shortest_distances_from: single source → all reachable nodes (with optional cutoff). Used for accessibility / isochrone calculations.

    • shortest_distances_pairwise: full distance matrix between two node lists.

    • shortest_path_metrics_one_to_one: paired (origin, destination) routing that also aggregates edge features along each path. Used for travel-time model calibration and similar trip-by-trip work.

    • tiered_path_costs / tiered_path_aggregate: bulk many-to-many routing across the three-tier OD structure.

Why scipy-only: aperta’s routing workflow is one-shot Dijkstra-from-many-origins on a live graph whose edge weights are routinely mutated (calibration loop, scenario comparison, time-of-day variants). Contraction hierarchies (Pandana, OSRM) preprocess once and reuse, which fights this workflow. The historical networkx / igraph backends were dropped in favour of scipy CSR + scipy csgraph.dijkstra, which is competitive in raw speed, zero-preprocessing, and has a single code path.

A future RoutingProfile class will bundle the duration callable + parameters + graph into one object; for now, callers compose the pieces themselves.

aperta.routing.apply_edge_weights(graph, weight_fn, weight_name, **fn_kwargs)[source]

Apply weight_fn to each edge of graph (mutates graph in place).

weight_fn receives the edge data dict plus any extra fn_kwargs. The dict supports row[‘key’] access just like a pandas Series, so callables written against a GeoDataFrame edge-row pattern work without modification.

Parameters:
Return type:

None

aperta.routing.mask_excluded_edges(weight_fn, cost_excluded_flag)[source]

Wrap weight_fn so edges flagged cost_excluded_flag=True get cost = ∞.

Companion to routing_prep.prepare_network, which writes the per-edge boolean flag based on a mode’s cost_excluded_tags. The flag name is recorded on PreparedGraph.cost_excluded_flag; pass it here to bake the mode-specific edge exclusion into a routing-weight callable.

Typical use:

prepared = prepare_network(graph, “walk”) weight_fn = mask_excluded_edges(duration_walk_fn, prepared.cost_excluded_flag) apply_edge_weights(prepared.graph, weight_fn, “duration_walk_s”)

Edges with the flag missing or False flow through to weight_fn unchanged.

Parameters:
Return type:

Callable

aperta.routing.combine_edge_weights(graph, source_names, target_name)[source]

Sum multiple per-edge attributes into one combined weight (in place).

Use case (lumos pattern): travel time and intersection penalty are computed separately, stored as duration_edge_t3 and duration_node_t3, then summed into the routing weight duration_t3.

Parameters:
Return type:

None

aperta.routing.shortest_distances_from(graph, origin, weight, cutoff=None)[source]

Single-source shortest distances from origin to all reachable nodes.

Returns a dict mapping node -> total weight. With cutoff, only nodes within that weight threshold are returned. Unreachable nodes are omitted.

Parameters:
Return type:

dict

aperta.routing.shortest_distances_pairwise(graph, origins, destinations, weight, cutoff=None)[source]

Distance matrix for origins x destinations.

Returns ndarray of shape (len(origins), len(destinations)). Unreachable destinations are np.inf.

Parameters:
Return type:

ndarray

aperta.routing.shortest_path_metrics_one_to_one(graph, trip_ids, origins, destinations, weight, length_attr='length', edge_features=None, *, cutoff=None)[source]

Paired (origin, destination) shortest-path routing with edge-feature aggregation.

edge_features maps an edge attribute name to an aggregation:
  • ‘sum’ : element-wise sum along the path (e.g. count of intersections)

  • ‘length_weighted’ : average weighted by edge length (e.g. average gradient)

Returns a DataFrame indexed by trip_id with columns:
  • distance (sum of length_attr along the path)

  • cost (sum of weight along the path)

  • one column per requested edge feature

Trips with no path are omitted from the output (so output length <= input length).

cutoff (optional): per-origin Dijkstra limit= in weight units. Set this to a value comfortably above the longest expected trip cost — trips that would route beyond it are silently dropped (treated as unreachable), same as actually-unreachable trips. Big speed-up on large graphs when trip costs are small relative to graph diameter (e.g. urban trips on a country-scale road graph). Default None = no cutoff.

Raises DataError if ‘distance’ or ‘cost’ appear in edge_features (would collide with the built-in path-total columns).

Parameters:
Return type:

DataFrame

aperta.routing.tiered_path_costs(pairs, graph, weight, *, mask=None, cutoff=None, dtype=<class 'numpy.float32'>)[source]

Shortest-path cost (sum of edge weight along the path) for every OD pair in pairs, across all tiers.

Single-process. For the experimental multi-process variant see tiered_path_costs_mp. This is the hot path for almost every aperta application — the closure-based inner loop is on purpose, not a refactor candidate (a module-level worker pattern adds per-origin dict lookups that measurably slow down single-process routing).

Every tier is routed across the same graph. All node IDs referenced anywhere in pairs — cell nodes (cells_to_cells keys + values), zone nodes (cells_to_zones values, zones_to_zones keys + values) — must therefore be present in graph.

Parameters:
  • pairs (TieredODPairs) – TieredODPairs of destination IDs (typically from od_pairs.get_pairs).

  • graph (Graph) – networkx routable graph containing every node referenced in pairs. Converted internally to a scipy CSR matrix.

  • weight (str) – edge attribute name used as the per-edge routing cost (e.g. ‘duration_naive’, ‘duration_traffic_iterative’).

  • mask (TieredODPairs | None) – optional boolean TieredODPairs (build via od_pairs.make_mask). Destinations where the mask is False are skipped and stored as np.inf in the output (same convention as unreachable). Output arrays keep the same length as the input pairs (position-wise alignment is preserved); use the mask itself to distinguish “masked-out” from “unreachable” if you care. Missing origins or missing tiers in the mask are treated as “no filter”.

  • cutoff (float | None) – optional network-distance cutoff in weight units (e.g. seconds for time-weighted edges, metres for length-weighted). Passed through to scipy.sparse.csgraph.dijkstra as limit=cutoff, truncating each per-origin Dijkstra at the cutoff. Big speed-up when the cutoff is small relative to graph diameter (e.g. walk accessibility on a country-scale graph). Destinations beyond cutoff are stored as np.inf — same convention as unreachable, so downstream metrics (cumulative_opportunities etc.) handle them naturally. Default None = no cutoff (limit=np.inf).

  • dtype (dtype | type) – dtype of returned cost arrays (default np.float32 — halves memory + on-disk size vs float64, with seconds-resolution precision more than sufficient for travel costs). Pass np.float64 if downstream arithmetic needs the extra range (e.g. logsum with very small scale parameter).

Returns:

TieredODPairs of cost arrays paired position-wise with pairs. Each unreachable, masked-out, or beyond-cutoff destination is stored as np.inf.

Return type:

TieredODPairs

class aperta.routing.PathAggregation(name, attribute, aggregator='sum')[source]

Bases: NamedTuple

Named per-edge feature aggregation along realised shortest paths.

name labels the corresponding output column in tiered_path_aggregate’s return dict. attribute extracts a per-edge value; aggregator combines those values into one scalar per OD pair.

attribute:
  • str: name of an edge attribute on the graph; the per-edge value is edge_data[attribute].

  • Callable[(u, v, data) -> float]: arbitrary per-edge function.

aggregator:
  • ‘sum’: sum across path edges (returns 0 for an empty path).

  • ‘mean’: arithmetic mean (returns NaN for an empty path).

  • ‘min’, ‘max’: respective extremes (NaN for an empty path).

  • Callable[(np.ndarray) -> float]: arbitrary callable on the per-edge value array.

Parameters:
name: str

Alias for field number 0

attribute: str | Callable

Alias for field number 1

aggregator: str | Callable

Alias for field number 2

class aperta.routing.NodeAggregation(name, attribute, aggregator='sum', include_endpoints=True)[source]

Bases: NamedTuple

Named per-node feature aggregation along realised shortest paths.

Parallel to PathAggregation but for node attributes (e.g. counting traffic signals encountered, or finding the highest-elevation node along a route). The node sequence of a path is [u₀, u₁, …, uₙ]; include_endpoints controls whether the route’s origin (u₀) and destination (uₙ) nodes contribute.

name, aggregator: as in PathAggregation.

attribute:
  • str: name of a node attribute on the graph; the per-node value is node_data[attribute].

  • Callable[(node, data) -> float]: arbitrary per-node function.

include_endpoints:
  • True (default): all n+1 nodes contribute, including origin and destination. Risk: endpoints shared across many routes get amplified weight in cross-route counts.

  • False: interior nodes only (u₁ .. uₙ₋₁). Self-pair [u] and single-edge path [u, v] both yield an empty array → aggregator empty-path semantics apply (‘sum’ → 0; ‘mean’/’min’/’max’ → NaN).

Parameters:
name: str

Alias for field number 0

attribute: str | Callable

Alias for field number 1

aggregator: str | Callable

Alias for field number 2

include_endpoints: bool

Alias for field number 3

aperta.routing.aggregate_along_paths(paths, graph, weight, *, edge_aggregations=(), node_aggregations=(), dtype=<class 'numpy.float32'>)[source]

Walk realised paths and aggregate per-edge / per-node features along each.

Pure path walker — no routing involved. Use this directly when you already have a list of paths (Strava traces, prebuilt routes, calibration targets, etc.). tiered_path_aggregate is the wrapper that routes shortest paths on a TieredODPairs and scatters results back into per-tier TieredODPairs outputs.

For each path:
  • cost = sum of weight along the path’s edges

  • each PathAggregation reduces per-edge attribute values

  • each NodeAggregation reduces per-node attribute values

paths semantics:
  • [] → unreachable: cost=`inf`, all aggs=`NaN`

  • [u] → self-pair: cost=0, edge aggs follow empty-array

    semantics, node aggs follow each spec’s include_endpoints setting

  • [u, v, …] → multi-node path; cost + aggs walked normally

Parameters:
  • paths (list[list]) – list of node-id sequences (lists). Node IDs must match graph keys.

  • graph (Graph) – networkx graph used for edge / node attribute lookup. For MultiGraph / MultiDiGraph the min-weight parallel edge is used (matches the router’s choice).

  • weight (str) – edge attribute name used as the per-edge cost.

  • edge_aggregations (Sequence[PathAggregation]) – list of PathAggregation specs (per-edge).

  • node_aggregations (Sequence[NodeAggregation]) – list of NodeAggregation specs (per-node). At least one of edge_aggregations / node_aggregations must be non-empty. Names must be unique across both lists.

  • dtype (dtype | type) – dtype of returned arrays (default np.float32).

Returns:

  • costs: ndarray of shape (len(paths),). inf for unreachable; 0.0 for self-pairs.

  • aggregations_by_name: dict {name -> ndarray} with one entry per spec across both lists. Unreachable destinations are NaN.

Return type:

(costs, aggregations_by_name)

aperta.routing.tiered_path_aggregate(pairs, graph, weight, *, edge_aggregations=(), node_aggregations=(), mask=None, cutoff=None, dtype=<class 'numpy.float32'>)[source]

Route shortest paths and aggregate per-edge / per-node features along each.

Wraps aggregate_along_paths with routing on every tier of pairs. Memory cost matches tiered_path_costs for the cost component — paths are processed per-origin and discarded.

For the cost-only case (no aggregations needed), use tiered_path_costs directly: it can skip path retrieval (more expensive than distance retrieval) and is faster.

Parameters:
  • pairs (TieredODPairs) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • graph (Graph) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • weight (str) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • mask (TieredODPairs | None) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • cutoff (float | None) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • dtype (dtype | type) – as in tiered_path_costs. Paths are retrieved via scipy.sparse.csgraph.dijkstra( return_predecessors=True) and reconstructed by walking the predecessor chain back from each target to the origin.

  • edge_aggregations (Sequence[PathAggregation]) – list of PathAggregation specs (per-edge).

  • node_aggregations (Sequence[NodeAggregation]) – list of NodeAggregation specs (per-node). At least one of the two must be non-empty. Names must be unique across both lists.

Returns:

  • costs: TieredODPairs of routing costs (sum of weight along the realised path). Same shape and conventions as tiered_path_costs. Unreachable / masked-out / beyond-cutoff destinations are np.inf.

  • aggregations_by_name: dict[name -> TieredODPairs]. One entry per spec (edge + node), keyed by spec name. Unreachable / masked-out / beyond-cutoff destinations are np.nan (not inf, since aggregations may be signed or already use inf semantics).

Return type:

(costs, aggregations_by_name)

For OSMnx-style MultiDiGraphs with multiple parallel edges between the same (u, v) pair, the edge with the lowest weight is used for both cost computation and attribute extraction (matching the router’s choice).

For self-pairs (origin == destination, path length 0): cost is 0.0, edge aggregations follow each aggregator’s empty-array semantics (‘sum’ → 0.0; ‘mean’/’min’/’max’ → NaN), node aggregations depend on each spec’s include_endpoints setting.

aperta.routing.add_trip_overhead(costs, pairs, cell_info, *, zone_info=None, origin_overhead=None, dest_overhead=None, verify_finite=True)[source]

Add per-trip origin and/or destination overhead to each OD pair’s cost.

For each (origin_node, dest_node) pair in costs (paired position-wise with pairs), the returned cost is:

new_cost = old_cost + origin_overhead(info_o.loc[orig])
                    + dest_overhead (info_d.loc[dest])

where the info dataframe depends on the tier and the endpoint side:

cells_to_cells:    info_o = cell_info,   info_d = cell_info
cells_to_zones:    info_o = cell_info,   info_d = zone_info
zones_to_zones:    info_o = zone_info,   info_d = zone_info

Each info DataFrame is one row per network node at that tier, indexed by node ID. It can mix native node-level attributes (e.g. local density, intersection count) with aggregated unit-level attributes (e.g. distance from cell centroid to nearest network node) — the function doesn’t care which is which, just looks up by node ID and hands the row(s) to the callback.

When multiple units share a node (typical for cells), aggregate upstream:

cell_info = (cells.groupby('node_id_nw').agg({
                'dist_to_node': 'mean',
                'population': 'sum',
            }).join(nodes))   # node-level attrs joined in
Each callback receives a single info argument:
  • For the origin side: a 1-D pd.Series (the row for that single origin). The callback returns a scalar.

  • For the destination side: a pd.DataFrame (one row per dest, ordered the same as the dest array). The callback returns a 1-D Series / ndarray of the same length. Pandas column access (info[‘col’]) yields a scalar in the Series case and a Series in the DataFrame case, so the same callable typically works for both modes without branching.

origin_overhead and dest_overhead are independently optional — pass None to skip that side.

Parameters:
  • pairs (TieredODPairs) – TieredODPairs of destination IDs (typically from od_pairs.get_pairs).

  • costs (TieredODPairs) – TieredODPairs of cost arrays to augment; same shape as pairs.

  • cell_info (DataFrame) – per-cell-node info DataFrame, indexed by the cell-tier node ID.

  • zone_info (DataFrame | None) – per-zone-node info DataFrame, indexed by the zone-tier node ID. Required iff cells_to_zones or zones_to_zones is present in costs.

  • origin_overhead (Callable | None) – callable, see above. None to skip the origin contribution.

  • dest_overhead (Callable | None) – callable, see above. None to skip the dest contribution.

  • verify_finite (bool) – if True, a ValueError is raised when output is not finite (NaN or Inf).

Returns:

New TieredODPairs of cost arrays (same shape as costs) with the overhead added. costs is not mutated. Unreachable / masked-out entries (np.inf in the input) stay infinite (inf + anything = inf).

Raises:

ValueError if overhead result is not finite and verify_finite is True.

Return type:

TieredODPairs

aperta.routing.floor_intrazonal_costs(costs, min_cost)[source]

Floor cell-tier costs at min_cost — applied uniformly to every entry.

Routing on a graph returns 0 for the trivial origin-to-origin path. That’s fine for cumulative-opportunity output (cost 0 falls in the smallest bin), but degenerate for decay-based metrics like gravity: exp(-β·0) = 1 puts the maximum possible decay weight on the cell itself, and c^(-β) at c = 0 diverges outright.

The floor is applied uniformly to every cell-tier entry, not just self- pairs. Setting only the self-pair to a non-zero floor would create an inconsistency: a cell would route to itself at, say, 120 s while a different (very close) cell could route at 60 s, implying you can travel further faster than you can travel zero distance. The min-cost interpretation is the physical floor on per-trip cost — no trip can take less than min_cost, regardless of distance — and it handles the intrazonal-cost-0 case as a side effect.

Non-finite entries (np.inf for unreachable destinations, np.nan for missing observations) are passed through unchanged — the floor is applied only to finite costs. Flooring inf would erase reachability information (an unreachable destination would become reachable in min_cost seconds), and flooring nan would silently invent data; both behaviours would be incorrect.

Only cells_to_cells is modified — cells_to_zones and zones_to_zones are routed between distinct cell-zone / zone-zone pairs and don’t have the same zero-self-cost degeneracy. Tiers that are None pass through.

Parameters:
  • costs (TieredODPairs) – TieredODPairs of cost arrays.

  • min_cost (float | dict | Series) – floor value. Either a scalar float (same floor for every origin), a dict[origin_node -> float] (per-origin floor; origins absent from the dict get no floor and their costs pass through unchanged), or a pd.Series indexed by origin_node (same semantics as the dict form).

Returns:

New TieredODPairs with cell_tier_cost = max(cell_tier_cost, min_cost) applied per origin to finite entries; non-finite entries (inf, nan) pass through unchanged.

Return type:

TieredODPairs