`aperta.calibration`

Iterative calibration of per-edge weights against observed trip-time data.

calibrate_edge_weights fits a linear model relating observed point-to-point trip times to features collected along the routed shortest path plus features at trip endpoints. The same feature set defines both the per-edge weight formula used for routing and the regression, which keeps the two consistent (a mismatch between them was a subtle pitfall in earlier ad-hoc calibration code).

Model:

time_trip = α · baseline_time
+ Σ_m coef_m · (baseline_time · length-weighted-avg of m along path)
+ Σ_a coef_a · (sum of a along path)
+ Σ_e coef_e · (endpoint value of e)
+ constant

where features come in three classes (matching how they enter the per-edge duration formula in examples/swiss/prepare/4_edge_weights.ipynb):

multiplier: scales baseline speed (so it multiplies baseline time per edge — appears in the regression as baseline_time · feature_avg). Examples: local density, traffic flow.
additive_route: adds seconds per unit summed along the path. Examples: intersection counts (sec per intersection), elevation gain (sec per metre).
additive_endpoint: adds seconds based on the value of a node attribute at the origin and at the destination. Examples: snap distance, local density.
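As a rough illustration of how the three feature classes enter the model above, the following sketch evaluates the trip-time formula for one already-routed path. The function name, the dict-based edge representation, and the keys `length` and `baseline_s` are hypothetical and not part of the module; the endpoint term sums the origin and destination values, as described for additive_endpoint features.

```python
def predict_trip_time(path_edges, origin, destination, alpha,
                      multiplier_coefs, route_coefs, endpoint_coefs,
                      constant=0.0):
    """Evaluate the linear trip-time model for one routed path.

    path_edges: list of dicts with 'length' (m), 'baseline_s' (s) and
    feature values (hypothetical layout, for illustration only).
    """
    total_length = sum(e["length"] for e in path_edges)
    baseline_time = sum(e["baseline_s"] for e in path_edges)

    t = alpha * baseline_time + constant
    # multiplier: baseline_time * length-weighted average of the feature
    for name, coef in multiplier_coefs.items():
        avg = sum(e[name] * e["length"] for e in path_edges) / total_length
        t += coef * baseline_time * avg
    # additive_route: plain sum of the feature along the path
    for name, coef in route_coefs.items():
        t += coef * sum(e[name] for e in path_edges)
    # additive_endpoint: node value at origin plus value at destination
    for name, coef in endpoint_coefs.items():
        t += coef * (origin[name] + destination[name])
    return t


path = [
    {"length": 1000.0, "baseline_s": 72.0, "density": 0.5, "signals": 1},
    {"length": 3000.0, "baseline_s": 108.0, "density": 0.1, "signals": 2},
]
trip_time = predict_trip_time(
    path,
    origin={"snap_dist": 20.0},
    destination={"snap_dist": 30.0},
    alpha=1.0,
    multiplier_coefs={"density": 0.5},
    route_coefs={"signals": 5.0},
    endpoint_coefs={"snap_dist": 0.1},
    constant=10.0,
)
print(round(trip_time, 3))  # 180 + 10 + 18 + 15 + 5 = 228 s
```

Because the same coefficients also define the per-edge weights used for routing, changing any of them can change which path is chosen, which is why the fit is iterated.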

Iteration (option A from the design discussion): re-route after each OLS fit, since updated coefficients change edge weights and therefore the chosen path + feature aggregates. Cheap to repeat — usually converges in 2-3 passes.

This module does NOT compute betweenness / traffic flows itself. Treat the traffic estimate as just another per-edge attribute the caller supplies (e.g. via network_processing.get_nested_edge_betweenness). Then include it in multiplier_features (if it scales duration like density) or additive_route_features (if seconds-per-unit).

class aperta.calibration.CalibrationResult(coefficients, metrics_baseline, metrics_calibrated, metrics_regression, n_used)

Bases: object

Outcome of calibrate_edge_weights.

Variables:

coefficients (pandas.DataFrame) – DataFrame indexed by feature name with columns coef (fitted value) and p (p-value). Includes the OLS constant (const, if constant was set), baseline_time (the α scale on baseline duration), and one row per multiplier / additive_route / additive_endpoint feature.
n_used (int) – Number of ground-truth trips that survived snap + distance filters and entered the OLS fit.
Three per-distance-band metrics frames are reported, each a DataFrame indexed by distance band ("all", "< 5 km", "5-25 km", ">= 25 km") with columns r2, rmse, bias. Each measures fit between observed times and a different prediction mechanism on the same trip set:
- metrics_baseline — predict via routing with length / speed_kph only (the un-calibrated graph). Invariant to the user’s prior coefficient choices. Quantifies the lift the calibration provides over the raw speed_kph attribute.
- metrics_calibrated — predict via routing on the CALIBRATED graph (final α + coefs applied to edge weights; Dijkstra re-runs to pick paths under those weights, and the sum of weights along the chosen path is the prediction). This is the production-relevant number — what you’d actually get if you deployed the calibrated graph for routing.
- metrics_regression — OLS R² of the final iteration’s linear-model fit. The OLS sits on the iteration-N routing’s paths (NOT the final-coefs routing’s paths), so this is an upper bound on calibrated R². The gap between regression and calibrated R² is a convergence diagnostic: small gap → calibration converged at this n_iterations; large gap → bumping n_iterations would tighten things.
Quick overall-fit access: result.metrics_calibrated.loc['all', 'r2'].
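The per-band layout of the metrics frames can be sketched in plain Python. This is only an illustration under stated assumptions: r2 is taken as the coefficient of determination (1 - SS_res / SS_tot), bias as the mean of predicted minus observed, and trips are banded by a distance in metres. The module may define any of these differently, and the function names are hypothetical.

```python
import math

# Band edges in metres, matching the labels used in the metrics frames.
BANDS = [
    ("< 5 km", 0.0, 5_000.0),
    ("5-25 km", 5_000.0, 25_000.0),
    (">= 25 km", 25_000.0, math.inf),
]


def fit_metrics(observed, predicted):
    """r2 / rmse / bias for one set of trips (assumed definitions)."""
    n = len(observed)
    mean_obs = sum(observed) / n
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return {
        "r2": 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan,
        "rmse": math.sqrt(ss_res / n),
        "bias": sum(p - o for o, p in zip(observed, predicted)) / n,
    }


def metrics_by_band(distances_m, observed, predicted):
    """One metrics row for 'all' trips plus one per non-empty band."""
    rows = {"all": fit_metrics(observed, predicted)}
    for label, lo, hi in BANDS:
        idx = [i for i, d in enumerate(distances_m) if lo <= d < hi]
        if idx:
            rows[label] = fit_metrics([observed[i] for i in idx],
                                      [predicted[i] for i in idx])
    return rows


rows = metrics_by_band(
    distances_m=[1_000.0, 2_000.0, 10_000.0, 30_000.0],
    observed=[100.0, 200.0, 600.0, 1_500.0],
    predicted=[110.0, 190.0, 600.0, 1_500.0],
)
for band, m in rows.items():
    print(band, m)
```

Comparing the "all" row of the regression and calibrated frames gives the convergence gap described above.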

Parameters:

coefficients (DataFrame)
metrics_baseline (DataFrame)
metrics_calibrated (DataFrame)
metrics_regression (DataFrame)
n_used (int)

coefficients: DataFrame

metrics_baseline: DataFrame

metrics_calibrated: DataFrame

metrics_regression: DataFrame

n_used: int

aperta.calibration.apply_edge_durations(graph, *, multiplier_features=None, additive_route_features=None, alpha=1.0, out_attr='duration', baseline_duration_attr='speed_kph', min_speed_kph=1.0, max_speed_kph=120.0)

Write per-edge duration to out_attr (mutates graph in place):

edge_duration = α · base + base · Σ_m c_m · m_value + Σ_a c_a · a_value

where base is the baseline duration of the edge, derived from baseline_duration_attr (usually the speed limit for cars, a fixed speed for active modes), before any features are added.

Term semantics:

α · base — global scale on the network-derived baseline. α=1.0 (default) is the prior; trust the network’s speed_kph as-is. After calibration, pass the fitted α (from CalibrationResult.coefficients row baseline_time) to apply the calibrated weights to a fresh graph for downstream routing.
base · multiplier_features[f] · edge[f] — speed-like correction: per-feature coefficient scales the baseline by f’s value on that edge (e.g. slope_climb · 8.0 on a 10 % climb adds 80 % to that edge’s baseline time).
additive_route_features[f] · edge[f] — raw seconds per edge, independent of length / speed (e.g. is_traffic_signal · 5.0 adds 5 s on signalised intersections).
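The arithmetic of the three terms can be checked with a small standalone sketch. It is not the library implementation: it assumes length is in metres and the speed attribute in km/h, so that base = length / (speed_kph / 3.6) seconds, and it leaves out the min_speed_kph / max_speed_kph clamping. The function name and the dict-based edge are hypothetical.

```python
def edge_duration(edge, alpha=1.0, multiplier_features=None,
                  additive_route_features=None, speed_attr="speed_kph"):
    """edge_duration = alpha*base + base*sum(c_m*m) + sum(c_a*a), in seconds."""
    multiplier_features = multiplier_features or {}
    additive_route_features = additive_route_features or {}

    # Assumed units: metres and km/h, so km/h / 3.6 gives m/s.
    base = edge["length"] / (edge[speed_attr] / 3.6)
    mult = sum(c * edge[f] for f, c in multiplier_features.items())
    add = sum(c * edge[f] for f, c in additive_route_features.items())
    return alpha * base + base * mult + add


edge = {
    "length": 500.0,         # m
    "speed_kph": 50.0,       # base = 36 s
    "slope_climb": 0.10,     # 10 % climb
    "is_traffic_signal": 1,  # signalised
}
duration = edge_duration(
    edge,
    multiplier_features={"slope_climb": 8.0},             # +80 % of base = 28.8 s
    additive_route_features={"is_traffic_signal": 5.0},   # +5 s
)
print(round(duration, 3))  # 36 + 28.8 + 5 = 69.8 s
```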

Parameters:

graph (MultiGraph)
multiplier_features (dict[str, float] | None)
additive_route_features (dict[str, float] | None)
alpha (float)
out_attr (str)
baseline_duration_attr (str)
min_speed_kph (float)
max_speed_kph (float)

Return type:

None

aperta.calibration.calibrate_edge_weights(graph, ground_truth, *, baseline_speed_attr='speed_kph', multiplier_features=None, additive_route_features=None, additive_endpoint_features=None, min_speed_kph=1.0, max_speed_kph=120.0, constant=None, n_iterations=3, max_distance=300.0, max_dist_to_line_ratio=4.0, edge_duration_attr='duration_calibrated', eligible_node_ids=None, eligible_node_flag=None)

Iteratively calibrate per-edge durations against observed trip times.

See module docstring for the model. Each iteration:

Writes per-edge duration to edge_duration_attr from the current coefficients (or initial guesses on iteration 1).
Routes each ground-truth trip on those weights, aggregating features along the path.
Fits an OLS model of time_measured ~ baseline_time + features + endpoint terms.
Updates coefficients to the OLS fit.
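The fitting step can be illustrated with a reduced two-term model: α on baseline time plus one additive route feature, with no constant. The sketch below solves the 2×2 normal equations in closed form on synthetic, noise-free data. The function name and numbers are hypothetical; the module's actual fit covers any number of features and also reports p-values.

```python
def fit_two_term_ols(baseline, feature, observed):
    """Least-squares fit of observed ≈ alpha*baseline + coef*feature."""
    sbb = sum(b * b for b in baseline)
    sbf = sum(b * f for b, f in zip(baseline, feature))
    sff = sum(f * f for f in feature)
    sby = sum(b * y for b, y in zip(baseline, observed))
    sfy = sum(f * y for f, y in zip(feature, observed))

    # Normal equations: [[sbb, sbf], [sbf, sff]] @ [alpha, coef] = [sby, sfy]
    det = sbb * sff - sbf * sbf
    if det == 0:
        raise ValueError("regressors are collinear; cannot fit")
    alpha = (sby * sff - sbf * sfy) / det
    coef = (sbb * sfy - sbf * sby) / det
    return alpha, coef


# Per-trip aggregates from one routing pass (synthetic values).
baseline_time = [100.0, 200.0, 300.0]   # seconds along each routed path
signal_count = [2.0, 1.0, 5.0]          # intersections along each path
# Observed times generated with alpha = 1.2 and 5 s per signal.
time_measured = [1.2 * b + 5.0 * s for b, s in zip(baseline_time, signal_count)]

alpha, coef = fit_two_term_ols(baseline_time, signal_count, time_measured)
print(round(alpha, 6), round(coef, 6))
```

In the iterative scheme, the fitted values would be written back to the edge weights and the trips re-routed before the next fit.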

Parameters:

graph (MultiDiGraph) – routable networkx graph. Must carry length and baseline_speed_attr on every edge, plus every attribute named in the feature dicts.
ground_truth (DataFrame) – DataFrame with columns orig_x, orig_y, dest_x, dest_y, time_measured (seconds). Optional dist_measured enables the dist-ratio filter. Optional dist_line is computed from coords if not provided.
baseline_speed_attr (str) – per-edge speed in km/h (e.g. from osmnx.add_edge_speeds). Not modified by this function.
multiplier_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each scales the baseline duration (new_dur = old_dur · (α + Σ coef · feat), i.e. old_dur · (1 + Σ coef · feat) at α = 1). Use for density-like features.
additive_route_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each contributes coef · feat_value seconds per edge (summed along path). Use for intersection counts, elevation gain, etc.
additive_endpoint_features (dict[str, float] | None) – {node_attr: initial_coef}. Each adds coef · value_at_origin + coef · value_at_destination to total trip duration. Use for snap distance, local density at endpoints.
min_speed_kph (float) – minimum edge speed (including node effects) in km/h.
max_speed_kph (float) – maximum edge speed (including node effects) in km/h.
constant (float | None) – if set, include an intercept (reported as const) in the OLS fit.
n_iterations (int) – number of route-fit cycles. 2-3 usually converges.
max_distance (float) – drop trips where origin or destination is farther than this from any network node (metres).
max_dist_to_line_ratio (float) – if dist_measured is present, drop trips where dist_measured / dist_line exceeds this (long detours are usually data noise).
edge_duration_attr (str) – name of the per-edge duration attribute written on graph (overwritten each iteration).
eligible_node_ids – optional set / list / Index of node IDs to restrict trip-endpoint snap targets to. Forwarded to snap_to_network_nodes. Typically prepared.snap_eligible_nodes from routing_prep.prepare_network — prevents trips from snapping to trapped nodes and contaminating the calibration fit.
eligible_node_flag (str | None) – alternative to eligible_node_ids — name of a per-node bool attribute on graph marking eligible snap targets (e.g., prepared.snap_eligible_flag). Ignored if eligible_node_ids is also given.

Returns:

CalibrationResult — see its docstring.

Raises:

ValueError – if any required column is missing or every trip filters out before fitting.

Return type:

CalibrationResult
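The max_dist_to_line_ratio filter described in the parameters can be sketched as follows. This is a minimal illustration, assuming projected coordinates in metres and a straight-line dist_line computed from the endpoints when the column is absent. Keeping zero-length trips is an assumption of this sketch, and the function name and dict-based trip records are hypothetical.

```python
import math


def keep_trip(trip, max_dist_to_line_ratio=4.0):
    """Return False for trips whose measured distance is an implausible detour."""
    dist_line = trip.get("dist_line")
    if dist_line is None:
        # Straight-line distance between origin and destination.
        dist_line = math.hypot(trip["dest_x"] - trip["orig_x"],
                               trip["dest_y"] - trip["orig_y"])
    dist_measured = trip.get("dist_measured")
    if dist_measured is None or dist_line == 0:
        # No measured distance (or degenerate trip): filter not applicable.
        return True
    return dist_measured / dist_line <= max_dist_to_line_ratio


trips = [
    {"orig_x": 0, "orig_y": 0, "dest_x": 3000, "dest_y": 4000,
     "time_measured": 600, "dist_measured": 6000},    # ratio 1.2, kept
    {"orig_x": 0, "orig_y": 0, "dest_x": 3000, "dest_y": 4000,
     "time_measured": 2400, "dist_measured": 25000},  # ratio 5.0, dropped
    {"orig_x": 0, "orig_y": 0, "dest_x": 3000, "dest_y": 4000,
     "time_measured": 650},                           # no dist_measured, kept
]
kept = [t for t in trips if keep_trip(t)]
print(len(kept))
```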

aperta.calibration.snap_counters_to_edges(counters, graph, *, max_distance=50.0, bearing_tol_deg=20.0, bearing_column='bearing_deg', eligible_edges=None, bidirectional=None)

Snap directional traffic counters to the correct network edges.

Counters typically sit next to several parallel candidate edges (opposite directions on the same road; service roads; frontage roads), so naïve nearest-line matching picks the wrong edge most of the time. This function adds a bearing tolerance filter — only edges whose local bearing matches the counter’s bearing_deg (within bearing_tol_deg) are eligible. For directed graphs the bearing comparison is directional (a counter at bearing 90° won’t snap to an edge pointing at 270°), which correctly assigns the two counters of a two-way road to the two directional edges.

Uses d['geometry'] from every edge, which consolidate_intersections guarantees. Edges without a geometry attribute (e.g. raw OSMnx graphs with simplify=True) are silently skipped; consolidate first or call osmnx.graph_to_gdfs(…, fill_edge_geometry=True).

Parameters:

counters (GeoDataFrame) – GeoDataFrame of point geometries with a bearing_column (degrees, OSM/north-clockwise convention). Same CRS as the graph node coordinates.
graph (MultiDiGraph) – routable nx graph. Edge attributes must include geometry (LineString) and whatever eligible_edges reads.
max_distance (float | Series) – max cartesian distance for candidate edges (CRS units). Pass a scalar for one global radius or a pd.Series aligned to counters.index for per-counter radii (e.g. wider for highway counters which sit further from the carriageway).
bearing_tol_deg (float) – max angular difference between counter bearing and local edge bearing at the snap point.
bearing_column (str) – counter column holding the directional bearing (default 'bearing_deg').
eligible_edges (Callable[[Series, GeoDataFrame], GeoDataFrame] | None) – optional (counter_row, candidate_edges_gdf) -> subset callback. Use to restrict matches by class: typically a highway counter only matches highway edges, a local counter only matches local edges. Forwarded to geo_mapping.map_points_to_filtered_lines.
bidirectional (bool | None) – how to compare bearings. True collapses opposite bearings (counter at 90° matches edges at 90° AND 270°) — correct for undirected graphs where one edge represents both directions of a road. False is directional — correct for directed graphs (the default nx.MultiDiGraph). None auto-detects from graph.is_directed().

Returns:

DataFrame indexed like counters with columns:

u, v, k: matched edge ID (or pd.NA if no acceptable match within radius);
snap_dist: cartesian distance counter → edge (or NaN);
dist_along: along-edge distance from edge start to nearest point on edge (or NaN).

Unmatched counters get all-NA rows; drop them with result.dropna(subset=['u']).

Return type:

DataFrame
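The directional versus bidirectional bearing comparison can be written out in a few lines. This is a sketch of the comparison logic only, not of the spatial snapping itself, and the helper names are hypothetical.

```python
def bearing_diff_deg(a, b, bidirectional=False):
    """Smallest angular difference between two bearings, in degrees."""
    diff = abs(a - b) % 360.0
    diff = min(diff, 360.0 - diff)      # wrap to the range 0..180
    if bidirectional:
        diff = min(diff, 180.0 - diff)  # opposite bearings collapse: 0..90
    return diff


def bearing_matches(counter_bearing, edge_bearing, tol_deg=20.0,
                    bidirectional=False):
    """True if the edge bearing is within tol_deg of the counter bearing."""
    return bearing_diff_deg(counter_bearing, edge_bearing,
                            bidirectional) <= tol_deg


# Directed graph: a counter at 90° must not match the opposite carriageway.
print(bearing_matches(90.0, 270.0))                      # False
# Undirected graph: one edge stands for both directions.
print(bearing_matches(90.0, 270.0, bidirectional=True))  # True
# Wrap-around near north: 350° and 5° are only 15° apart.
print(bearing_matches(350.0, 5.0))                       # True
```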

aperta.calibration.evaluate_against_counters(modeled, counters, *, observed_column='traffic_cars')

Compare modeled per-edge AADT against snapped counter observations.

Parameters:

modeled (Series) – per-edge modeled AADT, indexed by (u, v, k) tuples (the output of traffic_flows.nested_node_sample + betweenness + AADT scaling).
counters (DataFrame) – DataFrame with u, v, k columns (from snap_counters_to_edges) and an observed-AADT column. Rows with NA in u/v/k are dropped (unmatched counters).
observed_column (str) – name of the observed-AADT column (default 'traffic_cars', matching the Swiss counter schema).

Returns:

Dict with:

r2: Pearson correlation² between modeled and observed. It is scale-invariant, so use this to pick distribution-shape params (lognormal σ, μ).
slope: slope from a no-intercept regression modeled = slope · observed. Tells you how to rescale absolute volumes, e.g. multiply trips_per_person_per_day by 1 / slope to bring the modeled total in line with counters.
rmse: root-mean-square error on the matched set, in counter-units (veh/day).
n_matched: number of counters used in the comparison.
merged: DataFrame with observed, modeled, (u, v, k) for every matched counter; convenient for scatter plots.

Return type:

dict
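The three scalar metrics can be reproduced on plain lists to see why r2 is scale-invariant while slope and rmse are not. This is a sketch using the definitions given above (squared Pearson correlation, no-intercept slope of modeled on observed, RMSE). The function name is hypothetical, and the step that drops unmatched counters is omitted.

```python
import math


def compare_to_counters(modeled, observed):
    """r2, slope, rmse and n_matched for already-matched value pairs."""
    n = len(observed)
    mean_m = sum(modeled) / n
    mean_o = sum(observed) / n
    cov = sum((m - mean_m) * (o - mean_o) for m, o in zip(modeled, observed))
    var_m = sum((m - mean_m) ** 2 for m in modeled)
    var_o = sum((o - mean_o) ** 2 for o in observed)
    return {
        # Squared Pearson correlation: unchanged if modeled is rescaled.
        "r2": cov * cov / (var_m * var_o),
        # No-intercept least squares for modeled = slope * observed.
        "slope": sum(o * m for m, o in zip(modeled, observed))
                 / sum(o * o for o in observed),
        "rmse": math.sqrt(sum((m - o) ** 2
                              for m, o in zip(modeled, observed)) / n),
        "n_matched": n,
    }


observed = [1000.0, 2000.0, 4000.0]  # veh/day at three counters
modeled = [2000.0, 4000.0, 8000.0]   # right shape, twice the volume
result = compare_to_counters(modeled, observed)
print(result)
```

Here r2 is 1 and slope is 2, so scaling the modeled volumes by 1 / slope = 0.5 would bring them in line with the counters without affecting r2.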
