aperta.calibration

Iterative calibration of per-edge weights against observed trip-time data.

calibrate_edge_weights fits a linear model relating observed point-to-point trip times to features collected along the routed shortest path plus features at the trip endpoints. The same feature set defines both the per-edge weight formula used for routing and the regression, which keeps the two consistent (a mismatch between them was a subtle pitfall in earlier ad-hoc calibration code).

Model:

time_trip = α · baseline_time
          + Σ_m coef_m · (baseline_time · length-weighted-avg of m along path)
          + Σ_a coef_a · (sum of a along path)
          + Σ_e coef_e · (endpoint value of e)
          + constant

where features come in three classes (matching how they enter the per-edge duration formula in examples/swiss/prepare/4_edge_weights.ipynb):

  • multiplier: scales baseline speed (so it multiplies baseline time per edge — appears in the regression as baseline_time · feature_avg). Examples: local density, traffic flow.

  • additive_route: adds seconds per unit summed along the path. Examples: intersection counts (sec per intersection), elevation gain (sec per metre).

  • additive_endpoint: adds seconds based on the value of a node attribute at the origin and at the destination. Examples: snap distance, local density.
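As a rough illustration of how the three feature classes could enter the regression, the sketch below builds a design matrix from per-trip aggregates and fits it by ordinary least squares with numpy. The helper build_design_matrix, the column naming (`*_avg`, `*_sum`, `*_orig`, `*_dest`), and the synthetic numbers are assumptions made for this example; they are not part of aperta.

```python
import numpy as np
import pandas as pd


def build_design_matrix(trips, multiplier, additive_route, additive_endpoint, constant=True):
    """Assemble regression columns from per-trip aggregates (hypothetical layout)."""
    cols = {"baseline_time": trips["baseline_time"]}
    for m in multiplier:
        # Multiplier features enter as baseline_time x length-weighted path average.
        cols[m] = trips["baseline_time"] * trips[f"{m}_avg"]
    for a in additive_route:
        # Additive route features enter as their sum along the path.
        cols[a] = trips[f"{a}_sum"]
    for e in additive_endpoint:
        # One shared coefficient applies to the origin and the destination value.
        cols[e] = trips[f"{e}_orig"] + trips[f"{e}_dest"]
    X = pd.DataFrame(cols)
    if constant:
        X["const"] = 1.0
    return X


# Synthetic per-trip aggregates (illustrative values only).
rng = np.random.default_rng(0)
n = 200
trips = pd.DataFrame({
    "baseline_time": rng.uniform(300, 3000, n),
    "density_avg": rng.uniform(0, 1, n),
    "intersections_sum": rng.integers(5, 80, n).astype(float),
    "snap_dist_orig": rng.uniform(0, 300, n),
    "snap_dist_dest": rng.uniform(0, 300, n),
})

# Coefficients used to generate noise-free trip times, so the fit should recover them.
true = {"baseline_time": 1.1, "density": 0.3, "intersections": 4.0,
        "snap_dist": 0.2, "const": 60.0}

X = build_design_matrix(trips, ["density"], ["intersections"], ["snap_dist"])
y = X.to_numpy() @ np.array([true[c] for c in X.columns])

coef, *_ = np.linalg.lstsq(X.to_numpy(), y, rcond=None)
fitted = dict(zip(X.columns, coef))
```

With real trips the target would be time_measured rather than a synthetic y, and the fitted values would carry noise.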

Iteration (option A from the design discussion): re-route after each OLS fit, since updated coefficients change edge weights and therefore the chosen path + feature aggregates. Cheap to repeat — usually converges in 2-3 passes.

This module does NOT compute betweenness / traffic flows itself. Treat the traffic estimate as just another per-edge attribute the caller supplies (e.g. via network_processing.get_nested_edge_betweenness). Then include it in multiplier_features (if it scales duration like density) or additive_route_features (if seconds-per-unit).

class aperta.calibration.CalibrationResult(coefficients, r_squared, n_used, predicted_times, observed_times, routed_distances, rmse, rmse_by_distance, edge_duration_attr, iter_log)[source]

Bases: object

Outcome of calibrate_edge_weights.

Variables:
  • coefficients (pandas.DataFrame) – DataFrame indexed by feature name with columns kind (multiplier / additive_route / additive_endpoint / const / baseline), coef (fitted value), p (p-value), mean_effect (coef × mean of column, in seconds).

  • r_squared (float) – OLS R² on the held-in trips.

  • n_used (int) – number of ground-truth rows that survived snap + filter + successful routing.

  • predicted_times (pandas.Series) – per-trip predicted time (Series indexed by trip_id); comparable to ground_truth.loc[predicted_times.index, 'time_measured'].

  • observed_times (pandas.Series) – per-trip observed time (same index as predicted).

  • routed_distances (pandas.Series) – per-trip routed distance (m); useful for distance-band breakdowns.

  • rmse (float) – overall RMSE on the held-in trips, in seconds.

  • rmse_by_distance (pandas.Series) – Series of RMSE per distance band, indexed by band label ('< 10 km', '10-25 km', '>= 25 km').

  • edge_duration_attr (str) – name of the per-edge attribute written to graph by the final iteration (default 'duration_calibrated'). Downstream routing can use this as the cost.

  • iter_log (pandas.DataFrame) – DataFrame, one row per iteration, columns r_squared, rmse, n_used — useful to inspect convergence.
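The distance-band breakdown can be reproduced from predicted_times, observed_times and routed_distances. The sketch below shows one way to do it with pandas.cut; the function name rmse_by_band and the treatment of band edges (left-closed intervals) are assumptions, chosen to match the band labels listed above.

```python
import numpy as np
import pandas as pd


def rmse_by_band(predicted, observed, routed_m):
    """RMSE per routed-distance band; all three Series share the trip_id index."""
    bands = pd.cut(
        routed_m,
        bins=[0, 10_000, 25_000, np.inf],
        labels=["< 10 km", "10-25 km", ">= 25 km"],
        right=False,  # [0, 10 km), [10 km, 25 km), [25 km, inf)
    )
    sq_err = (predicted - observed) ** 2
    return np.sqrt(sq_err.groupby(bands, observed=False).mean())


# Four made-up trips, indexed by trip_id.
idx = pd.Index([1, 2, 3, 4], name="trip_id")
predicted = pd.Series([100.0, 210.0, 400.0, 1000.0], index=idx)
observed = pd.Series([110.0, 200.0, 380.0, 1000.0], index=idx)
routed = pd.Series([5_000.0, 8_000.0, 15_000.0, 30_000.0], index=idx)

per_band = rmse_by_band(predicted, observed, routed)
```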

Parameters:
coefficients: DataFrame
r_squared: float
n_used: int
predicted_times: Series
observed_times: Series
routed_distances: Series
rmse: float
rmse_by_distance: Series
edge_duration_attr: str
iter_log: DataFrame

aperta.calibration.calibrate_edge_weights(graph, ground_truth, *, baseline_speed_attr='speed_kph', multiplier_features=None, additive_route_features=None, additive_endpoint_features=None, constant=True, n_iterations=3, snap_max_distance=300.0, min_trip_distance=3000.0, max_trip_distance=100000.0, max_dist_to_line_ratio=4.0, edge_duration_attr='duration_calibrated', eligible_node_ids=None, eligible_node_flag=None)[source]

Iteratively calibrate per-edge durations against observed trip times.

See module docstring for the model. Each iteration:
  1. Writes per-edge duration to edge_duration_attr from the current coefficients (or initial guesses on iteration 1).

  2. Routes each ground-truth trip on those weights, aggregating features along the path.

  3. Fits an OLS model of time_measured ~ baseline_time + features + endpoint terms.

  4. Updates coefficients to the OLS fit.
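Steps 1 and 2 can be sketched with plain networkx. The example below is not the module's implementation: the per-edge form `baseline · (α + Σ coef · feat) + Σ coef · feat`, the helper names, and the toy graph are assumptions. It shows why re-routing matters: raising the density coefficient moves the shortest path onto the longer but less dense route, which changes the aggregated features.

```python
import networkx as nx


def write_durations(G, alpha, mult, add, speed_attr="speed_kph", out="duration_calibrated"):
    """Step 1: write a per-edge duration (seconds) from the current coefficients."""
    for _, _, d in G.edges(data=True):
        baseline = d["length"] / (d[speed_attr] / 3.6)  # metres / (m/s) -> seconds
        d["baseline_time"] = baseline  # cached on the edge for aggregation
        scale = alpha + sum(c * d[f] for f, c in mult.items())
        extra = sum(c * d[f] for f, c in add.items())
        d[out] = baseline * scale + extra


def route_and_aggregate(G, orig, dest, mult, add, weight="duration_calibrated"):
    """Step 2: route one trip and aggregate its features along the path."""
    path = nx.shortest_path(G, orig, dest, weight=weight)
    # For parallel edges, take the one the router would have used.
    edges = [min(G[u][v].values(), key=lambda d: d[weight]) for u, v in zip(path, path[1:])]
    length = sum(e["length"] for e in edges)
    row = {"baseline_time": sum(e["baseline_time"] for e in edges)}
    for f in mult:
        row[f + "_avg"] = sum(e[f] * e["length"] for e in edges) / length
    for f in add:
        row[f + "_sum"] = sum(e[f] for e in edges)
    return path, row


# Toy network: a short dense route via B and a longer empty route via D.
G = nx.MultiDiGraph()
G.add_edge("A", "B", length=1000.0, speed_kph=50.0, density=1.0, intersections=3)
G.add_edge("B", "C", length=1000.0, speed_kph=50.0, density=1.0, intersections=3)
G.add_edge("A", "D", length=1500.0, speed_kph=50.0, density=0.0, intersections=1)
G.add_edge("D", "C", length=1500.0, speed_kph=50.0, density=0.0, intersections=1)

mult0, mult1, add = {"density": 0.0}, {"density": 0.6}, {"intersections": 0.0}

write_durations(G, 1.0, mult0, add)
path0, row0 = route_and_aggregate(G, "A", "C", mult0, add)

write_durations(G, 1.0, mult1, add)  # updated coefficient, as after an OLS fit
path1, row1 = route_and_aggregate(G, "A", "C", mult1, add)
```

In a full loop, the rows from step 2 would be stacked into a design matrix, fitted, and the fitted coefficients fed back into write_durations for the next pass.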

Parameters:
  • graph (MultiDiGraph) – routable networkx graph. Must carry length and baseline_speed_attr on every edge, plus every attribute named in the feature dicts.

  • ground_truth (DataFrame) – DataFrame with columns orig_x, orig_y, dest_x, dest_y, time_measured (seconds). The optional dist_measured column enables the dist-ratio filter. dist_line is also optional and is computed from the coordinates when absent.

  • baseline_speed_attr (str) – per-edge speed in km/h (e.g. from osmnx.add_edge_speeds). Not modified by this function.

  • multiplier_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each scales the baseline duration (new_dur = old_dur · (1 + Σ coef · feat)). Use for density-like features.

  • additive_route_features (dict[str, float] | None) – {edge_attr: initial_coef}. Each contributes coef · feat_value seconds per edge (summed along path). Use for intersection counts, elevation gain, etc.

  • additive_endpoint_features (dict[str, float] | None) – {node_attr: initial_coef}. Each adds coef · value_at_origin + coef · value_at_destination to total trip duration. Use for snap distance, local density at endpoints.

  • constant (bool) – include an intercept in the OLS fit.

  • n_iterations (int) – number of route-fit cycles. 2-3 usually converges.

  • snap_max_distance (float) – drop trips where origin or destination is farther than this from any network node (metres).

  • min_trip_distance (float) – drop trips with dist_line below this (metres).

  • max_trip_distance (float) – drop trips with dist_line above this (metres).

  • max_dist_to_line_ratio (float) – if dist_measured is present, drop trips where dist_measured / dist_line exceeds this (long detours are usually data noise).

  • edge_duration_attr (str) – name of the per-edge duration attribute written on graph (overwritten each iteration).

  • eligible_node_ids – optional set / list / Index of node IDs to restrict trip-endpoint snap targets to. Forwarded to snap_to_network_nodes. Typically prepared.snap_eligible_nodes from routing_prep.prepare_network — prevents trips from snapping to trapped nodes and contaminating the calibration fit.

  • eligible_node_flag (str | None) – alternative to eligible_node_ids — name of a per-node bool attribute on graph marking eligible snap targets (e.g., prepared.snap_eligible_flag). Ignored if eligible_node_ids is also given.
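The distance filters above can be approximated in a few lines of pandas. This is a sketch under assumptions: the helper filter_trips is hypothetical, coordinates are taken to be in a projected CRS with metre units so that straight-line distance is a plain Euclidean norm, and the snap-distance filter is left out because it needs the network itself.

```python
import numpy as np
import pandas as pd


def filter_trips(gt, min_trip_distance=3000.0, max_trip_distance=100000.0,
                 max_dist_to_line_ratio=4.0):
    """Apply the straight-line distance and detour-ratio filters to ground truth."""
    gt = gt.copy()
    if "dist_line" not in gt:
        # Straight-line distance from the coordinates (assumes metre units).
        gt["dist_line"] = np.hypot(gt["dest_x"] - gt["orig_x"],
                                   gt["dest_y"] - gt["orig_y"])
    keep = gt["dist_line"].between(min_trip_distance, max_trip_distance)
    if "dist_measured" in gt:
        # Long detours relative to the straight line are treated as noise.
        keep &= (gt["dist_measured"] / gt["dist_line"]) <= max_dist_to_line_ratio
    return gt[keep]


ground_truth = pd.DataFrame({
    "orig_x": [0.0, 0.0, 0.0, 0.0],
    "orig_y": [0.0, 0.0, 0.0, 0.0],
    "dest_x": [3000.0, 1000.0, 6000.0, 120000.0],
    "dest_y": [4000.0, 0.0, 8000.0, 0.0],
    "dist_measured": [6000.0, 1200.0, 50000.0, 130000.0],
    "time_measured": [600.0, 150.0, 2400.0, 5400.0],
})

kept = filter_trips(ground_truth)
```

Here only the first trip survives: the second is too short, the third has a detour ratio of 5, and the fourth is too long.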

Returns:

CalibrationResult — see its docstring.

Raises:

ValueError – if any required column is missing or every trip filters out before fitting.

Return type:

CalibrationResult

aperta.calibration.snap_counters_to_edges(counters, graph, *, search_radius=50.0, bearing_tol_deg=20.0, bearing_column='bearing_deg', eligible_edges=None, bidirectional=None)[source]

Snap directional traffic counters to the correct network edges.

Counters typically sit next to several parallel candidate edges (opposite directions on the same road; service roads; frontage roads), so naïve nearest-line matching picks the wrong edge most of the time. This function adds a bearing tolerance filter — only edges whose local bearing matches the counter’s bearing_deg (within bearing_tol_deg) are eligible. For directed graphs the bearing comparison is directional (a counter at bearing 90° won’t snap to an edge pointing at 270°), which correctly assigns the two counters of a two-way road to the two directional edges.

Uses d['geometry'] from every edge, which consolidate_intersections guarantees. Edges without a geometry attribute (e.g. raw OSMnx graphs with simplify=True) are silently skipped; consolidate first or call osmnx.graph_to_gdfs(…, fill_edge_geometry=True).
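The bearing filter described above comes down to an angular difference that wraps at 360°, with an optional fold at 180° for the bidirectional case. The following is a minimal sketch, not the library's code: segment_bearing and bearing_difference are hypothetical helpers, and the bearing is computed in planar coordinates using the north-clockwise convention.

```python
import math


def segment_bearing(x0, y0, x1, y1):
    """Bearing of a segment in degrees, north = 0, clockwise, in [0, 360)."""
    return math.degrees(math.atan2(x1 - x0, y1 - y0)) % 360.0


def bearing_difference(counter_deg, edge_deg, bidirectional=False):
    """Smallest angle between two bearings, in degrees.

    Directional: result lies in [0, 180].
    Bidirectional: opposite bearings collapse, so the result lies in [0, 90].
    """
    diff = abs((counter_deg - edge_deg + 180.0) % 360.0 - 180.0)
    if bidirectional:
        diff = min(diff, 180.0 - diff)
    return diff


east = segment_bearing(0.0, 0.0, 1.0, 0.0)   # edge pointing east
west = segment_bearing(1.0, 0.0, 0.0, 0.0)   # the reverse edge

tol = 20.0
matches_east = bearing_difference(90.0, east) <= tol
matches_west = bearing_difference(90.0, west) <= tol
matches_west_undirected = bearing_difference(90.0, west, bidirectional=True) <= tol
```

A counter at 90° is accepted by the eastbound edge and rejected by the westbound one in directional mode, while bidirectional mode accepts both.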

Parameters:
  • counters (GeoDataFrame) – GeoDataFrame of point geometries with a bearing_column (degrees, OSM/north-clockwise convention). Same CRS as the graph node coordinates.

  • graph (MultiDiGraph) – routable nx graph. Edge attributes must include geometry (LineString) and whatever eligible_edges reads.

  • search_radius (float | Series) – max cartesian distance for candidate edges (CRS units). Pass a scalar for one global radius or a pd.Series aligned to counters.index for per-counter radii (e.g. wider for highway counters which sit further from the carriageway).

  • bearing_tol_deg (float) – max angular difference between counter bearing and local edge bearing at the snap point.

  • bearing_column (str) – counter column holding the directional bearing (default 'bearing_deg').

  • eligible_edges (Callable[[Series, GeoDataFrame], GeoDataFrame] | None) – optional (counter_row, candidate_edges_gdf) -> subset callback. Use to restrict matches by class: typically a highway counter only matches highway edges, and a local counter only matches local edges. Forwarded to geo_mapping.map_points_to_filtered_lines.

  • bidirectional (bool | None) – how to compare bearings. True collapses opposite bearings (counter at 90° matches edges at 90° AND 270°) — correct for undirected graphs where one edge represents both directions of a road. False is directional — correct for directed graphs (the default nx.MultiDiGraph). None auto-detects from graph.is_directed().

Returns:

DataFrame indexed like counters, with columns:

  • u, v, k: matched edge ID (or pd.NA if no acceptable match within radius);

  • snap_dist: cartesian distance counter → edge (or NaN);

  • dist_along: along-edge distance from edge start to nearest point on edge (or NaN).

Unmatched counters get all-NA rows; drop them with result.dropna(subset=['u']).

Return type:

DataFrame

aperta.calibration.evaluate_against_counters(modeled, counters, *, observed_column='traffic_cars')[source]

Compare modeled per-edge AADT against snapped counter observations.

Parameters:
  • modeled (Series) – per-edge modeled AADT, indexed by (u, v, k) tuples (the output of traffic_flows.nested_node_sample + betweenness + AADT scaling).

  • counters (DataFrame) – DataFrame with u, v, k columns (from snap_counters_to_edges) and an observed-AADT column. Rows with NA in u/v/k are dropped (unmatched counters).

  • observed_column (str) – name of the observed-AADT column (default 'traffic_cars', matching the Swiss counter schema).

Returns:

Dict with:

  • r2: squared Pearson correlation between modeled and observed. It is scale-invariant, so use it to pick distribution-shape params (lognormal σ, μ).

  • slope: slope from a no-intercept regression modeled = slope · observed. Tells you how to rescale absolute volumes, e.g. multiply trips_per_person_per_day by 1 / slope to bring the modeled total in line with counters.

  • rmse: root-mean-square error on the matched set, in counter-units (veh/day).

  • n_matched: number of counters used in the comparison.

  • merged: DataFrame with observed, modeled, (u, v, k) for every matched counter, convenient for scatter plots.

Return type:

dict
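For reference, the three summary statistics can be computed directly with numpy. The sketch below assumes the modeled and observed values are already matched row by row; the function name compare_flows and the sample volumes are made up for illustration.

```python
import numpy as np


def compare_flows(modeled, observed):
    """r2, no-intercept slope and RMSE for matched modeled/observed volumes."""
    modeled = np.asarray(modeled, dtype=float)
    observed = np.asarray(observed, dtype=float)
    r = np.corrcoef(modeled, observed)[0, 1]
    # Least-squares slope of modeled = slope * observed (no intercept).
    slope = (observed @ modeled) / (observed @ observed)
    rmse = np.sqrt(np.mean((modeled - observed) ** 2))
    return {"r2": r ** 2, "slope": slope, "rmse": rmse, "n_matched": len(observed)}


observed = np.array([1000.0, 2000.0, 4000.0])   # veh/day at three counters
modeled = 2.0 * observed                         # right shape, volumes twice too high

stats = compare_flows(modeled, observed)
# Rescaling by 1 / slope brings the modeled volumes in line with the counters.
rescaled = compare_flows(modeled / stats["slope"], observed)
```

The example shows why r2 and slope answer different questions: the shape is perfect (r2 of 1) even though absolute volumes are off by a factor of two, and dividing by the slope removes that offset.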