Bayesian Inference¶
Bayesian Statistics¶
Hubble constant posterior evaluation.
BayesianStatistics loads saved Cramér-Rao bounds and orchestrates the
full Hubble-constant posterior evaluation using the real GLADE galaxy catalog,
simulation-based SimulationDetectionProbability,
full Fisher-matrix covariance, and multiprocessing.
Invoked via main.py:evaluate() / --evaluate CLI flag.
Output is written to simulations/posteriors/ as JSON.
- class master_thesis_code.bayesian_inference.bayesian_statistics.BayesianStatistics[source]¶
Bases:
objectHubble constant posterior evaluation.
Loads saved Cramér-Rao bounds from CSV, constructs a simulation-based
SimulationDetectionProbability, builds multivariate-normal GW likelihoods from the full Fisher-matrix covariance, and evaluates per-detection posteriors over an H₀ grid using a multiprocessing pool.Invoked via
main.py:evaluate()(--evaluateCLI flag). Output is written tosimulations/posteriors/as JSON.- cosmological_model: LamCDMScenario¶
- cramer_rao_bounds: DataFrame¶
- evaluate(galaxy_catalog, cosmological_model, h_value, num_workers=None, catalog_only=False, pdet_dl_bins=60, pdet_mass_bins=40, fisher_cond_threshold=1e16)[source]¶
- p_D(galaxy_catalog, redshift_upper_limit, pool, completeness, detection_probability_obj)[source]¶
- Parameters:
galaxy_catalog (GalaxyCatalogueHandler)
redshift_upper_limit (float)
pool (Pool)
completeness (GladeCatalogCompleteness)
detection_probability_obj (SimulationDetectionProbability)
- Return type:
None
- master_thesis_code.bayesian_inference.bayesian_statistics.child_process_init(redshift_lower_limit, redshift_upper_limit, bh_mass_lower_limit, bh_mass_upper_limit, current_detection_probability, current_means_3d, current_cov_inv_3d, current_log_norm_3d, current_means_4d, current_cov_inv_4d, current_log_norm_4d, current_det_index_to_slot, current_sigma2_cond_arr, current_proj_arr, current_det_d_L_arr, current_det_d_L_unc_arr, current_det_M_arr, current_det_phi_arr, current_det_theta_arr, current_D_h_table=None)[source]¶
- Parameters:
redshift_lower_limit (float)
redshift_upper_limit (float)
bh_mass_lower_limit (float)
bh_mass_upper_limit (float)
current_detection_probability (SimulationDetectionProbability)
current_cov_inv_3d (ndarray[tuple[Any, ...], dtype[float64]])
current_log_norm_3d (ndarray[tuple[Any, ...], dtype[float64]])
current_cov_inv_4d (ndarray[tuple[Any, ...], dtype[float64]])
current_log_norm_4d (ndarray[tuple[Any, ...], dtype[float64]])
current_sigma2_cond_arr (ndarray[tuple[Any, ...], dtype[float64]])
current_det_d_L_arr (ndarray[tuple[Any, ...], dtype[float64]])
current_det_d_L_unc_arr (ndarray[tuple[Any, ...], dtype[float64]])
current_det_M_arr (ndarray[tuple[Any, ...], dtype[float64]])
current_det_phi_arr (ndarray[tuple[Any, ...], dtype[float64]])
current_det_theta_arr (ndarray[tuple[Any, ...], dtype[float64]])
- Return type:
None
- master_thesis_code.bayesian_inference.bayesian_statistics.compute_sigma_deviation(sigma, sigma_error, h_mean, h_mean_error)[source]¶
- master_thesis_code.bayesian_inference.bayesian_statistics.precompute_completion_denominator(h_values, detection_probability_obj, Omega_m, Omega_DE, *, quad_n=_DH_QUAD_ORDER)[source]¶
Precompute the completion-term denominator D(h) for each h value.
Gray et al. (2020), arXiv:1908.06050, Eq. A.19: Denominator integrates P_det * dVc/dz over the full detectable volume.
\[D(h) = \int_{z_{\min}}^{z_{\max}(h)} P_{\det}(d_L(z,h)) \frac{dV_c}{dz\,d\Omega}\, dz\]where
z_max(h)is the redshift corresponding to the P_det grid’s maximumd_Lat the given h.- Parameters:
h_values (list[float]) – List of Hubble parameter values to evaluate.
detection_probability_obj (SimulationDetectionProbability) – SimulationDetectionProbability instance (must have
get_dl_maxanddetection_probability_without_bh_mass_interpolated_zero_fill).Omega_m (float) – Matter density parameter.
Omega_DE (float) – Dark energy density parameter.
quad_n (int) – Gauss-Legendre quadrature order (default 100).
- Returns:
Dict mapping h -> D(h) in units of Mpc^3/sr.
- Return type:
- master_thesis_code.bayesian_inference.bayesian_statistics.single_host_likelihood(host_phiS, host_qS, host_z, host_z_error, host_M, host_M_error, detection_index, h, evaluate_with_bh_mass)[source]¶
- master_thesis_code.bayesian_inference.bayesian_statistics.single_host_likelihood_grid(possible_host, detection, detection_index, h, evaluate_with_bh_mass)[source]¶
Simulation Detection Probability¶
Simulation-based detection probability from injection campaign data.
Replaces DetectionProbability (KDE-based) with a histogram-binned
approach that loads raw injection CSVs, applies an SNR threshold at evaluation
time, and builds P_det grids via SNR rescaling.
All injection data is pooled regardless of the Hubble parameter value used during the injection campaign. When querying P_det at a target h value, each event’s SNR is rescaled using the exact relation:
SNR(h_target) = SNR_raw * d_L(z, h_inj) / d_L(z, h_target)
This is exact because the GW strain amplitude scales as 1/d_L, while the waveform frequency content depends only on source-frame parameters (which are h-independent). See Gray et al. (2020) arXiv:1908.06050 Section III.B-C and Laghi et al. (2021) arXiv:2102.01708 Section III.A.
Grids are built in (d_L, M) space so that query-time lookups avoid the
expensive dist_to_redshift numerical inversion (fsolve).
- class master_thesis_code.bayesian_inference.simulation_detection_probability.SimulationDetectionProbability(injection_data_dir, snr_threshold, h_grid=None, *, dl_bins=_DEFAULT_DL_BINS, mass_bins=_DEFAULT_M_BINS, h_prior_range=(_DEFAULT_H_PRIOR_MIN, _DEFAULT_H_PRIOR_MAX), bandwidth_scale=1.0, _force_unit_weights=False)[source]¶
Bases:
objectSimulation-based detection probability from injection campaign data.
Loads raw injection CSVs (z, M, phiS, qS, SNR, h_inj, luminosity_distance), pools ALL events regardless of h_inj, and builds P_det grids on-the-fly via SNR rescaling at query time.
For a source at redshift z injected at h_inj with measured SNR_raw, the rescaled SNR at target h is:
SNR(h) = SNR_raw * d_L(z, h_inj) / d_L(z, h)
This is exact because h(t) ~ 1/d_L for gravitational wave strain, while the waveform shape depends only on source-frame parameters.
Grids are cached with LRU eviction (max 20 entries) for performance.
- Parameters:
injection_data_dir (str) – Directory containing injection CSV files matching
injection_h_*_task_*.csvorinjection_h_*.csv.snr_threshold (float) – SNR threshold for detection. Events with SNR >= snr_threshold are considered detected.
h_grid (list[float] | None) – Deprecated. Previously used to specify h grid points for pre-computed grids. Now ignored (grids are built on-the-fly via SNR rescaling). Passing this parameter emits a deprecation warning.
_force_unit_weights (bool) – Internal flag for testing. When True, passes explicit
weights=np.ones(N)to_build_grid_2dto verify IS estimator backward compatibility.dl_bins (int)
mass_bins (int)
bandwidth_scale (float)
References
Gray et al. (2020), arXiv:1908.06050, Section III.B-C. Laghi et al. (2021), arXiv:2102.01708, Section III.A. SNR ~ 1/d_L: Hogg (1999), arXiv:astro-ph/9905116, Eq. (16).
- detection_probability_with_bh_mass_interpolated(d_L, M_z, phi, theta, *, h)[source]¶
Detection probability including BH mass dependence.
Sky angles (phi, theta) are accepted for API compatibility but are marginalized over internally (D-02).
The grid is in (d_L, M_z) space — observer-frame M_z on both the grid axis and the query argument. See
_build_grid_2dfor the coordinate convention;docs/H0_BIAS_RESOLUTION.md§3.15 for the history of the convention fix.Out-of-grid policy: principled monotonic-asymptotic extrapolation.
For queries inside the (d_L, M) grid, returns scipy’s bilinear interpolation (clipped to [0, 1]). For queries outside the grid, applies a slope-matched linear extrapolation from the nearest face, clamped per the physical asymptote table:
Direction
Asymptote
Reason
d_L > d_L_max
0
SNR-suppressed (sources too distant for SNR ≥ threshold)
d_L < d_L_min
1
SNR-saturated (very nearby sources detected with prob 1)
M_z > M_z_max
0
EMRI rate / waveform model breakdown above the EMRI mass
M_z < M_z_min
0
SNR-suppressed and/or rate cutoff at low M
corner (both axes outside)
min of the two faces
“Either axis suppresses” — the only saturating face is (d_L<min, M in grid), which is NOT a corner
Construction. Two cases.
Saturating face (d_L<d_L_min): there is a unique natural scale d_L=0 where the asymptote p_det=1 is exact (no source closer than the observer). Linearly interpolate from (dl_min, p_edge) to (0, 1) — C0 continuous at dl_min, reaches the asymptote at the natural physical boundary, and uses no fitted parameters or noisy boundary slope estimates. Explicitly:
p(dl) = 1 - (1 - p_edge) * (dl / dl_min)fordl ∈ [0, dl_min].Suppressing faces (d_L>d_L_max, M_z>M_max, M_z<M_min): no analogous finite asymptote scale exists (suppression is at infinity). Use slope-matched linear extrapolation from the boundary, computed from the last two grid centers along the relevant axis with the slope evaluated at the projected position on the other axis.
p_extrap = p_edge + slope · (query - edge), clamped to[0, p_edge](Option A: never overshoots boundary value, floor at the asymptote 0).Corner cells (both axes outside) take the
minof the two face extrapolations. The only saturating face is (d_L<d_L_min, M in grid), which is not a corner; all true corners involve at least one suppressing axis, sominis monotone-correct.Properties.
C0 continuous at the boundary by construction (extrapolation formula evaluates to p_edge at the boundary, matching the in-grid interpolation). Removes the discontinuity that previously drove spurious h_trial-dependence in the joint posterior as hosts crossed the grid boundary at small Δh.
Bounded in [0, 1] by construction (clamp + asymptote table). The previous policy (raw scipy linear extrapolation + clip) could drift to negative values at the d_L→0 boundary due to KDE noise in the low-d_L bins (only ~7 injections per bin in production), systematically returning ≈0 instead of ≈1 for 6–12% of events. See
.planning/2D-CHANNEL-AUDIT-20260505.mdStep 1b for the quantitative diagnostic.Slope from the simulated KDE (no fitted anchor): the boundary slope is the same one scipy would use for its own linear extrapolation; the difference from the prior policy is the directional clamp (Option A) that prevents wrong-direction KDE noise from running into the wrong asymptote.
- Parameters:
d_L (float | ndarray[tuple[Any, ...], dtype[float64]]) – Luminosity distance in Gpc.
M_z (float | ndarray[tuple[Any, ...], dtype[float64]]) – Observer-frame (redshifted) BH mass in solar masses.
phi (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle phi (unused, marginalized over).
theta (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle theta (unused, marginalized over).
h (float) – Dimensionless Hubble parameter.
- Returns:
Detection probability in [0, 1].
- Return type:
References
Gray et al. (2020), arXiv:1908.06050, Eq. (8). Laghi et al. (2021), arXiv:2102.01708, Section III.A. Maggiore (2008), Gravitational Waves Vol 1 §7.7 (SNR scaling for inspirals → monotonicity argument). Babak et al. (2017), arXiv:1703.09722 §III (EMRI rate density with high/low M cutoffs → asymptote table).
- detection_probability_without_bh_mass_interpolated(d_L, phi, theta, *, h)[source]¶
Detection probability marginalized over BH mass.
Drop-in replacement for
DetectionProbability.detection_probability_without_bh_mass_interpolatedwith an additionalhkeyword for h-dependent P_det.Sky angles (phi, theta) are accepted for API compatibility but are marginalized over internally (D-02).
- Parameters:
d_L (float | ndarray[tuple[Any, ...], dtype[float64]]) – Luminosity distance in Gpc.
phi (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle phi (unused, marginalized over).
theta (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle theta (unused, marginalized over).
h (float) – Dimensionless Hubble parameter.
- Returns:
Detection probability in [0, 1].
- Return type:
References
Gray et al. (2020), arXiv:1908.06050, Eq. (8). Laghi et al. (2021), arXiv:2102.01708, Section III.A.
- detection_probability_without_bh_mass_interpolated_zero_fill(d_L, phi, theta, *, h)[source]¶
Detection probability with principled out-of-grid extrapolation.
Mirror of
detection_probability_with_bh_mass_interpolated()(2D channel) specialized to the 1D (d_L only) grid. Call sites inbayesian_statistics:precompute_completion_denominator(D(h) full-volume denominator)p_Di.completion_numerator_integrand(L_comp 4σ window)single_host_likelihood.numerator_integrant_without_bh_mass(L_cat numerator)single_host_likelihood.denominator_integrant_without_bh_mass(L_cat denominator)single_host_likelihood_integration_testing(legacy, two integrands)
Out-of-grid policy: principled monotonic-asymptotic extrapolation.
Direction
Asymptote
Reason
d_L > d_L_max
0
SNR-suppressed (sources too distant for SNR ≥ threshold)
d_L < d_L_min
1
SNR-saturated (very nearby sources detected with prob 1)
Construction.
Saturating face (d_L < d_L_min): linear bridge from
(d_L_min, p_edge)to(0, 1). The d_L=0 limit is the natural physical scale where the asymptote p_det=1 is exact (no source closer than the observer). Explicitly:p(dl) = 1 - (1 - p_edge) * (dl / d_L_min)fordl ∈ [0, d_L_min]. C0 continuous at d_L_min by construction; reaches the asymptote 1 at d_L=0 by construction. Uses no fitted constants and no boundary KDE slope (the first bin has ~7 injections in production; the bridge construction is insensitive to that noise).Suppressing face (d_L > d_L_max): slope-matched linear extrapolation from the boundary, computed from the last two grid centers along d_L.
p_extrap = p_edge + slope · (query - edge), clamped to[0, p_edge](Option A: never exceeds boundary, asymptotic floor at 0).
Why this replaces the Phase 45 anchor scheme (2026-05-05): the previous scheme prepended two empirical anchors at d_L=0 (Wilson 95% LB = 0.7931) and d_L=0.05 (empirical point estimate = 1.0). The Wilson LB was deliberately chosen to “not overshoot truth on production posteriors” — fitted to truth, against the project’s principled-modeling preference. The augmented Phase 46 injection campaign now gives p̂(c_0) ≈ 1.0 at the first bin, so the Wilson anchor is actively suppressing the empirical 1.0 down to 0.7931 — the opposite of its original lift purpose. The bridge construction here is the same scheme as the boundary (linear, C0) extended to the natural asymptote location; no fit, no anchor. See
.planning/2D-CHANNEL-AUDIT-20260505.mdfor the full rationale and the 2D-channel sibling that motivated this alignment.- Parameters:
d_L (float | ndarray[tuple[Any, ...], dtype[float64]]) – Luminosity distance in Gpc.
phi (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle phi (unused, marginalized over).
theta (float | ndarray[tuple[Any, ...], dtype[float64]]) – Sky angle theta (unused, marginalized over).
h (float) – Dimensionless Hubble parameter.
- Returns:
Detection probability in [0, 1].
- Return type:
References
Gray et al. (2020), arXiv:1908.06050, Eq. (A.19). Maggiore (2008), Gravitational Waves Vol 1 §7.7 (SNR scaling for inspirals → monotonicity argument).
- get_dl_max(h)[source]¶
Return the maximum d_L of the 1D P_det grid for the given h value.
This is
max(injection_d_L) * 1.1, i.e. the upper edge of the 1D histogram used by_build_grid_1d(). Needed to computez_max(h)for the full-volume denominator integral.
- quality_flags(h)[source]¶
Return per-bin quality metadata for the given h value.
Quality flags are diagnostic metadata stored during grid construction. They do not affect the P_det interpolation result.
If the grid for this h has not been built yet, it will be built (triggering SNR rescaling and caching).
The returned dict contains (F4 kernel-form semantics):
n_total: float array (dl_bins, M_bins) – kernel-weighted mass (Σ_k K_k · w_IS_k) per cell. Continuous “effective total”; replaces the pre-F4 integer histogram counts.n_detected: float array (dl_bins, M_bins) – kernel-weighted detected mass (Σ_k K_k · w_IS_k · 1[SNR_k≥thr]) per cell.reliable: bool array (dl_bins, M_bins) – True where n_eff >= 10 (Kish effective sample size threshold).dl_edges: float array (dl_bins+1,) – d_L grid support endpoints in Gpc (kernel evaluated at centers; edges retained for back-compat).M_edges: float array (M_bins+1,) – M grid support endpoints.n_eff: float array (dl_bins, M_bins) – Kish effective sample size per cell, (Σ w_k_eff)² / Σ (w_k_eff)² with w_k_eff = K_k · w_IS_k.
Posterior Combination¶
Posterior combination module for merging per-event likelihoods.
Combines per-h-value posterior JSON files from the evaluation campaign into a joint posterior over the Hubble constant, with multiple zero-handling strategies and diagnostic reporting.
Four zero-handling strategies:
- naive: Replace 0.0 with np.finfo(float).tiny before log.
- exclude: Remove events that have any zero likelihood.
- per-event-floor: Replace 0.0 with min(nonzero) / 100 per event.
- physics-floor: Per-event minimum nonzero likelihood as floor value; all-zero events excluded.
- class master_thesis_code.bayesian_inference.posterior_combination.CombinationStrategy(*values)[source]¶
Bases:
StrEnumZero-handling strategy for posterior combination.
- EXCLUDE = 'exclude'¶
- NAIVE = 'naive'¶
- PER_EVENT_FLOOR = 'per-event-floor'¶
- PHYSICS_FLOOR = 'physics-floor'¶
- master_thesis_code.bayesian_inference.posterior_combination.apply_strategy(likelihoods, strategy)[source]¶
Apply a zero-handling strategy to the likelihood array.
Parameters¶
- likelihoodsndarray of shape
(n_events, n_h_values) Raw likelihood array (may contain 0.0 entries).
- strategyCombinationStrategy
Which strategy to apply.
Returns¶
- processedndarray
Likelihood array with zeros handled.
- excluded_countint
Number of events removed (0 for strategies that keep all events).
- likelihoodsndarray of shape
- master_thesis_code.bayesian_inference.posterior_combination.build_likelihood_array(h_values, event_likelihoods)[source]¶
Build a 2-D likelihood array from the nested dict structure.
Parameters¶
- h_valueslist[float]
Sorted h-values (columns).
- event_likelihoodsdict[int, dict[float, float]]
Nested dict from
load_posterior_jsons().
Returns¶
- arrayndarray of shape
(n_events, n_h_values) Likelihood values.
NaNwhere an event is missing a particular h-value;0.0where the likelihood was explicitly zero.- detection_indiceslist[int]
Sorted detection indices (row labels).
- master_thesis_code.bayesian_inference.posterior_combination.combine_log_space(likelihoods, log_D_h=None, n_events_used=0)[source]¶
Combine per-event likelihoods into a joint posterior using log-space.
The per-event likelihoods \(L_i(h)\) are already conditional on detection:
L_comp = num/D(h)(Gray et al. 2020 Eq. 31, whereD(h)plays the role of the prior normalization forp_galaxy ∝ p_det · dV_c) andL_cat = (1/N_g) Σ_g (N_g_local/D_g_local)(per-galaxy detection-conditional expectation). The joint posterior is therefore justΠ_i L_i(h)with no additionalβ(h)^Nselection correction (Loredo 2004; Mandel et al. 2019, arXiv:1809.02063 §3): conditioning on the observed N_detected makes the β^N factor in the unconditional likelihood cancel against the Poisson rate prior.The
log_D_handn_events_usedarguments are retained for backward compatibility with older callers, but are now ignored — applying−N · log D(h)here would double-count the selection correction already inside each per-eventL_comp.Parameters¶
- likelihoodsndarray of shape
(n_events, n_h_values) Likelihood array with zeros already handled (all values > 0).
- log_D_hndarray of shape
(n_h_values,)or None Ignored. Kept for backward compatibility; pass
Nonefor new callers. Phase 43-H1 commit2853c32introduced an−N · log D(h)subtraction here that we have since shown (Tier 3 audit, 2026-05-04) over-applies the selection correction.- n_events_usedint
Ignored. Kept for backward compatibility.
Returns¶
- posteriorndarray of shape
(n_h_values,) Normalized joint posterior.
- likelihoodsndarray of shape
- master_thesis_code.bayesian_inference.posterior_combination.combine_posteriors(posteriors_dir, strategy, output_dir, d_h_table=None)[source]¶
Combine per-event posteriors into a joint posterior.
This is the main entry point called from CLI.
Parameters¶
- posteriors_dirstr
Path to directory containing
h_*.jsonfiles.- strategystr
Zero-handling strategy name (one of the
CombinationStrategyvalues).- output_dirstr
Path to write output files.
- d_h_tabledict[float, float] or None
Pre-computed
{h: D(h)}mapping (Gray et al. 2020, Eq. A.19). WhenNone,D(h)is computed automatically usingprecompute_completion_denominator().
Returns¶
- resultdict
Combined posterior result with keys
h_values,posterior,strategy,n_events_total,n_events_used,n_events_excluded,n_events_empty,map_h,map_posterior,variant,D_h_per_h.
- master_thesis_code.bayesian_inference.posterior_combination.compute_canonical_combined_posterior(posteriors_dir)[source]¶
Compute the canonical joint H0 posterior from per-h JSONs.
Aggregates per-event log-likelihoods via raw
Σ log L_i(no physics-floor, no outer D(h) correction — the Tier 3 fix removed the latter fromcombine_log_space). This is the combination quoted in Phase 48 verdict JSON and throughoutdocs/H0_BIAS_RESOLUTION.md.Parameters¶
- posteriors_dir:
Directory of
h_*.jsonfiles (1D or 2D variant).
Returns¶
- dict with keys
h_values: list[float] — sorted h-gridposterior: list[float] — peak-normalised posteriorlog_posterior: list[float] — un-normalised Σ log L_in_events_used: int — events surviving the full-mask filterdiscrete_map: float — argmax on the discrete gridcontinuous_map: float — parabolic-refined sub-grid MAPstrategy: literal string"raw-sum-log"
- master_thesis_code.bayesian_inference.posterior_combination.generate_comparison_table(h_values, likelihoods, detection_indices, variant, log_D_h=None)[source]¶
Generate a markdown comparison table for all strategies.
Parameters¶
- h_valuesndarray
Sorted h-values.
- likelihoodsndarray of shape
(n_events, n_h_values) Raw likelihood array.
- detection_indiceslist[int]
Detection indices.
- variantstr
Name of the posterior variant (e.g.
"posteriors").- log_D_hndarray of shape
(n_h_values,)or None Ignored (kept for backward compatibility). See
combine_log_spacedocstring — D(h) enters via L_comp = num/D inside each per-event likelihood (Gray Eq. 31), not as an outer correction.
Returns¶
- tablestr
Markdown-formatted comparison table.
- master_thesis_code.bayesian_inference.posterior_combination.generate_diagnostic_report(h_values, likelihoods, detection_indices)[source]¶
Generate a markdown diagnostic report about zero-likelihood events.
Parameters¶
- h_valueslist[float]
Sorted h-values.
- likelihoodsndarray of shape
(n_events, n_h_values) Raw likelihood array (before strategy application).
- detection_indiceslist[int]
Detection indices corresponding to rows.
Returns¶
- reportstr
Markdown-formatted diagnostic report.
- master_thesis_code.bayesian_inference.posterior_combination.load_per_h_likelihoods(directory)[source]¶
Load per-event log-likelihoods across all h-value JSONs in directory.
Reads every
h_*.jsonfile, extracts the scalar per-event likelihood for each integer event key, drops events that have missing data at any h-value (full_maskfilter), and returns(h_values, log_L)wherelog_Lhas shape(n_events_full, n_h_values).This is the reference implementation used by the bias-investigation suite and is the canonical loader for all paper-grade H0 posterior figures.
Parameters¶
- directory:
Path to a directory containing
h_*.jsonfiles (typicallyposteriors/orposteriors_with_bh_mass/).
Returns¶
- h_values:
Sorted list of h-values (one per JSON file).
- log_L:
Float64 array of shape
(n_events, n_h_values); entries arelog(max(L_event_at_h, 1e-300)). Events with NaN at any h are excluded.
- master_thesis_code.bayesian_inference.posterior_combination.load_posterior_jsons(posteriors_dir)[source]¶
Load per-h-value posterior JSON files from a directory.
Parameters¶
- posteriors_dirPath
Directory containing
h_*.jsonfiles.
Returns¶
- h_valueslist[float]
Sorted list of h values found across all files.
- event_likelihoodsdict[int, dict[float, float]]
Nested dict mapping
detection_index -> {h_value: likelihood}. Events with empty lists (missing data) are excluded.
- master_thesis_code.bayesian_inference.posterior_combination.parabolic_refine_map(h_grid, log_posterior)[source]¶
Refine the discrete MAP via 3-point parabolic interpolation.
Given a log-posterior on a discrete h-grid, fits a parabola through the discrete argmax and its two neighbours and returns the analytic vertex of the parabola. Falls back to the discrete argmax when the peak is on the grid boundary or the parabola degenerates.
Parameters¶
- h_grid:
Monotonically increasing h-values.
- log_posterior:
Log-posterior values on
h_grid.
Returns¶
- float
Sub-grid continuous MAP estimate.
Evaluation Report¶
Evaluation report module for baseline extraction and comparison.
Provides tools to extract a baseline H0 posterior snapshot from an h-sweep, compute credible intervals, and generate before/after comparison reports.
Used by Phases 31-34 to measure the effect of each fix on the H0 posterior.
- class master_thesis_code.bayesian_inference.evaluation_report.BaselineSnapshot(map_h, ci_lower, ci_upper, ci_width, bias_percent, n_events, h_values=<factory>, log_posteriors=<factory>, per_event_summaries=<factory>, n_excluded_fisher=0, median_cond_3d=0.0, median_cond_4d=0.0, created_at=<factory>, git_commit=<factory>)[source]¶
Bases:
objectSnapshot of H0 posterior metrics extracted from a posteriors directory.
- Parameters:
map_h (float) – MAP (maximum a posteriori) Hubble constant value.
ci_lower (float) – Lower bound of the 68% credible interval.
ci_upper (float) – Upper bound of the 68% credible interval.
ci_width (float) – Width of the 68% credible interval (ci_upper - ci_lower).
bias_percent (float) – Relative bias as (MAP - true_h) / true_h * 100.
n_events (int) – Number of detection events contributing to the posterior.
h_values (list[float]) – Sorted list of h values in the sweep.
log_posteriors (list[float]) – Log-posterior values at each h.
per_event_summaries (list[dict[str, float]]) – Per-event diagnostic data (d_L, SNR, etc.).
created_at (str) – ISO 8601 timestamp of when this snapshot was created.
git_commit (str) – Git commit hash at time of creation.
n_excluded_fisher (int)
median_cond_3d (float)
median_cond_4d (float)
- master_thesis_code.bayesian_inference.evaluation_report.compute_credible_interval(h_values, log_posteriors, level=0.68)[source]¶
Compute a credible interval from log-posterior values.
Converts log-posteriors to a normalized probability distribution, then computes the CDF and finds where it crosses (1-level)/2 and (1+level)/2.
- master_thesis_code.bayesian_inference.evaluation_report.extract_baseline(posteriors_dir, crb_csv_path=None, true_h=0.73)[source]¶
Extract baseline H0 posterior metrics from a posteriors directory.
Warning
This function does NOT apply the
-N log D(h)selection-effect normalization from Gray et al. (2020) arXiv:1908.06050 Eq. A.19. The MAP returned here is systematically biased towardh_maxin standard ΛCDM cosmology (where D(h) grows with h), because the log-posteriorΣ_i log L_i(h)is a monotone-increasing function of h without the D(h) correction.Use
--evaluate(BayesianStatistics.evaluate) for production MAP reporting.extract_baselineis diagnostic-only — suitable for comparing relative shifts between h-sweeps run under identical conditions, but not for absolute MAP recovery.- Parameters:
- Returns:
BaselineSnapshot with MAP h, 68% CI, bias %, and event count.
- Raises:
ValueError – If fewer than 3 h-value files are found (insufficient for credible interval computation per D-02).
- Return type:
- master_thesis_code.bayesian_inference.evaluation_report.generate_comparison_report(baseline, current, output_dir, label='current')[source]¶
Generate a comparison report between a baseline and current posterior snapshot.
Writes both a human-readable Markdown report and a machine-readable JSON sidecar.
- Parameters:
baseline (BaselineSnapshot) – The reference (pre-change) BaselineSnapshot.
current (BaselineSnapshot) – The current (post-change) BaselineSnapshot.
output_dir (Path) – Directory to write output files into (created if needed).
label (str) – Label suffix for output file names (default “current”).
- Returns:
Path to the generated Markdown report.
- Return type:
- master_thesis_code.bayesian_inference.evaluation_report.generate_diagnostic_summary(diagnostic_csv_path, output_dir, label='diagnostic')[source]¶
Analyze per-event diagnostic CSV and generate explanatory summary.
Reads the diagnostic CSV produced by BayesianStatistics.evaluate() and computes statistics explaining WHY the posterior bias changes: - Mean/median f_i across events (catalog completeness fraction) - L_comp contribution statistics - Fraction of events where L_comp pulls toward lower h
The “L_comp pulls toward lower h” metric compares L_comp at h=0.66 vs h=0.73 per event. If L_comp(h=0.66) > L_comp(h=0.73), the completion term biases that event toward lower h.
- master_thesis_code.bayesian_inference.evaluation_report.load_posteriors(posteriors_dir)[source]¶
Load h_*.json posterior files from a directory.
Each file should contain a dict with key “h” (float) and integer-string keys mapping to [likelihood_value] arrays for each detection event.
- Parameters:
posteriors_dir (Path) – Path to directory containing h_*.json files.
- Returns:
“h”, “log_posterior”, “n_detections”. Sorted by h value in ascending order.
- Return type:
List of dicts with keys