Utilities

Causal Inference Helpers

bayesgm.utils.helpers.get_ADRF(x_values=None, x_min=None, x_max=None, nb_intervals=None, dataset='Imbens')

Compute the values of the Average Dose-Response Function (ADRF).

Parameters:
  • x_values (list or np.ndarray, optional) – A list or array of values at which to evaluate the ADRF. If provided, overrides x_min, x_max, and nb_intervals.

  • x_min (float, optional) – The minimum value of the range (used when x_values is not provided).

  • x_max (float, optional) – The maximum value of the range (used when x_values is not provided).

  • nb_intervals (int, optional) – The number of intervals in the range (used when x_values is not provided).

  • dataset (str, optional) – The dataset name (default: ‘Imbens’). Must be one of {‘Imbens’, ‘Sun’, ‘Lee’}.

Returns:

true_values – The computed ADRF values.

Return type:

np.ndarray

Notes

  • Either x_values or (x_min, x_max, nb_intervals) must be provided.

  • Supported datasets:
    • ‘Imbens’: ADRF = x + 2 / (1 + x)^3

    • ‘Sun’: ADRF = x - 1/2 + exp(-0.5) + 1

    • ‘Lee’: ADRF = 1.2 * x + x^3
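The closed-form expressions in the Notes can be checked with a small standalone sketch. The function name adrf_sketch is hypothetical and is not part of bayesgm; it only re-evaluates the three formulas listed above. It takes explicit evaluation points, because this page does not say how get_ADRF builds its grid from x_min, x_max, and nb_intervals.

```python
import numpy as np


def adrf_sketch(x_values, dataset="Imbens"):
    """Evaluate the ADRF formulas from the Notes (hypothetical helper, not the library code)."""
    x = np.asarray(x_values, dtype=float)
    if dataset == "Imbens":
        return x + 2.0 / (1.0 + x) ** 3
    if dataset == "Sun":
        return x - 0.5 + np.exp(-0.5) + 1.0
    if dataset == "Lee":
        return 1.2 * x + x ** 3
    raise ValueError(f"Unknown dataset: {dataset!r}")


# Evaluate the 'Lee' curve on an explicit grid of dose values.
grid = np.array([0.0, 1.0, 2.0, 3.0])
print(adrf_sketch(grid, dataset="Lee"))
```

With the library installed, the corresponding call would presumably be get_ADRF(x_values=grid, dataset='Lee').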

bayesgm.utils.helpers.estimate_latent_dims(x, y, v, v_ratio=0.7, z0_dim=3, max_total_dim=64, min_z3_dim=3)

Estimate the latent-dimension split for CausalBGM.

Uses Sliced Inverse Regression (SIR) and PCA to automatically choose dimensions [z0, z1, z2, z3] for the four latent sub-vectors.

Parameters:
  • x (np.ndarray) – Treatment variable with shape (n, 1).

  • y (np.ndarray) – Outcome variable with shape (n, 1).

  • v (np.ndarray) – Covariates with shape (n, v_dim).

  • v_ratio (float, default=0.7) – Cumulative PCA variance ratio used to determine total latent dimension.

  • z0_dim (int, default=3) – Fixed dimension for the confounding sub-vector \(Z_0\).

  • max_total_dim (int, default=64) – Upper bound on the total latent dimension.

  • min_z3_dim (int, default=3) – Minimum dimension for the residual sub-vector \(Z_3\).

Returns:

A list [z0_dim, z1_dim, z2_dim, z3_dim].

Return type:

list of int
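To illustrate the role of v_ratio and max_total_dim, the sketch below picks a total dimension from the cumulative PCA variance ratio of the covariates. It is an assumption about how such a threshold is typically applied, not the library's implementation: the name total_dim_from_pca is hypothetical, and the SIR-based split into z1, z2, and z3 is not reproduced.

```python
import numpy as np


def total_dim_from_pca(v, v_ratio=0.7, max_total_dim=64):
    """Smallest number of principal components whose cumulative explained
    variance ratio reaches v_ratio, capped at max_total_dim (hypothetical helper)."""
    v = np.asarray(v, dtype=float)
    centered = v - v.mean(axis=0, keepdims=True)
    # Squared singular values of the centered data are proportional
    # to the variance explained by each principal component.
    singular_values = np.linalg.svd(centered, compute_uv=False)
    explained = singular_values ** 2
    cumulative = np.cumsum(explained) / explained.sum()
    # First position where the cumulative ratio is >= v_ratio, as a count.
    k = int(np.searchsorted(cumulative, v_ratio)) + 1
    k = min(k, len(cumulative))  # guard against floating-point round-off
    return min(k, max_total_dim)


rng = np.random.default_rng(0)
covariates = rng.normal(size=(200, 10))
print(total_dim_from_pca(covariates, v_ratio=0.7, max_total_dim=64))
```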

Image Helpers

bayesgm.utils.helpers.mnist_mask_indices(shape=(28, 28), mode='hole', center=(14, 14), num_holes=1, hole_size=3, orientation='horizontal', stripe_width=4, stripe_pos=14, seed=None)

Create pixel masks on a 2D grid and return flattened index arrays.

Parameters:
  • shape (tuple (H, W)) – Image height and width.

  • mode (str) –

    One of:
    • ‘hole’ : mask a square hole of size hole_size × hole_size.

    • ‘edge_stripe’ : mask a stripe along an edge; set orientation, stripe_pos, and stripe_width.

    • ‘upper_half’ : mask rows [0 : H//2)

    • ‘lower_half’ : mask rows [H//2 : H)

    • ‘left_half’ : mask cols [0 : W//2)

    • ‘right_half’ : mask cols [W//2 : W)

  • center (tuple (row, col)) – Center of the hole when mode=‘hole’.

  • hole_size (int) – Side length of each square hole (odd is best).

  • orientation (str) – Stripe direction for mode=‘edge_stripe’. ‘horizontal’ masks a horizontal stripe; ‘vertical’ masks a vertical stripe.

  • stripe_width (int) – Stripe thickness in pixels (for edge stripes).

  • stripe_pos (int) – Position of the stripe when mode=‘edge_stripe’.

  • seed (int or None) – RNG seed for reproducibility.

Returns:

  • ind_x1 (np.ndarray (1D, dtype=int)) – Flattened indices of unmasked pixels.

  • ind_x2 (np.ndarray (1D, dtype=int)) – Flattened indices of masked pixels.
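The relationship between a 2D mask and the two flattened index arrays can be sketched as follows. This standalone example covers only a single square hole. The name hole_mask_indices_sketch is hypothetical, and both the row-major flattening and the placement of the hole (centered, extending hole_size // 2 pixels on each side) are assumptions rather than documented behavior.

```python
import numpy as np


def hole_mask_indices_sketch(shape=(28, 28), center=(14, 14), hole_size=3):
    """Return (unmasked, masked) flattened indices for one square hole
    (hypothetical helper; assumes row-major flattening)."""
    height, width = shape
    half = hole_size // 2
    mask = np.zeros((height, width), dtype=bool)
    # Top-left corner of the hole, clipped to the image bounds.
    row0 = center[0] - half
    col0 = center[1] - half
    rows = slice(max(row0, 0), min(row0 + hole_size, height))
    cols = slice(max(col0, 0), min(col0 + hole_size, width))
    mask[rows, cols] = True
    flat = mask.ravel()  # row-major: index = row * width + col
    ind_x1 = np.flatnonzero(~flat)  # unmasked pixels
    ind_x2 = np.flatnonzero(flat)   # masked pixels
    return ind_x1, ind_x2


ind_x1, ind_x2 = hole_mask_indices_sketch()
print(len(ind_x1), len(ind_x2))
```

The two arrays partition the flattened image, so a flattened image array img can be split as img[ind_x1] (observed pixels) and img[ind_x2] (pixels to be imputed).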

Data I/O

bayesgm.utils.data_io.save_data(fname, data, delimiter='\t')

Save the data to the specified path.

Parameters:
  • fname (str) – The file name or path where the data will be saved.

  • data (np.ndarray) – The data to save.

  • delimiter (str, optional) – The delimiter for saving .txt or .csv files (default: ‘\t’).

Raises:

ValueError – If the file extension is not recognized.
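The extension-based behavior described above can be sketched with NumPy alone. This is an illustration rather than the library's code: the name save_array_sketch is hypothetical, and handling of .npy is an assumption, since this page only mentions .txt and .csv explicitly.

```python
import os

import numpy as np


def save_array_sketch(fname, data, delimiter="\t"):
    """Save an array, choosing the writer from the file extension (hypothetical helper)."""
    ext = os.path.splitext(fname)[1].lower()
    if ext in (".txt", ".csv"):
        # Text formats honor the delimiter argument.
        np.savetxt(fname, data, delimiter=delimiter)
    elif ext == ".npy":
        # Assumption: binary NumPy format is also accepted.
        np.save(fname, data)
    else:
        raise ValueError(f"Unrecognized file extension: {ext!r}")
```

Raising ValueError for unknown extensions, as documented for save_data, surfaces a typo in the file name immediately instead of silently writing a file in an unexpected format.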

bayesgm.utils.data_io.parse_file(path, sep='\t', header=0, normalize=True)

Parse an input data file and return a single data matrix.

This is a general-purpose loader for the BGM model, where the input is a single data matrix (as opposed to the causal triplet format with treatment, outcome, and covariates).

Parameters:
  • path (str) – Path to the input file. Supported formats: .npz, .csv, .txt.

  • sep (str, optional) – Separator for .csv or .txt files. Default is tab-delimited.

  • header (int or None, optional) – Row number to use as column names in .csv files. Default is 0.

  • normalize (bool, optional) – If True, the data will be normalized using StandardScaler.

Returns:

data – The data matrix with shape (n_samples, n_features), dtype float32.

Return type:

np.ndarray

Examples

>>> data = parse_file("data.csv", sep=',', normalize=True)
>>> data = parse_file("data.npz", normalize=False)

bayesgm.utils.data_io.parse_file_triplet(path, sep='\t', header=0, normalize=True)

Parse an input file and extract the (treatment, outcome, covariates) triplet for CausalBGM model training or evaluation.

Parameters:
  • path (str) – Path to the input file. The file can be in .npz, .csv, or .txt format.

  • sep (str, optional) – Separator used in .csv or .txt files. Defaults to tab-delimited format.

  • header (int or None, optional) – Row number to use as column names in .csv files. Default is 0 (the first row). Use None if the file does not have a header.

  • normalize (bool, optional) – If True, the features in v will be normalized using StandardScaler.

Returns:

  • data_x (np.ndarray) – The treatment variable(s) extracted from the file, reshaped to (-1, 1).

  • data_y (np.ndarray) – The outcome variable(s) extracted from the file, reshaped to (-1, 1).

  • data_v (np.ndarray) – Covariates extracted from the file. Normalized if normalize=True.

Notes

  • Supported file formats:
    • .npz: Numpy compressed files with keys x, y, and v.

    • .csv: Comma-separated value files with treatment, outcome, and covariates as columns.

    • .txt: Tab- or other character-delimited text files with similar structure to .csv.

  • The input file must exist at the specified path.

  • The first column is assumed to be the treatment variable (x).

  • The second column is assumed to be the outcome variable (y).

  • Remaining columns are assumed to be covariates (v).

Examples

>>> # Example for .csv input
>>> data_x, data_y, data_v = parse_file_triplet("data.csv", sep=',', header=0, normalize=True)

>>> # Example for .npz input
>>> data_x, data_y, data_v = parse_file_triplet("data.npz", normalize=False)
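The column convention in the Notes (first column x, second column y, remaining columns v) can be sketched without the library. The name split_triplet_sketch is hypothetical. The normalization shown is a plain NumPy equivalent of zero-mean, unit-variance scaling and is only assumed to match what the function does with StandardScaler.

```python
import io

import numpy as np


def split_triplet_sketch(data, normalize=True):
    """Split a data matrix into (x, y, v) by column position (hypothetical helper)."""
    data = np.asarray(data, dtype=float)
    data_x = data[:, 0].reshape(-1, 1)  # first column: treatment
    data_y = data[:, 1].reshape(-1, 1)  # second column: outcome
    data_v = data[:, 2:]                # remaining columns: covariates
    if normalize:
        mean = data_v.mean(axis=0)
        std = data_v.std(axis=0)
        std[std == 0] = 1.0  # leave constant columns unscaled
        data_v = (data_v - mean) / std
    return data_x, data_y, data_v


# A tiny in-memory CSV with a header row, standing in for a file on disk.
csv_text = "x,y,v1,v2\n1.0,2.0,3.0,4.0\n5.0,6.0,7.0,8.0\n"
table = np.loadtxt(io.StringIO(csv_text), delimiter=",", skiprows=1)
data_x, data_y, data_v = split_triplet_sketch(table, normalize=True)
print(data_x.shape, data_y.shape, data_v.shape)
```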