Validation API

creditriskengine.validation.discrimination

Discriminatory power tests for credit risk models.

Statistical tests measuring a model's ability to separate defaulters from non-defaulters.

Regulatory context:

- US SR 11-7 (Fed, April 2011)
- ECB Guide to Internal Models
- PRA SS1/23
- EBA GL/2017/16

auroc(y_true, y_score)

Area Under the Receiver Operating Characteristic curve.

Uses the Mann-Whitney U-statistic formulation for efficiency.

Parameters:

    y_true (ndarray): Binary labels (1=default, 0=non-default). Required.
    y_score (ndarray): Predicted scores/PDs (higher = riskier). Required.

Returns:

    float: AUROC value in [0, 1].

Source code in creditriskengine/validation/discrimination.py
def auroc(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Area Under the Receiver Operating Characteristic curve.

    Uses the Mann-Whitney U-statistic formulation for efficiency.

    Args:
        y_true: Binary labels (1=default, 0=non-default).
        y_score: Predicted scores/PDs (higher = riskier).

    Returns:
        AUROC value in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_score = np.asarray(y_score, dtype=np.float64)

    n_pos = np.sum(y_true == 1)
    n_neg = np.sum(y_true == 0)

    if n_pos == 0 or n_neg == 0:
        return 0.5

    # Sort by score ascending
    order = np.argsort(y_score)
    y_sorted = y_true[order]

    # Mann-Whitney U: for each positive, count negatives ranked below it
    cum_neg = np.cumsum(1 - y_sorted)
    auc = float(np.sum(cum_neg[y_sorted == 1])) / float(n_pos * n_neg)
    return auc
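
As a sanity check, the Mann-Whitney formulation can be verified by brute force over (defaulter, non-defaulter) pairs. A minimal sketch with made-up data; `auroc_pairwise` is an illustrative helper, not part of the package, and (unlike the sorted implementation above) it gives explicit half-credit to tied scores:

```python
import numpy as np

def auroc_pairwise(y_true, y_score):
    # Brute-force U-statistic: fraction of (defaulter, non-defaulter)
    # pairs where the defaulter receives the higher score, ties counting half.
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    wins = np.sum(pos[:, None] > neg[None, :])
    ties = np.sum(pos[:, None] == neg[None, :])
    return (wins + 0.5 * ties) / (len(pos) * len(neg))

y = np.array([0, 0, 1, 0, 1])
s = np.array([0.1, 0.2, 0.8, 0.4, 0.9])
auc = auroc_pairwise(y, s)  # both defaulters outrank every non-defaulter -> 1.0
```

The pairwise version is O(n_pos * n_neg) and only suitable for small checks; the sorted cumulative-count version above is the efficient equivalent.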

gini_coefficient(y_true, y_score)

Gini coefficient (Accuracy Ratio).

Formula: Gini = 2 * AUC - 1

Standard metric per SR 11-7, ECB Guide.

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores/PDs. Required.

Returns:

    float: Gini coefficient in [-1, 1].

Source code in creditriskengine/validation/discrimination.py
def gini_coefficient(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Gini coefficient (Accuracy Ratio).

    Formula: Gini = 2 * AUC - 1

    Standard metric per SR 11-7, ECB Guide.

    Args:
        y_true: Binary labels.
        y_score: Predicted scores/PDs.

    Returns:
        Gini coefficient in [-1, 1].
    """
    return 2.0 * auroc(y_true, y_score) - 1.0

ks_statistic(y_true, y_score)

Kolmogorov-Smirnov statistic.

Maximum separation between cumulative distributions of defaulters and non-defaulters.

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores/PDs. Required.

Returns:

    float: KS statistic in [0, 1].

Source code in creditriskengine/validation/discrimination.py
def ks_statistic(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Kolmogorov-Smirnov statistic.

    Maximum separation between cumulative distributions
    of defaulters and non-defaulters.

    Args:
        y_true: Binary labels.
        y_score: Predicted scores/PDs.

    Returns:
        KS statistic in [0, 1].
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_score = np.asarray(y_score, dtype=np.float64)

    defaults = y_score[y_true == 1]
    non_defaults = y_score[y_true == 0]

    if len(defaults) == 0 or len(non_defaults) == 0:
        return 0.0

    ks_stat, _ = stats.ks_2samp(defaults, non_defaults)
    return float(ks_stat)
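
For intuition: two well-separated score distributions yield a KS near 1. A sketch on synthetic data (nothing here depends on the package; `scipy.stats.ks_2samp` is the same routine used above):

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
defaults = rng.normal(0.7, 0.1, size=500)      # defaulters score high
non_defaults = rng.normal(0.3, 0.1, size=500)  # non-defaulters score low

# Maximum vertical gap between the two empirical CDFs
ks, p_value = stats.ks_2samp(defaults, non_defaults)
```

With the two means four standard deviations apart, the statistic lands close to 1 and the p-value is effectively zero.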

cap_curve(y_true, y_score)

Cumulative Accuracy Profile (CAP) curve.

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores/PDs. Required.

Returns:

    tuple[ndarray, ndarray]: Tuple of (fraction_of_population, fraction_of_defaults).

Source code in creditriskengine/validation/discrimination.py
def cap_curve(
    y_true: np.ndarray,
    y_score: np.ndarray,
) -> tuple[np.ndarray, np.ndarray]:
    """Cumulative Accuracy Profile (CAP) curve.

    Args:
        y_true: Binary labels.
        y_score: Predicted scores/PDs.

    Returns:
        Tuple of (fraction_of_population, fraction_of_defaults).
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_score = np.asarray(y_score, dtype=np.float64)

    n = len(y_true)
    n_defaults = np.sum(y_true)

    # Sort by score descending (riskiest first)
    order = np.argsort(-y_score)
    y_sorted = y_true[order]

    cum_defaults = np.cumsum(y_sorted)
    frac_pop = np.arange(1, n + 1) / n
    frac_defaults = cum_defaults / n_defaults if n_defaults > 0 else cum_defaults

    return frac_pop, frac_defaults
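
A worked CAP example on ten obligors, mirroring the sort-descending logic above (toy data):

```python
import numpy as np

y = np.array([1, 0, 1, 0, 0, 0, 0, 1, 0, 0])  # 3 defaults among 10 obligors
s = np.array([0.9, 0.1, 0.8, 0.2, 0.3, 0.15, 0.25, 0.7, 0.05, 0.4])

order = np.argsort(-s)                       # riskiest first
y_sorted = y[order]
frac_pop = np.arange(1, len(y) + 1) / len(y)
frac_defaults = np.cumsum(y_sorted) / y.sum()

# The riskiest 30% of the book already contains all three defaults
captured_at_30pct = frac_defaults[2]  # 1.0
```

A perfect model's CAP rises to 1.0 after scanning only the defaulters; a random model tracks the diagonal frac_defaults = frac_pop.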

accuracy_ratio(y_true, y_score)

Accuracy Ratio from CAP curve.

AR = area_under_model_CAP / area_under_perfect_CAP

Equivalent to Gini coefficient when computed correctly.

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores/PDs. Required.

Returns:

    float: Accuracy ratio in [-1, 1].

Source code in creditriskengine/validation/discrimination.py
def accuracy_ratio(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Accuracy Ratio from CAP curve.

    AR = area_under_model_CAP / area_under_perfect_CAP

    Equivalent to Gini coefficient when computed correctly.

    Args:
        y_true: Binary labels.
        y_score: Predicted scores/PDs.

    Returns:
        Accuracy ratio in [-1, 1].
    """
    return gini_coefficient(y_true, y_score)

information_value(feature, target, bins=10)

Information Value (IV) for a single feature.

IV = Sum [(%good_i - %bad_i) * WoE_i]

Interpretation:

- IV < 0.02: useless
- 0.02-0.1: weak
- 0.1-0.3: medium
- 0.3-0.5: strong
- > 0.5: suspicious (possible overfitting)

Parameters:

    feature (ndarray): Feature values. Required.
    target (ndarray): Binary target (1=default, 0=non-default). Required.
    bins (int): Number of bins for continuous features. Default: 10.

Returns:

    float: Information Value.

Source code in creditriskengine/validation/discrimination.py
def information_value(
    feature: np.ndarray,
    target: np.ndarray,
    bins: int = 10,
) -> float:
    """Information Value (IV) for a single feature.

    IV = Sum [(%good_i - %bad_i) * WoE_i]

    Interpretation:
    - IV < 0.02: useless
    - 0.02-0.1: weak
    - 0.1-0.3: medium
    - 0.3-0.5: strong
    - > 0.5: suspicious (possible overfitting)

    Args:
        feature: Feature values.
        target: Binary target (1=default, 0=non-default).
        bins: Number of bins for continuous features.

    Returns:
        Information Value.
    """
    feature = np.asarray(feature, dtype=np.float64)
    target = np.asarray(target, dtype=np.int64)

    total_good = np.sum(target == 0)
    total_bad = np.sum(target == 1)

    if total_good == 0 or total_bad == 0:
        return 0.0

    # Bin the feature
    try:
        bin_edges = np.percentile(feature, np.linspace(0, 100, bins + 1))
        bin_edges = np.unique(bin_edges)
        bin_indices = np.digitize(feature, bin_edges[1:-1])
    except (ValueError, IndexError):
        return 0.0

    iv = 0.0
    for b in range(len(bin_edges) - 1):
        mask = bin_indices == b
        n_good = np.sum((target == 0) & mask)
        n_bad = np.sum((target == 1) & mask)

        pct_good = max(n_good / total_good, 1e-10)
        pct_bad = max(n_bad / total_bad, 1e-10)

        woe = np.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * woe

    return float(iv)
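
A two-bin hand calculation of the IV formula, using illustrative proportions rather than real data:

```python
import numpy as np

# Bin A holds 30% of goods but only 10% of bads;
# bin B holds the remaining 70% of goods and 90% of bads.
pct_good = np.array([0.3, 0.7])
pct_bad = np.array([0.1, 0.9])

woe = np.log(pct_good / pct_bad)                  # weight of evidence per bin
iv = float(np.sum((pct_good - pct_bad) * woe))    # ~0.27 -> "medium" strength
```

Each term is non-negative because (pct_good - pct_bad) and its log-ratio always share a sign, so IV only accumulates separation.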

somers_d(y_true, y_score)

Somers' D statistic.

Equals Gini coefficient when there are no ties.

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores. Required.

Returns:

    float: Somers' D value.

Source code in creditriskengine/validation/discrimination.py
def somers_d(y_true: np.ndarray, y_score: np.ndarray) -> float:
    """Somers' D statistic.

    Equals Gini coefficient when there are no ties.

    Args:
        y_true: Binary labels.
        y_score: Predicted scores.

    Returns:
        Somers' D value.
    """
    return gini_coefficient(y_true, y_score)

divergence(y_true, y_score)

Divergence statistic.

D = (mean_default - mean_non_default)^2 / (0.5 * (var_default + var_non_default))

Parameters:

    y_true (ndarray): Binary labels. Required.
    y_score (ndarray): Predicted scores. Required.

Returns:

    float: Divergence statistic (higher = better separation).

Source code in creditriskengine/validation/discrimination.py
def divergence(
    y_true: np.ndarray,
    y_score: np.ndarray,
) -> float:
    """Divergence statistic.

    D = (mean_default - mean_non_default)^2 / (0.5 * (var_default + var_non_default))

    Args:
        y_true: Binary labels.
        y_score: Predicted scores.

    Returns:
        Divergence statistic (higher = better separation).
    """
    y_true = np.asarray(y_true, dtype=np.int64)
    y_score = np.asarray(y_score, dtype=np.float64)

    defaults = y_score[y_true == 1]
    non_defaults = y_score[y_true == 0]

    if len(defaults) < 2 or len(non_defaults) < 2:
        return 0.0

    mean_diff = np.mean(defaults) - np.mean(non_defaults)
    avg_var = 0.5 * (np.var(defaults, ddof=1) + np.var(non_defaults, ddof=1))

    if avg_var < 1e-15:
        return 0.0

    return float(mean_diff ** 2 / avg_var)
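
The divergence formula applied to two small toy samples, matching the computation above step by step:

```python
import numpy as np

defaults = np.array([0.6, 0.7, 0.8, 0.9])       # scores of defaulters
non_defaults = np.array([0.1, 0.2, 0.3, 0.4])   # scores of non-defaulters

mean_diff = defaults.mean() - non_defaults.mean()               # 0.5
avg_var = 0.5 * (defaults.var(ddof=1) + non_defaults.var(ddof=1))
d_stat = float(mean_diff ** 2 / avg_var)
```

Here both groups have sample variance 1/60, so D = 0.25 / (1/60) = 15, a very strong separation.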

creditriskengine.validation.calibration

Calibration tests — predicted vs observed default rates.

Tests whether predicted PDs are consistent with observed defaults.

Regulatory context:

- BCBS WP14 (May 2005): Traffic Light approach
- SR 11-7 (Fed): Outcomes analysis requirements
- ECB Guide to Internal Models
- EBA GL/2017/16

binomial_test(n_defaults, n_observations, predicted_pd, confidence=0.99)

Binomial test for PD calibration.

Tests if observed defaults are consistent with predicted PD, assuming independent Bernoulli trials.

H0: true PD = predicted_pd

z = (d - N*PD) / sqrt(N*PD*(1-PD))

Parameters:

    n_defaults (int): Observed number of defaults. Required.
    n_observations (int): Total number of observations. Required.
    predicted_pd (float): Predicted (average) PD. Required.
    confidence (float): Confidence level for the test. Default: 0.99.

Returns:

    dict[str, float | bool]: Dict with z_stat, p_value, critical_value, reject_h0.

Source code in creditriskengine/validation/calibration.py
def binomial_test(
    n_defaults: int,
    n_observations: int,
    predicted_pd: float,
    confidence: float = 0.99,
) -> dict[str, float | bool]:
    """Binomial test for PD calibration.

    Tests if observed defaults are consistent with predicted PD,
    assuming independent Bernoulli trials.

    H0: true PD = predicted_pd
    z = (d - N*PD) / sqrt(N*PD*(1-PD))

    Args:
        n_defaults: Observed number of defaults.
        n_observations: Total number of observations.
        predicted_pd: Predicted (average) PD.
        confidence: Confidence level for the test.

    Returns:
        Dict with z_stat, p_value, critical_value, reject_h0.
    """
    if n_observations == 0:
        return {"z_stat": 0.0, "p_value": 1.0, "critical_value": 0.0, "reject_h0": False}

    expected = n_observations * predicted_pd
    std = (n_observations * predicted_pd * (1.0 - predicted_pd)) ** 0.5

    if std < 1e-15:
        return {"z_stat": 0.0, "p_value": 1.0, "critical_value": 0.0, "reject_h0": False}

    z = (n_defaults - expected) / std
    p_value = 1.0 - stats.norm.cdf(z)  # One-sided (upper tail)
    critical = stats.norm.ppf(confidence)

    return {
        "z_stat": float(z),
        "p_value": float(p_value),
        "critical_value": float(critical),
        "reject_h0": z > critical,
    }
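
A worked example with hypothetical numbers: 32 observed defaults against roughly 20 expected in a 1,000-obligor portfolio with a 2% predicted PD:

```python
from scipy import stats

n, pd_hat, d = 1000, 0.02, 32
expected = n * pd_hat                         # 20 expected defaults
std = (n * pd_hat * (1 - pd_hat)) ** 0.5      # binomial standard deviation
z = (d - expected) / std                      # ~2.71
p_value = 1 - stats.norm.cdf(z)               # one-sided, upper tail
critical = stats.norm.ppf(0.99)               # ~2.33 at 99% confidence
reject = z > critical                         # H0 rejected: PD looks understated
```

Because the test is one-sided against excess defaults, it flags underestimated PDs but not conservative (overestimated) ones.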

hosmer_lemeshow_test(observed_defaults, predicted_pds, group_counts, n_groups=10)

Hosmer-Lemeshow goodness-of-fit test.

H-L = Sum(i=1..g) [(O_i - E_i)^2 / (N_i * pi_i * (1-pi_i))]

Parameters:

    observed_defaults (ndarray): Observed defaults per group. Required.
    predicted_pds (ndarray): Average predicted PD per group. Required.
    group_counts (ndarray): Number of observations per group. Required.
    n_groups (int): Number of groups (for degrees of freedom). Default: 10.

Returns:

    dict[str, float | bool]: Dict with hl_stat, p_value, df, reject_h0 (at 5% level).

Source code in creditriskengine/validation/calibration.py
def hosmer_lemeshow_test(
    observed_defaults: np.ndarray,
    predicted_pds: np.ndarray,
    group_counts: np.ndarray,
    n_groups: int = 10,
) -> dict[str, float | bool]:
    """Hosmer-Lemeshow goodness-of-fit test.

    H-L = Sum(i=1..g) [(O_i - E_i)^2 / (N_i * pi_i * (1-pi_i))]

    Args:
        observed_defaults: Observed defaults per group.
        predicted_pds: Average predicted PD per group.
        group_counts: Number of observations per group.
        n_groups: Number of groups (for degrees of freedom).

    Returns:
        Dict with hl_stat, p_value, df, reject_h0 (at 5% level).
    """
    expected = group_counts * predicted_pds
    variance = group_counts * predicted_pds * (1.0 - predicted_pds)

    # Avoid division by zero
    mask = variance > 1e-15
    hl = float(np.sum((observed_defaults[mask] - expected[mask]) ** 2 / variance[mask]))

    df = max(n_groups - 2, 1)
    p_value = 1.0 - stats.chi2.cdf(hl, df)

    return {
        "hl_stat": hl,
        "p_value": float(p_value),
        "df": df,
        "reject_h0": p_value < 0.05,
    }
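
The same statistic computed by hand for five hypothetical rating grades of 200 obligors each, with observed defaults close to expectation:

```python
import numpy as np
from scipy import stats

group_counts = np.array([200, 200, 200, 200, 200])
predicted_pds = np.array([0.01, 0.02, 0.03, 0.05, 0.08])
observed = np.array([3, 5, 6, 11, 15])          # near the expected [2, 4, 6, 10, 16]

expected = group_counts * predicted_pds
variance = group_counts * predicted_pds * (1 - predicted_pds)
hl = float(np.sum((observed - expected) ** 2 / variance))
df = len(group_counts) - 2
p_value = float(1 - stats.chi2.cdf(hl, df))     # large p -> calibration not rejected
```

Note the convention df = g - 2, appropriate when the PDs were estimated on the same data; on a pure out-of-sample test, df = g is sometimes used instead.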

spiegelhalter_test(y_true, y_pred)

Spiegelhalter test for overall calibration.

Tests whether the sum of (predicted - observed)^2 is consistent with expectations under correct calibration.

Parameters:

    y_true (ndarray): Binary outcomes (0/1). Required.
    y_pred (ndarray): Predicted probabilities. Required.

Returns:

    dict[str, float | bool]: Dict with z_stat, p_value, reject_h0 (at 5% level).

Source code in creditriskengine/validation/calibration.py
def spiegelhalter_test(
    y_true: np.ndarray,
    y_pred: np.ndarray,
) -> dict[str, float | bool]:
    """Spiegelhalter test for overall calibration.

    Tests whether the sum of (predicted - observed)^2 is
    consistent with expectations under correct calibration.

    Args:
        y_true: Binary outcomes (0/1).
        y_pred: Predicted probabilities.

    Returns:
        Dict with z_stat, p_value, reject_h0 (at 5% level).
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)

    n = len(y_true)
    if n == 0:
        return {"z_stat": 0.0, "p_value": 1.0, "reject_h0": False}

    brier = np.sum((y_pred - y_true) ** 2)
    expected_brier = np.sum(y_pred * (1.0 - y_pred))

    # Variance of Brier score under H0
    var_terms = (1.0 - 2.0 * y_pred) ** 2 * y_pred * (1.0 - y_pred)
    var_brier = np.sum(var_terms)

    if var_brier < 1e-15:
        return {"z_stat": 0.0, "p_value": 1.0, "reject_h0": False}

    z = (brier - expected_brier) / (var_brier ** 0.5)
    p_value = 2.0 * (1.0 - stats.norm.cdf(abs(z)))

    return {
        "z_stat": float(z),
        "p_value": float(p_value),
        "reject_h0": p_value < 0.05,
    }
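
Simulating outcomes directly from the predicted probabilities produces data that are calibrated by construction, so the statistic should behave like a standard normal draw. A sketch with synthetic data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(42)
y_pred = rng.uniform(0.01, 0.20, size=5000)
# Outcomes drawn from the predictions themselves -> perfectly calibrated
y_true = (rng.uniform(size=5000) < y_pred).astype(float)

brier = np.sum((y_pred - y_true) ** 2)
expected_brier = np.sum(y_pred * (1 - y_pred))
var_brier = np.sum((1 - 2 * y_pred) ** 2 * y_pred * (1 - y_pred))
z = float((brier - expected_brier) / var_brier ** 0.5)   # ~N(0, 1) under H0
p_value = float(2 * (1 - stats.norm.cdf(abs(z))))
```

Under H0 the z-statistic is approximately standard normal, so |z| should land well inside the usual rejection bounds here.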

traffic_light_test(n_defaults, n_observations, predicted_pd)

Basel Committee Traffic Light approach.

Reference: BCBS WP14 (May 2005) — "Studies on the Validation of Internal Rating Systems".

- Green: observed defaults at or below the 95th percentile of the binomial distribution
- Yellow: between the 95th and 99.99th percentiles
- Red: above the 99.99th percentile

Parameters:

    n_defaults (int): Observed defaults. Required.
    n_observations (int): Total observations. Required.
    predicted_pd (float): Predicted PD. Required.

Returns:

    str: "green", "yellow", or "red".

Source code in creditriskengine/validation/calibration.py
def traffic_light_test(
    n_defaults: int,
    n_observations: int,
    predicted_pd: float,
) -> str:
    """Basel Committee Traffic Light approach.

    Reference: BCBS WP14 (May 2005) — "Studies on the Validation
    of Internal Rating Systems".

    Green: observed < 95th percentile of binomial
    Yellow: 95th-99.99th percentile
    Red: > 99.99th percentile

    Args:
        n_defaults: Observed defaults.
        n_observations: Total observations.
        predicted_pd: Predicted PD.

    Returns:
        "green", "yellow", or "red".
    """
    if n_observations == 0:
        return "green"

    # Binomial percentiles
    p_95 = stats.binom.ppf(0.95, n_observations, predicted_pd)
    p_9999 = stats.binom.ppf(0.9999, n_observations, predicted_pd)

    if n_defaults <= p_95:
        return "green"
    elif n_defaults <= p_9999:
        return "yellow"
    else:
        return "red"
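
Zone boundaries for a hypothetical 1,000-obligor portfolio at a 2% predicted PD, mirroring the thresholds above:

```python
from scipy import stats

n, pd_hat = 1000, 0.02
p_95 = stats.binom.ppf(0.95, n, pd_hat)        # upper bound of the green zone
p_9999 = stats.binom.ppf(0.9999, n, pd_hat)    # upper bound of the yellow zone

def zone(n_defaults):
    # Same three-way split as traffic_light_test
    if n_defaults <= p_95:
        return "green"
    elif n_defaults <= p_9999:
        return "yellow"
    return "red"
```

With 20 defaults expected, an observation at the mean stays green, a moderately elevated count turns yellow, and a count far into the tail turns red.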

jeffreys_test(n_defaults, n_observations, predicted_pd, confidence=0.99)

Jeffreys test — Bayesian alternative to binomial.

Uses Beta posterior: Beta(d + 0.5, n - d + 0.5) where d = defaults, n = observations.

Parameters:

    n_defaults (int): Observed defaults. Required.
    n_observations (int): Total observations. Required.
    predicted_pd (float): Predicted PD. Required.
    confidence (float): Confidence level. Default: 0.99.

Returns:

    dict[str, float | bool]: Dict with posterior_mean, lower_bound, upper_bound, pd_within_interval.

Source code in creditriskengine/validation/calibration.py
def jeffreys_test(
    n_defaults: int,
    n_observations: int,
    predicted_pd: float,
    confidence: float = 0.99,
) -> dict[str, float | bool]:
    """Jeffreys test — Bayesian alternative to binomial.

    Uses Beta posterior: Beta(d + 0.5, n - d + 0.5)
    where d = defaults, n = observations.

    Args:
        n_defaults: Observed defaults.
        n_observations: Total observations.
        predicted_pd: Predicted PD.
        confidence: Confidence level.

    Returns:
        Dict with posterior_mean, lower_bound, upper_bound, pd_within_interval.
    """
    alpha = n_defaults + 0.5
    beta_param = n_observations - n_defaults + 0.5

    posterior_mean = alpha / (alpha + beta_param)
    lower = stats.beta.ppf((1 - confidence) / 2, alpha, beta_param)
    upper = stats.beta.ppf((1 + confidence) / 2, alpha, beta_param)

    return {
        "posterior_mean": float(posterior_mean),
        "lower_bound": float(lower),
        "upper_bound": float(upper),
        "pd_within_interval": lower <= predicted_pd <= upper,
    }
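
A Jeffreys interval for hypothetical numbers: 12 defaults in 800 observations, tested against a 2% predicted PD:

```python
from scipy import stats

d, n, pd_hat = 12, 800, 0.02
a, b = d + 0.5, n - d + 0.5                 # Beta posterior parameters
posterior_mean = a / (a + b)                # ~1.56% observed default rate
lower = stats.beta.ppf(0.005, a, b)         # 99% credible interval
upper = stats.beta.ppf(0.995, a, b)
pd_within = lower <= pd_hat <= upper        # 2% sits inside the interval
```

Because the Beta posterior is exact for binomial data, the interval behaves sensibly even with few defaults, where the normal approximation in the binomial test becomes unreliable.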

brier_score(y_true, y_pred)

Brier Score for probability calibration.

BS = (1/N) * Sum(PD_i - D_i)^2

Lower is better. For a well-calibrated model, the expected Brier score equals the average of PD_i * (1 - PD_i).

Parameters:

    y_true (ndarray): Binary outcomes (0/1). Required.
    y_pred (ndarray): Predicted probabilities. Required.

Returns:

    float: Brier score.

Source code in creditriskengine/validation/calibration.py
def brier_score(y_true: np.ndarray, y_pred: np.ndarray) -> float:
    """Brier Score for probability calibration.

    BS = (1/N) * Sum(PD_i - D_i)^2

    Lower is better. Perfect calibration: BS approaches PD*(1-PD).

    Args:
        y_true: Binary outcomes (0/1).
        y_pred: Predicted probabilities.

    Returns:
        Brier score.
    """
    y_true = np.asarray(y_true, dtype=np.float64)
    y_pred = np.asarray(y_pred, dtype=np.float64)
    return float(np.mean((y_pred - y_true) ** 2))
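
A hand-checkable Brier score on four toy observations:

```python
import numpy as np

y_true = np.array([0.0, 0.0, 1.0, 0.0])
y_pred = np.array([0.10, 0.20, 0.70, 0.05])

# (0.01 + 0.04 + 0.09 + 0.0025) / 4 = 0.035625
bs = float(np.mean((y_pred - y_true) ** 2))
```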

creditriskengine.validation.stability

Stability monitoring metrics for credit risk models.

Regulatory context:

- OCC 2011-12: Model Risk Management
- SR 11-7 (Fed): Ongoing monitoring requirements
- ECB Guide to Internal Models: PSI/CSI monitoring

population_stability_index(actual, expected, bins=10, precomputed=False)

Population Stability Index (PSI).

PSI = Sum [(%actual_i - %expected_i) * ln(%actual_i / %expected_i)]

Interpretation:

- PSI < 0.10: no significant change
- 0.10 <= PSI < 0.25: moderate shift
- PSI >= 0.25: significant shift

Parameters:

    actual (ndarray): Actual distribution values (raw or pre-binned proportions). Required.
    expected (ndarray): Expected/reference distribution (raw or pre-binned proportions). Required.
    bins (int): Number of bins (if not precomputed). Default: 10.
    precomputed (bool): If True, actual/expected are already bin proportions. Default: False.

Returns:

    float: PSI value.

Source code in creditriskengine/validation/stability.py
def population_stability_index(
    actual: np.ndarray,
    expected: np.ndarray,
    bins: int = 10,
    precomputed: bool = False,
) -> float:
    """Population Stability Index (PSI).

    PSI = Sum [(%actual_i - %expected_i) * ln(%actual_i / %expected_i)]

    Interpretation:
    - PSI < 0.10: no significant change
    - 0.10 <= PSI < 0.25: moderate shift
    - PSI >= 0.25: significant shift

    Args:
        actual: Actual distribution values (raw or pre-binned proportions).
        expected: Expected/reference distribution (raw or pre-binned proportions).
        bins: Number of bins (if not precomputed).
        precomputed: If True, actual/expected are already bin proportions.

    Returns:
        PSI value.
    """
    if precomputed:
        pct_actual = np.asarray(actual, dtype=np.float64)
        pct_expected = np.asarray(expected, dtype=np.float64)
    else:
        actual = np.asarray(actual, dtype=np.float64)
        expected = np.asarray(expected, dtype=np.float64)

        # Create bins from expected distribution
        bin_edges = np.percentile(expected, np.linspace(0, 100, bins + 1))
        bin_edges = np.unique(bin_edges)
        bin_edges[0] = -np.inf
        bin_edges[-1] = np.inf

        actual_counts = np.histogram(actual, bins=bin_edges)[0]
        expected_counts = np.histogram(expected, bins=bin_edges)[0]

        pct_actual = actual_counts / max(len(actual), 1)
        pct_expected = expected_counts / max(len(expected), 1)

    # Replace zeros to avoid log(0)
    pct_actual = np.maximum(pct_actual, 1e-10)
    pct_expected = np.maximum(pct_expected, 1e-10)

    psi = float(np.sum((pct_actual - pct_expected) * np.log(pct_actual / pct_expected)))
    return psi
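
The PSI formula on pre-binned proportions (illustrative shares; this is the `precomputed=True` path):

```python
import numpy as np

expected = np.array([0.10, 0.20, 0.40, 0.20, 0.10])  # development-sample bin shares
actual = np.array([0.06, 0.18, 0.40, 0.24, 0.12])    # current-sample bin shares

psi = float(np.sum((actual - expected) * np.log(actual / expected)))  # ~0.033
```

Each term is non-negative, since the share difference and its log-ratio always share a sign; here PSI lands well under 0.10, i.e. no significant population shift.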

characteristic_stability_index(actual, expected, bins=10)

Characteristic Stability Index (CSI).

Same formula as PSI but applied at individual feature level.

Parameters:

    actual (ndarray): Actual feature distribution. Required.
    expected (ndarray): Expected/reference feature distribution. Required.
    bins (int): Number of bins. Default: 10.

Returns:

    float: CSI value (same interpretation as PSI).

Source code in creditriskengine/validation/stability.py
def characteristic_stability_index(
    actual: np.ndarray,
    expected: np.ndarray,
    bins: int = 10,
) -> float:
    """Characteristic Stability Index (CSI).

    Same formula as PSI but applied at individual feature level.

    Args:
        actual: Actual feature distribution.
        expected: Expected/reference feature distribution.
        bins: Number of bins.

    Returns:
        CSI value (same interpretation as PSI).
    """
    return population_stability_index(actual, expected, bins)

herfindahl_index(shares)

Herfindahl-Hirschman Index for concentration.

HHI = Sum(share_i^2)

Parameters:

    shares (ndarray): Array of shares/proportions (should sum to 1.0). Required.

Returns:

    float: HHI value in [0, 1]. Higher = more concentrated.

Source code in creditriskengine/validation/stability.py
def herfindahl_index(shares: np.ndarray) -> float:
    """Herfindahl-Hirschman Index for concentration.

    HHI = Sum(share_i^2)

    Args:
        shares: Array of shares/proportions (should sum to 1.0).

    Returns:
        HHI value in [0, 1]. Higher = more concentrated.
    """
    shares = np.asarray(shares, dtype=np.float64)
    return float(np.sum(shares ** 2))
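
Equal shares give the minimum HHI of 1/k for k categories, while one dominant grade pushes the index toward 1 (illustrative shares):

```python
import numpy as np

uniform = np.full(10, 0.1)                          # 10 equally sized rating grades
concentrated = np.array([0.82, 0.10, 0.05, 0.02, 0.01])  # one dominant grade

hhi_uniform = float(np.sum(uniform ** 2))           # 0.1 = 1/10, the minimum
hhi_concentrated = float(np.sum(concentrated ** 2))
```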

migration_matrix_stability(matrix_1, matrix_2)

Compare two transition/migration matrices for stability.

Uses L2 (Frobenius) norm and maximum absolute difference.

Parameters:

    matrix_1 (ndarray): First transition matrix. Required.
    matrix_2 (ndarray): Second transition matrix. Required.

Returns:

    dict[str, float]: Dict with frobenius_norm, max_abs_diff, mean_abs_diff.

Source code in creditriskengine/validation/stability.py
def migration_matrix_stability(
    matrix_1: np.ndarray,
    matrix_2: np.ndarray,
) -> dict[str, float]:
    """Compare two transition/migration matrices for stability.

    Uses L2 (Frobenius) norm and maximum absolute difference.

    Args:
        matrix_1: First transition matrix.
        matrix_2: Second transition matrix.

    Returns:
        Dict with frobenius_norm, max_abs_diff, mean_abs_diff.
    """
    diff = matrix_1 - matrix_2
    return {
        "frobenius_norm": float(np.linalg.norm(diff, "fro")),
        "max_abs_diff": float(np.max(np.abs(diff))),
        "mean_abs_diff": float(np.mean(np.abs(diff))),
    }
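
Comparing two hypothetical 3x3 rating migration matrices, reproducing the three metrics above:

```python
import numpy as np

m1 = np.array([[0.90, 0.08, 0.02],
               [0.05, 0.90, 0.05],
               [0.01, 0.09, 0.90]])
m2 = np.array([[0.88, 0.10, 0.02],   # slightly more downgrades from grade 1
               [0.05, 0.89, 0.06],
               [0.01, 0.09, 0.90]])

diff = m1 - m2
frobenius = float(np.linalg.norm(diff, "fro"))  # sqrt of summed squared diffs
max_abs = float(np.max(np.abs(diff)))           # worst single-cell change
mean_abs = float(np.mean(np.abs(diff)))         # average change per cell
```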