Validation API¶
creditriskengine.validation.discrimination¶
Discriminatory power tests for credit risk models.
Statistical tests measuring a model's ability to separate defaulters from non-defaulters.
Regulatory context:
- US SR 11-7 (Fed, April 2011)
- ECB Guide to Internal Models
- PRA SS1/23
- EBA GL/2017/16
auroc(y_true, y_score)¶
Area Under the Receiver Operating Characteristic curve.
Uses the Mann-Whitney U-statistic formulation for efficiency.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels (1=default, 0=non-default). | required |
| `y_score` | ndarray | Predicted scores/PDs (higher = riskier). | required |

Returns:

| Type | Description |
|---|---|
| float | AUROC value in [0, 1]. |

Source code in `creditriskengine/validation/discrimination.py`
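The source listing is collapsed in this rendering. A minimal NumPy sketch of the Mann-Whitney formulation, consistent with the documented signature but not the packaged implementation, could look like:

```python
import numpy as np

def auroc(y_true, y_score):
    """AUROC via pairwise comparisons (Mann-Whitney U); ties earn half credit."""
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]          # scores of defaulters
    neg = y_score[y_true == 0]          # scores of non-defaulters
    greater = (pos[:, None] > neg[None, :]).sum()
    ties = (pos[:, None] == neg[None, :]).sum()
    return (greater + 0.5 * ties) / (len(pos) * len(neg))
```

A perfectly separating model scores 1.0; a model with no discriminatory power scores 0.5.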
gini_coefficient(y_true, y_score)¶
Gini coefficient (Accuracy Ratio).
Formula: Gini = 2 * AUC - 1
Standard metric per SR 11-7 and the ECB Guide.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| float | Gini coefficient in [-1, 1]. |

Source code in `creditriskengine/validation/discrimination.py`
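A rank-based sketch of the Gini = 2 * AUC - 1 identity (illustrative only, using average ranks so ties are handled) might be:

```python
import numpy as np
from scipy.stats import rankdata

def gini_coefficient(y_true, y_score):
    y_true = np.asarray(y_true)
    ranks = rankdata(y_score)                      # average ranks handle ties
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    # Mann-Whitney U from the rank sum of defaulters
    u = ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2
    auc = u / (n_pos * n_neg)
    return 2 * auc - 1
```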
ks_statistic(y_true, y_score)¶
Kolmogorov-Smirnov statistic.
Maximum separation between cumulative distributions of defaulters and non-defaulters.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| float | KS statistic in [0, 1]. |

Source code in `creditriskengine/validation/discrimination.py`
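A sketch of the documented behavior (not the packaged source): sweep the score threshold and track the gap between the two cumulative distributions.

```python
import numpy as np

def ks_statistic(y_true, y_score):
    y_true = np.asarray(y_true)
    order = np.argsort(np.asarray(y_score))
    labels = y_true[order]
    # Cumulative share of each class as the score threshold sweeps upward
    cum_pos = np.cumsum(labels) / labels.sum()
    cum_neg = np.cumsum(1 - labels) / (1 - labels).sum()
    return np.max(np.abs(cum_pos - cum_neg))
```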
cap_curve(y_true, y_score)¶
Cumulative Accuracy Profile (CAP) curve.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| tuple[ndarray, ndarray] | Tuple of (fraction_of_population, fraction_of_defaults). |

Source code in `creditriskengine/validation/discrimination.py`
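Conceptually, the CAP curve sorts obligors from riskiest to safest and plots the share of defaults captured against the share of the population screened. A minimal sketch (illustrative, not the packaged code):

```python
import numpy as np

def cap_curve(y_true, y_score):
    y_true = np.asarray(y_true)
    # Sort by score descending: riskiest obligors first
    order = np.argsort(-np.asarray(y_score))
    sorted_defaults = y_true[order]
    x = np.arange(1, len(y_true) + 1) / len(y_true)   # fraction of population
    y = np.cumsum(sorted_defaults) / y_true.sum()      # fraction of defaults captured
    return x, y
```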
accuracy_ratio(y_true, y_score)¶
Accuracy Ratio from CAP curve.
AR = area_under_model_CAP / area_under_perfect_CAP
Numerically equivalent to the Gini coefficient when both are computed consistently on the same data.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| float | Accuracy ratio in [-1, 1]. |

Source code in `creditriskengine/validation/discrimination.py`
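One way to realize the ratio of areas above (a sketch under the usual convention of measuring areas between each CAP and the random-model diagonal; not the packaged implementation):

```python
import numpy as np

def accuracy_ratio(y_true, y_score):
    y_true = np.asarray(y_true)
    order = np.argsort(-np.asarray(y_score))
    n, d = len(y_true), y_true.sum()
    x = np.concatenate([[0.0], np.arange(1, n + 1) / n])
    y = np.concatenate([[0.0], np.cumsum(y_true[order]) / d])
    # Trapezoidal area under the model CAP, minus the diagonal's area of 0.5
    area_model = np.sum((x[1:] - x[:-1]) * (y[1:] + y[:-1]) / 2) - 0.5
    area_perfect = 0.5 * (1 - d / n)   # perfect CAP vs diagonal
    return area_model / area_perfect
```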
information_value(feature, target, bins=10)¶
Information Value (IV) for a single feature.
IV = Sum [(%good_i - %bad_i) * WoE_i]
Interpretation:
- IV < 0.02: useless
- 0.02-0.1: weak
- 0.1-0.3: medium
- 0.3-0.5: strong
- IV > 0.5: suspicious (possible overfitting)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `feature` | ndarray | Feature values. | required |
| `target` | ndarray | Binary target (1=default, 0=non-default). | required |
| `bins` | int | Number of bins for continuous features. | 10 |

Returns:

| Type | Description |
|---|---|
| float | Information Value. |

Source code in `creditriskengine/validation/discrimination.py`
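A quantile-binned sketch of the IV formula (the binning strategy is an assumption; the packaged code may bin differently):

```python
import numpy as np

def information_value(feature, target, bins=10):
    feature = np.asarray(feature, dtype=float)
    target = np.asarray(target)
    edges = np.unique(np.quantile(feature, np.linspace(0, 1, bins + 1)))
    # Assign each value to a bin; clip keeps boundary values in range
    idx = np.clip(np.digitize(feature, edges[1:-1]), 0, len(edges) - 2)
    n_bad, n_good = target.sum(), (1 - target).sum()
    iv = 0.0
    for b in range(len(edges) - 1):
        mask = idx == b
        bad = target[mask].sum() / n_bad       # share of defaults in bin
        good = (1 - target[mask]).sum() / n_good
        if bad > 0 and good > 0:               # skip bins with a zero class
            iv += (good - bad) * np.log(good / bad)
    return iv
```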
somers_d(y_true, y_score)¶
Somers' D statistic.
Equals the Gini coefficient when there are no ties.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores. | required |

Returns:

| Type | Description |
|---|---|
| float | Somers' D value. |

Source code in `creditriskengine/validation/discrimination.py`
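Somers' D can be sketched as concordant minus discordant pairs over all defaulter/non-defaulter pairs (illustrative, not the packaged source; tied pairs count for neither side):

```python
import numpy as np

def somers_d(y_true, y_score):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score)
    pos = y_score[y_true == 1]
    neg = y_score[y_true == 0]
    concordant = (pos[:, None] > neg[None, :]).sum()
    discordant = (pos[:, None] < neg[None, :]).sum()
    return (concordant - discordant) / (len(pos) * len(neg))
```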
divergence(y_true, y_score)¶
Divergence statistic.
D = (mean_default - mean_non_default)^2 / (0.5 * (var_default + var_non_default))
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary labels. | required |
| `y_score` | ndarray | Predicted scores. | required |

Returns:

| Type | Description |
|---|---|
| float | Divergence statistic (higher = better separation). |

Source code in `creditriskengine/validation/discrimination.py`
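The formula above translates directly (a sketch; whether the packaged code uses sample or population variance is an assumption, sample variance shown here):

```python
import numpy as np

def divergence(y_true, y_score):
    y_true = np.asarray(y_true)
    y_score = np.asarray(y_score, dtype=float)
    s_d = y_score[y_true == 1]          # defaulter scores
    s_n = y_score[y_true == 0]          # non-defaulter scores
    num = (s_d.mean() - s_n.mean()) ** 2
    den = 0.5 * (s_d.var(ddof=1) + s_n.var(ddof=1))
    return num / den
```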
creditriskengine.validation.calibration¶
Calibration tests — predicted vs observed default rates.
Tests whether predicted PDs are consistent with observed defaults.
Regulatory context:
- BCBS WP14 (May 2005): Traffic Light approach
- SR 11-7 (Fed): Outcomes analysis requirements
- ECB Guide to Internal Models
- EBA GL/2017/16
binomial_test(n_defaults, n_observations, predicted_pd, confidence=0.99)¶
Binomial test for PD calibration.
Tests if observed defaults are consistent with the predicted PD, assuming independent Bernoulli trials.
H0: true PD = predicted_pd
z = (d - N * PD) / sqrt(N * PD * (1 - PD)), where d = observed defaults and N = n_observations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_defaults` | int | Observed number of defaults. | required |
| `n_observations` | int | Total number of observations. | required |
| `predicted_pd` | float | Predicted (average) PD. | required |
| `confidence` | float | Confidence level for the test. | 0.99 |

Returns:

| Type | Description |
|---|---|
| dict[str, float \| bool] | Dict with z_stat, p_value, critical_value, reject_h0. |

Source code in `creditriskengine/validation/calibration.py`
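A normal-approximation sketch of the documented test (one-sided against too many defaults, which is an assumption; not the packaged implementation):

```python
import numpy as np
from scipy.stats import norm

def binomial_test(n_defaults, n_observations, predicted_pd, confidence=0.99):
    expected = n_observations * predicted_pd
    std = np.sqrt(n_observations * predicted_pd * (1 - predicted_pd))
    z = (n_defaults - expected) / std
    p_value = 1 - norm.cdf(z)            # one-sided: too many defaults
    critical = norm.ppf(confidence)
    return {"z_stat": z, "p_value": p_value,
            "critical_value": critical, "reject_h0": bool(z > critical)}
```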
hosmer_lemeshow_test(observed_defaults, predicted_pds, group_counts, n_groups=10)¶
Hosmer-Lemeshow goodness-of-fit test.
H-L = Sum(i=1..g) [(O_i - E_i)^2 / (N_i * pi_i * (1-pi_i))]
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `observed_defaults` | ndarray | Observed defaults per group. | required |
| `predicted_pds` | ndarray | Average predicted PD per group. | required |
| `group_counts` | ndarray | Number of observations per group. | required |
| `n_groups` | int | Number of groups (for degrees of freedom). | 10 |

Returns:

| Type | Description |
|---|---|
| dict[str, float \| bool] | Dict with hl_stat, p_value, df, reject_h0 (at 5% level). |

Source code in `creditriskengine/validation/calibration.py`
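The H-L formula above maps to a chi-squared test with g - 2 degrees of freedom. A sketch (illustrative; df convention is the standard one and assumed here):

```python
import numpy as np
from scipy.stats import chi2

def hosmer_lemeshow_test(observed_defaults, predicted_pds, group_counts, n_groups=10):
    o = np.asarray(observed_defaults, dtype=float)
    pi = np.asarray(predicted_pds, dtype=float)
    n = np.asarray(group_counts, dtype=float)
    e = n * pi                                   # expected defaults per group
    hl = np.sum((o - e) ** 2 / (n * pi * (1 - pi)))
    df = n_groups - 2
    p = 1 - chi2.cdf(hl, df)
    return {"hl_stat": hl, "p_value": p, "df": df, "reject_h0": bool(p < 0.05)}
```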
spiegelhalter_test(y_true, y_pred)¶
Spiegelhalter test for overall calibration.
Tests whether the sum of (predicted - observed)^2 is consistent with expectations under correct calibration.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary outcomes (0/1). | required |
| `y_pred` | ndarray | Predicted probabilities. | required |

Returns:

| Type | Description |
|---|---|
| dict[str, float \| bool] | Dict with z_stat, p_value, reject_h0 (at 5% level). |

Source code in `creditriskengine/validation/calibration.py`
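The standard Spiegelhalter z-statistic can be sketched as follows (a two-sided test is assumed; not the packaged implementation):

```python
import numpy as np
from scipy.stats import norm

def spiegelhalter_test(y_true, y_pred):
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    # Z = sum((y - p)(1 - 2p)) / sqrt(sum((1 - 2p)^2 p (1 - p)))
    num = np.sum((y - p) * (1 - 2 * p))
    den = np.sqrt(np.sum(((1 - 2 * p) ** 2) * p * (1 - p)))
    z = num / den
    p_value = 2 * (1 - norm.cdf(abs(z)))       # two-sided
    return {"z_stat": z, "p_value": p_value, "reject_h0": bool(p_value < 0.05)}
```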
traffic_light_test(n_defaults, n_observations, predicted_pd)¶
Basel Committee Traffic Light approach.
Reference: BCBS WP14 (May 2005) — "Studies on the Validation of Internal Rating Systems".
- Green: observed defaults below the 95th percentile of the binomial distribution
- Yellow: between the 95th and 99.99th percentiles
- Red: above the 99.99th percentile
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_defaults` | int | Observed defaults. | required |
| `n_observations` | int | Total observations. | required |
| `predicted_pd` | float | Predicted PD. | required |

Returns:

| Type | Description |
|---|---|
| str | "green", "yellow", or "red". |

Source code in `creditriskengine/validation/calibration.py`
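The zoning above reduces to comparing the binomial CDF of the observed default count against the two percentile cut-offs. A sketch (illustrative, not the packaged code):

```python
from scipy.stats import binom

def traffic_light_test(n_defaults, n_observations, predicted_pd):
    # Cumulative probability of observing at most n_defaults under H0
    cum = binom.cdf(n_defaults, n_observations, predicted_pd)
    if cum < 0.95:
        return "green"
    elif cum < 0.9999:
        return "yellow"
    return "red"
```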
jeffreys_test(n_defaults, n_observations, predicted_pd, confidence=0.99)¶
Jeffreys test — Bayesian alternative to binomial.
Uses Beta posterior: Beta(d + 0.5, n - d + 0.5) where d = defaults, n = observations.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `n_defaults` | int | Observed defaults. | required |
| `n_observations` | int | Total observations. | required |
| `predicted_pd` | float | Predicted PD. | required |
| `confidence` | float | Confidence level. | 0.99 |

Returns:

| Type | Description |
|---|---|
| dict[str, float \| bool] | Dict with posterior_mean, lower_bound, upper_bound, pd_within_interval. |

Source code in `creditriskengine/validation/calibration.py`
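Given the Beta(d + 0.5, n - d + 0.5) posterior stated above, an equal-tailed credible interval sketch could be (equal tails are an assumption; not the packaged implementation):

```python
from scipy.stats import beta

def jeffreys_test(n_defaults, n_observations, predicted_pd, confidence=0.99):
    a = n_defaults + 0.5
    b = n_observations - n_defaults + 0.5
    alpha = 1 - confidence
    lower = beta.ppf(alpha / 2, a, b)          # equal-tailed credible interval
    upper = beta.ppf(1 - alpha / 2, a, b)
    return {"posterior_mean": a / (a + b),
            "lower_bound": lower,
            "upper_bound": upper,
            "pd_within_interval": bool(lower <= predicted_pd <= upper)}
```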
brier_score(y_true, y_pred)¶
Brier Score for probability calibration.
BS = (1/N) * Sum(PD_i - D_i)^2
Lower is better. For a perfectly calibrated model, the expected score approaches the mean of PD * (1 - PD).
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | ndarray | Binary outcomes (0/1). | required |
| `y_pred` | ndarray | Predicted probabilities. | required |

Returns:

| Type | Description |
|---|---|
| float | Brier score. |

Source code in `creditriskengine/validation/calibration.py`
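The formula is a one-liner; a sketch matching the documented signature:

```python
import numpy as np

def brier_score(y_true, y_pred):
    y = np.asarray(y_true, dtype=float)
    p = np.asarray(y_pred, dtype=float)
    # Mean squared difference between predicted PDs and 0/1 outcomes
    return float(np.mean((p - y) ** 2))
```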
creditriskengine.validation.stability¶
Stability monitoring metrics for credit risk models.
Regulatory context:
- OCC 2011-12: Model Risk Management
- SR 11-7 (Fed): Ongoing monitoring requirements
- ECB Guide to Internal Models: PSI/CSI monitoring
population_stability_index(actual, expected, bins=10, precomputed=False)¶
Population Stability Index (PSI).
PSI = Sum [(%actual_i - %expected_i) * ln(%actual_i / %expected_i)]
Interpretation:
- PSI < 0.10: no significant change
- 0.10 <= PSI < 0.25: moderate shift
- PSI >= 0.25: significant shift
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `actual` | ndarray | Actual distribution values (raw or pre-binned proportions). | required |
| `expected` | ndarray | Expected/reference distribution (raw or pre-binned proportions). | required |
| `bins` | int | Number of bins (if not precomputed). | 10 |
| `precomputed` | bool | If True, actual/expected are already bin proportions. | False |

Returns:

| Type | Description |
|---|---|
| float | PSI value. |

Source code in `creditriskengine/validation/stability.py`
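A quantile-binned sketch of the PSI formula (the edge strategy and the small-epsilon guard are assumptions; not the packaged implementation):

```python
import numpy as np

def population_stability_index(actual, expected, bins=10, precomputed=False):
    actual = np.asarray(actual, dtype=float)
    expected = np.asarray(expected, dtype=float)
    if not precomputed:
        # Bin edges come from the reference (expected) distribution
        edges = np.quantile(expected, np.linspace(0, 1, bins + 1))
        idx_a = np.clip(np.digitize(actual, edges[1:-1]), 0, bins - 1)
        idx_e = np.clip(np.digitize(expected, edges[1:-1]), 0, bins - 1)
        actual = np.bincount(idx_a, minlength=bins) / len(idx_a)
        expected = np.bincount(idx_e, minlength=bins) / len(idx_e)
    eps = 1e-10                          # guard against empty bins in the log
    actual = np.clip(actual, eps, None)
    expected = np.clip(expected, eps, None)
    return float(np.sum((actual - expected) * np.log(actual / expected)))
```

With pre-binned proportions `[0.6, 0.4]` against `[0.5, 0.5]`, PSI comes out around 0.04 — well inside the "no significant change" band.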
characteristic_stability_index(actual, expected, bins=10)¶
Characteristic Stability Index (CSI).
Same formula as PSI but applied at the individual feature level.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `actual` | ndarray | Actual feature distribution. | required |
| `expected` | ndarray | Expected/reference feature distribution. | required |
| `bins` | int | Number of bins. | 10 |

Returns:

| Type | Description |
|---|---|
| float | CSI value (same interpretation as PSI). |

Source code in `creditriskengine/validation/stability.py`
herfindahl_index(shares)¶
Herfindahl-Hirschman Index for concentration.
HHI = Sum(share_i^2)
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `shares` | ndarray | Array of shares/proportions (should sum to 1.0). | required |

Returns:

| Type | Description |
|---|---|
| float | HHI value in [0, 1]. Higher = more concentrated. |

Source code in `creditriskengine/validation/stability.py`
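The sum-of-squares formula maps directly to code; for n equal shares the HHI attains its minimum of 1/n.

```python
import numpy as np

def herfindahl_index(shares):
    shares = np.asarray(shares, dtype=float)
    return float(np.sum(shares ** 2))
```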
migration_matrix_stability(matrix_1, matrix_2)¶
Compare two transition/migration matrices for stability.
Uses the L2 (Frobenius) norm and the maximum absolute difference.
Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| `matrix_1` | ndarray | First transition matrix. | required |
| `matrix_2` | ndarray | Second transition matrix. | required |

Returns:

| Type | Description |
|---|---|
| dict[str, float] | Dict with frobenius_norm, max_abs_diff, mean_abs_diff. |
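The three documented outputs reduce to elementwise operations on the difference matrix. A minimal sketch consistent with the documented return keys (not the packaged source):

```python
import numpy as np

def migration_matrix_stability(matrix_1, matrix_2):
    diff = np.asarray(matrix_1, dtype=float) - np.asarray(matrix_2, dtype=float)
    return {"frobenius_norm": float(np.linalg.norm(diff, "fro")),
            "max_abs_diff": float(np.max(np.abs(diff))),
            "mean_abs_diff": float(np.mean(np.abs(diff)))}
```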