Model Validation
Comprehensive toolkit for PD model validation per EBA GL/2017/16.
Discrimination
from creditriskengine.validation.discrimination import auroc, gini_coefficient, ks_statistic
auc = auroc(y_true, y_score)
gini = gini_coefficient(y_true, y_score)
ks = ks_statistic(y_true, y_score)
Calibration
from creditriskengine.validation.calibration import binomial_test, hosmer_lemeshow_test
result = binomial_test(n_defaults=15, n_observations=1000, predicted_pd=0.02)
Stability
from creditriskengine.validation.stability import population_stability_index
psi = population_stability_index(actual=current_distribution, expected=base_distribution)
creditriskengine.validation
Model validation and backtesting framework.
Provides discrimination, calibration, stability, and backtesting metrics for credit risk model validation in line with SR 11-7 and EBA GL/2017/16.
pd_backtest_full(predicted_pds, observed_defaults, rating_grades, confidence=0.99)
Full PD backtest: binomial, traffic-light, and Jeffreys per rating grade.
Runs a comprehensive backtesting suite for each rating grade, combining frequentist (binomial), regulatory (traffic-light), and Bayesian (Jeffreys) approaches as recommended in BCBS WP14 and EBA GL/2017/16.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `predicted_pds` | `ndarray` | Predicted PDs per exposure. | required |
| `observed_defaults` | `ndarray` | Binary default indicator (0/1). | required |
| `rating_grades` | `ndarray` | Rating grade labels per exposure (same length). | required |
| `confidence` | `float` | Confidence level for binomial and Jeffreys tests. | `0.99` |

Returns:

| Type | Description |
|---|---|
| `FullBacktestResult` | Per-grade detail and overall assessment. |

Raises:

| Type | Description |
|---|---|
| `ValueError` | If input arrays have mismatched lengths. |

Source code in creditriskengine\validation\backtesting.py
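The per-grade tests all start from the same aggregates: the number of exposures, the number of defaults, and the average predicted PD in each grade. A minimal sketch of that grouping step follows, in plain Python and independent of the library's internals. The helper name `aggregate_by_grade`, its output keys, and the sample data are hypothetical.

```python
from collections import defaultdict


def aggregate_by_grade(predicted_pds, observed_defaults, rating_grades):
    """Group exposures by rating grade (hypothetical helper, not the library API)."""
    if not (len(predicted_pds) == len(observed_defaults) == len(rating_grades)):
        raise ValueError("Input arrays must have the same length.")

    buckets = defaultdict(lambda: {"n_observations": 0, "n_defaults": 0, "pd_sum": 0.0})
    for pd_i, d_i, grade in zip(predicted_pds, observed_defaults, rating_grades):
        bucket = buckets[grade]
        bucket["n_observations"] += 1
        bucket["n_defaults"] += int(d_i)
        bucket["pd_sum"] += float(pd_i)

    summary = {}
    for grade, bucket in sorted(buckets.items()):
        n = bucket["n_observations"]
        summary[grade] = {
            "n_observations": n,
            "n_defaults": bucket["n_defaults"],
            "mean_predicted_pd": bucket["pd_sum"] / n,
            "observed_default_rate": bucket["n_defaults"] / n,
        }
    return summary


# Illustrative data only.
pds = [0.01, 0.01, 0.01, 0.01, 0.05, 0.05]
defaults = [0, 0, 0, 1, 0, 1]
grades = ["A", "A", "A", "A", "B", "B"]
per_grade = aggregate_by_grade(pds, defaults, grades)
```

Each grade's `n_defaults`, `n_observations`, and `mean_predicted_pd` are the kind of inputs that per-grade tests such as `binomial_test` and `traffic_light_test` take.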
pd_backtest_summary(predicted_pds, observed_defaults, rating_grades=None)
Summary statistics for PD backtesting.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `predicted_pds` | `ndarray` | Predicted PDs per exposure. | required |
| `observed_defaults` | `ndarray` | Binary default indicator (0/1). | required |
| `rating_grades` | `ndarray \| None` | Optional rating grade labels for grouping. | `None` |

Returns:

| Type | Description |
|---|---|
| `dict[str, float]` | Dict with overall metrics. |

Source code in creditriskengine\validation\backtesting.py
binomial_test(n_defaults, n_observations, predicted_pd, confidence=0.99)
Binomial test for PD calibration.
Tests if observed defaults are consistent with predicted PD, assuming independent Bernoulli trials.
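H0: true PD = predicted_pd
z = (d - N*PD) / sqrt(N*PD*(1-PD))
where d is the observed number of defaults, N the number of observations, and PD the predicted PD.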
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_defaults` | `int` | Observed number of defaults. | required |
| `n_observations` | `int` | Total number of observations. | required |
| `predicted_pd` | `float` | Predicted (average) PD. | required |
| `confidence` | `float` | Confidence level for the test. | `0.99` |

Returns:

| Type | Description |
|---|---|
| `dict[str, float \| bool]` | Dict with `z_stat`, `p_value`, `critical_value`, `reject_h0`. |

Source code in creditriskengine\validation\calibration.py
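To make the z statistic concrete, here is a minimal standard-library sketch of the normal-approximation test. It is not the library's implementation: the function name `binomial_z_test_sketch` is hypothetical, and the one-sided rejection rule (reject only when defaults are unexpectedly high) is an assumption, since the docstring does not say whether the test is one- or two-sided.

```python
import math
from statistics import NormalDist


def binomial_z_test_sketch(n_defaults, n_observations, predicted_pd, confidence=0.99):
    """Normal-approximation binomial test (illustrative; one-sided by assumption)."""
    if not 0.0 < predicted_pd < 1.0:
        raise ValueError("predicted_pd must lie strictly between 0 and 1.")

    expected = n_observations * predicted_pd
    std_dev = math.sqrt(n_observations * predicted_pd * (1.0 - predicted_pd))
    z_stat = (n_defaults - expected) / std_dev

    normal = NormalDist()
    # Assumption: H0 is rejected only when defaults are unexpectedly HIGH.
    p_value = 1.0 - normal.cdf(z_stat)
    critical_value = normal.inv_cdf(confidence)
    return {
        "z_stat": z_stat,
        "p_value": p_value,
        "critical_value": critical_value,
        "reject_h0": z_stat > critical_value,
    }


result = binomial_z_test_sketch(n_defaults=15, n_observations=1000, predicted_pd=0.02)
```

With 15 observed defaults against 20 expected, the z statistic is negative, so under this one-sided reading H0 is not rejected.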
brier_score(y_true, y_pred)
Brier Score for probability calibration.
BS = (1/N) * Sum(PD_i - D_i)^2
Lower is better. A perfectly calibrated model does not reach zero: with a homogeneous PD, BS approaches PD*(1-PD) as N grows.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | `ndarray` | Binary outcomes (0/1). | required |
| `y_pred` | `ndarray` | Predicted probabilities. | required |

Returns:

| Type | Description |
|---|---|
| `float` | Brier score. |

Source code in creditriskengine\validation\calibration.py
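The formula can be checked by hand on a few observations. The sketch below is a plain-Python restatement of BS = (1/N) * Sum(PD_i - D_i)^2, not the library's code; the function name and the sample values are illustrative assumptions.

```python
def brier_score_sketch(y_true, y_pred):
    """Mean squared difference between predicted probability and outcome."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("Inputs must be non-empty and of equal length.")
    return sum((p - d) ** 2 for p, d in zip(y_pred, y_true)) / len(y_true)


# Mixed forecasts: squared errors are 0.01, 0.04, 0.04, 0.09.
bs_mixed = brier_score_sketch([0, 0, 1, 0], [0.1, 0.2, 0.8, 0.3])

# Constant PD of 0.25 with an observed default rate of exactly 25%:
# the score equals PD * (1 - PD) = 0.1875 rather than zero.
bs_flat = brier_score_sketch([1, 0, 0, 0], [0.25, 0.25, 0.25, 0.25])
```

The second case illustrates why a calibrated model with a homogeneous PD scores PD*(1-PD): the remaining error comes from the randomness of the outcomes, not from the forecast.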
hosmer_lemeshow_test(observed_defaults, predicted_pds, group_counts, n_groups=10)
Hosmer-Lemeshow goodness-of-fit test.
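H-L = Sum(i=1..g) [(O_i - E_i)^2 / (N_i * pi_i * (1-pi_i))]
where O_i is the observed number of defaults in group i, N_i the number of observations, pi_i the average predicted PD, and E_i = N_i * pi_i the expected number of defaults.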
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `observed_defaults` | `ndarray` | Observed defaults per group. | required |
| `predicted_pds` | `ndarray` | Average predicted PD per group. | required |
| `group_counts` | `ndarray` | Number of observations per group. | required |
| `n_groups` | `int` | Number of groups (for degrees of freedom). | `10` |

Returns:

| Type | Description |
|---|---|
| `dict[str, float \| bool]` | Dict with `hl_stat`, `p_value`, `df`, `reject_h0` (at 5% level). |

Source code in creditriskengine\validation\calibration.py
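As a worked illustration of the statistic only, the sketch below evaluates the H-L sum for three groups in plain Python. The function name and the sample figures are hypothetical, and the p-value step is deliberately left out because it needs a chi-square distribution and the library's degrees-of-freedom convention, neither of which is assumed here.

```python
def hosmer_lemeshow_stat_sketch(observed_defaults, predicted_pds, group_counts):
    """Sum of squared (observed - expected) defaults over binomial variance per group."""
    if not (len(observed_defaults) == len(predicted_pds) == len(group_counts)):
        raise ValueError("Inputs must have one entry per group.")

    hl_stat = 0.0
    for o_i, pi_i, n_i in zip(observed_defaults, predicted_pds, group_counts):
        expected_i = n_i * pi_i                 # E_i = N_i * pi_i
        variance_i = n_i * pi_i * (1.0 - pi_i)  # N_i * pi_i * (1 - pi_i)
        hl_stat += (o_i - expected_i) ** 2 / variance_i
    return hl_stat


# Groups 1 and 2 match their expected defaults exactly; group 3 has
# 12 defaults against 10 expected, contributing 4 / 9 to the statistic.
hl = hosmer_lemeshow_stat_sketch(
    observed_defaults=[2, 5, 12],
    predicted_pds=[0.01, 0.05, 0.10],
    group_counts=[200, 100, 100],
)
```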
traffic_light_test(n_defaults, n_observations, predicted_pd)
Basel Committee Traffic Light approach.
Reference: BCBS WP14 (May 2005) — "Studies on the Validation of Internal Rating Systems".
- Green: observed < 95th percentile of binomial
- Yellow: 95th-99.99th percentile
- Red: > 99.99th percentile
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `n_defaults` | `int` | Observed defaults. | required |
| `n_observations` | `int` | Total observations. | required |
| `predicted_pd` | `float` | Predicted PD. | required |

Returns:

| Type | Description |
|---|---|
| `str` | `"green"`, `"yellow"`, or `"red"`. |

Source code in creditriskengine\validation\calibration.py
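One way to read the colour zones is to compare the observed default count with binomial quantiles. The sketch below does that with the standard library only. It is an interpretation of the thresholds listed above, not the library's code: both function names are hypothetical, and treating a count equal to either boundary quantile as yellow is an assumption.

```python
import math


def binomial_quantile(q, n, p):
    """Smallest k with P(X <= k) >= q for X ~ Binomial(n, p), assuming 0 < p < 1."""
    cumulative = 0.0
    for k in range(n + 1):
        # Log-space PMF avoids overflow in the binomial coefficient for large n.
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(k + 1) - math.lgamma(n - k + 1)
            + k * math.log(p) + (n - k) * math.log1p(-p)
        )
        cumulative += math.exp(log_pmf)
        if cumulative >= q:
            return k
    return n


def traffic_light_sketch(n_defaults, n_observations, predicted_pd):
    """Classify a default count as green, yellow, or red (illustrative only)."""
    q95 = binomial_quantile(0.95, n_observations, predicted_pd)
    q9999 = binomial_quantile(0.9999, n_observations, predicted_pd)
    if n_defaults < q95:
        return "green"
    if n_defaults <= q9999:
        return "yellow"
    return "red"


# 1,000 exposures with a predicted PD of 2% (20 expected defaults).
zones = [traffic_light_sketch(d, 1000, 0.02) for d in (15, 30, 60)]
```

For this portfolio, 15 defaults fall in the green zone, 30 in the yellow zone, and 60 in the red zone.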
auroc(y_true, y_score)
Area Under the Receiver Operating Characteristic curve.
Uses the Mann-Whitney U-statistic formulation for efficiency.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | `ndarray` | Binary labels (1=default, 0=non-default). | required |
| `y_score` | `ndarray` | Predicted scores/PDs (higher = riskier). | required |

Returns:

| Type | Description |
|---|---|
| `float` | AUROC value in [0, 1]. |

Source code in creditriskengine\validation\discrimination.py
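The Mann-Whitney formulation replaces the explicit ROC curve with a rank sum. A minimal plain-Python sketch, assuming average ranks for tied scores; the function names are hypothetical and the library's handling of ties and input validation may differ.

```python
def average_ranks(values):
    """1-based ranks, with tied values sharing the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def auroc_sketch(y_true, y_score):
    """AUROC as U / (n_pos * n_neg), where U is the Mann-Whitney statistic."""
    n_pos = sum(1 for y in y_true if y == 1)
    n_neg = len(y_true) - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("Both defaulters and non-defaulters are required.")

    ranks = average_ranks(list(y_score))
    rank_sum_pos = sum(r for r, y in zip(ranks, y_true) if y == 1)
    u_stat = rank_sum_pos - n_pos * (n_pos + 1) / 2
    return u_stat / (n_pos * n_neg)


# Three of the four (defaulter, non-defaulter) pairs are ordered correctly.
auc = auroc_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

Here `auc` is 0.75, so the corresponding Gini coefficient, 2 * AUC - 1, is 0.5.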
gini_coefficient(y_true, y_score)
Gini coefficient (Accuracy Ratio).
Formula: Gini = 2 * AUC - 1
Standard metric per SR 11-7, ECB Guide.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | `ndarray` | Binary labels. | required |
| `y_score` | `ndarray` | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| `float` | Gini coefficient in [-1, 1]. |

Source code in creditriskengine\validation\discrimination.py
ks_statistic(y_true, y_score)
Kolmogorov-Smirnov statistic.
Maximum separation between cumulative distributions of defaulters and non-defaulters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `y_true` | `ndarray` | Binary labels. | required |
| `y_score` | `ndarray` | Predicted scores/PDs. | required |

Returns:

| Type | Description |
|---|---|
| `float` | KS statistic in [0, 1]. |

Source code in creditriskengine\validation\discrimination.py
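The "maximum separation" can be computed directly from the two empirical cumulative distributions. The following is a minimal plain-Python sketch under that reading; the function name is hypothetical and it is not the library's implementation.

```python
from bisect import bisect_right


def ks_statistic_sketch(y_true, y_score):
    """Largest gap between the score CDFs of defaulters and non-defaulters."""
    defaulters = sorted(s for s, y in zip(y_score, y_true) if y == 1)
    non_defaulters = sorted(s for s, y in zip(y_score, y_true) if y == 0)
    if not defaulters or not non_defaulters:
        raise ValueError("Both defaulters and non-defaulters are required.")

    max_gap = 0.0
    for threshold in sorted(set(y_score)):
        # Share of each group scoring at or below the threshold.
        cdf_def = bisect_right(defaulters, threshold) / len(defaulters)
        cdf_non = bisect_right(non_defaulters, threshold) / len(non_defaulters)
        max_gap = max(max_gap, abs(cdf_def - cdf_non))
    return max_gap


ks_overlap = ks_statistic_sketch([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
ks_perfect = ks_statistic_sketch([0, 0, 1, 1], [0.1, 0.2, 0.8, 0.9])
```

The overlapping sample gives 0.5, while perfectly separated scores give the maximum value of 1.0.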
characteristic_stability_index(actual, expected, bins=10)
Characteristic Stability Index (CSI).
Same formula as PSI but applied at individual feature level.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `actual` | `ndarray` | Actual feature distribution. | required |
| `expected` | `ndarray` | Expected/reference feature distribution. | required |
| `bins` | `int` | Number of bins. | `10` |

Returns:

| Type | Description |
|---|---|
| `float` | CSI value (same interpretation as PSI). |

Source code in creditriskengine\validation\stability.py
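For raw feature values, the index needs a binning step before the PSI formula applies. The sketch below uses equal-width bins spanning the reference sample and floors empty bins at a small constant. Both choices are assumptions made for illustration (the library's binning scheme is not documented here), as are the function names.

```python
import math


def bin_proportions(values, lower, width, bins):
    """Share of values per equal-width bin; out-of-range values go to the edge bins."""
    counts = [0] * bins
    for v in values:
        index = int((v - lower) / width)
        counts[min(max(index, 0), bins - 1)] += 1
    return [c / len(values) for c in counts]


def csi_sketch(actual, expected, bins=10, floor=1e-6):
    """PSI formula applied to one feature (illustrative binning, see assumptions)."""
    lower, upper = min(expected), max(expected)
    if upper == lower:
        raise ValueError("Reference sample must contain more than one distinct value.")
    width = (upper - lower) / bins

    actual_share = bin_proportions(actual, lower, width, bins)
    expected_share = bin_proportions(expected, lower, width, bins)

    total = 0.0
    for a, e in zip(actual_share, expected_share):
        a, e = max(a, floor), max(e, floor)  # assumed guard against log(0)
        total += (a - e) * math.log(a / e)
    return total


reference = [float(v) for v in range(100)]
shifted = [v + 25.0 for v in reference]

csi_same = csi_sketch(reference, reference, bins=4)
csi_shifted = csi_sketch(shifted, reference, bins=4)
```

An unchanged feature yields 0, while shifting every value by a quarter of the reference range empties the first bin and pushes the index well above the 0.25 threshold.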
population_stability_index(actual, expected, bins=10, precomputed=False)
Population Stability Index (PSI).
PSI = Sum [(%actual_i - %expected_i) * ln(%actual_i / %expected_i)]
Interpretation:
- PSI < 0.10: no significant change
- 0.10 <= PSI < 0.25: moderate shift
- PSI >= 0.25: significant shift
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `actual` | `ndarray` | Actual distribution values (raw or pre-binned proportions). | required |
| `expected` | `ndarray` | Expected/reference distribution (raw or pre-binned proportions). | required |
| `bins` | `int` | Number of bins (if not precomputed). | `10` |
| `precomputed` | `bool` | If True, actual/expected are already bin proportions. | `False` |

Returns:

| Type | Description |
|---|---|
| `float` | PSI value. |
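
For pre-binned proportions the formula reduces to a single sum. A minimal plain-Python sketch of that case, together with the interpretation bands listed above; the function names and sample proportions are hypothetical, and requiring strictly positive proportions is a simplifying assumption rather than the library's behaviour.

```python
import math


def psi_from_proportions(actual, expected):
    """PSI = Sum (a_i - e_i) * ln(a_i / e_i) over bins with positive proportions."""
    if len(actual) != len(expected):
        raise ValueError("Both distributions must have the same number of bins.")
    if any(p <= 0.0 for p in list(actual) + list(expected)):
        raise ValueError("This sketch requires strictly positive bin proportions.")
    return sum((a - e) * math.log(a / e) for a, e in zip(actual, expected))


def interpret_psi(psi):
    """Map a PSI value to the bands given in the interpretation list."""
    if psi < 0.10:
        return "no significant change"
    if psi < 0.25:
        return "moderate shift"
    return "significant shift"


expected_share = [0.25, 0.25, 0.25, 0.25]
actual_share = [0.30, 0.25, 0.25, 0.20]

psi = psi_from_proportions(actual_share, expected_share)
label = interpret_psi(psi)
```

Moving five percentage points from the last bin to the first gives a PSI of about 0.02, which falls in the "no significant change" band.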