What Model Testing is Required?

No consensus on best practices has emerged to date.

Elaine Gibbs, March 13, 2024

The specifics of model testing and validation are the biggest outstanding pieces of the model governance puzzle, particularly when it comes to unfair discrimination.

The regulations released to date all approach testing differently. Just as importantly, they also differ on the underlying question of what is being tested (see our post on differing definitions of “unfair discrimination” here).

The range of stances runs from encouraging testing for unfair discrimination as part of a broader suite of model validation practices to prescribing a specific battery of tests for each insurance line and use case.

References to Testing in Specific Regulations

The following references to testing appear in regulations released as of March 2024:

NAIC Model Bulletin:
- Output of AI models, for generalizability, reliability, and “model drift”
- Errors and bias (testing encouraged)
- Unfair discrimination (testing encouraged)
  - States have tweaked this language in adopting the bulletin. New Hampshire, for example, strongly encouraged testing for errors as a means of avoiding unfair discrimination, and refers to “unfair bias analysis” throughout the rest of its document
Colorado’s SB21-169:
- Unfair discrimination (well-specified requirements for life insurance underwriting and pricing)
New York’s Proposed Insurance Circular Letter (underwriting and pricing only):
- Output of AI models, including for model “drift”
- Actuarial validity of data (flexible approach)
- Unfair and unlawful discrimination (flexible approach, qualitative assessment required)

Implications

Some elements of testing, as detailed above, are simply best practice in model building, particularly those related to the validity, stability, and generalizability of model outputs. In our view, such testing is table stakes and should be implemented now.
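As one illustration of what a stability or “model drift” check can look like, here is a minimal sketch of the Population Stability Index (PSI), a common way to compare the distribution of model scores at build time against scores seen in production. None of the regulations discussed above prescribes this metric; the function name, the equal-width binning, and the `eps` floor are our own assumptions for illustration.

```python
import math
from typing import Sequence


def population_stability_index(
    baseline: Sequence[float],
    current: Sequence[float],
    n_bins: int = 10,
    eps: float = 1e-6,
) -> float:
    """PSI between two samples of model scores (hypothetical helper).

    Uses equal-width bins spanning the baseline score range.
    """
    lo, hi = min(baseline), max(baseline)
    if hi == lo:
        raise ValueError("baseline scores must not all be identical")
    width = (hi - lo) / n_bins

    def bin_shares(scores: Sequence[float]) -> list:
        counts = [0] * n_bins
        for s in scores:
            # Scores outside the baseline range are clamped into the edge bins.
            idx = int((s - lo) / width)
            idx = min(max(idx, 0), n_bins - 1)
            counts[idx] += 1
        total = len(scores)
        # Floor each share at eps so the logarithm below is always defined.
        return [max(c / total, eps) for c in counts]

    base = bin_shares(baseline)
    curr = bin_shares(current)
    # Every term is non-negative, so PSI is zero only when the shares match.
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


# Illustrative data only: build-time scores vs. a shifted production sample.
build_scores = [i / 100 for i in range(100)]
prod_scores = [s + 0.3 for s in build_scores]
psi = population_stability_index(build_scores, prod_scores)
```

A PSI near zero suggests the score distribution has not moved. An often-quoted rule of thumb treats values above roughly 0.25 as a shift worth investigating, but that cutoff is an industry convention rather than a regulatory standard, and each carrier would need to set its own tolerances.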

How to approach unfair and unlawful discrimination testing, however, is an open question, particularly as carriers often do not collect information on protected classes of interest.

For carriers operating in jurisdictions where testing is required (e.g., Colorado, New York), procedures must be developed to conform to each protocol. Such regulations are all still in draft form, so a watchful waiting stance is appropriate, combined with an evaluation of what it will take to conform to the current drafts.

For other jurisdictions, the path forward is less clear. Until further guidance or clarification is received, a careful analysis of a carrier’s individual risk, in terms of the potential for AI to cause consumer harm, should inform both the level of testing conducted and the goals of those tests.