Part 7 of the “Preparing for Joint Commission AI Certification” series
Here’s the uncomfortable reality the RUAIH guidance addresses directly: an AI tool can be validated, FDA-cleared, and widely deployed, yet still perform poorly for your patients.
The guidance states it plainly: “AI tools may not perform well in specific settings or disease conditions or may not be generalized to larger populations if AI training data lacks diversity or bases predictive output on biased associations.”
Vendor validation doesn’t eliminate the need for local bias assessment. Your patients aren’t average. Your population isn’t national. And the consequences of biased AI fall on the patients you serve.
Why Vendor Assurances Aren’t Enough
When vendors describe their AI products, you’ll often hear about rigorous testing, diverse training data, and fairness evaluations. These assurances are usually genuine, but they have inherent limitations.
Training data may not match your population. An algorithm trained predominantly on data from urban academic medical centers may perform differently in a rural community hospital. An AI developed using data from younger, healthier patients may struggle with elderly populations with complex comorbidities.
Bias can emerge from many sources. The RUAIH guidance acknowledges that “bias can exist at any stage of the AI system lifecycle and can be due to several factors, including but not limited to data, the design and features of the underlying models, the methods of training/testing, and use of the AI tool.”
Vendor testing can’t anticipate how the AI will interact with your specific EHR configuration, documentation patterns, clinical workflows, and patient mix. Local validation isn’t a criticism of the vendor; it’s recognition that context matters.
The CHAI Applied Model Card
The Coalition for Health AI (CHAI) has developed an “Applied Model Card,” essentially a nutrition label for AI algorithms, that provides structured information about model development, performance, and limitations.
Key information the Model Card captures:
- Developer identity and intended uses
- Target patient populations
- Training data characteristics
- Key performance metrics
- Known risks and limitations
- Bias evaluation approaches and results
- Maintenance and monitoring requirements
The RUAIH guidance recommends requesting this information from vendors: “Healthcare organizations should request information from vendors on known risks, biases, and limitations of the AI tool. They should also consider asking how bias was evaluated and for which populations.”
If your vendor can provide a completed CHAI Model Card or equivalent documentation, you have a foundation for understanding potential bias concerns. If they can’t, or won’t, that itself is informative.
A Practical Bias Assessment Framework
For regional health systems without dedicated data science teams, comprehensive bias assessment can feel impossible. Here’s a practical framework that doesn’t require building your own validation infrastructure.
Step 1: Understand Your Population
Before you can assess whether an AI tool works well for your patients, you need to know who your patients are.
Pull demographic data on the population likely to be affected by the AI tool:
- Age distribution
- Race and ethnicity breakdown
- Sex distribution
- Payer mix (as proxy for socioeconomic factors)
- Common comorbidities
- Geographic distribution
Compare this to any information the vendor provides about their training and validation populations. Significant mismatches warrant closer scrutiny.
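If your data warehouse can export basic demographics, even a short script makes the comparison concrete. The sketch below (Python with pandas) is a minimal example with made-up numbers; the subgroup labels and the 10-point flag threshold are illustrative assumptions, not requirements from the guidance.

```python
import pandas as pd

# Illustrative numbers only: replace with an export from your own data
# warehouse and the percentages the vendor reports for its training data.
local = pd.Series({"Age 65+": 0.38, "Black": 0.22, "Hispanic": 0.09,
                   "Rural residence": 0.41, "Medicaid": 0.27})
vendor_training = pd.Series({"Age 65+": 0.21, "Black": 0.11, "Hispanic": 0.18,
                             "Rural residence": 0.06, "Medicaid": 0.15})

comparison = pd.DataFrame({"local_pct": local, "vendor_pct": vendor_training})
comparison["gap"] = (comparison["local_pct"] - comparison["vendor_pct"]).abs()

# Flag subgroups where representation differs by more than 10 percentage points;
# the threshold is arbitrary and only a prompt for closer scrutiny.
comparison["scrutinize"] = comparison["gap"] > 0.10
print(comparison.sort_values("gap", ascending=False))
```

The output is simply a table your governance committee can review alongside the vendor’s documentation; the point is to make mismatches visible, not to compute a pass/fail score.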
Step 2: Request Vendor Bias Information
Before deployment, ask vendors directly:
- What demographic groups were represented in training and validation data?
- Was performance evaluated across demographic subgroups (by race, age, sex)?
- Were any performance disparities identified? How were they addressed?
- Are there known populations for which the tool is contraindicated or performs less well?
- What bias mitigation techniques were employed during development?
Document vendor responses. If they can’t answer these questions, consider whether the tool is ready for deployment.
Step 3: Design Local Validation
If the vendor’s validation population doesn’t match yours, consider local validation before full deployment. This doesn’t require building an ML pipeline; it requires structured data collection.
For predictive tools (alerts, risk scores):
- Run the AI in “silent mode” (generating predictions without displaying to users)
- After a defined period, compare predictions to actual outcomes
- Stratify results by demographic groups
For documentation or classification tools:
- Sample outputs across patient types
- Compare AI outputs to manual review
- Look for systematic differences in quality across populations
For clinical decision support:
- Track recommendation acceptance rates across populations
- Monitor for differential override patterns
- Collect user feedback stratified by patient type
Validation doesn’t require perfection; it requires conscious attention to whether the AI performs consistently across your patient population.
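One way to make the silent-mode comparison concrete is to join the logged predictions to observed outcomes and compute per-group metrics. The sketch below is a minimal example assuming a hypothetical `silent_mode_predictions.csv` with columns `race_ethnicity`, `predicted_risk`, and `outcome`; the file, column names, and the 0.5 alert threshold are assumptions for illustration, not part of the guidance.

```python
import pandas as pd

# Assumed layout: one row per silent-mode prediction, joined to the observed
# outcome (0/1) after the evaluation window closes. File and column names are
# hypothetical.
df = pd.read_csv("silent_mode_predictions.csv")
df["predicted_positive"] = df["predicted_risk"] >= 0.5  # assumed alert threshold

def subgroup_metrics(group: pd.DataFrame) -> pd.Series:
    """Sensitivity and positive predictive value for one subgroup."""
    tp = ((group["predicted_positive"]) & (group["outcome"] == 1)).sum()
    fp = ((group["predicted_positive"]) & (group["outcome"] == 0)).sum()
    fn = ((~group["predicted_positive"]) & (group["outcome"] == 1)).sum()
    return pd.Series({
        "n": len(group),
        "sensitivity": tp / (tp + fn) if (tp + fn) else float("nan"),
        "ppv": tp / (tp + fp) if (tp + fp) else float("nan"),
    })

# Repeat the stratification for each demographic field that matters locally
# (race/ethnicity, age band, payer, and so on).
print(df.groupby("race_ethnicity").apply(subgroup_metrics))
```

The same groupby pattern carries over to the documentation and decision-support checks above; only the metric changes.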
Step 4: Implement Ongoing Bias Monitoring
The guidance is clear that bias assessment isn’t a one-time pre-deployment exercise: “Evaluation should occur before deploying the tool and continue as the AI tool is being used to identify any biases in outcomes that may not have been detected initially.”
Practical ongoing monitoring approaches:
Performance stratification: When reviewing AI performance metrics (see Post 5), stratify by demographic groups. An algorithm with 85% overall accuracy that performs at 60% for Black patients has a bias problem that aggregate metrics hide.
Outcome monitoring: For AI that influences treatment decisions, monitor clinical outcomes by population. Are patients from certain groups experiencing different results?
Override analysis: If clinicians frequently override AI recommendations for specific patient groups, investigate why. It may indicate systematic issues.
User feedback: Create channels for clinicians to report suspected bias. They may notice patterns invisible in aggregate data.
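Override analysis lends itself to the same kind of stratified review. The sketch below assumes a hypothetical `cds_events.csv` export with `patient_group` and `overridden` columns; the 15-percentage-point flag is an arbitrary illustration, not a threshold from the guidance.

```python
import pandas as pd

# Hypothetical export: one row per AI recommendation shown to a clinician,
# with the patient's demographic group and whether the recommendation was overridden.
events = pd.read_csv("cds_events.csv")  # assumed columns: patient_group, overridden (0/1)

rates = events.groupby("patient_group")["overridden"].agg(["mean", "count"])
rates = rates.rename(columns={"mean": "override_rate", "count": "n"})

# Flag groups whose override rate sits far from the overall rate. The
# 15-percentage-point cutoff is arbitrary: a prompt to investigate, not a verdict.
overall = events["overridden"].mean()
rates["investigate"] = (rates["override_rate"] - overall).abs() > 0.15
print(rates.sort_values("override_rate", ascending=False))
```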
Step 5: Document and Act
Bias assessment isn’t just analytical; it requires documentation and action.
Document:
- Pre-deployment bias evaluation methodology and findings
- Vendor-provided bias information
- Local validation results
- Ongoing monitoring results
- Any concerns identified and actions taken
Act:
- If bias is identified, determine if it can be mitigated (workflow adjustments, additional training, population-specific modifications)
- If bias can’t be adequately mitigated, consider whether continued use is appropriate
- Communicate concerns to vendors and request response
- Escalate significant bias concerns to governance
Common Bias Patterns to Watch For
Certain bias patterns recur across healthcare AI applications:
Demographic underrepresentation: AI performs poorly for groups underrepresented in training data. Watch for differential performance by race, ethnicity, age extremes, and patients with disabilities.
Proxy discrimination: Even without explicit demographic variables, AI can learn proxies (zip code, insurance type, clinical history patterns) that effectively encode demographic information.
Label bias: If the outcome the AI is predicting was historically measured or recorded differently across populations, the AI may learn those biased patterns.
Automation bias affecting care: Even without technical bias, clinical staff may over-rely on AI recommendations in ways that affect marginalized populations differently.
Documentation patterns: For AI that analyzes clinical documentation, differential documentation quality or completeness across populations can create bias.
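Of these patterns, proxy discrimination is one you can screen for with data you already hold: check how strongly a candidate proxy is associated with the demographic attribute it may encode. The sketch below uses a crosstab and Cramér’s V on a hypothetical extract; the file and column names are assumptions, and a strong association is a flag for discussion, not a finding of bias.

```python
import numpy as np
import pandas as pd
from scipy.stats import chi2_contingency

# Hypothetical extract: an input the AI consumes (payer type) alongside the
# demographic attribute it might be standing in for, for the same patients.
df = pd.read_csv("feature_extract.csv")  # assumed columns: payer_type, race_ethnicity

# Cramér's V ranges from 0 (no association) to 1 (the proxy fully determines
# the attribute). A high value is a reason to ask questions, not proof of bias.
table = pd.crosstab(df["payer_type"], df["race_ethnicity"])
chi2 = chi2_contingency(table)[0]
n = table.to_numpy().sum()
cramers_v = np.sqrt(chi2 / (n * (min(table.shape) - 1)))
print(f"Payer type vs. race/ethnicity association: V = {cramers_v:.2f}")
```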
The Equity Imperative
The RUAIH guidance is blunt about the consequences of unaddressed bias: “This can lead to safety errors, misdiagnoses, administrative burden, operational inefficiencies, compromised quality, and organizational risk.”
But there’s a broader frame. Healthcare organizations have an ethical obligation to ensure new technologies don’t exacerbate existing disparities. AI has potential to both reduce disparities (by standardizing care decisions) and amplify them (by encoding historical biases into automated systems).
Bias assessment isn’t just about compliance or risk management. It’s about ensuring AI serves all your patients, not just those who look like the populations used to build the algorithms.
Working with Vendors on Bias
How you engage vendors on bias matters.
Before purchase:
- Make bias documentation a requirement in RFPs
- Ask specific questions about training data diversity and subgroup performance
- Request evidence, not just assurances
During contracting:
- Include bias-related representations and warranties
- Require notification if vendor identifies new bias concerns
- Reserve rights to conduct local bias validation
During deployment:
- Share local validation findings with vendors
- Request vendor assistance if bias is identified
- Document vendor responsiveness to bias concerns
At renewal:
- Review bias monitoring history as part of contract decision
- Escalate unresolved bias concerns in renewal negotiations
Vendors generally want to address bias concerns; bias affects their reputation and market position. But without customer pressure, it may not be prioritized.
Getting Started
If you haven’t begun systematic bias assessment:
Immediate:
- For AI tools currently in production, request bias documentation from vendors
- Review any available performance data for demographic stratification
- Add bias assessment to procurement requirements for future AI purchases
Short-term:
- Develop a bias assessment checklist for new AI deployments
- Identify highest-risk AI tools for more intensive local validation
- Establish baseline demographic data for populations affected by AI tools
Ongoing:
- Integrate bias monitoring into regular AI performance review
- Create a feedback mechanism for clinicians to report suspected bias
- Include bias findings in governance reporting
You won’t achieve perfect bias detection. The goal is systematic attention to whether AI tools serve all your patients equitably, not just those who match training data averages.
Next in the series: AI Training for Clinical Staff: Beyond the Vendor Demo
Harness.health helps regional health systems build AI governance programs aligned with Joint Commission RUAIH guidance. Learn more about our platform