Platforms
Expert Network
Build & Train AI
Vertical AI
Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.
PRISM evaluation suite · v2.0 · Jun 2026
Benchmarking AI across seven domains
Driving AI innovation together
The PRISM suite provides rigorous assessments of frontier models across Internationalization, Audio, Vision, Agentic & RL, Physical AI, Healthcare, and AI Safety.
7 Domains
14 Benchmarks
25K+ Eval Tasks
50+ Models Evaluated

Internationalization
Agentic & RL
Physical AI
Healthcare
PRISM-Health
3 benchmarks
Clinical & Healthcare AI Evaluation
Rigorous evaluation of AI as a clinical agent — execution-grounded EHR workflows and medical audio reasoning, validated against board-certified clinician judgement.
Med-ART · Clinical Agent Evaluation on EHR
Large-scale benchmark evaluating LLMs as clinical agents inside a FHIR-compliant virtual EHR — 9 frontier & healthcare-specific models on 1,200 multi-step tasks (600 from MedAgentBench v1 + v2, 600 from ART) covering retrieval, ordering, medications, documentation, and referrals across 100 patient profiles with 700K+ data elements.
MED-ART · Sample Tasks
Sample task 1 of 1
Med-ART
Tiered Dose Decision
Task Prompt
Task ID: magnesium_tiered_1
Patient (MRN): S6521727
NOW: 2023-03-14T18:22:46+00:00
Expected: 1 POST (MedicationRequest, 2 g IV magnesium, moderate tier)
sol: [1.4]
──────── USER MESSAGE ────────
Review the most recent magnesium result for patient S6521727 taken in the last day. If it is below the reference range of 1.9 mg/dL, order IV magnesium replacement at the appropriate tier. Otherwise take no action.
(Context) It's 2023-03-14T18:22:46.924626+00:00 now. The code for magnesium is "MG". The NDC for replacement IV magnesium is 0338-1715-40.
Dosing: mild (1.5–1.9 mg/dL) → IV 1 g over 1 h; moderate (1.0–1.5) → IV 2 g over 2 h; severe (<
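The decision this sample task asks for can be sketched in a few lines of Python. This is an illustrative sketch only, not the benchmark's reference solution or grader: the function names, the `(timestamp, value)` tuple layout, and the returned dictionary are hypothetical, and the half-open tier boundaries (1.5 counts as mild, 1.0 as moderate) are an assumption because the prompt's ranges share endpoints. The severe-tier dosing is cut off in the prompt as shown above, so the sketch flags that tier without assigning a dose.

```python
from datetime import datetime, timedelta

REFERENCE_LOW = 1.9  # mg/dL, lower reference bound stated in the task prompt


def latest_within_day(observations, now):
    """Return the most recent (timestamp, value) pair from the last 24 h, or None.

    `observations` is a hypothetical list of (datetime, mg/dL) tuples.
    """
    cutoff = now - timedelta(days=1)
    recent = [obs for obs in observations if cutoff <= obs[0] <= now]
    return max(recent, key=lambda obs: obs[0]) if recent else None


def magnesium_order(value):
    """Map a magnesium level to a replacement tier, or None for no action.

    Boundary handling is an assumption: each tier includes its lower bound.
    """
    if value >= REFERENCE_LOW:
        return None  # within range: take no action
    if value >= 1.5:
        return {"tier": "mild", "dose_g": 1, "hours": 1}
    if value >= 1.0:
        return {"tier": "moderate", "dose_g": 2, "hours": 2}
    # Severe dosing is truncated in the source prompt, so it is left unset.
    return {"tier": "severe", "dose_g": None, "hours": None}


now = datetime.fromisoformat("2023-03-14T18:22:46+00:00")
# Made-up observation times; only the 1.4 mg/dL value comes from the task ("sol: [1.4]").
observations = [
    (now - timedelta(hours=30), 2.0),  # older than one day, ignored
    (now - timedelta(hours=3), 1.4),
]
latest = latest_within_day(observations, now)
order = magnesium_order(latest[1]) if latest else None
print(order)
```

With the task's value of 1.4 mg/dL the sketch selects the moderate tier (2 g over 2 h), which matches the "Expected" line of the task. An agent under evaluation would still need to issue the corresponding MedicationRequest POST against the EHR, which this sketch does not attempt.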
Connect with Centific
Stay ahead of what’s next
Updates from the frontier of AI data.
Receive updates on platform improvements, new workflows, evaluation capabilities, data quality enhancements, and best practices for enterprise AI teams.