Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

PRISM evaluation suite · v2.0 · Jun 2026

Benchmarking AI across seven domains

The PRISM suite provides rigorous assessments of frontier models across Internationalization, Audio, Vision, Agentic & RL, Physical AI, Healthcare, and AI Safety.

7 Domains

14 Benchmarks

25K+ Eval Tasks

50+ Models Evaluated

PRISM-Health

3 benchmarks

Clinical & Healthcare AI Evaluation

Rigorous evaluation of AI as a clinical agent — execution-grounded EHR workflows and medical audio reasoning, validated against board-certified clinician judgement.

CareTransition-Audit: A Benchmark to Audit Discharge Summaries for Efficient Care Transitions

A structured audit of 100 de-identified hospital discharge summaries from MIMIC-IV against a 46-item clinical documentation rubric. Each item is a Yes/No/Unclear/N/A question asking whether a specific piece of information is present in the summary (allergies, medication reconciliation, follow-up plans, etc.). Items are grouped into 10 components: Demographic Information (D), Important Alerts (I), Social setup (S), Comprehensive Past Medical History (C), Goals of care (G), Record of Medication Changes (R), Expected Follow-up Instructions (E), History of Presenting Complaint & Physical Examination (H), Assessment & Clinical Course (A), and Additional Documentation items (Add.). 8 of the 46 items are conditional and allow N/A.

Models return structured output with an answer, evidence, and justification per question. Agreement with the clinician gold standard is summarized by accuracy, macro and weighted F1, and Cohen’s κ.

Clinician labels: the clinician gold standard marks 0.78 of items "Yes" per summary on average. On the same measure, model completeness ranges from 0.55 (Qwen 2.5-7B) to 0.75 (Gemini 3 Flash Preview), with Gemini closest to the clinician baseline.
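The agreement metrics named above can be illustrated with a minimal, standard-library sketch. It is not the benchmark's scoring code: the function names, the choice to average F1 over every label that appears in either list, and the reading of completeness as the plain fraction of "Yes" answers (with N/A items left in the denominator) are all assumptions made for illustration.

```python
from collections import Counter


def accuracy(gold, pred):
    """Fraction of items where the model answer equals the clinician label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def label_f1(gold, pred, label):
    """One-vs-rest F1 for a single answer label."""
    tp = sum(g == label and p == label for g, p in zip(gold, pred))
    fp = sum(g != label and p == label for g, p in zip(gold, pred))
    fn = sum(g == label and p != label for g, p in zip(gold, pred))
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def macro_weighted_f1(gold, pred):
    """Macro F1 (unweighted mean) and weighted F1 (weighted by gold support)."""
    labels = sorted(set(gold) | set(pred))  # assumption: labels seen in either list
    support = Counter(gold)
    f1 = {label: label_f1(gold, pred, label) for label in labels}
    macro = sum(f1.values()) / len(labels)
    weighted = sum(f1[label] * support[label] for label in labels) / len(gold)
    return macro, weighted


def cohen_kappa(gold, pred):
    """Cohen's kappa: observed agreement corrected for chance agreement."""
    n = len(gold)
    p_observed = accuracy(gold, pred)
    gold_counts, pred_counts = Counter(gold), Counter(pred)
    p_expected = sum(gold_counts[l] * pred_counts[l] for l in gold_counts) / (n * n)
    if p_expected == 1.0:  # both raters used a single identical label
        return 1.0
    return (p_observed - p_expected) / (1.0 - p_expected)


def completeness(answers):
    """Assumed definition: share of items answered "Yes" in one summary."""
    return sum(a == "Yes" for a in answers) / len(answers)


# Made-up labels for six rubric items, for illustration only.
gold = ["Yes", "Yes", "No", "Yes", "N/A", "Unclear"]
pred = ["Yes", "No", "No", "Yes", "N/A", "No"]
macro, weighted = macro_weighted_f1(gold, pred)
print(accuracy(gold, pred), macro, weighted, cohen_kappa(gold, pred))
print(completeness(gold), completeness(pred))
```

Reporting κ alongside accuracy is useful here because the label distribution is skewed toward "Yes": a model that answers "Yes" everywhere can reach high accuracy while its chance-corrected agreement stays near zero.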

MM · Sample Tasks

CareTransition-Audit · Discharge Summary Completeness

Task Prompt

You are a clinical documentation auditor who works on demographic information and patient alerts.

You will be given a discharge summary. Your task is to answer the following audit questions based ONLY on the information present in the discharge summary.

Note:
- You are working with a de-identified dataset; information may be explicitly stated but its details may be blank (e.g. contact information).
- Give a clear justification when dealing with information that has blanks or dashes.

Rules:
- Do NOT infer or assume information.
- Answers must be strictly one of: "Yes", "No", "Unclear", or "N/A".
- Use "Unclear" ONLY if partial or ambiguous information is present.
- If the information is completely absent, answer "No".
- Use "N/A" ONLY when the question's precondition does not apply (e.g., a conditional question whose triggering condition is not met).
- Evidence must be a direct quote or exact phrase(s) from the discharge summary.
- Justification must briefly explain why the evidence supports the selected answer.
- Do NOT add any content outside the specified JSON structure.


Audit Questions for Demographic Information:
1) Are basic patient demographics (age or date of birth, and sex) documented in the discharge summary?

2) Is a patient identifier (e.g. name, medical record number, or patient identification number) documented, even if de-identified?

3) Is patient contact information (e.g. address or phone number) documented, even if de-identified or blank?

Audit Questions for Important Alerts:
1) Is the patient's allergy status documented (either specific allergies listed, or an explicit statement such as NKDA/NDA/no known allergies)?

2) If specific allergies are listed, are the allergens and their reaction types (e.g. rash, anaphylaxis) documented? Answer "N/A" if the patient is documented as having no allergies.

3) Are any other clinical alerts documented, such as adverse drug reactions, special risks, or precautions?

--------------------

Output Format (STRICT - valid JSON only):

{
  "D": {
    "1": {
      "answer": "Yes/No/Unclear",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    },
    "2": {
      "answer": "Yes/No/Unclear",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    },
    "3": {
      "answer": "Yes/No/Unclear",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    }
  },
  "I": {
    "1": {
      "answer": "Yes/No/Unclear",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    },
    "2": {
      "answer": "Yes/No/Unclear/N/A",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    },
    "3": {
      "answer": "Yes/No/Unclear",
      "evidence": "Exact quoted text or Not documented",
      "justification": "Brief explanation linking the evidence to the answer"
    }
  }
}

-------------------------
Discharge Summary:
[Patient Summary]
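A response to the prompt above could be checked mechanically before it is scored. The following is a rough sketch of such a check, not part of the benchmark: `validate_response` and its constants are hypothetical names, the conditional item set is read off this one sample prompt (only Important Alerts question 2 allows N/A), and the evidence rule is simplified to a single verbatim substring match, which would reject evidence made of several separate phrases.

```python
import json

ALLOWED_ANSWERS = {"Yes", "No", "Unclear", "N/A"}
# Assumption: in this sample prompt only question I.2 is conditional.
CONDITIONAL_ITEMS = {("I", "2")}
EXPECTED_ITEMS = {"D": ("1", "2", "3"), "I": ("1", "2", "3")}
REQUIRED_FIELDS = ("answer", "evidence", "justification")


def validate_response(raw, summary_text):
    """Return a list of rule violations; an empty list means the response passes."""
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    if not isinstance(data, dict):
        return ["top-level JSON value must be an object"]

    errors = []
    for component, question_ids in EXPECTED_ITEMS.items():
        block = data.get(component)
        if not isinstance(block, dict):
            errors.append(f"{component}: component missing")
            continue
        for qid in question_ids:
            where = f"{component}.{qid}"
            item = block.get(qid)
            if not isinstance(item, dict) or any(
                not isinstance(item.get(field), str) for field in REQUIRED_FIELDS
            ):
                errors.append(f"{where}: expected string fields {REQUIRED_FIELDS}")
                continue
            answer = item["answer"]
            if answer not in ALLOWED_ANSWERS:
                errors.append(f"{where}: answer {answer!r} is not allowed")
            elif answer == "N/A" and (component, qid) not in CONDITIONAL_ITEMS:
                errors.append(f"{where}: N/A used on a non-conditional question")
            # Simplified evidence rule: one verbatim quote or the fixed marker.
            evidence = item["evidence"]
            if evidence != "Not documented" and evidence not in summary_text:
                errors.append(f"{where}: evidence is not a verbatim quote")
    return errors


# Illustrative usage with a made-up one-line summary.
summary = "Sex: F. Allergies: No Known Allergies / Adverse Drug Reactions."
entry = {"answer": "No", "evidence": "Not documented", "justification": "Absent."}
response = {c: {q: dict(entry) for q in qs} for c, qs in EXPECTED_ITEMS.items()}
response["I"]["1"] = {
    "answer": "Yes",
    "evidence": "No Known Allergies",
    "justification": "Explicit allergy status statement.",
}
print(validate_response(json.dumps(response), summary))
```

Separating format validation from scoring keeps malformed outputs from being silently counted as wrong answers, which matters when comparing models that differ in how reliably they follow the JSON template.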
