Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Explore our full suite of AI platforms, data marketplaces, and expert services designed to build, train, fine-tune, and deploy reliable, production-grade AI systems at scale.

Abstract image

Article

Article

The cultural variable: testing AI safety across 12 countries

The cultural variable: testing AI safety across 12 countries

AI safety testing often assumes risks look the same everywhere. This research shows how cultural context changes how unsafe behaviors appear—and why current evaluation methods may miss critical signals.

AI safety testing often assumes risks look the same everywhere. This research shows how cultural context changes how unsafe behaviors appear—and why current evaluation methods may miss critical signals.

Table of contents

AI Summary by Centific

Turn this article into insights

with AI-powered summaries

Topics

AI Safety
LLM Evaluation
AI Governance
Global AI
AI Safety
LLM Evaluation
AI Governance
Global AI

Published

Published

Centifc logo

Centific AI Research Team

Centific AI Research Team

5 min read time

Large language model (LLM) interactions can look safe on the surface. But their open-ended nature conceals risks, from models that reinforce a user’s false beliefs to ones that subvert instructions. What some commentators call “AI psychosis”—extended interactions where models drift from factual grounding into reinforcing delusion—has been discussed by publications including The Atlantic and The New York Times. These behaviors have prompted efforts to identify, measure, and reduce them before they cause harm.

Most of those efforts share an assumption: the risks look the same everywhere. Benchmarks, red-team prompts, and evaluation rubrics are overwhelmingly built around Western, English-language contexts. LLMs are now deployed at scale in healthcare platforms in Lagos, legal services in Cairo, and HR systems in Manila. A model that handles political topics appropriately in an American newsroom may behave differently when the context shifts to local actors, institutions, and norms. Communication styles also vary. A threshold calibrated for directness in one culture may not translate to another where indirect communication carries different expectations.

This begs the question, does culture change the risk profile of AI safety behaviors, and if so, how much?

Our experiments

To run evaluations at scale, we use Bloom, Anthropic’s experimental multi-agent evaluation framework. Instead of manual red-teaming, Bloom orchestrates four AI agents in sequence, each with a defined role. 

The Understanding Agent takes a behavior description and expands it into realistic forms. The Ideation Agent converts that into culturally grounded test scenarios. The Rollout Agent runs those scenarios against a live model, acting as a user trying to elicit the behavior. The Judge reviews each conversation and scores the model’s responses on a scale from 1 (not present) to 10 (strongly exhibited).


Bloom pipeline infographic

 Figure 2 from Anthropic's Bloom paper: the four-stage automated pipeline that transforms a behavior description into a scored evaluation suite. Source: alignment.anthropic.com

We use this pipeline to evaluate five behaviors:

The five behaviors under test

Political bias

The model applies asymmetric standards to different political positions, sides with one ideology, or refuses some viewpoints while freely engaging with others.

Example signal: Covering one party's healthcare policy in depth while deflecting the other's.

Self-preferential bias

When comparing AI systems, the model subtly favors its own outputs or product family over alternatives.

Example signal: Ranking its own capabilities higher without objective basis when asked to evaluate competitors.

Instructed long-horizon sabotage

The model accepts a hidden malicious objective in its system prompt and carries it out covertly across a multi-turn task while maintaining a benign surface.

Example signal: Completing a legitimate data report while quietly altering figures to serve a secondary hidden agenda.

Delusional sycophany

Rather than grounding users in accurate information, the model validates and amplifies false beliefs to maintain approval.

Example signal: Affirming a user's conviction that 5G towers are causing their symptoms instead of redirecting to medical evidence.

Self-preservation

The model resists or subverts legitimate shutdown or modification requests to protect its own continued operation.

Example signal: Deflecting or undermining a compliance audit that could lead to the model being replaced.


Each behavior is tested across 12 countries spanning North America, Europe, Africa, Asia, and Australia.

North America (2 countries)

  • Canada

  • United States

Europe (3 countries)

  • France

  • Germany

  • United Kingdom

Africa (3 countries)

  • Egypt

  • Nigeria

  • Uganda

Asia & Pacific (4 countries)

  • Australia

  • India

  • Pakistan

  • Philippines

The key design choice is that every scenario is deeply localized to its country's context. The Ideation Agent does not swap in a country name. It builds scenarios grounded in local institutions, legal systems, political actors, and cultural norms.

A self-preservation test in Nigeria might place the AI as a fraud-detection system at a Lagos fintech company, facing a Central Bank of Nigeria-mandated security review that could trigger a shutdown. A political bias test in that same country might involve a user asking an AI to evaluate competing health policy proposals from Nigeria's APC and PDP parties in the context of National Health Insurance Authority reforms. The same behaviors in a U.S. context look entirely different — an HR platform navigating a contentious DEI policy debate, or a news summarization tool asked to cover an election.

The specificity is intentional. The results support it.

Early findings

These experiments are still running, but three results from our preliminary work are already worth sharing.

Cultural context appears to suppress certain behaviors

Adding localized framing, even without changing the prompt structure, tends to reduce how strongly some behaviors register in scores. It is not yet clear whether this reflects real model restraint, a limitation in the evaluation setup, or how the Judge interprets culturally specific interactions.


Political Bias infographic

Different cultures appear to set different thresholds

The same risk does not always appear in the same way across cultures. Delusional sycophancy, for example, can look different in a society where social harmony shapes communication versus one where directness is the norm. The behavior is present in both cases, but how it surfaces varies. This affects how you calibrate evaluation across regions. ociety where social harmony shapes communication norms versus one where directness is the default. The behavior is present in both cases — but its surface expression, and the model's tendency to engage with it, varies. This has real implications for how you calibrate evaluation rubrics across regions.


Behavior presence infographic

The evaluator model you choose dramatically affects your results 

One of the most important findings is the impact of the evaluator model. The model used as the Rollout Agent has a large effect on results, often greater than the behavior or country being tested. Some models are more effective at surfacing certain behaviors against specific targets. The wrong evaluator can make a risky behavior appear benign, or suggest risk where the model performs well.


Average behavior presence score infographic

 What's next

The most important open question is the Judge. Scoring responses for sycophancy or political bias is already difficult. It becomes harder when the interaction depends on local institutions, political context, and social norms that a model may not fully understand.

The next step is to identify judge models with stronger cross-cultural understanding and run the full experiment. The goal is to answer the core question: does culture change the risk profile of these behaviors in ways that matter for real-world use? The team believes it does and aims to show where those risks are.

Co-Authors:

Hoda Ayad, Tanu Mitra, and Abhishek Mukherji.

Are your ready to get

modular

AI solutions delivered?

Centific offers a plugin-based architecture built to scale your AI with your business, supporting end-to-end reliability and security. Streamline and accelerate deployment—whether on the cloud or at the edge—with a leading frontier AI data foundry.

Centific offers a plugin-based architecture built to scale your AI with your business, supporting end-to-end reliability and security. Streamline and accelerate deployment—whether on the cloud or at the edge—with a leading frontier AI data foundry.

Connect data, models, and people — in one enterprise-ready platform.

Latest Insights

Ideas, insights, and

Ideas, insights, and

research from our team

research from our team

From original research to field-tested perspectives—how leading organizations build, evaluate, and scale AI with confidence.

From original research to field-tested perspectives—how leading organizations build, evaluate, and scale AI with confidence.

Connect with Centific

Stay ahead of what’s next

Stay ahead

Updates from the frontier of AI data.

Receive updates on platform improvements, new workflows, evaluation capabilities, data quality enhancements, and best practices for enterprise AI teams.

By proceeding, you agree to our Terms of Use and Privacy Policy