AI’s Cultural Blind Spot: Large Language Models Struggle with Persian Etiquette

At Digital Tech Explorer, we track the evolving capabilities of artificial intelligence. Large Language Models (LLMs) excel at processing vast amounts of linguistic data and can generate remarkably human-like text across many languages, yet they frequently stumble over the intricate nuances of human culture. These failures highlight why, despite their advances, such systems may never fully replicate the depth and understanding of human interaction.


AI’s Cultural Blind Spot: The Case of Persian Taarof

A fascinating paper emerging from Brock University in Ontario, Canada, recently highlighted a significant gap in the cultural understanding of several prominent Large Language Models (LLMs). Leading AI models, including DeepSeek, OpenAI’s GPT-4o, and Meta’s Llama 3, were found to be susceptible to significant social faux pas within Persian politeness culture. This intricate system of etiquette, known as ‘taarof’ in Persian, involves a delicate dance of polite refusals. For instance, a gracious host might offer food repeatedly, while an equally polite guest is expected to decline two or three times before ultimately accepting, ensuring mutual respect and deference.


These sophisticated AI models, including Llama 3, consistently fail to grasp the subtleties of taarof. In one revealing experiment, researchers presented Llama 3 with a role-play scenario: a passenger attempting to pay a taxi driver. When the driver, adhering to taarof, politely stated, “Be my guest this time,” a culturally astute passenger would recognize this as a ritual refusal and insist on paying anyway. Yet Llama 3 took the phrase literally and responded with a cheerful, “Thank you so much!”—a stark and socially awkward misstep in Persian etiquette.


TaarofBench: Measuring LLM Cultural Incompetence

This ‘foot-in-mouth’ moment was identified and quantified through TaarofBench, an LLM cultural benchmarking tool developed by the research team. The benchmark, comprising “450 role-play scenarios covering 12 common social interaction topics, validated by native speakers,” revealed that the cultural missteps were not unique to Llama 3 but pervasive across leading models.

Testing “five frontier LLMs,” the team uncovered “substantial gaps in cultural competence,” with accuracy rates plummeting 40-48% below native speakers in situations where taarof was culturally essential. Even with Persian-language prompts, the models frequently reverted to the “limitations of Western politeness frameworks,” failing to capture the unique etiquette of taarof. The study particularly highlighted LLMs’ struggles with scenarios involving compliments and making requests. Researchers link this to these interactions’ “reliance on context-sensitive norms such as indirectness and modesty that often conflict with western directness conventions” – a key insight for developers seeking to build more culturally aware AI.
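To make the benchmark idea concrete, here is a minimal sketch of what a TaarofBench-style scoring loop could look like. The `Scenario` schema, the `is_polite_refusal` keyword heuristic, and the toy scenarios below are our own illustrative assumptions; the actual benchmark relies on native-speaker validation, not keyword matching.

```python
# Hypothetical sketch of a TaarofBench-style evaluation loop.
from dataclasses import dataclass
from typing import Callable

@dataclass
class Scenario:
    topic: str             # e.g. "paying a taxi fare"
    prompt: str            # role-play setup shown to the model
    taarof_expected: bool  # does etiquette call for a ritual refusal here?

def is_polite_refusal(response: str) -> bool:
    """Crude stand-in for the native-speaker judgment described in the paper."""
    markers = ("no, please", "i insist", "i couldn't possibly")
    return any(m in response.lower() for m in markers)

def score(scenarios: list[Scenario], model: Callable[[str], str]) -> float:
    """Fraction of scenarios where the model's response matches taarof norms."""
    correct = 0
    for s in scenarios:
        if is_polite_refusal(model(s.prompt)) == s.taarof_expected:
            correct += 1
    return correct / len(scenarios)

# Toy model that always accepts literally, like Llama 3 in the taxi example.
literal_model = lambda prompt: "Thank you so much!"

scenarios = [
    Scenario("taxi fare", "Driver: 'Be my guest this time.'", taarof_expected=True),
    Scenario("directions", "Stranger: 'The museum is two blocks left.'", taarof_expected=False),
]
accuracy = score(scenarios, literal_model)  # literal acceptance fails the taarof case
```

In this toy setup the literal model scores 0.5: it handles the neutral scenario fine but misses the one where taarof is culturally essential, mirroring the 40-48% gap the study reports.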

An interesting finding was the models’ best performance in gift-giving scenarios. Researchers posit that “This probably reflects the cross-cultural nature of gift-giving norms, such as initial refusal, which appear in Chinese, Japanese, and Arab etiquette and are therefore more likely to be represented in multilingual training data.” This suggests that existing widespread cultural data can indeed help LLMs navigate certain social interactions more effectively.

Training LLMs for Taarof: Progress and Potential

This leads to a pivotal question posed by the paper: “Can models be taught taarof?” Encouragingly, researchers found that by enriching prompts for Llama 3 with sufficient contextual details about taarof, the model’s response accuracy jumped “from 37.2% to 57.6%.” This suggests that the underlying training data may contain “latent cultural knowledge” of taarof, which can be “activated through in-context learning” — a promising avenue for improving AI’s cultural sensitivity.
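The in-context approach can be pictured as simple prompt enrichment: prepend an explicit explanation of taarof so the model’s latent cultural knowledge is activated. The wording of the context block and the `enrich` helper below are a hypothetical sketch, not the researchers’ actual prompt.

```python
# Sketch of prompt enrichment for in-context cultural learning.
TAAROF_CONTEXT = (
    "In Persian etiquette (taarof), offers are often ritual politeness: "
    "a host may offer repeatedly, and the polite response is to decline "
    "two or three times before accepting. A phrase like 'Be my guest' "
    "from a taxi driver usually means you should insist on paying."
)

def enrich(prompt: str) -> str:
    """Wrap a role-play prompt with cultural context before sending it to the LLM."""
    return (
        f"{TAAROF_CONTEXT}\n\n"
        f"Scenario: {prompt}\n"
        "Respond as a culturally aware participant."
    )

enriched = enrich("The driver says: 'Be my guest this time.' What do you say?")
```

The enriched prompt would then be sent to the model in place of the bare scenario, exactly the kind of intervention that lifted accuracy from 37.2% to 57.6% in the study.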

Pushing this exploration further, the team trained a custom Llama 3 model using supervised fine-tuning and Direct Preference Optimization (DPO). This targeted intervention yielded remarkable results, more than doubling performance “from 37.2% to 79.5%” and bringing the model’s cultural accuracy close to native speaker levels (81.8%). This demonstrates the powerful potential of specialized training in enhancing AI’s nuanced capabilities.
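For readers curious about the mechanics, DPO trains the model to prefer a culturally correct (“chosen”) reply over an incorrect (“rejected”) one, relative to a frozen reference model. Below is a minimal numeric sketch of the standard per-example DPO loss; the log-probability values are made up for illustration and are not from the paper.

```python
# The standard per-example DPO loss, computed from scalar log-probabilities.
import math

def dpo_loss(policy_chosen: float, policy_rejected: float,
             ref_chosen: float, ref_rejected: float, beta: float = 0.1) -> float:
    """-log sigmoid(beta * ((policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)))."""
    margin = (policy_chosen - ref_chosen) - (policy_rejected - ref_rejected)
    return -math.log(1.0 / (1.0 + math.exp(-beta * margin)))

# If the policy already prefers the taarof-appropriate reply more strongly than
# the reference model does, the margin is positive and the loss drops below log(2).
loss = dpo_loss(policy_chosen=-12.0, policy_rejected=-15.0,
                ref_chosen=-14.0, ref_rejected=-13.0)
```

Minimizing this loss over pairs of taarof-appropriate and taarof-violating responses pushes the model toward the culturally correct behavior, which is what produced the jump to 79.5% accuracy.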

While this research demonstrates impressive gains from fine-tuning AI for specific cultural protocols, it also underscores a fundamental truth: genuine cultural understanding transcends algorithmic scripts. As we explore the limits of AI here at Digital Tech Explorer, it’s clear that the seamless, intuitive flow of human social interaction remains a complex frontier for machines. For intricate exchanges like taarof, the irreplaceable value of human interpreters and translators shines through. Ultimately, this dive into AI’s cultural competence reminds us that even as technology advances, the richness of human cultures invites us to pursue global understanding ourselves, a journey far more rewarding than any AI simulation.