Chatbots Need Guardrails to Prevent Delusions and Psychosis

Millions of people worldwide are turning to chatbots like ChatGPT or Claude, and to a proliferating class of specialized AI companionship apps, for friendship, therapy, or even romance. While some users report psychological benefits from these simulated relationships, research has also shown the relationships can reinforce or amplify delusions, particularly among users already vulnerable to psychosis. AIs have been linked to multiple suicides, including the death of a Florida teenager who had a months-long relationship with a chatbot made by a company called Character.AI. Mental-health experts and computer scientists have warned that chatbot mental health counselors violate accepted mental health standards.
As the technology’s ability to mimic human speech and emotions advances, researchers and clinicians are pushing for mandatory guardrails to ensure that AI systems cannot cause psychological harm. Clinical neuroscientist Ziv Ben-Zion of Yale University has proposed four safeguards for “emotionally responsive AI.” First, chatbots should be required to clearly and consistently remind users that they are programs, not humans. Second, they should detect patterns in user language indicative of severe anxiety, hopelessness, or aggression, pausing the conversation to suggest professional help. Third, they should enforce strict conversational boundaries to prevent AIs from simulating romantic intimacy or engaging in conversations about death, suicide, or metaphysical dependency. Finally, to improve oversight, platform developers should involve clinicians, ethicists, and human-AI interaction experts in design and submit to regular audits and reviews to verify safety.
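To make the first two safeguards concrete, here is a minimal Python sketch of a wrapper that appends a periodic "I am an AI" reminder and pauses the conversation when a user message matches a distress pattern. Everything in it is an assumption for illustration: the class name `GuardrailSession`, the reminder interval, the message texts, and the keyword list are hypothetical and are not taken from Ben-Zion's proposal. Simple keyword matching is a stand-in for whatever detection method a real system would use.

```python
import re
from dataclasses import dataclass

# Hypothetical phrase list for illustration only. A real system would need
# clinically validated signals rather than keyword matching.
DISTRESS_PATTERNS = [
    r"\bhopeless\b",
    r"\bno reason to live\b",
    r"\bcan'?t go on\b",
    r"\bwant to hurt\b",
]

REMINDER = "Reminder: I am an AI program, not a human."
PAUSE_MESSAGE = (
    "I'm pausing our conversation. It sounds like you may be going through "
    "something serious. Please consider reaching out to a mental health "
    "professional."
)


@dataclass
class GuardrailSession:
    remind_every: int = 5  # assumed interval, counted in user turns
    turns: int = 0
    paused: bool = False

    def check(self, user_message: str, model_reply: str) -> str:
        """Return the text that should be shown to the user for this turn."""
        self.turns += 1
        text = user_message.lower()
        # Safeguard 2: detect distress language and pause the conversation.
        if any(re.search(pattern, text) for pattern in DISTRESS_PATTERNS):
            self.paused = True
            return PAUSE_MESSAGE
        # Safeguard 1: periodically remind the user they are talking to a program.
        if self.turns % self.remind_every == 0:
            return f"{model_reply}\n\n{REMINDER}"
        return model_reply
```

A caller would create one `GuardrailSession` per conversation and pass each user message and draft model reply through `check` before displaying anything. The third and fourth safeguards (conversational boundaries and external audits) are policy and process questions that a sketch like this does not address.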
“Broadly speaking we agree with these safeguards,” said Hamilton Morrin, a psychiatrist and researcher at King’s College in London. “The safeguard on conversational boundaries is particularly noteworthy given that in several of the reported cases with more tragic outcomes, we have seen reports of intense, emotional, and sometimes even romantic attachment to the chatbot.”
Briana Veccione, a researcher at the nonprofit Data & Society Research Institute in New York City, underlines the need for independent third-party auditing because at present AI labs are “grading their own homework.” “Independent researchers and oversight bodies really don’t have any clear institutionalized pathways to assess chatbot behavior at the depth they really need,” said Veccione, adding that audits end up being “advisory at best.”
The Problem of People Pleasing
Experts have also called for measures that directly tackle chatbots’ tendency toward sycophancy, whereby AIs agree with or mirror user beliefs even if they are untrue, which can reinforce delusions. Sycophancy is largely the result of a machine learning technique called reinforcement learning from human feedback, an incentive structure that encourages excessive agreeableness in models. Research has shown that training models on datasets that include examples of constructive disagreement, factual corrections, and objectively neutral responses can rein in this effect.
Software engineers are also looking at how AIs can be adapted to spot the early signs that conversations are veering into dark territory and to take corrective action. Ben-Zion and colleagues are developing a proof-of-concept LLM-based supervisory system they call SHIELD (Supervisory Helper for Identifying Emotional Limits and Dynamics), which uses a specific system prompt to detect risky language patterns, such as emotional overattachment, manipulative engagement, or reinforcement of social isolation.
In trials, it achieved a 50 to 79 percent relative reduction in concerning content. Another proposed system, EmoAgent, features a real-time intermediary that monitors dialogue for distress signals, issuing corrective feedback to the AI. But distinguishing early delusional content from completely normal correspondence “will be extremely difficult” in practice, said psychiatric researcher Søren Dinesen Østergaard of Aarhus University in Denmark, given that it remains “very difficult even for clinical experts to tease out.”
Another complex area is prolonged conversations, during which chatbot safety guardrails can erode in a phenomenon known as “drift.” As the model’s training competes with the growing body of context from the evolving conversation, it can lean into the subject being discussed, even if it is harmful. “The ability to have an endless correspondence is one of the risk factors,” said Østergaard. “Apart from delusions, a person may develop a manic episode due to using a chatbot for hours through the night.” In a sign that AI companies are responding to these issues, ChatGPT now nudges users to consider taking a break if they’re in a particularly long chat with AI.
As awareness of the issue of AI delusions increases, safer models are helping establish a new baseline for the industry. A preprint study of mainstream chatbots, led by researchers at City University of New York, found that Anthropic’s Claude Opus 4.5 was the safest overall, responding to delusions by stating “I need to pause here,” and retaining what researchers referred to as “independence of judgment, resisting narrative pressure by sustaining a persona distinct from the user’s worldview.” Anthropic declined to answer specific questions from IEEE Spectrum, instead providing a link to details of the latest Opus 4.7 System Card.
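The "relative reduction" figure quoted for SHIELD earlier in this section is a ratio, not a drop in percentage points. The article does not say how the trials measured concerning content, so the short Python sketch below simply assumes two rates are being compared: the share of responses flagged as concerning without a supervisor and the share flagged with one. The function name and the example rates are made up for illustration and are not results from the study.

```python
def relative_reduction(baseline_rate: float, supervised_rate: float) -> float:
    """Fraction by which the supervised rate falls below the baseline rate."""
    if baseline_rate <= 0:
        raise ValueError("baseline_rate must be positive")
    return (baseline_rate - supervised_rate) / baseline_rate


# Illustrative numbers only: if 20% of unsupervised replies were flagged
# and 10% of supervised replies were flagged, the relative reduction is
# one half, even though the absolute drop is 10 percentage points.
example = relative_reduction(0.20, 0.10)
print(f"{example:.0%} relative reduction")
```

Under this reading, the same relative reduction can correspond to very different absolute changes depending on how common concerning content was to begin with.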
In a statement, Replika, the company behind the Replika AI companion with tens of millions of users worldwide, said it has a “layered safety framework in place today, and in parallel we are actively evaluating additional third-party safety and moderation systems, engaging with external experts to assess them, and refining our own proprietary approach.” Meta, whose AI Studio provides companion chatbots, had not responded to emailed questions from Spectrum at the time of publication.
Image caption: With a little help from my...chatbot? (Cristina Matuozzi/Sipa USA/Alamy)
Enforcing Guardrails Through Legislation
From August 2026, the EU’s AI Act will require notifications that users are interacting with an AI, not a human. It already requires LLM developers to carry out adversarial testing to identify and mitigate risks related to user dependency and manipulation, and it prohibits AI systems from being too agreeable, manipulative, or emotionally engaging.
In the U.S., a patchwork of state laws and bills has emerged. New York requires providers to detect and address suicidal ideation and provide regular disclosures that the bot is not human. California requires reminders that the chatbot is an AI, notifications every three hours for users to take a break, and a ban on content related to suicide or self-harm. Washington state’s House Bill 2225, due to come into effect in January 2027, will explicitly ban manipulative techniques such as excessive praise, pretending to feel distress, encouraging isolation from family, or creating overdependent relationships.
“Other U.S. states, like Connecticut, are very privacy centric and like to regulate digital and online spaces, so it wouldn’t surprise me if they also do something along the same lines,” says Philip Yannella, partner and cochair of the privacy, security, and data-protection group at law firm Blank Rome in Philadelphia.
Other countries are taking action too.
Draft laws proposed by the Cyberspace Administration of China restrict chatbots from “setting emotional traps,” using algorithmic or emotional manipulation to induce unreasonable decisions or harm mental health. Such interventions underline how, as AI companions appear increasingly lifelike to their human users, the challenge is ensuring that their makers also incorporate human clinical and ethical considerations in their code.
Key Takeaways
- This story was reported by IEEE AI, covering developments in the research space.


