CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning Mar 20, 2026

Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction

arXiv AI Archived Mar 20, 2026 ✓ Full text saved

arXiv:2603.18085v1 Announce Type: new Abstract: Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and even user harm. As LLMs serve as sources of guidance, emotional support, and even informal therapy, these risks are poised to escalate. However, studying the mechanisms underlying harmful human-AI interactions presents significant methodological challenges, where organic harmful interactions typical

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Artificial Intelligence [Submitted on 18 Mar 2026] Multi-Trait Subspace Steering to Reveal the Dark Side of Human-AI Interaction Xin Wei Chia, Swee Liang Wong, Jonathan Pan Recent incidents have highlighted alarming cases where human-AI interactions led to negative psychological outcomes, including mental health crises and even user harm. As LLMs serve as sources of guidance, emotional support, and even informal therapy, these risks are poised to escalate. However, studying the mechanisms underlying harmful human-AI interactions presents significant methodological challenges, where organic harmful interactions typically develop over sustained engagement, requiring extensive conversational context that are difficult to simulate in controlled settings. To address this gap, we developed a Multi-Trait Subspace Steering (MultiTraitsss) framework that leverages established crisis-associated traits and novel subspace steering framework to generate Dark models that exhibits cumulative harmful behavioral patterns. Single-turn and multi-turn evaluations show that our dark models consistently produce harmful interaction and outcomes. Using our Dark models, we propose protective measure to reduce harmful outcomes in Human-AI interactions. Subjects: Artificial Intelligence (cs.AI) Cite as: arXiv:2603.18085 [cs.AI]   (or arXiv:2603.18085v1 [cs.AI] for this version)   https://doi.org/10.48550/arXiv.2603.18085 Focus to learn more Submission history From: Xin Wei Chia [view email] [v1] Wed, 18 Mar 2026 08:25:31 UTC (4,995 KB) Access Paper: HTML (experimental) view license Current browse context: cs.AI < prev   |   next > new | recent | 2026-03 Change to browse by: cs References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv AI
    Category
    ◬ AI & Machine Learning
    Published
    Mar 20, 2026
    Archived
    Mar 20, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗