CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning Apr 22, 2026

SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution

arXiv AI Archived Apr 22, 2026 ✓ Full text saved

arXiv:2604.18982v1 Announce Type: new Abstract: Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and l

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Artificial Intelligence [Submitted on 21 Apr 2026] SAVOIR: Learning Social Savoir-Faire via Shapley-based Reward Attribution Xiachong Feng, Yi Jiang, Xiaocheng Feng, Deyi Yin, Libo Qin, Yangfan Ye, Lei Huang, Weitao Ma, Yuxuan Gu, Chonghan Qin, Bing Qin, Lingpeng Kong Social intelligence, the ability to navigate complex interpersonal interactions, presents a fundamental challenge for language agents. Training such agents via reinforcement learning requires solving the credit assignment problem: determining how individual utterances contribute to multi-turn dialogue outcomes. Existing approaches directly employ language models to distribute episode-level rewards, yielding attributions that are retrospective and lack theoretical grounding. We propose SAVOIR (ShApley Value fOr SocIal RL), a novel principled framework grounded in cooperative game theory. Our approach combines two complementary principles: expected utility shifts evaluation from retrospective attribution to prospective valuation, capturing an utterance's strategic potential for enabling favorable future trajectories; Shapley values ensure fair credit distribution with axiomatic guarantees of efficiency, symmetry, and marginality. Experiments on the SOTOPIA benchmark demonstrate that SAVOIR achieves new state-of-the-art performance across all evaluation settings, with our 7B model matching or exceeding proprietary models including GPT-4o and Claude-3.5-Sonnet. Notably, even large reasoning models consistently underperform, suggesting social intelligence requires qualitatively different capabilities than analytical reasoning. Comments: ACL 2026 Findings Subjects: Artificial Intelligence (cs.AI) Cite as: arXiv:2604.18982 [cs.AI]   (or arXiv:2604.18982v1 [cs.AI] for this version)   https://doi.org/10.48550/arXiv.2604.18982 Focus to learn more Submission history From: Xiachong Feng [view email] [v1] Tue, 21 Apr 2026 02:08:25 UTC (101 KB) Access Paper: HTML (experimental) view license Current browse context: cs.AI < prev   |   next > new | recent | 2026-04 Change to browse by: cs References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv AI
    Category
    ◬ AI & Machine Learning
    Published
    Apr 22, 2026
    Archived
    Apr 22, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗