CounterMoral: Editing Morals in Language Models

arXiv AI, archived Mar 31, 2026

    [Submitted on 28 Mar 2026]
    Authors: Michael Ripa, Jim Davies

    Recent advancements in language model technology have significantly enhanced the ability to edit factual information. Yet, the modification of moral judgments, a crucial aspect of aligning models with human values, has garnered less attention. In this work, we introduce CounterMoral, a benchmark dataset crafted to assess how well current model editing techniques modify moral judgments across diverse ethical frameworks. We apply various editing techniques to multiple language models and evaluate their performance. Our findings contribute to the evaluation of language models designed to be ethical.

    Comments: 7 pages (10 + 1 reference + 6 appendix). Honors thesis completed in June 2024, write-up completed in 2025
    Subjects: Artificial Intelligence (cs.AI)
    Cite as: arXiv:2603.27338 [cs.AI] (or arXiv:2603.27338v1 [cs.AI] for this version)
    DOI: https://doi.org/10.48550/arXiv.2603.27338

    Submission history
    From: Michael Ripa
    [v1] Sat, 28 Mar 2026 17:13:30 UTC (192 KB)
    Article Info
    Source: arXiv AI
    Category: AI & Machine Learning
    Published: Mar 31, 2026
    Archived: Mar 31, 2026