
Emergence Transformer: Dynamical Temporal Attention Matters

arXiv AI · Archived Apr 23, 2026




[Submitted on 18 Apr 2026]

Zihan Zhou, Bo-Wei Qin, Kai Du, Wei Lin

The Transformer, a breakthrough architecture in artificial intelligence, owes its success to the attention mechanism, which utilizes long-range interactions in sequential data, enabling the emergent coherence between large language models (LLMs) and data distributions. However, temporal attention, that is, different forms of long-range interactions in temporal sequences, has rarely been explored in the emergence phenomena of complex systems, including oscillatory coherence in quantum, biophysical, or climate systems. Here, by designing dynamical temporal attention (DTA) with time-varying query, key, and value matrices, we propose an Emergence Transformer. This architecture allows each component to interact with its own or its neighbors' past states through dynamical attention kernels, thereby enabling the promotion and/or suppression of the emergent coherence of components. Interestingly, we uncover that neighbor-DTA consistently promotes oscillatory coherence, whereas self-DTA exhibits an optimal attention weight for coherence enhancement, owing to its non-monotonic dependence on network structure. Practically, we demonstrate how DTA reshapes social coherence, suggesting strategies to either enhance agreement or preserve plurality. We further apply DTA to the paradigmatic Hopfield neural network, achieving emergent continual learning without catastrophic forgetting. Together, these results lay a foundation and provide an immediate paradigm for modulating emergence phenomena in networked dynamics using only DTA.
Subjects: Artificial Intelligence (cs.AI)
Cite as: arXiv:2604.19816 [cs.AI] (or arXiv:2604.19816v1 [cs.AI] for this version)
DOI: https://doi.org/10.48550/arXiv.2604.19816
Submission history: From Bo-Wei Qin. [v1] Sat, 18 Apr 2026 01:10:44 UTC (5,140 KB)
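The abstract describes components that interact with their neighbors' past states through attention kernels, and it measures the outcome as oscillatory coherence, but it does not give the DTA equations. The sketch below is therefore only a toy illustration under stated assumptions, not the authors' model: it uses Kuramoto-style phase oscillators on a ring, a fixed softmax kernel over time lags in place of time-varying query, key, and value matrices, and the standard Kuramoto order parameter as the coherence measure. All function names and parameters (`simulate`, `memory`, `decay`, `coupling`) are hypothetical.

```python
import cmath
import math
import random


def order_parameter(phases):
    """Kuramoto order parameter r in [0, 1]; r = 1 means full phase coherence."""
    return abs(sum(cmath.exp(1j * p) for p in phases)) / len(phases)


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def simulate(n=20, steps=2000, dt=0.01, coupling=2.0, memory=5, decay=0.5, seed=0):
    """Ring of phase oscillators; each one attends to its neighbors' past phases.

    ASSUMPTION: the attention kernel is a fixed softmax over time lags
    (recent states weighted more). The paper's kernels are time-varying
    and are not specified in the abstract.
    """
    rng = random.Random(seed)
    omega = [rng.gauss(0.0, 0.5) for _ in range(n)]          # natural frequencies
    theta = [rng.uniform(0.0, 2 * math.pi) for _ in range(n)]
    history = [list(theta)]                                   # newest state last
    kernel = softmax([-decay * lag for lag in range(memory)]) # lag 0 = current state

    for _ in range(steps):
        new_theta = []
        for i in range(n):
            drive = 0.0
            for j in ((i - 1) % n, (i + 1) % n):              # two ring neighbors
                for lag, w in enumerate(kernel):
                    # Clamp to the oldest stored state while history is still short.
                    past = history[max(len(history) - 1 - lag, 0)]
                    drive += w * math.sin(past[j] - theta[i])
            # Forward-Euler step; drive is averaged over the two neighbors.
            new_theta.append(theta[i] + dt * (omega[i] + coupling * drive / 2))
        theta = new_theta
        history.append(list(theta))
        history = history[-memory:]                           # keep a sliding window
    return order_parameter(theta)
```

One way to use the sketch is to compare `simulate(coupling=0.0)` with `simulate(coupling=2.0)` and see how the final order parameter changes for this toy setup. Results from it say nothing about the paper's findings, which concern a different and more general construction.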
