CyberIntel ⬡ News
★ Saved ◆ Cyber Reads
← Back ◬ AI & Machine Learning May 22, 2026

Parser-Free Querying of Security Logs

arXiv Security Archived May 22, 2026 ✓ Full text saved

arXiv:2605.22027v1 Announce Type: new Abstract: Security analysts routinely query system logs to detect threats and investigate incidents, but each log source uses its own semi-structured format: logs are cheap to produce, but expensive to use. The standard approach, building per-source parsers to normalize logs into structured schemas, is powerful but requires continuous engineering effort for each new format. Querying raw logs directly with tools like grep avoids this cost, but requires analys

Full text archived locally
✦ AI Summary · Claude Sonnet


    Computer Science > Cryptography and Security [Submitted on 21 May 2026] Parser-Free Querying of Security Logs Evan Luo, Julien Piet, David Wagner Security analysts routinely query system logs to detect threats and investigate incidents, but each log source uses its own semi-structured format: logs are cheap to produce, but expensive to use. The standard approach, building per-source parsers to normalize logs into structured schemas, is powerful but requires continuous engineering effort for each new format. Querying raw logs directly with tools like grep avoids this cost, but requires analysts to know each source's message variants and cannot express the multi-line temporal queries that security investigations demand. We present Sieve, a system that generates executable query code from natural-language security questions by grounding a large language model with lightweight, automatically extracted log-format context, requiring only one LLM call per query followed by deterministic execution. Evaluating 133 security queries across 5 log types, we find that Sieve achieves over a 3x reduction in error rate on complex temporal and cross-event queries compared to manual analyst scripting, with the largest gains on the multi-line correlation tasks most critical to active investigations. Our results and benchmark provide evidence that LLM-generated code can bridge the gap between the expressiveness of structured log querying and the immediacy of working directly with raw files. Comments: 21 pages, 5 figures, 15 tables Subjects: Cryptography and Security (cs.CR) Cite as: arXiv:2605.22027 [cs.CR]   (or arXiv:2605.22027v1 [cs.CR] for this version)   https://doi.org/10.48550/arXiv.2605.22027 Focus to learn more Submission history From: Evan Luo [view email] [v1] Thu, 21 May 2026 05:58:04 UTC (100 KB) Access Paper: HTML (experimental) view license Current browse context: cs.CR < prev   |   next > new | recent | 2026-05 Change to browse by: cs References & Citations NASA ADS Google Scholar Semantic Scholar Export BibTeX Citation Bookmark Bibliographic Tools Bibliographic and Citation Tools Bibliographic Explorer Toggle Bibliographic Explorer (What is the Explorer?) Connected Papers Toggle Connected Papers (What is Connected Papers?) Litmaps Toggle Litmaps (What is Litmaps?) scite.ai Toggle scite Smart Citations (What are Smart Citations?) Code, Data, Media Demos Related Papers About arXivLabs Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)
    💬 Team Notes
    Article Info
    Source
    arXiv Security
    Category
    ◬ AI & Machine Learning
    Published
    May 22, 2026
    Archived
    May 22, 2026
    Full Text
    ✓ Saved locally
    Open Original ↗