AttackonCTF: Defending Hardware Security Competition Benchmarks in the Age of LLMs
arXiv SecurityArchived Jun 16, 2026✓ Full text saved
arXiv:2606.15809v1 Announce Type: new Abstract: Hardware security competitions such as HackTheSilicon serve as benchmarking platforms for evaluating vulnerability detection methods and for training humans and AI. However, our study reveals that LLMs threaten their validity. Instead of genuine security reasoning, detectors exploit a diff-style syntactic comparison, achieving an 83% detection rate, undermining fair evaluation. To mitigate this, we propose the first LLM-oriented, semantics-preservi
Full text archived locally
✦ AI Summary· Claude Sonnet
Computer Science > Cryptography and Security
[Submitted on 14 Jun 2026]
AttackonCTF: Defending Hardware Security Competition Benchmarks in the Age of LLMs
Mohamadreza Rostami, Nikhilesh Singh, Stephen Muttathil, Lichao Wu, Chen Chen, Huimin Li, Jeyavijayan Rajendran, Ahmad-Reza Sadeghi
Hardware security competitions such as HackTheSilicon serve as benchmarking platforms for evaluating vulnerability detection methods and for training humans and AI. However, our study reveals that LLMs threaten their validity. Instead of genuine security reasoning, detectors exploit a diff-style syntactic comparison, achieving an 83% detection rate, undermining fair evaluation. To mitigate this, we propose the first LLM-oriented, semantics-preserving obfuscation framework for these benchmarks. Unlike IP-protection approaches, it applies human-readable transformations and controlled diff-noise while preserving functionality. On HackTheSilicon, the framework reduces LLM-based detection accuracy by 50% with only 10% obfuscation and by 78.6% under complete obfuscation, restoring benchmark reliability.
Subjects: Cryptography and Security (cs.CR)
Cite as: arXiv:2606.15809 [cs.CR]
(or arXiv:2606.15809v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.15809
Focus to learn more
Submission history
From: Mohamadreza Rostami [view email]
[v1] Sun, 14 Jun 2026 13:22:27 UTC (2,435 KB)
Access Paper:
HTML (experimental)
view license
Current browse context:
cs.CR
< prev | next >
new | recent | 2026-06
Change to browse by:
cs
References & Citations
NASA ADS
Google Scholar
Semantic Scholar
Export BibTeX Citation
Bookmark
Bibliographic Tools
Bibliographic and Citation Tools
Bibliographic Explorer Toggle
Bibliographic Explorer (What is the Explorer?)
Connected Papers Toggle
Connected Papers (What is Connected Papers?)
Litmaps Toggle
Litmaps (What is Litmaps?)
scite.ai Toggle
scite Smart Citations (What are Smart Citations?)
Code, Data, Media
Demos
Related Papers
About arXivLabs
Which authors of this paper are endorsers? | Disable MathJax (What is MathJax?)