Detecting Bot Detection: Prevalence, Techniques, and Implications for Web Measurement Research
[Submitted on 12 Jun 2026]
Ralf Gundelach, Michael Mühlhauser, Dominik Herrmann
Browser automation frameworks are essential tools for security and privacy research on the web, yet bot detection scripts increasingly probe their artifacts, threatening measurement validity as automated browsers may be blocked or served different content. Prior work measures detection deployment, while we measure blocking-induced sample loss. Through a literature survey of top-tier security, privacy, and web measurement venues, we find that 83% of papers omit any discussion of bot detection blocking. To address this gap, we conduct a measurement study of 10,000 websites across four browser configurations (40K page visits in total) to quantify detection prevalence and employed techniques. Using custom instrumentation to detect when sites probe for automation, we develop a taxonomy of bot detection techniques and measure how often they appear in practice. Chromium headless encounters a 15% soft block rate compared to 7% for other configurations. Across all conditions, 82% of blocks are attributable to bot detection (59% vendor-confirmed, 23% inferred from condition-dependent blocking), predominantly by providers with integrated bot detection such as Cloudflare (37% block rate) and Akamai (26%). A header spoofing experiment establishes that 75% of Chromium-headless-only blocks are caused by header-level signals alone, yet JavaScript-based environment probing is more extensive than current blocking rates suggest. These findings demonstrate that bot detection creates systematic, provider-correlated sample loss that the web measurement community neither measures nor reports. The downstream effect on specific measurement outcomes remains future work.
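The attribution split in the abstract (vendor-confirmed versus inferred from condition-dependent blocking) can be illustrated with a minimal sketch. This is not the authors' pipeline. It assumes one plausible reading: a block counts as "inferred" when a site blocks some, but not all, of the four browser configurations and no vendor signal is available. All configuration labels other than Chromium headless, the record layout, and the function names are hypothetical.

```python
from collections import Counter

# Hypothetical labels: the abstract names only Chromium headless,
# the other three configurations are placeholders.
CONFIGS = ("chromium-headless", "config-b", "config-c", "config-d")


def classify_block(blocked_by_config, vendor_confirmed):
    """Label one site's outcome across the four browser configurations.

    blocked_by_config: dict mapping configuration label -> bool (was the visit blocked?)
    vendor_confirmed:  bool, True if the block page identifies a bot detection vendor
    """
    blocked = {c for c in CONFIGS if blocked_by_config.get(c, False)}
    if not blocked:
        return "not-blocked"
    if vendor_confirmed:
        return "vendor-confirmed"
    # Assumption: blocking that depends on the configuration suggests
    # the site reacts to automation artifacts rather than, say, geo-blocking.
    if blocked != set(CONFIGS):
        return "inferred"
    return "unattributed"


def attribution_shares(sites):
    """Return each label's share among blocked sites (empty dict if none are blocked)."""
    counts = Counter(
        classify_block(s["blocked"], s["vendor_confirmed"]) for s in sites
    )
    counts.pop("not-blocked", None)
    total = sum(counts.values())
    return {label: n / total for label, n in counts.items()} if total else {}


# Toy records, not data from the paper.
sites = [
    {"blocked": {"chromium-headless": True}, "vendor_confirmed": True},
    {"blocked": {c: True for c in CONFIGS}, "vendor_confirmed": True},
    {"blocked": {"chromium-headless": True}, "vendor_confirmed": False},
    {"blocked": {c: True for c in CONFIGS}, "vendor_confirmed": False},
    {"blocked": {}, "vendor_confirmed": False},
]
shares = attribution_shares(sites)
```

In this toy input, the share attributable to bot detection would be the sum of the "vendor-confirmed" and "inferred" shares, mirroring how the abstract's 82% decomposes into 59% and 23%.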
Subjects: Cryptography and Security (cs.CR); Computers and Society (cs.CY)
Cite as: arXiv:2606.14525 [cs.CR]
(or arXiv:2606.14525v1 [cs.CR] for this version)
https://doi.org/10.48550/arXiv.2606.14525
Submission history
From: Ralf Gundelach
[v1] Fri, 12 Jun 2026 14:59:11 UTC (49 KB)