Forensic Focus (archived May 11, 2026)
Email investigations are evolving fast—Eric Fookes of Fookes Software discusses AI, cloud evidence, privacy, and the future of forensic email analysis.
Eric Fookes is the founder and CEO of Fookes Software, a Swiss company with a long-standing reputation for developing high-quality software for email forensics, eDiscovery, and data processing. The company is best known for Aid4Mail, and its technology is used by law firms, government agencies, Fortune 500 companies, and major enterprise platforms. In this interview with Forensic Focus, Eric discusses the challenges of email investigations and how the field is evolving.
Tell us about your background and why you decided to launch Fookes Software.
I came to software development through an unusual route. My academic background is in Earth Sciences—I completed my MSc in Carbonate Sedimentology at the University of Geneva in 1991, and during my graduate research I discovered a new genus and species of foraminifer, Troglotella incrustans. What made it remarkable was its boring capability—the ability to drill into the substrate, which had not been documented in any other foraminifer. I co-authored the formal description with Professor Wernli, the paleontologist on the project; it remains a discovery I am particularly proud of. In parallel, I had been writing software as a tool for my research since 1989, starting with a DOS program that modeled the effects of sea-level change on sedimentary deposits. That was my first commercial product.
Fookes Software came together in 1996, initially as a way to distribute NoteTab, a text editor I built while learning Delphi. It took off faster than I expected—NoteTab Pro won PC Magazine’s Shareware Award in 1998—and the business grew from there.
Aid4Mail has a more personal origin. In 1999, I was drowning in my own email and needed a way to search, filter, and organize it. I built Mailbag Assistant to solve my own problem. By 2002, a digital forensics company had approached me about building a forensic variant, which became E-mail Examiner. Aid4Mail followed in 2005, and from there the focus shifted steadily toward digital forensics and eDiscovery—an area where our accuracy, recovery, and format-fidelity requirements matched what the market was asking for.
The through-line across all of this is that I build tools to solve real problems. Geology taught me to follow the evidence carefully; that discipline carries directly into forensic software.
How has the email forensics and eDiscovery landscape changed since Aid4Mail was first released in 2005?
When we released Aid4Mail in 2005, email investigations were almost entirely a local-storage exercise: PST files from Exchange and Outlook, mbox files from Netscape Mail and Eudora, and various other desktop client formats. You seized a laptop, you imaged the mail store, and you processed it. That is no longer the default in many matters.
Three shifts define the period. First, email moved to the cloud. The early shift was IMAP-based webmail (Yahoo, Hotmail, the original Gmail), and Microsoft 365 and Google Workspace followed years later as full enterprise platforms. Collection now means APIs, OAuth, throttling, and tenant-wide access strategies. Second, attachments moved out of the message body and into shared drives. A 2005-era email carried its evidence with it; a 2026-era email often carries a hyperlink to a OneDrive or Google Drive document that may have been edited dozens of times since it was shared. Third, the regulatory and privacy landscape has tightened substantially, with the EU’s GDPR, Switzerland’s nFADP, South Korea’s PIPA, and data residency requirements that increasingly shape where evidence can be processed.
Volume has grown by orders of magnitude, and the range of evidence formats has expanded significantly. At the same time, the tooling has matured: EDRM has standardized vocabulary and workflows, the EDRM Message Identification Hash (MIH) now helps enable cross-platform deduplication, and LLMs have, in the last eighteen months, opened up capabilities that did not exist before. The bar for what counts as a defensible email investigation has risen, and it continues to rise.
What makes email such a challenging source of evidence, and how does Aid4Mail help during investigations?
Email looks simple from the outside. A sender, a recipient, a subject, a body, perhaps an attachment. The difficulty is that almost every part of that picture hides complexity.
Format heterogeneity is the first issue. A single investigation might involve PST and OST files, Apple Mail’s EMLX, mbox archives from Google Takeout or Thunderbird, OLM from Outlook for Mac, raw EML and MSG files, Maildir directories, and live IMAP, Microsoft Graph, or Google API connections. Each has its quirks, and many tools handle some formats well and others poorly or not at all.
Second, a substantial portion of what reaches a mailbox is noise. In one internal test we ran on a 9,427-email business mailbox, the non-personal email filter identified 8,597 automated or bulk messages—over 91% of the inbox before any keyword or AI filter was applied. Investigators who ingest everything and then filter are carrying that dead weight through every downstream stage.
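As a rough illustration of how automated mail can be separated from personal correspondence, the sketch below flags messages that carry common bulk-mail headers. It is not Aid4Mail's filter, whose rules are not described in this interview; the function name `looks_automated` and the two sample messages are hypothetical, and the header checks are a simple heuristic that a real filter would extend considerably.

```python
from email import message_from_string
from email.message import Message


def looks_automated(msg: Message) -> bool:
    """Heuristic: True if the message carries typical bulk or auto-reply headers."""
    # Mailing-list headers are a strong hint of newsletter or list traffic.
    if msg.get("List-Unsubscribe") or msg.get("List-Id"):
        return True
    # "Precedence" is a de facto header that many bulk senders set.
    precedence = (msg.get("Precedence") or "").strip().lower()
    if precedence in {"bulk", "list", "junk"}:
        return True
    # "Auto-Submitted" marks machine-generated mail; the value "no" means human-sent.
    auto_submitted = (msg.get("Auto-Submitted") or "").strip().lower()
    return bool(auto_submitted) and auto_submitted != "no"


# Hypothetical sample messages, for illustration only.
newsletter = message_from_string(
    "From: news@example.com\n"
    "List-Unsubscribe: <mailto:unsub@example.com>\n"
    "Subject: Weekly digest\n\nHello"
)
personal = message_from_string(
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Lunch?\n\nFree at noon?"
)

# Share of the test mailbox reported as automated or bulk in the interview.
noise_share = 8597 / 9427

print(looks_automated(newsletter), looks_automated(personal))
print(f"{noise_share:.1%}")
```

Applied before ingestion, a check of this kind removes the bulk of a mailbox from every later stage; the figures quoted above work out to roughly 91.2% of the test mailbox.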
Third, the evidence you most need is often the evidence that is hardest to reach: double-deleted messages in Exchange Recoverable Items, corrupted mbox files, cloud-hosted attachments that no longer exist at the URL in the email, messages carved from unallocated disk space. Mbox corruption deserves particular mention—it happens easily, often through abrupt application closures, partial syncs, or filesystem errors, and is far more common in casework than people realize.
Aid4Mail addresses these issues in sequence. Native pre-acquisition filtering against Microsoft 365, Gmail, and IMAP typically reduces collection volume by around 90%. Cloud attachment collection pulls the actual documents linked from emails, with revision matching and full metadata. Recovery routines reach unpurged mail in IMAP, Exchange, PST, and OST, and MIME carving extracts messages from disk images and damaged files. AI classification then triages what remains, so the investigator’s time goes to the material that actually matters.
What are the biggest mistakes organizations make when handling email evidence?
The most common mistake, by a wide margin, is over-collection. Teams acquire entire mailboxes and entire archives because it feels safer, then try to filter downstream. That approach inflates hosting costs, lengthens review, and—crucially—pulls more personal data into scope than may be proportionate or defensible under GDPR-style data-minimization regimes. Pre-acquisition filtering at the source is almost always a better answer.
A second mistake is over-reliance on keyword search. The Blair & Maron study from 1985 remains the definitive cautionary tale: experienced attorneys believed they had achieved at least 75% recall on a document collection, and the actual figure was 20%. Four decades later, keyword search in typical production use still misses 60–80% of responsive documents, largely because people use different words for the same thing. Informal, coded, or euphemistic language—exactly what matters in insider-threat and corruption investigations—evades keyword detection consistently.
A third mistake is treating a cloud attachment as if it were nothing more than the hyperlink in the email body. The evidence is not merely the URL. It is the linked document itself, ideally with the version and metadata that show what was available when the email was sent. Without that, the collection is incomplete, and that incompleteness is usually discovered at the worst possible moment.
Finally, many organizations underestimate the forensic challenge of recovering deleted and corrupted email. Standard processing skips past what it can’t read. That’s understandable, but it’s also where investigations frequently find their most significant material.
Aid4Mail 6 introduces AI integration. What opportunities does AI create in email forensics, and where should practitioners remain cautious?
AI has changed the economics of email review. Traditional TAR systems require training: hundreds or thousands of manually reviewed documents to teach the model what is relevant in a specific matter. LLMs skip that step entirely. You write a natural-language prompt, and the model can begin classifying immediately, reasoning about content and context rather than matching patterns.
The accuracy is genuinely strong. In our April 2026 benchmark across 32 models, the ten retained candidates all exceeded 94% F1 on a binary insider-threat classification task, every one achieved at least 99% recall, and the top four clustered between 97.6% and 99.6% F1 across cloud and offline deployments. The full methodology and results are documented in our AI Classification Benchmark Report. For context, the best-evidenced ceiling for TAR 2.0 (Continuous Active Learning) is roughly 96% precision and 96% recall—so on the benchmark task, the top AI models matched or exceeded that ceiling without a per-matter training signal.
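For readers less familiar with the metric, F1 is the harmonic mean of precision and recall. The short sketch below shows how the figures in this answer relate to each other. The function names are illustrative, and the 94% F1 with 99% recall combination is simply the lower bound quoted above, not a reported result for any specific model.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


def precision_from_f1(f1: float, recall: float) -> float:
    """Solve F1 = 2PR / (P + R) for P, given F1 and R."""
    return f1 * recall / (2 * recall - f1)


# The TAR 2.0 ceiling quoted above: 96% precision and 96% recall.
tar_ceiling_f1 = f1_score(0.96, 0.96)

# Precision implied by a model sitting exactly at 94% F1 with 99% recall.
implied_precision = precision_from_f1(0.94, 0.99)

print(f"TAR 2.0 ceiling F1: {tar_ceiling_f1:.3f}")
print(f"Implied precision:  {implied_precision:.3f}")
```

Equal precision and recall of 96% give an F1 of 96%, so any model above that F1 has exceeded the ceiling on the combined metric. At the lower bound, a model with 99% recall and 94% F1 would have precision of about 89.5%, which means its errors are mostly false positives rather than missed documents.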
Practitioners should remain cautious on several fronts. Prompt–model fit is a real variable: we have seen cases where a model that excels at multi-category classification struggles with binary responsiveness, and vice versa. Low-prevalence corpora depress absolute precision through base-rate effects regardless of method. And cloud model behavior can shift silently between versions, which is why we recommend pinning a specific model—or using offline models—for long-running matters where reproducibility is important.
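The base-rate effect can be made concrete with a few lines of arithmetic. The sketch below holds a classifier's recall and false-positive rate fixed and varies only the share of responsive emails in the corpus. The 99% recall echoes the benchmark figure above; the 2% false-positive rate and the prevalence values are assumptions chosen purely for illustration.

```python
def precision_at_prevalence(recall: float, false_positive_rate: float,
                            prevalence: float) -> float:
    """Expected precision when a fraction `prevalence` of the corpus is responsive."""
    true_positives = recall * prevalence
    false_positives = false_positive_rate * (1.0 - prevalence)
    return true_positives / (true_positives + false_positives)


RECALL = 0.99                # from the benchmark discussion above
FALSE_POSITIVE_RATE = 0.02   # assumed for illustration

for prevalence in (0.30, 0.05, 0.01):
    p = precision_at_prevalence(RECALL, FALSE_POSITIVE_RATE, prevalence)
    print(f"prevalence {prevalence:.0%}: precision {p:.1%}")
```

With these assumed rates, precision is about 95% when 30% of the corpus is responsive but falls to about 33% when only 1% is, even though the classifier itself has not changed.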
Hallucination is often the first concern people raise. In Aid4Mail’s case, the risk is materially reduced by design: the model is constrained to a fixed set of labels, it is evaluating specific email content rather than generating open-ended text, and every output traces back to verifiable input. Classification errors we have observed are genuine classification errors, correctable through prompt refinement, not fabrication.
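The idea of constraining output to a fixed label set can be sketched as a simple validation step. This is an illustration of the general technique, not Aid4Mail's implementation: the label names, `ALLOWED_LABELS`, and `parse_label` are all hypothetical.

```python
from typing import Optional

# Hypothetical label set for a binary responsiveness task.
ALLOWED_LABELS = frozenset({"responsive", "not_responsive"})


def parse_label(raw_output: str,
                allowed: frozenset = ALLOWED_LABELS) -> Optional[str]:
    """Return the label only if the model output is exactly one allowed label."""
    candidate = raw_output.strip().lower()
    # Anything outside the fixed set is rejected rather than interpreted.
    return candidate if candidate in allowed else None


print(parse_label("  Responsive\n"))
print(parse_label("This email seems suspicious because of its tone."))
```

Because free-form text never passes the check, an unexpected answer surfaces as a rejected output that can be retried or flagged for review, rather than as invented content in the case record.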
AI is a force multiplier for investigative judgment, not a substitute for it. That framing has held up in our testing and in the investigative workflows we have seen.
How does Aid4Mail fit into workflows alongside platforms like EnCase, AXIOM, or Relativity rather than replacing them?
This is a question we spend a lot of time on, because it is central to how most practitioners actually use Aid4Mail. The short answer: Aid4Mail is a specialist email tool that sits alongside general-purpose forensic and review platforms. It complements them; it does not replace them.
General-purpose platforms are built to manage cases across many evidence types—disk images, mobile extractions, chat logs, email, documents, artifacts. They do that well. Where Aid4Mail usually adds value is in the email-specific work that happens before ingestion: cloud mailbox collection with server-side filters, collection of cloud-hosted documents linked from emails, recovery of unpurged or corrupted messages, and early culling before data reaches the primary platform.
A typical workflow looks like this. The investigator connects Aid4Mail to the custodian’s Microsoft 365 account—often through App-Only Access for tenant-wide collection—applies a pre-acquisition filter, and cuts the collection volume by around 90% at the source. Cloud attachments are pulled with their metadata and, where relevant, the specific revision that existed when the email was sent (or the latest update, depending on your Aid4Mail settings). AI classification runs over what remains, optionally including linked attachments. The refined dataset is then exported—to PST, to a load file in Concordance DAT/OPT format, to searchable PDF with Bates stamping, or whatever the primary platform expects—and handed off.
Everything after that happens in the primary platform: correlation with other evidence, large-scale review workflows, production. Aid4Mail’s job ends at the handoff.
A couple of downstream details matter for interoperability. EDRM MIH support means deduplication can be maintained across platforms without re-ingesting data. And the free Aid4Mail Email Viewer—portable, zero installation, runs in a browser from a USB or network drive—lets investigators share curated sets with outside counsel, regulators, or prosecutors who do not have a license for the primary review platform.
The point is not that Aid4Mail is better than any of these platforms. It is that email has specific characteristics, and treating specialist email work as a discrete stage in the workflow tends to save time, reduce cost, and produce cleaner data for everything that comes after.
How does Aid4Mail’s approach to data processing address tightening data privacy regulations?
Privacy regulation has been one of the strongest tailwinds for how we build Aid4Mail. GDPR established data minimization and purpose limitation as baseline principles; Switzerland’s revised nFADP, which aligns closely with GDPR, applies to us directly as a Swiss company; South Korea’s PIPA has shaped how we approach collection in Korean investigations. The trend is toward less data movement, tighter purpose binding, and clearer provenance.
Aid4Mail processes locally by default. The software runs on the investigator’s laptop, workstation, or on-premises server; acquired data stays under the organization’s control; there is no cloud ingestion step required for core functionality. Pre-acquisition filtering is probably the most direct privacy tool we offer: by applying date ranges, custodian scope, and keyword constraints at the source, we reduce the volume of personal data that enters the investigation at all. On a typical case, that is roughly a 90% reduction in personal information pulled into scope.
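To show what filtering at the source means in practice, the sketch below builds a server-side IMAP SEARCH expression from a date range, a sender, and a keyword, so that only matching messages would be downloaded. It uses the standard SINCE, BEFORE, FROM, and TEXT search keys from the IMAP specification; it does not reflect Aid4Mail's own filter syntax, and the helper names and sample values are hypothetical.

```python
from datetime import date

_MONTHS = ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
           "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"]


def imap_date(d: date) -> str:
    """Format a date as dd-Mon-yyyy, independent of the system locale."""
    return f"{d.day:02d}-{_MONTHS[d.month - 1]}-{d.year}"


def imap_quote(value: str) -> str:
    """Wrap a value as an IMAP quoted string, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def build_search(since: date, before: date, sender: str, keyword: str) -> str:
    """Combine search keys; IMAP treats multiple keys as a logical AND."""
    return (f"SINCE {imap_date(since)} BEFORE {imap_date(before)} "
            f"FROM {imap_quote(sender)} TEXT {imap_quote(keyword)}")


# Hypothetical scope: first half of 2024, one sender, one keyword.
criteria = build_search(date(2024, 1, 1), date(2024, 7, 1),
                        "alice@example.com", "invoice")
print(criteria)
```

A string like this could be passed to the `search` method of Python's `imaplib` client after selecting a mailbox. SINCE is inclusive and BEFORE is exclusive, so the range above covers 1 January through 30 June; messages outside the scope never leave the server.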
For AI, we support two deployment models. When practitioners can use cloud AI, we recommend the enterprise platforms—Amazon Bedrock, Google Vertex AI, Microsoft Foundry—because they offer explicit regional deployment. Bedrock and Vertex AI together cover major US regions, Canada, the UK, several European jurisdictions, and parts of Asia-Pacific, with leading commercial models available regionally. That flexibility matters whether the residency constraint comes from US sectoral regulation, GDPR, Canadian privacy law, or organizational policy on where data can be processed.
Where even regional cloud processing is not acceptable (classified investigations, strict air-gapped environments, certain government contexts), offline AI is the answer. Aid4Mail supports fully offline AI through the Ollama and LM Studio ecosystems, which lets investigators run general-purpose LLMs such as Mistral Small 3.2 24B and Gemma 4 26B locally on a single high-end consumer GPU. We validated offline processing at scale in a 34,097-email production pilot run entirely on the local network. Separately, our benchmark results show offline models can compete with cloud models on accuracy, depending on model, hardware, and task. To put the scale in concrete terms: an unattended weekend run (62 hours, Friday evening to Monday morning) processes around 614,000 emails on Mistral Small 3.2 24B at no API cost. The fastest cloud option in the same window, Grok 4.1 Fast, handles roughly 439,000 emails for approximately $130 in API charges.
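The weekend-run figures translate into per-hour rates and a per-email cost with simple division. The sketch below uses only the numbers quoted above; the variable names are illustrative, and hardware, electricity, and setup costs for the offline run are not included.

```python
WINDOW_HOURS = 62  # Friday evening to Monday morning


def emails_per_hour(total_emails: int, hours: float = WINDOW_HOURS) -> float:
    """Average throughput over the unattended window."""
    return total_emails / hours


offline_rate = emails_per_hour(614_000)    # Mistral Small 3.2 24B, local GPU
cloud_rate = emails_per_hour(439_000)      # Grok 4.1 Fast, cloud API
cloud_cost_per_1000 = 130 / 439_000 * 1000  # approximate API charge in USD

print(f"Offline: {offline_rate:,.0f} emails/hour ({offline_rate / 3600:.2f}/s)")
print(f"Cloud:   {cloud_rate:,.0f} emails/hour ({cloud_rate / 3600:.2f}/s)")
print(f"Cloud API cost per 1,000 emails: ${cloud_cost_per_1000:.2f}")
```

That works out to roughly 9,900 emails per hour offline against about 7,100 per hour for the cloud option, with the cloud run costing about $0.30 per thousand emails in API charges.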
Privacy and forensic rigor are not in tension in our experience: they point in the same direction. The less irrelevant data you over-collect, the easier it is to document scope, provenance, and proportionality.
What advantages does an independent, specialist software company offer practitioners compared to larger vendors?
The honest answer is that specialist vendors and larger platform providers both have their place, and experienced forensic teams usually use both.
What a specialist offers is focus. Email is what we do. We have been doing it for 27 years, and we do not have a roadmap competing with mobile forensics, disk imaging, or case management. That shows up in specific places: format coverage, recovery rates on corrupted mailboxes, the depth of our search syntax, the speed at which we can integrate new cloud authentication schemes or new AI providers. When Microsoft 365 App-Only Access became the right way to do tenant-wide collection, we were able to ship support quickly because the engineering work sits at the center of what we do, not at the edge.
Practitioners also get direct access to the people building the software. When a customer hits a bug in Aid4Mail, has trouble accessing a cloud mailbox, or needs help configuring AI, the response is direct and quick. Code fixes typically ship within a release cycle rather than a product cycle.
There is a Swiss dimension to this as well. We are based in Charmey, in the Swiss Prealps, and we keep the core email-processing logic in-house and maintain direct control over the components we use. That matters for integrity: we know exactly what our code does, and we can document and explain algorithm behavior to the standard a defensible matter requires.
What we do not offer is a single platform that handles every evidence type in an investigation. That is not our role, and it is why complementary workflows with EnCase, AXIOM, Relativity, and similar platforms are the norm rather than the exception.
Looking ahead, what do you see as the biggest developments coming in email investigations and eDiscovery?
Three trends stand out, and they reinforce each other.
The first is the continued absorption of email into complex cloud ecosystems. The boundary between “email” and “shared document” and “thread of comments” is already porous, and it will keep blurring. Investigations that start with email frequently now reach into OneDrive, SharePoint, Google Drive, and the collaboration platforms stitched around them. Workflows that treat email as a standalone object risk leaving part of the evidence behind: the linked documents, revision history, and sharing metadata that increasingly define what the message actually meant.
The second is the maturation of AI. We are past the novelty stage. What is ahead is operational rigor: reproducibility across matters, pinned model versions for long-running investigations, defensible methodology statements, clearer standards for prompt validation. The 65% inter-assessor agreement ceiling that Voorhees identified for human reviewers applies to AI outputs as well, and we will see more honest discussion of what “ground truth” actually means in a forensic context. Offline AI will also grow in importance as regulatory and national-security constraints tighten.
The third is regulatory complexity. Data residency, cross-border transfer restrictions, sector-specific rules in healthcare and finance, and emerging AI regulation are all moving at once. The practical consequence for investigators is that methodology has to become more explicit and more documented. Workflows that once got away with “we collected everything and filtered later” are steadily becoming untenable.
Underneath all of it, the volume keeps growing. The gap between what a mailbox contains and what an investigator can review manually has long been unbridgeable; the question now is which tools close that gap defensibly. That is the frame I would expect the next decade of email investigations to work within.
And finally, what do you enjoy in your spare time?
I live in Charmey, a village in the Swiss Prealps, so the mountains are part of daily life. Much of my time outside work is spent hiking with a camera, mostly for landscape and nature photography. The region changes completely with the seasons, and I still find new subjects close to home. Some of my images have been used in magazines and local tourist brochures, which has been quietly rewarding. I also enjoy skiing in winter.
My wife and I also enjoy traveling across Europe in our motorhome. It is a slower way to travel, but that is part of the appeal: you notice more, stop more often, and the camera naturally comes along. Apart from that, I read widely, especially history, science, and the classics. Geology taught me to observe carefully and follow evidence wherever it leads, and that habit has stayed with me. It is a fairly quiet life, and I like it that way.