Crisis lessons from OT incident response as cyber-physical attacks unfold within normal industrial operations - Industrial Cyber
March 22, 2026
Industrial cyber attacks are rarely announced by blaring sirens. Instead, they look like mundane process activity until a valve opens prematurely, a controller receives a command it should never obey, or operators notice a gap between how the plant is operating and how it should be operating. That is what cyber-physical attacks look like in OT environments, where adversaries lurk inside normal operations and intrusions can escalate quickly into safety and operational incidents.
OT incident response differs sharply from the response to IT compromises. In many industrial applications, safety, reliability, and availability take priority over confidentiality. A compromised PLC or control loop can wreak havoc in minutes, destroying equipment or even injuring people, so conventional responses such as immediate isolation or aggressive scanning could create unsafe process conditions rather than contain risk. Incident response becomes a delicate balance: organizations must neutralize the threat while keeping the process stable enough to continue operations.
When cyber attacks affect OT systems, organizations are concerned first with people’s safety and with keeping operations running. Industrial organizations can draw on guidance such as NIST SP 800-82 and ICS-CERT advisories. These sources stress that decisions about containing cyber threats in OT environments must account for real-world effects, such as equipment damage and risk to lives. Because cyber attacks can cause physical harm, organizations must prioritize human safety and operational continuity.
Detection is equally challenging. Adversaries increasingly hide inside legitimate industrial protocols such as Modbus and Common Industrial Protocol (CIP), mixing malicious commands with regular traffic. Traditional monitoring fails to detect these aberrations, so defenders need behavioral analysis and deep packet inspection to catch subtle alterations in process variables. Decision-making also becomes a key fault line in a crisis. Unlike an IT incident, where the response is driven predominantly by security teams, an OT response requires plant engineers, safety leaders, and executives to weigh operational risk before acting.
Recovery after a cybersecurity incident carries similar stakes. Resuming operations without verifying control system integrity may reintroduce compromised logic into live systems. Tighter segmentation, stronger access control, and continuous monitoring must be employed even at the lowest network layers to protect Level 1 assets, such as sensors and actuators.
Why OT incidents demand a different response
Industrial Cyber reached out to industrial cybersecurity experts to examine how incident response shifts when a cyber intrusion impacts Safety Instrumented Systems or Basic Process Control Systems, compared with a conventional IT-led breach. The discussion also explores how priorities, coordination, and risk tolerance change when physical safety is on the line.
Paul Shaver, global practice leader at Mandiant’s Industrial Control Systems/Operational Technology Security Consulting practice, told Industrial Cyber that often, the priority is weighted toward restoring operations and production, especially in critical infrastructure sectors where the tolerance for downtime is much lower.
“Coordination often requires teams outside of a normal CSIRT response, such as engineering, operations, EH&S, and third-party teams; this is why OT-specific IR planning and TTXs are so critical,” Shaver said. “If hazardous operations are involved, whether there is a risk of damaged equipment, environmental impact, or risk to human life, additional steps must be taken to ensure safe conditions persist throughout the investigation, forensic process, remediation, and startup.”
For example, he said that if a logic or safety controller must be replaced, reloading specific configuration files and programs may require a specialized startup process or even a full recommissioning process to ensure everything is operating normally before production is restarted.
“The key difference when responding to incidents in OT environments is the primary focus on handling the incident through the lens of ensuring the safety of people, processes, and the environment,” Mary Gannon, OT incident response lead at GuidePoint Security, told Industrial Cyber. “Conventional IT breaches are primarily concerned with protecting data, whereas OT incidents are first and foremost evaluated from the perspective of preserving safety, followed by mitigating impacts to production.”
She added that all decisions should align with maintaining safety, and decision-makers should be fully aware of the safety implications when handling any incident affecting the OT environment.
“In OT and focusing on Safety, the security priority order inverts the IT CIA triad to Safety → Availability → Integrity → Confidentiality,” Michael Metzler, vice president of horizontal management cybersecurity for digital industries at Siemens, told Industrial Cyber. “A successful attack can cause production downtime, equipment damage, safety risks to workers, and potentially catastrophic environmental incidents.”
Unlike IT, Metzler noted that manipulating a SIS or BPCS can push a physical process into a more dangerous condition than the intrusion itself. “A Secure State and a Safe State are not the same thing and can directly conflict, meaning every containment action must be evaluated against its physical consequences before execution. This is why cybersecurity analysts must work alongside process engineers and plant operators in a human-in-the-loop approach, interpreting alerts within the operational context of the facility rather than acting in isolation.”
Moreover, he observed that responders must also work within IEC 62443 while frameworks like the EU’s NIS2 hold executives personally accountable for failures, making a mishandled OT incident response not only operationally but also legally consequential.
“IT incident response is most concerned with keeping data secure and keeping the business functioning, while OT/ICS incident response has goals of ensuring the safety of people and the environment, while maintaining operations and preventing any potential damage to the facility,” Mike Holcomb, founder of UtilSec and an OT/ICS cybersecurity consultant and educational content creator, told Industrial Cyber. “It requires team members across multiple disciplines like cybersecurity, engineering, and operations to come together to ensure that these goals are met.”
Identifying attacks inside routine industrial commands
In industrial environments, attackers often use native protocols such as Modbus or CIP to blend in with normal operations. The executives address how responders can distinguish between malicious command activity and routine operational variability without disrupting the process.
“This is the core objective behind anomaly detection: there are always normal operating parameters, and data historians often track these closely to help optimize production,” Shaver said. “These same values can determine when something abnormal has occurred, such as when a set point is changed, or a condition is triggered that is not supported by other corresponding data. If an anomaly occurs with no immediate reason, it should trigger an alert that needs to be triaged and investigated.”
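As a rough illustration of the baseline comparison Shaver describes, the following minimal sketch flags a set point that falls far outside the range seen in historian data. The tag values, the three-sigma threshold, and the function names are all assumptions made for illustration; real deployments would tune thresholds per tag and correlate against other process data before raising an alert.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historian samples for one tag as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, k=3.0):
    """Return True when value lies more than k standard deviations from the mean."""
    mu, sigma = baseline
    if sigma == 0:
        # A perfectly flat history means any change at all is unusual.
        return value != mu
    return abs(value - mu) > k * sigma

# Hypothetical historian samples for a tank-level set point (percent).
history = [50.0, 50.2, 49.8, 50.1, 49.9, 50.0, 50.3, 49.7]
baseline = build_baseline(history)

for reading in (50.1, 72.5):
    if is_anomalous(reading, baseline):
        print(f"ALERT: set point {reading} outside baseline; triage with operations")
    else:
        print(f"ok: set point {reading} within baseline")
```

A statistical threshold like this only says that a value is unusual. Deciding whether it is malicious still requires the triage and investigation step described above.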
In the incident response cycle, Gannon said that the preparation phase is critical for a multitude of reasons, one being understanding what ‘normal’ looks like for the organization. “Establishing baselines and identifying legitimate traffic is key for organizations to determine their environment’s nuances. Additionally, this will help organizations recognize anomalies and subsequently determine if an anomaly is potentially malicious.”
She noted that anomalies can be identified by collecting logs within the OT environment and through adequately operationalized monitoring. Distinguishing between malicious activity and routine operational variability starts with establishing an understanding of normal operations and having the capabilities in place to identify anomalies as they come up.
“Detecting malicious commands hidden within legitimate industrial protocol traffic requires moving beyond simple traffic monitoring toward deeper, application-aware inspection,” according to Metzler. “At the network perimeter, next-generation firewalls analyze Layer 7 application-level traffic. They inspect what is being instructed across industrial protocols, not merely that communication is occurring. Stateful inspection at the controller boundary adds a further layer of protocol-aware filtering.”
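To make "inspecting what is being instructed" concrete, here is a minimal sketch of application-aware inspection for Modbus/TCP. It decodes the standard MBAP header and function code, then flags write commands from hosts that are not on an approved list. The IP addresses, the `allowed_writers` set, and the function names are hypothetical; this is not how any particular firewall product works, only an illustration of the idea.

```python
import struct

# Standard Modbus function codes that change controller state.
WRITE_FUNCTION_CODES = {
    5: "Write Single Coil",
    6: "Write Single Register",
    15: "Write Multiple Coils",
    16: "Write Multiple Registers",
}

def parse_modbus_tcp(frame: bytes) -> dict:
    """Decode the 7-byte MBAP header and the function code of a Modbus/TCP frame."""
    if len(frame) < 8:
        raise ValueError("frame too short for MBAP header and function code")
    transaction_id, protocol_id, length, unit_id, function_code = struct.unpack(
        ">HHHBB", frame[:8]
    )
    if protocol_id != 0:
        raise ValueError("protocol identifier is not 0, so this is not Modbus")
    return {
        "transaction_id": transaction_id,
        "unit_id": unit_id,
        "function_code": function_code,
        "data": frame[8:],
    }

def review_frame(source_ip: str, frame: bytes, allowed_writers: set):
    """Return an alert string for a write from an unapproved source, else None."""
    request = parse_modbus_tcp(frame)
    name = WRITE_FUNCTION_CODES.get(request["function_code"])
    if name is None or source_ip in allowed_writers:
        return None
    code = request["function_code"]
    return f"ALERT: {name} (function {code}) from unapproved host {source_ip}"

# Hypothetical frames: a register write and a register read.
write_frame = bytes.fromhex("0001000000060106000100ff")
read_frame = bytes.fromhex("000200000006010300000002")
allowed_writers = {"10.0.0.10"}  # assumed engineering workstation

print(review_frame("10.0.0.99", write_frame, allowed_writers))
print(review_frame("10.0.0.99", read_frame, allowed_writers))
```

The point of the sketch is that both frames are valid Modbus traffic on the same port. Only by reading the function code and knowing who is allowed to issue writes can a monitor tell the two apart.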
For ongoing visibility, he pointed out that passive non-intrusive monitoring is essential. “Finally, automated alerts alone are not enough. Human experts with deep industrial knowledge must interpret detections within the operational context of the facility.”
Holcomb said that to understand when malicious activity is occurring in the environment, or even when an operational issue seen on the network needs to be addressed, defenders must understand how the plant operates, along with the normal baseline activity of the network. He added that when abnormal activity appears on the network, the appropriate team members can then identify any potential operational or cybersecurity issues and address them promptly.
Who decides during an OT cyber crisis?
During an active incident, the executives address how decision-making authority should be structured between the SOC and plant leadership when the tradeoff is clear: isolate the network to contain the threat, or maintain production to avoid operational impact.
“There is no set answer that works for every organization; we see this work best when it is preplanned and tested,” Shaver observed. “The best decision trees come from building an IR plan that establishes a path to an authoritative decision-making process. That plan should be regularly tested and updated through effective TTXs, at both technical and executive levels, to find what works best, sometimes even requiring specific processes for every individual site.”
Gannon said that decision-making authority during an incident should be clearly defined ahead of an actual incident. “The SOC and plant leadership will have different priorities when handling an incident in OT, which is why having one centralized decision maker identified for each critical decision is key for a cohesive response.”
She added that organizations should have an OT Incident Response Plan that clearly defines the stakeholder(s) responsible for making critical decisions, such as isolation, system/site shutdown, eradication, handling of ransom/extortion, etc. A leading practice is to assign an Incident Commander to be the decision-making authority during an incident, who will be able to take all the input from the SOC, the plant, and other key stakeholders to make an appropriate decision.
“Clear governance is a prerequisite in OT incident response. When the stakes involve both physical safety and operational continuity, ad-hoc decision-making is a liability,” Metzler said. “The core principle is that security expertise and operational authority must be clearly separated. Security teams monitor, triage, and recommend, but final decisions on actions that affect physical processes must rest with those accountable for plant operations. This distinction matters most at the control level, where an incorrect automated response can be as damaging as the attack itself.”
He added that underpinning this is the need for pre-defined organizational measures: clear roles, escalation paths, and response processes established before an incident occurs. These must be paired with continuous monitoring that gives decision-makers real situational awareness.
Holcomb said the response can depend on the environment, so operations and cybersecurity teams must answer these questions before an incident occurs. “Only the appropriate engineering and operations team members can determine if and when IT and OT can be disconnected without introducing safety or operational availability issues, along with other related response questions, so it is very important that the appropriate team members come together before and during any incident to make these decisions together, not in a vacuum.”
Validating control system integrity before authorizing restart
Before authorizing a cold restart, the executives look into what technical validation steps are necessary to ensure PLC logic, engineering workstation configurations, or RTU (remote terminal unit) firmware have not been altered subtly or persistently.
“This highly depends on what was impacted during an incident,” Shaver said. “It could be as simple as comparing a last known good configuration to the configuration that will be used during the startup, or at the extreme, a full recommissioning of the production line that could take days, weeks, or even months of work.”
Before initiating a cold restart or any kind of definite recovery, Gannon pointed out that specific assets need to be validated and confirmed to be in the clear. “All validations should be done against existing golden images, golden baselines, and/or golden configuration files as applicable. Validations should be done on the PLC logic, the EWS configurations, OS, software, credentials/users, and the RTU firmware. Various known OT adversaries favor persistence, and ensuring that all OT assets in their current state are clear of even the most subtle anomalies is critical.”
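One simple form of the golden-baseline validation described above is to compare cryptographic hashes of exported files against a known-good manifest. The sketch below assumes the PLC logic and workstation configurations have already been exported as files; the file names and contents are hypothetical. A matching hash of an export does not by itself prove what is running on a controller, so this complements vendor verification tools rather than replacing them.

```python
import hashlib

def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()

def compare_to_golden(current_files: dict, golden_manifest: dict) -> list:
    """Compare exported files to a golden manifest of name -> SHA-256 hex digest.

    Returns a sorted list of (name, status) pairs for every discrepancy, where
    status is "missing", "modified", or "unexpected".
    """
    findings = []
    for name, expected in golden_manifest.items():
        if name not in current_files:
            findings.append((name, "missing"))
        elif sha256_hex(current_files[name]) != expected:
            findings.append((name, "modified"))
    for name in current_files:
        if name not in golden_manifest:
            findings.append((name, "unexpected"))
    return sorted(findings)

# Hypothetical golden baseline captured before the incident.
golden = {
    "plc_line1_logic.bin": sha256_hex(b"known good ladder logic"),
    "ews_project.cfg": sha256_hex(b"known good workstation config"),
}

# Hypothetical exports pulled during recovery.
current = {
    "plc_line1_logic.bin": b"known good ladder logic with one extra rung",
    "ews_project.cfg": b"known good workstation config",
    "startup_task.bat": b"unknown script",
}

for name, status in compare_to_golden(current, golden):
    print(f"{status.upper()}: {name}")
```

Any finding here would hold the restart until engineering explains the difference, which matches the principle that integrity is verified before a component returns to service.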
A cold restart without thorough validation is one of the highest-risk actions in OT incident response, Metzler recognized. “The core principle is that system integrity must be verified before any component is brought back online. This means relying on established Backup and Restore capabilities to recover from known-good states, rather than assuming a restarted system is clean.”
Metzler added that equally important is root cause analysis before restoration. “Bringing systems back online without understanding how the compromise occurred risks reintroducing the same vulnerability. Incident analysis must precede any restart authorization. Finally, recovery is not a one-time event but part of an ongoing security management process.”
“A major challenge that OT/ICS environments have in incident response is in recovery. To be able to truly recover, OT needs to understand how the attacker was able to get into the environment and everything the attacker did since being in the environment,” Holcomb said. “With less than 10% of global OT/ICS environments performing security monitoring, according to the Dragos 2026 Year in Review Report, most environments do not have the ability to know what actually happened during an incident. If we don’t know what happened, how can we completely ensure the attacker is out of the environment before recovery? We can’t. So additional care must be taken in ensuring the validity of all systems before a plant is brought back online to ensure safety, operational availability, and prevent equipment damage.”
Defending Level 1 assets in connected environments
Many facilities still rely on legacy Level 1 assets that cannot be patched and were never designed for connectivity. The executives focus on the practical architectural and procedural controls that organizations can implement to make these environments defensible against modern threats, including supply chain compromise and increasingly automated attack techniques.
“I cannot stress this enough: good network segmentation, access controls, physical security, and overall security hygiene,” Shaver said. “Mandiant has long had a ‘theory of 99’ that we discuss pretty often: 99% of attacks that impact OT will start on the IT assets, 99% of malware will be IT malware, and 99% of detection opportunities will be on IT infrastructure. If access at layer 3.5 is tightly controlled, we should be far less concerned about what happens below layer 3. The reality is that most asset owners will never have the ability to patch or update most of what resides at layer 2 and below, so we have to treat it as always vulnerable and mitigate threats by controlling access.”
Gannon said that legacy Level 1 assets were not designed to withstand modern-day threats, which means they need extra protection. “Micro-segmentation is a great solution, allowing a more granular, policy-driven approach to keeping those legacy lower-level assets protected from potential threats already moving around the OT network. Through micro-segmentation, the focus shifts from the big picture, macro view of segmenting the OT network from the IT network, and narrows in on specific assets, allowing a shield around the assets that only permits necessary and approved traffic.”
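The "only permits necessary and approved traffic" idea amounts to a default-deny allowlist around each asset. The following minimal sketch shows that policy check in isolation. The addresses, the rule set, and the names `Rule` and `is_permitted` are assumptions for illustration; in practice the policy would be enforced by a firewall or industrial security appliance, not application code.

```python
from dataclasses import dataclass
from ipaddress import ip_address, ip_network

@dataclass(frozen=True)
class Rule:
    """One approved flow: source network, destination network, TCP port."""
    source: str
    destination: str
    port: int

def is_permitted(src: str, dst: str, port: int, rules: list) -> bool:
    """Default deny: allow a flow only if some rule explicitly matches it."""
    src_addr, dst_addr = ip_address(src), ip_address(dst)
    return any(
        src_addr in ip_network(rule.source)
        and dst_addr in ip_network(rule.destination)
        and port == rule.port
        for rule in rules
    )

# Hypothetical policy: only one engineering workstation may reach the
# legacy PLC, and only on the Modbus/TCP port (502).
rules = [Rule(source="10.10.1.5/32", destination="10.10.2.20/32", port=502)]

flows = [
    ("10.10.1.5", "10.10.2.20", 502),  # approved workstation
    ("10.10.3.7", "10.10.2.20", 502),  # another host on the OT network
    ("10.10.1.5", "10.10.2.20", 23),   # approved host, unapproved port
]
for src, dst, port in flows:
    verdict = "permit" if is_permitted(src, dst, port, rules) else "deny"
    print(f"{verdict}: {src} -> {dst}:{port}")
```

Because the default is deny, a compromised host elsewhere on the OT network gains nothing from being "inside": it still has no approved path to the legacy device.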
“Legacy Level 1 assets represent one of the most persistent challenges in industrial cybersecurity. Since these devices often cannot be patched and were not designed with connectivity in mind, the security strategy must shift from securing the device to securing the environment around it,” Metzler said. “The foundational approach, aligned with IEC 62443, is network segmentation and cell protection: legacy assets are enclosed within dedicated automation cells, separated from the rest of the network by industrial security appliances.”
He added that firewalls control all inbound and outbound communication, restricting traffic to explicitly permitted paths only. “VPN encryption protects inter-cell data transmission against espionage and manipulation. Passive asset discovery — enabled by tools such as Siemens’ SINEC Security Monitor — mirrors network traffic without placing any active load on sensitive legacy devices, providing continuous visibility into connected assets and automatically correlating them against vulnerability databases.”
Metzler noted that this correlation provides proactive awareness even where direct patching is impossible.
“In today’s world, we have to assume that any asset on the OT network will become compromised in the future,” Holcomb said. “So we have to look at the security controls that we can apply to help limit the impact when such an event occurs. Thankfully, by mastering the fundamental controls of OT cybersecurity, like network segmentation, asset management, incident response planning, and continuous vulnerability management, owners and operators can significantly reduce the impact of any incident.”
Anna Ribeiro