Effective communication between decision-makers and technical teams is critical when responding to a cybersecurity incident. Unfortunately, this is where Praetorian most commonly sees breakdowns, particularly during severe, business-impacting attacks. When communication fails, decision-makers find themselves frustrated by a lack of actionable information while the technical teams struggle to prioritize business needs over technical challenges. The result is unnecessary friction that hinders efforts to neutralize the threat and return to normal operations.
Most of this friction comes from incident response plans that dogmatically approach an incident as a series of technical steps to complete. Borrowing from standards such as NIST Special Publication 800-61, these plans outline iterative processes to investigate, contain, eradicate, and recover. These are important actions in response to a threat; however, they are only one component of executing an adequate response. These plans fail to consider the complex decision-making required to perform those actions effectively when faced with an incident with severe or immediate operational impacts.
Your cyber incident response plan needs to empower decision-makers and provide mechanisms to keep them informed. A good plan not only considers decision-makers but enforces a structure entirely geared toward informed decision making. To build that plan, we must understand how decisions are made. The U.S. Air Force’s OODA Loop framework offers one way to understand how people make decisions.
The OODA in OODA Loop stands for Observe, Orient, Decide, and Act. Originally conceived by Air Force fighter pilot Colonel John Boyd, the framework describes everything from tactical decision making to higher levels of strategic command and control (ref: https://www.airuniversity.af.edu/Portals/10/ASPJ/journals/Chronicles/Hill.pdf). The OODA Loop borrows heavily from other famous military strategists but, perhaps most directly, from the wisdom of Sun Tzu. Sun Tzu’s notions of knowing thy enemy and of subduing an attacker without battle are embedded in the OODA Loop. Boyd presents the Loop not only as a decision-making tool but also as the requisite manner in which all decisions are made, even at an unconscious level.
An effective incident response plan allows you to execute your OODA Loop quickly which enables rapid remediation and recovery. Conversely, an ineffective plan’s OODA Loop is slowed by confusion, unnecessary information, inaction, and poor decisions. The plan should define information flows, pre-approved actions, and decision authorities, among other aspects.
As a loop, the decision-making process happens iteratively. Planning, training, and deliberate practice are all things that help to shorten the OODA Loop. Beating the attacker’s OODA Loop once is not sufficient; you must strive to outpace the attacker at every opportunity. This military-style decision making is useful in incident response scenarios because the fight to protect an organization’s crown jewels is exactly that — a battle.
Below, we describe an approach to incident response from the perspective of a decision-maker using the OODA Loop as our framework. For each phase, we provide key takeaways to consider when building or improving your incident response plan. While a complex incident may require thousands of decisions at all levels of an organization, this approach focuses on providing a structure and tempo for the entire team to support decision making at a business level.
Phase 1: Observe
Decision making starts with observations. Rather than using their senses to make observations about an incident, almost everything a decision-maker knows comes from their team. Your incident response plan should address how, when, and what information funnels up to business decision-makers.
Presenting decision-makers with information starts with a well-defined written process and agenda for executive status updates. These updates should occur frequently enough to enable a rapid response, but allow enough time between updates to enable accurate and thorough work (see Orient below). While update frequencies may vary by organization and the urgency of the incident, holding updates every 3 to 6 hours is a good starting point.
Routine updates could include:
- Actionable information
- Business impacts and risks
- Answers to key questions from previous updates
- Recovery plan status
- Suggested courses of action (see Decide below)
The team should also understand what conditions and critical information the decision-maker wants “right away” outside a regular status update. You might call these items Critical Information Requirements (CIRs).
Examples of CIRs might be:
- A data breach resulting in a loss of intellectual property
- Downtime beyond a certain time threshold
- Identification of a malicious insider
- Threats or other types of harassment to individuals or locations
- Violation of regulatory requirements or situations that require regulatory reporting
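One way to make CIRs concrete is to write them down as explicit checks that a finding either triggers or does not. The following is a minimal sketch, not a prescribed implementation: the `Finding` fields, the `CIR` class, the four-hour downtime threshold, and the routing labels are all illustrative assumptions, and your organization's CIRs and thresholds will differ.

```python
from dataclasses import dataclass
from typing import Callable, List

# Hypothetical record of something the response team has learned.
# Field names are illustrative assumptions, not a standard schema.
@dataclass
class Finding:
    summary: str
    ip_lost: bool = False
    downtime_hours: float = 0.0
    insider_identified: bool = False
    threat_to_people: bool = False
    regulatory_reporting: bool = False

# A CIR pairs a human-readable name with a test applied to a finding.
@dataclass
class CIR:
    name: str
    triggered: Callable[[Finding], bool]

# Assumed threshold; set this from your own business requirements.
DOWNTIME_THRESHOLD_HOURS = 4.0

CIRS: List[CIR] = [
    CIR("Loss of intellectual property", lambda f: f.ip_lost),
    CIR("Downtime beyond threshold",
        lambda f: f.downtime_hours > DOWNTIME_THRESHOLD_HOURS),
    CIR("Malicious insider identified", lambda f: f.insider_identified),
    CIR("Threats or harassment", lambda f: f.threat_to_people),
    CIR("Regulatory reporting required", lambda f: f.regulatory_reporting),
]

def triggered_cirs(finding: Finding) -> List[str]:
    """Return the names of every CIR this finding triggers."""
    return [cir.name for cir in CIRS if cir.triggered(finding)]

def route(finding: Finding) -> str:
    """Escalate right away if any CIR fires; otherwise hold for the update."""
    return "notify_now" if triggered_cirs(finding) else "next_status_update"

if __name__ == "__main__":
    outage = Finding("File server encrypted", downtime_hours=6.0)
    print(route(outage), triggered_cirs(outage))
```

Keeping the CIR list in one place makes it easy to review with decision-makers before an incident, which is when these thresholds should be agreed.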
Tools such as dashboards enable incident response teams to provide real-time updates of the latest information and decrease the overhead of in-person status updates. However, moving to this solution too quickly may prove ineffective unless all response team members understand what information they need to include in this dashboard and why.
Key Takeaway: Build your incident response plan around frequent well-defined status updates that provide decision-makers with everything they need to make informed decisions. Anticipate what your business’s CIRs are and document them.
Phase 2: Orient
It is the entire response team’s job to Orient the decision-maker to the threat with answers to important questions that provide clarity about the reality of a complex situation. It is the responsibility of the decision-maker to drive the team’s efforts toward the most useful questions. The question is not how many systems were affected by ransomware; the question is which business functions are affected and what they need to recover. The question is not what type of cryptominer is on the company’s servers; the question is how attackers breached that area of the network and what information is at risk. With an understanding of the nature of the problem, decision-makers can better assess the business risk and overall impact.
Without a strong understanding of their organization’s business priorities, undirected technical teams are prone to focus on technical problems. This is understandable given the technical nature of their training. However, answers like “the attacker used this Windows API call to evade the detection stack” are of limited use to decision-makers in a crisis. When business priorities drive investigative priorities, technical teams provide the answers needed to make informed decisions.
Because we tend to draw conclusions almost as quickly as we have received the information, the Observe and Orient phases can be difficult to distinguish. It is useful to separate the two in stressful situations to ensure the proper synthesis of all available information. In this case, slow is smooth, and smooth is fast. The Orient phase, consisting of deliberate investigation and analysis, takes up the majority of the time between status updates.
Anticipate key questions and concerns ahead of an actual incident. Next, ensure your incident response plan and capabilities are adequate to address these concerns. These predictions should influence the procedures you write, the telemetry you collect, the technology you deploy, and the information you aggregate.
Key Takeaway: Incident response teams should prioritize investigative efforts on the organization’s most important questions. The answers to these questions should orient decision-makers to the problem and arm them with actionable information. Incident response plans and capabilities should anticipate these key questions and provide a process for synthesizing information for decision-makers.
Phase 3: Decide
As the name implies, this is the decision-maker’s domain. The previous two stages exist to support the decision. Whenever possible, teams should recommend up to three distinct options, or courses of action (COAs), to decision-makers based on their analysis. The Decide phase is where potential actions are weighed against each other to determine what must happen next.
The decision-maker must take all available information into account to determine the best COA, its potential impacts, and the confidence of success. Incident response plans can empower decision making by outlining pre-approved actions, decision thresholds, and decision authorities using tools such as a decision matrix. Outlining these considerations ahead of time means less time spent during an incident running up and down the chain of command seeking clarification, and it helps avoid decision fatigue at multiple levels.
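A decision matrix can be as simple as a lookup table that maps an action and its expected business impact to the role allowed to approve it. The sketch below shows one possible shape, under stated assumptions: the action names, impact levels, roles, and the `lookup` helper are all hypothetical and exist only to illustrate the idea.

```python
from dataclasses import dataclass
from typing import List, Optional

# Assumed impact scale, ordered from least to most severe.
IMPACT_ORDER = ["low", "medium", "high"]

@dataclass(frozen=True)
class Rule:
    action: str        # what the team wants to do
    max_impact: str    # highest business impact this rule covers
    authority: str     # role that may approve the action
    pre_approved: bool # True if no further sign-off is needed

# Illustrative entries only; a real matrix comes from your own plan.
DECISION_MATRIX: List[Rule] = [
    Rule("isolate_host", "low", "on-call analyst", True),
    Rule("isolate_host", "high", "incident commander", False),
    Rule("disable_account", "medium", "on-call analyst", True),
    Rule("shut_down_business_system", "high", "executive sponsor", False),
]

def lookup(action: str, impact: str) -> Optional[Rule]:
    """Return the least senior rule covering this action at this impact.

    Returns None when the matrix has no matching rule, which the team
    should treat as "escalate to the decision-maker".
    Raises ValueError if the impact level is not in IMPACT_ORDER.
    """
    level = IMPACT_ORDER.index(impact)
    candidates = [
        rule for rule in DECISION_MATRIX
        if rule.action == action
        and IMPACT_ORDER.index(rule.max_impact) >= level
    ]
    if not candidates:
        return None
    return min(candidates, key=lambda r: IMPACT_ORDER.index(r.max_impact))

if __name__ == "__main__":
    print(lookup("isolate_host", "low"))
    print(lookup("isolate_host", "medium"))
    print(lookup("disable_account", "high"))
```

Choosing the least senior matching rule keeps routine, low-impact actions with the people closest to the work, while anything the matrix does not cover falls through to explicit escalation.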
As an additional tactical consideration, all decision items must be tracked in some fashion. A decision without follow-through is no decision at all. Assigning a scribe or similar position during an incident can help to ensure decisions are recorded and followed. As mentioned above, technology solutions may also play a role here but should only be used to augment an existing process, not as a replacement for a process.
Key Takeaway: Decision-makers live here; all decisions must be recorded and communicated clearly. Provide decision-makers with options. The more decisions that can be delegated, pre-approved, or predicted ahead of time, the shorter this phase and, ultimately, the loop.
Phase 4: Act
Once decisions are made, the task has to be carried out. If an owner was not assigned during the decision phase, one must be assigned for each decision item.
At this stage, measures of performance (was the task completed?) and measures of effectiveness (did the task have the intended outcome?) are developed. As teams execute assigned tasks, they must record outcomes and status for later reporting.
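The distinction between the two measures can be captured in whatever log the scribe or response team keeps. As a rough sketch, assuming a hypothetical `DecisionItem` record with one field per measure (the field names and status strings are illustrative, not part of any standard):

```python
from dataclasses import dataclass
from typing import List, Optional

@dataclass
class DecisionItem:
    decision: str
    owner: Optional[str] = None
    # Measure of performance: was the task completed?
    completed: bool = False
    # Measure of effectiveness: did the task have the intended outcome?
    # None means the outcome has not been assessed yet.
    intended_outcome_met: Optional[bool] = None

def status(item: DecisionItem) -> str:
    """Summarize where a decision item stands for the next status update."""
    if item.owner is None:
        return "unassigned"
    if not item.completed:
        return "in progress"
    if item.intended_outcome_met is None:
        return "done, effectiveness not assessed"
    return "done, effective" if item.intended_outcome_met else "done, not effective"

def report(items: List[DecisionItem]) -> List[str]:
    """Build one line per decision item for the executive status update."""
    return [
        f"{item.decision} [{item.owner or 'no owner'}]: {status(item)}"
        for item in items
    ]

if __name__ == "__main__":
    items = [
        DecisionItem("Reset all admin passwords", owner="IT ops",
                     completed=True, intended_outcome_met=True),
        DecisionItem("Block outbound traffic to attacker IPs",
                     owner="Network team", completed=True),
        DecisionItem("Notify outside counsel"),
    ]
    for line in report(items):
        print(line)
```

Recording the two measures separately makes it visible when a task was finished but did not achieve what the decision-maker intended, which is exactly the input the next Observe phase needs.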
The Act stage is often quick in theory, but poor communication channels, unclear guidance, or misunderstood decision criteria can easily cripple any team. A properly developed and practiced incident response plan enables teams to take quick and effective action with little friction when given a task. This is especially critical when a situation requires rapid remediation and recovery.
After execution, the Act phase feeds directly back into the Observe phase by providing performance and effectiveness measures as direct inputs. It becomes the decision-maker’s job to point the team toward the next most important questions and concerns, continuing the cycle.
Key Takeaway: Decision items must be executed and outcomes communicated appropriately to be effective in starting the next iteration of the OODA Loop.
Keep the OODA Loop in mind as you develop and improve your incident response process. Build your process around decision-makers to ensure they have everything they need to make tough decisions. Improve your process with proper planning, training, practice, and execution to achieve a fast and efficient loop. There will always be fog and uncertainty, but as General George S. Patton Jr. once said, “A good plan today is better than a perfect plan next week.”