How to Throw a Disaster Recovery Tabletop Workshop
Ransomware and destructive malware. You either groan with media fatigue or cringe at the thought of getting blown off the map by bitcoin bandits…perhaps both. For many organizations, creating a multi-leveled disaster recovery (DR) plan to accommodate this potential threat is now a top priority. The problem is that many organizations create a DR plan but don’t test it each year.
It’s easy to put off DR testing, as it’s a costly activity in terms of both hours and infrastructure. But failing to test against a complete and realistic scenario can leave an organization woefully unprepared for ancillary activities like communication and ownership of action. Essentially, the first time the crisis team meets should never be during a crisis.
Simulating an attack around a table with a few colleagues doesn’t replace live testing, but it does uncover things you might not otherwise think of. This blog series walks through the setup and execution of a tabletop exercise for testing your DR plan. In true RPG style, this post shows how to simulate an unfolding disaster and apply your DR strategy in response.
The advantage of running a tabletop exercise is its lightweight impact in terms of time and resources. These exercises can give you a real feel for how (un)prepared you are to deal with a digital smoking hole. Here’s how you can get started:
Identify Key Stakeholders
The first step to running a tabletop exercise is determining who needs to be involved in the event of a cyberattack. The workshop should include members of the core crisis team, such as the CIO, CISO, I&O leader(s), IT DR, and the Business Continuity Manager. Roles that face customers and suppliers may also need to come into the mix. For example, if the payroll system is suddenly displaying a ransom demand, then HR needs to be in the loop. Different scenarios require different stakeholders, and it’s always shocking to see how many departments are impacted by any disaster.
Master Your Talk Track
Now that we have the party list, I want to get into the business justification because inviting your team to run around with imaginary hackers and trolls might be a career-limiting move. You are going to need to have your talk track down to convince everybody to get on board.
When pitching this to your team, explain how running a fictitious DR scenario promotes out-of-the-box thinking and forces you to confront things you hadn’t considered before. Tabletop exercises uncover lessons that encourage creativity and improve your existing DR and/or BCM plans.
Determine the Business Impact
Next up, it’s time to understand your mission-critical applications. Quantifying the importance of any individual application to the business can be challenging, so start by identifying which applications keep the business running. Ask yourself: What does the business do and produce? What services are provided? Who owns that responsibility? Are we in an emergency state when it’s impossible to ship and receive orders, or is that a temporary inconvenience since other departments can keep running? By measuring an attack’s impact in concrete terms, such as “goods not produced per hour,” you can determine what is really mission-critical when everything lights up red.
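To make the “goods not produced per hour” idea concrete, here is a minimal Python sketch that ranks applications by what an hour of downtime costs. The application names, owners, and loss figures are hypothetical placeholders; in a real workshop the numbers would come from the business owners in the room.

```python
from dataclasses import dataclass


@dataclass
class App:
    name: str
    owner: str            # department that owns the responsibility
    loss_per_hour: float  # hypothetical "goods not produced per hour"


def rank_by_impact(apps):
    """Return apps sorted from most to least costly per hour of downtime."""
    return sorted(apps, key=lambda app: app.loss_per_hour, reverse=True)


# Illustrative figures only; replace with numbers from your own business.
apps = [
    App("order-entry", "Sales", 120.0),
    App("warehouse-scanners", "Shipping", 300.0),
    App("intranet-wiki", "IT", 0.0),
]

ranked = rank_by_impact(apps)
for app in ranked:
    print(f"{app.name:20} {app.owner:10} {app.loss_per_hour:8.1f} units/hour")
```

Even a rough ranking like this gives the group a shared starting point for arguing about what is truly mission-critical.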
Understand Application Dependency Chains
Applications are all part of a chain that keeps the business running, so the next step is to find out how everything is connected. When you look at the supply and distribution chains and contingency management, the vulnerable components come into sharp focus. I have run workshops like this in the past, and you would be amazed at what you miss during paper planning for DR. If I am a logistics company, what happens if I cannot take new orders or track current shipments? If I am a car manufacturer, what happens when key components or portions of the supply chain stop? Which applications and services support those activities (don’t forget DNS, time servers, and AD)?
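One way to prepare for this part of the workshop is to write the dependency chain down as a simple graph and ask what breaks when a single piece fails. The sketch below is an illustration only: the service names and the links between them are made up, and a real map would come from your own inventory.

```python
from collections import deque

# Hypothetical dependency map: each service lists what it depends on.
DEPENDS_ON = {
    "order-entry": ["erp-db", "ad"],
    "shipment-tracking": ["erp-db", "dns"],
    "erp-db": ["dns", "time-server"],
    "ad": ["dns", "time-server"],
    "dns": [],
    "time-server": [],
}


def impacted_by(failed, depends_on):
    """Return every service that directly or indirectly depends on `failed`."""
    # Invert the map so we can walk from a dependency to its dependents.
    dependents = {}
    for service, deps in depends_on.items():
        for dep in deps:
            dependents.setdefault(dep, set()).add(service)

    # Breadth-first walk outward from the failed service.
    impacted = set()
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for service in dependents.get(current, ()):
            if service not in impacted:
                impacted.add(service)
                queue.append(service)
    return impacted


print(sorted(impacted_by("dns", DEPENDS_ON)))
```

In this made-up map, losing DNS takes down every other service that depends on it, directly or through another service, which is exactly the kind of surprise the tabletop is meant to surface.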
Start the Workshop
Now it’s time to execute your tabletop workshop. Begin by mapping the malware attack and its damage as it progresses. As you move along the attack’s journey, you will uncover many details that would have to be dealt with ad hoc during an actual crisis.
You should ask questions like: Is it feasible to initiate a restore for the number of systems impacted? Do we need more information? Can we reasonably restore a backup at this point? If replication is in use, do we have snapshots we can trust? Who gets notified, and when? Is it time to inform the CEO? Is it time to notify customers, suppliers, or the media? Is an analog failover possible, such as pencil and paper or a whiteboard? All of this adds to the realism of the situation and makes the group think about which departments are going to be involved in all these tasks. They are going to fail miserably the first time, and they should.
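The restore feasibility question lends itself to back-of-the-envelope arithmetic. Here is a rough Python sketch, assuming every restore takes about the same time and a fixed number can run in parallel; the system count, hours per restore, parallel stream count, and 24-hour target are all hypothetical inputs, not recommendations.

```python
import math


def restore_hours(systems, hours_per_restore, parallel_streams):
    """Estimate wall-clock hours to restore all systems in parallel waves."""
    waves = math.ceil(systems / parallel_streams)
    return waves * hours_per_restore


# Hypothetical scenario: 200 impacted systems, 1.5 hours each, 10 at a time.
estimate = restore_hours(systems=200, hours_per_restore=1.5, parallel_streams=10)
target_hours = 24  # assumed recovery time target for this exercise

print(f"Estimated restore time: {estimate} hours")
print("Fits the target" if estimate <= target_hours else "Misses the target")
```

Real restores are rarely this uniform, but even a crude estimate helps the table decide whether a full restore is realistic or whether the plan needs a different path.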
After the scenario is complete, there will be a behind-the-scenes portion where the curtain is lifted. This will be the start of follow-up.
Next up: It’s 8:00am, and folks are starting to come in. Susan from Help Desk called IT to say, “The team down in shipping are saying their scanners are displaying error messages when they try to receive or send.”
Want to learn more about our multi-leveled defense against ransomware? Read this tech deep dive, Polaris Radar: Monitor, Detect, Recover.