Business interruptions are costly. In addition to lost revenue, an outage can cause lasting damage to an organization’s reputation. For these reasons, smart business leaders prepare plans for rapid, smooth recovery from disruptions. This is business continuity planning (BCP). This article discusses how BCP works and how companies like yours can put one in place.
What is Business Continuity?
Sometimes called business continuity and disaster recovery (BCDR), business continuity comprises a dynamic mix of planning, processes, and people that works together to ensure that a company can react quickly to a disruption of its operations and continue to function as a business. For a variety of reasons, we tend to view business continuity as a technology issue. And it is, in part. But there’s a lot more to it.
BCP is really about anticipating the full set of risks that could impair a business’s ability to function and devising ways to mitigate those risks. Risk factors could include natural disasters, geopolitical events, supply chain problems, labor problems, and more.
Business continuity is more important than ever. With ransomware and other cyber threats targeting businesses large and small, companies are well advised to prepare for serious disruption. Even non-cyber events, such as the COVID pandemic, can throw an organization into chaos if it is not ready with a business continuity plan.
The stakes can be quite high, especially for smaller firms. According to FEMA, between 40% and 60% of small businesses close permanently after experiencing a disaster. FEMA further found that 90% of small businesses that are closed for at least five days will fail within a year.
The specifics of business continuity vary according to the size and complexity of an organization, as well as its risk profile. However, the core elements of business continuity are the same everywhere. They include a risk assessment, a business impact analysis, and one or more recovery strategies.
Ensuring business continuity starts with identifying and assessing threats to operations. This may sound obvious, but the process of figuring out how things can go wrong often reveals risks that no one thought of before. A wide variety of threats can disrupt the day-to-day operations of a company, including:
External disruption like a cyber attack or corporate espionage
Natural disruption such as a hurricane, epidemic, flood or fire
Political disruption including war, terrorism, and government instability
Material disruption such as raw material shortages or breakdowns in the supply chain
Mechanical disruption like production machine failure, fleet maintenance, or IT hardware failure
Organizational disruption including the death or departure of a key executive, mergers and acquisitions, and international expansion
A risk assessment should list risks that are both probable and serious, and it should be detailed. It’s not enough to write “ransomware” and call it a day. A thorough risk assessment also identifies which data assets are most critical to the business. These could include:
Core financial data, such as the general ledger and sales transactions history
Customer records with personally identifiable information
Employee records with personally identifiable information
Record of products in production and inventory
Information associated with ongoing research and development
Every business is different, so the list of critical datasets will differ from one organization to the next. But if you don’t do a full assessment of your business data, you will not be able to make good decisions about what to protect most.
With a risk assessment in hand, you can now evaluate the potential impact of these threats on business operations. Probability matters here, because not every threat deserves the same level of attention. The moon could crash into Earth, but since that’s extremely unlikely, it should not make it into your business impact analysis (BIA).
The purpose of the BIA is to develop an understanding of how much damage a particular threat can do to your organization’s ability to conduct its business. First, you have to identify your critical business functions. These are the operations and assets without which your business cannot go on. If you’re like most businesses, you cannot function without your enterprise resource planning (ERP) system, which handles transactions and data for finance, human resources, procurement, manufacturing and more. Your ERP supports much of your business; if it goes down, your business goes down.
Having identified critical functions, the next step is to apply a quantitative method that weights each threat’s impact by its probability. For example, you might rate impacts on a scale of 1 to 10, with a day-long ERP outage rating a 5 and a catastrophic fire that destroys all of your inventory rating a 10. If the ERP outage is twice as likely as the fire, however, the BIA would give the two risks the same weighted score (2 × 5 = 1 × 10).
In some cases, the BIA will also incorporate a financial loss estimate. If your business generates a million dollars in revenue per day, a one-day ERP outage will cost about a million dollars, give or take reputation damage. The warehouse fire, in contrast, might present the potential for a billion-dollar loss. Yet if there’s a 1% chance of an ERP outage and a 0.001% chance of a fire, the two events carry the same expected loss under BIA methodology: 1% × $1 million = 0.001% × $1 billion = $10,000.
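The probability-weighting arithmetic in the two examples above can be sketched in a few lines of Python. This is a minimal illustration, not a standard BIA tool: the `Risk` class and the `weighted_score` and `rank` helpers are hypothetical names, and the probabilities and dollar figures are simply the illustrative numbers from the text.

```python
from dataclasses import dataclass


@dataclass
class Risk:
    name: str
    probability: float  # likelihood as a fraction (0.01 means 1%); illustrative
    impact: float       # dollars, or a score on a 1-10 scale


def weighted_score(risk: Risk) -> float:
    """Probability-weighted impact; with dollar impacts this is the expected loss."""
    return risk.probability * risk.impact


def rank(risks: list[Risk]) -> list[tuple[str, float]]:
    """Return (name, score) pairs, highest weighted score first."""
    scored = [(r.name, weighted_score(r)) for r in risks]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Illustrative figures from the text: a $1 million one-day ERP outage at 1%
# and a $1 billion warehouse fire at 0.001%.
risks = [
    Risk("One-day ERP outage", probability=0.01, impact=1_000_000),
    Risk("Catastrophic warehouse fire", probability=0.00001, impact=1_000_000_000),
]

for name, score in rank(risks):
    print(f"{name}: expected loss ${score:,.0f}")
```

Both risks print an expected loss of $10,000. The same function works with the 1-to-10 scale: a relative likelihood of 2 with an impact of 5 scores the same as a likelihood of 1 with an impact of 10.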
What if one of these risks comes to pass? If you play a role in maintaining business continuity, you have to think through how to restore business functions quickly. In data protection practice, the two key targets are the recovery time objective (RTO) and the recovery point objective (RPO). The RTO is the maximum acceptable time to restore a function. An ERP system might have an RTO of one minute, for example. That means that if a cyberattack takes the system down, a backup ERP instance should restore ERP functionality to users within one minute.
The RPO is the maximum acceptable data loss, measured in time: it sets how far back in time the recovered data may go. Let’s say the ERP has a five-minute RPO. If the ERP goes down, the goal is to have the backup instance running with data that includes every transaction up to no more than five minutes before the outage.
The smaller the RTO and RPO, the better. In certain highly critical financial systems, the RTO and RPO might be measured in seconds. Sometimes, the failover is so fast and the RPO so narrow that users are barely aware that anything has even gone wrong.
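One practical use of these targets is checking whether a proposed backup setup can actually meet them. The sketch below assumes periodic snapshots replicated to a backup site. Under that assumption, the worst-case data loss is one snapshot interval plus the replication lag, which must not exceed the RPO, and the measured failover time must not exceed the RTO. The class names and all timing figures are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class RecoveryTargets:
    rto_minutes: float  # maximum acceptable time to restore the function
    rpo_minutes: float  # maximum acceptable data loss, measured in time


@dataclass
class ReplicationSetup:
    # All three values are hypothetical inputs you would measure or configure.
    snapshot_interval_minutes: float  # how often data is copied to the backup site
    replication_lag_minutes: float    # how long a copy takes to arrive there
    failover_minutes: float           # time to bring the backup instance online


def worst_case_data_loss(setup: ReplicationSetup) -> float:
    # If a failure hits just before a snapshot reaches the backup site, the newest
    # usable copy is the one taken an interval earlier: interval plus lag is lost.
    return setup.snapshot_interval_minutes + setup.replication_lag_minutes


def meets_targets(setup: ReplicationSetup, targets: RecoveryTargets) -> dict[str, bool]:
    """Report whether the setup satisfies each recovery objective."""
    return {
        "rto": setup.failover_minutes <= targets.rto_minutes,
        "rpo": worst_case_data_loss(setup) <= targets.rpo_minutes,
    }


erp_targets = RecoveryTargets(rto_minutes=1, rpo_minutes=5)
erp_setup = ReplicationSetup(
    snapshot_interval_minutes=4, replication_lag_minutes=0.5, failover_minutes=0.75
)
print(meets_targets(erp_setup, erp_targets))
```

With four-minute snapshots and 30 seconds of lag, the worst case is 4.5 minutes of lost data, inside the five-minute RPO. Stretching the interval to 15 minutes would fail the RPO check even though failover is still fast enough for the RTO.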
Business continuity should encompass a recovery strategy for each threat that’s serious enough to merit inclusion in the BIA. The strategy has to match the threat and be specific enough to deliver on the expected RTO and RPO. For example, if the customer database is stored in an on-premises storage array and the most serious threat to it comes from ransomware, then the continuity strategy should involve backing it up to a system that’s resistant to ransomware.
This might mean keeping some “immutable” backups in AWS, which cannot be modified or deleted during a set retention period. Alternatively, a general policy such as the 3-2-1 backup rule could be mandated for all critical data assets. With 3-2-1, you keep three copies of your data on two different types of storage media, with one copy off site. This might be part of your enterprise data protection plan.
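A 3-2-1 policy is easy to audit mechanically. The following Python sketch checks the copies of one dataset against the rule. The `DataCopy` fields, the location labels, and the optional `require_immutable` check (an extra safeguard for the ransomware case, not part of 3-2-1 itself) are illustrative assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataCopy:
    location: str            # illustrative site label
    media: str               # storage type, e.g. "disk-array", "tape", "object-storage"
    offsite: bool            # True if stored away from the primary site
    immutable: bool = False  # True if the copy cannot be altered or deleted


def check_3_2_1(copies: list[DataCopy], require_immutable: bool = False) -> list[str]:
    """Return the ways a set of copies violates 3-2-1 (an empty list means compliant)."""
    problems = []
    if len(copies) < 3:
        problems.append(f"only {len(copies)} copies; need at least 3")
    if len({c.media for c in copies}) < 2:
        problems.append("all copies use one media type; need at least 2")
    if not any(c.offsite for c in copies):
        problems.append("no off-site copy")
    # Extra ransomware safeguard; not part of the 3-2-1 rule itself.
    if require_immutable and not any(c.immutable for c in copies):
        problems.append("no immutable copy")
    return problems


# In the common reading of the rule, the production copy counts as one of the three.
customer_db = [
    DataCopy("hq-datacenter", "disk-array", offsite=False),
    DataCopy("hq-datacenter", "tape", offsite=False),
    DataCopy("aws-us-west-2", "object-storage", offsite=True, immutable=True),
]
print(check_3_2_1(customer_db, require_immutable=True))
```

An empty list means the dataset passes. Dropping the cloud copy would report both too few copies and no off-site copy.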
Alternatively, if the biggest threat to the database is a natural disaster, the backup instance should be in a geographic region that won’t be affected by the same disaster. A data center in Florida might therefore be backed up by a site in Arizona. Indeed, alternate business locations factor into many recovery strategies.
The same kind of thinking should apply to supplier relationships. For instance, if you make cars, you can’t run out of spark plugs. If your spark plug vendor suffers a ransomware attack and can’t ship you anything for a month, you need a contingency vendor who can ship you spark plugs within a predefined period of time: an RTO for spark plugs, so to speak.
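The spark plug example reduces to simple arithmetic: compare how many days your on-hand stock lasts with the contingency vendor’s guaranteed lead time. Any shortfall is the number of days production would stall. The function names and quantities below are hypothetical.

```python
def days_of_cover(units_on_hand: float, units_used_per_day: float) -> float:
    """How many days current inventory lasts at the normal consumption rate."""
    return units_on_hand / units_used_per_day


def supply_gap_days(units_on_hand: float, units_used_per_day: float,
                    backup_lead_days: float) -> float:
    """Days production would stall before the contingency vendor delivers (0 if none)."""
    return max(0.0, backup_lead_days - days_of_cover(units_on_hand, units_used_per_day))


def stock_needed(units_used_per_day: float, backup_lead_days: float) -> float:
    """Inventory required to bridge the backup vendor's lead time with no gap."""
    return units_used_per_day * backup_lead_days


# Hypothetical plant: 40,000 spark plugs on hand, 8,000 used per day,
# and a contingency vendor that guarantees delivery within 7 days.
print(supply_gap_days(40_000, 8_000, 7))
print(stock_needed(8_000, 7))
```

Here the stock lasts five days against a seven-day lead time, leaving a two-day gap. Holding 56,000 plugs, or finding a vendor that can deliver within five days, would close it.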
The result of all this deliberation will be a BIA chart like the simplified example shown here. For each risk, there will be a probability, an estimate of financial impact, and a continuity strategy. An RTO and RPO might be included as well.
Risk | Continuity strategy
Ransomware attack on ERP | Cloud data backup
Catastrophic warehouse fire | Spread inventory across multiple warehouses, plus a fire suppression system
Business continuity planning operationalizes the continuity strategies and the BIA. Without a BCP, there will be no business continuity. This wisdom is not as widespread as you might imagine. According to the global consulting firm Mercer, in 2020 more than half of global businesses did not have a business continuity plan in place. Let’s hope they don’t face a major disaster.
The best way to think of a BCP is as a systematic plan that adds people, process, and organizational structure to your business continuity strategies and turns them into actions. It coordinates the steps people take and the information systems involved. BCPs matter because they convert the idea of business continuity into coherent, results-driven action; thinking about business continuity is not enough without a concrete plan for making it happen.
The organizational aspect of a BCP is of great importance. While it’s tempting to view business continuity as a technical issue, resiliency actually comes from connecting people with processes and systems. The BCP addresses the question, “Who will take action if there is a disaster?” and answers it with specifics: if risk X occurs, Person A takes step Y to recover; if risk Z occurs, Person B takes step C; and so forth.
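The “if risk X occurs, Person A takes step Y” structure maps naturally onto a lookup table. The Python sketch below is one hypothetical way to encode it; the risk names, roles, and actions are placeholders. Raising an error for an unlisted risk reflects the point that every risk in the BIA needs an owner.

```python
from dataclasses import dataclass


@dataclass
class Assignment:
    owner: str   # person or role responsible for the step
    action: str  # the recovery step they take


# Hypothetical runbook: each risk from the BIA maps to ordered assignments.
RUNBOOK: dict[str, list[Assignment]] = {
    "ransomware on ERP": [
        Assignment("IT security lead", "Isolate infected systems from the network"),
        Assignment("ERP administrator", "Fail over to the backup ERP instance"),
    ],
    "warehouse fire": [
        Assignment("Logistics manager", "Reroute orders to an alternate warehouse"),
    ],
}


def who_does_what(risk: str) -> list[str]:
    """List 'owner: action' steps for a risk, failing loudly if none are planned."""
    steps = RUNBOOK.get(risk)
    if steps is None:
        raise KeyError(f"No plan for risk {risk!r}; add it to the BCP")
    return [f"{a.owner}: {a.action}" for a in steps]


for line in who_does_what("ransomware on ERP"):
    print(line)
```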
How does a BCP take shape? First, someone has to decide to create a BCP. This may sound obvious, but it’s a step that some organizations miss. Or, they assign it to someone who lacks the authority to realize the plan. Executive sponsorship is useful here. Someone high up enough to allocate budget and assign tasks to people needs to be in charge, or at least supervise the project.
The next step is to establish a BCP team. This will be made up of people from different departments, such as physical security, cybersecurity, IT, HR, and individual lines of business. Working together, the team members contribute their knowledge of critical business processes, how they are to be prioritized for recovery, and how the recovery will take place.
The team develops the BCP. This may take time, and team members will invariably have other jobs to do, so everyone needs to be patient and let the process unfold at a reasonable pace. The deliverables include the BIA and recovery strategies, along with specific action steps assigned to people who understand their roles and responsibilities.
The BCP development process includes three further steps that can make the difference between success and failure in a disaster: training, testing, and updating. Anyone who is expected to perform a business continuity task needs to be trained to do so. The entire plan needs to be tested on a regular basis, perhaps once a year, so that any gaps in the plan come to light. Testing frequently reveals that people do not understand their roles, prompting better training as a follow-up. And because businesses and systems are not static, the team needs to update the plan regularly.
A lot can go wrong with a BCP. Some failures have nothing to do with the plan itself but still affect its success. For example, lacking the countermeasures and controls to detect a threat before it becomes a disaster can have catastrophic results. Cybersecurity solutions for anomaly detection, to name one example, can mean the difference between a minor outage and a company-wide shutdown.
Other challenges to BCP include under-staffing or under-resourcing the BCP team: They need the time and space to do it right. Once at work, the team might miscalculate the business impacts and probabilities of individual risks. Or, they might err in the way they designate recovery strategies. This is why testing and training are essential, and why a lack of testing can be dangerous.
Good BCPs are evident in a number of publicly known business continuity success stories. For example, New York University found itself prepared to handle the incredible disruption of the 9/11 attacks, which took place just two and a half miles away. NYU had wisely set up a BCP and a command center that enabled the university to coordinate its emergency response and evacuation with emergency services and the police. Their BCP also covered their electronic systems.
If a BCP fails, the best practice is to investigate what went wrong and prepare for more effective action next time. For example, Delta Air Lines experienced a serious IT outage in 2016. A long delay in getting backup systems online caused a reported $100 million in losses, along with a hit to the airline’s reputation. The takeaway for management was that the airline could have benefited from a more coherent and up-to-date data recovery plan, with accompanying backup systems, as part of its BCP.
The California Department of Motor Vehicles (DMV) had a similar problem in 2016. When its IT systems went out, both of the DMV’s backup solutions went offline simultaneously, leaving the DMV inoperable for days. The lesson was that backup systems must not share a single power source, which is exactly how the DMV’s environment had been set up.
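The DMV’s lesson generalizes: two backups are only independent if they share no dependency whose failure takes both down. A minimal Python sketch of that check, assuming each system’s dependencies are listed as labeled strings (the system names and labels here are hypothetical), intersects the dependency sets and reports anything every backup relies on.

```python
def shared_dependencies(systems: dict[str, set[str]]) -> set[str]:
    """Return the dependencies that every listed system relies on.

    Anything in the result is a single point of failure: if it fails,
    all of the supposedly redundant systems fail together.
    """
    if not systems:
        return set()
    return set.intersection(*systems.values())


# Hypothetical labels modeled on the DMV case: both backups on one power feed.
backups = {
    "backup-a": {"power:building-feed-1", "network:switch-1"},
    "backup-b": {"power:building-feed-1", "network:switch-2"},
}
print(shared_dependencies(backups))  # {'power:building-feed-1'}

# After moving backup-b to a separate feed, nothing is shared.
backups["backup-b"] = {"power:utility-feed-2", "network:switch-2"}
print(shared_dependencies(backups))  # set()
```

The same check applies to the geographic separation described earlier: tagging each site with a label such as `region:florida` flags backups that a single regional disaster could take out together.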
BCP continues to evolve. Technology companies and innovative thinkers in the business world are coming up with new ways to do BCP better, faster, and cheaper. Advances include the automation of business continuity processes and the application of artificial intelligence (AI) to recovery strategies. Data security solutions such as Rubrik’s, which reduce the impact of ransomware attacks, also bolster resiliency by mitigating serious cyber threats to business continuity. However BCP evolves, the keys to success will remain adaptability and agility in a changing business landscape.
Q: What are the components of business continuity?
A: Business continuity encompasses risk assessment, a business impact analysis (BIA), and recovery strategies.
Q: How often should a BCP be updated?
A: A BCP should be updated regularly, perhaps once a year, though in a large organization, more frequent updates might be wise. A major organizational restructuring should also prompt an immediate BCP update.
Q: What is an example of business continuity?
A: Restoring data from a backup system is an example of business continuity. If a critical system, such as ERP, goes down, a business continuity plan should provide for a fast restoration of its data and functionality.