Every few years, we get reminders of how vulnerable we are to acts of nature. Fires, floods and hurricanes, all events beyond our control, have recently caused large-scale disasters in various parts of North America. Both man-made and natural events will occur, frequently without warning.
Randy Johnston is a top-rated technology speaker at the annual Accountex USA conference. Randy is broadly known for his Technology Update presentation, which he updates continuously. At Accountex 2018 in Boston, Randy will be presenting on Cloud Technology. Randy has expertise in technology, security, accounting, software and computer infrastructure, and strategic planning and management.
With technology, additional risks come from hacks by bad actors resulting in data breaches or malware infections, hardware manufacturer errors that lead to issues such as the Spectre and Meltdown processor vulnerabilities, and software manufacturers releasing erroneous updates that lead to work stoppages. These events all illustrate the need for Business Continuity and Disaster Recovery.
Let’s separate these two concepts, try to understand the difference, and focus on what we can do to improve business continuity.
- The need for an organization to continue to function at some minimal level, even after a disastrous event.
- Doing the right things to survive!
- Leave the unimportant for later.
One might say that business continuity is having the ability to continue to perform the key functions of a business no matter what happens.
- Have the ability to respond to an unplanned interruption.
- Implement a technology and communications recovery plan.
- Successfully restore an organization’s critical operational functions.
Disaster recovery, on the other hand, is about restoring normal operations after a cataclysmic event. Minimally, you must have your data to recover, and a plan of action to restore all normal operations. This is what a disaster recovery plan should cover. Further, if a disaster plan is not in writing, it doesn’t exist. If a business continuity/disaster recovery (BCDR) plan isn’t tested, it has no value. Sadly, few businesses have a BCDR plan at all and even fewer have a methodology for testing it regularly.
There’s often a long lag in updating statistics following major events like the storms and fires of 2017 — and even following a major event like Hurricane Katrina as far back as 2005. A good rule of thumb is that while disasters statistically happen to less than 1% of all businesses in any given year, events that lead to the need for business continuity happen to almost every business every year.
A few simple examples can help you appreciate this point. Consider whether any of the following happened to you or to someone you know in the past year:
- A file became corrupted and had to be recovered.
- Communications to the Internet were lost for more than a few minutes.
- Some piece of computer hardware failed.
- A licensing fee was not paid on time, whether for a website, email, or a software product used on a daily basis.
- There was a theft, loss or breakage of some piece of equipment used by the business.
Now ask yourself, how did you continue to operate? Did you just “figure it out” at the time? Would it have been easier to pre-plan alternatives?
Business Continuity Planning to Reduce Down Time
Let’s consider a few areas where business continuity preparedness would reduce down time.
- Upgrade and update technology strategic and tactical plan
- Review of insurance
- Training of users
If I had to name the three most important items related to both Business Continuity and Disaster Recovery they would be 1) backup, 2) backup, 3) backup! Backup is taken for granted and is often perceived to be “someone else’s job.” However, if you are the decision maker or person responsible for IT, you are still responsible for backup even after the task has been delegated or outsourced.
Consider the old adage, “trust, but verify.” You can trust that someone else is making a backup, but it is your job to make sure that the backup has been tested by requesting and completing a restore procedure. This needs to be done at both the individual file level and the complete server level. If you have outsourced to a cloud provider, how can you verify that their restoration and/or recovery procedures are working? You have to develop a testing methodology.
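One way to make “trust, but verify” concrete at the individual file level is to restore a folder to a scratch location and compare it against the live copy. The sketch below is only an illustration under stated assumptions: the function names `file_digest` and `verify_restore` are hypothetical, the restore itself is assumed to have been done already by whatever backup product you use, and the source files are assumed not to have changed since the backup was taken.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file, read in 1 MB chunks."""
    sha = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(1024 * 1024), b""):
            sha.update(chunk)
    return sha.hexdigest()


def verify_restore(source_dir: Path, restored_dir: Path) -> list[str]:
    """Compare every file under source_dir with its restored copy.

    Returns a list of problems ("missing: ..." or "differs: ...").
    An empty list means every source file was restored intact.
    """
    problems = []
    for source_file in sorted(source_dir.rglob("*")):
        if not source_file.is_file():
            continue  # skip directories
        relative = source_file.relative_to(source_dir)
        restored_file = restored_dir / relative
        if not restored_file.is_file():
            problems.append(f"missing: {relative.as_posix()}")
        elif file_digest(source_file) != file_digest(restored_file):
            problems.append(f"differs: {relative.as_posix()}")
    return problems
```

A check like this covers only the file-level test. A complete server restore still has to be proven by actually booting the restored server and opening the applications that depend on it.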
In my Network Management Group, Inc. operations, we routinely help organizations migrate and/or recover data. A quick solution for many organizations for backup is to install a backup appliance. These units are a combination of hardware and software configured to make backups and to ease restoration of a single file or a complete server. They can make a copy of any changed data every 15 minutes, and can be used to temporarily run one or more servers if there is a catastrophic failure.
We test our clients’ backups on a continuous rotation, resulting in a technician’s review about every week. You should consider how often your backup is tested. Clearly this cycle should be no longer than monthly. Backups are not “set it and forget it.” In addition, the backup log files don’t always indicate a successful or restorable backup. Backups must be tested.
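The monthly ceiling on the test cycle is easy to track with a short script. The following is a minimal sketch, not a description of any particular backup product: the name `overdue_restore_tests`, the system names in the usage note, and the reading of "monthly" as 31 days are all assumptions made for illustration.

```python
from datetime import date, timedelta

# Assumption: "no longer than monthly" is interpreted as 31 days.
MAX_TEST_AGE = timedelta(days=31)


def overdue_restore_tests(last_tested: dict[str, date], today: date) -> list[str]:
    """Return the names of backups whose last restore test is too old.

    last_tested maps a backup name to the date of its last successful
    restore test (not the date of the last backup job).
    """
    return sorted(
        name
        for name, tested_on in last_tested.items()
        if today - tested_on > MAX_TEST_AGE
    )
```

For example, calling `overdue_restore_tests({"file-server": date(2018, 1, 2), "accounting-db": date(2018, 3, 1)}, date(2018, 3, 15))` would flag only the hypothetical `file-server` backup, because its last restore test is more than 31 days old.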
Cloud providers should be performing tests similar to the ones we are recommending, but unfortunately they are human and also make mistakes. Because data is so critical to BCDR, we suggest that the data be kept in at least three different places and that an archival copy be kept off-line, if that is practical. Backup appliances are usually the quickest and surest way to copy, restore, and operate in BCDR situations. They can also be configured and used in whichever way benefits your business the most.
Note the thinking: You have to have a complete, valid copy of your data!
One thing that experience has taught: if something is going to fail, that failure will occur at the worst possible time. Because of this risk, building systems that have no single point of failure is critical to business continuity. Think redundancy everywhere. Redundancy may seem expensive, but how expensive is it if you can’t operate at all?
We suggest you consider how much down time costs you per hour. Pick an hourly rate, such as $50/hour, and multiply that by the number of people. For example, if you have 10 people, being down for an hour would cost $500 ($50×10) and being down for an eight-hour day would cost $4,000 ($500×8). If you were down for two or three days because a communication line didn’t work, can you see that figuring out how to be redundant, in advance, may be a good use of time and money?
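The down time arithmetic can be put into a few lines so you can plug in your own numbers. This is a rough sketch only: the function name `downtime_cost` is hypothetical, and the eight-hour workday is an assumption inferred from the $4,000-per-day figure in the example.

```python
# Assumption: a workday is 8 hours, which is what makes the
# example's daily figure ($4,000) equal 8 x the hourly figure ($500).
HOURS_PER_WORKDAY = 8


def downtime_cost(hourly_rate: float, people: int, hours_down: float) -> float:
    """Estimate lost labor cost: rate x headcount x hours of down time."""
    return hourly_rate * people * hours_down


one_hour = downtime_cost(50, 10, 1)                         # 500
one_day = downtime_cost(50, 10, HOURS_PER_WORKDAY)          # 4000
three_days = downtime_cost(50, 10, 3 * HOURS_PER_WORKDAY)   # 12000
```

The estimate counts only idle labor. Lost revenue, missed deadlines and recovery fees would push the real figure higher, so treat the result as a floor when comparing it to the price of a second communication line.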
To prevent downtime, the philosophy of redundancy recommends two communication lines of different types, from different carriers, configured for automatic failover. If you are cloud-centric, redundant communication lines are mandatory. For example, you could have a cable modem and DSL, a cable modem and a cellular hotspot, a cable modem and MPLS, or a fiber-optic line and a wireless connection.
Redundant thinking does not stop with the communication lines. Many of us retire old computers when they are 3-5 years old. How about smartphones? What do you do with the old computers and phones? Should you keep some as spares? What about your old server and SAN? It was working when you decided to retire it. Can you put these devices in another location “just in case”? How about switches? If you need 30 connections, instead of getting one 48-port switch, wouldn’t two 24-port switches allow for one switch to completely fail, yet still leave most people able to work from the surviving switch? Vendors that sell firewalls will often sell a hot spare for very little additional money, giving you redundancy for that hardware device as well.
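The switch example can be checked with simple arithmetic. The sketch below uses a hypothetical helper, `connections_surviving`, and assumes that after a failure you move cables onto any free ports of the surviving switch. If cables are left where they are, only the devices already plugged into the surviving switch keep working.

```python
def connections_surviving(needed: int, switch_ports: list[int], failed_index: int) -> int:
    """Connections that can still be served after one switch fails.

    Assumes cables are re-patched onto the surviving switches, so the
    result is capped by both the ports remaining and the number needed.
    """
    remaining_ports = sum(
        ports for index, ports in enumerate(switch_ports) if index != failed_index
    )
    return min(needed, remaining_ports)


single_switch = connections_surviving(30, [48], failed_index=0)      # 0 of 30
dual_switch = connections_surviving(30, [24, 24], failed_index=0)    # 24 of 30
share_still_working = dual_switch / 30                               # 0.8
```

With one 48-port switch, a failure takes all 30 connections down. With two 24-port switches, up to 24 of the 30 (80%) can keep working, which is the "most people" the example describes.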
Note the thinking: Allow no single point of failure!