by Kay Ewbank
Could your business recover from an IT disaster? Kay Ewbank suggests some strategies.
HardCopy Issue: 60 | Published: May 1, 2013
Mention disaster recovery to someone in IT and you get a similar reaction to talking about plane crashes at the airport. It’s that mix of bravado and fear, combined with fingers crossed behind the back. Many people quote a statistic suggesting that 80 per cent of businesses affected by a major incident close within 18 months, and although no-one seems able to find an actual survey backing this up, it’s obvious that a large-scale IT failure is going to cause big-time problems. What’s more, disasters do happen, and they don’t need to be dramatic to cause problems. A corrupted hard disk on the database server could lose all your customer data, and if your email system goes down you could be paying staff to sit and play Solitaire.
There’s a tendency to put disaster recovery a long way down the list of everyday priorities because there are always more important things to do, as in: “If you don’t win that new order, there won’t be a company to recover!” This may be the reason Gartner estimates that only 35 per cent of SMEs have a comprehensive disaster recovery plan in place.
If you don’t have any plan in place, then start with the basics. It’s better to have the essentials covered and then add the trimmings over time. Work out what would really be disastrous if it disappeared, and how rapidly the company would need to regain facilities such as email and Web presence.
Rising to the occasion
What constitutes a disaster isn’t limited to your office catching fire; circumstances outside your control can still cause problems. Northdoor is a specialist in IT services that started life delivering electronic transaction systems for the London insurance market. The company implemented a tiered DR solution using live replica servers located with the Infrastructure-as-a-Service (IaaS) specialist Rise. Jason Wyatt, Technical Development Manager at Rise, says that virtualisation has allowed SMEs to implement the disaster recovery solutions that were typically reserved for much larger cash-rich organisations: “Microsoft’s Hyper-V has allowed companies like Rise to offer IaaS from a cloud platform at a very competitive price to small businesses without the need for dedicated hardware.”
Northdoor had to use its disaster recovery plan because of power problems in the City of London over the summer. Northdoor’s Director of Integrated Solutions, Jon Milward, said: “The power interruptions led to some server failures, so while resolving the issue we were running our critical email and helpdesk applications from our disaster recovery server with Rise. It was quick to change over and robust. Customers were totally unaware of what had happened. There was no negative impact on the business.”
Northdoor had put in place a two-tier cloud-based system to keep costs down. As Milward explains: “The critical systems are done at the highest level of availability. There is literally a live replica server with Rise for each one of our critical systems. These are always up and running, and constantly being replicated with every data change we make. It’s essentially server mirroring in the cloud.” If one of the critical servers fails, Northdoor can switch over to the mirrored replica in the Rise environment, and because it’s constantly updated, the switchover can be made in 15 minutes without any data loss.
Less critical systems have data changes replicated in Rise’s datacentre, but not on live replica servers. If one of the on-premises systems fails, the Northdoor team spins up a new cloud-based server and then restores the data from the repository onto it.
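The two-tier logic can be sketched in a few lines of code. This is a hypothetical illustration only (the names and messages are invented; Northdoor’s actual tooling isn’t described in this detail):

```python
from dataclasses import dataclass

@dataclass
class System:
    name: str
    critical: bool  # Tier 1: live mirrored replica; Tier 2: data-only replication

def failover_plan(system: System) -> str:
    """Return the recovery action for a failed on-premises system."""
    if system.critical:
        # Tier 1: a live replica is always running and continuously
        # replicated, so recovery is just redirecting to it.
        return f"switch {system.name} to live replica (minutes, no data loss)"
    # Tier 2: spin up a fresh cloud server, then restore the
    # replicated data onto it from the repository.
    return f"provision new server for {system.name} and restore data"

print(failover_plan(System("email", critical=True)))
print(failover_plan(System("intranet-wiki", critical=False)))
```

The trade-off is cost: live replicas are paid for around the clock, so only the systems that genuinely cannot wait earn a place in the top tier.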
A disaster recovery (DR) strategy obviously starts with the backup of servers and data, but the strategy does need to be more than just having backups. Giovanni Goduti, UK Sales Director, Data Management at CA Technologies, says that too many companies believe that if they’re backing up their data, they’ve got a disaster recovery strategy, and they’re doing enough to stay out of trouble: “If a company experiences a true disaster, they need to have a lot more in place to be secure, including off-site replication and a real understanding of how to get back up and running.”
The DR strategy also needs to incorporate details of who is responsible for declaring an emergency and sorting the problem out. Work out how employees will be told (remembering you won’t be able to email them if someone’s stolen the email server), and how you’re going to reassure customers that normal service will soon be resumed. Rise’s Jason Wyatt says that the business needs to understand how their employees will access the DR environment and what needs to be put in place to make this effective, adding: “Businesses often overlook small details like secondary DNS records which, if not properly thought through, will extend the amount of time before being fully operational.”
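One way to catch the DNS detail Wyatt mentions is to audit your zone records against your switchover target before a disaster rather than during one: a record that points only at the primary site with a long time-to-live (TTL) will keep clients going to the dead server long after you’ve failed over. A minimal sketch, assuming the records can be exported as simple dicts (all names and thresholds here are illustrative):

```python
def dns_failover_gaps(records, primary_ips, max_ttl_seconds=300):
    """Flag DNS records that would slow a DR switchover.

    records: list of dicts like {"name": ..., "ip": ..., "ttl": ...}
    A record is a gap if it points at the primary site and its TTL is
    longer than the switchover time you are aiming for, since cached
    lookups will keep resolving to the failed server for that long.
    """
    gaps = []
    for rec in records:
        if rec["ip"] in primary_ips and rec["ttl"] > max_ttl_seconds:
            gaps.append(f"{rec['name']}: TTL {rec['ttl']}s too long for failover")
    return gaps

records = [
    {"name": "mail.example.com", "ip": "203.0.113.10", "ttl": 86400},
    {"name": "www.example.com", "ip": "203.0.113.11", "ttl": 60},
]
print(dns_failover_gaps(records, {"203.0.113.10", "203.0.113.11"}))
```

The same audit-in-advance principle applies to VPN endpoints, hard-coded IP addresses in applications, and firewall rules at the DR site.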
Once you’ve got your plan in place, schedule regular reviews of it: after all, business practices change, new hardware gets added, old applications become obsolete. The plan needs to reflect what your company is doing now, not what it was doing five years ago. Giovanni Goduti of CA says “Don’t allow DR to become an afterthought. Everyone knows that budgets and time are limited, and there’s a temptation to concentrate on the headline needs first and push DR to the back of the queue. That’s a big mistake: whenever you’re making or changing plans, you have to review your DR strategy at that time to see what needs changing and to be sure you’ve got the means in place to recover from a disaster.”
One key thing to remember about DR is that unless you’ve tested your plan, you don’t really have one – you just have a collection of ideas that may or may not work. Gareth Fraser-King of Symantec says, “The most important point is probably ’always test’. I find that most companies put in DR solutions, even have a DR handbook, and they won’t ever have actually tested the thing. If you say to them, ‘And now recover it’, they go pale.”
Testing is difficult, but unless you’ve actually tried recovering a server from your backup, you won’t know whether it works or not. The good news is that virtualisation has made the task of testing backups considerably easier as you can create a virtual replica of your real infrastructure, restore to the virtual location and check that everything works – at least in the virtual world.
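Part of that restore test can be automated: restore the backup to a scratch location (physical or virtual) and compare checksums against the live tree. A minimal sketch (the helper names and directory layout are assumptions, not any vendor’s API; a real test would also start the restored applications and check they run):

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def verify_restore(source_dir: Path, restored_dir: Path) -> list:
    """Compare every file in the live tree against the restored copy."""
    problems = []
    for src in source_dir.rglob("*"):
        if src.is_file():
            rel = src.relative_to(source_dir)
            restored = restored_dir / rel
            if not restored.exists():
                problems.append(f"missing after restore: {rel}")
            elif file_digest(src) != file_digest(restored):
                problems.append(f"checksum mismatch: {rel}")
    return problems
```

An empty list means every file came back intact; anything else is exactly the kind of problem you want to find while the original data still exists.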
When you’re testing, you should be prepared for problems to surface. According to a survey by Symantec that covered 15 countries, 30 per cent of companies with 500 or more employees that test their plan at least once a year report a failure. However, you shouldn’t see problems as failures; they’re a sign that your testing is working, giving you a chance to put things right while you still have the original data.
‘Keep it simple’ is another key piece of advice. If the worst happens and you need to put things back together, you really want to be able to run just one recovery process to get your system back. You don’t want to have to run the main recovery app, then the database recovery app, then the mail server recovery app, and so on. As Gareth Fraser-King of Symantec says, multiple backup solutions can be a potential problem: “If the company is using a proprietary solution to back up the database or the mail server, a different one for part of the data centre, then there’s an overall backup for the whole of the centre, that can be a problem not only in a recovery situation, but in knowing whether the backup actually works. The admin for the database proprietary element, for example, has no visibility into the data centre backup – they just get told it’s been backed up. That means they don’t really know whether they have full backups or not. What’s more, if a problem occurs, there are so many stages to recovering the working system that it’s difficult to know whether the end result will work.”
Fraser-King says the main thing is to have an integrated solution rather than a mish-mash of point solutions, because an ad hoc solution generally doesn’t work. “The route to success is to keep things simple. Go for a single source – one app that covers everything. If I was in a smaller company with just a single server running, my main concern would be to get a single backup source to eliminate potential confusion, and to maximise the chances of recovering if something does go wrong.”
Deciding where your backups are physically located is an important part of a disaster recovery plan. Ideally you want to have some elements locally with others held off-site, and most insurance companies will require you to demonstrate that you keep off-site backups. Giovanni Goduti of CA says that an important question to ask yourself is how you can provide off-site protection for your data: “Ideally, you want to back up your data to disk and then replicate it to a second site, whether that be a site you already have, or a cloud destination.”
The option of storing backups in the cloud has widened your choices. If you are thinking about using an off-site recovery system then your options range from simple data backup where the cloud replaces those boxes of tapes in the basement, to full hot-site recovery plans in which your users can switch over in minutes to using a full replica set of servers ready with all your business data.
Such systems rely on minute-by-minute replication from your local Storage Area Network to a remote SAN. Many companies will go for the middle ground option, where data is copied to the remote site and servers are available at the DR site but don’t run all the time. Instead, your data and apps are loaded from the backup if and when it becomes necessary. The tricky part here is ensuring that the servers you’re going to use have the same configuration as your local servers; anyone who’s tried restoring a local server and run into difficulties because of a missing driver or different BIOS will appreciate just how difficult this can be.
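Catching that kind of configuration drift before a disaster is largely a matter of comparing inventories. A simple sketch, assuming you can export each server’s configuration as key-value pairs (the keys shown are illustrative; in practice you’d pull them from your asset management or monitoring tools):

```python
def config_drift(local: dict, dr: dict) -> list:
    """List settings that differ between a production server and its DR twin."""
    drift = []
    # Walk the union of keys so settings missing on either side show up too.
    for key in sorted(set(local) | set(dr)):
        if local.get(key) != dr.get(key):
            drift.append(f"{key}: local={local.get(key)!r} dr={dr.get(key)!r}")
    return drift

production = {"bios": "1.2", "nic_driver": "e1000-8.0", "ram_gb": 64}
standby = {"bios": "1.4", "nic_driver": "e1000-8.0", "ram_gb": 64}
print(config_drift(production, standby))
```

Run as part of the regular DR review, a report like this turns the missing-driver surprise into a routine maintenance item.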
Cloud-based virtual systems may prove more forgiving. In this scenario, you create a copy of your physical servers to a virtual server, then send the file containing the virtual server to the recovery site. You can choose the frequency at which the virtual server file will be transmitted, and if you need to carry out a recovery, the virtual server is turned on, you configure your network to point to it rather than the local device, and you should be back up and running.
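The transmission frequency you choose directly sets your worst-case data loss (the recovery point objective, or RPO): a failure just before the next copy lands loses everything written since the previous one arrived. A back-of-envelope sketch:

```python
def worst_case_data_loss(interval_hours: float, transfer_hours: float = 0.0) -> float:
    """Worst-case age of lost changes, in hours (the RPO).

    If the virtual server file is shipped every `interval_hours` and each
    transfer takes `transfer_hours` to complete, a failure just before the
    next copy finishes can lose up to interval + transfer hours of changes.
    """
    return interval_hours + transfer_hours

# A nightly transfer that takes two hours to complete puts up to
# 26 hours of changes at risk.
print(worst_case_data_loss(24, 2))
```

Working the sum in this direction – from the interval to the exposure – makes it easier to justify (or reject) the bandwidth cost of more frequent transfers.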
Despite the simplicity, Symantec’s Fraser-King says that many companies still feel uneasy about storing their data in the cloud. However, he thinks a combination of on-premises and off-premises backup (so-called cloud hybrid solutions) is ideal, with cloud essentially being the modern equivalent of tape: “The cloud has all of the qualities of tape – simplicity, the ability to store as much information as necessary without worrying about running out of space, and the security of off-premises backups. In cloud hybrid solutions, you take local copies that can be used if you need to quickly restore a system or some data. Those local copies are then used to create the off-site cloud backup. You stream from the onsite backup to the cloud backup, and the process can take as long as necessary with no impact on front-line servers. The combination of a local backup for fast restoration and a remote backup for security, along with the lack of impact when creating the remote copy, means you overcome most of the problems associated with backing up data.”
It’s worth noting, adds Fraser-King, that the experience of backing up to the cloud has changed over recent years: “When I first started backing up my data to the cloud, it was slow and painful; low connection speeds and the way the software worked meant backing up 10GB could take three days! Now I’ve got much more data but things have improved to the point where I can back up 15GB in three hours if necessary. What’s more important is that I don’t have to think about it or be held up, because in general I have block level continuous data protection, and the data gets transferred packet by packet with no system impact.”
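Fraser-King’s figures imply a large jump in effective throughput, which is easy to check with some back-of-envelope arithmetic (using decimal gigabytes and ignoring protocol overheads):

```python
def throughput_mbps(gigabytes: float, hours: float) -> float:
    """Average transfer rate in megabits per second (1 GB = 8,000 megabits)."""
    return gigabytes * 8000 / (hours * 3600)

# 10GB in three days then, versus 15GB in three hours now
old = throughput_mbps(10, 72)
new = throughput_mbps(15, 3)
print(f"{old:.2f} Mbps then, {new:.2f} Mbps now - roughly {new / old:.0f}x faster")
```

That works out at roughly a 36-fold improvement in effective backup throughput – before allowing for block-level incremental transfer, which shrinks the amount of data that needs to move in the first place.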
If you are considering a cloud-based backup provider, though, it’s vital that you can trust them: you want a provider that has been operating for a while. What you don’t want is a company that suddenly disappears, along with your data. It’s also important to consider the physical location of their servers, and whether that has implications for your obligations with regard to data protection.
While virtualisation is a great asset when used to create copies of systems, it has made backup much more complicated. Things used to be very simple: take that data or system from here and put a copy of it over there, just in case here doesn’t work any more. Virtualisation has allowed you to create a whole series of virtual systems within a single physical system, and the ‘thing’ that you’re trying to copy and put somewhere else is potentially dynamic, so it’s a lot harder to confirm that you’ve got a full, working backup. Software such as vMotion that lets you move running virtual machines between hosts means your apps can move around so that the infrastructure changes rapidly, making it potentially difficult to follow just what’s running where.
Disaster recovery solutions
When considering products that can be part of your backup solution, you should look for software that can cope with both virtual and physical environments, and that can back up and restore entire servers and server applications. Symantec Backup Exec 2012 is an integrated product that protects virtual and physical environments. It can be used for both backup and disaster recovery, and can recover a good selection of environments including entire servers, Microsoft applications such as SQL Server and Exchange, and VMware or Microsoft Hyper-V virtual environments. Backup Exec ensures you can keep using your network and servers while backup is taking place, with support for load-balancing and bandwidth throttling. Another part of the Backup Exec family is Backup Exec.cloud, which supports both hybrid and cloud-based backup to Symantec data centres. Under Backup Exec.cloud your data is backed up continuously, with changes automatically synchronised to the cloud and, in a hybrid system, to your own servers.
Quest NetVault Backup (now part of Dell) provides cross-platform backup and recovery for both physical and virtual machines. The management console can run on Windows, Linux and Mac OS X, and NetVault Backup can be used to back up and restore applications including Microsoft SQL Server and Exchange, MySQL, DB2 and Oracle.
Computer Associates ARCserve has good support for both virtual and cloud environments. It has both VM-level and host-level backups for VMware servers, and VM-level backups for Microsoft Hyper-V and Citrix XenServer. The latest version has added remote virtual standby, which lets you replicate image-based backups to an off-site facility and convert them into bootable VMs. You can also have a disk-to-disk-to-cloud (D2D2C) backup policy where backed-up disks are then copied to the cloud for remote, off-site backup. Public cloud services such as Microsoft Azure and Amazon Web Services EC2 are supported.
Disaster recovery is always going to be a headache, and the only way you really find out that you’ve solved the problem is if the worst happens. What’s important is that you have a strategy in place, and that you follow through and keep following through.