Checklist for Disaster Recovery Planning in a Virtualized Environment
As an IT professional, your job is to design, develop and manage the technology systems that support the business’s ability to turn a profit. Equally important is the ability to restore these systems as quickly as possible after a disaster. Before you begin spending money on disaster recovery software, hardware and facilities, you need to know where to start. First assess your business and gain an intimate understanding of which processes, systems and data are most critical to the success and future survival of your company.
Jumping into the creation of a virtualized disaster recovery plan without proper planning can mean skipping key steps and failing to collect key information before design begins. The result can be an unwieldy DR solution that doesn’t meet the needs of the business.
This article provides an overview of what a sound virtualized disaster recovery plan should include. It follows industry best practices by taking a phased approach to instituting a DR plan, with the following phases:
- Assessment: Gathering key requirements for the DR solution
- Design: Creating a DR plan that meets business and technical requirements
- Deploy: Standing up the necessary infrastructure, then installing, configuring and testing the solution
- Manage: Testing your DR plan as frequently as possible
Regardless of business size, the DR planning process is similar. Reapply this approach as your business requirements change, or to take advantage of technology advancements that can reduce costs and enhance DR capabilities. The result is a DR plan flexible enough to adapt to the times and to your business. Because we’re talking about recovering virtualized environments, we will focus on leveraging virtualization’s unique capabilities to the maximum degree.
Since this article concentrates on DR planning, it covers the Assessment and Design phases, which break down into the following checklist:
- Business impact analysis
- Determine RPO (recovery point objective) and RTO (recovery time objective)
- Understand your budget
- Understand application dependencies
- Automate VMware environment data collection
- Virtualize stragglers
- Analyze resource requirements
- Design for easiest restore
- Decide on infrastructure configuration
- Test-drive DR plan
Assessment Phase
1. Business Impact Analysis
Business impact analysis (BIA) is the process of understanding which processes, systems and data are most critical to the success and future survival of your company. Without one, you risk wasting resources or overprotecting assets of little value to your business. Worse yet, without a BIA you may neglect to plan recovery for key IT systems that your mission-critical systems depend on. A BIA addresses the following main areas of assessment:
- Identify critical business systems
- Identify system resource dependencies
- Identify key support personnel or teams
- Estimate disruption impact
- Determine resource recovery priority
Granular knowledge of the business impact of your critical systems will help you not only in a real disaster but also in testing your preparation for one. Knowing where to focus your DR planning efforts will greatly help you streamline and prioritize your disaster recovery exercises, or tests. Remember that a BIA is not just a technical system inventory: you’ll need to work with each department or business unit in the organization to determine key requirements and the impact of downtime.
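To make the last assessment area, recovery priority, more concrete, the sketch below records BIA findings per system and ranks them. It is a minimal sketch, assuming a hypothetical `BiaEntry` record and a simple ranking rule of our own choosing: systems with the lowest tolerable downtime are recovered first, and ties are broken by the higher estimated hourly impact. The systems, teams and dollar figures are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class BiaEntry:
    """One row of a business impact analysis (all fields are hypothetical)."""
    system: str
    owner_team: str              # key support personnel or team
    hourly_impact: float         # estimated cost of one hour of disruption
    max_downtime_hours: float    # longest outage the business unit can tolerate
    depends_on: list[str] = field(default_factory=list)  # resource dependencies


def recovery_priority(entries: list[BiaEntry]) -> list[str]:
    """Order systems by downtime tolerance (lowest first), then by impact (highest first)."""
    ranked = sorted(entries, key=lambda e: (e.max_downtime_hours, -e.hourly_impact))
    return [e.system for e in ranked]


bia = [
    BiaEntry("file-shares", "IT Ops", 500.0, 24.0),
    BiaEntry("order-entry", "Sales Ops", 12_000.0, 2.0, ["sql-db", "active-directory"]),
    BiaEntry("shipping", "Logistics", 4_000.0, 8.0, ["order-entry"]),
    BiaEntry("reporting", "Finance", 1_000.0, 8.0),
]

print(recovery_priority(bia))  # ['order-entry', 'shipping', 'reporting', 'file-shares']
```

A real BIA will weigh more than these two factors; the benefit of capturing each business unit’s answers as data is that the priority list can be regenerated whenever those answers change.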
2. Determine RPO (recovery point objective) and RTO (recovery time objective)
The RPO is the maximum amount of data loss, measured in time, that the business can accept (for example, the last four hours of transactions); the RTO is the maximum time a system can be down before it must be restored. Determining them is where you should spend a good deal of time developing an intimate understanding of how your business makes money, documenting the key resources and processes that enable revenue generation. In some companies, this may be a single business process that translates into clear guidance on RTO/RPO requirements. In larger companies with diversified products and services, you will likely document multiple processes. Likewise, some businesses can tolerate a 24- to 48-hour RPO and RTO; for others, data loss and system downtime are NOT an option, and near-zero RTOs and RPOs are required.
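Once RPO and RTO are written down, a candidate protection scheme can be checked against them with simple arithmetic. The sketch below assumes a hypothetical `ProtectionPlan` holding four planning estimates in hours. The worst case for data loss is a disaster that strikes just before the newest restore point finishes arriving at the DR site, so the effective RPO is the interval between restore points plus the time it takes a copy to get off-site. The RTO estimate is deliberately simplified and leaves out the time needed to detect the disaster and declare DR.

```python
from dataclasses import dataclass


@dataclass
class ProtectionPlan:
    """Hypothetical planning estimates for protecting one system, in hours."""
    interval_hours: float     # time between restore points (backup or replication)
    offsite_lag_hours: float  # time for a restore point to reach the DR site
    restore_hours: float      # time to restore and boot at the DR site
    verify_hours: float       # time to confirm the application works


def worst_case_rpo(plan: ProtectionPlan) -> float:
    # Disaster strikes just before the newest restore point arrives off-site,
    # so the last usable copy was taken one interval plus the transfer lag ago.
    return plan.interval_hours + plan.offsite_lag_hours


def estimated_rto(plan: ProtectionPlan) -> float:
    # Simplified: ignores time to detect the disaster and declare DR.
    return plan.restore_hours + plan.verify_hours


def meets_objectives(plan: ProtectionPlan, rpo_hours: float, rto_hours: float) -> bool:
    return worst_case_rpo(plan) <= rpo_hours and estimated_rto(plan) <= rto_hours


nightly = ProtectionPlan(interval_hours=24, offsite_lag_hours=2, restore_hours=6, verify_hours=2)
frequent = ProtectionPlan(interval_hours=1, offsite_lag_hours=0.5, restore_hours=1, verify_hours=0.5)

print(meets_objectives(nightly, rpo_hours=24, rto_hours=48))  # False: worst-case RPO is 26 h
print(meets_objectives(frequent, rpo_hours=4, rto_hours=4))   # True
```

In this made-up example, nightly backups that take two hours to reach the DR site miss a 24-hour RPO even though a backup runs every 24 hours, which is why off-site transfer time belongs in the calculation.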
3. Understand Your Budget
Virtualized DR is much less expensive than traditional physical DR, but it is still an additional cost on top of your existing investment in virtualization. IT managers have to be keenly aware of their budget limits and the factors behind them. Knowing your budget limit, along with a few key strategies, will let you optimize your DR plan to get the most capability out of limited resources.
One recommendation is to consolidate all disaster recovery options in a single product, so that you’re not paying for two software products, two backup infrastructures and the associated operational costs that can sink you. Another suggestion is to propose a phased DR assessment budget. This method builds trust with the business or customer and assures them that you are not making recommendations on a whim and that the DR plan reflects due diligence.
Just like your initial server virtualization, virtualized DR may require you to spend money to save money. Being able to communicate the total cost of ownership (TCO) savings and benefits of a legacy-free disaster recovery system is key to obtaining the funding you need to realize the full benefits. A good rule of thumb is to aim for the plan with the lowest TCO over the next 3 to 5 years.
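A rough way to compare options over a 3- to 5-year horizon is an undiscounted TCO: up-front cost plus annual operating cost times the number of years. The option names and figures below are hypothetical and only illustrate the “spend money to save money” pattern, where a consolidated product costs more up front but wins once the horizon is long enough. A real comparison would also account for staff time, discounting and refresh cycles.

```python
def tco(upfront: float, annual: float, years: int) -> float:
    """Undiscounted total cost of ownership over a planning horizon."""
    return upfront + annual * years


def lowest_tco(options: dict[str, tuple[float, float]], years: int) -> str:
    """Return the name of the option with the lowest TCO for the given horizon."""
    return min(options, key=lambda name: tco(*options[name], years))


# Hypothetical figures: (up-front cost, annual operating cost).
OPTIONS = {
    "legacy backup + separate DR product": (10_000, 30_000),
    "single consolidated DR product": (40_000, 15_000),
}

for years in (1, 3, 5):
    print(years, lowest_tco(OPTIONS, years))
# 1 legacy backup + separate DR product
# 3 single consolidated DR product
# 5 single consolidated DR product
```

Judged on a one-year budget alone, the cheaper-looking option wins; over the 3- to 5-year horizon the ranking flips, which is the argument the TCO rule of thumb is meant to support.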
4. Understand Application Dependencies
When creating your DR plan, it’s crucial to understand the dependencies of core applications. Does an application depend on an external database on another virtual machine? In what order must systems be restored to test the application? And even today, which servers are still physical? These dependencies must be documented and planned for. No matter how well you protect the critical VMs, if you forget the weakest link in the dependency chain, you might as well not have protected any systems.
The core network infrastructure services needed for DR include:
- IP address assignment (DHCP, etc.): Local to each site and required for any communication on the network
- DNS: Resolves names so that servers and PCs can reach each other as well as Internet resources
- Active Directory: Provides directory services that secure access to recovered systems
These services can be provided in a number of ways, but in most small and mid-sized companies we are talking about Microsoft Windows Server VMs running Active Directory. If Active Directory servers are restored incorrectly, you will waste precious hours on manual recovery steps to restore this critical service.
Mission-critical applications (e.g., order entry, manufacturing, shipping/receiving) are the lifeblood of the company and consist of application server VMs as well as VMs containing persistent data such as databases or file system objects. With traditional legacy backups, documenting and keeping current every specific file and folder inside a VM that needs to be backed up can be a terrible chore, especially when application upgrades create new files and folders not included in the original backup configuration. Protecting entire VMs in a virtualized DR plan avoids micro-managing data protection and shifts the focus to identifying the higher-level service components that must be protected and restored.
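Once dependencies are documented, the restore-order question becomes a topological sort. The sketch below uses Python’s standard `graphlib.TopologicalSorter` (Python 3.9+) to group VMs into restore waves, where every VM in a wave depends only on VMs in earlier waves. The dependency map is a hypothetical example built from the core infrastructure services listed above; in many Windows shops DNS runs on the domain controllers themselves, in which case the two would be a single node.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each VM lists the VMs that must be running first.
DEPENDS_ON: dict[str, set[str]] = {
    "dhcp": set(),
    "dns": set(),
    "active-directory": {"dns", "dhcp"},
    "sql-db": {"active-directory"},
    "order-entry-app": {"sql-db", "active-directory"},
    "shipping-app": {"order-entry-app"},
}


def restore_waves(graph: dict[str, set[str]]) -> list[list[str]]:
    """Group VMs into waves; VMs within a wave can be restored in parallel."""
    sorter = TopologicalSorter(graph)
    sorter.prepare()  # raises graphlib.CycleError if the dependencies are circular
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # sorted only for stable, readable output
        waves.append(ready)
        sorter.done(*ready)
    return waves


for number, wave in enumerate(restore_waves(DEPENDS_ON), start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

A circular dependency, which would make any restore order impossible, surfaces as a `CycleError` at planning time rather than during an actual recovery.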
5. Automate VMware Environment Data Collection
Virtual machine inventory automation can be invaluable in designing your DR plan because it quickly and accurately collects critical information about the virtual environment. Whether you have 25 VMs or 2,500, automating the collection of VM information gives you a head start toward an accurate DR design. Many tools can assist with this data collection task; since we’re planning virtualized DR, we favor leveraging virtualization management software to ease it.
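Whatever tool collects the data, the output usually ends up as a tabular export that needs to be totaled for DR sizing. The sketch below assumes a CSV export with hypothetical column names (`name`, `power_state`, `num_cpu`, `memory_mb`, `used_storage_gb`); rename them to match whatever your inventory tool actually produces. Skipping powered-off VMs is a judgment call, so they are listed separately for review rather than silently dropped.

```python
import csv
import io


def summarize_inventory(csv_text: str) -> dict:
    """Total vCPU, memory and storage for powered-on VMs in a CSV inventory export.

    Column names are hypothetical; adjust them to your tool's export format.
    """
    totals = {"vms": 0, "vcpus": 0, "memory_gb": 0.0, "storage_gb": 0.0, "skipped": []}
    for row in csv.DictReader(io.StringIO(csv_text)):
        if row["power_state"].strip().lower() != "poweredon":
            # Powered-off VMs need no running capacity, but some may still
            # need protection and storage at the DR site, so keep a list.
            totals["skipped"].append(row["name"])
            continue
        totals["vms"] += 1
        totals["vcpus"] += int(row["num_cpu"])
        totals["memory_gb"] += int(row["memory_mb"]) / 1024
        totals["storage_gb"] += float(row["used_storage_gb"])
    return totals


sample = """name,power_state,num_cpu,memory_mb,used_storage_gb
dc01,poweredOn,2,8192,60
sql01,poweredOn,8,65536,750
app01,poweredOn,4,16384,120
old-test,poweredOff,2,4096,40
"""

print(summarize_inventory(sample))
```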
Design Phase
1. Virtualize the Stragglers
You probably have a large portion of your environment virtualized, but you may still have some business-critical systems on physical servers precisely because of their importance. Unfortunately, keeping business-critical systems on physical servers because they’re important is a mistake. If you still have physical servers, it’s time to make the switch. The benefits of virtualized DR are well known and have been written about and practiced for more than 5 years. No matter how good your DR solution for physical servers is, it can’t match the capabilities and cost-effectiveness of virtualized DR.
2. Analyze Resource Requirements
DR planning is about preparing for the loss of your primary datacenter. This means you need to size and budget for a recovery site capable of meeting your requirements in a disaster scenario. The assessment phase recommended automating the collection of key virtual infrastructure inventory and resource utilization statistics; now it’s time to use that information to size your DR solution properly.
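Turning inventory totals into a recovery-site estimate is mostly arithmetic. The sketch below is a minimal example, and every keyword default in it is a hypothetical planning assumption rather than vendor guidance: 20% growth headroom, a 4:1 vCPU-to-physical-core ratio, 32-core hosts with 512 GB of RAM, and one spare host for N+1 failover. Whichever of CPU or memory needs more hosts determines the host count.

```python
import math


def size_dr_site(vcpus: int, memory_gb: float, storage_gb: float, *,
                 growth_pct: int = 20, vcpus_per_core: int = 4,
                 host_cores: int = 32, host_memory_gb: int = 512,
                 spare_hosts: int = 1) -> dict:
    """Estimate DR-site host count and storage from inventory totals.

    All keyword defaults are hypothetical assumptions; replace them with
    your own utilization data, overcommit policy and host specifications.
    """
    # Apply growth headroom (multiply first, then divide, to keep round numbers exact).
    need_vcpus = vcpus * (100 + growth_pct) / 100
    need_memory = memory_gb * (100 + growth_pct) / 100
    need_storage = storage_gb * (100 + growth_pct) / 100

    cpu_hosts = math.ceil(need_vcpus / (host_cores * vcpus_per_core))
    memory_hosts = math.ceil(need_memory / host_memory_gb)

    return {
        "cpu_hosts": cpu_hosts,
        "memory_hosts": memory_hosts,
        "hosts": max(cpu_hosts, memory_hosts) + spare_hosts,  # binding constraint + spares
        "storage_gb": math.ceil(need_storage),
    }


print(size_dr_site(vcpus=400, memory_gb=3000, storage_gb=20000))
# {'cpu_hosts': 4, 'memory_hosts': 8, 'hosts': 9, 'storage_gb': 24000}
```

In this made-up example, memory rather than CPU sets the host count, which is exactly the kind of finding the collected utilization statistics should confirm or refute for your own environment.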
3. Design for Easiest Restore
A good DR plan is designed for the easiest restore possible. In a disaster there can be a lot of confusion, different environments, and possibly a different or missing workforce. A disaster is not the time to follow a multitude of complex, manual steps while under the stress of knowing that your business’s ability to meet its next payroll may hang in the balance. Nor can you count on your best DR expert being available to coordinate the recovery. Success in this situation requires a skeleton DR staff with plenty of DR exercises under their belts and the simplest restore procedures possible.
4. Decide on Infrastructure Configuration
After all the interviews, data collection and analysis, you’ll eventually have to make some decisions: a final configuration for hardware, software, off-site data transport method, and recovery processes. There is no one right way to do this, but armed with business requirements for recovery, application dependencies and budget guidelines, you should have enough information to navigate the decision-making process.
5. Test-Drive the DR Plan
In software development, a popular process is test-driven development (TDD). In TDD, developers first write automated unit tests that pass only if the new code under development fulfills all criteria. Because the tests are written first and code can’t be released until they pass, quality improves. This process has proven very successful for software developers, and a variant of it can now be applied to virtualized DR. This variant, which can be called test-driven DR, turns traditional DR planning on its head: you plan the restore, and the verification of the restore, before you plan your first backup.
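In the spirit of test-driven DR, the acceptance tests for a recovery can be written before any backup job exists. The sketch below is a deliberately simple version that only checks whether each recovered service accepts TCP connections, using Python’s standard `socket.create_connection`. The hostnames, ports and `ServiceCheck` record are hypothetical, and a port check is weaker than the application-level verification that dedicated tools perform; treat it as a starting point, not a complete test suite.

```python
import socket
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceCheck:
    """One acceptance test for a recovered system (hosts and ports are hypothetical)."""
    system: str
    host: str
    port: int


def port_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, timed out, or name resolution failed
        return False


def verify_recovery(checks: list[ServiceCheck]) -> list[str]:
    """Return the systems whose checks failed; an empty list means the test passed."""
    return [c.system for c in checks if not port_open(c.host, c.port)]


# Written before any backup job is configured: these define what "recovered" means.
DR_ACCEPTANCE_TESTS = [
    ServiceCheck("active-directory", "dc01.dr.example.com", 389),   # LDAP
    ServiceCheck("sql-db", "sql01.dr.example.com", 1433),           # SQL Server
    ServiceCheck("order-entry-app", "app01.dr.example.com", 443),   # HTTPS
]
```

During each DR exercise, running `verify_recovery(DR_ACCEPTANCE_TESTS)` against the recovered environment turns “did the restore work?” into a pass/fail answer, and the list of checks grows as new dependencies are discovered.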
VM Recovery Steps Comparison
Legacy Recovery Method
- Provision empty VM
- Reinstall a fresh system at the DR site
- Patch the OS
- Install application binaries and other dependencies
- Install backup agent and then proceed with restore of unverified application data
- Configure application to work with recovered data
- VM is ready to be restarted and the application verified for the first time
Whole-VM Recovery with Veeam (or a Similar System)
- Restore the whole VM
- Power on the VM, which was pre-verified with SureBackup (or similar)
Note: With replication, recovery is a single step: power on the replica VM.
The impact of data loss or corruption on your organization—whether from hardware failure, human error, hacking or malware—could be devastating. This means that having a plan for data backup and restoration of electronic information is essential. Even when guidelines like the ones discussed above are used to develop a DR strategy, a finalized disaster recovery plan is unique to each individual organization. Talk to the virtualization experts at Advanced Network Systems about putting the right technical resources in place to ensure the survival of your critical information and the operation of your business even when disaster strikes.