Introduction
First (and possibly most important), I would like to say that all of the software for getting this going was free of charge, even for commercial use as far as I am aware. VMWare's logic is that after deploying you will either want their advanced features (big £) or need support (big £). I'd say it has worked, because I love this stuff now and the odds are high that in the next few years we will decide to upgrade for more features. Either way you can play with VMWare for free and see if it works for you, and even go live with production servers without paying a penny (apart from hardware).
Update Feb 2010: Yep, just purchased a 3 year VMWare Essentials bundle. Guess it worked!
Whilst we have been doing daily backups to tape of our database systems for ages (as all good IT people should!) there are two problems with this.
The backups are done nightly. If something happened at 6pm then all of that day's work would be lost.
The backups only cover the raw data. If the server died then it would take quite a while to reinstall all of the systems and settings.
When a server became available recently I decided to push through with backups as virtual machines. This server was suitably powerful (quad-core 3GHz CPU with 8GB of RAM, since upgraded to 16GB _just in case_).
My plan for the first phase was to run backup servers in the VMWare environment. Any time I wanted to take a copy offsite I could then copy the virtual machine from VMWare to some other media and go. To restore I could just copy the files to any VMWare server and power it up. The data would only be current up to the point I took the copy, but I could then use the nightly tape backups to bring it more up to date.
The second phase was to start replicating changes automatically from the live database server to the backup database server. This would give me a 99.9% up to date copy of the DB system, ready to become live if the live database server died. (Please remember this is not a proper backup - if the live system deletes half its customer records, the replication will do the same on the "backup" system too...)
The third phase was moving the live servers into VMWare once I was happy that the performance of the backups was adequate. Quite possibly this would just involve migrating from existing hardware to existing hardware running VMware, with an iSCSI drive cage for files. I know the hardware is powerful enough now, so it will still be (allowing for minor VMWare overheads) then. This also opens up options with VMWare infrastructure like vMotion (moving running machines between physical servers), which would make upgrades, failover etc. easier. All this is great stuff but expensive, and I'm doing this one step at a time. The things I can use it for will keep improving and changing so I'm not going to spend too much time planning now. :)
Installing VMWare
First, since it's all free, you might as well get the latest version of VMWare ESXi from the VMware website. Also get the licence key, as otherwise the system will expire after 60 days. With the free key it's an unlimited licence (in terms of time - some reporting and remote connection features are limited).
When you install VMWare ESXi it will completely overwrite the hard drive of the server. The install itself is fairly easy as long as the hardware is supported. If not then you will probably be in trouble. There are lists of known working hardware available online, but many systems not on the compatible list also work.
When it's installed all you get is a yellow and grey screen with very few options. Use these screens to set up the root password and the network, and make a note of the IP address. Browse to this IP in a web browser and you will get a link to download the vSphere Client, which is how you manage everything.
Install the client on any Windows desktop and run it to connect.
Deploying to VMWare
There were basically two options for creating systems on VMWare.
First and most obvious is to install from scratch in a virtual machine. This may well be the best solution if you have a slightly old install anyway - you don't want to take all the old temp files etc. over if you can avoid it.
Convert an existing server to a virtual server. The VMWare Converter runs surprisingly well and you end up with a clone of the physical machine which can run inside VMWare. It even sorts out new partitions, drivers etc.
Since the main use of this VMWare server was as a backup server, and I did not really want to pay for RedHat support for backup machines, I decided to create these servers running CentOS as it is binary compatible with RedHat. The servers installed flawlessly once I had remembered to enable the CD-ROM drive and uploaded the CentOS install ISO to the datastore (you can also use local ISO files on your desktop or your local physical CD-ROM drive, but the datastore is a bit faster and I wanted a copy available for next time). Apart from the fact that it was all happening in a window, it was exactly the same as on a real computer.
For the convert I was going from a RedHat ES version 5.1 server. It took a while but the new system worked on first boot with the exception of networking - because the MAC address had changed, RedHat set up a new default config and it was trying to use DHCP. As RedHat keeps the old config it was easy enough to copy the old settings back in and restart networking.
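As a rough illustration of "copying the old settings back in", here is a minimal Python sketch. It assumes the usual RedHat-style `KEY=value` interface file with a `HWADDR` line; the function name, the sample addresses and the sample config are all made up for the example and are not taken from my server. The idea is simply to keep the old static settings and swap in the MAC address of the new virtual NIC.

```python
def restore_static_config(old_cfg, new_mac):
    """Return the old interface config text with HWADDR set to new_mac.

    old_cfg is the text of the saved (pre-conversion) config file.
    Every other line (IP address, netmask, etc.) is kept unchanged.
    """
    lines = []
    seen_hwaddr = False
    for line in old_cfg.splitlines():
        if line.strip().upper().startswith("HWADDR="):
            # Replace the physical NIC's MAC with the virtual NIC's MAC
            lines.append("HWADDR=" + new_mac)
            seen_hwaddr = True
        else:
            lines.append(line)
    if not seen_hwaddr:
        # Old file had no HWADDR line, so add one for the new NIC
        lines.append("HWADDR=" + new_mac)
    return "\n".join(lines) + "\n"


# Hypothetical example values, for illustration only
old_config = (
    "DEVICE=eth0\n"
    "BOOTPROTO=static\n"
    "HWADDR=00:11:22:33:44:55\n"
    "IPADDR=192.168.0.10\n"
    "NETMASK=255.255.255.0\n"
    "ONBOOT=yes\n"
)
new_config = restore_static_config(old_config, "00:0C:29:AA:BB:CC")
print(new_config)
```

After writing the result back to the interface file you would still need to restart networking, as described above.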
Backing up
The easiest way to back up a virtual machine is to turn it off, then use the datastore browser to download all the files before turning it back on again. This gives you a clean full backup with no complications. However, this is not easy to automate.
Note: If you have a thin disk then the datastore will show the space used, but when you download it, it will be converted to a "fat" (thick) disk. E.g. a 200GB thin disk that shows as 20GB in the datastore will become 200GB as it is downloaded. The datastore will still have the 20GB thin disk, but the download will take ages.
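To put a number on "ages", here is a small back-of-the-envelope calculation in Python. The 100 Mbit/s link speed is purely an assumption for the example (it ignores protocol overhead and disk speed), and the function name is my own; the 20GB and 200GB figures are the ones from the note above.

```python
def download_hours(size_gb, mbit_per_s):
    """Hours needed to move size_gb gigabytes over a link of mbit_per_s megabits/s."""
    megabits = size_gb * 1000 * 8  # 1 GB = 1000 MB = 8000 megabits
    seconds = megabits / mbit_per_s
    return seconds / 3600


LINK_MBIT = 100       # assumed link speed, not measured
thin_used_gb = 20     # what the datastore browser shows
provisioned_gb = 200  # what actually gets downloaded

print("If only used space were copied: %.1f hours" % download_hours(thin_used_gb, LINK_MBIT))
print("Full provisioned size:          %.1f hours" % download_hours(provisioned_gb, LINK_MBIT))
```

Whatever the real link speed is, the download takes ten times longer than the size shown in the datastore would suggest, because the full provisioned size is what gets transferred.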
I wanted backups with as little downtime as possible. I know of the following ways to achieve this:
Buy the extended features/3rd party programs
Use snapshots whilst the machine is shut down
Use snapshots whilst the machine is running (needs more drive space as the entire memory seems to be dumped to disk even if only a small amount is actually used. Also takes longer to create snapshots because of this)
Backup Snapshots with a virtual machine off
The basic idea here is to turn off the virtual machine, take a snapshot and then turn the machine back on. This should only take a few minutes and you can then copy the pre-snapshot files at your leisure. It is assumed here that you do not have any pre-existing snapshots. If you do then it is still possible, but I'll leave working out the differences to the reader. Running with snapshots long term is generally a bad idea as it slows things down a bit (the system has to check the snapshot file and then the main file for each read) and can use a lot more disk space, depending on how many files change in the VM.
- Turn off virtual machine
- Backup the <vmname>.vmx file
- Take a snapshot
- Restart virtual machine
- Backup the <vmname>.vmdk file
- To restore, place all the backed up files in a new directory
- Add the VM to the inventory (right click on the .vmx file in the datastore)
This works because the machine is off when you create the snapshot and therefore the snapshot file (<vmname>-000001.vmdk) contains everything that happens AFTER you turn it on. Because the <vmname>.vmdk is no longer being written to it is possible to copy it without corruption. These are the only two files you need, all the rest are generated on the fly. By doing it this way downtime is small as you can turn the system back on before starting to copy the .vmdk hard disk file (the big one).
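If the reasoning is hard to picture, here is a toy Python model of it. This is only an illustration of the idea, not VMware's actual on-disk format: the class and its names are invented, with one dictionary standing in for <vmname>.vmdk and another for <vmname>-000001.vmdk.

```python
class SnapshottedDisk:
    """Toy model of a disk with one snapshot: a frozen base plus a delta."""

    def __init__(self, base_blocks):
        self.base = dict(base_blocks)  # stands in for <vmname>.vmdk (no longer written)
        self.delta = {}                # stands in for <vmname>-000001.vmdk

    def write(self, block, data):
        # After the snapshot, every write lands in the delta file only
        self.delta[block] = data

    def read(self, block):
        # Check the snapshot delta first, then fall back to the main file
        if block in self.delta:
            return self.delta[block]
        return self.base[block]


disk = SnapshottedDisk({0: "boot", 1: "data-v1"})

# The VM is powered back on and keeps working while we copy the base file
backup_copy = dict(disk.base)
disk.write(1, "data-v2")

print(disk.read(1))    # the running VM sees its new data
print(backup_copy[1])  # the backup still holds the pre-snapshot state
```

The two-step lookup in `read` is also why running on snapshots long term slows things down a little.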
Backup Snapshots with virtual machines on (gulp)
If zero downtime is your aim (it's what I'd like) then you should really buy the extended features and do this properly. My current practice is as follows.
- Make a snapshot with memory (this is what we will restore to later)
- Backup the vmx file (virtual machine config file)
- Backup the vmsd file (Snapshot index file)
- Make another snapshot without memory (only needed to stop the system writing to first snapshot with memory)
- Backup the <vmname>.vmdk
- Backup the <vmname>-000001.vmdk (post first snapshot file - not used but needed to have something to roll back from)
- Backup the <vmname>-Snapshot<n>.vmsn (the first memory snapshot, so the lower of the two values for <n> - it's also a lot bigger (size of memory rather than a few kB) )
- To restore, add the VM to the inventory (right click on the .vmx file in the datastore)
- Select the VM and choose to roll back to the first snapshot (the one with memory)
The virtual machine will be magically on in exactly the state it was in when you took the first snapshot! You do not even need to power it up: it was powered on when the snapshot was taken, so when restored it is still powered on. Apart from a computer which is possibly confused about how the time has changed so much, it's all good to go :)
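Because there are so many files to remember, a small checklist helper is handy. The Python sketch below only builds the list of file names from the steps above and reports any that are missing from a backup directory listing; it does not talk to VMware at all. The function names and the "dbbackup" VM name are hypothetical, and it assumes there were no snapshots before you started (so the first delta file is -000001).

```python
def hot_backup_files(vm_name, memory_snapshot_n):
    """File names needed to restore a running-VM backup, per the steps above.

    memory_snapshot_n is the <n> of the FIRST snapshot (the one with memory),
    i.e. the lower of the two snapshot numbers.
    """
    return [
        vm_name + ".vmx",                    # virtual machine config
        vm_name + ".vmsd",                   # snapshot index
        vm_name + ".vmdk",                   # base disk
        vm_name + "-000001.vmdk",            # delta written after the first snapshot
        "%s-Snapshot%d.vmsn" % (vm_name, memory_snapshot_n),  # memory image
    ]


def missing_files(expected, present):
    """Return the expected file names that are not in the present listing."""
    present_set = set(present)
    return [name for name in expected if name not in present_set]


# Hypothetical VM name and directory listing, for illustration only
expected = hot_backup_files("dbbackup", 1)
copied_so_far = ["dbbackup.vmx", "dbbackup.vmsd", "dbbackup.vmdk", "dbbackup-000001.vmdk"]
print(missing_files(expected, copied_so_far))
```

Running a check like this before you leave the building is cheaper than discovering at restore time that the .vmsn never got copied.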
Note that this is NOT the way that VMware expects you to do backups. It involves a lot of fiddly steps and is liable to be broken by an update from VMware any day. If you can avoid it, try to find another way to do this - I like it because the systems I'm working with are not essential and I can't justify the budget to do this properly yet. As soon as I can, I'm buying an off the shelf program to do this.