Backups have two major purposes:
- To permit restoration of individual files
- To permit wholesale restoration of entire filesystems
The first purpose underlies the typical file restoration request: a user accidentally deletes a file and asks that it be restored from the latest backup. The second is a system administrator’s worst nightmare: for whatever reason, the administrator is staring at hardware that used to be a productive part of the data center and must now be rebuilt from backups.
Different Data, Different Backup Needs
The pace at which data changes is crucial to the design of a backup procedure. There are two reasons for this:
- A backup is nothing more than a snapshot of the data being backed up. It is a reflection of that data at a particular moment in time.
- Data that changes infrequently can be backed up infrequently, while data that changes often must be backed up more frequently.
System administrators who have a good understanding of their systems, users, and applications should be able to quickly group the data on their systems into different categories. Here are some examples to get you started:
- Operating System
This data normally changes only during upgrades, the installation of bug fixes, and any site-specific modifications.
- Application Software
This data changes whenever applications are installed, upgraded, or removed.
- Application Data
This data changes as frequently as the associated applications are run. Depending on the specific application and your organization, this could mean that changes take place second-by-second or once at the end of each fiscal year.
- User Data
This data changes according to the usage patterns of your user community. In most organizations, this means that changes take place all the time.
Based on these categories (and any additional ones that are specific to your organization), you should have a good idea of the kinds of backups that are needed to protect your data.
Types of Backups
If you were to ask people who are not familiar with computer backups, most would say that a backup is just an identical copy of all the data on the computer.
In other words, if a backup was created Tuesday evening, and nothing changed on the computer all day Wednesday, the backup created Wednesday evening would be identical to the one created on Tuesday.
While it is possible to configure backups in this way, it is likely that you would not. To understand more about this, we must first understand the different types of backups that can be created.
The type of backup that was discussed at the beginning of this section is known as a full backup. A full backup is a backup where every single file is written to the backup media.
As noted above, if the data being backed up never changes, every full backup being created will be the same.
That similarity is due to the fact that a full backup does not check to see if a file has changed since the last backup; it blindly writes everything to the backup media whether it has been modified or not.
This is why full backups are not done all the time: because every file is written to the backup media, a great deal of media is used even if nothing has changed.
Backing up 100 gigabytes of data each night when maybe 10 megabytes’ worth of data has changed is not a sound approach; that is why incremental backups were created.
Unlike full backups, incremental backups first look to see whether a file’s modification time is more recent than its last backup time.
If it is not, the file has not been modified since the last backup and can be skipped this time. On the other hand, if the modification time is more recent than the last backup time, the file has been modified and should be backed up.
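The modification-time check described above can be sketched in a few lines of Python. This is only a minimal illustration, not how any particular backup tool works: the function name `files_modified_since`, the `demo` helper, and the file names are all hypothetical, and the sketch assumes the time of the last backup is available as a Unix timestamp.

```python
import os
import tempfile
import time
from pathlib import Path


def files_modified_since(root: Path, last_backup_time: float) -> list[Path]:
    """Return files under root whose modification time is newer than last_backup_time."""
    selected = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            # A file is included only if it changed after the last backup.
            if path.stat().st_mtime > last_backup_time:
                selected.append(path)
    return sorted(selected)


def demo() -> list[str]:
    """Create two files, age one of them, and list what an incremental run would pick up."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        unchanged = root / "unchanged.txt"
        changed = root / "changed.txt"
        unchanged.write_text("written before the last backup")
        changed.write_text("written after the last backup")

        last_backup = time.time()
        # Set explicit modification times so the example does not depend on timing.
        os.utime(unchanged, (last_backup - 3600, last_backup - 3600))
        os.utime(changed, (last_backup + 60, last_backup + 60))

        return [p.name for p in files_modified_since(root, last_backup)]
```

Calling `demo()` returns only the file whose modification time is later than the recorded backup time; the older file is skipped, which is exactly the saving an incremental backup is after.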
Incremental backups are used in conjunction with a regularly-occurring full backup (for example, a weekly full backup, with daily incrementals).
The primary advantage of incremental backups is that they run more quickly than full backups. The primary disadvantage is that restoring any given file may mean going through one or more incremental backups until the file is found.
When restoring a complete file system, it is necessary to restore the last full backup and every subsequent incremental backup.
In an attempt to alleviate the need to go through every incremental backup, a slightly different approach was implemented. This is known as the differential backup.
Differential backups are similar to incremental backups in that both back up only modified files. However, differential backups are cumulative: once a file has been modified, it continues to be included in all subsequent differential backups (until the next full backup, of course).
This means that each differential backup contains all the files modified since the last full backup, making it possible to perform a complete restoration with only the last full backup and the last differential backup.
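To make the difference in restore effort concrete, the sketch below lists which backups a complete restoration would need under each scheme. It is an illustration under simple assumptions: backups are given as a chronological list of hypothetical `(label, kind)` pairs, where `kind` is either `"full"` or `"partial"`, and the function name `restore_chain` is invented for this example.

```python
def restore_chain(backups: list[tuple[str, str]], strategy: str) -> list[str]:
    """Return the labels of the backups needed for a complete restoration.

    backups  -- chronological list of (label, kind), kind is "full" or "partial"
    strategy -- "incremental" or "differential" (how the partial backups were made)
    """
    # Locate the most recent full backup; max() raises ValueError if there is none.
    last_full = max(i for i, (_label, kind) in enumerate(backups) if kind == "full")
    full_label = backups[last_full][0]
    later = [label for label, _kind in backups[last_full + 1:]]

    if strategy == "incremental":
        # Every incremental since the full backup must be applied, in order.
        return [full_label] + later
    if strategy == "differential":
        # Only the most recent differential is needed, since it is cumulative.
        return [full_label] + later[-1:]
    raise ValueError(f"unknown strategy: {strategy}")


week = [("Sun", "full"), ("Mon", "partial"), ("Tue", "partial"), ("Wed", "partial")]
incremental_restore = restore_chain(week, "incremental")
differential_restore = restore_chain(week, "differential")
```

With a Sunday full backup and partial backups on Monday through Wednesday, the incremental scheme needs all four backups, while the differential scheme needs only Sunday's and Wednesday's.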
Differential backups normally follow the same strategy as incremental backups: a single periodic full backup followed by more frequent differential backups.
The effect of using differential backups in this way is that the differential backups tend to grow a bit over time (assuming different files are modified over the time between full backups).
This places differential backups somewhere between incremental backups and full backups in terms of backup media utilization and backup speed, while often providing faster single-file and complete restorations (due to fewer backups to search/restore).
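The media-usage comparison can be illustrated with a toy model that counts how many files each kind of backup writes per day after an initial full backup. The function name `files_written`, the file names, and the figure of 100 total files are made up for the example; real backups would be measured in bytes rather than file counts.

```python
def files_written(total_files: int, daily_changes: list[set[str]], strategy: str) -> list[int]:
    """Return the number of files written on each day following a full backup.

    total_files   -- number of files on the system (assumed constant)
    daily_changes -- for each day, the set of file names modified that day
    strategy      -- "full", "incremental", or "differential"
    """
    written = []
    changed_since_full: set[str] = set()
    for changed_today in daily_changes:
        changed_since_full |= changed_today
        if strategy == "full":
            # A full backup writes everything, changed or not.
            written.append(total_files)
        elif strategy == "incremental":
            # Only files changed since the previous (daily) backup.
            written.append(len(changed_today))
        elif strategy == "differential":
            # Every file changed since the last full backup.
            written.append(len(changed_since_full))
        else:
            raise ValueError(f"unknown strategy: {strategy}")
    return written


changes = [{"a"}, {"b"}, {"a", "c"}]
full_usage = files_written(100, changes, "full")
incremental_usage = files_written(100, changes, "incremental")
differential_usage = files_written(100, changes, "differential")
```

In this small example the full backups write 100 files every day, the incrementals write 1, 1, and 2 files, and the differentials write 1, 2, and 3 files, growing as more distinct files are touched between full backups.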
Given these characteristics, differential backups are worth careful consideration.
Backup Media
At one time, tape devices were the only removable media devices that could reasonably be used for backup purposes. However, this has changed.
Tape was the first widely used removable data storage medium. It has the benefits of low media cost and reasonably good storage capacity. However, tape has some disadvantages: it is subject to wear, and data access on tape is sequential in nature. These factors mean that it is necessary to keep track of tape usage (retiring tapes once they have reached the end of their useful life), and that searching for a specific file on tape can be a lengthy proposition.
In years past, disk drives would never have been used as a backup medium. However, storage prices have dropped to the point where, in some cases, using disk drives for backup storage does make sense. The primary reason for using disk drives as a backup medium is speed. There is no faster mass storage medium available. Speed can be a critical factor when your data center’s backup window is short and the amount of data to be backed up is large.
By itself, a network cannot act as backup media. But combined with mass storage technologies, it can serve quite well. For instance, when a high-speed network link is combined with a remote data center containing large amounts of disk storage, the usual drawbacks of backing up to disks (the drives are fragile and awkward to move off-site) largely disappear. By backing up over the network, the disk drives are already off-site, so there is no need to transport fragile disk drives anywhere. With sufficient network bandwidth, the speed advantage of backing up to disk drives is maintained.