One of the most talked-about areas of data protection is "data de-duplication." Data de-duplication is an emerging technology that might play a major role in a broad range of applications for protecting and retaining data, including backup and recovery, long-term archiving, continuous data protection, and secure retention for compliance. It also offers advantages for applications that benefit from efficient data transmission, including remote replication and wide-area network optimization.
The key to de-duplication is to transmit only the data that has changed since the last backup, known as an incremental image. This contrasts with the traditional model of backing up all of the data from every site on a weekly or daily basis, a task that is no longer cost-effective or efficient for many organizations desperate to cut down the amount of information they store.
Analysts suggest that de-duplication will show up more and more on the data protection radar screen, and for more than one reason. It is supposed to be not only an option for reducing redundancy on primary storage but also an enabler of various WAN features. In terms of backup, it goes one step beyond the more traditional file-based incremental backup. Ideally, a de-duped incremental backup minimizes backup traffic: after each full backup, it copies only the blocks that have changed. De-duplication can take place at several points: before the data is written to storage, during the process of writing to storage, and even afterwards in some cases. In fact, it is possible to use multiple de-duped incremental backups in a series, a strategy I’ve used for several years now, which lets the user keep sector-based changes to their disk.
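To make the changed-block idea concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular product works: the block size, the function names (split_blocks, incremental, restore), and the use of SHA-256 to compare blocks by position are all choices made for this example. It shows one full image followed by a series of incremental images, each holding only the blocks that differ from the image before it.

```python
from __future__ import annotations

import hashlib

BLOCK_SIZE = 4096  # assumed fixed block size; real products differ


def split_blocks(data: bytes, size: int = BLOCK_SIZE) -> list[bytes]:
    """Cut an image into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def incremental(previous: bytes, current: bytes, size: int = BLOCK_SIZE) -> dict:
    """Build an incremental image: only blocks that differ from `previous`.

    Blocks are compared by position using SHA-256 digests, so only the
    digests of the previous image would need to be kept in practice.
    """
    old = [hashlib.sha256(b).hexdigest() for b in split_blocks(previous, size)]
    changed = {}
    for index, block in enumerate(split_blocks(current, size)):
        digest = hashlib.sha256(block).hexdigest()
        if index >= len(old) or old[index] != digest:
            changed[index] = block  # new or modified block
    # The total length is recorded so a restore can handle shrinking images.
    return {"length": len(current), "blocks": changed}


def restore(full: bytes, increments: list[dict], size: int = BLOCK_SIZE) -> bytes:
    """Replay a series of incremental images on top of a full backup."""
    image = full
    for inc in increments:
        blocks = split_blocks(image, size)
        for index, block in inc["blocks"].items():
            while len(blocks) <= index:  # image grew since the last backup
                blocks.append(b"")
            blocks[index] = block
        image = b"".join(blocks)[:inc["length"]]
    return image


# Tiny demonstration with 4-byte blocks so the changes are easy to see.
full = b"AAAABBBBCCCCDDDD"
day1 = b"AAAABBBBXXXXDDDD"    # one block modified
day2 = b"AAAABBBBXXXXDDDDEE"  # data appended

inc1 = incremental(full, day1, size=4)
inc2 = incremental(day1, day2, size=4)

print(sorted(inc1["blocks"]))  # block indexes carried by the first incremental
print(restore(full, [inc1, inc2], size=4) == day2)
```

In this sketch each incremental carries a single block, while the restore still reproduces the latest image by replaying the series in order. Note that comparing blocks by position only catches in-place changes; inserting data in the middle of an image shifts every later block, which is one reason real de-duplication products may use more elaborate schemes.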
This technology, on its face, has the potential to be a money saver. But some companies suggest that de-duplication may be more shadow than substance, costing more than it saves and imposing excessive performance hits. Tip: Don't rush for the latest product; first see how it would fit in your installation. Another tip: If a vendor does not offer this function and is trying to steer you away from it, ask why. This is not bleeding-edge technology; it has a history of successes. Make sure your storage vendor isn't just trying to steer you into the products it sells rather than products that meet your requirements.