History and Functions of Deduplication


Posted December 14, 2012 by GiulyRotarry

Deduplication is a process that has become extremely popular among corporate houses and business enterprises these days because it saves a great deal of time and effort by detecting, merging and eliminating duplicate records in databases. With dedupe software, one can conveniently create databases that are free from repetition. Such a program therefore helps users maintain high-quality, error-free databases, which are a prerequisite for any business project. The process is hugely popular today, but have you ever wondered how it all began?


The concept of record linkage, on which deduplication rests, was the brainchild of Halbert L. Dunn, who first came up with the idea in 1946. He set out his theories in an article called 'Record Linkage', published in the American Journal of Public Health. Although the principles that modern dedupe software follows are not derived from Dunn's theory, the credit for the initial idea goes to him.


Thirteen years after Dunn published the first idea of record linkage, Howard Borden Newcombe improved on it and published his work in the journal Science. Newcombe's work is said to have laid the foundations of the modern theory of record linkage used by dedupe software. Ten years later, in 1969, Ivan Fellegi and Alan Sunter formalized this foundation in 'A Theory for Record Linkage', published in the Journal of the American Statistical Association. Their model, known as the FS or Fellegi-Sunter theory, is the mathematical foundation of the dedupe applications that exist today.


In the late 1990s, a number of machine learning techniques emerged for deduplication. These techniques estimated which entries were duplicates by means of the conditional probabilities proposed by the FS theory. It was an era of hits and misses for dedupe software: some programs were accurate and very effective, while others deviated from the approach the FS theory prescribes. Record linkage does not strictly require a computer, but computers made it far easier to produce results.
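To make the idea of "conditional probabilities" more concrete, here is a minimal sketch of Fellegi-Sunter style scoring for one pair of records. For each field, m is the assumed probability that the field agrees when the two records truly belong to the same entity, and u is the probability that it agrees when they do not. An agreeing field adds log2(m/u) to the pair's weight, and a disagreeing field adds log2((1-m)/(1-u)). The total is then compared with an upper and a lower threshold. The field names, the m/u values and the thresholds below are hypothetical, chosen only for illustration. In practice m and u are usually estimated from the data (for example with the EM algorithm), and real systems use approximate string comparisons rather than the exact match assumed here.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldParams:
    m: float  # P(field agrees | records are a true match)
    u: float  # P(field agrees | records are not a match)


# Hypothetical m/u probabilities, for illustration only.
PARAMS = {
    "surname": FieldParams(m=0.95, u=0.01),
    "first_name": FieldParams(m=0.90, u=0.05),
    "birth_year": FieldParams(m=0.98, u=0.02),
    "city": FieldParams(m=0.80, u=0.10),
}


def match_weight(a: dict, b: dict, params: dict = PARAMS) -> float:
    """Sum of per-field log2 likelihood ratios, Fellegi-Sunter style."""
    total = 0.0
    for field, p in params.items():
        va, vb = a.get(field), b.get(field)
        if va is None or vb is None:
            continue  # missing value: treated as uninformative (adds 0)
        # Assumed comparison rule: exact match after trimming and lower-casing.
        agrees = str(va).strip().lower() == str(vb).strip().lower()
        if agrees:
            total += math.log2(p.m / p.u)
        else:
            total += math.log2((1 - p.m) / (1 - p.u))
    return total


def classify(weight: float, upper: float = 10.0, lower: float = 0.0) -> str:
    """Three-way decision; the default thresholds are hypothetical."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # left for clerical review


if __name__ == "__main__":
    r1 = {"surname": "Smith", "first_name": "John", "birth_year": 1950, "city": "Leeds"}
    r2 = {"surname": "smith ", "first_name": "John", "birth_year": 1950, "city": "York"}
    w = match_weight(r1, r2)
    print(round(w, 2), classify(w))  # 14.18 link
```

In this example three strongly informative fields agree and only the city differs, so the pair scores well above the upper threshold. The middle "possible link" band reflects the FS idea that uncertain pairs should go to a human reviewer rather than be forced into a decision.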


People who deal with large volumes of statistical data are the ones who turn to dedupe software to simplify their work. After all, record linkage is a task best done with the help of such a program. Deduplication, or record linkage, serves several functions, which is the main reason researchers and administrators choose to use it. This kind of software lets one multitask and keep databases manageable and well organized, not to mention free from duplicate or identical entries.


Deduplication is especially useful for survey data, clinical data and social data, where linking entries together can produce new perspectives. The primary function of record linkage is maintaining a comprehensive database, and without dedupe software to help spot, correct or delete duplicate entries, such a database would be very hard to manage. Moreover, because massive databases cannot practically be checked without a dedupe program, these programs are typically used by companies for commercial purposes.


Before using a dedupe software package (http://www.dedupe-deduplication.com/deduplication-software.html), it is essential to learn how the concept was formed and how it evolved. It was as a part of deduplication (http://www.dedupe-deduplication.com/dedupe.html) that the concept of record linkage came into being and proved very useful in streamlining data.
-- END ---
Issued By deduplication
Website http://www.dedupe-deduplication.com/deduplication-software.html
Country United Kingdom
Categories Computers
Tags dedupe software , deduplication
Last Updated December 14, 2012