Identity Resolution is the process of discovering relationships between disparate records that describe identities and resolving the related records into a single entity. It is challenging because the relationships are not always obvious, and the data used to detect them is often ambiguous or prone to error and variation. Identity resolution is critical for risk management, public safety and fraud detection, and it is increasingly useful in Customer Data Integration (CDI) and Master Data Management (MDM) applications.
Intelligent Search Technology's (IST) NameSearch product is a powerful identity resolution engine that enables you to find, link, search, match and group identities. The software balances two competing requirements: improving the quality of your searches while minimizing I/O expense and performance overhead.
One important aspect of NameSearch is its intelligent key-building and search-range-building algorithms. These are used to index and retrieve records regardless of variations caused by:
- Transcription or keyboarding errors
- Short forms
- Missing words
- Extra words
- Noise and sequence variations
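The idea of keys that tolerate missing words, extra words and changes in word order can be illustrated with a simple inverted index. The sketch below is a generic technique, not IST's key-building algorithm: it assumes one key per upper-cased word (`name_keys` and `NameIndex` are hypothetical names), so a record is retrieved if it shares at least one word with the query. It does not handle typos, short forms or phonetic variation.

```python
from collections import defaultdict


def name_keys(name: str) -> set[str]:
    """Hypothetical key scheme: one key per upper-cased word, punctuation removed."""
    cleaned = "".join(c if c.isalnum() else " " for c in name.upper())
    return set(cleaned.split())


class NameIndex:
    """Minimal inverted index mapping each key to the ids of records containing it."""

    def __init__(self) -> None:
        self._postings: dict[str, set[int]] = defaultdict(set)

    def add(self, record_id: int, name: str) -> None:
        for key in name_keys(name):
            self._postings[key].add(record_id)

    def candidates(self, query: str) -> set[int]:
        # Union of the postings for every query key: word order, missing words
        # and extra words do not prevent a record from being retrieved.
        found: set[int] = set()
        for key in name_keys(query):
            found |= self._postings.get(key, set())
        return found


idx = NameIndex()
idx.add(1, "Smith, John A.")
idx.add(2, "Mary Jones")
idx.add(3, "John Smith Jr")
print(sorted(idx.candidates("JOHN SMITH")))  # [1, 3]
```

Retrieving on any shared word keeps recall high; the candidates would then need ranking, and very common words produce very large candidate lists.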
Most people's names are drawn from a surprisingly small set of words, while the remaining, uncommon names are spread across a much larger set of words. This imbalance is what distorts name distributions, and it is seen most dramatically in an analysis of names in the US:
- 65% of the US population has one of just 400 first names.
- 35% of the population has one of just 300 surnames.
These figures come from an analysis of a sample of over 3.2 million first names and 2.5 million last names!
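Statistics like these are cumulative coverage figures: the share of the population whose name falls among the k most frequent names. A minimal sketch of that calculation follows; the counts are invented toy data, not the sample described above, and `top_k_coverage` is a hypothetical helper.

```python
from collections import Counter


def top_k_coverage(counts: Counter, k: int) -> float:
    """Fraction of people whose name is among the k most frequent names."""
    total = sum(counts.values())
    if total == 0:
        return 0.0
    covered = sum(n for _, n in counts.most_common(k))
    return covered / total


# Invented toy counts, for illustration only.
first_names = Counter({"JAMES": 50, "MARY": 40, "JOHN": 30, "ORLA": 1, "THADDEUS": 1, "ZEBEDIAH": 1})
print(f"{top_k_coverage(first_names, 3):.0%}")  # 98%
```

On a real name sample, `top_k_coverage(counts, 400)` is the kind of calculation that would produce a figure like the 65% quoted for first names.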
Compounding the problem of so many similar names are the variations caused by the types of information stored in databases and by differences in name frequency across geographic locations. There is also the number of identities a single person may use, e.g. maiden names, aliases, professional names and nicknames.
Traditional approaches to name variation deal only with phonetic errors. They standardize easily confused sounds; for example, "PH" is treated as "F". Elaborate linguistic rules were developed to tokenize a name phonetically, and these phonetic tokens served as the basis for name retrieval. In some instances these rules helped find names that were hard to spell; unfortunately, they also made the distribution of common names even more distorted.
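Soundex, a classic phonetic code, illustrates both effects. The sketch below implements standard American Soundex (keep the first letter, map consonants to digits, collapse adjacent equal codes, treat H and W as transparent, pad to four characters). It is shown only to illustrate the traditional approach, not NameSearch's own tokenization; Soundex itself does not rewrite "PH" as "F", which belongs to more elaborate phonetic schemes. Note how three distinct surnames collapse into one key, enlarging the bucket for an already common name.

```python
# Standard American Soundex digit for each coded consonant; vowels, Y, H and W are uncoded.
SOUNDEX_CODES = {
    letter: digit
    for letters, digit in (
        ("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
        ("L", "4"), ("MN", "5"), ("R", "6"),
    )
    for letter in letters
}


def soundex(name: str) -> str:
    letters = [c for c in name.upper() if c.isalpha()]
    if not letters:
        return ""
    first = letters[0]
    digits = []
    prev = SOUNDEX_CODES.get(first, "")
    for ch in letters[1:]:
        code = SOUNDEX_CODES.get(ch, "")
        if code:
            if code != prev:
                digits.append(code)
            prev = code
        elif ch not in "HW":
            prev = ""  # a vowel (or Y) separates two equal codes
        # H and W are transparent: equal codes on either side are coded once
    return (first + "".join(digits) + "000")[:4]


print([soundex(n) for n in ("Smith", "Smyth", "Schmidt")])  # ['S530', 'S530', 'S530']
```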
Discrepancies caused by phonetic errors account for 20-25% of all name variations. IST addresses phonetic problems with analysis routines that determine when phonetic tokenization should be applied. This enables NameSearch to overcome phonetic problems without the negative consequences of other name-searching methods (e.g. wildcard searching, text searching, N-gram indexing or Soundex alone). No single algorithm or name matching system is best for ALL naming data.
Instead, NameSearch combines the best techniques, including phonetics, transliteration, and deterministic and probabilistic algorithms, and comes prepackaged with an extensive predefined rule set. These rules can be used right out of the box or modified to meet your specific needs through the NameSearch Generation Shell.
The Generation Shell is a Graphical User Interface (GUI) designed for the modification and tuning of your NameSearch subroutines. The Shell allows you to adjust frequency and rule base tables, set various parameters, modify key building routines and test changes.
Rule-Based Expertise:
Rule-based expertise solves many classes of problems associated with name variations found in data. For example, Bill and William, or Bob and Robert, are used interchangeably to identify the same individual, and there are many other such nicknames. The rule-based expertise of the NameSearch software resolves these equivalences.
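A nickname rule can be pictured as a lookup table from each nickname to one or more root names. In this minimal sketch the table is a tiny hypothetical excerpt (a real rule base would be far larger), and `roots` and `first_names_compatible` are illustrative names, not NameSearch functions. Because a nickname can map to several roots (Chris may be Christopher or Christine), two names are treated as compatible when their sets of roots overlap.

```python
# Hypothetical excerpt of a nickname rule table; a real rule base is much larger.
NICKNAMES: dict[str, set[str]] = {
    "BILL": {"WILLIAM"},
    "WILL": {"WILLIAM"},
    "BOB": {"ROBERT"},
    "ROB": {"ROBERT"},
    "CHRIS": {"CHRISTOPHER", "CHRISTINE"},
}


def roots(name: str) -> set[str]:
    """The name itself plus every root it may be a nickname for."""
    n = name.strip().upper()
    return NICKNAMES.get(n, set()) | {n}


def first_names_compatible(a: str, b: str) -> bool:
    return bool(roots(a) & roots(b))


print(first_names_compatible("Bill", "William"))  # True
print(first_names_compatible("Bob", "William"))   # False
```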
The NameSearch rule base is also used to identify noise words. Noise words are elements of a name that do not help identify a candidate. Examples of noise words are "Incorporated", "Corporation", "Limited", "Junior", "Senior", "Avenue" and "Street". Sometimes elements of a name do contribute to the identity but should be treated as less important. In these cases, the rule base does not treat them as noise words but recognizes that they are less significant. Some examples are "associate", "board", "international" and "services".
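Noise words and less significant words can be modeled as weights on the words of a name: noise words are dropped, and less significant words get a reduced weight. The word lists below come from the examples above; the 0.25 weight and the `weighted_tokens` helper are assumptions for illustration, not NameSearch values.

```python
NOISE_WORDS = {"INCORPORATED", "CORPORATION", "LIMITED", "JUNIOR", "SENIOR", "AVENUE", "STREET"}
LESS_SIGNIFICANT = {"ASSOCIATE", "BOARD", "INTERNATIONAL", "SERVICES"}


def weighted_tokens(name: str, low_weight: float = 0.25) -> dict[str, float]:
    """Map each meaningful word of a name to a weight (assumed weighting scheme)."""
    weights: dict[str, float] = {}
    cleaned = "".join(c if c.isalnum() else " " for c in name.upper())
    for word in cleaned.split():
        if word in NOISE_WORDS:
            continue  # noise words do not help identify a candidate
        weights[word] = low_weight if word in LESS_SIGNIFICANT else 1.0
    return weights


print(weighted_tokens("Acme International Services, Incorporated"))
# {'ACME': 1.0, 'INTERNATIONAL': 0.25, 'SERVICES': 0.25}
```

A comparison step could then sum the weights of the words two names share, so a match on "Acme" counts far more than a match on "Services".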
The rule base also contains rules for handling common prefixes. For example, names like McDaniel are frequently confused with MacDaniel. Prefix recognition handles this kind of variation with ease.
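A prefix rule can be sketched as a normalization applied before keys are built, so that Mac- and Mc- surnames produce the same key. This regex-based version is an assumption, not NameSearch's rule; the small hypothetical exception set shows why real prefix rules need exceptions, since surnames such as Machado merely begin with the letters "Mac".

```python
import re

# Assumed rule: a leading "MAC" or "MC" followed by another letter is one prefix.
MAC_PREFIX = re.compile(r"^MA?C(?=[A-Z])")
NOT_PREFIXED = {"MACEDO", "MACHADO"}  # hypothetical exceptions


def normalize_prefix(surname: str) -> str:
    s = surname.strip().upper()
    if s in NOT_PREFIXED:
        return s
    return MAC_PREFIX.sub("MC", s, count=1)


print(normalize_prefix("McDaniel"), normalize_prefix("MacDaniel"))  # MCDANIEL MCDANIEL
```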
Another feature of the rule base is diminutive recognition. Many names end in a diminutive suffix such as "ie" or "y". In these cases, it is useful to identify the root and then apply the nickname rules described above. For example, you would want Bill, Billie and Billy to find William or Willie.
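Diminutive recognition can be sketched as stripping a suffix such as "ie" or "y" and then applying the nickname lookup to the remaining root. The table and `root_name` are hypothetical; the sketch also tries dropping a doubled final consonant (Bobby to Bob), which is an added assumption beyond the text.

```python
# Hypothetical nickname table (a tiny excerpt, for illustration only).
NICKNAMES = {"BILL": "WILLIAM", "WILL": "WILLIAM", "BOB": "ROBERT"}
DIMINUTIVE_SUFFIXES = ("IE", "Y")


def root_name(name: str) -> str:
    """Resolve a first name to its root via nickname and diminutive rules."""
    n = name.strip().upper()
    if n in NICKNAMES:
        return NICKNAMES[n]
    for suffix in DIMINUTIVE_SUFFIXES:
        if not n.endswith(suffix):
            continue
        stem = n[: -len(suffix)]
        candidates = [stem]
        if len(stem) >= 2 and stem[-1] == stem[-2]:
            candidates.append(stem[:-1])  # assumed rule: BOBBY -> BOBB -> BOB
        for candidate in candidates:
            if candidate in NICKNAMES:
                return NICKNAMES[candidate]
    return n


for n in ("Bill", "Billie", "Billy", "Willie", "William"):
    print(n, "->", root_name(n))  # each resolves to WILLIAM
```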
In addition to key building and rule bases, the NameSearch software comes with advanced comparison routines. These routines build on the key-building routines to calculate numeric values indicating the likelihood that two records match.
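To make the idea of a numeric match value concrete, here is a minimal scoring sketch. It uses Python's standard `difflib.SequenceMatcher.ratio()` as a stand-in for NameSearch's own comparison logic: each query word is paired with its most similar candidate word, and the similarities are averaged into a score between 0 and 1. `name_score` is a hypothetical helper.

```python
from difflib import SequenceMatcher


def name_score(query: str, candidate: str) -> float:
    """Average, over query words, of the best similarity to any candidate word (0 to 1)."""
    q_words = query.upper().split()
    c_words = candidate.upper().split()
    if not q_words or not c_words:
        return 0.0
    best = [max(SequenceMatcher(None, q, c).ratio() for c in c_words) for q in q_words]
    return sum(best) / len(best)


print(round(name_score("Jon Smith", "Smith John"), 3))  # 0.929
```

The score is asymmetric: extra words in the candidate are not penalized, which suits a search on a partially known name but may not suit every application.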
In an online system, the comparison routines can be used to eliminate unlikely candidates, so you can tailor the information displayed to the user. This is especially useful for systems containing more than 10 million records.
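Candidate elimination then amounts to keeping only candidates whose score clears a threshold and ranking the survivors. The sketch below again uses `difflib` as a stand-in scorer; `shortlist`, the 0.8 threshold and the limit of 10 are assumptions to be tuned, not NameSearch settings.

```python
from difflib import SequenceMatcher


def shortlist(query: str, candidates: list[str], threshold: float = 0.8, limit: int = 10) -> list[str]:
    """Drop candidates scoring below `threshold`; return the rest best-first."""
    scored = [
        (SequenceMatcher(None, query.upper(), c.upper()).ratio(), c) for c in candidates
    ]
    kept = [pair for pair in scored if pair[0] >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in kept[:limit]]


print(shortlist("Jon Smith", ["John Smith", "Joan Smyth", "Mary Jones"]))
# ['John Smith', 'Joan Smyth']
```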
The comparison routines also form the basis of batch utilities, such as a merge/purge application, where they enable the system to make match decisions without human intervention.
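A merge/purge pass can be sketched as pairwise comparison plus grouping: any two records whose score clears a threshold are linked, and linked records are merged into groups with a union-find structure. This is a generic technique with `difflib` as a stand-in scorer; `merge_purge` and the 0.85 threshold are hypothetical. Comparing every pair is quadratic, so a production system would first use keys to restrict comparisons to likely candidates.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similar(a: str, b: str, threshold: float) -> bool:
    return SequenceMatcher(None, a.upper(), b.upper()).ratio() >= threshold


def merge_purge(records: list[str], threshold: float = 0.85) -> list[list[str]]:
    """Group records that are linked, directly or transitively, by a match."""
    parent = list(range(len(records)))

    def find(i: int) -> int:
        # Follow parent links to the group's representative, halving the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(records)), 2):
        if similar(records[i], records[j], threshold):
            parent[find(i)] = find(j)  # union the two groups

    groups: dict[int, list[str]] = {}
    for i, record in enumerate(records):
        groups.setdefault(find(i), []).append(record)
    return list(groups.values())


records = ["John Smith", "Jon Smith", "Mary Jones", "Mary Jone"]
print(merge_purge(records))
# [['John Smith', 'Jon Smith'], ['Mary Jones', 'Mary Jone']]
```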
NameSearch integrates various strands of knowledge into a cohesive fabric that enables successful retrieval of records based on names and/or addresses. By combining rules for common prefixes, suffixes, nicknames, noise words and other similar classes of variation with Intelligent Search Technology's powerful algorithms and its user-friendly Generation Shell, NameSearch makes the complexities of Identity Resolution easy!