PDF Download: https://senzing.com/principle-based-ER
Abstract:
Senzing® software uses a unique principle-based approach to entity resolution that eliminates the need for pre-training, tuning, or experts. This general-purpose method can be used across a nearly unlimited range of entity types, such as people, organizations, vessels and vehicles.
Principles are just one of the reasons our software is so easy to deploy and use, while delivering higher quality and more accurate results than other entity resolution methods.
This document provides details about how principle-based entity resolution works. It explains the concept of using principles to perform entity resolution, how principles differ from traditional rules, why principles make the entity resolution process more efficient, and how they deliver more precise and accurate results.
What is Principle-Based Entity Resolution and How is it Better?
Principles are a special form of generalized knowledge that draws on common truths or assumptions. They differ from the rules used by some entity resolution methods in an important way. Here’s an example:
- You tell your child to quit throwing rocks at cars, which is a rule. The next day you find him throwing baseballs at SUVs and have to tell him not to do that too, another rule. A few days later, you have to tell him not to throw golf balls at trucks, fire engines, and ambulances, more rules. Instead of all these rules, why not one simple principle: Don’t throw things at other people’s stuff?
In Senzing software, principles are based on the expected behaviors of entity attributes, e.g., names, addresses, and identifiers. For example, social security numbers (SSNs) typically point to only one person, but dates of birth (DOBs) behave differently, as many people share the same DOB. There are always exceptions, and these are learned in real time as new data is received. For example, when multiple people are using the same SSN, our software detects it, labels that SSN as generic, and reevaluates all prior records with that number.
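The exception handling described above can be illustrated with a minimal sketch. This is not Senzing's implementation or API: the `GenericValueTracker` class, the `max_entities` threshold, and the `entity_key` argument (a stand-in for whatever evidence shows that two records are clearly different people) are all assumptions made for illustration.

```python
from collections import defaultdict


class GenericValueTracker:
    """Hypothetical helper: flags an identifier value as generic once
    more distinct entities share it than its expected behavior allows."""

    def __init__(self, max_entities=2):
        # Assumed threshold; the source does not state a specific number.
        self.max_entities = max_entities
        self.records_by_value = defaultdict(list)  # value -> [(record_id, entity_key)]
        self.generic_values = set()

    def observe(self, record_id, entity_key, value):
        """Register a record and return the record ids to reevaluate
        if this observation has just made the value generic."""
        self.records_by_value[value].append((record_id, entity_key))
        if value in self.generic_values:
            return []  # already generic; nothing new to reevaluate
        distinct_entities = {key for _, key in self.records_by_value[value]}
        if len(distinct_entities) > self.max_entities:
            self.generic_values.add(value)
            # Every prior record carrying this value needs a second look.
            return [rid for rid, _ in self.records_by_value[value]]
        return []
```

In this sketch, the first observation that pushes a value past the threshold returns every record seen so far with that value, which mirrors the idea of reevaluating all prior records once an SSN is labeled generic.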
Our principle-based method assigns three behaviors to each entity attribute:
- Frequency – do one, a few, many, or very many entities generally share the same value? For example, an SSN is commonly used by one entity, an address is shared by a few, and a DOB is shared by many.
- Exclusivity – does an entity typically have just one such value, or can it have several? For example, an entity should have only one SSN or DOB, but it can have more than one credit card number.
- Stability – is this an exclusive value that is generally constant over an entity’s lifetime, or does it typically change? For example, an SSN and a DOB are typically stable over a lifetime, while a home address typically changes.
The software comes preconfigured with the common attributes and expected behaviors of people and organizations; see Table 1 for some examples. You can start loading and resolving entities without any configuration, training, or tuning. If you need to add a new attribute, such as an additional identifier, just add the name of the attribute and assign its three behaviors.
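As a rough illustration of what assigning three behaviors to an attribute could look like, here is a minimal sketch. It is not Senzing's configuration format: the `Frequency` enum, the `AttributeBehavior` class, the `add_attribute` helper, and the `LOYALTY_ID` attribute with its behaviors are hypothetical. The SSN and DOB entries follow the examples given above, and the check that only exclusive attributes can be stable is an assumption drawn from the definition of stability.

```python
from dataclasses import dataclass
from enum import Enum


class Frequency(Enum):
    ONE = "one"
    FEW = "few"
    MANY = "many"
    VERY_MANY = "very many"


@dataclass(frozen=True)
class AttributeBehavior:
    frequency: Frequency  # how many entities generally share a value
    exclusive: bool       # does an entity typically have just one value?
    stable: bool          # is the value constant over the entity's lifetime?

    def __post_init__(self):
        # Assumption: stability is only defined for exclusive attributes.
        if self.stable and not self.exclusive:
            raise ValueError("only exclusive attributes can be marked stable")


# Behaviors taken from the examples in the text.
BEHAVIORS = {
    "SSN": AttributeBehavior(Frequency.ONE, exclusive=True, stable=True),
    "DOB": AttributeBehavior(Frequency.MANY, exclusive=True, stable=True),
}


def add_attribute(name, frequency, exclusive, stable):
    """Register a new attribute by naming it and assigning its three behaviors."""
    if name in BEHAVIORS:
        raise ValueError(f"attribute {name!r} is already defined")
    BEHAVIORS[name] = AttributeBehavior(frequency, exclusive, stable)
    return BEHAVIORS[name]


# Hypothetical additional identifier with assumed behaviors.
add_attribute("LOYALTY_ID", Frequency.ONE, exclusive=False, stable=False)
```

The point of the sketch is that a new attribute needs only a name and three behavior values, with no matching rules written specifically for it.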