Entity Resolution
After a few decades of experience building some of the largest commercially available, real-time entity resolution (ER) systems in existence, a ground-up skunkworks project began within IBM in 2009.
Senzing entity resolution combines best practices we picked up over the years with exciting new breakthroughs, and it is unlike any other entity resolution technology on the market.
The following outlines a few unique features and capabilities.
Computationally Efficient Self-learning, Self-tuning
As Senzing accumulates additional context and data, it becomes smarter, so Senzing ER results improve over time. Furthermore, when Senzing makes a specific discovery (e.g., that (800) 555-1212 is a common phone number), it not only takes this into account in the future but also re-evaluates all previous assertions involving this phone number. The ability to use new observations to reverse earlier assertions (also referred to as “Sequence Neutrality”) in real time and at scale is non-trivial.
Business Value: This is exceptionally important for two reasons:
- Experts are no longer required to train and tune the ER system, significantly reducing the time, cost, and expertise needed.
- As new discoveries are made, Senzing automatically fixes the past, updating the system in real time like no other ER system is capable of doing. Traditional ER methods (without Sequence Neutrality) have to re-train and re-load on a periodic basis to correct for “error drift” in the data, whereas Senzing self-corrects historical false positives and false negatives as additional data records are continuously processed.
Related Information: Smart Systems Flip-Flop
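To make Sequence Neutrality concrete, here is a minimal sketch in Python. It is not Senzing's algorithm: the matching rule (records sharing a phone number resolve together), the `GENERIC_THRESHOLD` value, and the sample records are all assumptions made for illustration. The toy also recomputes everything on each call, which a real-time engine would avoid. It only shows the outcome: a later discovery reverses an earlier assertion, and the final result does not depend on arrival order.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical rule: a phone seen with 3+ distinct names is treated as "generic".
GENERIC_THRESHOLD = 3


def resolve(records):
    """Group record ids into entities by shared, non-generic phone numbers."""
    names_by_phone = defaultdict(set)
    ids_by_phone = defaultdict(set)
    for rec_id, name, phone in records:
        names_by_phone[phone].add(name)
        ids_by_phone[phone].add(rec_id)

    entities = set()
    for phone, ids in ids_by_phone.items():
        if len(names_by_phone[phone]) >= GENERIC_THRESHOLD:
            # Generic phone: no longer evidence, so each record stands alone.
            entities.update(frozenset([rec_id]) for rec_id in ids)
        else:
            entities.add(frozenset(ids))
    return entities


early = [(1, "Ann Lee", "800-555-1212"), (2, "A. Lee", "800-555-1212")]
later = early + [(3, "Bob Ray", "800-555-1212")]

print(resolve(early))  # records 1 and 2 resolve to one entity
print(resolve(later))  # the phone is now generic, so the earlier merge is reversed

# Sequence neutrality: every arrival order yields the same entities.
assert all(resolve(list(p)) == resolve(later) for p in permutations(later))
```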
Entity Centric Learning
Entity resolution benefits from all of the data features contained within a resolved entity. This is in contrast to traditional 1:1 record matching methods and techniques.
Business Value: Entity centric learning is required to effectively perform ER in 'weak signal' environments. Examples of weak signal environments include professional criminals obfuscating their identities, low-fidelity data sources, and seemingly incompatible data sources.
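The difference from 1:1 record matching can be sketched as follows. This is an illustrative toy, not the Senzing implementation: the feature names, the sample records, and the `MATCH_THRESHOLD` of two shared feature values are assumptions. A new record shares only one feature with each individual record, yet shares two with the resolved entity that holds the union of their features.

```python
MATCH_THRESHOLD = 2  # hypothetical: two shared feature values count as a match


def as_features(record):
    """Turn a flat record into a feature dict of field -> set of values."""
    return {field: {value} for field, value in record.items()}


def merge(entity, record):
    """Return a new entity holding the union of all feature values."""
    merged = {field: set(values) for field, values in entity.items()}
    for field, value in record.items():
        merged.setdefault(field, set()).add(value)
    return merged


def shared_features(left, right):
    """Count feature values that two feature dicts have in common."""
    return sum(len(values & right.get(field, set())) for field, values in left.items())


rec_a = {"name": "Liz Smith", "phone": "555-0100", "email": "liz@example.com"}
rec_b = {"name": "Elizabeth Smith", "phone": "555-0100",
         "email": "liz@example.com", "address": "12 Oak St"}
rec_c = {"name": "Liz Smith", "address": "12 Oak St"}  # the new, weak-signal record

entity_ab = merge(as_features(rec_a), rec_b)  # A and B share phone and email
new = as_features(rec_c)

print(shared_features(new, as_features(rec_a)))  # 1: below the threshold
print(shared_features(new, as_features(rec_b)))  # 1: below the threshold
print(shared_features(new, entity_ab))           # 2: matches the resolved entity
```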
Relationship Resolution
Senzing supports both disclosed and derived relationships. When taken together, these relationships allow Senzing to fully capture and manage the entire resolved entity graph. Disclosed relationships are those known to exist between entities that are reported as observations to Senzing (e.g., family members, employment affiliations, and business hierarchies). Derived relationships are discovered in real time as new data records are ingested and analyzed (e.g., two distinct entities sharing a physical address could be roommates, twins, etc.).
Business Value: Comprehending relationships between resolved entities improves context, and with this added context come higher-quality business decisions. For example, if a loan application contains a personal reference who is a roommate of a known criminal, this might warrant further investigation before finalizing a credit decision.
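As a rough illustration of how disclosed and derived relationships combine into one graph, consider the sketch below. The entity ids, the data layout, and the relationship labels are hypothetical, and sharing an address is the only derivation rule shown. It is not a description of how Senzing derives relationships internally.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical resolved entities with their known addresses.
entities = {
    "E1": {"name": "Pat Jones", "addresses": {"12 Oak St"}},
    "E2": {"name": "Sam Reed", "addresses": {"12 Oak St", "9 Elm Ave"}},
    "E3": {"name": "Kim Diaz", "addresses": {"40 Pine Rd"}},
}

# Disclosed relationships are reported directly as observations.
disclosed = [("E3", "E1", "employer_of")]


def derive_relationships(entities):
    """Derive an edge between every pair of distinct entities sharing an address."""
    ids_by_address = defaultdict(set)
    for entity_id, entity in entities.items():
        for address in entity["addresses"]:
            ids_by_address[address].add(entity_id)

    derived = []
    for address in sorted(ids_by_address):
        for a, b in combinations(sorted(ids_by_address[address]), 2):
            derived.append((a, b, f"shares_address:{address}"))
    return derived


# The full graph is the union of disclosed and derived edges.
graph = disclosed + derive_relationships(entities)
for edge in graph:
    print(edge)
```

Here E1 and E2 become related only because they share an address, while the E3 to E1 edge was supplied as an observation.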
Real-Time Transactional Entity Resolution
Senzing delivers low-latency, high-transaction-rate entity and relationship resolution.
Business Value: Real-time, entity-resolved, and graphed data allows Senzing to be deployed into operational and mission-critical systems. Transactional entity resolution incrementally integrates new observations in real time, eliminating the need to periodically refresh the entire data store. This has huge performance implications: integrating the latest 10k additions or changes transactionally (each using insignificant compute) is very different from running a batch-based system that must reprocess the entire data set to integrate the same 10k transactions.
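The scale of that difference can be illustrated with a toy comparison count. Everything here is an assumption made for the sketch: the `IncrementalIndex` class, the use of an inverted index to find candidates, the record counts (10 new records instead of 10k, for brevity), and the naive all-pairs batch baseline. Real batch systems typically reduce work with blocking, so the numbers are illustrative only.

```python
from collections import defaultdict


class IncrementalIndex:
    """Toy inverted index: feature value -> ids of records that carry it."""

    def __init__(self):
        self.by_value = defaultdict(set)
        self.comparisons = 0  # candidate comparisons performed so far

    def add(self, rec_id, features):
        """Integrate one record, comparing it only with records sharing a value."""
        candidates = set()
        for value in features:
            candidates |= self.by_value[value]
        self.comparisons += len(candidates)
        for value in features:
            self.by_value[value].add(rec_id)
        return candidates


index = IncrementalIndex()
for i in range(1000):   # initial load: 1,000 records with no shared values
    index.add(i, {f"phone-{i}"})
for i in range(10):     # 10 new records, each sharing one existing value
    index.add(1000 + i, {f"phone-{i}"})

total = 1010
batch_pairs = total * (total - 1) // 2  # naive all-pairs reprocessing of everything
print(index.comparisons, batch_pairs)   # 10 509545
```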
Multi-Cultural Names
Senzing integrates IBM’s Global Name Management (GNM) technology for culturally aware and linguistically sensitive name comparisons. GNM is the only government-certified name comparison algorithm on the market.
Business Value: Leveraging the world-class GNM library, Senzing is able to compare names for similarity across cultures and scripts, an essential feature that delivers higher-quality entity resolution.
Related Information: IBM InfoSphere Global Name Management
Selective Anonymization - A Privacy by Design (PbD) Feature
Senzing provides the ability to anonymize entity attributes (including geospatial data) at the system of record, before data is transferred to Senzing. (Note: Senzing software is not a cloud service. You download Senzing software and deploy it either on premises or in your private cloud. No personal data ever flows to Senzing.) The ability to perform high-quality (fuzzy) entity and relationship resolution using only these anonymized features allows for data sharing and insight in use cases where releasing and sharing sensitive data was previously impossible.
Business Value: Because this differentiating technique greatly reduces the risk of unintended disclosure, it increases the willingness of data owners to participate in information-sharing ecosystems.
Related Information: To Anonymize or Not Anonymize, That is the Question
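The general idea of anonymizing attributes at the system of record can be sketched with a keyed hash. This is a swapped-in, simplified technique and not Senzing's method: the `SHARED_KEY`, the `normalize` rule, and the sample records are hypothetical, and the sketch only supports exact matches on normalized values, not the fuzzy matching on anonymized features described above.

```python
import hashlib
import hmac

# Hypothetical key agreed between data owners; real keys need secure management.
SHARED_KEY = b"example-key-agreed-by-data-owners"


def normalize(value):
    """Lowercase and keep only letters and digits so formatting differences vanish."""
    return "".join(ch for ch in value.lower() if ch.isalnum())


def anonymize(record, key=SHARED_KEY):
    """Replace each attribute value with a keyed SHA-256 digest (HMAC)."""
    return {
        field: hmac.new(key, f"{field}:{normalize(value)}".encode(),
                        hashlib.sha256).hexdigest()
        for field, value in record.items()
    }


# Each data owner anonymizes locally; only digests leave the system of record.
source_a = anonymize({"phone": "(702) 555-0143", "email": "Pat.Jones@example.com"})
source_b = anonymize({"phone": "702-555-0143", "email": "pat.jones@example.com"})

matching_fields = [f for f in source_a if source_a[f] == source_b.get(f)]
print(matching_fields)  # ['phone', 'email']
```

The digests can be compared for equality without either party seeing the other's raw phone number or email address.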
Principle-Based Entity Resolution
Principle-based entity resolution is a breakthrough technique that blends fuzzy matching with deterministic and probabilistic methods in a high-performance, self-tuning, self-correcting, entity-agnostic (e.g., people, organizations, cars, boats, planes) entity resolution and relationship discovery engine. Entity resolution no longer requires experts, training data, or manual tuning.
Business Value: No experts and no training means users can start loading observations (data) almost immediately, significantly decreasing time to value. New data sources, new entity classes, and new entity features (attributes) can be added with ease, without experts or preliminary training steps.
Related Information: Principle-based Entity Resolution and 10,000 Rules, Oh the Joy!
In closing, the Senzing technology does much, much more.
With a suite of Privacy by Design (PbD) features built in, feedback loop support, and an underlying schema designed for linear scale on elastic compute infrastructures, Senzing is unique.
For additional information on the topics in this article, please review the Uniquely Senzing white paper.