Entity Resolution 2.0
Following a few decades of experience building some of the largest commercially available, real-time Entity Resolution systems in existence, a ground-up skunkworks project began within IBM in 2009.
Drawing on a combination of best practices we have picked up over the years and exciting new breakthroughs, what we have today is unlike any other Entity Resolution technology on the market: G2 is nothing short of Entity Resolution 2.0.
The following outlines a few of G2's unique features and capabilities.
Computationally Efficient Self-learning & Self-tuning
As G2 accumulates additional context it becomes smarter, so Entity Resolution results improve over time. Furthermore, when G2 makes a specific discovery (e.g., (800) 555-1212 is a common phone number), G2 not only takes this into account in the future but also re-evaluates all previous assertions involving this phone number. The ability to use new observations to reverse earlier assertions (also referred to as “Sequence Neutrality”) in real time and at scale is non-trivial.
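The phone number example can be made concrete with a toy sketch. This is not G2's algorithm: the `ToyResolver` class, the `generic_threshold` parameter and the "shared phone means same entity" rule are all assumptions made purely for illustration. It shows one way a new observation can reverse earlier assertions while touching only the records that involve the affected phone number.

```python
from collections import defaultdict


class ToyResolver:
    """Hypothetical toy resolver: records sharing a phone resolve together
    until that phone is discovered to be generic (shared by too many names)."""

    def __init__(self, generic_threshold=2):
        self.generic_threshold = generic_threshold
        self.names = {}                    # record_id -> name
        self.by_phone = defaultdict(list)  # phone -> record_ids seen so far
        self.generic_phones = set()        # phones no longer trusted for matching
        self.entity_of = {}                # record_id -> entity_id

    def add(self, record_id, name, phone):
        self.names[record_id] = name
        self.by_phone[phone].append(record_id)
        holders = self.by_phone[phone]

        if phone not in self.generic_phones:
            distinct_names = {self.names[r] for r in holders}
            if len(distinct_names) > self.generic_threshold:
                # New discovery: this phone is common. Reverse every earlier
                # assertion that relied on it, touching only these records.
                self.generic_phones.add(phone)
                for r in holders:
                    self.entity_of[r] = r
                return

        if phone in self.generic_phones:
            # A generic phone is ignored, so the record stands alone.
            self.entity_of[record_id] = record_id
        else:
            # Join the entity of the first record that carried this phone.
            self.entity_of[record_id] = self.entity_of.get(holders[0], record_id)


resolver = ToyResolver(generic_threshold=2)
resolver.add("r1", "Ann Lee", "8005551212")
resolver.add("r2", "A. Lee", "8005551212")
merged_before = resolver.entity_of["r1"] == resolver.entity_of["r2"]

# A third distinct name on the same number triggers the discovery.
resolver.add("r3", "Bob Ray", "8005551212")
merged_after = resolver.entity_of["r1"] == resolver.entity_of["r2"]
```

In this sketch `merged_before` and `merged_after` differ: the earlier merge of `r1` and `r2` is undone as soon as the phone number proves to be common, without reloading the other records.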
Business Value: This is exceptionally important for two reasons:
- Experts are no longer required to train and tune the Entity Resolution system, which significantly lowers the barrier to entry in terms of time, cost and experience.
- Automatically fixing the past as new discoveries are made keeps the system current in real time, something no other Entity Resolution system is capable of doing. Traditional Entity Resolution methods (without G2's Sequence Neutrality) have to re-train and re-load on a periodic basis to correct for “error drift” in the data, whereas G2 self-corrects historical false positives and false negatives as additional data records are continuously processed.
Related Information: Smart Systems Flip-Flop
Entity Centric Learning
Entity Resolution benefits from all of the data features contained within a resolved entity, in contrast to traditional 1:1 record matching methods and techniques.
Business Value: Required to effectively perform Entity Resolution in 'weak signal' environments. Examples of weak signal environments include professional adversaries attempting to obfuscate their identities, low-fidelity data sources, and seemingly incompatible data sources.
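A minimal sketch of the difference, under stated assumptions: the matching rule (at least `MIN_SHARED` identical attribute values), the helper names and the sample records are all hypothetical and do not describe how G2 scores candidates. The third record shares only one value with each earlier record, but two values with the entity those records form together.

```python
MIN_SHARED = 2  # hypothetical rule: resolve when two or more values agree


def as_features(record):
    """Turn a single record into a feature map of attribute -> set of values."""
    return {key: {value} for key, value in record.items()}


def merge(features, record):
    """Return a new feature map that also contains the record's values."""
    merged = {key: set(values) for key, values in features.items()}
    for key, value in record.items():
        merged.setdefault(key, set()).add(value)
    return merged


def overlap(record, features):
    """Count record attributes whose value already appears in the features."""
    return sum(1 for key, value in record.items()
               if value in features.get(key, set()))


r1 = {"name": "Bob Smith", "dob": "1980-01-01", "phone": "555-0100"}
r2 = {"name": "Bob Smith", "dob": "1980-01-01", "email": "bob@example.com"}
r3 = {"phone": "555-0100", "email": "bob@example.com"}

# r1 and r2 agree on name and date of birth, so they form one entity.
entity = merge(as_features(r1), r2)

# 1:1 record matching: r3 is compared with each earlier record in isolation.
record_level = max(overlap(r3, as_features(r1)), overlap(r3, as_features(r2)))

# Entity-centric matching: r3 is compared with everything the entity knows.
entity_level = overlap(r3, entity)

matches_by_record = record_level >= MIN_SHARED
matches_by_entity = entity_level >= MIN_SHARED
```

Here `matches_by_record` is false while `matches_by_entity` is true, which is the kind of weak-signal case where accumulating features on the entity pays off.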
G2 supports both disclosed and derived relationships. Taken together, these relationships allow G2 to fully capture and manage the entire resolved entity graph. Disclosed relationships are those known to exist between entities and reported as observations to G2 (e.g., family members, employment affiliations, and business hierarchies). Derived relationships are discovered at run-time by G2 as new data records are ingested and analyzed (e.g., two distinct entities sharing a physical address could be roommates, twins, etc.).
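The two kinds of relationship can be sketched as edges in one graph. This is an illustration only: the entity IDs, the `shares_address` label and the `derive_relationships` helper are hypothetical names, and exact address equality stands in for whatever comparison G2 actually performs.

```python
from collections import defaultdict
from itertools import combinations

# Resolved entities with one attribute each, kept deliberately small.
entities = {
    "E1": {"address": "12 Oak St"},
    "E2": {"address": "12 Oak St"},
    "E3": {"address": "9 Elm Ave"},
}

# Disclosed relationships arrive as observations from a source system.
disclosed = {("E1", "E3", "employed_by")}


def derive_relationships(entities):
    """Derive an edge for every pair of distinct entities sharing an address."""
    by_address = defaultdict(set)
    for entity_id, attrs in entities.items():
        by_address[attrs["address"]].add(entity_id)

    edges = set()
    for ids in by_address.values():
        for a, b in combinations(sorted(ids), 2):
            edges.add((a, b, "shares_address"))
    return edges


derived = derive_relationships(entities)

# The resolved entity graph holds both kinds of edge side by side.
graph = disclosed | derived
```

Keeping the edge type on every tuple lets a downstream decision, such as the credit check described in the text, distinguish what was reported from what was inferred.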
Business Value: Comprehending relationships between resolved entities improves context – with this added context comes higher quality business decisions. For example, if a loan application contains a personal reference who is roommates with Billy the Kid, this might warrant further investigation before finalizing a credit decision.
Real Time Transactional Entity Resolution
G2 delivers low-latency, high-transaction-rate Entity and Relationship Resolution.
Business Value: Real-time, entity-resolved and graphed data allows G2 to be deployed into operational and mission-critical systems. Transactional Entity Resolution incrementally integrates new observations in real time, eliminating the need to periodically refresh the entire data store. This has huge performance implications: there is a big difference between integrating the latest 10k additions or changes transactionally (each using insignificant compute) and a batch-based system that must reprocess the entire data set (re-boil the ocean) to integrate the latest 10k transactions.
G2 integrates IBM’s Global Name Management (GNM) technology for culturally aware and linguistically sensitive name comparisons. GNM is the only government-certified name comparison algorithm on the market.
Business Value: Leveraging the world-class GNM library, G2 is able to compare names for similarity across cultures and scripts – an essential feature that delivers higher quality Entity Resolution.
Related Information: IBM InfoSphere Global Name Management
Selective Anonymization - A Privacy by Design (PbD) feature
G2 provides the ability to anonymize entity attributes (including geospatial data) at the system of record, before data is transferred to G2 for Entity Resolution. The ability to perform high quality (fuzzy) Entity and Relationship Resolution using only these anonymized features allows for data sharing and insight in situations and use cases where the release and sharing of sensitive data was previously impossible.
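The source does not say how G2 anonymizes attributes, so the following is only a sketch of one common approach: normalize each value at the system of record, then replace it with a keyed one-way hash (HMAC-SHA-256) so that equal values still compare equal downstream. The `SHARED_KEY` value, the `normalize` rules and the attribute names are assumptions. Fuzzy comparison on anonymized data needs more than this, for example hashing several normalized variants of a value, which is not shown here.

```python
import hashlib
import hmac
import re

# Hypothetical key that participating data owners agree on out of band.
SHARED_KEY = b"example-key-agreed-out-of-band"


def normalize(kind, value):
    """Reduce formatting noise before hashing so equal values hash equally."""
    if kind == "phone":
        return re.sub(r"\D", "", value)  # keep digits only
    return value.strip().lower()


def anonymize(record, key=SHARED_KEY):
    """Replace every attribute value with a keyed one-way hash."""
    return {
        kind: hmac.new(key, normalize(kind, value).encode("utf-8"),
                       hashlib.sha256).hexdigest()
        for kind, value in record.items()
    }


# Two source systems hold the same person with different formatting.
from_system_a = anonymize({"phone": "(800) 555-1212", "email": "Ann@Example.com"})
from_system_b = anonymize({"phone": "800.555.1212", "email": " ann@example.com"})

# Only the hashes leave each system of record, yet they still line up.
same_phone = from_system_a["phone"] == from_system_b["phone"]
same_email = from_system_a["email"] == from_system_b["email"]
```

A keyed hash is used rather than a plain one so that a party without the key cannot simply hash a list of known phone numbers and look for matches.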
Business Value: This differentiating technique greatly reduces the risk of unintended disclosure, which increases the willingness of data owners to participate in information sharing ecosystems.
Related Information: To Anonymize or Not Anonymize, That is the Question
Principle-Based Entity Resolution
Principle-based Entity Resolution is a breakthrough technique that blends fuzzy matching with deterministic and probabilistic methods into a high-performance, self-tuning, self-correcting, entity-agnostic (e.g., people, organizations, cars, boats, planes) Entity Resolution and Relationship discovery engine. Entity Resolution no longer requires experts, training data and manual tuning.
Business Value: No experts and no training means users can start loading observations (data) almost immediately, significantly decreasing time to value. New data sources, new kinds of entity classes, and new entity features (attributes) can be added with ease, without experts or preliminary training steps.
Related Information: Principle-based Entity Resolution
G2 technology does much, much more.
With a suite of Privacy by Design (PbD) features, feedback loop support, and an underlying schema designed for linear scale on elastic compute infrastructures, G2 is unique and, we're confident, a game changer.