PDF Download: https://senzing.com/er-capabilities
Abstract:
This technical overview provides a list of important capabilities to consider when evaluating whether to buy or build enterprise-grade entity resolution (ER) technology. While many of these capabilities are obvious, some are often overlooked.
Ease of Use
ER technology has historically been difficult to build and use, partly because it required a diverse range of expertise, including statistics, linguistics, and performance engineering. With easy-to-use ER, organizations of all sizes, not just the elite with huge budgets, can affordably evaluate and deploy ER.
Senzing uses a radically different approach to ER that delivers unmatched ease of use and lightning-fast ROI.
The technology is the result of a 2009 IBM skunkworks initiative, code-named G2, that was spun out of IBM in 2016 to form Senzing.
Accuracy
The numbers of false positives and false negatives, and the F1 score that combines them, define the accuracy of an ER system. The robustness of a system’s name, address, and other feature comparison techniques and its underlying matching method are two of the main contributors to ER accuracy.
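As a rough illustration of how these counts combine, the sketch below computes precision, recall, and F1 from match counts. The function name and the sample numbers are hypothetical and chosen only for illustration; this is the standard F1 formula, not a description of how Senzing measures or reports accuracy.

```python
def f1_score(true_pos: int, false_pos: int, false_neg: int) -> float:
    """Return the F1 score (harmonic mean of precision and recall)."""
    if true_pos == 0:
        # No correct matches: precision and recall are zero or undefined.
        return 0.0
    precision = true_pos / (true_pos + false_pos)  # share of reported matches that are correct
    recall = true_pos / (true_pos + false_neg)     # share of true matches that were found
    return 2 * precision * recall / (precision + recall)


# Hypothetical audit: 90 correct matches, 10 false positives, 30 false negatives.
score = f1_score(true_pos=90, false_pos=10, false_neg=30)
print(round(score, 3))  # 0.818
```

Because F1 penalizes both kinds of error, a system cannot score well by over-matching (many false positives) or by under-matching (many false negatives).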
When used with real data, Senzing routinely produces more accurate results than humans. Senzing is so accurate that organizations can also use it to quickly audit the accuracy of their existing ER algorithms.
Real Time
Many organizations benefit from batch-based ER today but recognize they will need real-time ER in the future to remain competitive. Transforming a homegrown batch-based ER system into a real-time system is not possible without significant reengineering. Since real-time systems also support batch data, we recommend choosing an ER system that is natively real time, even if real-time capabilities are not needed today. This will ensure readiness for real-time ER at a moment’s notice.
The engineering effort to create Senzing, the first real-time AI for ER, began with a one-year project to design its underlying database schema. Senzing’s unique schema allows our real-time learning algorithms to deliver unmatched accuracy over billion-record systems with low latency.
Privacy by Design
Privacy by design (PbD) is an approach to systems engineering initially developed by Ann Cavoukian. The framework was published in 2009 and adopted as a standard by the International Assembly of Privacy Commissioners and Data Protection Authorities in 2010. PbD calls for the consideration of privacy throughout the entire engineering process.
The Senzing team has been dedicated to PbD since inception. The underlying technology in Senzing (code-named G2) was first announced publicly in 2011 on Data Privacy Day.
Developer Focused
Commercially available ER has not historically focused on developers. Today, many developers, data engineers, and data scientists are looking for ways to quickly add ER to their projects with easy-to-use, componentized technology.
Senzing is unique in that it makes the complicated task of ER easy for developers, whether it is deployed in the cloud or on-prem using Kubernetes or Docker, or on bare metal.
Operational Impact
Human resources are required to operate production ER deployments, e.g., for database maintenance, onboarding new data sources, and technical support for users. Some ER capabilities reduce operational expenses while making ER more agile and responsive to shifting markets and enterprise innovation goals.
The operational cost of Senzing is dramatically lower than that of other ER technologies. Sometimes operational cost savings alone justify the entire return on investment (ROI) for Senzing. An example is ERIC, a nonprofit organization modernizing voter registration in America. ERIC has a Senzing ER system that contains more than 350M records representing two-thirds of America’s voters in 30 states. Until recently, ERIC had an IT department of just one person managing Senzing and all other IT requirements. More details on ERIC can be found in the New York Times story “Another Use for A.I.: Finding Millions of Unregistered Voters.”
Relationship Awareness
Most entity resolution methods perform only basic ER to determine who is who. When a system also identifies relationships, or who is related to whom, the ER results are much more useful.
Senzing builds, persists, and manages relationships between entities in an entity-resolved graph in real time at scale. Relationships are available transactionally during streaming ER or ad hoc via the Senzing API.
Globalization
Data in an ER system rarely involves a single culture. It is essential, and an ethical obligation, to be able to perform accurate ER over culturally diverse data.
The Senzing team has been supporting culturally aware ER for decades. Our focus is reflected in the accurate outcomes the Senzing API delivers from culturally diverse data.
Conclusion
An organization may not require all of the ER capabilities, or the scale and performance, discussed above. Yet all organizations benefit to some degree when such capabilities are available in an easy-to-use offering at an affordable price.
At Senzing, we are making this possible. We have literally dedicated most of our lives to this mission.
We hope you enjoy our technology.