The general guidance for completing a successful Senzing Proof of Concept (PoC) is in line with nearly any other PoC you undertake: understand the required technical and business goals, define the scope and the success criteria to measure against those goals, ensure capable human and system resources are available, and hold regular progress checks.
This article highlights key considerations for a successful PoC. It isn't a PoC project plan.
If you are considering evaluating Senzing, it's almost certain you have at least one use case and various goals you'd like to explore and achieve. Common goals and aspects we hear from our customers include the low entry cost of using Senzing, quality of results and outcomes, rapid return on investment (ROI), rapid deployment and time to value, ease of adding new data, the availability of the embeddable SDK or REST API, and the fact that Senzing never 'calls home' or needs to send data externally. Of these aspects, nearly all evaluations of Senzing focus on the ease of adding data and the quality of the results. We will focus on those.
Senzing does not sell services. Part of the value of a Senzing evaluation, and of any subsequent deployment, is that your own resources lead the evaluation, with Senzing support available as required. Your team will learn Senzing; if they are experienced with entity resolution systems, they will rapidly realize that using Senzing removes an order of magnitude of complexity from performing entity resolution.
Rightsizing for Success
Generally, the most important considerations to start with and understand are:
- What data is required to demonstrate success to the business? Where is the data, which sources of data are required, who owns the data and can it be used for the evaluation, how many records are required to satisfy success outcomes?
- What system resources are quickly and easily available to support the evaluation? Review the System Requirements, API Hardware Sizing Guide and Disk I/O Performance articles.
- What human resources and skills are available to support the evaluation and iterative analysis of the results?
If your evaluation use case is to ingest, entity resolve and analyze 1 billion records across multiple data sources, but all you have is one part-time person and a Windows VM, then there is a mismatch between expectations and resources. A part-time resource and a Windows VM would be great for 1 million records using the Senzing Desktop App, but not for a large-scale Senzing API evaluation.
Selecting the right data
Typically you will be working with a subset of your overall data, both in the number of records and the number of sources. When selecting data, it is important that the data will actually support matching.
- Don't randomly select data but instead pick everyone with a last name that starts with 'A' -- or some similar approach. This way there is a general expectation that you'll get multiple records for the same person.
- Pick data sources that both demonstrate the necessary business cases -- e.g. matching claims to customers -- and have at least a couple of overlapping attributes -- e.g. name and address.
- Don't limit the number of features you send to Senzing. If one source has name, address, phone, etc. and the other has name, phone, and date of birth, make sure to send all the features even though they aren't in common across the sources.
- If your organization has specific data cases that are important to evaluate, don't be afraid to mock up specific records to demonstrate them in case the selected data doesn't happen to have examples. Feel free to look at the How to create an entity resolution truth set article for ideas.
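As an illustration of the first point above, a non-random subset can be pulled with a simple filter. This is only a minimal sketch: the column names (CUSTOMER_ID, FIRST_NAME, LAST_NAME) and the helper name are hypothetical and would need to be replaced with whatever your source actually uses.

```python
import csv
import io


def select_by_last_name_prefix(rows, field="LAST_NAME", prefix="A"):
    """Keep rows whose last name starts with the prefix (case-insensitive)."""
    prefix = prefix.upper()
    return [
        row for row in rows
        if (row.get(field) or "").strip().upper().startswith(prefix)
    ]


# Hypothetical sample data standing in for a real source extract.
sample = io.StringIO(
    "CUSTOMER_ID,FIRST_NAME,LAST_NAME\n"
    "1,Ann,Adams\n"
    "2,Bob,Baker\n"
    "3,Anne,adams\n"
)

subset = select_by_last_name_prefix(csv.DictReader(sample))
print([row["CUSTOMER_ID"] for row in subset])
```

Applying the same filter to every source in the evaluation keeps the overlap between sources intact, which a random sample would not.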
It is important that the environment you choose for your PoC fits your team's skills and capabilities. Some options to consider:
- Desktop App: This is limited to the default Senzing configuration (Senzing on 'full autopilot') and 1 or 2 million records. It will run well on many Windows or macOS laptops and requires minimal IT skills. The Desktop App can be downloaded from the senzing.com website.
- Bare metal Linux: The most common method of deployment. Senzing is natively installed on Red Hat or Debian-based systems. Getting started instructions can be found in the Quickstart Guide.
- Docker / Kubernetes / OpenShift / CloudFormation: A good fit if your team already uses and prefers one of these platforms.
Mapping data is the process of informing Senzing what the fields in your data sources represent. Consider a CSV file where an individual's full name is stored in a column called NAME. To inform Senzing this field describes all the tokens comprising an individual name, you would modify the header row of the CSV file and change NAME to NAME_FULL. NAME_FULL is the term informing Senzing what to expect in this column and how to use it for all functions of entity resolution.
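The header change described above can also be scripted rather than done by hand. The following is a minimal sketch using only the Python standard library; the helper name and the sample ID column are assumptions for illustration, and only the NAME to NAME_FULL rename comes from the example above.

```python
import csv
import io


def rename_headers(src, dst, mapping):
    """Copy a CSV stream, renaming header columns found in `mapping`."""
    reader = csv.reader(src)
    writer = csv.writer(dst, lineterminator="\n")
    header = next(reader)
    # Columns not listed in the mapping keep their original name.
    writer.writerow([mapping.get(col, col) for col in header])
    writer.writerows(reader)


# Hypothetical source file content; ID is left unmapped here.
source = io.StringIO("ID,NAME\n1001,Robert Smith\n")
mapped = io.StringIO()
rename_headers(source, mapped, {"NAME": "NAME_FULL"})
print(mapped.getvalue())
```

With real files, the two StringIO objects would be replaced by files opened with `newline=""`, as the `csv` module documentation recommends.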
Unlimited support is included with an evaluation license, and we will help you map your data sources. Typically, mapping data for processing in Senzing is straightforward, with the out-of-the-box configuration covering most scenarios. The initial mapping process usually takes less than 30 minutes per data source.
During a PoC it's important to have people who understand the desired outcomes, the available data and its schema, and who have access to use that data.
There are a few things to be cognizant of:
- Senzing requires structured data but is very flexible in how each attribute can be provided for ingestion. You do need to map each value to the appropriate field.
- Each record must include attributes that identify one and only one entity. For instance, if you are mapping a contact from your address book, the name, home address, email, phone number, etc. belong to the person entity. The company name, company address, company phone number and company website describe their employer, which is a separate and distinct entity.
In such a scenario the person entity and the company entity would each be extracted from the data source and mapped accordingly, with each sent to Senzing as a distinct record.
- Senzing uses highly sophisticated, domain-aware name processing, which includes culturally aware person name matching and organizational name domain knowledge. Senzing does need to be told whether a name is a personal name (NAME_FIRST, NAME_MIDDLE, NAME_LAST or NAME_FULL) or an organizational name (NAME_ORG).
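The address book scenario above can be sketched as follows. This is an illustration under stated assumptions, not a definitive mapping: the input keys are hypothetical, NAME_FULL and NAME_ORG are the terms named in this article, and the other attribute names (ADDR_FULL, PHONE_NUMBER, EMAIL_ADDRESS) are assumed here and should be confirmed against the Generic Entity Specification. The sketch also does not express the employer relationship between the two records.

```python
import json


def split_contact(contact):
    """Split one address-book row into a person record and a company record."""
    person = {
        "NAME_FULL": contact.get("name"),            # personal name
        "ADDR_FULL": contact.get("home_address"),    # assumed attribute name
        "PHONE_NUMBER": contact.get("phone"),        # assumed attribute name
        "EMAIL_ADDRESS": contact.get("email"),       # assumed attribute name
    }
    company = {
        "NAME_ORG": contact.get("company_name"),     # organizational name
        "ADDR_FULL": contact.get("company_address"),
        "PHONE_NUMBER": contact.get("company_phone"),
    }
    # Drop attributes that are missing or empty in the source row.
    return tuple({k: v for k, v in rec.items() if v} for rec in (person, company))


# Hypothetical address-book entry.
contact = {
    "name": "Maria Lopez",
    "home_address": "12 Elm St, Springfield",
    "phone": "555-0100",
    "email": "",
    "company_name": "Acme Widgets Inc",
    "company_address": "900 Industrial Way, Springfield",
    "company_phone": "555-0199",
}

person, company = split_contact(contact)
print(json.dumps(person))
print(json.dumps(company))
```

Each record carries attributes for exactly one entity, and the name field tells Senzing whether to treat the name as a person or an organization.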
Data mapping is beyond the scope of this article. For additional information on getting started with data mapping using both CSV and JSON, see the Generic Entity Specification.
Evaluating Results and Outcomes
During a PoC you may only want, for example, a CSV output of the results showing how records connect into entities and how those entities are related. Alternatively, you may want to export or replicate similar information to a warehouse for further analysis, possibly joining it back to the original data sources. Both of these, and similar scenarios, are easily accomplished. Additionally, see the Exploratory Data Analysis Tools, which can be used to dynamically explore, compare and analyze results.
We're ready and waiting to help accelerate your Senzing evaluation and to discuss any of the topics here further. If you'd like to reach out to us, please do so at firstname.lastname@example.org or by opening a support ticket.