Entity resolution is the process that determines who is who and who is related to whom within and across your data. It differs from simple record matching: entity resolution creates a complete resume of an entity, including all the names you know them by, places you have recorded that they have lived, email addresses they have used, and so on. Senzing also deduces how resolved entities are related.

Senzing is specifically designed to support people and organization entities. For entity resolution to be effective, these entities need names, addresses, phone numbers, identifiers and other attributes. Note: If you have name-only data, or names plus very limited information (e.g., city or gender), Senzing is not for you.

Senzing is perfectly suited to entity resolve your customers, prospects, employees, vendors, watch lists and more.

Requirements

Operating System

Windows 10 64-bit or later
macOS (recent version)

Minimum Hardware Specifications

4 cores
16GB RAM
250GB flash storage (SSD or NVMe)

Downloading

Download

Features

Load Your Data

Load the exported entity data from your data sources into Senzing.

Review Resolution

Review entity resolution outcomes both within a data source and across data sources. How many duplicate customer records do you have? How important or interesting are the discovered relationships to you?

Single Subject Search

Search for a particular entity and obtain everything you know about them. This includes all the data sources their data can be found in and who they are related to.

Entity Resume with Graph

View a concise resume detailing entities including a graph of any relationships they have.

Export Results

Export resolved entity data.

Manage Projects

Manage multiple projects with different sources of data.

Tip

Click the Get Help button in the upper right portion of the app screen to request help from Senzing support.

Startup Demo Project

Loading Data

Upon initial startup you are presented with a demo project that contains sample data ready to load into Senzing. A project contains data to be loaded, reviewed and searched. There can be multiple projects, each containing a different set of data.

Click OPEN on the Demo Project card to view the data to load in the demo project.

The three data sources are CSV files that contain data describing customers, some reference data and a watch list. Data source is the term used to describe a single source of data whose contents have been extracted from a source system. For example, the customers.csv for the CUSTOMERS data source may have been exported from a CRM system. The data in each of these data sources is what gets loaded into Senzing and processed by entity resolution.

To review the data contained in each data source, click on the MAPPING button on a data source card. Don't click on or adjust anything; mapping will be covered a little later. Click the blue back arrow in the upper left to go back.

To load the data and complete the entity resolution processing, click the LOAD button on each of the three data sources. Once a data source has loaded, all of its processing has been completed and its LOAD button changes to REVIEW. It's now time to review the outcomes of entity resolution.

Reviewing Outcomes

Click on REVIEW on the CUSTOMERS card.

There are three categories of matches:

Duplicates
- Records from within and across the data sources have resolved together to form a single entity.
Possible Duplicates
- Entities are strongly related and could possibly be the same. They share common data attributes, but either there isn't enough information at this time to match them together or conflicting data is holding them apart.
Possibly Related
- Entities are related. They share common data attributes, but they are not the same.

On the current screen, Duplicates from the CUSTOMERS data source are being reviewed, as indicated in the header. There are 36 such entities.

Considering the entity Robert Smith, note:

It is comprised of 4 CUSTOMERS data source records that have resolved together to form an entity
The Robert Smith entity is designated as Entity ID 1
There are variations in the names
There are transpositions in the dates of birth

Now click on the number in the Possibly Related category.

Considering the entity Eddie Kusha, note:

Eddie Kusha is an entity comprised of multiple records
Under his records is an entity, Mary Kusha, also comprised of multiple records
Eddie Kusha has a relationship with Mary Kusha, note her entity ID in the Related Entity ID column
They are related because they share an address, email and surname (+), but their dates of birth and SSNs are different (-). This is shown in the Match Key column
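
To make the plus and minus notation concrete, here is a minimal Python sketch that splits a match key into shared and conflicting features. The exact string format and the feature names are assumptions modeled on the Eddie Kusha example above; the app may display match keys differently.

```python
import re


def parse_match_key(match_key):
    """Split a match key into shared (+) and conflicting (-) features."""
    shared, conflicting = [], []
    # Each token is assumed to be a sign followed by a feature name, e.g. "+ADDRESS".
    for sign, feature in re.findall(r"([+-])([A-Z_]+)", match_key):
        if sign == "+":
            shared.append(feature)
        else:
            conflicting.append(feature)
    return {"shared": shared, "conflicting": conflicting}


# Hypothetical match key modeled on the Eddie Kusha / Mary Kusha relationship.
result = parse_match_key("+ADDRESS+EMAIL+SURNAME-DOB-SSN")
print(result)
```

Reading a match key this way tells you why two entities are related (the shared features) and what is keeping them apart (the conflicting ones).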

When multiple data sources are loaded, the review screen lets you compare two data sources and see how many entities are composed of records from both. To do this, select two data sources in the selectors on the review screen. The following example specifies CUSTOMERS and WATCHLIST.

The overlapping segment shows how many entities are composed of records from both the customers data source and the watch list data source. In the above example there are 6 entities composed this way. Notice how entity 5 is composed of 2 records, one from each selected data source. The outer segments represent the number of entities composed only of records from that data source. Clicking on the digits in each category and segment will update the display for the category.
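
The overlap counts can be pictured with a small Python sketch. The record list below is hypothetical sample data, not output from the app, and the sketch treats an outer segment as "entities with records from one selected source but not the other", which is an assumption about how the app counts.

```python
from collections import defaultdict

# Hypothetical (entity_id, data_source) pairs, one per loaded record.
records = [
    (5, "CUSTOMERS"), (5, "WATCHLIST"),
    (6, "CUSTOMERS"), (6, "WATCHLIST"), (6, "WATCHLIST"),
    (7, "CUSTOMERS"),
    (8, "WATCHLIST"),
]


def overlap_counts(records, source_a, source_b):
    """Count entities in both sources, only in source_a, and only in source_b."""
    sources_by_entity = defaultdict(set)
    for entity_id, source in records:
        sources_by_entity[entity_id].add(source)

    counts = {"both": 0, source_a: 0, source_b: 0}
    for sources in sources_by_entity.values():
        in_a, in_b = source_a in sources, source_b in sources
        if in_a and in_b:
            counts["both"] += 1      # overlapping segment
        elif in_a:
            counts[source_a] += 1    # outer segment for source_a
        elif in_b:
            counts[source_b] += 1    # outer segment for source_b
    return counts


counts = overlap_counts(records, "CUSTOMERS", "WATCHLIST")
print(counts)
```

Note that the counts are per entity, not per record: entity 6 has three records but contributes one to the overlapping segment.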

Clicking on an Entity ID will take you to the resume for that entity. The entity resume gives you a comprehensive overview of all the records and data sources that compose an entity, as well as any relationships. Relationships between entities are also graphed.

Adjusting Review Sample Size

The review screen does not show every entity for the category being reviewed. Instead, it shows a sample of 150 results.

If you'd like to view a larger sample, click on the Sample button and choose your desired Review Sample Size from the Preferences pop-up. Larger sample sizes can incur a performance penalty during review.

Entity Centric Learning

When reviewing the matches you may see a More button in the More column. This indicates Entity Centric Learning (ECL) has occurred and there are additional records from different data sources that have contributed to the entity. Click the More button to expand and display the full set of records.

In this example detailing duplicates for the customers data source, expanding the More button for entity six reveals two additional records from the watch list data source have resolved to the entity. Senzing has resolved records from both the customer and watch list data sources together forming the single entity.

Entity Searching

Use the Search screen to search for entities.

Search is designed to find any entities that match any of the search criteria. It is not a query for a group of people who live in the same city or have the same last name, etc. In fact, you must enter the complete name (given and surname) to find a match on name. Likewise, you must provide a complete address including street address and either city or postal code to find a match on address.

Ideally, a search for a single entity would contain the following:

A complete name
A complete address and/or phone and/or email
A birth date and/or passport, driving license number or national ID
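
The completeness rules above can be sketched as a simple pre-check in Python. The `SearchCriteria` class and its field names are hypothetical and exist only for illustration; they are not part of the app. The sketch reports which criteria are complete enough to match on, based solely on the rules stated in this section.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchCriteria:
    # Hypothetical field names for illustration only.
    given_name: Optional[str] = None
    surname: Optional[str] = None
    street: Optional[str] = None
    city: Optional[str] = None
    postal_code: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None


def usable_criteria(c):
    """Return the criteria that are complete enough to find a match."""
    usable = []
    # A name needs both the given name and the surname.
    if c.given_name and c.surname:
        usable.append("name")
    # An address needs the street address plus either city or postal code.
    if c.street and (c.city or c.postal_code):
        usable.append("address")
    if c.phone:
        usable.append("phone")
    if c.email:
        usable.append("email")
    return usable


partial = SearchCriteria(surname="Smith", street="123 Main St", postal_code="89101")
print(usable_criteria(partial))
```

In this example the surname alone is not enough to match on name, so only the address would contribute to the search.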

Exporting Entity Resolution Outcomes

There are two mechanisms for exporting entity resolution results. The first is on the results review screens: the blue arrow in the top right-hand corner of the results exports the currently viewed results category to a CSV file.

You can also export the complete resolved entity data using the export facility. The Include Entities tick boxes determine the match levels to include in the export.

When you are ready to perform the export, click Create Entity Export. When the export is ready, the Download Entity Export button becomes available.

Mapping and Loading Your Own Data

Having explored the included demo project and sample data, it's time to entity resolve your data.

Locating Your Data Sources

These are your customers and prospects, possibly even your employees and vendors, or any source of records containing identity data. You can find these in your:

Address Books: Microsoft Outlook, Gmail and Lotus Notes
Customer Relationship Management (CRM): Salesforce, ACT, SugarCRM, and Microsoft Dynamics
Direct Marketing systems: MailChimp, Constant Contact, Marketo and Zoho
Web and e-Commerce: WordPress, WooCommerce, and Stripe
Accounting and HR systems: QuickBooks, Zoho, Wave, ADP, and Oracle
Spreadsheets and other files: Sometimes identifying data about entities is kept in spreadsheets or other files.

These systems typically have export functions allowing you to move their data from one system to another. Look for the option to export to Comma Separated Values (CSV). CSV is the format used by the application for reading and loading data. Check out our list of Plug and Play Data Sources for instructions on how to export their entities.

Extracting Data from Data Sources

You will want to select all the identity data available in each source. Look for:

System ID or account number - this is how you would look them up in the source system
Names - Primary name, Synonyms, Nicknames, AKA if any
Date of birth
Gender
Addresses - Home, Mailing, Alternate, etc
Phone numbers - Home, Work, Fax, Cell or Mobile
Email addresses - Possibly websites for a company
All identifiers - SSN, other National ID, Driver's License, Passport or similar

The more of these fields you provide the more accurate entity resolution will be.

Tip

For optimal entity resolution, ensure that the data sources have a minimum of a first and last name, and any two of the above-listed features.
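
Before loading, you may want to check how many rows in an exported CSV meet this minimum. The following Python sketch is one way to do that. The column names (`fname`, `dob`, and so on) and the grouping of columns into features are assumptions for illustration; adjust them to match your own export.

```python
import csv
import io

# Hypothetical column names; change these to match your own CSV headers.
NAME_COLUMNS = ("fname", "lname")
FEATURE_COLUMNS = {
    "date_of_birth": ["dob"],
    "gender": ["gender"],
    "address": ["address"],
    "phone": ["phone"],
    "email": ["email"],
    "identifier": ["ssn", "passport"],
}


def is_populated(row, column):
    """True if the column exists in the row and holds a non-blank value."""
    return bool((row.get(column) or "").strip())


def meets_minimum(row):
    """First and last name present, plus at least two other features."""
    has_full_name = all(is_populated(row, c) for c in NAME_COLUMNS)
    feature_count = sum(
        any(is_populated(row, c) for c in columns)
        for columns in FEATURE_COLUMNS.values()
    )
    return has_full_name and feature_count >= 2


SAMPLE_CSV = """fname,lname,dob,gender,address,phone,email,ssn,passport
Robert,Smith,1978-12-11,,123 Main St,,bsmith@example.com,,
Mary,,1980-01-02,F,,,,,
Eddie,Kusha,,,,,,,
"""

results = [meets_minimum(row) for row in csv.DictReader(io.StringIO(SAMPLE_CSV))]
print(results)
```

Only the first sample row passes: the second is missing a last name, and the third has a full name but no other features.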

Mapping & Loading Your Data

Once you have identified your data sources and exported them to CSV files, it is time to map and load the data. Mapping is the process of annotating the columns in a data source with terms Senzing understands to describe the fields. For example, the column header for first name in your source CSV may be named fname. Senzing recognizes a first name using the term NAME_FIRST. Your task during the mapping process is to match the Senzing term to the contents of each CSV column. It is the mapped columns that designate which data attributes will be used for entity resolution.
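
Conceptually, mapping is a lookup from your column headers to Senzing terms. The Python sketch below illustrates the idea outside the app. Only the `fname` to `NAME_FIRST` pairing comes from the example above; the other source headers and target terms are assumptions for illustration, so confirm the exact terms on the mapping screen.

```python
import csv
import io

# Source column header -> Senzing term.
COLUMN_MAP = {
    "fname": "NAME_FIRST",     # from the example in this article
    "lname": "NAME_LAST",      # assumed term
    "dob": "DATE_OF_BIRTH",    # assumed term
    "email": "EMAIL_ADDRESS",  # assumed term
}


def map_rows(csv_text, column_map):
    """Rename mapped columns and report headers that have no mapping."""
    reader = csv.DictReader(io.StringIO(csv_text))
    unmapped = [name for name in (reader.fieldnames or []) if name not in column_map]
    mapped = [
        {column_map[key]: value for key, value in row.items() if key in column_map}
        for row in reader
    ]
    return mapped, unmapped


SAMPLE_CSV = """fname,lname,dob,email,notes
Robert,Smith,1978-12-11,bsmith@example.com,VIP
"""

mapped, unmapped = map_rows(SAMPLE_CSV, COLUMN_MAP)
print(mapped)
print(unmapped)
```

The list of unmapped headers is a reminder of which columns still need a decision; this sketch simply leaves them out of the mapped rows.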

To further understand mapping, review Mapping Tutorial & Assistance. There is additional help under the ? button on the mapping screen.

First, create a new project for your data. Click on the Projects icon in the menu bar and then Add Project. Give the project a name and click Create.

Once created the new project will open, and you can add your exported data source(s) by clicking on Add Data Source for each one.

If it’s one of the Plug and Play Data Sources, you will see the status is set to LOAD because the application has auto-mapped the fields. Simply click on LOAD to begin loading the data source.

If the source is not one of the plug and play sources, the status is set to REVIEW MAPPING. The application will attempt to auto-detect the columns and auto map them, but you should always review these and make amendments as necessary.

Click on REVIEW MAPPING to go into the mapping screen. On the mapping screen add a data source name to describe this data.

Then, for each column, complete the mapping by selecting the Senzing term that describes the contents of that column.

In addition to mapping data attributes to use for entity resolution, you can choose to include or exclude informational data attributes you may need in analysis. Use the Included checkbox across the top of each column to include or exclude these values. Columns not mapped to a Senzing term but selected to be included will be loaded and available for reporting but will not be used during entity resolution processing.

Once you have completed the mapping review, press the Ready to Load button to return to the project. Repeat the process for any additional data sources you want to add to the project. To complete loading, click the LOAD button for each data source.

Loading can take from minutes to hours depending upon the volume of records in the source file and the resources available on your computer. Mouse over the progress bar to see the rate of ingestion and how far through the source file processing is.

After completion of loading you'll see the load button on the data source card has changed to REVIEW. As with the demo project, you can now review and explore the outcomes from entity resolution processing.
