- Windows 7 or 10 64 bit
- macOS High Sierra or Mojave
- 4 cores
- 16GB RAM
- 250GB flash storage (SSD or NVMe)
WARNING: Performance will be subpar on a system with the specs below, particularly if you are working with large source data sets.
- 2 cores
- 8GB RAM
- 100GB storage
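As a rough aid, the two lists above can be turned into a quick self-check. This is a minimal sketch, not a Senzing tool: it assumes the first list is the recommended configuration and the second the bare minimum, it checks only core count and disk capacity (the Python standard library has no portable call for installed RAM), and the disk figure it reads is approximate because drives report slightly less than their advertised size.

```python
import os
import shutil

# Thresholds copied from the two lists above. Treating the first list as
# "recommended" and the second as "minimum" is an assumption.
RECOMMENDED = {"cores": 4, "storage_gb": 250}
MINIMUM = {"cores": 2, "storage_gb": 100}


def classify(cores: int, storage_gb: float) -> str:
    """Return 'recommended', 'minimum', or 'below minimum' for a machine."""
    for label, spec in (("recommended", RECOMMENDED), ("minimum", MINIMUM)):
        if cores >= spec["cores"] and storage_gb >= spec["storage_gb"]:
            return label
    return "below minimum"


if __name__ == "__main__":
    cores = os.cpu_count() or 0  # logical cores; may exceed physical cores
    total_gb = shutil.disk_usage(os.path.expanduser("~")).total / 1e9
    print(f"{cores} cores, {total_gb:.0f} GB disk: {classify(cores, total_gb)}")
```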
Understanding Entity Resolution
Entity Resolution (ER) is the process that determines who is who and who is related to who, within and across your data. Entity Resolution differs from simple record matching: it creates a complete resume of an entity, including all the names you know them by, all the places you have recorded them living, the email addresses they have used, etc. Senzing also figures out and remembers how resolved entities are related.
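To make the difference from pairwise record matching concrete, here is a toy sketch. It is not Senzing's algorithm, which the guide describes only as sophisticated fuzzy matching. It simply merges records that share an exact email or phone value and then collects every name seen for each resulting entity. The record fields and sample values are made up for illustration.

```python
from collections import defaultdict

# Made-up records; field names are illustrative only.
RECORDS = [
    {"id": "A1", "name": "Robert Smith", "email": "bob@example.com", "phone": "555-0101"},
    {"id": "B7", "name": "Bob Smith", "email": "bob@example.com", "phone": None},
    {"id": "C3", "name": "R. Smith", "email": None, "phone": "555-0101"},
    {"id": "D9", "name": "Ann Lee", "email": "ann@example.com", "phone": "555-0199"},
]


def resolve(records):
    """Group records that share an exact email or phone (toy rule)."""
    parent = {r["id"]: r["id"] for r in records}

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    first_seen = {}  # (field, value) -> id of the first record carrying it
    for r in records:
        for field in ("email", "phone"):
            value = r[field]
            if value is None:
                continue
            key = (field, value)
            if key in first_seen:
                parent[find(r["id"])] = find(first_seen[key])
            else:
                first_seen[key] = r["id"]

    # Build one "resume" per entity: its records and every name seen.
    entities = defaultdict(lambda: {"records": [], "names": set()})
    for r in records:
        entity = entities[find(r["id"])]
        entity["records"].append(r["id"])
        entity["names"].add(r["name"])
    return list(entities.values())
```

In this sample, records B7 and C3 share no value with each other, yet both end up in the same entity because each shares a value with A1. A purely pairwise comparison of B7 and C3 would miss that.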
The Senzing App is specifically designed to support people and organizations. For ER to be effective, these entities should have names, addresses, phone numbers, identifiers and other attributes. WARNING: If you have name-only data, or names plus very limited information (e.g. city or gender), Senzing is not for you.
Senzing is perfectly suited to resolving your customers, prospects, employees and vendors, or really any data source containing identity data, including watch lists.
The Senzing App has the following features:
This occurs within seconds and uses very sophisticated fuzzy matching techniques to ensure we return everything your organization knows about that entity and their relationships.
If you will be building multiple projects with different data sources, be sure to enable multiple projects in Preferences (gear icon).
Locating Your Data Sources
These are your customers and prospects, possibly even your employees and vendors - any list of records containing identity data, including watch lists.
You can find these in your:
- Address Books such as: Microsoft Outlook, Gmail and Lotus Notes
- Customer Relationship Management (CRM) systems such as: Salesforce, ACT, SugarCRM, and Microsoft Dynamics
- Direct Marketing systems such as: MailChimp, Constant Contact, Marketo and Zoho
- Web and e-Commerce systems such as: WordPress, WooCommerce, and Stripe
- Accounting and HR systems such as: QuickBooks, Zoho, Wave, ADP, and Oracle
- Spreadsheets and other files: Sometimes identifying data about entities is kept in spreadsheets or other files. Such is often the case with prospect lists and watch lists
All these systems have import and export functions allowing you to move their entities from one system to another. Look for the option to export to CSV (Comma Separated Values). The Senzing App currently only reads CSV files.
Check out our list of Plug and Play Data Sources for instructions on how to export their entities.
Extracting Data from Source Systems
- Look for a menu option or search the help for the terms 'import' and 'export'
- Perform a web search using an expression such as 'how to export from <your system>'.
- Example: 'How to extract customers from QuickBooks'
- If you have an IT department that maintains your system(s), contact them to extract the master entity data
Sometimes you are given the option to choose which fields to export.
You will want to select all the identity data available. Look for:
- System ID or account number - this is how you would look them up in the source system
- Names - Primary name, Synonyms, Nicknames, AKA if any
- Date of birth
- Addresses - Home, Mailing, Alternate, etc
- Phone numbers - Home, Work, Fax, Cell or Mobile
- Email addresses - Possibly websites for a company
- All identifiers - SSN, other National ID, Driver's License, Passport or similar
The more of these fields you provide the more accurately Senzing will be able to determine if an entity in one data source is the same as in another.
ER Pro-Tip: Data sources that only contain name data are not going to produce high-quality entity resolutions. Be sure that the sources you wish to resolve on have a minimum of a first and last name and any two of the above-listed features.
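The rule of thumb in the ER Pro-Tip can be checked against an export before loading it. The sketch below is one possible reading of that rule, not a Senzing utility: the column names are hypothetical and need adjusting to your own file, and it assumes the system ID does not count toward the "any two" features.

```python
import csv
import io

# Hypothetical column names; adjust to match your own export.
NAME_COLUMNS = ("first_name", "last_name")
FEATURE_COLUMNS = ("date_of_birth", "address", "phone", "email", "national_id")


def meets_minimum(row: dict) -> bool:
    """True if the row has both name parts and at least two other features."""
    has_name = all((row.get(col) or "").strip() for col in NAME_COLUMNS)
    feature_count = sum(1 for col in FEATURE_COLUMNS if (row.get(col) or "").strip())
    return has_name and feature_count >= 2


def summarize(csv_text: str) -> tuple[int, int]:
    """Return (rows meeting the minimum, total rows) for CSV text."""
    rows = list(csv.DictReader(io.StringIO(csv_text)))
    return sum(meets_minimum(r) for r in rows), len(rows)


SAMPLE = """first_name,last_name,date_of_birth,address,phone,email,national_id
Ann,Lee,1980-02-01,,555-0199,,
Bob,Smith,,,,,
,Jones,1975-06-30,1 Main St,,j@example.com,
"""

print(summarize(SAMPLE))  # (1, 3): only the first row qualifies
```

If only a small share of rows qualifies, the source is likely to produce few reliable matches.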
Analysis Pro-Tip: If there are fields in your source data that are important for you to see when you are performing analysis, be sure to extract them. When Mapping, you will check the Included checkbox and then map them as "none/included," so that they will appear in the Entity Resume.
Loading & Mapping Your Data
Once you have identified a data source and exported its entities to a CSV file, it is time to load the exported CSV. Go to the add a data source card and select your CSV file.
If it’s one of the Plug and Play Data Sources, you will see the status is immediately set to LOAD NOW; the Senzing App has auto mapped the fields. Simply click on LOAD NOW to begin loading the data source.
If the source is not one of the Plug and Play Data Sources, you will see the status is set to REVIEW MAPPING. The Senzing App will still attempt to auto-detect the columns and auto-map them, but you should always review the result and amend it as necessary.
Mapping is the process of annotating the columns in a data source with a set of common terms Senzing uses to describe and resolve entities.
Click on REVIEW MAPPING to go into the mapping screen.
There is additional help under the Mapping Help button on the mapping screen, and we cover it in detail in Mapping Assistance. On the mapping screen, add a data source name to describe this data.
In addition to mapping the data source attributes to use for entity resolution, you can choose to include or exclude informational data attributes that you may need in analysis, using the Included checkbox at the top of each column. Columns that are selected to be included but not mapped to a Senzing term will be ingested and available for reporting, but will not be used during entity resolution processing.
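The distinction between mapped, included-only, and excluded columns can be pictured as a small lookup. This is a conceptual sketch only: the source column names are hypothetical, and term names such as NAME_FULL are placeholders chosen for illustration, so rely on the terms offered on the mapping screen rather than on these.

```python
# Hypothetical source columns mapped to illustrative term names.
# None means "not mapped to a term"; the column is kept only if included.
MAPPING = {
    "cust_no": "RECORD_ID",
    "full_name": "NAME_FULL",
    "home_addr": "ADDR_FULL",
    "mobile": "PHONE_NUMBER",
    "loyalty_tier": None,
    "internal_notes": None,
}
# Columns whose Included checkbox is ticked (hypothetical choice).
INCLUDED = {"cust_no", "full_name", "home_addr", "mobile", "loyalty_tier"}


def split_row(row: dict) -> tuple[dict, dict]:
    """Split a source row into (resolution features, informational payload)."""
    features, payload = {}, {}
    for column, value in row.items():
        if column not in INCLUDED:
            continue  # excluded columns are dropped entirely
        term = MAPPING.get(column)
        if term is None:
            payload[column] = value  # kept for analysis, ignored when resolving
        else:
            features[term] = value
    return features, payload
```

Here loyalty_tier would appear in reporting without influencing resolution, while internal_notes would not be ingested at all.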
Once you have completed the mapping review, press the Ready to Load button to return to the data source card where you can begin loading by clicking LOAD NOW.
Loading can take from minutes to hours, depending on the volume of records in the source file and the resources available on your computer. Mouse over the LOADING... status to see the rate of ingestion and how far processing has progressed through the source file.
Ideally you should not close the application during loading, though you can if you need to. Loading does not continue on its own afterwards: the next time you launch the application, you will have to click the RESUME LOADING status to resume.
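The ingestion rate and progress are enough for a back-of-the-envelope estimate of the time remaining. The helper below is a minimal sketch that assumes the rate is expressed in records per second and stays constant. Both are assumptions, since the rate can vary during a load.

```python
def estimate_remaining_seconds(total_records: int, loaded_records: int,
                               records_per_second: float) -> float:
    """Estimate seconds left, assuming the current rate holds steady."""
    if records_per_second <= 0:
        raise ValueError("records_per_second must be positive")
    remaining = max(total_records - loaded_records, 0)
    return remaining / records_per_second


# Made-up figures: 1,000,000 records, 250,000 loaded, 150 records/second.
seconds = estimate_remaining_seconds(1_000_000, 250_000, 150)
print(f"{seconds / 60:.0f} minutes remaining")  # 83 minutes remaining
```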
After completion of loading you'll see LOADED, REVIEW RESULTS. Clicking this will take you to the results review screen.
Reviewing Your Matches
After you load each data source for the first time, it is important that you review the matches made. This is the final verification that the data has been mapped correctly and loaded successfully. For instance, if you expected matches and didn’t get any, it might simply be that you forgot to map an important field such as the surname, or that the given name field is always empty, indicating a problem with the data export from the source system.
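An always-empty field, like the given name example above, is easy to spot in the source CSV itself. The following sketch, which is independent of the Senzing App, reports the fraction of non-blank values per column. The sample column names are made up.

```python
import csv
import io
from collections import Counter


def fill_rates(csv_text: str) -> dict[str, float]:
    """Return the fraction of non-blank values for each column."""
    reader = csv.DictReader(io.StringIO(csv_text))
    filled, total = Counter(), 0
    for row in reader:
        total += 1
        for column in reader.fieldnames:
            if (row.get(column) or "").strip():
                filled[column] += 1
    if total == 0:
        return {column: 0.0 for column in (reader.fieldnames or [])}
    return {column: filled[column] / total for column in reader.fieldnames}


SAMPLE = """given_name,surname,email
,Lee,ann@example.com
,Smith,
"""

print(fill_rates(SAMPLE))  # {'given_name': 0.0, 'surname': 1.0, 'email': 0.5}
```

For a real file, pass in the text read with open(path, newline="", encoding="utf-8"). A rate of 0.0 on a column you expected to be populated points to a problem with the export rather than with the mapping.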
You can review your matches from LOADED, REVIEW RESULTS as described above and from the dashboard screen on the card titled REVIEW. When only one data source is loaded, you will see three circles on this card: One for the Duplicates that were found, one for the Possible Duplicates, and one for the Possibly Related records.
Clicking on the hyperlink in each circle will take you to the review of that category of match. In the following example, Duplicates are being reviewed.
When more than one data source is loaded, the circles overlap to give an overview of two data sources. You can choose the two sources to compare via the data source selectors at the top of the card. As before, clicking on the hyperlinks in the circles will take you to the review screens of the appropriate selection.
If you find something that does not look correct and it is not immediately apparent why, check the troubleshooting article for common errors and details on how to collect and send support information to us.
When reviewing the matches you may see a More button in the More column. This indicates Entity Centric Learning (ECL) has occurred and there are additional records from different data sources that have contributed to the match. Click the More button to expand and display the full set of records.
This example shows Possible Matches being reviewed within a Salesforce data source. Expanding the More button for Entity ID 159890 displays an additional record from the WooCommerce data source that is also part of Entity ID 159890; Senzing has resolved these two data source records together to form the single entity. This entity has a possible match to the Related Entity ID 18763.
The review screen by default opens in a condensed view. To expand the view and show additional details click the Show all columns slider at the top of the review section.
The review screen does not show every entity for the category being reviewed; instead, it shows a sample, with a default size of 150 results.
If you'd like to view a larger sample, click on the Sample button and choose your desired Review Sample Size from the Preferences pop-up. The Preferences options can also be set from the Settings and Support icon (gear icon) in the upper right-hand corner of the application.
Note: Larger sample sizes can incur a performance penalty upon reviewing.
Single Subject Searches
You can search for entities from the dashboard screen on the card titled SEARCH and by clicking the magnifying glass in the sidebar.
Search is designed to find all the records for a single entity. It is not a query for a group of people who live in the same city or have the same last name, etc. In fact, you must enter the complete name (given and surname) to find a match on name only. Likewise, you must provide a complete address including street address and either city or postal code to find a match on address.
While you can just search by name, address or phone number, etc, imagine someone calling in asking what information you have on them. To be sure you find them and only them, you would need to be provided with more than only their name. You might have several people with the same name but all at different addresses or with different birthdays or passport numbers.
Ideally, a search for a single entity would contain the following:
- A complete name
- A complete address and/or phone and/or email
- A birth date and/or passport, driving license number or national ID
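The three-part checklist above can be written as a simple predicate. This is a sketch of the guideline only, not of how the App validates searches. The SearchCriteria class and its field names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class SearchCriteria:
    """Hypothetical container for the fields entered on the search form."""
    given_name: Optional[str] = None
    surname: Optional[str] = None
    address: Optional[str] = None
    phone: Optional[str] = None
    email: Optional[str] = None
    birth_date: Optional[str] = None
    identifier: Optional[str] = None  # passport, driving license, national ID


def is_complete_identity(c: SearchCriteria) -> bool:
    """True if the criteria cover all three groups in the list above."""
    has_name = bool(c.given_name and c.surname)
    has_contact = bool(c.address or c.phone or c.email)
    has_birth_or_id = bool(c.birth_date or c.identifier)
    return has_name and has_contact and has_birth_or_id
```

For example, a given name, surname, phone number and birth date together satisfy the checklist, whereas a surname and city alone do not.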
Once you execute the search you will be presented with a list of the matching records by the match level.
You will only get records in the Matches category if you searched for a complete identity as described above. However, we always show you Possible Matches and Possibly Related records, as well as name-only matches, so that you have all the records to review when deciding which records belong to the identity you searched for.
Clicking on a search result will take you to the resume for the selected entity.
Entity Resume with Relationship Graph
The Entity Resume gives you a comprehensive overview of all the records which compose an entity as well as their possible relationships. The relationships are graphed with Entities as nodes and the relationship connections (match keys) as labeled edges so that you can understand which features in the data expose the connection.
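The structure of such a graph can be pictured as an adjacency map whose edges carry labels. The sketch below is only an illustration of that idea. The entity IDs and match-key strings are invented, and the App's actual label format may differ.

```python
from collections import defaultdict

# Hypothetical entity IDs and match-key labels, for illustration only.
EDGES = [
    (101, 102, "ADDRESS"),
    (101, 103, "PHONE"),
    (102, 103, "ADDRESS+PHONE"),
]


def build_graph(edges):
    """Build an undirected adjacency map: node -> {neighbor: edge label}."""
    graph = defaultdict(dict)
    for a, b, label in edges:
        graph[a][b] = label
        graph[b][a] = label
    return graph


def explain(graph, entity_id):
    """List (related entity, label) pairs for one entity, sorted by ID."""
    return sorted(graph.get(entity_id, {}).items())


graph = build_graph(EDGES)
print(explain(graph, 101))  # [(102, 'ADDRESS'), (103, 'PHONE')]
```

Reading the labels on the edges of one node answers the question the graph is meant to answer: which shared features connect this entity to each of its neighbors.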
The export button on the search results screen enables you to export the search results to a CSV file for review in spreadsheets and similar applications.
Exporting Entity Resolution Outcomes
In addition to exporting the search results, there are two other mechanisms for exporting Entity Resolution results.
The first of these is on the results review screens: in the top right-hand corner of the results is a blue arrow. Clicking it exports the current sample of results to a CSV file.
You can also export the complete resolved entity data using the export facility, launched with the downward-facing arrow icon on the sidebar. With the export facility, you can choose the level of matches you are interested in exporting: the Match Levels tick boxes determine which levels of match to include in the CSV export. By default, matches, possible matches and relationships are selected.
When you are ready to perform the export, click Create Entity Export. When the export is ready, the Download Entity Export button becomes available.