Watch this 6-minute video on mapping.
To perform entity resolution, Senzing needs your data to be mapped. As noted in the tutorial video, entities are composed of features, and features are composed of attributes. Mapping is the process of annotating each field in a data source with one of a set of common terms that Senzing understands and uses when performing Entity Resolution.
Pro-Tip: Name-only data sources. If all you have are name-only data sources, you will not have enough data to perform entity resolution. However, if you have a name-only source alongside other sources that contain a name plus at least two other features for resolution, you will be able to see possible relationships between the name-only records and the resolved entities.
- Name your data source. The name should be unique and descriptive, because it is how you will identify where the records in your results came from.
- Select feature types: For each column, use the "Mapped to" list to choose the type of feature it represents for Entity Resolution (see Selecting Feature Types below).
- Use Mapping Guidance to resolve issues:
  - Red Alerts are issues that will prevent entity resolution from taking place.
  - Yellow Alerts are warnings that indicate possible problems.
  - Blue Alerts highlight best practices you might choose to follow.
Selecting Feature Types
Different data sources often name the same kind of data differently. For instance, the column that contains a person's first name may be named fname in one data source and firstName in another. When mapping those data sources, you map both columns to the same Senzing term for first name, NAME_FIRST. Telling Senzing which feature each column maps to lets Entity Resolution ignore the differences in column header names and treat the data the same way across data sources.
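To make the idea concrete, here is a minimal Python sketch of what mapping accomplishes: two sources with different column names produce records that use the same terms. The source names CRM and WEBSHOP, the SOURCE_MAPPINGS table, and the map_record function are hypothetical illustrations, not part of the Senzing app or its API; the DATA_SOURCE key is used here simply to record which source each record came from.

```python
# Hypothetical per-source mappings: each source's own column names are
# mapped to the same Senzing terms.
SOURCE_MAPPINGS = {
    "CRM": {"fname": "NAME_FIRST", "lname": "NAME_LAST"},
    "WEBSHOP": {"firstName": "NAME_FIRST", "lastName": "NAME_LAST"},
}


def map_record(data_source: str, row: dict) -> dict:
    """Rename a row's columns to the terms mapped for its data source.

    Columns without a mapping are dropped in this sketch.
    """
    mapping = SOURCE_MAPPINGS[data_source]
    mapped = {"DATA_SOURCE": data_source}
    for column, value in row.items():
        term = mapping.get(column)
        if term is not None:
            mapped[term] = value
    return mapped


print(map_record("CRM", {"fname": "Robert", "lname": "Smith"}))
print(map_record("WEBSHOP", {"firstName": "Robert", "lastName": "Smith"}))
```

Both calls yield the same NAME_FIRST and NAME_LAST values, which is what allows the two records to be compared regardless of their original column names.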
In the "Mapped to" attribute pull-down list for each column, you will find a list of terms to select from. It’s a good idea to familiarize yourself with this list, because it tells you what to look for in the columns of your data sources.
Below the "Mapped to" attribute selector, you will see the Feature that the selected attribute belongs to.
The "Mapped to" pull-down list is grouped by category. There are common terms for names, addresses, phone numbers, identifiers such as driver's license, passport, and email address, and other attributes such as date of birth and gender. An ideal mapping would include:
- A name
- An address, phone, and/or email
- A date of birth and/or identifier
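The ideal mapping above can be expressed as a quick self-check. This sketch assumes you have the set of terms a source is mapped to; the term groups below are a partial, illustrative selection of Senzing terms, and missing_feature_groups is a hypothetical helper, not the logic the app's Mapping Guidance uses to raise alerts.

```python
# Illustrative subsets of Senzing terms, grouped like the ideal mapping above.
NAME_TERMS = {"NAME_FULL", "NAME_FIRST", "NAME_LAST"}
CONTACT_TERMS = {"ADDR_FULL", "ADDR_LINE1", "PHONE_NUMBER", "EMAIL_ADDRESS"}
DOB_OR_ID_TERMS = {
    "DATE_OF_BIRTH",
    "PASSPORT_NUMBER",
    "DRIVERS_LICENSE_NUMBER",
}


def missing_feature_groups(mapped_terms: set[str]) -> list[str]:
    """Return the recommended feature groups that a mapping does not cover."""
    checks = [
        ("a name", NAME_TERMS),
        ("an address, phone, and/or email", CONTACT_TERMS),
        ("a date of birth and/or identifier", DOB_OR_ID_TERMS),
    ]
    # A group is covered if at least one mapped term belongs to it.
    return [label for label, terms in checks if not (mapped_terms & terms)]


print(missing_feature_groups({"NAME_FIRST", "NAME_LAST", "PHONE_NUMBER"}))
# ['a date of birth and/or identifier']
```

A name-only source such as {"NAME_FULL"} is missing two of the three groups, which matches the earlier tip that a name alone is not enough for resolution.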
Name Data Pro-Tip: If the same file has separate first and last name columns as well as a full name column, make sure the full name column contains the combined name for every record, because when mapping you will have to choose between using First/Last or Full name. To combine the two name columns quickly, use the CONCATENATE function in your spreadsheet program.
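If you would rather script the fix than use a spreadsheet formula such as =CONCATENATE(A2," ",B2), the following Python sketch fills in any blank full name by joining the first and last names. The column names first_name, last_name, and full_name are hypothetical; pass in whatever your file uses.

```python
def fill_full_names(rows: list[dict], first: str, last: str, full: str) -> list[dict]:
    """Fill blank full-name values by joining the first and last names.

    Returns new row dicts; the input rows are not modified.
    """
    filled = []
    for row in rows:
        row = dict(row)  # shallow copy so the caller's row is unchanged
        if not (row.get(full) or "").strip():
            parts = [(row.get(first) or "").strip(), (row.get(last) or "").strip()]
            # Skip empty parts so a missing last name doesn't leave a trailing space.
            row[full] = " ".join(part for part in parts if part)
        filled.append(row)
    return filled


rows = [
    {"first_name": "Ana", "last_name": "Lopez", "full_name": ""},
    {"first_name": "Li", "last_name": "Wei", "full_name": "Li Wei"},
]
for row in fill_full_names(rows, "first_name", "last_name", "full_name"):
    print(row["full_name"])
```

Rows that already have a full name are left as they are, so the full name column ends up populated for every record, as the tip recommends.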
Labeling Multiple Addresses in your data source
When your data source contains more than one address, you will need to use labels so that Senzing knows the addresses are distinct from each other. For example, you could have a primary and a secondary address, labeled "home" and "work."
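In this case, you would label each attribute of an address (street, city, postal code, and so on): every part of the home address gets the "home" label, and every part of the work address gets the "work" label.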
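As an illustration, the sketch below assigns the label HOME to every part of the home address and WORK to every part of the work address. It assumes the label is expressed as a prefix on the term (for example HOME_ADDR_CITY); check how the app displays labels for your source. The column names, the ADDRESS_MAPPING table, and map_labeled are hypothetical.

```python
# Hypothetical mapping: column -> (label, term). Every part of an address
# gets the same label so the two addresses stay distinct.
ADDRESS_MAPPING = {
    "home_street": ("HOME", "ADDR_LINE1"),
    "home_city": ("HOME", "ADDR_CITY"),
    "home_zip": ("HOME", "ADDR_POSTAL_CODE"),
    "work_street": ("WORK", "ADDR_LINE1"),
    "work_city": ("WORK", "ADDR_CITY"),
    "work_zip": ("WORK", "ADDR_POSTAL_CODE"),
}


def map_labeled(row: dict, mapping: dict) -> dict:
    """Build a record whose keys are LABEL_TERM, e.g. HOME_ADDR_CITY.

    The label-as-prefix convention is an assumption of this sketch.
    """
    return {
        f"{label}_{term}": row[column]
        for column, (label, term) in mapping.items()
        if column in row
    }


row = {
    "home_street": "12 Elm St", "home_city": "Springfield", "home_zip": "11111",
    "work_street": "9 Oak Ave", "work_city": "Shelbyville", "work_zip": "22222",
}
print(map_labeled(row, ADDRESS_MAPPING))
```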
Labeling other Features
If you have multiple phone numbers, email addresses, or even multiple first and last names (AKAs), use labels in the same way to keep those features distinct for entity resolution.
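Why labels matter can be shown with a small check: if two columns map to the same term with no label, they compete for the same key, and in a flat key-value record one value would overwrite the other. This hypothetical helper reports such collisions using the same assumed label-as-prefix convention (CELL_PHONE_NUMBER, for example); it is a sketch for reviewing your own mapping, not a Senzing function.

```python
from collections import Counter
from typing import Optional


def duplicate_keys(mapping: list[tuple[str, str, Optional[str]]]) -> list[str]:
    """Return keys that more than one column would write to.

    Each entry is (column, term, label). A label, when present, is applied
    as a prefix (assumed convention), so PHONE_NUMBER labeled CELL becomes
    CELL_PHONE_NUMBER.
    """
    keys = [f"{label}_{term}" if label else term for _, term, label in mapping]
    counts = Counter(keys)
    return sorted(key for key, n in counts.items() if n > 1)


unlabeled = [("phone1", "PHONE_NUMBER", None), ("phone2", "PHONE_NUMBER", None)]
labeled = [("phone1", "PHONE_NUMBER", "HOME"), ("phone2", "PHONE_NUMBER", "CELL")]
print(duplicate_keys(unlabeled))  # ['PHONE_NUMBER']
print(duplicate_keys(labeled))    # []
```

The same check applies to AKAs: a primary first name and an alternate first name both mapped to NAME_FIRST collide until each gets its own label.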
Unfortunately, not every data source will have all of the features in the ideal mapping described above. Each source should, however, have something besides a name, because a name alone is not enough to perform Entity Resolution. You can still use the search function on a full name once your data is loaded.
You will also find that the app maps many data source columns automatically. Your task is to accept or correct the automatic mappings and to manually assign any columns that were not mapped.
Including other data
Your data sources may contain data that you need for analysis but that is not used for entity resolution.
To include or exclude a data source column, use the Included checkbox at the top of the column. For example, setting the customer_since column to None/Included indicates that you want to ingest this column and make it available for reference; it will not be used for Entity Resolution because it is not mapped to a valid Senzing attribute term.
In addition to using the Included checkbox to exclude a column from being loaded, you can set "Mapped to" to None/Suppressed. This tells Senzing to skip the column during loading; as a result, it will not appear in any reports or exports and is not used for Entity Resolution.
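The rules for including and excluding columns can be summarized as a small decision function. This is a hypothetical sketch that restates the behavior described in this article; it assumes "None/Included" and "None/Suppressed" are both "Mapped to" choices (as None/Suppressed is described), and that any other value is a valid Senzing term.

```python
def column_disposition(mapped_to: str, included: bool = True) -> str:
    """Summarize how a column is handled during loading (hypothetical sketch).

    mapped_to: the "Mapped to" choice, e.g. "NAME_FIRST", "None/Included",
    or "None/Suppressed". included: the state of the Included checkbox.
    """
    # An unchecked Included box or None/Suppressed keeps the column out of
    # loading entirely, so it is absent from reports, exports, and resolution.
    if not included or mapped_to == "None/Suppressed":
        return "not loaded"
    # None/Included: ingested for reference, but not used for resolution.
    if mapped_to == "None/Included":
        return "loaded for reference only"
    # Any other value is assumed to be a valid Senzing attribute term.
    return "loaded and used for entity resolution"


for choice in ["NAME_FIRST", "None/Included", "None/Suppressed"]:
    print(choice, "->", column_disposition(choice))
```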
Ideally, you would not include every unmapped column from a source, especially columns containing large amounts of text, because they make your reports harder to read. Remember that you can always go back to the source system to view the full details of any fields that are not relevant to, or not used by, Entity Resolution.
That’s all there is to mapping! We suggest you add your data sources one at a time and review the matches Senzing finds after each one. If you don’t get the results you were expecting, see the troubleshooting article.