Watch this 6-minute video on mapping.
In order to perform entity resolution, Senzing needs your data mapped. As noted in the tutorial video, entities are composed of features, and features are composed of attributes. Mapping is the process of annotating the fields in a data source with a set of common terms Senzing understands and uses when performing entity resolution.
ProTip: Name-only data sources: If all you have are name-only data sources, you will not have enough data to perform entity resolution. However, if you have a name-only source and several other sources that contain a name and at least two other features for resolution, you will be able to see possible relationships with resolved entities.
(View the QuickStart for more on how to add data source files to a project.)
Start Mapping:
- Name your data source. This name should be unique and descriptive, because it is how you will identify where the records shown in your results came from.
- Select Feature Type: For each column, use the list under "Mapped to" to pick the type of feature the column represents for the entity resolution process. (See details below.)
- Use Mapping Guidance to resolve issues.
- Red Alerts are issues that will prevent entity resolution from taking place.
- Yellow Alerts are warnings that indicate possible problems.
- Blue Alerts highlight best practices you might choose to follow.
Selecting Feature Types
Column header names often differ between data sources. For instance, the column that contains a person's first name may be named fname in one data source and firstName in another. When mapping those data sources, the corresponding Senzing feature term for the first name is NAME_FIRST. Telling Senzing which feature each column maps to lets entity resolution ignore the differences in column header names and treat the data the same across data sources. Senzing will remember your mapping preferences for each data source.
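As a rough illustration of the idea (not how the App works internally), the sketch below renames differently named columns to a shared feature term. NAME_FIRST comes from the text above; the data source names, the sample records, the NAME_LAST term, and the `apply_mapping` helper are assumptions made up for this example.

```python
# Hypothetical per-source mappings: source column header -> feature term.
# NAME_FIRST is from the article; everything else here is an assumption.
COLUMN_MAPPINGS = {
    "CUSTOMERS": {"fname": "NAME_FIRST", "lname": "NAME_LAST"},
    "VENDORS": {"firstName": "NAME_FIRST", "lastName": "NAME_LAST"},
}


def apply_mapping(data_source, record):
    """Rename mapped columns; unmapped columns keep their original header."""
    mapping = COLUMN_MAPPINGS[data_source]
    return {mapping.get(column, column): value for column, value in record.items()}


customer = apply_mapping("CUSTOMERS", {"fname": "Ann", "lname": "Lee"})
vendor = apply_mapping("VENDORS", {"firstName": "Ann", "lastName": "Lee"})

# After mapping, both records use the same keys regardless of source headers.
print(customer == vendor)
```

Once both sources share the same terms, records can be compared field by field without caring what the original column headers were.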
In the "Mapped to" attribute pull-down list for each column, you will find a list of terms to select from. It’s a good idea to familiarize yourself with this list, as it tells you what to look for in the columns of your data sources.
You will see "Feature" below the "Mapped to" attribute selector.
The "Mapped to" pull-down list is grouped by category. There are common terms for names, addresses, phones, email addresses, identifiers such as driver's license and passport, and other attributes like date of birth and gender. An ideal mapping would include:
- A name
- An address, phone, and/or email
- A date of birth and/or identifier
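The checklist above can be sketched as a small completeness check. This is only an illustration under stated assumptions: the attribute terms in each group are examples, and the `missing_groups` helper is hypothetical. Use the terms shown in the App's "Mapped to" list for your own data.

```python
# Feature groups from the checklist above. The specific terms are
# illustrative assumptions, not a complete or authoritative list.
FEATURE_GROUPS = {
    "name": {"NAME_FULL", "NAME_FIRST", "NAME_LAST"},
    "address, phone, or email": {"ADDR_FULL", "PHONE_NUMBER", "EMAIL_ADDRESS"},
    "date of birth or identifier": {
        "DATE_OF_BIRTH",
        "PASSPORT_NUMBER",
        "DRIVERS_LICENSE_NUMBER",
    },
}


def missing_groups(mapped_terms):
    """Return the checklist groups that no mapped term covers."""
    mapped = set(mapped_terms)
    return [label for label, terms in FEATURE_GROUPS.items() if not mapped & terms]


# A name-only source is missing the other two groups.
print(missing_groups(["NAME_FULL"]))

# A source with a name, a phone, and a date of birth covers all three.
print(missing_groups(["NAME_FIRST", "NAME_LAST", "PHONE_NUMBER", "DATE_OF_BIRTH"]))
```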
Name Data Pro-Tip: If you have separate name columns (first and last) and a full name column in the same file, make sure the full name column is populated for every record, as you will have to choose between using First/Last or Full name when mapping. To quickly combine the two name columns, use a CONCATENATE function in your spreadsheet program.
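If you prepare your files with a script rather than a spreadsheet, the same combination can be done in a few lines. This is a minimal sketch: the column names first_name, last_name, and full_name and both helper functions are hypothetical, so adjust them to match your file.

```python
def full_name(first, last):
    """Join the non-empty name parts with a single space."""
    parts = [(part or "").strip() for part in (first, last)]
    return " ".join(part for part in parts if part)


def fill_full_names(rows):
    """Populate 'full_name' wherever it is blank; existing values are kept."""
    for row in rows:
        if not (row.get("full_name") or "").strip():
            row["full_name"] = full_name(row.get("first_name"), row.get("last_name"))
    return rows


rows = [
    {"first_name": "Ann", "last_name": "Lee", "full_name": ""},
    {"first_name": "Bo", "last_name": "Diaz", "full_name": "Bo Diaz"},
]
fill_full_names(rows)
print([row["full_name"] for row in rows])
```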
Labeling Multiple Addresses in your Data Source
When your data source contains more than one address, you will need to use labels so that Senzing knows that those addresses are distinct from each other. For example, you could have a primary and a secondary address, like "home" and "work."
In this case, you would label each attribute of the address.
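To illustrate why every attribute of an address carries the label, the sketch below tags each part of two addresses so they stay separate in one record. The article does not say how the App stores labels, so the prefix form used here (for example HOME_ADDR_LINE1), the address terms, and the `label_attributes` helper are all assumptions for illustration.

```python
def label_attributes(label, attributes):
    """Prefix every attribute term of one address with the same label."""
    return {f"{label.upper()}_{term}": value for term, value in attributes.items()}


# Two addresses for the same person, each labeled as a whole.
home = label_attributes("home", {"ADDR_LINE1": "12 Oak St", "ADDR_CITY": "Reno"})
work = label_attributes("work", {"ADDR_LINE1": "400 Main St", "ADDR_CITY": "Reno"})

record = {"NAME_FULL": "Ann Lee", **home, **work}
for term, value in record.items():
    print(term, "=", value)
```

Without the labels, the two ADDR_LINE1 values would collide, and there would be no way to tell which city belongs to which street line.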
Labeling Other Features
If you find that you have multiple phone numbers, emails, or even first and last names (AKAs), you will need to use a label to keep those features distinct for entity resolution.
Unfortunately, not every data source has a name, contact details, and a date of birth or identifier, but data sources should have something besides just a name. The name alone is not enough to perform entity resolution. However, you can use the search function to find a full name once your data is loaded.
You will also find that the App maps many data source columns automatically. Your task is to accept or correct the automatic mappings and to manually map any columns the App did not map.
Including Other Data
You may have data in your data sources that is needed for analysis but not used for entity resolution.
To include or exclude a data source column, check or uncheck the Included box at the top of the column. In this example, setting the customer_since column to None/Included indicates you'd like to ingest this column and make it available for reference, but it will not be used for entity resolution as it is not mapped to a valid Senzing attribute term.
In addition to using the Included checkbox to exclude a column from being loaded, you can set the "Mapped to" to None/Suppressed. This informs Senzing not to include the column during loading. As a result, it will not show up in any reporting or exporting and will not be used for entity resolution.
Ideally, you wouldn't include every un-mapped column from a source, especially columns with a large amount of text in them, as they will make your reports harder to read. Remember, you can always go back to the source system and view the full details of all fields that are not relevant to or used by entity resolution.
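The three outcomes described above (mapped, None/Included, None/Suppressed) can be sketched as follows. This is an illustration only: the `split_record` helper, the sample row, and the choice to treat unlisted columns as suppressed are assumptions, not the App's actual behavior.

```python
INCLUDED = "None/Included"      # loaded for reference, not used for resolution
SUPPRESSED = "None/Suppressed"  # not loaded at all


def split_record(row, mapped_to):
    """Split a row into resolution features and reference-only columns.

    Assumption: columns missing from `mapped_to` are treated as suppressed.
    """
    features, reference = {}, {}
    for column, value in row.items():
        target = mapped_to.get(column, SUPPRESSED)
        if target == SUPPRESSED:
            continue  # dropped entirely: not resolved, reported, or exported
        if target == INCLUDED:
            reference[column] = value
        else:
            features[target] = value
    return features, reference


mapped_to = {
    "fname": "NAME_FIRST",
    "customer_since": INCLUDED,
    "notes": SUPPRESSED,
}
row = {"fname": "Ann", "customer_since": "2019-04-01", "notes": "long free text"}

features, reference = split_record(row, mapped_to)
print(features)
print(reference)
```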
That’s all there is to mapping! We suggest you add your data sources one by one and review the matches Senzing determined each time. See the troubleshooting article if you didn’t get the results you were expecting.