This article walks you through mapping and loading data into Senzing using the files attached to this article. If you want to follow along, download them to a directory of your choice.
- The Excel spreadsheet is the one marked up in the video.
- The -raw.csv file is the one that actually gets processed.
- The .py Python script is the one we build in the video.
- The .g2c file has the configuration required to load this data.
- The _mapped.json files are the files created by the Python script and are ready to load into Senzing.
Important notes!
- The video below references ENTITY_TYPE, which has been replaced with RECORD_TYPE.
- The video suggests always using the PRIMARY label for names, even if you only have one. Please see the best practices section below for the latest instructions.
- You can map the employer name to either GROUP_ASSOCIATION_ORG_NAME or EMPLOYER_NAME.
Please read below and/or watch this video tutorial. Here we go!
The source data you want to load into Senzing will range from simple to complex. Simple is a basic CSV file like the employee list shown below ...
... to the complex, like the OFAC List, which looks like this ...
Let's start with the simple one, but keep in mind that no matter how complex the source, there are only two things to do:
- Decide which fields belong to which entity
- Map them to Senzing attributes based on the Generic Entity Specification
This is how we mapped it ... (the new row 1 holds the Senzing attribute that corresponds to each column)
Things to note ...
- A data source is required, but there is no column for one. This is OK because we can specify a data source for the whole file when we load it.
- Column A, the employee number, was mapped to RECORD_ID because it appears to be unique.
- Columns B through K have one-to-one mappings with Senzing attributes. Note the use of the PRIMARY label for the name attributes and the HOME label for the address attributes (see the best practices below for the current guidance on PRIMARY).
- Columns L through N are payload attributes. They don't need to be registered in Senzing because they do not help resolution, but they are useful to display to users when presented with a match.
- The employer name column is mapped to a group association attribute rather than a name attribute, as it is NOT the employee's name.
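The mapping above can be sketched in Python, the language of the script built in the video. This is a minimal sketch, not the attached .py script: the column headers (emp_num, last_name, employer_name, job_title, and so on) are hypothetical placeholders, so rename them to match the header row of your -raw.csv file. It writes one JSON record per line, skips empty cells, sets the HOME label through the ADDR_TYPE attribute, and, following the best practices below, leaves the single name unlabeled instead of using PRIMARY.

```python
import csv
import io
import json

# Hypothetical column headers; rename them to match your -raw.csv file.
RECORD_ID_COLUMN = "emp_num"  # column A
ONE_TO_ONE = {  # columns B through K (illustrative names)
    "last_name": "NAME_LAST",
    "first_name": "NAME_FIRST",
    "middle_name": "NAME_MIDDLE",
    "addr1": "ADDR_LINE1",
    "city": "ADDR_CITY",
    "state": "ADDR_STATE",
    "zip": "ADDR_POSTAL_CODE",
    "dob": "DATE_OF_BIRTH",
    "ssn": "SSN_NUMBER",
    "phone": "PHONE_NUMBER",
}
EMPLOYER_COLUMN = "employer_name"
PAYLOAD_COLUMNS = ["job_title", "department", "hire_date"]  # columns L-N (illustrative)


def map_row(row, data_source=None):
    """Map one CSV row (a dict) to a Senzing JSON record (a dict)."""
    record = {"RECORD_ID": row[RECORD_ID_COLUMN].strip()}
    if data_source:  # optional: otherwise supply the data source when loading the file
        record["DATA_SOURCE"] = data_source
    for column, attribute in ONE_TO_ONE.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty cells rather than sending blank attributes
            record[attribute] = value
    # Label the address HOME only when the row actually has address parts.
    # The single name is left unlabeled, per the best practices on PRIMARY.
    if any(key.startswith("ADDR_") for key in record):
        record["ADDR_TYPE"] = "HOME"
    employer = (row.get(EMPLOYER_COLUMN) or "").strip()
    if employer:  # a group association, NOT the employee's own name
        record["GROUP_ASSOCIATION_ORG_NAME"] = employer
    for column in PAYLOAD_COLUMNS:
        value = (row.get(column) or "").strip()
        if value:  # payload: shown to users, not used for resolution
            record[column.upper()] = value
    return record


def map_file(in_file, out_file, data_source=None):
    """Write one JSON record per line and return the number of records written."""
    count = 0
    for row in csv.DictReader(in_file):
        out_file.write(json.dumps(map_row(row, data_source)) + "\n")
        count += 1
    return count
```

To produce a mapped file, open the raw CSV with newline="" and an output file such as one ending in _mapped.json, then pass both to map_file. Leaving data_source as None matches the approach above of specifying the data source for the whole file at load time; if you do pass one, the code you use (for example EMPLOYEES) is your own choice.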
Best practices
- Always map to NAME_ORG if you know the entity is an organization. Otherwise, map to NAME_FULL or to the name parts (NAME_LAST, NAME_FIRST, etc.).
- Only specify labels for names, addresses and phones if they are meaningful to you. There are 3 exceptions to this ...
- Use the label "MOBILE" for mobile phones to increase their value for matching, as they are less likely to be shared by family members.
- Use the label "BUSINESS" for the physical address of an organization. This keeps chain stores and other subsidiaries from resolving together, even if they share the same name, phone number, and website.
- Use the label "PRIMARY" for names only if a source has multiple names and distinguishes the main or primary name from a list of aliases. Data providers often provide multiple names and have determined which is the preferred or "primary" name. Senzing will use this to help select the best display name for an entity when there are several to choose from.
- Don't be afraid to add new attributes when needed. Contact support@senzing.com for a free 30-minute training session on how to do this. These will mostly be industry-specific IDs that did not make it into the list we ship with. Don't try to put them all into a generic attribute like OTHER_ID. You will likely want to set their behaviors independently, and it's nice to see precisely what matched in the match key.
- Be conservative when setting behaviors for new attributes. For instance, start a new kind of ID with the basic F1 behavior. Only add the exclusive behavior once you determine it should break matches even between entities that might otherwise resolve. Only add the stable behavior once you determine it should cement matches between entities that might otherwise not resolve.
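The naming and labeling rules above can be sketched as a few small helpers. This is a sketch under assumptions: the helper names are hypothetical, labels are set through the NAME_TYPE, PHONE_TYPE, and ADDR_TYPE attributes, and repeated features are grouped under list keys named NAMES, PHONES, and ADDRESSES, which are illustrative choices. Check the Generic Entity Specification for the exact form your Senzing version expects.

```python
def map_names(names, primary=None, is_org=False):
    """Return name features for one record.

    names: every name the source gives for this entity.
    primary: the name the data provider marked as preferred, if any.
    is_org: True only when you know the entity is an organization.
    """
    attribute = "NAME_ORG" if is_org else "NAME_FULL"
    # PRIMARY is only meaningful when there are several names and the
    # source has picked one of them as the main name.
    label_primary = primary is not None and len(names) > 1
    features = []
    for name in names:
        feature = {attribute: name}
        if label_primary and name == primary:
            feature["NAME_TYPE"] = "PRIMARY"
        features.append(feature)  # aliases are left unlabeled
    return features


def map_phone(number, is_mobile=False):
    """MOBILE is worth labeling because mobiles are less likely to be shared."""
    feature = {"PHONE_NUMBER": number}
    if is_mobile:
        feature["PHONE_TYPE"] = "MOBILE"
    return feature


def build_record(record_id, names, primary=None, is_org=False, phones=(), address=None):
    """phones: iterable of (number, is_mobile) pairs."""
    record = {"RECORD_ID": record_id, "NAMES": map_names(names, primary, is_org)}
    if phones:
        record["PHONES"] = [map_phone(number, mobile) for number, mobile in phones]
    if address:
        # BUSINESS only for an organization's physical address; other
        # addresses stay unlabeled unless the label means something to you.
        feature = {"ADDR_FULL": address}
        if is_org:
            feature["ADDR_TYPE"] = "BUSINESS"
        record["ADDRESSES"] = [feature]
    return record
```

For example, an organization with a provider-chosen main name and one alias gets PRIMARY on the main name only, while a person with a single name gets no name label at all.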
Take note ...
By far the most common cause of overmatching is mapping attributes to an entity they don't belong to. Ask yourself on every field: "Does this attribute really belong to the entity I am trying to map?" If the answer is no, either include it as a payload or map it to the entity it does belong to and relate the two entities.
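Here is one way to apply the "map it to the entity it does belong to" advice, assuming the employee file also carried the employer's address (a hypothetical employer_address column). Putting that address on each employee would make co-workers look like they share an address, which is exactly the kind of overmatching described above. The sketch instead emits a separate organization record and relates the two with disclosed-relationship attributes (REL_ANCHOR_DOMAIN, REL_ANCHOR_KEY, REL_POINTER_DOMAIN, REL_POINTER_KEY, REL_POINTER_ROLE). Treat those attribute names, the PERSON and ORGANIZATION record types, the EMPLOYER domain, the EMPLOYED_BY role, and the RECORD_ID scheme as assumptions to verify against the Generic Entity Specification.

```python
def split_row(row):
    """Split a flat employee row into a person record and an employer record."""
    # Hypothetical relationship key: the normalized employer name.
    employer_key = row["employer_name"].strip().upper()
    employer = {
        "RECORD_ID": "EMP-" + employer_key,  # hypothetical RECORD_ID scheme
        "RECORD_TYPE": "ORGANIZATION",
        "NAME_ORG": row["employer_name"].strip(),
        # The employer's address belongs to the employer, not the employee.
        "ADDR_TYPE": "BUSINESS",
        "ADDR_FULL": row["employer_address"].strip(),
        "REL_ANCHOR_DOMAIN": "EMPLOYER",
        "REL_ANCHOR_KEY": employer_key,
    }
    person = {
        "RECORD_ID": row["emp_num"].strip(),
        "RECORD_TYPE": "PERSON",
        "NAME_FULL": row["employee_name"].strip(),
        # No employer address here; a pointer relates the person to the employer.
        "REL_POINTER_DOMAIN": "EMPLOYER",
        "REL_POINTER_KEY": employer_key,
        "REL_POINTER_ROLE": "EMPLOYED_BY",
    }
    return person, employer
```

Because every employee of the same company produces the same employer record, you may want to de-duplicate employer records by RECORD_ID before writing them out.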
Please refer to the video linked above for more information on mapping!