This article walks you through mapping and loading data into Senzing using the CSV files attached to this article. If you want to follow along, download them to a directory of your choice.
- The Excel spreadsheet is the one marked up in the video.
- The -raw.csv file is the one that actually gets processed.
- The .py Python script is the one we build in the video.
- The .g2c file has the configuration required to load this data.
- The _mapped.json files are created by the Python script and are ready to load into Senzing.
Please read below and/or watch this video tutorial. Here we go!
The source data you want to load into Senzing can range from the simple to the complex. Simple is a basic CSV file like the employee list shown below ...
... to the complex, like the OFAC List which looks like this ...
Let's start with the simple, but realize that no matter how complex the source is, there are only two things to do:
- Decide what fields belong to what entity
- Map them to Senzing attributes based on the Generic Entity Specification
This is how we mapped it ... (the new row 1 has the corresponding Senzing attributes)
Things to note ...
- A data source is required, but there is no column for one. This is OK, as we can specify a data source for the whole file when we load it.
- Column A (employee number) was mapped to record_id because it appears to be unique.
- Columns B through K have one-to-one mappings with Senzing attributes. Note the use of the PRIMARY label for the name attributes and HOME for the address attributes.
- Columns L-N are payload attributes. They don't even need to be registered in Senzing since they do not help resolution, but they are useful to display to users when presented with a match.
- The employer name column is mapped to a group association attribute rather than a name attribute as it is NOT the name of the employee.
- Always specify PRIMARY on a name even if there are no AKAs. A primary name should be considered before an AKA in any best name calculation.
- Always specify a label for addresses and phones. Not only do labels help group the components of an address that belong together, they can also be used to find the information you want. For example, you may want to see the latest HOME address or the most current CELL phone.
- Not everything needs a label. For instance, a watch list may list 3 dates of birth for a person with no clear indication of which is better. But names, addresses and phone numbers should always have a label.
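To make the notes above concrete, here is a minimal sketch of how one CSV row could be turned into a JSON record. It is not the script from the video. The CSV column names (emp_num, last_name and so on) and the EMPLOYEES data source code are hypothetical, and the Senzing attribute names are our reading of the Generic Entity Specification, so check them against the specification and the attached files before relying on them.

```python
import csv
import io
import json

# Hypothetical CSV column -> attribute mapping. The real column names in the
# attached -raw.csv file may differ; the attribute names should be verified
# against the Generic Entity Specification.
COLUMN_MAP = {
    "emp_num": "RECORD_ID",              # appears to be unique per employee
    "last_name": "PRIMARY_NAME_LAST",    # PRIMARY label on name attributes
    "first_name": "PRIMARY_NAME_FIRST",
    "addr1": "HOME_ADDR_LINE1",          # HOME label on address attributes
    "city": "HOME_ADDR_CITY",
    "state": "HOME_ADDR_STATE",
    "zip": "HOME_ADDR_POSTAL_CODE",
    "dob": "DATE_OF_BIRTH",              # no label needed here
    "employer": "EMPLOYER_NAME",         # group association, not a name
    "job_title": "job_title",            # payload, kept only for display
}


def map_row(row, data_source):
    """Return one mapped record (a dict) for a single CSV row."""
    # The file has no data source column, so one code is applied to every row.
    record = {"DATA_SOURCE": data_source}
    for column, attribute in COLUMN_MAP.items():
        value = (row.get(column) or "").strip()
        if value:  # skip empty values rather than emitting blank attributes
            record[attribute] = value
    return record


def map_csv(csv_text, data_source):
    """Return a list of JSON strings, one per CSV row."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [json.dumps(map_row(row, data_source)) for row in reader]


if __name__ == "__main__":
    sample = (
        "emp_num,last_name,first_name,employer,job_title\n"
        "1001,Smith,Pat,ABC Company,Analyst\n"
    )
    for line in map_csv(sample, "EMPLOYEES"):
        print(line)
```

Keeping the mapping in one dictionary makes it easy to review each column against the question of which entity it belongs to before anything is loaded.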
By far the most common cause of overmatching is mapping attributes to an entity they don't belong to. Ask yourself on every field: does this attribute really belong to the entity I am trying to map? If the answer is no, either include it as payload or map it to the entity it does belong to and relate this entity to it.
Please refer to the video linked above for more information on mapping!