Overview
In Senzing, the MATCH_KEY captures which features matched when a record was last processed individually. Keep in mind that Senzing is an Entity Centric Learning (ECL) engine: after a record is initially processed, it is rarely processed individually again.
A MATCH_KEY is NOT a current account of how a record matches the rest of the records in its entity. That answer would come from the upcoming “Why” function.
This article explains what a MATCH_KEY is through examples and how it differs from the “Why” capability.
Example #1
This example processes three records and shows how their MATCH_KEYs are calculated. At the end we compare the MATCH_KEY with what “Why” would return.
RECORD_ID, NAME_FULL , HOME_ADDR_FULL                   , HOME_PHONE_NUMBER
1        , John Smith,                                  , 555-555-1212
2        , John Smith, "246 South St , Las Vegas, NV"   , 555-555-1212
3        , John Smith, "246 South , Las Vegas, NV 89132", 555-555-1212
Record 1
- MATCH_KEY = <blank>
- Record 1 processes and there is nothing to match against so the MATCH_KEY is blank
- Record 1 becomes Entity 1
Record 2
- MATCH_KEY = +NAME+PHONE_NUMBER
- Record 2 processes and matches to Entity 1 with name and phone_number
- Record 2 matches to an entity, not to a record. There could be 1000 records behind that entity and Record 2 neither knows nor cares, because this is an entity-based decision
- The MATCH_KEY for Record 1 is not changed and remains blank
Record 3
- MATCH_KEY = +NAME+ADDRESS+PHONE_NUMBER
- Record 3 processes and matches to Entity 1 with name, address, and phone_number
- As with Record 2, this is an entity-based decision, so none of the MATCH_KEYs for the existing records are updated.
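The three steps above can be sketched as a toy model. This is not the Senzing API or its matching logic: the `add_record` helper, the dictionary-based entity, and the pre-normalized address string are all assumptions made for illustration (real address matching is fuzzy, which is why records 2 and 3 match on address despite different spellings). The sketch only shows that a MATCH_KEY is written once, when a record is compared to the entity, and is not revisited afterward.

```python
# Toy model only; NOT the Senzing API. All names here are hypothetical.
FEATURES = ("NAME", "ADDRESS", "PHONE_NUMBER")

def add_record(entity, match_keys, record_id, features):
    """Compare one incoming record to the entity, store its MATCH_KEY, then absorb it."""
    # A feature "matches" if the entity already holds the same value.
    matched = [f for f in FEATURES
               if features.get(f) and features[f] in entity.get(f, set())]
    # Only the incoming record's MATCH_KEY is written; existing keys are untouched.
    match_keys[record_id] = "".join("+" + f for f in matched)
    for f, value in features.items():
        if value:
            entity.setdefault(f, set()).add(value)

entity, match_keys = {}, {}
# Assumption: addresses are already normalized to a single comparable string.
home = "246 SOUTH LAS VEGAS NV"
add_record(entity, match_keys, 1, {"NAME": "JOHN SMITH", "PHONE_NUMBER": "555-555-1212"})
add_record(entity, match_keys, 2, {"NAME": "JOHN SMITH", "ADDRESS": home, "PHONE_NUMBER": "555-555-1212"})
add_record(entity, match_keys, 3, {"NAME": "JOHN SMITH", "ADDRESS": home, "PHONE_NUMBER": "555-555-1212"})

for record_id, key in match_keys.items():
    print(record_id, repr(key))
# 1 ''
# 2 '+NAME+PHONE_NUMBER'
# 3 '+NAME+ADDRESS+PHONE_NUMBER'
```

Record 1 keeps its blank key even though, by the end, it shares a name and phone number with every other record in the entity. That gap is what “Why” is meant to answer.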
MATCH_KEY vs Why
Example #2
In this example we create a second entity nearly identical to the first, then add a fourth record (record 7) that causes both entities to come together. The first entity holds the home information and the second holds the work information.
Ingestion of the following records will create entity 2.
RECORD_ID, NAME_FULL , WORK_ADDR_FULL                     , WORK_PHONE_NUMBER
4        , John Smith,                                    , 333-333-1212
5        , John Smith, "789 North St , Las Vegas , NV"    , 333-333-1212
6        , John Smith, "789 North , Las Vegas , NV 89132" , 333-333-1212
MATCH_KEY vs Why
As you can see, this second set of records uses the exact same logic as the first set but combines to form entity 2.
Now add record 7. In this record the person uses their mobile phone for work, and your contact list has their home and mobile phone numbers.
RECORD_ID, NAME_FULL , WORK_ADDR_FULL, HOME_PHONE_NUMBER, MOBILE_PHONE_NUMBER
7        , John Smith,               , 555-555-1212     , 333-333-1212
Record 7
- MATCH_KEY = +NAME+PHONE_NUMBER
- Record 7 processes and matches to Entity 1 and Entity 2
- Entity 1 and Entity 2 have nothing to keep them apart (no different date of birth, SSN, GENDER, etc.), so this seventh record causes Entity 1, Entity 2, and all of their records to resolve together into a single entity
What Happens to the MATCH_KEY?
Only record 7 is individually processed in this decision, so it is the only record with an updated MATCH_KEY. Entity 1 and Entity 2 are each treated as an entity, not as their individual records, as part of the Entity Centric Learning (ECL) provided by Senzing. In the final MATCH_KEY vs “Why” analysis you see:
Some engine internals can make this slightly different for various combinations of data and operations (ambiguous, delete, update, redo, etc.), and you may see updated MATCH_KEYs or re-ordering of the blank records.
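The merge in Example #2 can be illustrated with a small toy model. It is not the Senzing API or its resolution logic: the `match_key` helper, the dictionary-based entities, and the pre-normalized address strings are assumptions made for illustration, and the caveats about engine internals above are ignored. The sketch only shows that when record 7 pulls Entity 1 and Entity 2 together, record 7 is the only record whose MATCH_KEY is written.

```python
# Toy model only; NOT the Senzing API. All names here are hypothetical.
FEATURES = ("NAME", "ADDRESS", "PHONE_NUMBER")

def match_key(entity, record):
    """Build a MATCH_KEY from the features the record shares with the entity."""
    matched = [f for f in FEATURES if entity.get(f, set()) & record.get(f, set())]
    return "".join("+" + f for f in matched)

# Assumed state after records 1-6 (addresses pre-normalized for the toy).
entity_1 = {"NAME": {"JOHN SMITH"}, "ADDRESS": {"246 SOUTH LAS VEGAS NV"},
            "PHONE_NUMBER": {"555-555-1212"}}
entity_2 = {"NAME": {"JOHN SMITH"}, "ADDRESS": {"789 NORTH LAS VEGAS NV"},
            "PHONE_NUMBER": {"333-333-1212"}}
match_keys = {1: "", 2: "+NAME+PHONE_NUMBER", 3: "+NAME+ADDRESS+PHONE_NUMBER",
              4: "", 5: "+NAME+PHONE_NUMBER", 6: "+NAME+ADDRESS+PHONE_NUMBER"}
before = dict(match_keys)  # snapshot to show nothing else changes

# Record 7 carries both phone numbers and no address.
record_7 = {"NAME": {"JOHN SMITH"}, "PHONE_NUMBER": {"555-555-1212", "333-333-1212"}}

# Record 7 is compared to each entity as a whole, not to individual records.
keys_per_entity = [match_key(e, record_7) for e in (entity_1, entity_2)]

# Both entities and record 7 resolve into one entity.
merged = {}
for source in (entity_1, entity_2, record_7):
    for feature, values in source.items():
        merged.setdefault(feature, set()).update(values)

# Only the individually processed record gets a MATCH_KEY written.
match_keys[7] = keys_per_entity[0]

unchanged = all(match_keys[r] == before[r] for r in before)
print(match_keys[7], unchanged)
```

In this toy, records 1 and 4 still carry blank MATCH_KEYs after the merge, even though each now sits in an entity with six other records that share its name.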