First off, the Senzing Resolved Entity ID is not a globally unique, persistent identifier. Rather, it is a number that identifies a grouping, and that grouping may be transient. It reflects the dynamic, automated learning nature of Senzing.
Every DSRC_RECORD (Data Source Record) has its own set of ER (Entity Resolution) data, and that data is assigned an OBS_ENT (Observed Entity) with an OBS_ENT_ID (Observed Entity ID), which is Senzing's internal representation of the record. This acts as a kind of pre-deduplication step: identical ER data gets the same OBS_ENT_ID, so multiple DSRC_RECORDs can share one OBS_ENT_ID. If the ER data for a DSRC_RECORD changes, its OBS_ENT_ID changes too.
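The pre-deduplication described above can be modeled roughly as follows. This is a minimal sketch under stated assumptions, not Senzing's API or internal algorithm: the class ObservedEntityRegistry and its load method are hypothetical, ER data is treated as a flat dictionary of feature values compared for exact equality, OBS_ENT_IDs are assumed to come from an increasing counter, and the "CRM" record is made-up sample data.

```python
from itertools import count


class ObservedEntityRegistry:
    """Hypothetical model of OBS_ENT assignment (not Senzing's implementation)."""

    def __init__(self, start_id: int = 1) -> None:
        # Assumption: new OBS_ENT_IDs are handed out in increasing order.
        self._ids = count(start_id)
        # Identical ER data maps to one OBS_ENT_ID (the "pre-deduping").
        self._by_er_data: dict[frozenset, int] = {}
        # (data source, record ID) -> current OBS_ENT_ID for that record.
        self.record_to_obs: dict[tuple[str, str], int] = {}

    def load(self, data_source: str, record_id: str, er_data: dict[str, str]) -> int:
        key = frozenset(er_data.items())
        if key not in self._by_er_data:
            self._by_er_data[key] = next(self._ids)
        obs_id = self._by_er_data[key]
        # Loading a record again with changed ER data moves it to a different OBS_ENT_ID.
        self.record_to_obs[(data_source, record_id)] = obs_id
        return obs_id


reg = ObservedEntityRegistry()
a = reg.load("MDM", "1231521", {"NAME": "Jane Doe", "DOB": "1980-01-02"})
b = reg.load("CRM", "77", {"NAME": "Jane Doe", "DOB": "1980-01-02"})  # hypothetical second record
c = reg.load("MDM", "1231521", {"NAME": "Jane Doe", "DOB": "1980-01-03"})  # ER data updated
```

Here the two records with identical ER data share OBS_ENT_ID 1, and updating MDM + 1231521 moves that record to OBS_ENT_ID 2 while the other record stays on 1.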
For example, the data source + record ID pair MDM + 1231521 might be assigned OBS_ENT_ID 9870948109. When an entity is assembled, its RES_ENT_ID (Resolved Entity ID) is simply the lowest OBS_ENT_ID in the group.
So if you have records with OBS_ENT_IDs 1 through 7, you could get groups like:
Ent1: 1,2,3
Ent4: 4
Ent5: 5,6,7
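The naming rule can be written out in a few lines. In this minimal sketch the helper names res_ent_id and label_entities are hypothetical, and the groupings are taken as given from the example above; the sketch does not perform any entity resolution, it only applies the "lowest OBS_ENT_ID" rule to name each group.

```python
def res_ent_id(obs_ent_ids: set[int]) -> int:
    """RES_ENT_ID rule from the text: the lowest OBS_ENT_ID in the group."""
    return min(obs_ent_ids)


def label_entities(groups: list[set[int]]) -> dict[int, list[int]]:
    """Map each resolved entity's ID to its sorted member OBS_ENT_IDs."""
    return {res_ent_id(g): sorted(g) for g in groups}


# Groupings from the example (assumed as input, not computed here).
entities = label_entities([{1, 2, 3}, {4}, {5, 6, 7}])
for ent_id, members in sorted(entities.items()):
    print(f"Ent{ent_id}: {members}")
```

This prints Ent1, Ent4 and Ent5 with the same members as the list above.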
If you then update the ER data in record 1, so that it gets the new OBS_ENT_ID 100, but the update causes no other changes to the ER outcome, you could get:
Ent2: 2,3,100
Ent4: 4
Ent5: 5,6,7
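The update scenario can be simulated to show why a RES_ENT_ID should not be stored as a persistent key. This is a minimal sketch assuming, as in the text, that the update only swaps one OBS_ENT_ID and leaves group membership unchanged; the function relabel_after_update is hypothetical and does not model Senzing's re-resolution.

```python
def relabel_after_update(groups: list[set[int]], old_id: int, new_id: int) -> dict[int, list[int]]:
    """Swap one record's OBS_ENT_ID, keep membership fixed, and re-derive each
    RES_ENT_ID as the lowest OBS_ENT_ID in its group."""
    updated = [{new_id if x == old_id else x for x in g} for g in groups]
    return {min(g): sorted(g) for g in updated}


before = [{1, 2, 3}, {4}, {5, 6, 7}]
after = relabel_after_update(before, old_id=1, new_id=100)

old_ids = {min(g) for g in before}
retired = old_ids - after.keys()      # RES_ENT_IDs that no longer exist
introduced = after.keys() - old_ids   # RES_ENT_IDs that newly appear
print(after, retired, introduced)
```

The same real-world entity goes from Ent1 to Ent2 even though no records joined or left it, which is exactly the transience described at the top.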
Because a single update can trigger several changes to the ER groups, the resulting RES_ENT_IDs simply follow from how those groupings end up. In general, if you aren't updating records, entities tend to stabilize around the oldest record for that entity.