Overview
Sometimes, despite records not having enough matching attributes to resolve together, you have additional knowledge and know that records belong to the same person or organization. Senzing provides the ability to force such records together simply by utilizing the TRUSTED_ID feature. TRUSTED_ID will force together records that share the same TRUSTED_ID_TYPE and TRUSTED_ID_NUMBER attributes, regardless of the other data attributes in the records.
Conversely, you can break an entity apart by giving different TRUSTED_ID_NUMBER attributes to the records that comprise the entity. Forcing records apart isn't a common task.
Note on built-in TRUSTED_ID versus a custom A1ES feature. The examples below use the built-in TRUSTED_ID feature, which works out of the box and is the right choice for drop-in scripts and any deployment that does not modify the engine config. Be aware of one nuance in v4-native configs: TRUSTED_ID_TYPE namespaces the comparison, so two records with different TRUSTED_ID_TYPE values will not be forced apart even if their TRUSTED_ID_NUMBER values conflict. Pick a single consistent TRUSTED_ID_TYPE value per stewardship purpose to keep the semantics straightforward. (In a v3-upgraded config, TRUSTED_ID compares via EXACT_COMP and this nuance does not apply.)
For sustained stewardship and MDM deployments where you can change the engine config, the recommended pattern is a custom A1ES feature with a project-specific name (e.g., MDM_ID, STEWARD_ID, CURATION_ID). A custom A1ES feature has no _TYPE field, compares via EXACT_COMP, and the feature name appears in MATCH_KEY output for clear provenance.
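The force-apart case and the TRUSTED_ID_TYPE nuance described above can be sketched in Python as follows. This is only an illustration of the record attributes involved: the helper name force_apart, the FORCED_APART type value, and the choice to derive each TRUSTED_ID_NUMBER from DATA_SOURCE and RECORD_ID are assumptions made for the example, not part of the Senzing SDK, and loading the records is left out.

```python
import json


def force_apart(records, trusted_id_type="FORCED_APART"):
    """Return copies of the records with conflicting TRUSTED_ID values.

    Hypothetical helper. Every record gets the SAME TRUSTED_ID_TYPE
    (so the values are compared against each other) and a DIFFERENT
    TRUSTED_ID_NUMBER (so the comparison conflicts).
    """
    return [
        {
            **record,
            "TRUSTED_ID_TYPE": trusted_id_type,
            # Assumption: DATA_SOURCE + RECORD_ID is unique per record.
            "TRUSTED_ID_NUMBER": f"{record['DATA_SOURCE']}{record['RECORD_ID']}",
        }
        for record in records
    ]


# Two illustrative records that currently make up one entity.
entity_records = [
    {"DATA_SOURCE": "TEST", "RECORD_ID": "1", "NAME_FULL": "SKIPPY JONES"},
    {"DATA_SOURCE": "TEST", "RECORD_ID": "2", "NAME_FULL": "BOB JONES"},
]

split_records = force_apart(entity_records)
for record in split_records:
    print(json.dumps(record))
```

Using one type value for all records in the set follows the advice in the note: with differing TRUSTED_ID_TYPE values, a v4-native config would not treat the numbers as conflicting.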
Usage
Consider the following two records:
DATA_SOURCE ,RECORD_ID ,NAME_FULL    ,DATE_OF_BIRTH ,PHONE_NUMBER
TEST        ,1         ,SKIPPY JONES ,1960-01-01    ,5551212
TEST        ,2         ,BOB JONES    ,1960-01-01    ,5551212
Senzing would create two new entities for these records, detect they share the same date of birth and phone number, and record a relationship between the two entities. There isn't enough data at this time for Senzing to confirm they are the same person.
To force the two records together, append the TRUSTED_ID_TYPE and TRUSTED_ID_NUMBER attributes to both records with the same values. The TRUSTED_ID_TYPE is an arbitrary value you choose; it can act as an informational hint when reviewing the entity data. TRUSTED_ID_NUMBER is a unique value shared by the records. Records with the same TRUSTED_ID_TYPE and TRUSTED_ID_NUMBER will resolve together. Note, in the following example, the TRUSTED_ID_NUMBER is created from the DATA_SOURCE and RECORD_ID values.
DATA_SOURCE ,RECORD_ID ,NAME_FULL    ,DATE_OF_BIRTH ,PHONE_NUMBER ,TRUSTED_ID_TYPE ,TRUSTED_ID_NUMBER
TEST        ,1         ,SKIPPY JONES ,1960-01-01    ,5551212      ,FORCED_SAME     ,TEST1-TEST2
TEST        ,2         ,BOB JONES    ,1960-01-01    ,5551212      ,FORCED_SAME     ,TEST1-TEST2
Loading these two records would result in a single entity comprising both records.
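As a minimal sketch of the step above, the following Python builds the two records as JSON-ready dictionaries and appends the same TRUSTED_ID attributes to each. The helper name force_together and the rule for joining DATA_SOURCE and RECORD_ID into the TRUSTED_ID_NUMBER are illustrative assumptions, not part of the Senzing SDK; the code only prepares the records and does not load them.

```python
import json


def force_together(records, trusted_id_type="FORCED_SAME"):
    """Return copies of the records that share one set of TRUSTED_ID values.

    Hypothetical helper. The shared TRUSTED_ID_NUMBER is built by
    concatenating DATA_SOURCE and RECORD_ID for each record and joining
    the results with "-", mirroring the TEST1-TEST2 value in the example.
    """
    trusted_id_number = "-".join(
        f"{record['DATA_SOURCE']}{record['RECORD_ID']}" for record in records
    )
    return [
        {
            **record,
            "TRUSTED_ID_TYPE": trusted_id_type,
            "TRUSTED_ID_NUMBER": trusted_id_number,
        }
        for record in records
    ]


records = [
    {
        "DATA_SOURCE": "TEST",
        "RECORD_ID": "1",
        "NAME_FULL": "SKIPPY JONES",
        "DATE_OF_BIRTH": "1960-01-01",
        "PHONE_NUMBER": "5551212",
    },
    {
        "DATA_SOURCE": "TEST",
        "RECORD_ID": "2",
        "NAME_FULL": "BOB JONES",
        "DATE_OF_BIRTH": "1960-01-01",
        "PHONE_NUMBER": "5551212",
    },
]

forced = force_together(records)
for record in forced:
    # One JSON document per line, ready to hand to whatever loader you use.
    print(json.dumps(record))
```

Both output records carry FORCED_SAME as the type and TEST1-TEST2 as the number. The same helper accepts more than two records, in which case every record in the list receives the same pair of values.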
Tips
- Before using TRUSTED_ID, check that you don't have additional data attributes available that would allow Senzing to automatically make a decision to resolve the records together. In the data source, are there additional names, maybe an address, or identifiers such as a driver's license or an SSN? It is best practice to provide Senzing with more data to enable it to make decisions automatically rather than overriding its behavior.
- You can add the new attributes to an existing record by calling getRecord(), adding the attributes to the returned record, and then calling replaceRecord() to replace it. Senzing will do the rest. WARNING: If you do this, you must ensure that no other thread or process is trying to modify that record at the same time.
- More than two records can share the same TRUSTED_ID attributes. In that case, all the records that share the same values will resolve together.
- For sustained stewardship deployments (where overrides are persisted, audited, and applied through ETL on every reprocess rather than ad hoc on individual records), configure a custom A1ES feature (commonly MDM_ID) and use a stewardship override table pattern: a separate, audited table keyed by (DATA_SOURCE, RECORD_ID) whose MDM_ID value is injected into records during ETL.
- For deployments that need a persistent, stable entity identifier (a durable GUID per entity) across the lifetime of the dataset, the MDM Lite lifecycle on top of stewardship is the supported pattern.
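The stewardship override table pattern mentioned in the tips can be sketched roughly as below. Everything specific here is an assumption for illustration: the CSV layout, the STEWARD and REASON audit columns, the MDM-0001 value, and the helper names are made up, and the sketch assumes a custom MDM_ID feature has already been configured so that an MDM_ID attribute on a record is meaningful to the engine. It shows only the ETL injection step, not the loading of records.

```python
import csv
import io

# Hypothetical audited override table, keyed by (DATA_SOURCE, RECORD_ID).
# In practice this would live in a database or a version-controlled file.
OVERRIDES_CSV = """DATA_SOURCE,RECORD_ID,MDM_ID,STEWARD,REASON
TEST,1,MDM-0001,steward_a,confirmed same person
TEST,2,MDM-0001,steward_a,confirmed same person
"""


def load_overrides(csv_text):
    """Map (DATA_SOURCE, RECORD_ID) to the MDM_ID a steward assigned."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {
        (row["DATA_SOURCE"], row["RECORD_ID"]): row["MDM_ID"] for row in reader
    }


def apply_overrides(records, overrides):
    """Yield copies of the records, adding MDM_ID where an override exists."""
    for record in records:
        key = (record["DATA_SOURCE"], str(record["RECORD_ID"]))
        enriched = dict(record)
        if key in overrides:
            enriched["MDM_ID"] = overrides[key]
        yield enriched


source_records = [
    {"DATA_SOURCE": "TEST", "RECORD_ID": "1", "NAME_FULL": "SKIPPY JONES"},
    {"DATA_SOURCE": "TEST", "RECORD_ID": "2", "NAME_FULL": "BOB JONES"},
    {"DATA_SOURCE": "TEST", "RECORD_ID": "3", "NAME_FULL": "ANN SMITH"},
]

overrides = load_overrides(OVERRIDES_CSV)
enriched_records = list(apply_overrides(source_records, overrides))
for record in enriched_records:
    print(record)
```

Because the overrides are kept outside the source data and re-applied on every ETL run, the source records stay untouched and each override remains traceable to the steward who made it.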