At times the Senzing engine determines that additional work needs to be performed on an entity. In some cases it decides that this work should be deferred to a later time, for instance:
- Cleaning up decisions that were based on attributes later determined to no longer be important (most common)
- Records loaded in parallel around the same cluster of entities, causing conflicts
- Automatic corrections
When this happens, a special record is written to a table for future processing. The G2Loader.py Python script processes these records automatically.
The SYS_EVAL_QUEUE table lives in the core Senzing repository and consists of the following columns:
LENS_CODE: A key for an advanced feature that is not currently in use
ETYPE_CODE: Entity type code, typically something like PERSON or COMPANY
DSRC_CODE: The user provided data source identifying code
ENT_SRC_KEY: A key, often internally generated, that identifies a specific record
MSG: The internally formatted message to be processed by the engine
The first four fields make up the unique key that identifies the record.
Logically the data is processed as follows:
- Select records from the SYS_EVAL_QUEUE table: "select LENS_CODE, ETYPE_CODE, DSRC_CODE, ENT_SRC_KEY, MSG from SYS_EVAL_QUEUE"
- It is best to do this in blocks of records (e.g. 100) in case the table has a lot of records in it
- It is recommended to do this periodically during processing (e.g. every 1000 records) and to limit the number of blocks processed at a time (e.g. a maximum of 10 blocks) to balance loading new data against working off redo records
- For each row selected:
- Call G2Engine.process(MSG)
- Handle errors as you would for G2Engine.addRecord(...) or other such functions
- On successful processing, delete the record: "delete from SYS_EVAL_QUEUE where LENS_CODE = ? and ETYPE_CODE = ? and DSRC_CODE = ? and ENT_SRC_KEY = ?"
The G2Loader.py script that ships with the Senzing API implements redo processing in its processRedo(...) function.