The Senzing G2 engine utilizes libraries for scoring data attributes. The scoring libraries - also known as comparators - are plugins to the engine. This plugin framework enables integrators and builders to develop custom scoring libraries for specific requirements.
The G2 engine has a default scoring library for Social Security Numbers (SSNs) that compares and scores two SSNs for likeness. The SSN scoring library is specifically tailored to the nuances of comparing SSNs. The default library for SSN is called libg2SSNComp.so.
Consider a data source you want to load into Senzing that includes SSNs, but the owner of the data source has encrypted them. You know SSNs are very useful, and you want to compare the decrypted SSNs within Senzing and use them for entity resolution. You have the key to decrypt the SSN attributes, but the decryption must happen in memory on the secure server where Senzing is deployed, and the decrypted SSNs must never be stored on physical media.
You can achieve this by writing your own SSN scoring plugin that decrypts the SSNs, compares them, and hands the resulting score to the Senzing G2 engine to consider during the entity resolution process.
Writing Your Custom Plugin
All G2 scoring plugins use a common header file, g2PluginInterface.h, located in /opt/senzing/g2/sdk/c, which defines the interface and other scoring operations. The SSN plugin interface uses five functions from g2PluginInterface.h. These are:
- int initPlugin(const char *configInfo, char *errorStr, const size_t maxErrorStrSize, G2GETCALLBACK getCallback, G2CALLBACK callback1, G2CALLBACK callback2, G2CALLBACK callback3);
- const char *getVersion();
- const char *getScoreNames();
- int scoreSimple(const char *ftypeCode, const char *felemCode, const char *inboundStr, const int inboundIsHashed, const char *matchedStr, const int matchedIsHashed, char *behavior, const size_t maxBehaviorBufferSize, int* resultScore, char *errorStr, const size_t maxErrorStrSize);
- int closePlugin();
These five functions must be implemented in your custom plugin. A sample is attached - SampleScoreSSNPlugin.c. This outlines how to structure and build your own plugin, the header functions, function implementations and methods.
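To make the shape of a plugin concrete, here is a minimal sketch of the five functions, independent of the attached sample. Several things in it are assumptions rather than documented behavior: the two typedefs are stand-ins so the sketch compiles on its own (a real plugin would include g2PluginInterface.h instead), a return value of 0 is assumed to mean success, the version string is made up, and the exact-match scoring and empty behavior string are placeholders for real logic.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* ASSUMPTION: stand-in types so this sketch compiles without the SDK.
   A real plugin would #include "g2PluginInterface.h" and use its definitions. */
typedef void *G2GETCALLBACK;
typedef void *G2CALLBACK;

int initPlugin(const char *configInfo, char *errorStr, const size_t maxErrorStrSize,
               G2GETCALLBACK getCallback, G2CALLBACK callback1,
               G2CALLBACK callback2, G2CALLBACK callback3)
{
    (void)configInfo; (void)getCallback;
    (void)callback1; (void)callback2; (void)callback3;
    if (errorStr != NULL && maxErrorStrSize > 0) errorStr[0] = '\0';
    return 0; /* ASSUMPTION: 0 signals success */
}

const char *getVersion(void) { return "1.0.0"; } /* hypothetical version string */

/* ASSUMPTION: the name matches the CFUNC_RTNVAL used in the configuration. */
const char *getScoreNames(void) { return "FULL_SCORE"; }

int scoreSimple(const char *ftypeCode, const char *felemCode,
                const char *inboundStr, const int inboundIsHashed,
                const char *matchedStr, const int matchedIsHashed,
                char *behavior, const size_t maxBehaviorBufferSize,
                int *resultScore, char *errorStr, const size_t maxErrorStrSize)
{
    (void)ftypeCode; (void)felemCode; (void)inboundIsHashed; (void)matchedIsHashed;
    /* Placeholder: this sketch reports no behavior information. */
    if (behavior != NULL && maxBehaviorBufferSize > 0) behavior[0] = '\0';
    if (resultScore == NULL) {
        if (errorStr != NULL && maxErrorStrSize > 0)
            snprintf(errorStr, maxErrorStrSize, "resultScore is NULL");
        return -1; /* ASSUMPTION: non-zero signals an error */
    }
    /* -1 marks a pair that cannot be scored (missing or empty value). */
    if (inboundStr == NULL || matchedStr == NULL || !*inboundStr || !*matchedStr) {
        *resultScore = -1;
        return 0;
    }
    /* Placeholder scoring: exact match is 100, anything else is 0. */
    *resultScore = (strcmp(inboundStr, matchedStr) == 0) ? 100 : 0;
    return 0;
}

int closePlugin(void) { return 0; }
```

The attached SampleScoreSSNPlugin.c remains the reference for how these functions are actually structured; this sketch only shows where your own logic would go.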
To assist in understanding how to write the plugin and how the Senzing engine calls it, a driver program simulating the engine is also attached - SampleScoreSSNTester.c. The following shows compilation and output of the samples. Scoring output from the test program is shown only for reference and comprehension; your plugin would not output data such as this.
You would rewrite the functions in the sample plugin with your own logic, for example decrypting the mapped SSN_NUMBER values first and then calling the scoreSimple() function with the unencrypted SSN values. The sample calls our standard SSN scoring plugin; you could also write your own scoring logic if desired.
Once you've written your custom plugin with the required logic, it needs to be built into a shared library, for example libMyCustomSSNScorer.so. Remember, the library must export the function names listed above so the G2 engine can access them.
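The decrypt-then-score flow could look roughly like the sketch below. Everything named here is hypothetical: decrypt_ssn() is a deliberately trivial stand-in (hex text XORed with a one-byte key) that is not real cryptography, score_encrypted_ssns() is an illustrative helper rather than part of the plugin interface, and the exact-match comparison stands in for whatever scoring you call. The point it illustrates is that the decrypted values live only in stack buffers, which are wiped before the function returns.

```c
#include <stddef.h>
#include <string.h>

#define SSN_BUF_SIZE 16

/* Zero a buffer through a volatile pointer so the wipe is not optimized away. */
static void secure_wipe(void *buf, size_t len)
{
    volatile unsigned char *p = (volatile unsigned char *)buf;
    while (len--) *p++ = 0;
}

static int hex_val(char c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/* HYPOTHETICAL stand-in for real decryption: hex text, each byte XORed with
   a one-byte key. NOT secure; replace with your actual decryption routine. */
static int decrypt_ssn(const char *hexCipher, unsigned char key,
                       char *out, size_t outSize)
{
    size_t n = strlen(hexCipher);
    if (n % 2 != 0 || n / 2 + 1 > outSize) return -1;
    for (size_t i = 0; i < n / 2; ++i) {
        int hi = hex_val(hexCipher[2 * i]);
        int lo = hex_val(hexCipher[2 * i + 1]);
        if (hi < 0 || lo < 0) return -1;
        out[i] = (char)(((hi << 4) | lo) ^ key);
    }
    out[n / 2] = '\0';
    return 0;
}

/* Decrypt both values in memory, score them, then wipe the plaintext.
   Returns 0 on success; *resultScore is -1 if the pair cannot be scored. */
int score_encrypted_ssns(const char *inboundCipher, const char *matchedCipher,
                         unsigned char key, int *resultScore)
{
    char a[SSN_BUF_SIZE];
    char b[SSN_BUF_SIZE];
    if (inboundCipher == NULL || matchedCipher == NULL || resultScore == NULL)
        return -1;
    if (decrypt_ssn(inboundCipher, key, a, sizeof a) == 0 &&
        decrypt_ssn(matchedCipher, key, b, sizeof b) == 0) {
        /* Placeholder scoring: exact match is 100, anything else is 0. */
        *resultScore = (strcmp(a, b) == 0) ? 100 : 0;
    } else {
        *resultScore = -1; /* one side could not be decrypted */
    }
    secure_wipe(a, sizeof a);
    secure_wipe(b, sizeof b);
    return 0;
}
```

In a real plugin, a helper like this would be called from your scoreSimple() implementation with inboundStr and matchedStr as the encrypted inputs.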
Configuring Your New SSN Plugin
Once your new plugin is ready, the G2 configuration needs to be modified to inform the engine to use it. This configuration is defined in the g2config.json file.
- First, define your plugin as a new comparison library in the CFG_CFUNC section of the g2config.json file. Here is the default one for SSN:
...
{
"CFUNC_ID": 8,
"CFUNC_CODE": "SSN_COMP",
"CFUNC_DESC": "SSN comp",
"FUNC_LIB": "INT_LIB",
"FUNC_VER": "1",
"CONNECT_STR": "g2SSNComp",
"ANON_SUPPORT": "Yes",
"LANGUAGE": null,
"JAVA_CLASS_NAME": null
},
...
Add a new section at the end of CFG_CFUNC defining your plugin...
{
"CFUNC_ID": 100,
"CFUNC_CODE": "SSN_COMP_CUSTOM",
"CFUNC_DESC": "SSN_COMP_CUSTOM",
"FUNC_LIB": "INT_LIB",
"FUNC_VER": "1",
"CONNECT_STR": "SampleScoreSSNPlugin",
"ANON_SUPPORT": "Yes",
"LANGUAGE": null,
"JAVA_CLASS_NAME": null
},
...
Where:
- CFUNC_ID - Unique ID code
- CFUNC_CODE - Code to represent the new plugin
- CONNECT_STR - Name of your new plugin. This is the name libSampleScoreSSNPlugin.so on Linux without the lib prefix or .so suffix
- ANON_SUPPORT - Boolean indicator, does the plugin support selective hashing (anonymization)
- Add a new section to the CFG_CFRTN section to define the return values for the plugin just as you did for CFG_CFUNC previously.
...
{
"CFRTN_ID": 200,
"CFUNC_ID": 100,
"CFUNC_RTNVAL": "FULL_SCORE",
"EXEC_ORDER": 1,
"SAME_SCORE": 100,
"CLOSE_SCORE": 90,
"LIKELY_SCORE": 80,
"PLAUSIBLE_SCORE": 70,
"UN_LIKELY_SCORE": 60
},
...
Where:
- CFRTN_ID - Unique ID code
- CFUNC_ID - Same value used in CFG_CFUNC
- *_SCORE - Score thresholds for match determination
- The integer *_SCOREs follow the general G2 engine scoring. Scoring thresholds indicate how similar (or not) attributes are. These scores are subsequently used to determine overall matching during entity resolution. Valid scores for each threshold level range from 0 to 100, with -1 indicating that one of the two attributes being scored is invalid or that the pair cannot be scored.
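As an illustration of how the thresholds relate to a score, the following sketch maps a score onto the five levels from the CFG_CFRTN example. It assumes that a score at or above a threshold falls into that level, which is a plausible reading and not a statement of how the engine works internally. The ScoreThresholds struct, classify_score() function, and returned labels are all hypothetical names.

```c
#include <stddef.h>

/* Hypothetical container mirroring the *_SCORE fields of a CFG_CFRTN entry. */
typedef struct {
    int same;
    int close;
    int likely;
    int plausible;
    int unlikely;
} ScoreThresholds;

/* ASSUMPTION: a score at or above a threshold belongs to that level.
   Scores outside 0..100 (including -1) are treated as not scored. */
const char *classify_score(int score, const ScoreThresholds *t)
{
    if (t == NULL || score < 0 || score > 100) return "NOT_SCORED";
    if (score >= t->same)      return "SAME";
    if (score >= t->close)     return "CLOSE";
    if (score >= t->likely)    return "LIKELY";
    if (score >= t->plausible) return "PLAUSIBLE";
    if (score >= t->unlikely)  return "UNLIKELY";
    return "BELOW_THRESHOLDS";
}
```

With the example values above (100, 90, 80, 70, 60), a score of 85 would land in the LIKELY level under this assumption, and -1 would be reported as not scored.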
- The final step is to modify the configuration so SSN attributes utilize your new plugin. This modification is within the CFG_CFCALL section. Find the entry for SSN, FTYPE_ID = 7:
...
{
"CFCALL_ID": 7,
"FTYPE_ID": 7,
"CFUNC_ID": 8,
"EXEC_ORDER": 1
},
...
Change the CFUNC_ID from 8 to the value you used, 100 in the example above:
{
"CFCALL_ID": 7,
"FTYPE_ID": 7,
"CFUNC_ID": 100,
"EXEC_ORDER": 1
},
...
Once these steps are complete, the G2 engine will be using your new plugin.
Initialization information inside the plugins
When the G2 engine initializes a plugin, it passes in some pertinent system information through the configInfo parameter of the initPlugin API function. This character string contains information such as the following:
<INIT_INFO>
<NODE_NAME>pyG2Engine</NODE_NAME>
<CONFIG_PARAMS>{"SOURCE_TYPE":"JSON_PARAM_STRING","INI_FILE_NAME":"","INI_JSON_STRING":"{\"PIPELINE\": {\"SUPPORTPATH\": \"/home/senzingUser/senzing/data\", \"CONFIGPATH\": \"/home/senzingUser/senzing/etc\", \"RESOURCEPATH\": \"/home/senzingUser/senzing/resources\"}, \"SQL\": {\"CONNECTION\": \"sqlite3://na:na@/home/senzingUser/senzing/var/sqlite/G2C.db\"}}"}</CONFIG_PARAMS>
<PLUGIN_SUPPORT_PATH>/home/senzingUser/senzing/data</PLUGIN_SUPPORT_PATH>
<PIPELINE_CONFIG_PATH>/home/senzingUser/senzing/etc</PIPELINE_CONFIG_PATH>
</INIT_INFO>
Important Note: The format and content of this information is subject to change in future versions. If you wish to begin using this information, please contact Senzing support.
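If you do read this string, one simple approach is to pull out a single element by name, as in the sketch below. It assumes the tag layout shown in the example above, which may change as noted, and it uses plain substring search instead of a real XML parser, so it does not handle attributes, nesting of the same tag, or escaped characters. The extract_tag_value() function is a hypothetical helper, not part of the plugin interface.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Copy the text between <tag> and </tag> into out.
   Returns 0 on success, -1 if the tag is missing or out is too small. */
int extract_tag_value(const char *xml, const char *tag, char *out, size_t outSize)
{
    char openTag[64];
    char closeTag[64];
    if (xml == NULL || tag == NULL || out == NULL || outSize == 0) return -1;

    int n1 = snprintf(openTag, sizeof openTag, "<%s>", tag);
    int n2 = snprintf(closeTag, sizeof closeTag, "</%s>", tag);
    if (n1 < 0 || n2 < 0 ||
        (size_t)n1 >= sizeof openTag || (size_t)n2 >= sizeof closeTag)
        return -1; /* tag name too long for the local buffers */

    const char *start = strstr(xml, openTag);
    if (start == NULL) return -1;
    start += n1; /* skip past the opening tag */

    const char *end = strstr(start, closeTag);
    if (end == NULL) return -1;

    size_t len = (size_t)(end - start);
    if (len + 1 > outSize) return -1;
    memcpy(out, start, len);
    out[len] = '\0';
    return 0;
}
```

Inside initPlugin, a call such as extract_tag_value(configInfo, "PLUGIN_SUPPORT_PATH", path, sizeof path) would then fill path with the support directory, if that element is present.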