Overview
This article describes how to set up Selective Hashing on the Senzing APIs. This is an advanced topic that requires installing additional software and changing your Senzing configuration. If you are interested in Selective Hashing, please contact Senzing support to discuss your requirements.
This article focuses on using the G2Loader.py utility and its configuration to self-anonymize data; the pattern is the same when using the APIs directly.
Prerequisites
Selective Hashing requires softhsm2 to be installed.
- Red Hat / CentOS V7
sudo yum install softhsm
sudo usermod -aG ods <your_userid>
- Log out and log back in so the new group membership takes effect

- Red Hat / CentOS V8
sudo yum module install idm:DL1
sudo yum install softhsm
sudo usermod -aG ods <your_userid>
- Log out and log back in so the new group membership takes effect

- Debian
sudo apt install softhsm2
sudo usermod -aG softhsm <your_userid>
- Log out and log back in so the new group membership takes effect
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/lib/softhsm/
- To make this persist across sessions, add it to ~/.bashrc (or similar) or to the project's setupEnv script
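Before continuing, it can help to confirm that the SoftHSM v2 library is actually reachable through LD_LIBRARY_PATH. The Python sketch below is a hypothetical helper, not a Senzing tool; it assumes the library file is named libsofthsm2.so (the name SoftHSM v2 builds) and simply searches each directory listed in LD_LIBRARY_PATH for it.

```python
import os
from pathlib import Path
from typing import Optional

# Library file name built by SoftHSM v2 (assumption: adjust if your
# distribution installs it under a different name).
SOFTHSM_LIB_NAME = "libsofthsm2.so"


def find_softhsm_lib(ld_library_path: Optional[str] = None) -> Optional[Path]:
    """Return the path of the SoftHSM library in the first LD_LIBRARY_PATH
    directory that contains it, or None if no listed directory has it."""
    if ld_library_path is None:
        ld_library_path = os.environ.get("LD_LIBRARY_PATH", "")
    # LD_LIBRARY_PATH is colon-separated; empty entries (for example the
    # leading ":" produced when the variable was previously unset) are skipped.
    for entry in ld_library_path.split(os.pathsep):
        if not entry:
            continue
        candidate = Path(entry) / SOFTHSM_LIB_NAME
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    found = find_softhsm_lib()
    if found:
        print(f"Found SoftHSM library: {found}")
    else:
        print(f"{SOFTHSM_LIB_NAME} not found on LD_LIBRARY_PATH")
```

Run it in the same shell session after the export; if it reports that the library is not found, check the directory in the export line against where your package actually installed the library.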
Generate Secure Store and SALT
1. Edit your G2Module.ini file and add the secure store values (SECURE_STORE_URL, SECURE_STORE_LIB, SECURE_STORE_PIN and ENABLE_SECURE_STORE) to the [PIPELINE] section:
[PIPELINE]
SUPPORTPATH=/home/ant/senz-2_2_1/data
CONFIGPATH=/home/ant/senz-2_2_1/etc
RESOURCEPATH=/home/ant/senz-2_2_1/resources
SECURE_STORE_URL=pkcs11://token/?slotID=0
SECURE_STORE_LIB=softhsm2
SECURE_STORE_PIN=
ENABLE_SECURE_STORE=TRUE

[SQL]
CONNECTION=sqlite3://na:na@/home/ant/senz-2_2_1/var/sqlite/G2C.db
# CONNECTION=postgresql://username:password@hostname:5432:database/?schema=schemaname
# CONNECTION=mysql://username:password@hostname:3306/?schema=schemaname
# CONNECTION=db2://username:password@database/?schema=schemaname
# CONNECTION=mssql://username:password@database
2. Initialize the secure store and token for Senzing to use. Run the following command from the root of your Senzing project path. When prompted, enter a Security Officer (SO) PIN of more than 4 characters.
bin/g2ssadm -tokinit -label SENZING_STORE -c etc/G2Module.ini
3. Edit G2Module.ini again: comment out the original SECURE_STORE_URL line and add a new one that references the token by its label:
[PIPELINE]
SUPPORTPATH=/home/ant/senz-2_2_1/data
CONFIGPATH=/home/ant/senz-2_2_1/etc
RESOURCEPATH=/home/ant/senz-2_2_1/resources
# SECURE_STORE_URL=pkcs11://token/?slotID=0
SECURE_STORE_URL=pkcs11://token/?tokenLabel=SENZING_STORE
SECURE_STORE_LIB=softhsm2
SECURE_STORE_PIN=
ENABLE_SECURE_STORE=TRUE
4. Initialize the secure store, entering the SO PIN when prompted:
bin/g2ssadm -ssinit -c etc/G2Module.ini
5. Create the SALT, entering the SO PIN when prompted:
bin/g2saltadm -new -name G2SALT -c etc/G2Module.ini
6. The g2saltadm command returns 4 lines; add these to the G2Module.ini file. The result looks like the following (your CHECKSUM value will differ):
[PIPELINE]
SUPPORTPATH=/home/ant/senz-2_2_1/data
CONFIGPATH=/home/ant/senz-2_2_1/etc
RESOURCEPATH=/home/ant/senz-2_2_1/resources
# SECURE_STORE_URL=pkcs11://token/?slotID=0
SECURE_STORE_URL=pkcs11://token/?tokenLabel=SENZING_STORE
SECURE_STORE_LIB=softhsm2
SECURE_STORE_PIN=
ENABLE_SECURE_STORE=TRUE

[SQL]
CONNECTION=sqlite3://na:na@/home/ant/senz-2_2_1/var/sqlite/G2C.db
# CONNECTION=postgresql://username:password@hostname:5432:database/?schema=schemaname
# CONNECTION=mysql://username:password@hostname:3306/?schema=schemaname
# CONNECTION=db2://username:password@database/?schema=schemaname
# CONNECTION=mssql://username:password@database

[SALT]
NAME=G2SALT
CHECKSUM=X1JKFbitmLGzaIF2GRF+eQ==
HASH_METHOD=SHA2
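After editing, a quick sanity check of G2Module.ini can catch a missing key, or a SECURE_STORE_URL that was not commented out in step 3, before you run the loader. The Python sketch below uses the standard-library configparser; the set of required keys is taken from the example file above and is an assumption, not a documented Senzing schema, and the function name check_secure_store_config is hypothetical.

```python
import configparser
from typing import List

# Keys taken from the example G2Module.ini above (assumption: not an official schema).
REQUIRED = {
    "PIPELINE": ["SECURE_STORE_URL", "SECURE_STORE_LIB"],
    "SALT": ["NAME", "CHECKSUM", "HASH_METHOD"],
}


def check_secure_store_config(ini_text: str) -> List[str]:
    """Return a list of problems found; an empty list means all checks passed."""
    # interpolation=None reads values literally (no '%' expansion).
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case; configparser lowercases keys by default
    try:
        parser.read_string(ini_text)
    except configparser.Error as exc:
        # e.g. DuplicateOptionError if the old SECURE_STORE_URL is still active
        return [f"could not parse file: {exc}"]

    problems = []
    for section, keys in REQUIRED.items():
        if not parser.has_section(section):
            problems.append(f"missing [{section}] section")
            continue
        for key in keys:
            if not parser.get(section, key, fallback="").strip():
                problems.append(f"[{section}] {key} is missing or empty")

    if parser.has_section("PIPELINE"):
        enabled = parser.get("PIPELINE", "ENABLE_SECURE_STORE", fallback="")
        if enabled.strip().upper() != "TRUE":
            problems.append("[PIPELINE] ENABLE_SECURE_STORE is not TRUE")
    return problems
```

For example, read etc/G2Module.ini from your project root with pathlib.Path(...).read_text(), pass the text to check_secure_store_config, and print any problems it returns. SECURE_STORE_PIN is deliberately not required, since it is left empty in the example.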
Modify Configuration to Enable Feature Hashing
In this example, hashing is enabled for the SSN feature. The G2ConfigTool.py utility is used to modify the configuration and save it.
cd <senzing_project_path>/python
./G2ConfigTool.py
setFeature {"feature":"ssn", "Anonymize":"Yes"}
save
quit
[ant@localhost senz-2_2_1]$ pwd
/home/ant/senz-2_2_1
[ant@localhost senz-2_2_1]$ cd python/
[ant@localhost python]$ ./G2ConfigTool.py
Welcome to G2Config Tool. Type help or ? to list commands.
(g2cfg) setFeature {"feature":"ssn", "Anonymize":"Yes"}
Anonymize setting updated!
(g2cfg) save
WARN: This will immediately update the current configuration in the Senzing repository with the current configuration!
Are you certain you wish to proceed and save changes? y
Configuration saved to Senzing repository.
(g2cfg) quit
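If you need to apply the same setting to several features, or repeat it across environments, the interactive commands can be scripted. The Python sketch below builds the same command sequence shown in the session above (one setFeature line per feature, then save, a y answer to the confirmation prompt, and quit) and pipes it to G2ConfigTool.py on standard input. Feeding commands through stdin is an assumption about how the tool behaves when it is not attached to a terminal, not a documented interface, and the function names build_anonymize_script and run_config_tool are hypothetical.

```python
import json
import subprocess
from typing import Iterable


def build_anonymize_script(features: Iterable[str]) -> str:
    """Build the G2ConfigTool.py command sequence used in the session above."""
    lines = [
        # Same command as typed interactively, e.g.
        # setFeature {"feature": "ssn", "Anonymize": "Yes"}
        "setFeature " + json.dumps({"feature": feature, "Anonymize": "Yes"})
        for feature in features
    ]
    # "y" answers the save confirmation prompt (assumption: it is read from stdin).
    lines += ["save", "y", "quit"]
    return "\n".join(lines) + "\n"


def run_config_tool(script: str, python_dir: str = ".") -> subprocess.CompletedProcess:
    """Pipe the command script to ./G2ConfigTool.py in the project's python directory."""
    return subprocess.run(
        ["./G2ConfigTool.py"],
        input=script,
        text=True,
        capture_output=True,
        cwd=python_dir,
        check=False,
    )
```

For example, call run_config_tool(build_anonymize_script(["ssn"]), "<senzing_project_path>/python") and look for "Configuration saved to Senzing repository." in the returned stdout before relying on the change.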
Only set Anonymize on the features you must hash, because hashing reduces the quality of scoring. It is highly recommended not to hash NAME or ADDRESS, as hashing these features has the most adverse impact on match quality.
Run It
Once the above steps are complete, run G2Loader.py as you normally would; the features configured for anonymization are now hashed on ingestion.
Keep in mind that you can't switch a loaded feature to or from being anonymized without reloading those records. Senzing can't unhash a feature, and it does not crawl through already-loaded data to hash it automatically.
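To see why a hashed feature only supports exact comparison, and why already-loaded data must be reloaded when the setting changes, consider a generic salted hash. The Python sketch below uses hashlib SHA-256 with a made-up salt purely as an illustration; it is not Senzing's internal hashing scheme (the [SALT] section above only records HASH_METHOD=SHA2), and the salt and sample values are hypothetical.

```python
import hashlib


def salted_hash(value: str, salt: bytes) -> str:
    """Generic salted SHA-256, for illustration only (not Senzing's actual scheme)."""
    return hashlib.sha256(salt + value.encode("utf-8")).hexdigest()


# Hypothetical salt; in this setup the real salt is generated by g2saltadm
# and is not written to G2Module.ini.
SALT = b"example-salt"

# Identical values still produce identical hashes, so exact matches survive hashing.
same = salted_hash("123-45-6789", SALT) == salted_hash("123-45-6789", SALT)

# Similar values produce unrelated hashes, so fuzzy comparison
# (nicknames, typos, formatting differences) is no longer possible.
near = salted_hash("Robert Smith", SALT) == salted_hash("Bob Smith", SALT)

# A value loaded before anonymization was enabled never equals its hashed
# form, which is why toggling the setting requires reloading those records.
mixed = "123-45-6789" == salted_hash("123-45-6789", SALT)

print(same, near, mixed)  # True False False
```

This is also why NAME and ADDRESS are poor candidates: their value for matching comes largely from close-but-not-identical comparisons, which a one-way hash removes.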