Overview
The default Senzing configuration for entity resolution is purposely conservative. We don't put records together unless we are sure. However, we keep track of the records that fell just short of a match; these are referred to as possible and ambiguous matches.
We believe that how conservative or loose the matching should be depends on the mission. For instance, someone in security or finance would not want to deny service to a person or company if they could possibly be someone else. Marketing users, however, hate to see possible matches and would prefer that records that are at least plausible matches be resolved together.
This is where Dynamic Entity Resolution comes in. What if you could take a set of records and resolve it in memory with a looser set of rules without affecting the entire repository? We have a GitHub project that does just that, located here: https://github.com/Senzing/resolver
The following is a demonstration of how it works. You can watch the video here: Senzing Dynamic Resolution Demonstration
Preparation:
The user is expected to have already installed Senzing and Docker.
Step 1: Download the files attached to this article to a directory of your choice.
I chose ~/senzing/projects/dynamic
We will refer to this as the "demo" directory.
Step 2: Load the attached customers.csv file with the default configuration.
From your primary Senzing project's python directory ...
python3 G2ConfigTool.py -f ~/senzing/projects/dynamic/addToDefaultConfig.g2c
python3 G2Loader.py -f ~/senzing/projects/dynamic/customers.csv
Start or restart your Senzing REST API from your Docker Compose setup or from SenzingGo, located here: https://github.com/Senzing/senzinggo
Step 3: Pretend you are a marketing user and search for Kohls
From the demo directory ...
python3 dynamic_search_test.py
Do a regular search for "Kohls". Your screen should look like this ...
As a marketing user, you believe there are two Kohls stores here, not three: one in Boston and one in Blaine, MN. You wonder why the first two records didn't resolve. McClellen Highway must also be highway 49, and the street number is the same. So you contact support and ask them why.
Step 4: You are now a senior support analyst and you receive this request.
You open G2Explorer against the main repository and search for "Kohls". You see the same result, so you ask the system why.
From your primary Senzing project's python directory ...
python3 G2Explorer.py
search Kohls
why search 2
Your screen should look like this ...
You see that the addresses only score 77 and that the records hit rule 165, which downgrades them to a possible match.
Even though Kohls is likely to have multiple outlets in Boston, it is less likely that two of them have street number 290. But what if this were a major fast food franchise? We just can't be sure without reference data or human intervention.
Still, this is not the first time that marketing users have complained that address matching should be looser, so you decide to give them their own search profile and set this rule to resolve just for them!
Step 5: Create a new Senzing project for Dynamic Resolution
From the python directory of your server's Senzing installation, normally located at /opt/senzing/g2/python ...
python3 G2CreateProject.py ~/senzing/g2/g2v2.8-ds1
I chose to place mine under my home directory and named it g2v2.8-ds1: the version of g2 I am using, with -ds1 to signify that this is the dynamic search 1 configuration.
Note: We will just use the default sqlite database for dynamic search although you could change it if desired. After all, the results of dynamic resolution are temporary. They are returned to the caller immediately and the results are not saved in the database.
Step 6: Configure the new project
From the new project's python directory ...
cd ~/senzing/g2/g2v2.8-ds1
source setupEnv
python3 G2SetupConfig.py
This lays down the default configuration. To this you will want to add all of your prior configuration scripts. To do this, also from the new project's python directory ...
python3 G2ConfigTool.py
addDataSource CUSTOMERS
getRule 165
setRule {"id": 165, "resolve": "Yes", "relate": "no"}
Save the configuration before you quit G2ConfigTool so the change persists. When complete, your screen should look like this ...
We just changed rule 165 to resolve rather than relate!
We are now ready to use the dynamic resolver with dynamic search profile #1!
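If you expect to build more than one dynamic search profile, the interactive commands from Step 6 can be kept in a command file and applied with the same -f option used in Step 2. The following is a minimal sketch, not part of the attached files: the helper names and the file name dynamic_search_1.g2c are hypothetical, and depending on your G2ConfigTool version you may need to add a final save command to the file.

```python
import json
from pathlib import Path


def build_config_commands(data_source, rule_id):
    """Return the G2ConfigTool commands from Step 6 as a list of lines."""
    # Same rule change as in Step 6: resolve instead of relate.
    rule_change = {"id": rule_id, "resolve": "Yes", "relate": "no"}
    return [
        f"addDataSource {data_source}",
        f"setRule {json.dumps(rule_change)}",
    ]


def write_command_file(path, commands):
    """Write one command per line to a .g2c file and return its path."""
    path = Path(path)
    path.write_text("\n".join(commands) + "\n", encoding="utf-8")
    return path


if __name__ == "__main__":
    # Hypothetical file name; apply it afterwards with:
    #   python3 G2ConfigTool.py -f dynamic_search_1.g2c
    commands = build_config_commands("CUSTOMERS", 165)
    print(write_command_file("dynamic_search_1.g2c", commands))
```

Keeping the profile in a file makes it easy to see exactly how a dynamic search configuration differs from the default one.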
Step 7: Start the dynamic resolver
I developed the following script from the documentation at https://github.com/Senzing/resolver and named it start_dynamic_resolver.sh
export SENZING_VOLUME=/home/jbutcher/senzing/g2/g2v2.8-ds1
export SENZING_DATA_VERSION_DIR=${SENZING_VOLUME}/data
export SENZING_ETC_DIR=${SENZING_VOLUME}/etc
export SENZING_G2_DIR=${SENZING_VOLUME}
export SENZING_VAR_DIR=${SENZING_VOLUME}/var
sudo \
--preserve-env \
docker run \
--interactive \
--publish 8252:8252 \
--rm \
--tty \
--volume ${SENZING_DATA_VERSION_DIR}:/opt/senzing/data \
--volume ${SENZING_ETC_DIR}:/etc/opt/senzing \
--volume ${SENZING_G2_DIR}:/opt/senzing/g2 \
--volume ${SENZING_VAR_DIR}:/var/opt/senzing \
senzing/resolver
From the project directory where you downloaded these files ...
bash start_dynamic_resolver.sh
The dynamic resolver is now listening on port 8252 and your screen should look like this ...
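If you would rather confirm this from a script than by eye, a plain TCP connection test is enough to tell whether something is listening on the published port. This is a small sketch using only the Python standard library; the function name is hypothetical, and it only checks that the port accepts connections, not that the resolver is healthy.

```python
import socket


def is_port_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, or host unreachable.
        return False


# Example: 8252 is the port published by the docker run command in Step 7.
# is_port_open("localhost", 8252)
```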
Step 8: Pretend you are the marketing user again and do a dynamic search for Kohls
From the demo directory ...
python3 dynamic_search_test.py
Do a dynamic search for "Kohls". Your screen should look like this ...
The original 2 Kohls records that did not resolve now appear to be resolved for the marketing user!
Summation
You can review the attached code in dynamic_search_test.py. It works by sending the records returned by the initial search of the main repository to the dynamic search repository, where they are re-resolved with a different rule set to see what then matches!
You could also use dynamic search to resolve any file of records without affecting the main repository. For instance, you may have a contact list you want to resolve. Or let's say you want to run a marketing campaign for all the customers in Boston, but you want them resolved with a looser set of rules.
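The second half of that flow, handing a set of records to the dynamic resolver, can be sketched roughly as follows. This is not the code from dynamic_search_test.py. It assumes the resolver started in Step 7 accepts newline-delimited JSON records in an HTTP POST to /resolve on port 8252; confirm the endpoint and payload format against the README at https://github.com/Senzing/resolver before relying on it. All function names here are hypothetical.

```python
import json
import urllib.request

# Assumption: endpoint path and payload format; see the resolver README.
RESOLVER_URL = "http://localhost:8252/resolve"


def to_json_lines(records):
    """Serialize a list of record dicts as one JSON document per line."""
    return "\n".join(json.dumps(record, sort_keys=True) for record in records)


def build_resolve_request(records, url=RESOLVER_URL):
    """Build (but do not send) the POST request carrying the records."""
    body = to_json_lines(records).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "text/plain"},
        method="POST",
    )


def re_resolve(records, url=RESOLVER_URL, timeout=30):
    """Send the records to the dynamic resolver and return the raw reply text."""
    request = build_resolve_request(records, url)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.read().decode("utf-8")
```

In this sketch, records would be the list of record dictionaries gathered from the initial search of the main repository, and re_resolve(records) would return whatever the resolver sends back for the caller to display.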
Just remember, these results are not saved to the main repository and you wouldn't want to resolve millions of records this way.
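For the contact list case, the file first has to be turned into records the resolver can read. The sketch below converts CSV text into one JSON record per line, filling in DATA_SOURCE and RECORD_ID when a row lacks them. It assumes the CSV column headers are already valid Senzing attribute names and that the resolver takes newline-delimited JSON; the function name and the sample column names in the comment are illustrative only.

```python
import csv
import io
import json


def csv_to_json_lines(csv_text, data_source):
    """Convert CSV text into one JSON record per line, tagging each row."""
    lines = []
    reader = csv.DictReader(io.StringIO(csv_text))
    for row_number, row in enumerate(reader, start=1):
        # Drop empty columns so blank cells are not sent as attributes.
        record = {key: value for key, value in row.items() if value}
        # Fill in the identifying fields only when the file did not supply them.
        record.setdefault("DATA_SOURCE", data_source)
        record.setdefault("RECORD_ID", str(row_number))
        lines.append(json.dumps(record))
    return "\n".join(lines)


# Example with illustrative columns:
# csv_to_json_lines("RECORD_ID,NAME_ORG\n1001,Kohls\n", "CONTACTS")
```

Because the output is built entirely in memory, this suits the modest file sizes dynamic resolution is meant for.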