Prerequisites
Watch
- (2:38)
- (10:57)
- Senzing Mapping Primer (1:17)
Read
Data
- Identification of the data sources you plan to use for your evaluation
- JSON or CSV extracts of the data sources
AWS Access and Skills:
- AWS account with sufficient privileges to stand up the outlined architecture
- Share your AWS account number with Senzing support
- Basic familiarity with AWS, including using CloudFormation and an AMI to provision and deploy software
- Create a key pair to SSH into your Senzing AWS instance
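If you prefer a terminal to the EC2 console, you can create the key pair with the AWS CLI. This is a minimal sketch. It assumes the AWS CLI is installed and configured for the region you plan to deploy to; this article does not otherwise require the CLI. The function name and key name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: create an EC2 key pair and save the private key locally.
# Assumes the AWS CLI is installed and configured (credentials + region).
create_senzing_key_pair() {
  local key_name="$1"
  local key_file="${key_name}.pem"

  # Refuse to overwrite an existing private key file.
  if [ -e "$key_file" ]; then
    echo "Refusing to overwrite existing ${key_file}" >&2
    return 1
  fi

  # Ask EC2 to generate the key pair and write only the private key material.
  # If the call fails, remove the partially written file.
  aws ec2 create-key-pair \
    --key-name "$key_name" \
    --query 'KeyMaterial' \
    --output text > "$key_file" || { rm -f "$key_file"; return 1; }

  # SSH rejects private keys that other users can read.
  chmod 400 "$key_file"
  echo "$key_file"
}

# Example (hypothetical key name):
#   create_senzing_key_pair senzing-eval-key
```

When you create the stack, enter the same key name in the KeyName parameter.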
Workstation Software
- Install Docker to test and verify that the services in the stack are working correctly
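Before you run the docker commands later in this article, a quick preflight check can confirm two things: Docker is installed, and its daemon is running. This is a minimal sketch, and the function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical preflight check: confirm the Docker CLI is installed and the
# Docker daemon is reachable before running the stream-producer container.
check_docker() {
  if ! command -v docker >/dev/null 2>&1; then
    echo "docker: command not found; install Docker first" >&2
    return 1
  fi
  # `docker info` fails when the daemon is not running or not accessible.
  if ! docker info >/dev/null 2>&1; then
    echo "docker: daemon not reachable; start Docker and retry" >&2
    return 2
  fi
  echo "docker: OK"
}
```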
Senzing License
- Senzing includes an evaluation license for 100k records. If your data source record volume is greater than 100k records, see Obtaining and Using a Larger Evaluation License at the end of this article
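To see which side of the 100k-record limit you are on, you can count the records in your extracts. This sketch assumes that JSON files hold one record per line and that CSV files have a single header row with no line breaks inside quoted fields. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: count the records in a data extract and compare the
# total with the 100k-record evaluation license.
# Assumptions: JSON files hold one record per line (JSON Lines); CSV files
# have one header row and no line breaks inside quoted fields.
EVAL_LICENSE_LIMIT=100000

count_records() {
  local file="$1" lines
  if [ ! -r "$file" ]; then
    echo "cannot read ${file}" >&2
    return 1
  fi
  # Count non-blank lines; grep -c prints 0 (and exits 1) when nothing matches.
  lines=$(grep -c '[^[:space:]]' "$file" || true)
  case "$file" in
    *.csv) echo $(( lines > 0 ? lines - 1 : 0 )) ;;  # drop the header row
    *)     echo "$lines" ;;
  esac
}

# Succeeds (exit 0) when the extract exceeds the evaluation license limit.
needs_larger_license() {
  local count
  count=$(count_records "$1") || return 2
  [ "$count" -gt "$EVAL_LICENSE_LIMIT" ]
}

# Example (hypothetical file name):
#   if needs_larger_license my-records.json; then echo "request a larger license"; fi
```

If you load several data sources, add their counts together before comparing with the limit.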
Architecture
The stack is designed to provide a demo experience in a VPC with subnets in two availability zones. The EC2 instance launched from the Senzing AMI and the database are in the same subnet. A network security group restricts traffic between the instance and the database to port 5432. The security group for ingress to the public subnet allows only the default ports 22 and 8081: port 22 is for SSH access to the instance, and port 8081 is for the Senzing Entity Search Web App. The CloudFormation template lets you modify these ports and the allowed CIDR range for the public security group. The stack also creates an SQS queue and a service account for accessing that queue. The service account is the only way to access and interact with the SQS queue. CloudFormation automatically creates the service account along with its access key and secret key.
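Because the public security group admits only ports 22 and 8081 by default, you can confirm the rules with a quick reachability check from your workstation. This sketch assumes bash with its built-in /dev/tcp redirection and the coreutils `timeout` command. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: check from your workstation whether the stack's default
# public ports accept TCP connections.
# Assumptions: bash with /dev/tcp support and the coreutils `timeout` command.

# Returns 0 if a TCP connection to host:port succeeds within the timeout.
port_open() {
  local host="$1" port="$2" secs="${3:-5}"
  # Host and port are passed as arguments, not interpolated into the script.
  timeout "$secs" bash -c 'exec 3<>"/dev/tcp/$1/$2"' port_open "$host" "$port" 2>/dev/null
}

# Reports the default public ports: 22 (SSH) and 8081 (Entity Search Web App).
check_stack_ports() {
  local host="$1" port status=0
  for port in 22 8081; do
    if port_open "$host" "$port"; then
      echo "${host}:${port} open"
    else
      echo "${host}:${port} closed or filtered"
      status=1
    fi
  done
  return "$status"
}

# Example:
#   check_stack_ports "<Public DNS (IPv4) value>"
```

If you changed the ports or CIDR in the template parameters, adjust the port list. Also confirm that your workstation's address falls inside the allowed CIDR.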
Data Flow
This diagram shows how data flows into the stack.
Creating the Stack with CloudFormation
Standing up the stack will take 8-10 minutes.
- Log into AWS
- Select the region you wish to use from the upper right menu
- Navigate to CloudFormation
- Click Create Stack
- Select Template is ready and Amazon S3 URL
- Copy and paste the following URL into the Amazon S3 URL field
https://senzing-factory.s3.us-east-2.amazonaws.com/senzing-demo/cloudformation/senzing-demo-4.1.0.yaml
- Enter a stack name
Info: We have provided a set of recommended parameter defaults that work best for the evaluation.
- In the SSH section, set KeyName to the name of an existing EC2 key pair to enable SSH access
- On the review screen, select the checkbox for “I acknowledge that AWS CloudFormation might create IAM resources with custom names.”, and click Create Stack
- When the CloudFormation process completes, navigate to the Outputs tab and copy down the SqsQueueUrl, SqsUserAccessKey, and the SqsuserSecretKey values for later use
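If you would rather script stack creation than click through the console, the AWS CLI can do the same thing. This sketch makes three assumptions: the AWS CLI is installed and configured for your chosen region, the template's SSH key parameter is named KeyName (as the console step suggests), and the template's other parameters keep their recommended defaults. The function name is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical CLI equivalent of the console steps above.
# Assumes the AWS CLI is configured for the region you want to deploy to and
# that the template's key pair parameter is called KeyName.
TEMPLATE_URL="https://senzing-factory.s3.us-east-2.amazonaws.com/senzing-demo/cloudformation/senzing-demo-4.1.0.yaml"

create_senzing_stack() {
  local stack_name="$1" key_name="$2"

  # CAPABILITY_NAMED_IAM is the CLI form of the "might create IAM resources
  # with custom names" acknowledgement on the console review screen.
  aws cloudformation create-stack \
    --stack-name "$stack_name" \
    --template-url "$TEMPLATE_URL" \
    --parameters "ParameterKey=KeyName,ParameterValue=${key_name}" \
    --capabilities CAPABILITY_NAMED_IAM || return 1

  # Block until creation finishes (the console estimate is 8-10 minutes).
  aws cloudformation wait stack-create-complete --stack-name "$stack_name" || return 1

  # Print every stack output as "key<TAB>value", including the SQS values.
  aws cloudformation describe-stacks \
    --stack-name "$stack_name" \
    --query 'Stacks[0].Outputs[].[OutputKey,OutputValue]' \
    --output text
}

# Example (hypothetical names):
#   create_senzing_stack senzing-demo senzing-eval-key
```

Without CAPABILITY_NAMED_IAM, CloudFormation rejects a template that creates IAM resources with custom names. This is the same reason the console asks for the acknowledgement.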
Testing the CloudFormation with Test Data
Loading Test Data to SQS
Senzing provides a sample dataset to test and verify that the services in the stack are working correctly. Using the values captured from the CloudFormation outputs, you run a stream-producer container that loads the data into the SQS queue.
- Modify the SqsQueueUrl, SqsUserAccessKey, and the SqsuserSecretKey values of this docker run command using the values you copied down while creating the stack, and set AWS_DEFAULT_REGION to the region you deployed the stack to (the command below uses us-east-1)
docker run -it --rm \
  --env SENZING_INPUT_URL="https://s3.amazonaws.com/public-read-access/TestDataSets/loadtest-dataset-1M.json" \
  --env SENZING_RECORD_MAX=5000 \
  --env SENZING_SUBCOMMAND=json-to-sqs \
  --env SENZING_SQS_QUEUE_URL="<SqsQueueUrl value>" \
  --env AWS_ACCESS_KEY_ID="<SqsUserAccessKey value>" \
  --env AWS_DEFAULT_REGION=us-east-1 \
  --env AWS_SECRET_ACCESS_KEY="<SqsuserSecretKey value>" \
  senzing/stream-producer
- In the AWS console, select the region you previously deployed to from the upper right menu
- While stream-producer is running, navigate to Simple Queue Service (SQS)
- Verify the messages are arriving in SQS
- Periodically refresh the page to verify stream-loader has processed all the messages into the Senzing demo stack
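Instead of refreshing the SQS console, you can poll the queue depth from a terminal. This is a minimal sketch. It assumes the AWS CLI is configured for the deployment region with credentials that can read the queue's attributes. The function names are hypothetical. ApproximateNumberOfMessages counts messages waiting to be received. ApproximateNumberOfMessagesNotVisible counts messages that a consumer such as stream-loader has received but not yet deleted.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: poll the demo queue until stream-loader has drained it.
# Assumes the AWS CLI is configured with credentials that can read the
# queue's attributes, in the region where the stack was deployed.

# Prints "<waiting><TAB><in flight>" for the given queue URL.
queue_depth() {
  aws sqs get-queue-attributes \
    --queue-url "$1" \
    --attribute-names ApproximateNumberOfMessages ApproximateNumberOfMessagesNotVisible \
    --query 'Attributes.[ApproximateNumberOfMessages,ApproximateNumberOfMessagesNotVisible]' \
    --output text
}

# Polls every <interval> seconds (default 30) until both counts reach zero.
wait_for_empty_queue() {
  local url="$1" interval="${2:-30}" depth
  while :; do
    depth="$(queue_depth "$url")" || return 1
    # Split the tab-separated counts into $1 and $2.
    set -- $depth
    if [ "$#" -ne 2 ]; then
      echo "unexpected queue attributes: ${depth}" >&2
      return 1
    fi
    echo "waiting=$1 in_flight=$2"
    if [ "$1" = "0" ] && [ "$2" = "0" ]; then
      return 0
    fi
    sleep "$interval"
  done
}

# Example:
#   wait_for_empty_queue "<SqsQueueUrl value>"
```

The counts are approximate, so wait for a few consecutive zero readings before you treat the load as finished.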
Explore the Test Data Using the Entity Search Web App
- Select the region you previously deployed to from the upper right menu
- Navigate to EC2
- Click on Instances
- Select the EC2 instance that the stack launched from the Senzing AMI
- At the bottom of the page, copy the Public DNS (IPv4) value. You will use this value and the port value :8081 to construct a URL for the Senzing Entity Search Web App
- Open a browser, and navigate to the URL http://<Public DNS (IPv4) value>:8081
- Try searching for the following names: Alfred Coleman and Amelia Burks
- Click a search result to view the entity resume. The entity resume allows you to explore the details of an entity and its relationships
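You can also script the console lookup of the Public DNS (IPv4) value. This sketch assumes the AWS CLI is configured for the deployment region. It relies on the aws:cloudformation:stack-name tag that CloudFormation adds to the EC2 instances it creates. The function names are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helpers: find the running demo instance's public DNS name by the
# stack-name tag CloudFormation applies, then build the Entity Search Web App URL.
# Assumes the AWS CLI is configured for the region where the stack runs.
stack_public_dns() {
  aws ec2 describe-instances \
    --filters "Name=tag:aws:cloudformation:stack-name,Values=$1" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PublicDnsName' \
    --output text
}

# Builds http://<public DNS>:<port>; the port defaults to 8081.
entity_search_url() {
  local dns="$1" port="${2:-8081}"
  printf 'http://%s:%s\n' "$dns" "$port"
}

# Example (use the stack name you entered in CloudFormation):
#   url="$(entity_search_url "$(stack_public_dns my-senzing-stack)")"
#   curl -sS -o /dev/null -w '%{http_code}\n' "$url"   # prints the HTTP status code
```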
To investigate the test data further, check out the Exploratory Data Analysis tools, or watch the videos and tutorials.
Loading Your Data
You will be subject to Amazon pricing for resource use based on the volume of data and processing.
Build the JSON or CSV for Senzing
To load your data successfully via JSON, make sure you map it correctly; see the Generic Entity Specification for reference. These examples contain the fields you are most likely to run into when mapping people or organizations to attributes Senzing understands.
Sample Structure for a Person
The sample person structure contains fields for
- Primary name
- Date of birth and gender
- Passport, driver’s license, social security number, national insurance number
- Home and mailing addresses
- Home and cell phone numbers
- Email and social media handles
Example JSON:
{"RECORD_ID": 1001, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_LAST": "Jones", "NAME_FIRST": "Robert", "NAME_MIDDLE": "M", "NAME_PREFIX": "Mr", "NAME_SUFFIX": "Jr" }], "GENDER": "M", "DATE_OF_BIRTH": "1/2/1981", "PASSPORT_NUMBER": "PP11111", "PASSPORT_COUNTRY": "US", "DRIVERS_LICENSE_NUMBER": "DL11111", "DRIVERS_LICENSE_STATE": "NV", "SSN_NUMBER": "111-11-1111", "ADDRESSES": [{ "ADDR_TYPE": "HOME", "ADDR_LINE1": "111 First St", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89111", "ADDR_COUNTRY": "US" }, { "ADDR_TYPE": "MAIL", "ADDR_LINE1": "PO Box 111", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89111", "ADDR_COUNTRY": "US" }], "PHONES": [{ "PHONE_TYPE": "WORK", "PHONE_NUMBER": "800-201-2001" }, { "PHONE_TYPE": "CELL", "PHONE_NUMBER": "702-222-2222" }], "EMAIL_ADDRESS": "bob@jonesfamily.com", "SOCIAL_HANDLE": "@bobjones27", "SOCIAL_NETWORK": "twitter"} {"RECORD_ID": 1002, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_LAST": "Jones", "NAME_FIRST": "Bobby" }], "GENDER": "M", "DATE_OF_BIRTH": "2/1/1981", "ADDRESSES": [{ "ADDR_TYPE": "HOME", "ADDR_LINE1": "111 1st St", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89222" }]} {"RECORD_ID": 1003, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_LAST": "Jonze", "NAME_FIRST": "Martin" }], "GENDER": "", "DATE_OF_BIRTH": "", "PASSPORT_NUMBER": "PP11111", "PASSPORT_COUNTRY": "US", "NATIONAL_ID_NUMBER": "11111111", "NATIONAL_ID_COUNTRY_COUNTRY": "CA", "ADDRESSES": [{ "ADDR_TYPE": "HOME", "ADDR_LINE1": "333 3rd St", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89333" }]} {"RECORD_ID": 1004, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_LAST": "Jones", "NAME_FIRST": "Elizabeth", "NAME_MIDDLE": "R", "NAME_PREFIX": "Ms" }], "GENDER": "F", "DATE_OF_BIRTH": "2/2/1982", "PASSPORT_NUMBER": "PP22222", "PASSPORT_COUNTRY": "US", "DRIVERS_LICENSE_NUMBER": "DL22222", "DRIVERS_LICENSE_STATE": "NV", "SSN_NUMBER": "222-22-2222", "ADDRESSES": [{ "ADDR_TYPE": "HOME", 
"ADDR_LINE1": "111 First St", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89111", "ADDR_COUNTRY": "US" }], "PHONES": [{ "PHONE_TYPE": "CELL", "PHONE_NUMBER": "702-333-3333" }], "EMAIL_ADDRESS": "beth@jonesfamily.com"}
Sample Structure for an Organization
The sample organization structure contains fields for
- Primary name
- Tax ID number, like employer identification number
- Other ID numbers, like a Dun & Bradstreet number
- Primary and mailing addresses
- Primary and other phone numbers
- Website and social media handles
Example JSON:
{"RECORD_ID": 2001, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_ORG": "Presto Company" }], "TAX_ID_NUMBER": "EIN11111", "TAX_ID_COUNTRY": "US", "ADDRESSES": [{ "ADDR_TYPE": "PRIMARY", "ADDR_LINE1": "Presto Plaza - 2001 Eastern Ave", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89111", "ADDR_COUNTRY": "US" }, { "ADDR_TYPE": "MAIL", "ADDR_LINE1": "Po Box 111", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89111", "ADDR_COUNTRY": "US" }], "PHONES": [{ "PHONE_TYPE": "PRIMARY", "PHONE_NUMBER": "800-201-2001" }], "WEBSITE_ADDRESS": "Prestofabrics.com", "SOCIAL_HANDLE": "@prestofabrics", "SOCIAL_NETWORK": "twitter"} {"RECORD_ID": 2002, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_ORG": "Presto Fabrics" }], "ADDRESSES": [{ "ADDR_TYPE": "PRIMARY", "ADDR_LINE1": "2001 Eastern", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89222" }], "PHONES": [{ "PHONE_TYPE": "PRIMARY", "PHONE_NUMBER": "800-201-2001" }]} {"RECORD_ID": 2003, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_ORG": "Fabrics Unlimited, Inc" }], "TAX_ID_NUMBER": "EIN33333", "TAX_ID_COUNTRY": "US", "OTHER_ID_NUMBER": 33333, "OTHER_ID_TYPE": "D&B", "ADDRESSES": [{ "ADDR_TYPE": "PRIMARY", "ADDR_LINE1": "2003 Southern Highlands, Pkwy", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89333", "ADDR_COUNTRY": "US" }], "PHONES": [{ "PHONE_TYPE": "PRIMARY", "PHONE_NUMBER": "800-301-3001" }], "WEBSITE_ADDRESS": "fabrics-unlimited.com"} {"RECORD_ID": 2004, "NAMES": [{ "NAME_TYPE": "PRIMARY", "NAME_ORG": "Fabrics Unlimited" }], "ADDRESSES": [{ "ADDR_TYPE": "PRIMARY", "ADDR_LINE1": "2004 Horizon Ridge", "ADDR_CITY": "Las Vegas", "ADDR_STATE": "NV", "ADDR_POSTAL_CODE": "89444" }], "PHONES": [{ "PHONE_TYPE": "PRIMARY", "PRIMARY_PHONE_NUMBER": "800-301-3001" }], "WEBSITE_ADDRESS": "fabrics-unlimited.com"}
Loading Your Data to SQS
As with the sample dataset, use the values captured from the CloudFormation outputs to run the stream-producer container and load your data into SQS.
- Modify the SqsQueueUrl, SqsUserAccessKey, and the SqsuserSecretKey values of this docker run command using the values you copied down while creating the stack. Set SENZING_INPUT_URL to the location of your data set, and set AWS_DEFAULT_REGION to the region you deployed the stack to (the command below uses us-east-1)
# SENZING_RECORD_MAX sets the maximum number of records to load.
# We recommend you do your initial loads in small batches.
# Smaller batches during the initial evaluation help you confirm
# your mappings are correct and keep your AWS costs under control.
docker run -it --rm \
  --env SENZING_INPUT_URL="https://<your data set path location here>" \
  --env SENZING_RECORD_MAX=5000 \
  --env SENZING_SUBCOMMAND=json-to-sqs \
  --env SENZING_SQS_QUEUE_URL="<SqsQueueUrl value>" \
  --env AWS_ACCESS_KEY_ID="<SqsUserAccessKey value>" \
  --env AWS_DEFAULT_REGION=us-east-1 \
  --env AWS_SECRET_ACCESS_KEY="<SqsuserSecretKey value>" \
  senzing/stream-producer
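SENZING_INPUT_URL must be a URL the container can fetch. If your extract is a local file, one option is to stage it in an S3 bucket you own and generate a presigned HTTPS URL. This sketch assumes two things: the AWS CLI has write access to the bucket, and stream-producer fetches the presigned URL as-is over HTTPS. The bucket name, the senzing-input/ key prefix, and the function name are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: stage a local extract in your own S3 bucket and print a
# time-limited HTTPS URL to use as SENZING_INPUT_URL.
# Assumes the AWS CLI is configured with write access to the bucket.
stage_input_file() {
  local file="$1" bucket="$2" expires="${3:-3600}"
  local key
  key="senzing-input/$(basename "$file")"   # hypothetical key prefix

  aws s3 cp "$file" "s3://${bucket}/${key}" >/dev/null || return 1
  # A presigned URL grants temporary read access without making the bucket public.
  aws s3 presign "s3://${bucket}/${key}" --expires-in "$expires"
}

# Example (bucket name is hypothetical):
#   SENZING_INPUT_URL="$(stage_input_file ./my-records.json my-eval-bucket)"
#   then pass --env SENZING_INPUT_URL="$SENZING_INPUT_URL" to docker run
```

Choose an expiry long enough to cover the whole load, because the URL stops working once it expires.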
Explore Your Data Using the Entity Search Web App
- Select the region you previously deployed to from the upper right menu
- Navigate to EC2
- Click on Instances
- Select the EC2 instance that the stack launched from the Senzing AMI
- At the bottom of the page, copy the Public DNS (IPv4) value. You will use this value and the port value :8081 to construct a URL for the Senzing Entity Search Web App
- Open a browser, and navigate to the URL http://<Public DNS (IPv4) value>:8081
- Try searching for the names you identified
- Click a search result to view the entity resume. The entity resume allows you to explore the details of an entity and its relationships
Obtaining and Using a Larger Evaluation License
Senzing includes an evaluation license for 100k records. If your data source record volume is greater than 100k records, please contact Senzing support to request a larger evaluation license.
- Copy the license file to the EC2 instance using scp
scp -i "<your_ec2_keypairfilename>.pem" g2.lic username@<Public DNS (IPv4) value>:~/g2.lic
- SSH into the EC2 instance, for example
ssh -i "<your_ec2_keypairfilename>.pem" username@<Public DNS (IPv4) value>
- Move the license file into place as root
sudo mv ~/g2.lic /var/senzing/g2.lic
- Edit /var/senzing/etc/G2Module.ini to use the new license file:
[PIPELINE]
LICENSEFILE=/var/senzing/g2.lic
- Source the setupEnv file
source /var/senzing/setupEnv
- Use G2Command to verify the license was successfully made available
/var/senzing/python/G2Command.py
Initializing engine...
Welcome to the G2 shell. Type help or ? to list commands.
(g2) license
G2 license:
{Your new license info will display here}
- Restart dependent services
sudo systemctl restart senzing-api-server.service
sudo systemctl restart senzing-stream-loader.service
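You can script the LICENSEFILE edit above instead of making it by hand. This is a minimal sketch. It assumes the .ini file uses a standard layout with [SECTION] header lines and KEY=value entries. The function name is hypothetical. Run it on the instance as root, because the file lives under /var/senzing.

```bash
#!/usr/bin/env bash
# Hypothetical helper: set LICENSEFILE inside the [PIPELINE] section of an INI
# file, replacing an existing entry or adding one at the end of that section.
# Assumes [SECTION] header lines and KEY=value entries.
set_license_file() {
  local ini="$1" lic="$2" tmp rc
  if ! grep -q '^\[PIPELINE\]' "$ini"; then
    echo "no [PIPELINE] section in ${ini}" >&2
    return 1
  fi
  tmp="$(mktemp)" || return 1
  awk -v lic="$lic" '
    /^\[/ {
      # Leaving [PIPELINE] without a LICENSEFILE entry: add one before the next section.
      if (in_pipeline && !done) { print "LICENSEFILE=" lic; done = 1 }
      in_pipeline = ($0 ~ /^\[PIPELINE\]/)
    }
    in_pipeline && /^LICENSEFILE=/ { print "LICENSEFILE=" lic; done = 1; next }
    { print }
    END { if (in_pipeline && !done) print "LICENSEFILE=" lic }
  ' "$ini" > "$tmp" && cat "$tmp" > "$ini"   # cat keeps the original file's owner and mode
  rc=$?
  rm -f "$tmp"
  return "$rc"
}

# Example, run as root on the instance (INI is the module .ini file edited above):
#   set_license_file "$INI" /var/senzing/g2.lic
```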