This article has been constructed from the original blog post by our founder and CEO Jeff Jonas.
At Senzing we make it quick and easy to accurately combine data about people and companies from different data sources in real-time. No other AI entity resolution technology that exists today can do this in real-time, at scale, with this level of accuracy. Our solution requires no training, tuning or experts, while also maintaining great affordability.
That is a bold claim, and it deserves to be validated. With that in mind, you can try it yourself right now on Paycheck Protection Program (PPP) data in under 20 minutes.
To help you get started, we have prepared three Senzing-ready .csv files filtered to contain Las Vegas related records.
[NOTE: Instructions for running the whole PPP loan file are located at the bottom of this blog post.]
If you need any help along the way, reach out to support@senzing.com for some free assistance.
IMPORTANT DISCLAIMERS
- The Senzing-ready .csv links provided are snapshots from the past, so the information is out of date. If you are doing real research, be sure to download the latest files (see links at bottom).
- Many organizations have multiple legal entities, sometimes similarly named. Without more data, Senzing may match these entities if they are located at the same business address. Such duplicates are likely legitimate. Note: as more data is loaded, these overmatches begin to automatically self-correct, which is a unique capability of Senzing.
Loading and Exploring Las Vegas PPP Data
- Download and install the free Senzing App here. [No personal data flows to Senzing, Inc.]
- Launch the Senzing App.
- Create a Project.
  - Select “Projects” (left toolbar, icon with the hammer).
  - Select “Add Project.”
  - Name the project whatever you like, e.g., “PPP Las Vegas.”
  - Select “Create.”
- Load the PPP file into Senzing.
  - Download the “PPP_Loans_Over_$150k_LasVegas.csv” file that we have prepared here.
  - Select “Data” (left toolbar, icon with the cylinder).
  - Drag and drop the “PPP_Loans_Over_$150k_LasVegas.csv” file onto the canvas.
  - Click “Load” on the card.
- Review the results.
  - Once loading is complete, click “Review.” (After loading, the “Load” button changes to “Review.”)
  - Explore the Duplicates: records Senzing thinks belong to the same organization. The Match Key column explains why they are related.
  - On the far right, click the little “expand” icon (looks like a small blue clock) that appears as you hover over any of the “Other Data” column entries.
  - Once finished exploring “Duplicates,” click on “Possibly Related.” Click on any “Entity ID” (left column in chart) to see the entity’s resume.
Highlights
- Notice in the top blue bubble that there are 40 duplicates.
- Looking over these duplicates, you will notice some are probably false positives. For example, “NG WASHINGTON”, “NG WASHINGTON II” and “NG WASHINGTON III” are probably different legal entities, each eligible for a PPP loan. Records like these match because of their name and address similarity.
- You may notice other duplicates that look like identical legal entities – these are examples where further human analysis is required.
- Select “Search” (left toolbar, icon with a magnifying glass) and search for this address: “3130 S Durango Dr STE 400 Las Vegas.” Then click any of the possibly related entity names to see how the entities at this address are connected.
- Click “Show Match Key” in the lower right corner and you will see how these three entities, “BOYACK AND ASSOCIATES INC”, “BAI LAS VEGAS LLC” and “BAI NEVADA, LLC”, are related.
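To get a feel for why records like the “NG WASHINGTON” trio come together on name and address similarity, here is a minimal sketch. It is not Senzing's matching logic: it uses Python's standard `difflib` as a stand-in similarity measure, the address is made up, and the two thresholds are arbitrary values chosen only for illustration.

```python
from difflib import SequenceMatcher
from itertools import combinations


def normalize(text: str) -> str:
    """Uppercase, replace punctuation with spaces, and collapse whitespace."""
    cleaned = "".join(ch if ch.isalnum() or ch.isspace() else " " for ch in text.upper())
    return " ".join(cleaned.split())


def similarity(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] (a stand-in, not Senzing's scoring)."""
    return SequenceMatcher(None, normalize(a), normalize(b)).ratio()


# Names are from the article; the address is a made-up placeholder.
records = [
    {"id": 1, "name": "NG WASHINGTON", "addr": "100 Example Blvd, Las Vegas, NV"},
    {"id": 2, "name": "NG WASHINGTON II", "addr": "100 Example Blvd, Las Vegas, NV"},
    {"id": 3, "name": "NG WASHINGTON III", "addr": "100 Example Blvd Las Vegas NV"},
]

NAME_THRESHOLD = 0.85  # arbitrary illustrative cutoff
ADDR_THRESHOLD = 0.95  # arbitrary illustrative cutoff

# Compare every pair of records and keep the pairs that clear both cutoffs.
candidate_pairs = [
    (a["id"], b["id"])
    for a, b in combinations(records, 2)
    if similarity(a["name"], b["name"]) >= NAME_THRESHOLD
    and similarity(a["addr"], b["addr"]) >= ADDR_THRESHOLD
]
print(candidate_pairs)
```

With these toy cutoffs all three pairs are flagged, which mirrors the point above: name and address alone cannot tell closely named sibling entities apart, so a human (or more data) has to.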
Add Reference Data to Improve Accuracy
Reference data are carefully curated data sets that can be used to improve entity resolution accuracy. For this demonstration, we will be using a publicly available file of National Provider Identifier (NPI) records, which contains a list of US health care providers curated by the US Department of Health and Human Services.
- Load the NPI file into Senzing.
  - Download the “NPI_Orgs_LasVegas.csv” file that we have prepared here.
  - Select “Add Data Source.”
  - Drag and drop the “NPI_Orgs_LasVegas.csv” file onto the canvas.
  - Click “Load” on the card.
- Review the results.
  - Once loading is complete, click “Review.” Notice there are zero duplicates, six thousand Possible Duplicates and even more Possibly Related.
  - Click the database symbol on the left to return to the data source cards.
  - Now click “Review” on the PPP_LOANS … card.
  - Notice there are now 41 duplicates in the PPP data – recall, before loading the NPI file there were only 40. Which match is new? Hint: If you scroll down, you will find a blue “More” button on the left side which reveals records from other data sources that may have contributed to the matching decision.
  - Notice there are now two possible duplicates – recall, before loading the NPI file there were zero.
  - Click on the “2” Possible Duplicates. Can you figure out what Senzing learned that caused it to change its mind about these matches?
Highlights
- Using the NPI reference data, these three PPP records came together: “BAI LAS VEGAS LLC”, “BOYACK AND ASSOCIATES INC”, and “BAI NEVADA, LLC”. Why? When the NPI record revealed that BAI was a DBA (doing business as) of “BOYACK AND ASSOCIATES”, Senzing’s entity-centric learning technology caused Senzing to reevaluate its earlier decision and improve it, in real-time.
- In a similar manner, the NPI reference data surfaced two possible duplicates – these have close names at the same address.
- Other popular reference data that can significantly improve matching results are commercially available from data providers like Dun & Bradstreet, Moody’s and OpenCorporates.
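The way a single reference record can pull previously separate records together can be sketched as a connected-components problem. The example below is a deliberately simplified toy, not Senzing's entity-centric learning: it is reduced to two PPP records and one NPI record, the `name_keys` and `addr` values are hand-assigned placeholders, and the linking rule (same address plus one shared name key) is an assumption made only for illustration. It also re-resolves everything from scratch, whereas Senzing works incrementally in real time.

```python
from collections import defaultdict


class UnionFind:
    """Minimal disjoint-set structure for grouping linked records."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        self.parent[self.find(a)] = self.find(b)


def resolve(records):
    """Group records that share an address and at least one name key (toy rule)."""
    uf = UnionFind()
    by_feature = defaultdict(list)
    for rec in records:
        uf.find(rec["id"])  # register every record, even unlinked ones
        for name_key in rec["name_keys"]:
            by_feature[(name_key, rec["addr"])].append(rec["id"])
    for ids in by_feature.values():
        for other in ids[1:]:
            uf.union(ids[0], other)
    groups = defaultdict(set)
    for rec in records:
        groups[uf.find(rec["id"])].add(rec["id"])
    return sorted(sorted(group) for group in groups.values())


# Hypothetical, hand-simplified records (IDs, keys and address are placeholders).
ppp = [
    {"id": "PPP-1", "name_keys": {"BAI"}, "addr": "SAME ADDRESS"},
    {"id": "PPP-2", "name_keys": {"BOYACK AND ASSOCIATES"}, "addr": "SAME ADDRESS"},
]
# The reference record carries both the name and the DBA, so it bridges the two.
npi = [
    {"id": "NPI-1", "name_keys": {"BAI", "BOYACK AND ASSOCIATES"}, "addr": "SAME ADDRESS"},
]

before = resolve(ppp)
after = resolve(ppp + npi)
print(before)
print(after)
```

Before the reference record is added, the two PPP records stay in separate groups; once it is included, all three land in one group because the NPI record shares a name key with each of them.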
How to Combine Other Data to Improve Context
Combining additional data from other public and private sources is easy too. For example, publicly available data from the US Department of Labor Wage and Hour Compliance Actions can be easily added to discover which PPP recipients also have labor violations.
- Load the DOL Compliance Actions file into Senzing.
  - Download the “Dept_Labor_Whisard_LasVegas.csv” file that we have prepared here.
  - Select “Data.”
  - Drag and drop the “Dept_Labor_Whisard_LasVegas.csv” file onto the canvas.
  - Click “Load” on the card.
- Review the results.
  - Once loaded, again click “Review” on the PPP LOANS … card.
  - In the upper left area of the screen, you’ll see “PPP Loans” in a drop-down. To the right of this, you will see a second drop-down showing the word “NONE”. Click this drop-down and change “NONE” to the “DOL – WHISARD” data source.
  - Now click in the middle of the blue circles to see the matches between these data sources.
  - Notice the CASE_VIOLTN_CNT values (Case Violations) on the far right.
  - Scrolling down, use the blue More button on the right side to reveal records from other data sources that may have contributed to the matching decision.
Highlights
- Before loading the US Dept of Labor file, there were only two possible duplicates. Now there are three. To see this, change the “US DOL – WHD” data source back to “NONE”, then click on the “3” possible duplicates. One of these is new. Takeaway: although this file is not considered reference data, new data from any source can be used to help improve past, present and future matches.
- While on the same PPP Possible Duplicates screen, check out the Match Key column. Notice that every row has “-NPI_Number” in its match key, which means the NPI numbers on these records were different. Had they not disagreed, Senzing would have considered these records duplicates.
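If you copy match keys out of the app, a few lines of code can split them into the features that agreed and the features that disagreed. This is a minimal sketch under stated assumptions: the article only confirms that a leading “-” marks a disagreeing value, so the “+” prefix for agreeing features and the sample key `+NAME+ADDRESS-NPI_NUMBER` are assumptions for illustration. `parse_match_key` is a hypothetical helper, not part of any Senzing API.

```python
import re


def parse_match_key(match_key: str) -> dict:
    """Split a match key into agreeing (+) and disagreeing (-) feature names.

    Assumes feature names themselves contain no '+' or '-' characters.
    """
    tokens = re.findall(r"([+-])([^+-]+)", match_key)
    return {
        "agree": [name.strip() for sign, name in tokens if sign == "+"],
        "disagree": [name.strip() for sign, name in tokens if sign == "-"],
    }


# Hypothetical match key in the assumed format.
parsed = parse_match_key("+NAME+ADDRESS-NPI_NUMBER")
print(parsed)
```

Here the name and address agree while the NPI number disagrees, which is the pattern described above for records held back as possible duplicates.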
Success!
Unlike other technologies that take a long time to set up and configure, Senzing delivers with ease. Feel free to entity resolve your own data, e.g., your contacts, Salesforce accounts, vendor file or marketing list. If you want additional info on getting started, check out this article.
We would love to hear any feedback, especially suggestions on how to help you solve your entity challenges or make the solution better. You can reach us here.
Thank you.
BONUS SECTION
Just for fun, check out these additional Senzing-ready files, filtered for Las Vegas:
- Medicare Supplier Directory: A list of suppliers and the supplier’s Medicare participation status. Senzing-ready file.
- Physician Compare: Containing general information about individual eligible professionals. Senzing-ready file.
- Office of Inspector General Exclusions: Excluded individuals and organizations from federal health care programs like Medicare and Medicaid. Senzing-ready file (filtered to organizations only).
Instructions for running all the PPP loan data:
The Senzing API, our main product, is for developers. Our technology makes the complicated task of entity resolution trivial for programmers. Senzing is real-time and scalable to billions of records. More on our unique technology here.
If you are not a developer, the simple Senzing App is for you. While 100k records are free, an affordable license upgrade is available here.
To speed up your full-file PPP project, here are some key links. Use the Website link if you need current information for real work. Otherwise, if you are just experimenting, try our Senzing-ready links, which are out-of-date snapshots:
PPP Loans over $150k | Website | Senzing-ready Link |
National Provider Identifier (NPI) | Website | Senzing-ready Link (filtered for organizations) |
Dept of Labor Compliance Actions | Website | Senzing-ready Link |
Medicare Supplier Directory | Website | Senzing-ready Link |
Physician Compare | Website | Senzing-ready Link |
OIG Exclusions | Website | Senzing-ready Link (filtered for organizations) |
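If you start from a raw Website download instead of a Senzing-ready link, the file first needs its columns mapped. The sketch below shows one way that could look, with loud caveats: the raw column names (`BusinessName`, `Address`, `City`, `State`, `Zip`) and the data source code `PPP_LOANS` are assumptions, the sample rows are invented, and the target column names reflect our understanding of common Senzing attribute names, so check them against the current Senzing documentation before relying on them.

```python
import csv
import io

# Invented sample standing in for a raw download; real column names may differ.
RAW_CSV = """BusinessName,Address,City,State,Zip
EXAMPLE CAFE LLC,1 Example St,Las Vegas,NV,89101
SAMPLE TOURS INC,2 Sample Ave,Reno,NV,89501
"""


def to_senzing_rows(raw_file, data_source, city_filter=None):
    """Yield one mapped row per raw record, optionally keeping a single city."""
    for line_no, row in enumerate(csv.DictReader(raw_file), start=1):
        if city_filter and row["City"].strip().upper() != city_filter.upper():
            continue
        yield {
            "DATA_SOURCE": data_source,               # assumed data source code
            "RECORD_ID": f"{data_source}-{line_no}",  # synthetic ID from row position
            "NAME_ORG": row["BusinessName"].strip(),
            "ADDR_LINE1": row["Address"].strip(),
            "ADDR_CITY": row["City"].strip(),
            "ADDR_STATE": row["State"].strip(),
            "ADDR_POSTAL_CODE": row["Zip"].strip(),
        }


rows = list(to_senzing_rows(io.StringIO(RAW_CSV), "PPP_LOANS", city_filter="Las Vegas"))

# Write the mapped rows back out as CSV text (swap StringIO for a real file).
out = io.StringIO()
writer = csv.DictWriter(out, fieldnames=list(rows[0].keys()))
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```

Dropping the `city_filter` argument keeps every row, which is what you would want for a full-file run. A row-position record ID is only a placeholder; if the source has a stable unique key, prefer that.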