G2Export - How to Consume Resolved Entity Data - JSON Format – Senzing®

Overview

G2Export - How to Consume Resolved Entity Data provided an overview of using G2Export to extract the resolved entity information for source data loaded into G2, and of how to interpret the output. That article focused on the default CSV output format; here we look at how the same information is represented in JSON.

If you'd like to load the sample demo data used in this article into G2:-

python3 G2Loader.py -P -p demo/sample/project.csv
- Note:- This will purge (-P) the current G2 repository if you have previously loaded any data

A very useful tool for managing and parsing JSON data on Linux is jq. The output shown herein was pretty printed using jq. To install it on CentOS:-

sudo yum install jq -y
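
Where jq is not available, Python's standard json module can produce a similar indented view. The following is a minimal sketch; the helper name pretty is illustrative and not part of G2:

```python
import json


def pretty(record: str) -> str:
    """Return a JSON record re-indented for reading, similar to piping it through jq."""
    return json.dumps(json.loads(record), indent=2)


# Illustrative record, trimmed to a couple of keys from the example in this article.
print(pretty('{"RESOLVED_ENTITY": {"ENTITY_ID": 5, "RECORDS": []}}'))
```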

Running G2Export

To export the resolved entity information in JSON format to the default file g2export.json :-

python3 G2Export.py -F JSON
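
Once the export completes, the file can be consumed from Python. The following is a minimal sketch that assumes g2export.json holds one JSON record per line, one for each resolved entity; that layout is an assumption to verify against your own export. The function name iter_entities is illustrative:

```python
import json
from typing import Iterator


def iter_entities(path: str = "g2export.json") -> Iterator[dict]:
    """Yield one parsed document per non-empty line of the export file.

    Assumption: the export is line-delimited JSON (one entity per line).
    """
    with open(path, encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines
                yield json.loads(line)
```

Reading line by line keeps memory use flat for large exports. For example, [doc["RESOLVED_ENTITY"]["ENTITY_ID"] for doc in iter_entities()] collects the ID of every exported entity.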

JSON Example Output

The following example output is the JSON record for a single resolved entity, ENTITY_ID 5, pretty printed using the jq command:-

jq . <<< '<JSON_RECORD>'

{
  "RESOLVED_ENTITY": {
    "ENTITY_ID": 5,
    "RECORDS": [
      {
        "DATA_SOURCE": "CUSTOMERS",
        "RECORD_ID": "1005",
        "ENTITY_TYPE": "GENERIC",
        "INTERNAL_ID": 5,
        "ENTITY_KEY": "39DAE53669DC4071D5AAADCEE37238A126F0C6AF",
        "ENTITY_DESC": "Rob E Smith",
        "MATCH_KEY": "",
        "MATCH_LEVEL": 0,
        "MATCH_LEVEL_CODE": "",
        "MATCH_SCORE": 0,
        "ERRULE_CODE": "",
        "REF_SCORE": 0,
        "LAST_SEEN_DT": "2021-09-09 14:15:47.150"
      },
      {
        "DATA_SOURCE": "WATCHLIST",
        "RECORD_ID": "1006",
        "ENTITY_TYPE": "GENERIC",
        "INTERNAL_ID": 100001,
        "ENTITY_KEY": "9F111B0689C209355CF433FA1D446427614BC246",
        "ENTITY_DESC": "Rob Smith Sr",
        "MATCH_KEY": "+NAME+DRLIC",
        "MATCH_LEVEL": 1,
        "MATCH_LEVEL_CODE": "RESOLVED",
        "MATCH_SCORE": 13,
        "ERRULE_CODE": "SF1_CNAME",
        "REF_SCORE": 8,
        "LAST_SEEN_DT": "2021-09-09 14:16:13.285"
      }
    ]
  },
  "RELATED_ENTITIES": [
    {
      "ENTITY_ID": 1,
      "MATCH_LEVEL": 2,
      "MATCH_LEVEL_CODE": "POSSIBLY_SAME",
      "MATCH_KEY": "+NAME+ADDRESS-DOB",
      "MATCH_SCORE": 12,
      "ERRULE_CODE": "CNAME_CFF_DEXCL",
      "REF_SCORE": 5,
      "IS_DISCLOSED": 0,
      "IS_AMBIGUOUS": 0,
      "RECORDS": [
        {
          "DATA_SOURCE": "CUSTOMERS",
          "RECORD_ID": "1001"
        },
        {
          "DATA_SOURCE": "CUSTOMERS",
          "RECORD_ID": "1002"
        },
        {
          "DATA_SOURCE": "CUSTOMERS",
          "RECORD_ID": "1003"
        },
        {
          "DATA_SOURCE": "CUSTOMERS",
          "RECORD_ID": "1004"
        }
      ]
    }
  ]
}

The JSON document represents similar information to that seen in the CSV example. The following outlines the main components.

At the root is the single resolved entity (RESOLVED_ENTITY); in this case ENTITY_ID is 5 - Rob E Smith
ENTITY_ID 5 consists of 2 source records contained within the RECORDS array
- One came from the CUSTOMERS DATA_SOURCE and its unique identifier from the source (RECORD_ID) is 1005
- The other came from the WATCHLIST DATA_SOURCE and its unique identifier from the source (RECORD_ID) is 1006
- The record for 1006 contains the MATCH_KEY and MATCH_LEVEL information detailing why these 2 records resolved together
The document for ENTITY_ID 5 also contains a RELATED_ENTITIES array, indicating the entity has possible matches and/or relationships to 1 or more other distinct entities
- There is 1 related entity and the object describes the relationship; its ENTITY_ID identifies the related resolved entity
- ENTITY_ID 5 is related to ENTITY_ID 1
- Each object in the RELATED_ENTITIES array describes the relationship details
- Each RECORDS array contains 1 or more objects describing the record(s) constituting the related entity
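
The components outlined above can be walked programmatically. The following is a minimal sketch that uses only the keys visible in the example output; the function name summarize and the shape of its return value are illustrative, and treating a missing RELATED_ENTITIES key as "no related entities" is a defensive assumption rather than documented behaviour:

```python
def summarize(doc: dict) -> dict:
    """Condense one exported document into its source records and relationships."""
    entity = doc["RESOLVED_ENTITY"]
    # Each source record is identified by its DATA_SOURCE and RECORD_ID.
    records = [(r["DATA_SOURCE"], r["RECORD_ID"]) for r in entity["RECORDS"]]
    related = [
        {
            "ENTITY_ID": rel["ENTITY_ID"],
            "MATCH_LEVEL_CODE": rel["MATCH_LEVEL_CODE"],
            "MATCH_KEY": rel["MATCH_KEY"],
            "RECORD_COUNT": len(rel["RECORDS"]),
        }
        # Assumption: fall back to an empty list if the key is absent.
        for rel in doc.get("RELATED_ENTITIES", [])
    ]
    return {"ENTITY_ID": entity["ENTITY_ID"], "RECORDS": records, "RELATED": related}


# Trimmed copy of the example record for ENTITY_ID 5 shown in this article.
sample = {
    "RESOLVED_ENTITY": {
        "ENTITY_ID": 5,
        "RECORDS": [
            {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1005"},
            {"DATA_SOURCE": "WATCHLIST", "RECORD_ID": "1006"},
        ],
    },
    "RELATED_ENTITIES": [
        {
            "ENTITY_ID": 1,
            "MATCH_LEVEL_CODE": "POSSIBLY_SAME",
            "MATCH_KEY": "+NAME+ADDRESS-DOB",
            "RECORDS": [
                {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1001"},
                {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1002"},
                {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1003"},
                {"DATA_SOURCE": "CUSTOMERS", "RECORD_ID": "1004"},
            ],
        }
    ],
}

print(summarize(sample))
```

For the sample, the summary lists the 2 source records of ENTITY_ID 5 and a single POSSIBLY_SAME relationship to ENTITY_ID 1, which is made up of 4 CUSTOMERS records.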
