Scoring Search Results

This article describes a configurable scoring algorithm that you can apply to search results to rank the returned entities, best match first.

Senzing does NOT provide weighted, algorithm-based scoring. It uses principle-based entity resolution (ER) instead, which avoids that complexity while providing superior ER results. That said, if you want a weighted score, Senzing returns enough information for you to compute one in your own solution.

At the heart of this strategy is a JSON configuration document that adds either positive or negative weight to the matching (or non-matching) feature scores of the entities returned.

Here is an example ...

{
    "scoring": {
        "NAME": {
            "threshold": 50,
            "+weight": 80
        },
        "DOB": {
            "threshold": 85,
            "+weight": 10,
            "-weight": 30
        },
        "ADDRESS": {
            "threshold": 80,
            "+weight": 10
        },
        "PHONE": {
            "threshold": 80,
            "+weight": 10
        },
        "EMAIL": {
            "threshold": 80,
            "+weight": 10
        },
        "SSN": {
            "threshold": 90,
            "+weight": 10,
            "-weight": 30
        },
        "DRLIC": {
            "threshold": 90,
            "+weight": 10
        },
        "TAX_ID": {
            "threshold": 90,
            "+weight": 10
        }
    }
}

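A threshold is defined for each scored feature returned by the search API. When a feature's best matching score is at or above its threshold, that score is multiplied by the +weight (treated as a percentage) and the result is added to the total score. When the best score is below the threshold, the optional -weight, if one is defined, is subtracted from the total score.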
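As a worked example of this rule, the sketch below applies the NAME and DOB entries from the sample configuration to two made-up best scores. The helper name `feature_contribution` and the score values 90 and 60 are illustrative assumptions, not part of the Senzing API.

```python
# Hypothetical helper illustrating the threshold/weight rule described above.
def feature_contribution(score, rule):
    """Return the points one feature adds to (or subtracts from) the total."""
    if score >= rule["threshold"]:
        # At or above the threshold: scale the score by the positive weight.
        return int(round(score * rule["+weight"] / 100.0))
    # Below the threshold: subtract the flat penalty, if one is configured.
    return -rule.get("-weight", 0)

# Rules copied from the sample configuration.
name_rule = {"threshold": 50, "+weight": 80}
dob_rule = {"threshold": 85, "+weight": 10, "-weight": 30}

# Made-up best scores: NAME scored 90, DOB scored 60.
name_points = feature_contribution(90, name_rule)  # 90 * 80 / 100 = 72
dob_points = feature_contribution(60, dob_rule)    # below 85, so -30
total = name_points + dob_points                   # 72 - 30 = 42
```

Note that a feature without a -weight contributes nothing when it falls below its threshold, so a weak address match neither helps nor hurts.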

Note: Negative weights should only be applied to exclusive features, such as DOB and SSN in the example above. For instance, if you search on a name and address, the entity with the best matching name and address should appear on top. But people do move, so the best name match may not have a matching address, and that alone should not be penalized. However, if you search on a name, address and DOB, the name and address may both match while the DOB is definitely different. That can happen with a Jr./Sr. pair, or with a completely different person who happened to live at that address.

The searchByAttributes API call returns a scoring section that may contain multiple scores for a single feature. For instance, if an entity in the database has both a primary name and an AKA, such as Sally Smith and Sally Jones, and you search for either name, both names appear in the scoring section. The algorithm described in this article should only score the best matching name, not both.
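To illustrate, the sketch below uses an invented scoring section and keeps only the highest name score. The field names `CANDIDATE_FEAT` and `GNR_FN` are taken from the code later in this article; the scores are made up, and a real response carries more fields than shown here.

```python
# Invented sample: two name scores returned for one entity (primary name and AKA).
feature_scores = {
    "NAME": [
        {"CANDIDATE_FEAT": "Sally Smith", "GNR_FN": 100},
        {"CANDIDATE_FEAT": "Sally Jones", "GNR_FN": 62},
    ]
}

# Keep only the record with the highest full name score.
best_name = max(feature_scores["NAME"], key=lambda rec: rec["GNR_FN"])
print(best_name["CANDIDATE_FEAT"], best_name["GNR_FN"])
```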

See ...

Search API Response message

Interpreting API response messages

Here is a piece of Python code that computes the best score by feature ...

 bestScores = {}
 #--seed NAME so it is always present, even when no name was scored
 bestScores['NAME'] = {'score': 0, 'value': 'n/a'}
 featureScores = resolvedEntity['MATCH_INFO']['FEATURE_SCORES']
 for featureCode in featureScores:
     if featureCode == 'NAME':
         scoreCode = 'GNR_FN' #--use the full name score from GNR
     else:
         scoreCode = 'FULL_SCORE' #--use full score from all other algorithms
     for scoreRecord in featureScores[featureCode]:
         matchingScore = scoreRecord[scoreCode]
         matchingValue = scoreRecord['CANDIDATE_FEAT']
         if featureCode not in bestScores:
             bestScores[featureCode] = {'score': 0, 'value': 'n/a'}
         #--keep only the highest score seen for this feature
         if matchingScore > bestScores[featureCode]['score']:
             bestScores[featureCode]['score'] = matchingScore
             bestScores[featureCode]['value'] = matchingValue

Once the best score by feature has been placed in a structure, you can simply go through it to sum up the overall match score.

 matchScore = 0
 for featureCode in bestScores:
     if featureCode in mappingDoc['scoring']:
         featureConfig = mappingDoc['scoring'][featureCode]
         if bestScores[featureCode]['score'] >= featureConfig['threshold']:
             #--divide by 100.0 so the weight is not truncated to 0 by integer division in Python 2
             matchScore += int(round(bestScores[featureCode]['score'] * featureConfig['+weight'] / 100.0))
         elif '-weight' in featureConfig:
             matchScore -= featureConfig['-weight'] #--actual score does not matter if below the threshold

At this point you can sort the matching entities by the computed match score descending so that the search results are properly sorted for the user with the best matching entities first.
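A minimal sketch of that sort is shown below. It assumes each search result has been reduced to a dictionary holding an entity ID and its computed match score; the key names `entity_id` and `match_score` and the sample values are hypothetical.

```python
# Hypothetical scored results; in practice these come from the scoring step.
scored_entities = [
    {"entity_id": 1001, "match_score": 42},
    {"entity_id": 1002, "match_score": 82},
    {"entity_id": 1003, "match_score": 72},
]

# Sort by match score, highest first, so the best matches lead the list.
ranked = sorted(scored_entities, key=lambda e: e["match_score"], reverse=True)

for entity in ranked:
    print(entity["entity_id"], entity["match_score"])
```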

Furthermore, an overall matching threshold can be applied so that the user never even sees the lower strength matches. In fact, the user could even be given access to the weights and thresholds to customize their searches based on what they want to see.
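One way to apply such a cutoff is sketched below. The constant `MIN_MATCH_SCORE`, its value of 50, and the sample (entity ID, score) pairs are assumptions for illustration; in a real application the cutoff could come from a user setting.

```python
# Assumed overall cutoff; could be exposed to the user as a search option.
MIN_MATCH_SCORE = 50

# Hypothetical (entity ID, match score) pairs, already sorted best first.
ranked_pairs = [(1002, 82), (1003, 72), (1001, 42)]

# Drop the weaker matches so the user never sees them.
visible = [(entity_id, score) for entity_id, score in ranked_pairs
           if score >= MIN_MATCH_SCORE]
```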
