The Senzing Exploratory Data Analysis (EDA) G2 Audit Utility was designed to provide a complete analysis of the differences between entity resolution from two different systems or from configuration changes to the same system.
For instance, if your users asked for a configuration change that would create more matches, you would want to see the full effect of those changes and answer these questions:
- If you expected 10% more matches, did you actually get only 5% more, or 25% more?
- If you expected not to lose any prior matches, did you actually lose 10% or more?
- All of the boxes represent Entities. The prior run has six entities while the newer run has five
- The darker blue boxes are Clusters as they contain more than one record. The prior run has two clusters, while the newer run has three
- A Pair is any two records that are part of the same entity:
  - The formula to compute the number of pairs in a cluster is: cluster size * (cluster size - 1) / 2
  - In the prior run, entity P1 has one pair, but entity P4 has 15 pairs: (G-H, G-I, G-J, G-K, G-L), (H-I, H-J, H-K, H-L), (I-J, I-K, I-L), (J-K, J-L), (K-L)
- A New Negative is a record that matched in the prior run, but not in the newer run. From the prior run’s point of view, one new negative was introduced: records A and B were split apart
- A New Positive is a record that matched in the newer run, but not in the prior run. From the prior run’s point of view, two new positives were introduced: records C and D were merged together, and so were records E and F
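The pair formula can be checked with a short Python sketch. The record layout below is reconstructed from the description above (A and B together in the prior run, C with D and E with F together in the newer run, G through L together in both), so treat it as an illustration rather than the utility's actual input format. The function names are hypothetical.

```python
from itertools import combinations

def pair_count(cluster_size: int) -> int:
    """Number of pairs in a cluster: size * (size - 1) / 2."""
    return cluster_size * (cluster_size - 1) // 2

def pairs(entities):
    """Return every unordered record pair that shares an entity."""
    result = set()
    for records in entities:
        result.update(frozenset(p) for p in combinations(sorted(records), 2))
    return result

# Record layout assumed from the text, not taken from real G2 output.
prior_run = [{"A", "B"}, {"C"}, {"D"}, {"E"}, {"F"}, set("GHIJKL")]
newer_run = [{"A"}, {"B"}, {"C", "D"}, {"E", "F"}, set("GHIJKL")]

prior_pairs = pairs(prior_run)
newer_pairs = pairs(newer_run)
common_pairs = prior_pairs & newer_pairs

print(pair_count(2), pair_count(6))                               # 1 15
print(len(prior_pairs), len(newer_pairs), len(common_pairs))      # 16 17 15
```

Singletons contribute no pairs, which is why only the clusters matter at the pair level.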
Three broadly recognized scores can be computed at the cluster, pair, or accuracy level:
- Precision - quantifies how many more matches the new run made. The more “merges” or “new positives” found, the lower the precision score
- Recall - quantifies how many fewer matches the new run made. The more “splits” or “new negatives” introduced, the lower the recall score
- F1 Score - the harmonic mean of the two, which balances precision against recall. It helps you identify if you made a change that reduced overmatching at the expense of undermatching or vice versa
Four analyses are performed to ensure that the results are viewed from all angles. It is left up to the analyst to determine which scores are important to them and for what reasons.
- Cluster Analysis - considers all clusters equal, regardless of their size. It is not well suited to data sets with large entity clusters where a couple of records shift from one entity to another, as you get no credit for the records that did match on those large clusters
- Entity Analysis - behaves just like cluster analysis except that singletons are included as well. On this small data set it looks even harsher than the pure cluster analysis. However, on large data sets with many of the same unmatched records (singletons), the scores at the entity level should be better than the cluster level
- Pair-Wise Analysis - counts how many matches were made regardless of the size of the cluster. This is a good measurement if you are trying to determine if the results are similar despite small differences
- Accuracy Analysis - is the traditional false positive, false negative measurement against the prior run as a gold standard, or at least a stake in the ground. Just remember that one run’s false positive is the other run’s false negative. Even so, it quantifies the differences quite well
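To see why the entity-level view is harsher than the cluster-level view on this small example, here is a minimal Python sketch. It assumes that a cluster or entity counts as "common" only when both runs contain exactly the same set of records; that reading is an assumption for illustration, not a statement of how the utility counts. The record layout and helper names are likewise hypothetical.

```python
def as_sets(entities):
    """Represent each entity as an immutable set of record IDs."""
    return {frozenset(records) for records in entities}

def clusters_only(entities):
    """Keep only entities with more than one record (clusters)."""
    return {e for e in as_sets(entities) if len(e) > 1}

# Record layout assumed from the description of the two runs.
prior_run = [{"A", "B"}, {"C"}, {"D"}, {"E"}, {"F"}, set("GHIJKL")]
newer_run = [{"A"}, {"B"}, {"C", "D"}, {"E", "F"}, set("GHIJKL")]

# Assumption: "common" means identical record membership in both runs.
common_clusters = clusters_only(prior_run) & clusters_only(newer_run)
common_entities = as_sets(prior_run) & as_sets(newer_run)

print(len(clusters_only(prior_run)), len(clusters_only(newer_run)), len(common_clusters))  # 2 3 1
print(len(as_sets(prior_run)), len(as_sets(newer_run)), len(common_entities))              # 6 5 1
```

Under that assumption only the G through L entity is common to both runs, so adding the singletons enlarges the totals without adding any common entities.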
These formulas are the same for the cluster and pair-wise levels:
- Precision = common count / new count
- Recall = common count / prior count
However, they are different at the accuracy level, as they specifically count records that are different:
- Precision = prior positives / (prior positives + new positives)
- Recall = prior positives / (prior positives + new negatives)
The F1 Score is computed the same way at every level:
- F1 Score = 2 * (Precision * Recall) / (Precision + Recall)
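As a worked example, the accuracy-level formulas and the F1 Score can be sketched in Python. The count of 8 prior positives is an assumption: it is the number of records that sit in clusters in the prior run (A, B, and G through L), and it is consistent with the 80% and 88% accuracy figures this document reports. The function name is hypothetical.

```python
def accuracy_scores(prior_positives, new_positives, new_negatives):
    """Accuracy-level precision, recall, and F1 from the three counts.

    Zero denominators are not handled in this sketch.
    """
    precision = prior_positives / (prior_positives + new_positives)
    recall = prior_positives / (prior_positives + new_negatives)
    f1 = 2 * (precision * recall) / (precision + recall)
    return precision, recall, f1

# 8 prior positives is an assumed count; 2 new positives and
# 1 new negative come from the example runs described earlier.
precision, recall, f1 = accuracy_scores(8, 2, 1)
print(f"{precision:.3f} {recall:.3f} {f1:.3f}")  # 0.800 0.889 0.842
```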
These are the actual statistics computed based on the prior and newer run results above:
At a high level, it is obvious that these two runs made many of the same matches - 15 pairs out of the prior 16 in common.
Only one cluster was split, creating one new negative and resulting in a lower Recall Score - 93% at the Pair-Wise Level and 88% on the accuracy scale.
Only two new matches were made, creating two merged clusters and two new positives and resulting in a lower Precision Score - 88% at the Pair-Wise Level and 80% on the accuracy scale.
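The pair-wise percentages quoted above can be reproduced from the pair counts with a few lines of Python. The count of 17 pairs in the newer run is derived from its three clusters (1 + 1 + 15) rather than stated in the text. The quoted figures appear to be truncated to whole percentages rather than rounded (15 / 16 is 93.75%), which is an inference from the numbers, and the helper name is hypothetical.

```python
def truncated_percent(numerator: int, denominator: int) -> int:
    """Whole-number percentage, truncated rather than rounded (assumed)."""
    return (100 * numerator) // denominator

common_pairs, prior_pairs, newer_pairs = 15, 16, 17  # 17 is derived, not quoted

pairwise_recall = truncated_percent(common_pairs, prior_pairs)
pairwise_precision = truncated_percent(common_pairs, newer_pairs)

print(pairwise_recall)     # 93
print(pairwise_precision)  # 88
```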