Overview
There are many types of GEO location information and many ways to use it. This article covers static GEO locations: the positions of buildings, geographic features, cell towers, and other things that do not move.
Senzing uses GEO location in two ways:
- To find closely located entities to consider as candidates in entity resolution
- To determine if two entities are close enough to be the same or far enough away to be unlikely to be the same
How Do We Find Candidates?
The geohash geospatial-quantizing algorithm encodes a pair of latitude and longitude coordinate values into an alphanumeric string. Two entities that reside in the same Space Time Box (STB) will have comparable geohash strings and an overlapping period of time during which the geohash is valid. Because this GEO location feature is intended for entities that do not move, the time component is set to always.
In addition to a simple STB, Senzing automatically creates varying density Space Time Boxes to accommodate the density of the entity population.
Senzing will also adjust to the accuracy of GEO location if a precision is provided. GEO precision is a numeric value representing the uncertainty, in meters, of the GEO location. Senzing will automatically produce STBs to cover the uncertainty provided.
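To make the candidate-finding idea concrete, here is a minimal sketch of the standard public geohash encoding. It is not Senzing's internal implementation, and the function name geohash_encode and its length parameter are illustrative assumptions. It shows how nearby coordinates end up sharing a leading prefix, which is what makes a geohash usable as a candidate key.

```python
# Standard geohash base-32 alphabet (digits plus letters, without a, i, l, o).
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash_encode(lat, lon, length=8):
    """Encode latitude/longitude in degrees as a geohash string.

    Illustrative helper only; this is the public geohash algorithm,
    not a Senzing API.
    """
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bits = 0
    bit_count = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                bits = (bits << 1) | 1
                lon_lo = mid
            else:
                bits <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                bits = (bits << 1) | 1
                lat_lo = mid
            else:
                bits <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[bits])
            bits = 0
            bit_count = 0
    return "".join(chars)


if __name__ == "__main__":
    # Classic reference point for the algorithm.
    print(geohash_encode(42.6, -5.6, 5))  # ezs42
    # Coordinates used in the mapping examples of this article.
    print(geohash_encode(40.7269223, -73.9817648, 8))
```

A shorter geohash describes a larger box, so truncating the string is one simple way to picture boxes of varying density. How Senzing actually sizes its boxes is not described here.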
How Do We Score GEO Location?
Senzing implements the Haversine formula to determine distance and then maps that to scoring buckets of SAME, CLOSE, LIKELY, PLAUSIBLE and UNLIKELY for the feature score. If the distance is UNLIKELY then Senzing will tend to keep the entities apart based on this feature. As part of entity resolution, this feature is likely one of many features taken into consideration for the overall decision.
By default the following distances, in meters, are used for each of the scoring buckets for the GEO location feature:
- SAME <= 5m
- CLOSE <= 19m
- LIKELY <= 76m
- PLAUSIBLE <= 610m
- UNLIKELY <= 2400m
These defaults can be changed by editing the stb.config file in the Senzing /data directory.
[THRESHOLDS]
# Values in meters
# =SAME,CLOSE,LIKELY,PLAUSIBLE,UNLIKELY
DEFAULT=5,19,76,610,2400
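The scoring step can be sketched as a Haversine distance followed by a threshold lookup. The sketch below is an illustration under stated assumptions, not Senzing's code: the function names, the Earth radius constant, and the choice to return None beyond the last threshold are all assumptions, since the article does not say what happens past the UNLIKELY limit.

```python
import math

EARTH_RADIUS_M = 6371008.8  # mean Earth radius in meters (assumed constant)
BUCKETS = ("SAME", "CLOSE", "LIKELY", "PLAUSIBLE", "UNLIKELY")


def parse_thresholds(value):
    """Turn the right-hand side of a DEFAULT= line into (bucket, meters) pairs."""
    limits = [float(v) for v in value.split(",")]
    if len(limits) != len(BUCKETS):
        raise ValueError("expected one threshold per scoring bucket")
    return list(zip(BUCKETS, limits))


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    # min() guards against tiny floating-point overshoot above 1.0
    return 2 * EARTH_RADIUS_M * math.asin(min(1.0, math.sqrt(a)))


def score_bucket(distance_m, thresholds):
    """Return the first bucket whose limit covers the distance."""
    for name, limit in thresholds:
        if distance_m <= limit:
            return name
    return None  # past the last threshold; behavior here is an assumption


if __name__ == "__main__":
    thresholds = parse_thresholds("5,19,76,610,2400")
    d = haversine_m(40.7269223, -73.9817648, 40.7270000, -73.9818000)
    print(round(d, 1), score_bucket(d, thresholds))
```

Keeping the thresholds in a parsed list rather than hard-coding them mirrors the idea that the defaults are configuration, not logic.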
Creating Data For GEO Location
Senzing recognizes the following fields for GEO:
- GEO_LATLONG: Latitude and longitude in degrees in a single column, separated by a comma (e.g. "40.7269223,-73.9817648"). If this is provided, then neither GEO_LATITUDE nor GEO_LONGITUDE should be provided.
- GEO_LATITUDE: Latitude in degrees (e.g. "40.7269223"). If this is specified then GEO_LONGITUDE must be provided.
- GEO_LONGITUDE: Longitude in degrees (e.g. "-73.9817648"). If this is specified then GEO_LATITUDE must be provided.
- GEO_PRECISION: (optional) GEO Precision in meters (e.g. "5.0"). If this is not provided then the GEO location is assumed to be exact.
CSV mapped data examples:
RECORD_ID,PRIMARY_NAME_FULL   ,DATE_OF_BIRTH,GEO_LATLONG
1001     ,Mr Robert M Jones Jr,1/2/1981     ,"40.7269223,-73.9817648"

RECORD_ID,PRIMARY_NAME_FULL   ,DATE_OF_BIRTH,GEO_LATITUDE,GEO_LONGITUDE,GEO_PRECISION
1001     ,Mr Robert M Jones Jr,1/2/1981     ,40.7269223  ,-73.9817648  ,5.0
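Before loading, it can help to check that each record follows the field rules listed above. The following is a hypothetical pre-load check, not part of Senzing: validate_geo and its error messages are illustrative names, and the range checks on degrees and the non-negative precision check are assumptions added for the sketch.

```python
import csv
import io

# Unpadded version of the first CSV example, used only for illustration.
SAMPLE = (
    "RECORD_ID,PRIMARY_NAME_FULL,DATE_OF_BIRTH,GEO_LATLONG\n"
    '1001,Mr Robert M Jones Jr,1/2/1981,"40.7269223,-73.9817648"\n'
)


def validate_geo(record):
    """Return a list of problems with the GEO fields of one mapped record."""
    errors = []
    latlong = (record.get("GEO_LATLONG") or "").strip()
    lat = (record.get("GEO_LATITUDE") or "").strip()
    lon = (record.get("GEO_LONGITUDE") or "").strip()
    precision = (record.get("GEO_PRECISION") or "").strip()

    # GEO_LATLONG excludes the separate latitude/longitude fields.
    if latlong and (lat or lon):
        errors.append("GEO_LATLONG cannot be combined with GEO_LATITUDE or GEO_LONGITUDE")
    # Latitude and longitude must come as a pair.
    if bool(lat) != bool(lon):
        errors.append("GEO_LATITUDE and GEO_LONGITUDE must be provided together")

    pair = None
    if latlong:
        parts = [p.strip() for p in latlong.split(",")]
        if len(parts) == 2:
            pair = parts
        else:
            errors.append("GEO_LATLONG must look like 'latitude,longitude'")
    elif lat and lon:
        pair = [lat, lon]

    if pair:
        try:
            lat_v, lon_v = float(pair[0]), float(pair[1])
        except ValueError:
            errors.append("coordinates must be numeric degrees")
        else:
            if not -90 <= lat_v <= 90:
                errors.append("latitude out of range")
            if not -180 <= lon_v <= 180:
                errors.append("longitude out of range")

    if precision:  # optional; when absent the location is treated as exact
        try:
            if float(precision) < 0:
                errors.append("GEO_PRECISION must not be negative")
        except ValueError:
            errors.append("GEO_PRECISION must be numeric meters")
    return errors


if __name__ == "__main__":
    for row in csv.DictReader(io.StringIO(SAMPLE)):
        print(row["RECORD_ID"], validate_geo(row))
```

Note that the combined GEO_LATLONG value must be quoted in CSV, as in the example, because it contains a comma.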
Learn more about entity attribute behavior in our Principle-Based Entity Resolution whitepaper.