There are many different types of GEO location information and many ways to use it. In this article, we discuss static GEO locations: those of buildings, geographic features, cell towers, and other things that have stable positions.
Senzing G2 uses GEO location in two ways:
1) To find nearby entities to consider as candidates in Entity Resolution
2) To determine if two entities are close enough to be the same or far enough away to be unlikely to be the same
How do we find candidates?
The geohash geospatial-quantizing algorithm encodes a pair of latitude and longitude coordinate values into an alphanumeric string. Two entities that reside in the same SpaceTimeBox (STB) will have comparable geohash strings and an overlapping period of time during which the geohash is valid. This GEO location feature is intended for entities that do not move, so the time component is set to always.
In addition to a simple STB, Senzing G2 automatically creates varying-density SpaceTimeBoxes to accommodate the density of the entity population.
Senzing G2 will also adjust to the accuracy of the GEO location if a precision is provided. GEO precision is a numeric value representing the uncertainty, in meters, of the GEO location. Senzing G2 will automatically produce STBs to cover the uncertainty provided.
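To make the encoding step concrete, here is a minimal sketch of the standard public geohash algorithm in Python. It is an illustration only, not Senzing's implementation; the function name `geohash_encode` and the `length` parameter are assumptions made for this example.

```python
# Illustrative sketch of the standard geohash encoding.
# This is NOT Senzing's internal code; names here are hypothetical.

_BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # geohash alphabet (no a, i, l, o)


def geohash_encode(lat, lon, length=9):
    """Encode a latitude/longitude pair (in degrees) as a geohash string."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars = []
    bit_count = 0
    value = 0
    use_lon = True  # bits alternate, starting with longitude

    while len(chars) < length:
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                value = (value << 1) | 1
                lon_lo = mid
            else:
                value <<= 1
                lon_hi = mid
        else:
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                value = (value << 1) | 1
                lat_lo = mid
            else:
                value <<= 1
                lat_hi = mid
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(_BASE32[value])
            bit_count = 0
            value = 0

    return "".join(chars)


if __name__ == "__main__":
    # Two points a few meters apart share a long common prefix.
    print(geohash_encode(40.7269223, -73.9817648))
    print(geohash_encode(40.7269300, -73.9817700))
```

Longer strings describe smaller cells, so comparing prefixes of different lengths is one simple way to think about boxes of different sizes.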
How do we score GEO Location?
Senzing G2 implements the Haversine algorithm to determine the distance between two GEO locations and then maps that distance to the G2 scoring buckets of SAME, CLOSE, LIKELY, PLAUSIBLE and UNLIKELY for the feature score. If the distance is UNLIKELY, then G2 will tend to keep the entities apart based on this feature. As part of Entity Resolution, this feature is likely to be one of many features taken into consideration for the overall decision.
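For readers unfamiliar with it, the Haversine formula gives the great-circle distance between two points on a sphere. The sketch below is a generic Python version, not Senzing's code; the function name `haversine_m` and the mean Earth radius constant are assumptions chosen for this example.

```python
import math

# Mean Earth radius in meters. This constant is an assumption for the sketch;
# the article does not say which radius Senzing uses.
EARTH_RADIUS_M = 6371008.8


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1 = math.radians(lat1)
    phi2 = math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)

    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


if __name__ == "__main__":
    # Distance between the sample coordinate and a point slightly to the north.
    print(haversine_m(40.7269223, -73.9817648, 40.7270000, -73.9817648))
```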
By default the following distances, in meters, are used for each of the scoring buckets for the GEO location feature:
- SAME <= 5m
- CLOSE <= 19m
- LIKELY <= 76m
- PLAUSIBLE <= 610m
- UNLIKELY <= 2400m
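As a rough illustration of how a distance could be mapped to the buckets listed above, here is a short Python sketch. The function name `score_bucket` and the table layout are hypothetical, and the article does not say what happens beyond 2400 m, so the sketch returns `None` in that case rather than guessing.

```python
# Default thresholds in meters, taken from the list above.
DEFAULT_THRESHOLDS_M = [
    ("SAME", 5),
    ("CLOSE", 19),
    ("LIKELY", 76),
    ("PLAUSIBLE", 610),
    ("UNLIKELY", 2400),
]


def score_bucket(distance_m, thresholds=DEFAULT_THRESHOLDS_M):
    """Return the first bucket whose upper bound covers distance_m.

    Returns None for distances past the last threshold, because the
    behavior there is not described in the article (assumption).
    """
    for name, limit_m in thresholds:
        if distance_m <= limit_m:
            return name
    return None


if __name__ == "__main__":
    for d in (3, 12, 50, 400, 2000):
        print(d, score_bucket(d))
```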
These defaults can be changed by editing the stb.config file in the Senzing g2/data directory (e.g. /opt/senzing/g2/data/stb.config).
# Values in meters
Creating data for GEO Location
Senzing G2 recognizes the following fields for GEO:
- GEO_LATLONG: Latitude and longitude in degrees in a single column, separated by a comma (e.g. "40.7269223,-73.9817648"). If this is provided, then neither GEO_LATITUDE nor GEO_LONGITUDE should be provided.
- GEO_LATITUDE: Latitude in degrees (e.g. "40.7269223"). If this is specified, then GEO_LONGITUDE must be provided.
- GEO_LONGITUDE: Longitude in degrees (e.g. "-73.9817648"). If this is specified, then GEO_LATITUDE must be provided.
- GEO_PRECISION: (optional) GEO Precision in meters (e.g. "5.0"). If this is not provided then the GEO location is assumed to be exact.
1001,Mr Robert M Jones Jr,1/2/1981,"40.7269223,-73.9817648"
1001,Mr Robert M Jones Jr,1/2/1981,40.7269223,-73.9817648,5.0
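The field rules above can be checked before loading data. The following Python sketch validates only the GEO fields of a record held as a dictionary of strings; the helper name `validate_geo_fields` and the wording of its messages are hypothetical and not part of any Senzing API.

```python
def validate_geo_fields(record):
    """Return a list of problems with a record's GEO fields (empty if none).

    Hypothetical helper for illustration; field names come from the list above.
    """
    problems = []
    latlong = record.get("GEO_LATLONG")
    lat = record.get("GEO_LATITUDE")
    lon = record.get("GEO_LONGITUDE")
    precision = record.get("GEO_PRECISION")

    # GEO_LATLONG excludes the separate latitude/longitude fields.
    if latlong and (lat or lon):
        problems.append("GEO_LATLONG cannot be combined with "
                        "GEO_LATITUDE or GEO_LONGITUDE")
    # GEO_LATITUDE and GEO_LONGITUDE must appear as a pair.
    if bool(lat) != bool(lon):
        problems.append("GEO_LATITUDE and GEO_LONGITUDE must be "
                        "provided together")
    # GEO_LATLONG is a single comma-separated value.
    if latlong and len(latlong.split(",")) != 2:
        problems.append("GEO_LATLONG must be 'latitude,longitude'")
    # GEO_PRECISION is optional, but must be numeric when present.
    if precision:
        try:
            float(precision)
        except ValueError:
            problems.append("GEO_PRECISION must be a number of meters")
    return problems


if __name__ == "__main__":
    # GEO values from the two sample records above.
    combined = {"GEO_LATLONG": "40.7269223,-73.9817648"}
    separate = {"GEO_LATITUDE": "40.7269223",
                "GEO_LONGITUDE": "-73.9817648",
                "GEO_PRECISION": "5.0"}
    print(validate_geo_fields(combined))
    print(validate_geo_fields(separate))
```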