How Are Secret Keys Handled?
Secret keys are generated automatically, one per instance, using OpenSSL’s random bytes generator. Each key is placed directly into the secure store, so no person ever sees it. Alternatively, if a user’s policies require it, the user can supply a secret key and place it in the secure store. When hashing utilities are deployed to multiple locations, the secret key is shared by copying the secure store.
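As a rough illustration of automatic key generation, the sketch below draws a key from the operating system's cryptographically secure random source. It uses Python's standard secrets module as a stand-in for OpenSSL's generator. The function name generate_secret_key is hypothetical, and the 1024-bit size is taken from the hashing section of this document. Writing the key into a secure store is left out because the store's interface is not described here.

```python
import secrets

KEY_BITS = 1024  # key size stated under "How Are Fields Hashed?"


def generate_secret_key(bits: int = KEY_BITS) -> bytes:
    """Return `bits` of cryptographically strong random data (hypothetical helper)."""
    if bits <= 0 or bits % 8 != 0:
        raise ValueError("key size must be a positive multiple of 8 bits")
    return secrets.token_bytes(bits // 8)


key = generate_secret_key()
# Only the length is printed; the key material itself is never displayed.
print(len(key) * 8)  # 1024
```

In a real deployment the returned bytes would go straight into the secure store rather than into a variable that a person could inspect.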
How Are Fields Hashed?
By default, Senzing uses an HMAC-SHA2-256 one-way hashing algorithm with a 1024-bit secret key. Although this construct was developed for use in IPsec, we use it here for entity resolution without modification.
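The following is a minimal sketch of keyed hashing with HMAC-SHA2-256, using Python's standard hmac and hashlib modules. It is not Senzing's implementation. The helper name hash_field, the UTF-8 encoding, and the hexadecimal output format are assumptions made for illustration.

```python
import hashlib
import hmac
import secrets


def hash_field(secret_key: bytes, value: str) -> str:
    """Return the HMAC-SHA2-256 digest of `value` as 64 hex characters (hypothetical helper)."""
    return hmac.new(secret_key, value.encode("utf-8"), hashlib.sha256).hexdigest()


key_a = secrets.token_bytes(128)  # 1024-bit demo key
key_b = secrets.token_bytes(128)  # a second, unrelated demo key

# The same key and value always give the same digest, so hashed values can still be matched.
print(hash_field(key_a, "555-0142") == hash_field(key_a, "555-0142"))  # True

# A different key gives an unrelated digest for the same value.
print(hash_field(key_a, "555-0142") == hash_field(key_b, "555-0142"))  # False
```

Because the digest is deterministic for a given key, two records that share a value produce the same hashed value, which is what allows entity resolution to run on hashed data.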
What Fields Are Hashed?
This varies by implementation. Configuration determines whether all, some, or none of the fields are hashed. Attribution fields (source system and record ID) are typically not hashed.
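To make the idea of selective hashing concrete, here is a small sketch that hashes every field of a record except the attribution fields. The field names, the CLEAR_FIELDS set, and the hash_record helper are all hypothetical. Actual field selection is driven by the product's configuration, which is not shown in this document.

```python
import hashlib
import hmac
import secrets

# Hypothetical configuration: attribution fields that stay in clear text.
CLEAR_FIELDS = frozenset({"DATA_SOURCE", "RECORD_ID"})


def hash_record(secret_key: bytes, record: dict, clear_fields=CLEAR_FIELDS) -> dict:
    """Return a copy of `record` with every non-attribution value replaced by its HMAC digest."""
    result = {}
    for name, value in record.items():
        if name in clear_fields:
            result[name] = value  # attribution field, kept as-is
        else:
            data = str(value).encode("utf-8")
            result[name] = hmac.new(secret_key, data, hashlib.sha256).hexdigest()
    return result


demo_key = secrets.token_bytes(128)  # demo key only
record = {"DATA_SOURCE": "CRM", "RECORD_ID": "1001", "NAME": "Pat Lee", "PHONE": "555-0142"}
hashed = hash_record(demo_key, record)

print(hashed["DATA_SOURCE"], hashed["RECORD_ID"])  # CRM 1001
print(len(hashed["NAME"]), len(hashed["PHONE"]))   # 64 64
```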
Where Are The Fields Hashed?
Where the fields are hashed is implementation-specific, often following one of these patterns:
- Local Hashing: As records are being ingested into the Senzing ER engine, selected values are hashed as described above. The clear text is discarded. The hashed values are retained in the database.
- Single-tier Hashing: Before records are transmitted from a source system to a secondary system, selected values are hashed as described above. The recipient system then performs entity resolution using the hashed values. The hashed values are retained in the database.
- Multi-tier Hashing: Before records are transmitted from a source system to a secondary system, selected values are hashed as described above. The independent recipient then re-hashes the received values using a different secret key. Any number of re-hashing tiers can be implemented. Entity resolution is performed on the final hashed values. These hashed values are retained in the database.
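The multi-tier pattern can be sketched as a chain of keyed hashes, where each tier applies its own secret key to the output of the previous tier. This is an illustration under assumptions: the helper name rehash_through_tiers is hypothetical, and passing the previous tier's hex digest as the next tier's input is one possible convention, not a documented one.

```python
import hashlib
import hmac
import secrets
from typing import Sequence


def rehash_through_tiers(value: str, tier_keys: Sequence[bytes]) -> str:
    """Hash `value` with the first key, then re-hash each result with the next tier's key."""
    current = value.encode("utf-8")
    for key in tier_keys:
        # Each tier sees only the previous tier's digest, never the clear text.
        current = hmac.new(key, current, hashlib.sha256).hexdigest().encode("ascii")
    return current.decode("ascii")


source_key = secrets.token_bytes(128)     # held by the source system (demo key)
recipient_key = secrets.token_bytes(128)  # held by the independent recipient (demo key)
tiers = [source_key, recipient_key]

# Equal inputs still produce equal final values, so matching remains possible.
print(rehash_through_tiers("555-0142", tiers) == rehash_through_tiers("555-0142", tiers))  # True
```

One consequence of this arrangement is that reproducing a final hashed value from clear text requires every tier's key, and each key is held by a different party.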
What Are Some Implementation Best Practices?
In addition to general security best practices, such as resetting passwords on a regular basis, here are some best practices specific to deploying Selective Field Hashing:
- Encrypt all communication flows whether the data is hashed or not.
- Encrypt data at rest whether the data is hashed or not.
- Do everything possible to prevent an owner of the hashed data from gaining access to secret keys.
- Prevent humans from observing the secret keys.
- Hash all identifiers to increase the difficulty of field re-construction within a stolen database.
- Change secret keys periodically.
- When sharing secret keys, use a different channel than the data flow channel (e.g., FedEx).
- Hire a security/crypto professional to help you architect a system consistent with your goals.
- Be sure you understand how systems like this can be attacked, e.g., dictionary attacks, chosen-text attacks, and statistical attacks.
We encourage you to submit additional best practice suggestions.
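To show why keeping secret keys away from holders of the hashed data matters, the sketch below runs a toy dictionary attack. The candidate list, the phone-number format, and all helper names are made up for illustration. An attacker who can enumerate likely values recovers the clear text from an unkeyed hash, fails against the keyed hash without the key, and succeeds again once the key leaks.

```python
import hashlib
import hmac
import secrets


def sha256_hex(value: str) -> str:
    """Unkeyed SHA-256, shown only for contrast."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def hmac_hex(key: bytes, value: str) -> str:
    """Keyed HMAC-SHA2-256 digest as hex."""
    return hmac.new(key, value.encode("utf-8"), hashlib.sha256).hexdigest()


def dictionary_attack(target_digest, candidates, hash_fn):
    """Return the first candidate whose hash equals `target_digest`, or None."""
    for candidate in candidates:
        if hmac.compare_digest(hash_fn(candidate), target_digest):
            return candidate
    return None


candidates = [f"555-01{n:02d}" for n in range(100)]  # small, guessable value space
secret_key = secrets.token_bytes(128)                # demo key
attacker_guess = secrets.token_bytes(128)            # attacker does not know the real key
value = "555-0142"

# Unkeyed hash: enumeration recovers the value.
print(dictionary_attack(sha256_hex(value), candidates, sha256_hex))  # 555-0142

# Keyed hash, wrong key: a match is astronomically unlikely, so this prints None.
print(dictionary_attack(hmac_hex(secret_key, value), candidates,
                        lambda v: hmac_hex(attacker_guess, v)))

# Keyed hash, leaked key: enumeration works again.
print(dictionary_attack(hmac_hex(secret_key, value), candidates,
                        lambda v: hmac_hex(secret_key, v)))  # 555-0142
```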
What Security and Privacy Claims Can Be Made?
The most basic claim that can be made is that storing hashed data helps reduce the risk of unintended disclosure. With the right implementation details and professional oversight, additional security and privacy claims can be made.
For further details on this topic, see the Discovery Without Disclosure blog post.