Out of the box, Senzing is configured to use an embedded SQLite database for the entity repository to accelerate getting started. This article describes the steps to configure Senzing to use PostgreSQL as the entity repository.
Debian 10.1 and PostgreSQL 11.5, the latest version available at the time, were used to test the steps outlined here. This article assumes you are already a PostgreSQL user or are familiar with PostgreSQL; it covers the PostgreSQL installation steps only briefly.
Install and Basic PostgreSQL Setup
The following is a brief overview of the steps required to install PostgreSQL. For full details on installing PostgreSQL on Linux please see the provided links above.
Download the repository installation package directly from the apt repository website or with the wget command.
- PostgreSQL apt Repository
- Choose your Debian version
- Follow the next 2 steps outlined for importing the repository
- Install PostgreSQL
sudo apt -y install postgresql-11
- Start PostgreSQL and check status
sudo systemctl start postgresql@11-main
sudo systemctl status postgresql@11-main
- If started successfully you will see output similar to the following
- If you would like the PostgreSQL server to start up during system boot
sudo systemctl enable postgresql@11-main
Authentication and Remote Connections
Authentication is controlled through a configuration file. Depending on your infrastructure, deployment, and operational directives, you may need to alter this file. The details of such changes are beyond the scope of this article; for additional information, see Client Authentication.
The configuration file to change is typically named pg_hba.conf and is located under the system /etc/ directory. For example, /etc/postgresql/11/main/pg_hba.conf
The following outlines an example pg_hba.conf file where an entry has been added to the local settings to support the senzing user being trusted for local connections to a database named g2.
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# "local" is for Unix domain socket connections only
local   g2              senzing                                 trust
local   all             all                                     peer
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     peer
host    replication     all             127.0.0.1/32            md5
host    replication     all             ::1/128                 md5
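PostgreSQL applies the first pg_hba.conf record that matches the connection type, database, and user, which is why the senzing entry sits above the general local all all peer line. The following is a minimal Python sketch of that first-match lookup for local (Unix domain socket) records only. The function name first_local_method is hypothetical, and the sketch deliberately ignores host records, address matching, and keywords other than all.

```python
from typing import Optional

# Abbreviated copy of the example file above (illustrative only).
PG_HBA = """\
# "local" is for Unix domain socket connections only
local   g2              senzing                                 trust
local   all             all                                     peer
host    all             all             127.0.0.1/32            md5
"""


def first_local_method(hba_text: str, database: str, user: str) -> Optional[str]:
    """Return the auth method of the first matching 'local' record, or None."""
    for line in hba_text.splitlines():
        # Drop comments, then split the record into whitespace-separated fields.
        fields = line.split("#", 1)[0].split()
        # A local record has four fields: type, database, user, method.
        if len(fields) < 4 or fields[0] != "local":
            continue
        _, db, usr, method = fields[:4]
        if db in ("all", database) and usr in ("all", user):
            return method  # first match wins; later records are not consulted
    return None


print(first_local_method(PG_HBA, "g2", "senzing"))   # trust
print(first_local_method(PG_HBA, "g2", "postgres"))  # peer
```

If the two local records were swapped, the senzing user would match local all all peer first and the trust entry would never be reached.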
PostgreSQL by default only allows local connections. To allow remote connections, edit /etc/postgresql/11/main/postgresql.conf and modify the CONNECTIONS AND AUTHENTICATION section. Add a new line for listen_addresses or modify the default one.
...
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
#listen_addresses = 'localhost'         # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
listen_addresses = '*'
...
Restart the server after any changes
sudo systemctl restart postgresql@11-main
Open Default PostgreSQL Firewall Port
If you are connecting remotely and a firewall is running, open the PostgreSQL port if required. For example, if UFW is running:
sudo ufw allow 5432/tcp
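After changing listen_addresses and the firewall rule, it can help to confirm from the client machine that the port is reachable at all. The following Python sketch only attempts a TCP connection; it does not authenticate or speak the PostgreSQL protocol, so a True result shows the port is open, not that a login will succeed. The function name port_is_open and the address 127.0.0.1 are placeholders.

```python
import socket


def port_is_open(host: str, port: int = 5432, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    # Placeholder address; substitute the database server's address.
    print(port_is_open("127.0.0.1", 5432))
```

A False result from a remote client while the same check succeeds on the server itself usually points at listen_addresses or the firewall rather than at pg_hba.conf.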
Set Up a Senzing User
Check that the following permissions meet your organization's policies. The user name 'senzing' is used in the following outline; change it as appropriate to the user you will run Senzing with.
sudo -u postgres psql
CREATE USER senzing WITH ENCRYPTED PASSWORD '<user_password>';
- Change <user_password> to your desired password to access PostgreSQL
Create New Database & Add Senzing Schema
Without exiting the psql session created above:
CREATE DATABASE g2 OWNER=senzing;
\c g2 senzing
- senzing = User id previously used in above commands, substitute your user if different
- <project_path> = Your Senzing project path
Modify the <project_path>/etc/G2Module.ini file to reference the new PostgreSQL database and schema. You can comment out the current lines by prefixing them with # and then add the modified ones below them.
- G2Module.ini - Change or add a new CONNECTION entry
- senzing = Senzing user
- password4g2 = Password for the Senzing user
- 127.0.0.1 = Senzing database server address
- 5432 = PostgreSQL port number
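The values listed above are combined into a single CONNECTION value. As a sketch only, and assuming the postgresql://user:password@host:port:database/ layout (verify this against the Senzing documentation for your version), the entry could be assembled as follows. The helper name build_connection is hypothetical, and percent-encoding of special characters in the user name and password is also an assumption made here.

```python
from urllib.parse import quote


def build_connection(user: str, password: str, host: str, port: int, database: str) -> str:
    """Assemble a CONNECTION value in the assumed Senzing PostgreSQL layout."""
    # Assumption: special characters in credentials are percent-encoded.
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    # Assumed layout: postgresql://user:password@host:port:database/
    return f"postgresql://{user_enc}:{password_enc}@{host}:{port}:{database}/"


# Values from the list above, plus the g2 database created earlier.
print("CONNECTION=" + build_connection("senzing", "password4g2", "127.0.0.1", 5432, "g2"))
# CONNECTION=postgresql://senzing:password4g2@127.0.0.1:5432:g2/
```

Note that the database name follows the port after a colon in this assumed layout, which differs from a standard PostgreSQL URI.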
Configure Database Parameters
Set the database parameters for the Senzing workload. Be sure to stop and start PostgreSQL so that the changes take effect.
Run a Test Load and Export
Perform a test load of the supplied sample data and perform an export to test the new database setup.
- Source setupEnv
- Load the sample data
python3 G2Loader.py -P -p demo/sample/project.csv
- Create the export file