Out of the box, Senzing is configured to use an embedded SQLite database for the entity repository to accelerate getting started. This article describes the steps to configure Senzing to use PostgreSQL as the entity repository.
These steps were tested on Debian 10.1 with PostgreSQL 11.5, the latest version available at the time. This article assumes you already use or are familiar with PostgreSQL, so it covers the PostgreSQL installation steps only briefly.
Install and Basic PostgreSQL Setup
Download and Install
The following is a brief overview of the steps required to install PostgreSQL. For the latest instructions, see the official PostgreSQL Linux downloads (Debian) page.
sudo sh -c 'echo "deb http://apt.postgresql.org/pub/repos/apt $(lsb_release -cs)-pgdg main" > /etc/apt/sources.list.d/pgdg.list'
wget --quiet -O - https://www.postgresql.org/media/keys/ACCC4CF8.asc | sudo apt-key add -
sudo apt-get update
sudo apt-get -y install postgresql
Start PostgreSQL and check its status. Replace <VERSION> with the installed PostgreSQL version number.
sudo systemctl start postgresql@<VERSION>-main.service
sudo systemctl status postgresql@<VERSION>-main.service
If PostgreSQL started successfully, the status output reports the service as active.
If you would like the PostgreSQL server to start at system boot, enable the service:
sudo systemctl enable postgresql@<VERSION>-main.service
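The <VERSION> placeholder in the commands above can usually be worked out from the Debian packaging layout, where each cluster's configuration lives under /etc/postgresql/<version>/<cluster>/. The following Python sketch lists the matching systemd unit names under that assumption. The function name cluster_units and its config_root parameter are illustrative only and are not part of PostgreSQL or Senzing.

```python
from pathlib import Path


def cluster_units(config_root="/etc/postgresql"):
    """Return systemd unit names, one per <version>/<cluster> directory.

    Assumes the Debian layout /etc/postgresql/<version>/<cluster>/.
    Returns an empty list if the root directory does not exist.
    """
    root = Path(config_root)
    units = []
    if not root.is_dir():
        return units
    for version_dir in sorted(p for p in root.iterdir() if p.is_dir()):
        for cluster_dir in sorted(p for p in version_dir.iterdir() if p.is_dir()):
            units.append(
                f"postgresql@{version_dir.name}-{cluster_dir.name}.service"
            )
    return units


if __name__ == "__main__":
    # Print each unit name so it can be passed to systemctl.
    for unit in cluster_units():
        print(unit)
```

On a default install of version 11 with the cluster named main, this would be expected to print the same unit name used in the systemctl commands above.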
Authentication and Remote Connections
Authentication is controlled through a configuration file. Depending on your infrastructure, deployment, and operational directives, you may need to alter this file. The details of such changes are beyond the scope of this article; for additional information, see Client Authentication.
The configuration file to change is typically named pg_hba.conf and is located under the system /etc/ directory. For example, /etc/postgresql/11/main/pg_hba.conf.
The following outlines an example pg_hba.conf file:
# TYPE  DATABASE        USER            ADDRESS                 METHOD
# Database administrative login by Unix domain socket
local   all             postgres                                ident
# "local" is for Unix domain socket connections only
local   all             all                                     md5
# IPv4 local connections:
host    all             all             127.0.0.1/32            md5
# IPv6 local connections:
host    all             all             ::1/128                 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local   replication     all                                     md5
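To review which authentication method applies to each connection type, the records in pg_hba.conf can be read programmatically. The Python sketch below is a simplified reader, not a full implementation of the file format. It assumes whitespace-separated fields, CIDR-style addresses, and no quoted values or include directives. The function name parse_hba is hypothetical.

```python
def parse_hba(text):
    """Parse pg_hba.conf text into a list of dicts.

    Simplified: ignores comments and blank lines, and assumes that
    "local" records have four fields and all other records have five
    (address given in CIDR form, no separate netmask column).
    """
    records = []
    for raw in text.splitlines():
        # Drop trailing comments, then surrounding whitespace.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if fields[0] == "local":
            # Unix domain socket records have no ADDRESS column.
            if len(fields) < 4:
                raise ValueError(f"malformed record: {raw!r}")
            conn_type, database, user, method = fields[:4]
            address = None
        else:
            if len(fields) < 5:
                raise ValueError(f"malformed record: {raw!r}")
            conn_type, database, user, address, method = fields[:5]
        records.append(
            {
                "type": conn_type,
                "database": database,
                "user": user,
                "address": address,
                "method": method,
            }
        )
    return records
```

Reading the file itself typically requires elevated privileges, so one option is to pass in the text of a copy rather than opening /etc/postgresql/<VERSION>/main/pg_hba.conf directly.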
PostgreSQL by default only allows local connections. To allow remote connections, edit /etc/postgresql/<VERSION>/main/postgresql.conf and modify the CONNECTIONS AND AUTHENTICATION section. Add a new line for listen_addresses or modify the default one.
...
#------------------------------------------------------------------------------
# CONNECTIONS AND AUTHENTICATION
#------------------------------------------------------------------------------
# - Connection Settings -
#listen_addresses = 'localhost'         # what IP address(es) to listen on;
                                        # comma-separated list of addresses;
                                        # defaults to 'localhost'; use '*' for all
                                        # (change requires restart)
listen_addresses = '*'
...
Restart the server after any changes:
sudo systemctl restart postgresql@<VERSION>-main.service
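Because the default listen_addresses line is commented out and a new line is added beneath it, it can be useful to confirm which value is actually in effect. The Python sketch below scans the text of postgresql.conf and returns the last uncommented listen_addresses setting, falling back to 'localhost' when none is present. It is a simplified reader: it does not follow include directives or consult the running server, and the function name effective_listen_addresses is hypothetical.

```python
import re

# Matches an uncommented "listen_addresses = 'value'" line. The equals
# sign is optional in postgresql.conf, and the value may be unquoted.
_SETTING = re.compile(
    r"^\s*listen_addresses(?:\s*=\s*|\s+)(?:'([^']*)'|([^\s#]+))",
    re.IGNORECASE,
)


def effective_listen_addresses(conf_text, default="localhost"):
    """Return the last uncommented listen_addresses value in conf_text."""
    value = default
    for line in conf_text.splitlines():
        match = _SETTING.match(line)
        if match:
            # When a setting appears more than once, the last one wins.
            quoted, bare = match.group(1), match.group(2)
            value = quoted if quoted is not None else bare
    return value
```

For the example file shown above, this returns '*' because the commented default is skipped and the added line is the only active setting.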
Open Default PostgreSQL Firewall Port
If you are connecting remotely and a firewall is running, open the PostgreSQL port if required. For example, if UFW is running:
sudo ufw allow 5432/tcp
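After opening the port, a quick reachability check from the client machine can confirm that the firewall and listen_addresses changes took effect. The Python sketch below only tests whether a TCP connection can be established. It does not authenticate or speak the PostgreSQL protocol. The function name port_is_open is hypothetical, and the host passed to it is whatever address your database server uses.

```python
import socket


def port_is_open(host, port=5432, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        # create_connection resolves the host and handles IPv4/IPv6.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution errors.
        return False
```

A False result can mean the firewall is blocking the port, the server is not listening on that address, or the service is not running, so check each of the earlier steps in turn.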
Setup a Senzing User
Check that the following permissions meet your organization's policies. The user name 'senzing' is used in the following outline; change it as appropriate to the user you will run Senzing with.
sudo -u postgres psql
CREATE USER senzing WITH ENCRYPTED PASSWORD '<user_password>';
- Change <user_password> to your desired password to access PostgreSQL
CREATE DATABASE g2 OWNER=senzing;
Create New Database & Add Senzing Schema
psql -U senzing -d g2 -W
- senzing = User id previously used in above commands, substitute your user if different
- <senzing_project_path> = Your Senzing project path
Modify the <senzing_project_path>/etc/G2Module.ini file to reference the new PostgreSQL database and schema. You can comment out the current lines by prefixing them with # and add the modified ones below them.
- G2Module.ini - Change or add a new CONNECTION entry
- senzing = Senzing user
- password4g2 = Password for the Senzing user
- 127.0.0.1 = Senzing database server address
- 5432 = PostgreSQL port number
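The components listed above can be assembled into a single CONNECTION value. The Python sketch below assumes the value takes the form postgresql://user:password@host:port:database/, with the database name g2 created earlier. That format is an assumption and should be checked against the existing entry in your G2Module.ini before use. The function name senzing_connection is hypothetical, and the sketch does not escape special characters in the password.

```python
def senzing_connection(user, password, host, port=5432, database="g2"):
    """Build a CONNECTION value from its parts.

    ASSUMPTION: the format is postgresql://user:password@host:port:database/
    Verify this against the CONNECTION entry already in G2Module.ini.
    Special characters in the password are not escaped here.
    """
    return f"postgresql://{user}:{password}@{host}:{port}:{database}/"


if __name__ == "__main__":
    # Example values taken from the list above.
    print("CONNECTION=" + senzing_connection("senzing", "password4g2", "127.0.0.1"))
```

Keeping the original CONNECTION line commented out beside the new one makes it easy to switch back to SQLite if the PostgreSQL connection fails.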
Configure Database Parameters
Tune your database for the best performance! Follow the instructions in Tuning Your Database. Restart your PostgreSQL database for the changes to take effect.
Run a Test Load and Export
Load the supplied sample data and then run an export to test the new database setup.
- Source setupEnv
- Load the sample data
python3 G2Loader.py -P -p demo/sample/project.csv
- Create the export file