Introduction
By default Senzing is set to use an embedded SQLite database. This article describes the steps to configure Senzing to use PostgreSQL as the entity repository and how to create the Senzing schema within a PostgreSQL database.
CentOS 7 was used in testing the steps outlined herein and the latest version of PostgreSQL available at the time; which was 11.2 This article assumes you are already a PostgreSQL user or familiar with PostgreSQL it only briefly covers the installation steps of PostgreSQL. If you require additional PostgreSQL information the following links will be useful:
Prerequisites
In addition to the dependencies outlined in the Introduction: Quickstart article the following operating system prerequisites and packages are required. You must ensure the dependencies in the quickstart article are met before proceeding.
- Senzing API 1.10 or newer
- sudo or root privileges to install PostgreSQL and issue system commands
Install and Basic PostgreSQL Setup
The following is a brief overview of the steps required to install PostgreSQL. For full details on installing and features of PostgreSQL on Linux, please see PostgreSQL tutorial.
Download the repository installation package directly from the yum repository website or with the wget command.
Website Download
- PostgreSQL Yum Repository
- Locate the Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7 package
- Select Download
Yum Download
- On the PostgreSQL Linux Download website select the Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7 version
- Following the steps outlined to install the repository RPM and install PostgreSQL
wget Download
- On the PostgreSQL Linux Download website select the Red Hat Enterprise Linux 7 / Oracle Linux 7 / CentOS 7 version
- Use wget with the listed URL to directly download the package with wget
- Once downloaded add the repository RPM
sudo yum localinstall pgdg-centos11-11-2.noarch.rpm
Installation
- Install PostgreSQL
sudo yum -y install postgresql11 postgresql11-server postgresql11-contrib postgresql11-docs libpqxx
- Start the server
sudo /usr/pgsql-11/bin/postgresql-11-setup initdb
- If you want to move the data directory to another path, edit /var/lib/pgsql/11/data/postgresql.conf and uncomment and modify the data_directory line to your desired path. Save and copy the original with rsync
data_directory = '/data/pgsql'
sudo rsync -av /var/lib/pgsql/11/data /data/pgsql
sudo systemctl restart postgresql-11
sudo systemctl status postgresql-11
- You should see an output similar to the following:
- To auto start PostgreSQL upon system startup
sudo systemctl enable postgresql-11
Modify Authentication Methods
Authentication is controlled through a configuration file, depending on your infrastructure, deployment and operational directives you may need to alter this file. The details of such changes are beyond the scope of this article, for additional information see Client Authentication.
As an overview the configuration file to change is typically named pg_hba.conf and is located in the data directory, e.g. /var/lib/pgsql/11/data/pg_hba.conf
The following example outlines an example pg_hba.conf file where 3 entries have been enabled to support md5 authentication in place of the default ident authentication.
# TYPE DATABASE USER ADDRESS METHOD
# allow local connections for postgres without a password
local all postgres ident
# "local" is for Unix domain socket connections only
local all all md5
# IPv4 local connections:
host all all 127.0.0.1/32 md5
# IPv6 local connections:
host all all ::1/128 md5
# Allow replication connections from localhost, by a user with the
# replication privilege.
local replication all md5
Upon making any changes to the pg_hba.conf file restart PostgreSQL
sudo systemctl restart postgresql-11
User Creation
- Log into to the PostgreSQL server
sudo su - postgres
- Create the user and database for Senzing
psql -c "CREATE USER g2user WITH ENCRYPTED PASSWORD '<g2userpassword>'" -d template1
createdb -E UTF8 --locale=en_US.UTF-8 -O g2user G2
exit
- In addition to modifying the authentication methods above, if you connect to PostgreSQL remotely, you will need to open port 5432 in the default (and any other organisational) firewall(s)
- Edit the /etc/firewalld/zones/public.xml file and add
-
<service name="postgresql"/>
-
- Restart the local firewall
sudo systemctl restart firewalld
- Edit the /etc/firewalld/zones/public.xml file and add
Schema Creation
- Install schema - assumes commands are run locally on the machine
psql -U g2user -d G2 -f <project_path>/resources/schema/g2core-schema-postgresql-create.sql
Configure G2Module.ini
Modify the following ini file to reference the PostgreSQL databases and schemas. The file is located in ~/senzing/etc/ (if you've not altered the default location). You can comment out the current lines by prefixing them with # and adding the modified ones below.
- G2Module.ini - Change the CONNECTION string to
CONNECTION=postgresql://g2user:g2userpassword@127.0.0.1:5432:G2/
Where:-
- g2user = G2 userid
- g2userpassword = Password for the G2 userid
- 127.0.0.1 = G2 server address
- 5432 = Postgresql port number
Tune your database for the best performance! Follow the instructions in Tuning Your Database.
Update Database with Senzing Configuration
The Senzing engine is configured with a JSON document, on a fresh install this document needs to be registered in the Senzing database. This only needs to happen once with a new project.
python3 python/G2SetupConfig.py
Run a Test Load and Export
Perform a test load of the supplied sample data and then perform a G2 export to test the new database setup.
- Source setupEnv
cd senzing/
. setupEnv
- Load the G2 sample data
cd python/
python3 G2Loader.py -P -p demo/sample/project.csv
- Create the G2 export file
python3 G2Export.py
Comments
0 comments
Please sign in to leave a comment.