You may have arrived here because of generally poor performance, or because of a warning from the Senzing product that your database is performing slowly. The Senzing checkDBPerf() function tests how many auto-commit inserts can be performed on your database within a few seconds; the faster this completes, the better your system is likely to scale. This isn't the only performance area that impacts Senzing, but it is the #1 area outside of data mapping.
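If you want to run the same test yourself, a minimal sketch using the Senzing v3 Python SDK looks like the following (the module name, the 3-second duration, and sourcing the configuration from the SENZING_ENGINE_CONFIGURATION_JSON environment variable are illustrative choices; adapt to your install):

import os
from senzing import G2Diagnostic

# Engine configuration JSON; assumed here to come from the
# SENZING_ENGINE_CONFIGURATION_JSON environment variable.
engine_config = os.environ["SENZING_ENGINE_CONFIGURATION_JSON"]

g2_diagnostic = G2Diagnostic()
g2_diagnostic.init("checkDBPerf", engine_config, False)  # module name is arbitrary

response = bytearray()
g2_diagnostic.checkDBPerf(3, response)  # run auto-commit inserts for ~3 seconds
print(response.decode())                # JSON with insert count and elapsed time

g2_diagnostic.destroy()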
Database performance with Senzing is largely a question of latency. Performance issues, in order, tend to come from:
- Disk IO performance of the database server
- Lack of database tuning for an auto-commit OLTP workload
- Network bottlenecks preventing high speed communication between the Senzing API and the database
- Latency between the database server and non-direct attached storage subsystems
See the Disk Performance article.
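For a quick local check of commit-style write latency, something like the following fio run measures small synchronous writes, which is the pattern auto-commit inserts generate (assumes the fio utility is installed; /var/lib/mydb stands in for your database's data directory):

fio --name=commit-latency --directory=/var/lib/mydb \
    --rw=write --bs=4k --size=256m \
    --fdatasync=1 --runtime=30 --time_based

Watch the fsync/fdatasync latency percentiles in the output; high tail latencies here will show up directly as slow commits.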
The G2Loader utility sets a couple of pragmas suited to the Senzing workload. If your database is small and you have RAM to spare, you could look at using tmpfs to improve performance. See the Disk Performance article.
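For reference, pragmas along these lines suit an insert-heavy SQLite workload (illustrative only; check the G2Loader source for the exact ones it sets):

PRAGMA journal_mode = WAL;   -- write-ahead log: readers don't block the writer
PRAGMA synchronous = OFF;    -- skip fsync on commit; risks data loss on a crash
PRAGMA cache_size = -64000;  -- ~64 MB page cache (negative value = KiB)

A tmpfs setup might look like this (the 8G size and the /var/opt/senzing/sqlite path are assumptions; adjust to your install, and remember tmpfs contents vanish on reboot):

sudo mount -t tmpfs -o size=8G tmpfs /mnt/sqlite-ram
cp /var/opt/senzing/sqlite/G2C.db /mnt/sqlite-ram/
# point your SQL connection at the copy, and copy it back to persist it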
Set the following parameters to tune DB2 for auto-commit:
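As an illustrative starting point only (not an official Senzing list), commit-heavy OLTP workloads on DB2 commonly benefit from larger transaction-log settings, applied per database with db2 update db cfg (SENZING is a hypothetical database name; sizes depend on your hardware):

db2 update db cfg for SENZING using LOGBUFSZ 2048    # log buffer, in 4 KB pages
db2 update db cfg for SENZING using LOGFILSIZ 16384  # size of each log file, in 4 KB pages
db2 update db cfg for SENZING using LOGPRIMARY 20    # primary log files
db2 update db cfg for SENZING using LOGSECOND 40     # secondary log files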
Set the following parameters to tune PostgreSQL for auto-commit:
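As with DB2, treat these as an illustrative sketch rather than an official list; in postgresql.conf, the settings that most directly affect an auto-commit insert workload include (sizes assume a dedicated server with roughly 16 GB of RAM):

synchronous_commit = off           # don't wait for the WAL flush on every commit
shared_buffers = 4GB               # roughly 25% of RAM is a common starting point
wal_buffers = 16MB
max_wal_size = 8GB                 # fewer, larger checkpoints
checkpoint_completion_target = 0.9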
Set the following parameters to tune MySQL for auto-commit:
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
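For context: innodb_flush_log_at_trx_commit = 0 buffers log writes and flushes them roughly once per second instead of on every commit (so a crash can lose up to about a second of transactions), O_DIRECT bypasses the OS page cache so data isn't buffered twice, and innodb_file_per_table gives each table its own tablespace file.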
On larger Senzing systems running a high number of threads, you may see errors indicating that additional prepared statements can't be created:
42000 Can't create more than max_prepared_stmt_count statements (current value: 16382)
If you know you will be running Senzing on a larger system with a higher-than-default number of threads, update the following:
max_prepared_stmt_count = 100000
100,000 is a suggested value, but the right number varies with thread count and system speed. A suggested formula is 2000 * (number of Senzing threads) + 500.
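For example, a 48-thread Senzing system works out to 2000 * 48 + 500 = 96,500 prepared statements, which rounds up neatly to the 100,000 suggested above.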
Networking used to be very straightforward and easy to monitor and manage. There were exceptional cases of devices hitting their maximum packets per second, or of driver/OS/PCI bottlenecks, but those were rare.
The popularity of cloud environments abstracts much of the network topology away from the user. Your application and database could be on opposite sides of a datacenter (or a continent), or connected through severely limited virtual network devices. This cloud networking is often a black box, and it is the hardest piece to troubleshoot.
We have seen systems scale from 30 records per second to 1,000 records per second simply by switching the type of underlying cloud network fabric (a quick latency check is sketched after this list):
- Make sure your systems are as co-located as possible when provisioned
- If there are options between virtual and physical switches, test both
- Ask about both bandwidth and packets-per-second limits
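Because each auto-commit insert costs at least one network round trip, round-trip latency puts a hard ceiling on per-connection throughput. A quick sanity check (db-host is a placeholder for your database server's address):

ping -c 100 db-host

At a 0.5 ms average round trip, a single connection tops out around 1000 / 0.5 = 2000 inserts per second before the database does any work at all; at 5 ms, that ceiling drops to roughly 200. If the averages look fine but throughput doesn't, compare latency while under load, since virtual network devices often degrade under packets-per-second pressure.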