You may have arrived here because of perceived poor performance in general, or because of a warning from Senzing that your entity repository database is performing slowly, for example from G2Loader at startup. The Senzing checkDBPerf() function tests how many auto-commit inserts can be performed on the Senzing entity repository within a few seconds. The more inserts that complete in that time, the higher the scalability you can expect from your system. This isn't the only performance area impacting Senzing, but it is the #1 area outside of data mapping.
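To get a feel for what such a test measures, here is a minimal sketch in Python that counts single-row auto-commit inserts against a SQLite database for a fixed time window. It is not the Senzing checkDBPerf() implementation; the function name `count_autocommit_inserts`, the table name `perf_test`, and the 100-byte payload are all assumptions made for illustration.

```python
import sqlite3
import time


def count_autocommit_inserts(db_path=":memory:", duration_s=1.0):
    """Count single-row auto-commit inserts completed within duration_s seconds.

    Illustrative only: the table name and payload size are hypothetical.
    """
    # isolation_level=None puts the sqlite3 module in autocommit mode,
    # so every INSERT is committed on its own.
    conn = sqlite3.connect(db_path, isolation_level=None)
    try:
        conn.execute(
            "CREATE TABLE IF NOT EXISTS perf_test "
            "(id INTEGER PRIMARY KEY, payload TEXT)"
        )
        inserts = 0
        deadline = time.perf_counter() + duration_s
        while time.perf_counter() < deadline:
            conn.execute(
                "INSERT INTO perf_test (payload) VALUES (?)", ("x" * 100,)
            )
            inserts += 1
        # Clean up the scratch table so repeated runs start empty.
        conn.execute("DROP TABLE perf_test")
        return inserts
    finally:
        conn.close()


if __name__ == "__main__":
    seconds = 1.0
    n = count_autocommit_inserts(duration_s=seconds)
    print(f"{n} auto-commit inserts in {seconds:.0f} s ({n / seconds:.0f} per second)")
```

Running it against a file on the database server's storage rather than `:memory:` gives a rough idea of how much commit latency the disk adds.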
Database performance with Senzing is closely tied to latency. Performance issues, in order, tend to center on:
- Disk IO performance of the database server - see the Disk Performance article
- Lack of database tuning for an auto-commit OLTP workload
- Network bottlenecks preventing high speed communication between the Senzing API and the database
- Latency between the database server and non-direct attached storage subsystems
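The reason latency matters so much for an auto-commit workload can be shown with simple arithmetic. The sketch below assumes each commit costs one synchronous round trip per thread and ignores all other work, so it gives an upper bound and not a prediction. The function name `max_commits_per_second` is hypothetical.

```python
def max_commits_per_second(round_trip_latency_ms, threads=1):
    """Upper bound on commits per second when each commit waits one round trip.

    Assumes one synchronous round trip per commit per thread and no other cost.
    """
    if round_trip_latency_ms <= 0:
        raise ValueError("round_trip_latency_ms must be positive")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return threads * 1000.0 / round_trip_latency_ms


if __name__ == "__main__":
    for latency_ms in (0.5, 1.0, 10.0):
        ceiling = max_commits_per_second(latency_ms)
        print(f"{latency_ms} ms per commit -> at most {ceiling:.0f} commits/s per thread")
```

Under these assumptions, going from 1 ms to 10 ms per commit cuts the per-thread ceiling by a factor of ten, whatever the CPU or memory of the database server.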
For each database system, the following parameters should be set for the Senzing auto-commit workload.
For SQLite, the G2Loader utility sets a couple of pragmas suited to the Senzing workload. If your database is small and you have RAM to spare, you could consider using tmpfs to improve performance. See the Disk Performance article.
db2 update db cfg for <database_name> using PAGE_AGE_TRGT_MCR 10
innodb_flush_log_at_trx_commit = 0
innodb_flush_method = O_DIRECT
innodb_file_per_table = 1
On larger Senzing systems running a high number of threads you may see errors that additional prepared statements can't be created:
42000 Can't create more than max_prepared_stmt_count statements (current value: 16382)
If you know you will be running Senzing on a larger system with a higher than default number of threads, and/or you receive an error similar to the above, you will need to increase the number of prepared statements for the database. For example, on MySQL by increasing max_prepared_stmt_count:
max_prepared_stmt_count = 100000
100,000 is a suggested value, but the right value will vary depending on the number of threads and the speed of the system. A suggested formula is 2000 * number of Senzing threads + 500.
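As a worked example of the suggested formula, the short Python sketch below computes a value for a few thread counts. The function name `suggested_max_prepared_stmt_count` and the thread counts shown are illustrative assumptions; only the formula itself comes from the text above.

```python
def suggested_max_prepared_stmt_count(senzing_threads):
    """Apply the suggested formula: 2000 * number of Senzing threads + 500."""
    if senzing_threads < 1:
        raise ValueError("senzing_threads must be at least 1")
    return 2000 * senzing_threads + 500


if __name__ == "__main__":
    # Example thread counts are arbitrary; substitute your own.
    for threads in (8, 32, 64):
        value = suggested_max_prepared_stmt_count(threads)
        print(f"{threads} threads -> max_prepared_stmt_count = {value}")
```

By this formula, 8 threads already yields 16500, slightly above the 16382 shown in the error message, and 50 threads yields 100500, close to the suggested 100,000.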
Networking used to be very straightforward, easy to monitor and manage. There was the exceptional case of devices hitting max packets per second or driver/OS/PCI bottlenecks but those were rare.
The popularity of cloud environments abstracts much of the network topology away from the user. Your application and database could be on different sides of a data center (or continent), or even connected through very limited virtual systems. This cloud networking is often a black box and the hardest component to troubleshoot.
We have seen systems scale from 30 records per second to 1000 records per second simply by switching the type of underlying cloud network fabric. Some recommendations:
- Make sure your systems are as co-located as possible when provisioned
- If there are options between virtual and physical switches, test with both options
- Ask about both bandwidth and packets-per-second limits
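When comparing placement or network options, a quick measurement from the application host to the database host can help. The following Python sketch times TCP connection setup, which takes roughly one network round trip plus connection overhead. It is only a rough proxy for query latency and does not speak any database protocol. The function name `tcp_connect_times_ms` is hypothetical, and the host and port are whatever your database listens on.

```python
import socket
import statistics
import time


def tcp_connect_times_ms(host, port, samples=5, timeout_s=2.0):
    """Return the time in milliseconds taken by each of `samples` TCP connects."""
    times = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close a connection; only setup time is measured.
        with socket.create_connection((host, port), timeout=timeout_s):
            pass
        times.append((time.perf_counter() - start) * 1000.0)
    return times


if __name__ == "__main__":
    import sys

    if len(sys.argv) == 3:
        results = tcp_connect_times_ms(sys.argv[1], int(sys.argv[2]))
        print(f"median {statistics.median(results):.3f} ms, max {max(results):.3f} ms")
    else:
        print("usage: python tcp_latency.py <database_host> <database_port>")
```

Running the same measurement before and after a change, for example moving the application into the same availability zone as the database, gives a simple before-and-after comparison.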