Small: up to 10 million entities.
8 cores, 32 GB of RAM, 100 GB of SSD or NVMe storage
This should load typical data at 100 records per second, so 10 million records would load in about a day.
Medium: up to 50 million entities.
16 cores, 64 GB of RAM, 500 GB of SSD or NVMe storage
This should load typical data at 200 records per second, so 50 million records would load in under 3 days.
Large: up to 100 million entities.
32 cores, 128 GB of RAM, 1 TB of SSD or NVMe storage
This should load typical data at 200 records per second, so 100 million records would load in under a week.
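The load-time arithmetic behind these tiers can be checked with a short script. The tier names, record counts, and per-second rates below are taken directly from the figures above; nothing else is assumed.

```python
# Estimate load duration for each sizing tier from the rates quoted above.
SECONDS_PER_DAY = 86_400

# (tier, total records, records loaded per second) -- figures from this guide
tiers = [
    ("Small", 10_000_000, 100),
    ("Medium", 50_000_000, 200),
    ("Large", 100_000_000, 200),
]

for name, records, rate in tiers:
    days = records / rate / SECONDS_PER_DAY
    print(f"{name}: ~{days:.1f} days to load {records:,} records")
```

Running this prints roughly 1.2 days for Small, 2.9 days for Medium, and 5.8 days for Large, which matches the "about a day", "under 3 days", and "under a week" estimates.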
Of course, there are hardware configurations that can achieve significantly faster performance and handle even larger data sets, into the billions of records!
On this server you will install the database engine and G2. While our best performance will always be on the latest version of IBM DB2 Enterprise Server, we get acceptable performance on the latest free version of MySQL.
Why SSDs are required
To perform real-time entity resolution, you must read from the database more than you write to it. SSDs are much faster than traditional spinning disks and have become affordable enough to be standard equipment. You can still use spinning disks, but expect data loading to take roughly twice as long.
The performance expectations above are based upon typical person or company data sets such as master customer lists, prospect lists, employee lists, watch lists, national registries, etc.
You can run into data sets that have extra-large records or highly related data – meaning everybody is related to everybody else. The nice thing is that you can increase performance by adding more cores and RAM, at 4 GB of RAM per core.
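The 4 GB per core ratio above is the same one the sizing tiers follow, so scaling up is a matter of holding that ratio. A minimal sketch (the helper name is ours, not part of any product API):

```python
GB_PER_CORE = 4  # RAM-to-core ratio stated in this guide

def ram_for_cores(cores: int) -> int:
    """RAM in GB to pair with a given core count, at 4 GB per core."""
    return cores * GB_PER_CORE

# The tiers in this guide all follow the ratio:
print(ram_for_cores(8))   # Small:  32 GB
print(ram_for_cores(16))  # Medium: 64 GB
print(ram_for_cores(32))  # Large:  128 GB
```

When adding cores to speed up a heavy data set, add RAM in the same proportion rather than cores alone.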
If you run into slow data sets, please feel free to contact us, as this often means the data was mis-mapped or could be mapped differently to achieve your performance needs.