Please see the Performance FAQ.
How do I load my own data?
Related media:- Video: G2 Mapping and Loading - Part 1
A single data source file can be loaded by specifying the '-f' flag with G2Loader.py. The following example loads a single CSV file called Customers.csv and assigns its records the data source label CUSTOMERS in G2:-
python python/G2Loader.py -P -f Customers.csv/?data_source=CUSTOMERS,file_format=CSV
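If the same command has to be produced for several files, the argument after '-f' can be composed programmatically. The following is a minimal sketch only: the helper names build_file_spec and build_load_command are hypothetical, and the only things taken from this FAQ are the '-P' and '-f' flags and the file/?data_source=...,file_format=... pattern shown above. It builds the command as a list and prints it without running anything.

```python
def build_file_spec(path, data_source, file_format="CSV"):
    """Compose the '-f' argument in the pattern shown in this FAQ (hypothetical helper)."""
    return f"{path}/?data_source={data_source},file_format={file_format}"


def build_load_command(path, data_source, file_format="CSV", purge=True):
    """Return the G2Loader.py command line as a list of arguments (hypothetical helper)."""
    cmd = ["python", "python/G2Loader.py"]
    if purge:
        # '-P' purges the repository before loading
        cmd.append("-P")
    cmd += ["-f", build_file_spec(path, data_source, file_format)]
    return cmd


command = build_load_command("Customers.csv", "CUSTOMERS")
print(" ".join(command))
```

The printed line can be compared against the example command above before it is used in a script.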
Multiple data source files can be loaded by creating a new project that specifies the details of each file to be loaded. The Senzing install directory contains the G2Project.ini file, and the python/demo/sample directory contains the project.csv and sample_person.csv files. Use these as templates for creating your own multi-file project for loading.
The G2Project.ini has an entry similar to the following that points to a project. Make a copy of the original G2Project.ini file, then modify G2Project.ini to point to your custom project.csv:-
MyProject.csv should point to the data source file(s) you wish to load:-
The record files should have specific header terms that tell G2 the meaning of the data in your file(s). The Generic Entity Specification describes the terms used to map your data types to those G2 understands:- Generic Entity Specification
Here is an example of a CSV file header and sample record:-
1001,Mr Robert M Jones Jr,M,1/2/1981,PP11111,US
This pattern can be used to load many types of data. If you have data in native script, please see the Native Script FAQ.
We recommend using the '-T' flag to perform a test load before loading your own data. See the note below:- Can I perform a test load?
What is the correct CSV format for my data files?
Currently the supported format is comma-separated, with double quotes around strings that contain embedded commas. Consider the following record and note that the ADDR_FULL feature is surrounded by double quotes:-
1 ,James ,Lynch ,03/01/82 ,"96 Alpha,Hicksville,NY,118410,US",845-740-5295
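To check that a file follows this quoting rule before loading it, the sample record above can be parsed with Python's standard csv module. This is a sketch for illustration only and is not part of G2. It shows that the quoted address is read as a single field despite its embedded commas, and that csv.writer with its default settings adds the quotes only where they are needed.

```python
import csv
import io

# The sample record from this FAQ, with the address in double quotes
line = '1 ,James ,Lynch ,03/01/82 ,"96 Alpha,Hicksville,NY,118410,US",845-740-5295'

# Parse the record; the quoted address stays together as one field
fields = next(csv.reader(io.StringIO(line)))
print(len(fields))   # number of fields in the record
print(fields[4])     # the full address, commas included

# Write the fields back out; the default QUOTE_MINIMAL setting
# quotes only the field that contains embedded commas
buffer = io.StringIO()
csv.writer(buffer).writerow(fields)
print(buffer.getvalue().strip())
```

Without the double quotes, the same record would split into ten fields instead of six and the columns would no longer line up with the header.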
How do I incrementally load data?
A typical use of the G2 Evaluation is to load a single project into an empty repository, as described in the G2 Evaluation Quickstart. The '-P' flag to G2Loader.py purges the repository before loading the project. If '-P' is omitted, the new data is loaded incrementally on top of the existing data; no data is deleted or replaced.
This can be helpful when multiple projects are to be loaded on top of each other and a report is to be generated after each load to demonstrate the changes.
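The purge-then-append sequence can be sketched as a list of commands in which only the first load carries '-P'. This is an illustrative sketch, not a Senzing utility: the function name incremental_load_commands and the second file (Watchlist.csv with data source WATCHLIST) are hypothetical, and it uses the single-file '-f' form of the command for brevity. The commands are only printed, so a report can be generated manually between loads.

```python
def incremental_load_commands(file_specs, purge_first=True):
    """Build one G2Loader.py command per file spec (hypothetical helper).

    Only the first command includes '-P', so the repository is purged once
    and every later load is added on top of the existing data.
    """
    commands = []
    for index, spec in enumerate(file_specs):
        cmd = ["python", "python/G2Loader.py"]
        if purge_first and index == 0:
            cmd.append("-P")
        cmd += ["-f", spec]
        commands.append(cmd)
    return commands


specs = [
    "Customers.csv/?data_source=CUSTOMERS,file_format=CSV",
    # Hypothetical second file, used here only to illustrate an incremental load
    "Watchlist.csv/?data_source=WATCHLIST,file_format=CSV",
]

commands = incremental_load_commands(specs)
for cmd in commands:
    print(" ".join(cmd))
```

Passing purge_first=False produces commands that all load incrementally, which suits a repository that already holds data you want to keep.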
Can I perform a test load?
Yes. To perform a test load of data without populating G2, use the '-T' flag with G2Loader.py. This checks the configuration and data when you have created a new project or have new data to load. G2 will inform you of any errors, enabling you to correct them quickly before starting a full load.