Performance Questions
Please see the Performance FAQ
How Do I Load My Own Data?
A single data source file can be loaded by specifying the '-f' flag with G2Loader.py. The following example loads a single CSV file called Customers.csv and assigns the data source label CUSTOMERS to its records in Senzing:
python3 G2Loader.py -P -f Customers.csv/?data_source=CUSTOMERS,file_format=CSV
Multiple data source files can be loaded by creating a project that specifies the details of each file to be loaded. The python/demo/sample directory contains project.csv and project.json files. Use these as templates for creating your own multi-file project. A project file describes each file to load, the data source code to use, and the format of the load file. For example, project.csv:
DATA_SOURCE,FILE_FORMAT,FILE_NAME
CUSTOMERS,CSV,customers.csv
PROSPECTS,CSV,prospects.csv
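If you have many files to load, it may be easier to generate the project file than to edit it by hand. The following is a minimal sketch that uses only the three columns shown in the example above. The function name render_project_csv is hypothetical and is not part of Senzing.

```python
import csv
import io

# Column names taken from the project.csv example above.
PROJECT_COLUMNS = ["DATA_SOURCE", "FILE_FORMAT", "FILE_NAME"]


def render_project_csv(entries):
    """Return project.csv text for (data_source, file_format, file_name) tuples."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(PROJECT_COLUMNS)
    writer.writerows(entries)
    return buffer.getvalue()


if __name__ == "__main__":
    # Print the project file content; redirect it to a file of your choice.
    print(render_project_csv([
        ("CUSTOMERS", "CSV", "customers.csv"),
        ("PROSPECTS", "CSV", "prospects.csv"),
    ]), end="")
```

Compare the result against the project.csv template in python/demo/sample before using it, in case your version expects additional columns.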
The load files should use specific terms, typically as column names, that tell Senzing what the data in your file(s) means. The Generic Entity Specification describes the terms to map your data to so that Senzing can understand it and use it for entity resolution.
Here is an example of a CSV file header and sample record:
RECORD_ID,PRIMARY_NAME_FULL,GENDER,DATE_OF_BIRTH,PASSPORT_NUMBER,PASSPORT_COUNTRY
1001,Mr Robert M Jones Jr,M,1/2/1981,PP11111,US
This pattern can be used to load many types of data. If you have data in native script, please see the Native Script FAQ.
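If your source file uses different column names, one way to prepare it is to rename the columns before loading. The sketch below assumes a hypothetical source layout (customer_id, full_name, and so on); only the target terms on the right-hand side come from the sample header above. Columns that are not listed in the mapping are dropped.

```python
import csv
import io

# Hypothetical source column names (left) mapped to the terms used in the
# sample header above (right). Adjust the left side to match your own data.
COLUMN_MAP = {
    "customer_id": "RECORD_ID",
    "full_name": "PRIMARY_NAME_FULL",
    "sex": "GENDER",
    "birth_date": "DATE_OF_BIRTH",
    "passport_no": "PASSPORT_NUMBER",
    "passport_issuer": "PASSPORT_COUNTRY",
}


def remap_csv(source_text):
    """Rewrite CSV text so that its header uses the mapped terms."""
    reader = csv.DictReader(io.StringIO(source_text))
    out = io.StringIO()
    writer = csv.DictWriter(
        out, fieldnames=list(COLUMN_MAP.values()), lineterminator="\n"
    )
    writer.writeheader()
    for row in reader:
        # Copy each mapped column and trim stray padding around the value.
        writer.writerow(
            {target: row[source].strip() for source, target in COLUMN_MAP.items()}
        )
    return out.getvalue()
```

Consult the Generic Entity Specification for the full list of terms; this sketch covers only the six shown in the sample.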
We recommend using the '-T' flag to perform a test load before loading your own data. See "Can I perform a test load?" below.
How Do I Format CSV Data Files for Loading?
Currently, the supported format is comma-separated values, with double quotes around strings that contain embedded commas. In the following example, note that the ADDR_FULL feature is surrounded by double quotes:
RECORD_ID,NAME_FIRST,NAME_LAST,DATE_OF_BIRTH,ADDR_FULL,PHONE_NUMBER
1,James,Lynch,03/01/82,"96 Alpha,Hicksville,NY,118410,US",845-740-5295
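If you generate load files programmatically, Python's standard csv module applies this quoting rule for you. The sketch below is illustrative only; the helper name to_csv_line is hypothetical. With csv.QUOTE_MINIMAL, a value is quoted only when it contains a comma, a double quote, or a line break.

```python
import csv
import io


def to_csv_line(values):
    """Format one record as a CSV line, quoting only values that need it."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")
    writer.writerow(values)
    return buffer.getvalue()


if __name__ == "__main__":
    record = [
        "1", "James", "Lynch", "03/01/82",
        "96 Alpha,Hicksville,NY,118410,US",  # embedded commas force quoting
        "845-740-5295",
    ]
    print(to_csv_line(record), end="")
```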
How Do I Incrementally Load Data?
Typical use of Senzing is to load a single project into an empty repository, as described in the Quickstart Guide. The '-P' flag to G2Loader.py purges the repository before loading. If you omit the '-P' flag, the new data is loaded incrementally on top of the existing data; no data is deleted or replaced.
This is helpful when multiple projects are to be loaded incrementally and a report is desired after each load to show the entity resolution results and how the entities change.
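As an illustration, the sketch below builds a sequence of G2Loader.py command lines in which only the first load purges the repository and every later load is incremental. It uses single files with the '-f' flag and the file specification format from the earlier example. The helper names and the Prospects.csv file are hypothetical, and the commands are only constructed, not run.

```python
def file_spec(file_name, data_source, file_format="CSV"):
    """Build a '-f' argument in the form used by the single-file example."""
    return f"{file_name}/?data_source={data_source},file_format={file_format}"


def plan_incremental_loads(files, purge_first=True):
    """Return one command per (file_name, data_source) pair.

    Only the first command carries '-P'; the rest add to the existing data.
    """
    commands = []
    for index, (file_name, data_source) in enumerate(files):
        command = ["python3", "G2Loader.py"]
        if purge_first and index == 0:
            command.append("-P")  # purge the repository before the first load
        command += ["-f", file_spec(file_name, data_source)]
        commands.append(command)
    return commands


if __name__ == "__main__":
    plan = plan_incremental_loads([
        ("Customers.csv", "CUSTOMERS"),
        ("Prospects.csv", "PROSPECTS"),
    ])
    for command in plan:
        print(" ".join(command))
```

Set purge_first to False when the repository already holds data that you want to keep.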
Can I perform a test load?
Yes. To test-load data without populating Senzing, use the '-T' flag with G2Loader.py. Use it to check the configuration and data when you have created a new project or have new data to load. Senzing reports any errors, so you can correct them quickly before starting a full load.
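A test load can also be scripted. The sketch below only constructs the command line. It assumes that '-T' is combined with '-f' in the same way as the other flags, which you should confirm against the G2Loader.py help output for your version. The function name build_test_load_command is hypothetical.

```python
def build_test_load_command(file_specification):
    """Return a G2Loader.py command that test-loads one file with '-T'.

    Assumption: '-T' is passed alongside '-f'; verify for your version.
    """
    return ["python3", "G2Loader.py", "-T", "-f", file_specification]


if __name__ == "__main__":
    command = build_test_load_command(
        "Customers.csv/?data_source=CUSTOMERS,file_format=CSV"
    )
    print(" ".join(command))
```

The returned list can be passed to subprocess.run from the directory that contains G2Loader.py.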