Lift data from tabular sources such as CSV, XLSX or database tables
Introduction
This beginner-level tutorial shows how you can build a Knowledge Graph based on input data from a comma-separated values file (.csv), an Excel file (.xlsx) or a database table (JDBC).
The workflow consists of the following steps, which are described in detail below:
- Registration of the target vocabulary
- Uploading of the data file / connecting to a JDBC endpoint
- (Re-)Viewing of your data table
- Creation of a (target) graph
- Creation of the transformation rules
- Evaluation of the results of the transformation rules
- Execution of the transformation to populate the target graph
Sample Material
The following material is used in this tutorial:
- Sample vocabulary which describes the data in the CSV files: products_vocabulary.nt
- Sample CSV file: services.csv

| ServiceID | ServiceName | Products | ProductManager | Price |
| --- | --- | --- | --- | --- |
| Y704-9764759 | Product Analysis | O491-3823912, I965-1821441, Z655-3173353, ... | Lambert.Faust@company.org | 748,40 EUR |
| I241-8776317 | Component Confabulation | Z249-1364492, L557-1467804, C721-7900144, ... | Corinna.Ludwig@company.org | 1082,00 EUR |
| ... | ... | ... | ... | ... |

- Sample Excel file: products.xlsx

| ProductID | ProductName | Height | Width | Depth | Weigth | ProductManager | Price |
| --- | --- | --- | --- | --- | --- | --- | --- |
| I241-8776317 | Strain Compensator | 12 | 68 | 15 | 8 | Baldwin.Dirksen@company.org | 0,50 EUR |
| D215-3449390 | Gauge Crystal | 77 | 58 | 19 | 15 | Wanja.Hoffmann@company.org | 2,00 EUR |
| ... | ... | ... | ... | ... | ... | ... | ... |
Register the vocabulary
The vocabulary contains the classes and properties needed to map the data into the new structure in the Knowledge Graph.
- Press the + button on the lower right of the VOCABS tab in Corporate Memory.
- Define a Name, a Graph URI and a Description of the vocabulary.
In this example we will use:
- Name: Product Vocabulary
- Graph URI: http://ld.company.org/prod-vocab/
- Description: Example vocabulary modeled to describe relations between products and services.
Upload the data file / Connect to the JDBC endpoint
- Open a new browser tab and log in to the Build / Data Integration area of Corporate Memory: http://your.corporate.memory/dataintegration/
- Press the Resources button of your workspace and select the file to be uploaded.
- Press the Browse... button, select the file and press the UPLOAD button. When the upload is done, press CANCEL to close the dialog.
- Press the Add button in the Dataset category of your workspace and select the type CSV (file) / Excel (file).
- Define a Name for the dataset and select the previously registered resource file. All other parameters can keep the default values.
In this example we will use for the CSV file:
- Name: Services_CSV
- File: 156539415020_services.csv
and for the Excel file:
- Name: Products_XLSX
- File: 1565639227448_products.xlsx
To connect to a database via JDBC instead of uploading a file:
- Open a new browser tab and log in to the Build / Data Integration area of Corporate Memory: http://your.corporate.memory/dataintegration/
- Press the Add button in the Dataset category of your workspace and select the JDBC endpoint (remote) type.
- Define a Name for the dataset, specify the JDBC Driver connection URL, the table name, and the user and password to connect to the database.
In this example we will use:
- Name: Services_ServiceDB
- JDBC Driver Connection URL: jdbc:mysql://mysql:3306/ServicesDB
- table: Services
- username: root
- password: ****
The general form of the JDBC connection string is:
jdbc:<vendor>://<hostname>:<portNumber>/<databaseName>
Default JDBC connection strings for popular Relational Database Management Systems:

| Vendor | Default JDBC Connection String | Default Port |
| --- | --- | --- |
| Microsoft SQL Server | jdbc:sqlserver:<hostname>:1433/<databaseName> | 1433 |
| PostgreSQL | jdbc:postgresql:<hostname>:5432/<databaseName> | 5432 |
| MySQL | jdbc:mysql:<hostname>:3306/<databaseName> | 3306 |
| MariaDB | jdbc:mariadb:<hostname>:3306/<databaseName> | 3306 |
| IBM DB2* | jdbc:db2:<hostname>:50000/<databaseName> | 50000 |
| Oracle* | jdbc:oracle:thin:<hostname>:1521/<databaseName> | 1521 |

*IBM DB2 and Oracle JDBC drivers are not part of Corporate Memory by default, but can be added.
Instead of selecting a table you can also specify a custom SQL query in the source query field.
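For example, a query along these lines could be entered there instead of a table name (a sketch only; the exact query depends on your database schema and SQL dialect, and the column names are taken from the sample Services table above):
SELECT ServiceID, ServiceName, ProductManager, Price FROM Services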
(Re-)View your Data Table
To validate that the input data is correct, you can preview the data table in Corporate Memory.
- Press the Open button of the dataset whose data you want to view.
- Select the TABLEVIEW tab.
- In the TABLEVIEW tab, you can view a couple of rows to check that your data is correctly accessible.
Create a Knowledge Graph
- Press the Add button in the Datasets category of your workspace and select the type Knowledge Graph (embedded).
- Define a Name for the Knowledge Graph and provide a graph URI. All other parameters can keep the default values.
In this example we will use:
- Name: Service_Knowledge_Graph
- graph: http://ld.company.org/prod-instances/
Create a Transformation
The transformation defines how an input dataset (e.g. CSV) will be transformed into an output dataset (e.g. Knowledge Graph).
- Press the Add button in the Transform Tasks category of your workspace.
- Define the Name, the Source Dataset, the Output Dataset and the needed Target Vocabularies of your Transformation Task.
In this example we will use:
- Name: Create_Service_Triples
- Select the previously created dataset Services_CSV as the Source Dataset
- Select the previously created dataset Service_Knowledge_Graph as the Output Dataset
- Provide the graph (URI) that holds the registered vocabulary as the Target Vocabularies: http://ld.company.org/prod-vocab/
- Press the Open button of the created transformation.
- Click on the icon in the right main area to expand the menu.
- Press the EDIT button to create a base mapping.
- Define the Target entity type from the vocabulary, the URI pattern and a Label for the mapping.
In this example we will use:
- Target entity type: Service
- URI pattern: http://ld.company.org/prod-inst/{ServiceID}
- where http://ld.company.org/prod-inst/ is a common prefix for the instances in this use case,
- and {ServiceID} is a placeholder that will resolve to the column of that name
- An optional Label: Service
Example RDF triple in our Knowledge Graph based on the mapping definition:
<http://ld.company.org/prod-inst/Y704-9764759> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ld.company.org/prod-vocab/Service>
- Evaluate your mapping by pressing the button in the Examples of target data section of the property to see at most three generated base URIs.
- We have now created the Service entities in the Knowledge Graph. As a next step, we will add the name of each Service entity. Press the circular blue button on the lower right and select Add value mapping.
- Define the Target property, the Data type, the Value path (column name) and a Label for your value mapping.
In this example we will use:
- Target Property: name
- Data type: StringValueType
- Value path: ServiceName
- which corresponds to the column of that name
- An optional Label: service name
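Analogous to the example triple above, this value mapping would produce a triple along these lines (a sketch; the exact property URI depends on how name resolves in the registered vocabulary):
<http://ld.company.org/prod-inst/Y704-9764759> <http://ld.company.org/prod-vocab/name> "Product Analysis"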
Evaluate a Transformation
Visit the EVALUATE tab of your transformation to view a list of generated entities. Click one of the generated entities to see more details.
Execute a Transformation to build a Knowledge Graph
- Go into the mapping and visit the EXECUTE tab.
- Press the execute button and validate the results. In this example, 9 Service triples were created in our Knowledge Graph based on the mapping.
- Finally, you can use the DataManager EXPLORE module to (re-)view the created Knowledge Graph: http://ld.company.org/prod-instances/
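If your installation also provides a SPARQL query interface, a minimal query along these lines (a sketch, reusing the class and graph URIs from this tutorial) would list the created Service instances:
SELECT ?service WHERE { GRAPH <http://ld.company.org/prod-instances/> { ?service a <http://ld.company.org/prod-vocab/Service> } }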