Lift data from tabular data such as CSV, XSLX or database tables
Introduction
This beginner-level tutorial shows how you can build a Knowledge Graph based on input data from a comma-separated value file (.csv), an excel file (.xlsx) or a database table (jdbc).
The complete tutorial is available as a project file. You can import this project
- by using the web interface (Create → Project → Import project file) or
by using the command line interface
cmemc -c my-cmem project import tutorial-csv.project.zip tutorial-csv
BASH
The documentation consists of the following steps, which are described in detail below:
- Registration of the target vocabulary
- Uploading of the data (file) / Connect to JDBC endpoint
(Re-)View your data table
- Creation of a (target) graph
- Creation of the transformation rules
- Evaluation of the results of the transformation rules
- Execution of the transformation to populate the target graph
Sample Material
The following material is used in this tutorial:
- Sample vocabulary which describes the data in the CSV files: products_vocabulary.nt
Sample CSV file: services.csv
ServiceID ServiceName Products ProductManager Price Y704-9764759 Product Analysis O491-3823912, I965-1821441, Z655-3173353, ... Lambert.Faust@company.org 748,40 EUR I241-8776317 Component Confabulation Z249-1364492, L557-1467804, C721-7900144, ... Corinna.Ludwig@company.org 1082,00 EUR
... Sample Excel file: products.xlsx
ProductID ProductName Height Width Depth Weigth ProductManager Price I241-8776317 Strain Compensator 12 68 15 8 Baldwin.Dirksen@company.org 0,50 EUR D215-3449390 Gauge Crystal 77 58 19 15 Wanja.Hoffmann@company.org 2,00 EUR ...
Register the vocabulary
The vocabulary contains the classes and properties needed to map the data into the new structure in the Knowledge Graph.- Press the + button on the lower bottom right of the VOCABULARIES tab in Corporate Memory.
- Define a Name, a Graph URI and a Description of the vocabulary.
In this example we will use:- Name: Product Vocabulary
- Graph URI: http://ld.company.org/prod-vocab/
- Description: Example vocabulary modeled to describe relations between products and services.
Upload the data file / Connect to the JDBC endpoint
- Navigate to DATA INTEGRATION tab
- Create a new project
- Create a new item by pressing the Create button
- In the Create new item dialog, click on CSV
- Fill out a label and upload the services.csv sample file
Leave all other parameters at their default values. Add the dataset by pressing the Create button.
- Now create a second dataset. Choose Excel and upload the products.xlsx sample file.
- Instead of uploading the services.csv sample file into Corporate Memory, you can also load it into a SQL database and access it from Corporate Memory using the JDBC protocol.
- Navigate to DATA INTEGRATION tab and open a project or create a new one.
- Press the Create button and select the JDBC endpoint type.
Define a Label for the dataset, specify the JDBC Driver connection URL, the table name and the user and password to connect to the database.
In this example we will use:- Name: Services_ServiceDB
- JDBC Driver Connection URL: jdbc:mysql://mysql:3306/ServicesDB
- table: Services
- username: root
- password: ****
The general form of the JDBC connection string is:jdbc:<vendor>://<hostname>:<portNumber>/<databaseName>
CODEDefault JDBC connection strings for popular Relational Database Management Systems:
Vendor Default JDBC Connection String Default Port Microsoft SQL Server jdbc:sqlserver:<hostname>:1433/<databaseName> 1433 PostgreSQL jdbc:postgresql:<hostname>:5432/<databaseName> 5432 MySQL jdbc:mysql:<hostname>:3306/<databaseName> 3306 MariaDB jdbc:mariadb:<hostname>:3306/<databaseName> 3306 IBM DB2* jdbc:db2:<hostname>:50000/<databaseName> 50000 Oracle* jdbc:oracle:thin:<hostname>:1521/<databaseName> 1521 *IBM DB2 and Oracle JDBC drivers are not by default part of Corporate Memory, but can be added.
Instead of selecting a table you can also specify a custom SQL query in the source query field.
- Name: Services_ServiceDB
(Re-)View your Data Table
To validate that the input data is correct, you can preview the data table in Corporate Memory.
- On the dataset page, press the Load preview button
- Once the preview is loaded, you can view a couple of rows to check that your data is accessible.
- Optionally, you can press start profiling and explore statistics about the dataset.
Create a Knowledge Graph
- Press the Create button and select the Datasets category. Select the type Knowledge Graph.
- Define a label for the Knowledge Graph and provide a graph uri. All other parameters can keep the default values.
In this example we will use:- Label: Service Knowledge Graph
- Graph: http://ld.company.org/prod-instances/
Create a Transformation
The transformation defines how an input dataset (e.g. CSV) will be transformed into an output dataset (e.g. Knowledge Graph).
- Press the Create button and select Transform from the Transform category.
- Fill out the the details leaving the target vocabularies field at its default value all installed vocabularies, which will enable us to create a transformation to the previously installed products vocabulary.
In this example we will use:- Name: Lift Service Database
- Select the previously created dataset, as the Input Dataset: Services
- Select the previously created dataset as the Output Dataset: Service Knowledge Graph
- Name: Lift Service Database
- Click on Mapping editor in the menu on the top right.
- Click on the
in the right main area to expand the menu.
- Press the EDIT button to create a base mapping.
Define the Target entity type from the vocabulary, the URI pattern and a label for the mapping.
In this example we will use:
Target entity type: Service
URI pattern: http://ld.company.org/prod-inst/{ServiceID}
- where http://ld.company.org/prod-inst/ is a common prefix for the instances in this use case,
- and {ServiceID} is a placeholder that will resolve to the column of that name
- where http://ld.company.org/prod-inst/ is a common prefix for the instances in this use case,
- An optional Label: Service
Example RDF triple in our Knowledge Graph based on the mapping definition:<http://ld.company.org/prod-inst/Y704-9764759> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ld.company.org/prod-vocab/Service>
TEXT
- Evaluate your mapping by pressing on the
button in the Examples of target data property to see at most three generated base URIs.
- We have now created the Service entities in the Knowledge Graph. As a next step, will add the name of the Service entity in . Press the circular blue button on the lower right and select Add value mapping.
- Define the Target property, the Data type, the Value path (column name) and a Label for your value mapping.
In this example we will use:
Target Property: name
- Data type: StringValueType
- Value path: ServiceName
- which corresponds to the column of that name
- An optional Label: service name
Evaluate a Transformation
Visit the EVALUATE tab of your transformation to view a list of generated entities. By clicking one of the generated entities, more details are provided.
Execute a Transformation to build a Knowledge Graph
- Go into the mapping and visit the EXECUTE tab.
- Press the
button and validate the results. In this example, 9x Service triples were created in our Knowledge Graph based on the mapping.
- Finally you can use the DataManager EXPLORE module to (re-)view of the created Knowledge Graph: http://ld.company.org/prod-instances/