Lift data from JSON and XML sources
Introduction
This tutorial shows how you can build a Knowledge Graph based on input data from hierarchical sources like a JavaScript Object Notation file (.json) or an Extensible Markup Language file (.xml).
The complete tutorial is available as a project file (XML) and a project file (JSON). You can import these projects
- by using the web interface (Create → Project → Import project file) or
- by using the command line interface:
cmemc -c my-cmem project import tutorial-xml.project.zip xml-transformation
cmemc -c my-cmem project import tutorial-json.project.zip json-transformation
This tutorial consists of the following steps, which are described in detail below:
- Registration of the target vocabulary
- Uploading of the data (files)
- Creation of a (target) graph
- Creation of the transformation rules
- Evaluation of the results of the transformation rules
- Execution of the transformation to populate the target graph
The following material is used in this tutorial:
Sample vocabulary which describes the data in the JSON and XML files: products_vocabulary.nt
Sample JSON file: services.json
[
  {
    "Price": "748,40 EUR",
    "Products": "O491-3823912, I965-1821441, Z655-3173353, ...",
    "ServiceID": "Y704-9764759",
    "ServiceName": "Product Analysis",
    "ProductManager": {
      "name": "Lambert C. Faust",
      "mail": "Lambert.Faust@company.org"
    }
  },
  {
    "Price": "1082,00 EUR",
    "Products": "Z249-1364492, L557-1467804, C721-7900144, ...",
    "ServiceID": "I241-8776317",
    "ServiceName": "Component Confabulation",
    "ProductManager": {
      "name": "Corinna Ludwig",
      "mail": "Corinna.Ludiwg@company.org"
    }
  },
  ...
]
Sample XML file: orgmap.xml
<orgmap>
  <dept id="73191" name="Engineering">
    <manager>
      <email>Thomas.Mueller@company.org</email>
      <name>Thomas Mueller</name>
      <address>Karl-Liebknecht-Straße 885, 82003 Tettnang</address>
      <phone>+49-8200-38218301</phone>
    </manager>
    <employees>
      <employee>
        <email>Corinna.Ludwig@company.org</email>
        <name>Corinna Ludwig</name>
        <address>Ringstraße 276</address>
        <phone>+49-1743-24836762</phone>
        <productExpert>Memristor, Gauge, Encoder</productExpert>
      </employee>
      <employee>
        <email>Karen.Brant@company.org</email>
        <name>Karen Brant</name>
        <address>Friedrichstraße 664, 30805 Willich</address>
        <phone>(00530) 5040048</phone>
        <productExpert>Inductor</productExpert>
      </employee>
      ...
    </employees>
    <products>
      <product id="Z249-1364492" />
      <product id="O184-6903943" />
      <product id="V404-9975399" />
      <product id="F344-7012314" />
      <product id="N463-8050264" />
      <product id="M605-5951566" />
      <product id="N733-1946687" />
    </products>
    <services>
      <service id="I241-8776317" />
      <service id="D215-3449390" />
    </services>
  </dept>
  <dept id="22183" name="Product Management">
    ...
</orgmap>
Register the vocabulary
The vocabulary contains the classes and properties needed to map the source data into entities in the Knowledge Graph.
- Press the + button on the lower right of the VOCABS tab in Corporate Memory.
- Define a Name, a Graph URI and a Description of the vocabulary.
In this example we will use:
- Name: Product Vocabulary
- Graph URI: http://ld.company.org/prod-vocab/
- Description: Example vocabulary modeled to describe relations between products and services.
Upload the data file
To add the data files, navigate to the DATA INTEGRATION tab and create a new project. Follow the steps below to add the JSON and XML datasets.
- Press the Create button and select JSON
Define a Label for the dataset and upload the services.json example file. All other parameters can keep the default values.
- Press the Create button and select XML
Define a Label for the dataset and upload the orgmap.xml example file. All other parameters can keep the default values.
Create a Knowledge Graph
- Press the Create button and select the Knowledge Graph dataset type.
- Define a Label for the Knowledge Graph and provide a graph URI. All other parameters can keep the default values.
For the JSON dataset, this example uses:
- Name: Service Knowledge Graph
- Graph: http://ld.company.org/prod-instances/
For the XML dataset, this example uses:
- Name: Organization Knowledge Graph
- Graph: http://ld.company.org/organization-data/
Create a Transformation
The transformation defines how an input dataset (e.g. JSON or XML) is transformed into an output dataset (e.g. a Knowledge Graph).
- Press the Create button and select the Transform type.
- Define the Label, the Source Dataset, the Output Dataset and the needed Target Vocabularies of your Transformation Task.
For the JSON dataset, this example uses:
- Name: Create Service Triples
- Source Dataset: Services JSON
- Output Dataset: Service_Knowledge_Graph
For the XML dataset, this example uses:
- Name: Create Organization Triples
- Source Dataset: Orgmap XML
- Output Dataset: Organization_Knowledge_Graph
- Source Type: dept. The Source Type defines the XML element that is iterated when creating resources. In this example we create RDF triples for the department instances.
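To make the role of the Source Type more concrete, the following is a minimal Python sketch of "iterating" over one kind of XML element. It is not how Corporate Memory is implemented; the function name `iterate_source_type` and the trimmed excerpt of orgmap.xml are assumptions made purely for illustration.

```python
import xml.etree.ElementTree as ET
from typing import List

# Trimmed excerpt of orgmap.xml; the elided parts of the sample are left out
# and the second department is shortened to an empty element (assumption).
ORGMAP_EXCERPT = """
<orgmap>
  <dept id="73191" name="Engineering">
    <manager>
      <name>Thomas Mueller</name>
    </manager>
  </dept>
  <dept id="22183" name="Product Management" />
</orgmap>
"""


def iterate_source_type(xml_text: str, source_type: str) -> List[ET.Element]:
    """Return every element whose tag equals the given source type."""
    root = ET.fromstring(xml_text)
    return list(root.iter(source_type))


# With the Source Type "dept", one resource is created per department.
departments = iterate_source_type(ORGMAP_EXCERPT, "dept")
for dept in departments:
    print(dept.get("id"), dept.get("name"))
```

Each element returned here plays the role of one source entity for which the mapping rules of the transformation are evaluated.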
Open the mapping editor from the menu in the top right.
- Expand the menu.
- Press the EDIT button to create a base mapping.
- Define the Target entity type from the vocabulary, the URI pattern and a label for the mapping.
For the JSON dataset, this example uses:
- Target Entity Type, which defines the class that is instantiated when the mapping rule is applied: Service
- URI pattern, which defines the URI that is generated for each individual: http://ld.company.org/prod-inst/service-instances/{ServiceID}
  - http://ld.company.org/prod-inst/ is a common prefix for the instances in this use case,
  - service-instances/ extends the instance prefix with a common prefix for all service instances,
  - {ServiceID} is a placeholder that resolves to the value of the JSON key ServiceID (e.g. "ServiceID": "Y704-9764759").
- An optional Label: Service
Example RDF triple in our Knowledge Graph based on the mapping definition:
<http://ld.company.org/prod-inst/service-instances/Y704-9764759> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ld.company.org/prod-vocab/Service>
For the XML dataset, this example uses:
- Target Entity Type, which defines the class that is instantiated when the mapping rule is applied: Department
- URI pattern, which defines the URI that is generated for each individual: http://ld.company.org/department/{@id}
  - http://ld.company.org/department/ is a common prefix for the department instances in this use case,
  - {@id} is a placeholder that resolves to the id attribute of the XML element dept, which was configured as the Source Type of this transformation (see previous steps).
- An optional Label: Department
Example RDF triple in our Knowledge Graph based on the mapping definition:
<http://ld.company.org/department/73191> <http://www.w3.org/1999/02/22-rdf-syntax-ns#type> <http://ld.company.org/prod-vocab/Department>
- Evaluate your mapping by pressing the button in the Examples of target data property to see at most three generated base URIs.
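The way a URI pattern turns into concrete base URIs can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions: the helper names `resolve_uri_pattern` and `type_triple` are hypothetical, placeholder values are inserted without any URL encoding, and the product itself may behave differently in edge cases.

```python
import re
import xml.etree.ElementTree as ET
from typing import Callable

RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
VOCAB = "http://ld.company.org/prod-vocab/"


def resolve_uri_pattern(pattern: str, lookup: Callable[[str], str]) -> str:
    """Replace each {placeholder} in the pattern with lookup(placeholder).

    Simplification: the looked-up value must be a string and is inserted as-is.
    """
    return re.sub(r"\{([^}]+)\}", lambda m: lookup(m.group(1)), pattern)


def type_triple(subject: str, class_name: str) -> str:
    """Render the rdf:type statement for a resource as an N-Triples line."""
    return f"<{subject}> <{RDF_TYPE}> <{VOCAB}{class_name}> ."


# JSON case: {ServiceID} resolves to the value of the JSON key ServiceID.
service = {"ServiceID": "Y704-9764759", "ServiceName": "Product Analysis"}
service_uri = resolve_uri_pattern(
    "http://ld.company.org/prod-inst/service-instances/{ServiceID}",
    lambda key: service[key],
)

# XML case: {@id} resolves to the id attribute of the current dept element.
dept = ET.fromstring('<dept id="73191" name="Engineering" />')
dept_uri = resolve_uri_pattern(
    "http://ld.company.org/department/{@id}",
    lambda path: dept.get(path.lstrip("@")),
)

print(type_triple(service_uri, "Service"))
print(type_triple(dept_uri, "Department"))
```

The two printed lines have the same shape as the example RDF triples in the mapping step: one rdf:type statement per generated resource.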
- We have now created the Service entities in the Knowledge Graph. Next, we will add the name of our entity. Press the circular blue button on the lower right and select Add value mapping.
- Define the Target property, the Data type, the Value path (column name) and a Label for your value mapping.
For the JSON dataset, this example uses:
- Target Property: has product manager
- Data type: StringValueType
- Value path: ProductManager/name, which corresponds to the following element in the JSON file: [ {"ProductManager": { "name": "Corinna Ludwig"} ... } ...]
- An optional Label: has Product Manager
For the XML dataset, this example uses:
- Target Property: name
- Data type: StringValueType
- Value path: @name, which corresponds to the name attribute of the dept element in the XML file
- An optional Label: department name
- Click the button in the Examples of target data property to preview up to three values that the value mapping will create.
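The value paths used above can be read as small lookups against the source record. The following Python sketch mimics that lookup for the two notations that appear in this tutorial, a slash-separated path for nested JSON objects and `@attribute` for XML attributes. The helper names `json_value` and `xml_value` are hypothetical, and the full path syntax supported by the product is richer than what is shown here.

```python
import json
import xml.etree.ElementTree as ET


def json_value(record: dict, value_path: str):
    """Follow a slash-separated path through nested JSON objects."""
    current = record
    for key in value_path.split("/"):
        current = current[key]
    return current


def xml_value(element: ET.Element, value_path: str):
    """Resolve '@attr' to an attribute, any other path to child element text."""
    if value_path.startswith("@"):
        return element.get(value_path[1:])
    return element.findtext(value_path)


# Trimmed excerpts of the sample files (elided parts left out).
services = json.loads(
    '[{"ServiceID": "I241-8776317",'
    ' "ProductManager": {"name": "Corinna Ludwig"}}]'
)
dept = ET.fromstring(
    '<dept id="73191" name="Engineering">'
    "<manager><name>Thomas Mueller</name></manager>"
    "</dept>"
)

print(json_value(services[0], "ProductManager/name"))  # Corinna Ludwig
print(xml_value(dept, "@name"))                        # Engineering
print(xml_value(dept, "manager/name"))                 # Thomas Mueller
```

The last call is not part of the tutorial's mapping; it only shows that the same slash notation can also address nested XML elements in this sketch.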
Evaluate a Transformation
Visit the EVALUATE tab of your transformation to view a list of generated entities. By clicking one of the generated entities, more details are provided.
Execute a Transformation to build a Knowledge Graph
- Go into the mapping and visit the EXECUTE tab.
- Start the execution and validate the results. In this example, 9 Service entities were created in our Knowledge Graph based on the mapping.
- Finally, you can use the DataManager EXPLORE module to review the created Knowledge Graphs:
- JSON / Service: http://ld.company.org/prod-instances/
- XML / Department: http://ld.company.org/organization-data/