
Build the Knowledge Graph from indicator-of-compromise rules, like Hayabusa and Sigma rules


There are many sources from which to download indicator-of-compromise rules that can detect a possible future incident.

There are rules for host-based intrusion detection systems (HIDS), with Hayabusa/Sigma for example, and for network intrusion detection systems (NIDS), with Suricata/Zeek for example.

Here, we work with the Hayabusa/Sigma rules available on GitHub:

The interoperability problem here is the YAML format of the files and their arbitrary positions within the folders of their GitHub projects. Moreover, the same rule can exist in different projects, but we will not fix this problem in this tutorial: we consider the IRI of a rule to be its Web address. In Corporate Memory, we would fix that with the Linking Tool, which we will study in a later part of this tutorial.

To build this knowledge graph of rules, we need to:

  1. Create a JSON dataset with all the rules
  2. Build a transformer from a JSON rule to RDF
  3. Build a workflow to insert each rule into a single knowledge graph
  4. Use this workflow via CMEMC to import all the rules automatically

Import the JSON datasets

The YAML syntax is used to define each rule, and there is one file per rule.

Corporate Memory doesn’t support YAML (for the moment), but you can convert the files to JSON with the Bash script below; you need git and yq installed.

Moreover, we use yq to add a rulePath field to each file, containing its path within its repository. This makes it possible to rebuild its location on the Web, so the analyst can click directly on the link to read the details and maybe modify the rule.

At the end of this Bash script, you will have a tree of JSON files, and we will apply a workflow to each file.


Don’t forget to replace the final folder before using this Bash script.

#!/bin/bash -x


mkdir -p "${output_dir_rules}"
cd "${output_dir_rules}"
# clone the Sigma and Hayabusa rules repositories (URLs omitted here)
git clone --depth 1
git clone --depth 1

for file in $(find . -name '*.yml'); do
    [ -f "$file" ] || break
    yq ".rulePath = \"${file}\"" -o=json "$file" > "${file}.json"
done


We can test this script:

cd ~/git/tutorial-how-to-link-ids-to-osint/docs/build/tutorial-how-to-link-ids-to-osint/lift-data-from-YAML-data-of-hayabusa-sigma
chmod +x
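Independently of yq, the traversal that the script performs can be sketched with a scratch directory and plain coreutils. This is a minimal sketch: the fixture path below is a hypothetical example, not a real rule file.

```shell
#!/bin/bash
# Sketch of the find loop: list each .yml file and the .json path
# that the yq step would create (fixture path is a hypothetical example).
tmpdir=$(mktemp -d)
mkdir -p "${tmpdir}/sigma/rules/windows"
touch "${tmpdir}/sigma/rules/windows/example_rule.yml"

cd "${tmpdir}"
for file in $(find . -name '*.yml'); do
    [ -f "$file" ] || break
    target="${file}.json"
    echo "${file} -> ${target}"
done
# prints: ./sigma/rules/windows/example_rule.yml -> ./sigma/rules/windows/example_rule.yml.json
```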

For example, the file proc_creation_win_bcdedit_boot_conf_tamper.yml will become this JSON file:

{
  "title": "Boot Configuration Tampering Via Bcdedit.EXE",
  "id": "1444443e-6757-43e4-9ea4-c8fc705f79a2",
  "status": "stable",
  "description": "Detects the use of the bcdedit command to tamper with the boot configuration data. This technique is often times used by malware or attackers as a destructive way before launching ransomware.",
  "references": [ ... ],
  "tags": [ ... ],
  "level": "high",
  "rulePath": "./sigma/rules/windows/process_creation/proc_creation_win_bcdedit_boot_conf_tamper.yml"
}

Create the knowledge graph

The collected rules come from the Sigma and Hayabusa repositories. The Hayabusa team states that they “are trying to make these rules as close to Sigma rules as possible”. In our use case, we need the properties defined by Sigma that also exist in the Hayabusa rules. The day an official RDF vocabulary for defining a rule exists, we will use it. Until then, our minimal vocabulary is “defined” here: We use this address as the prefix of the RDF vocabulary for our use case.

The filename of a given rule does not change between repositories. So we build the IRI of each rule from its filename and an arbitrary base IRI, like “”. However, we want to make it possible to open the original YAML rule directly from Splunk, so we add the property rdfs:isDefinedBy to associate the rule’s Web URLs with the rule. We will not use the GUID id or the Web address of the rule in its IRI, because rules are often duplicated between the repositories, and the filename and the title, not the GUID id, seem to be the IDs of rules used in Splunk.

This new transformer builds the following RDF model for our use case:

  1. Create a new project to build the knowledge graph of “Rules Hayabusa Sigma”

  2. In this project, create an RDF dataset “Rules Hayabusa Sigma” in Corporate Memory for all rules with the named graph:

  3. Create a JSON dataset “Rule example (JSON)” in Corporate Memory with one example of rule:

  4. Create the prefix of your vocabulary:

    prefix ctis: <>

  5. Create the transformer for “SIGMA Hayabusa rule” to build this RDF model.

Rule object:

  • type: ctis:Rule

  • IRI: concatenation of “” with the result of this regular expression ^.*?([^\/]*)$ on the rule path

  • property ctis:filename with the result of this regular expression ^.*?([^\/]*)$ on the value path rulePath
  • property rdfs:label with the value path title
  • property rdfs:comment with the value path description
  • property rdfs:seeAlso with the value path references
  • property ctis:mitreAttackTechniqueId is built with this formula on the value path tags
    • Filter by regex: ^attack\.t\d+$
    • Regex replace attack\.t by T

  • property rdfs:isDefinedBy is built with this formula on the value path rulePath to link the rules to their Web addresses.
    • Add two “Regex replace”
      • replace \./hayabusa-rules/ by
      • replace \./sigma/ by

So the rule path ./sigma/rules/windows/process_creation/proc_creation_win_bcdedit_boot_conf_tamper.yml becomes the link and ./hayabusa-rules/hayabusa/sysmon/Sysmon_15_Info_ADS-Created.yml becomes
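These three formulas (filename extraction, technique-id filtering, and path-to-URL rewriting) can be sketched with POSIX grep and sed. Note that `grep -E` uses `[0-9]` rather than `\d`; the GitHub URL prefix and the tag values below are assumed examples, since the actual replacement targets are not shown above.

```shell
#!/bin/bash
# Hypothetical re-implementation of the transformer formulas with grep/sed.

# 1) Filename: keep everything after the last "/" (regex ^.*?([^\/]*)$)
rulepath="./sigma/rules/windows/process_creation/proc_creation_win_bcdedit_boot_conf_tamper.yml"
filename=$(echo "$rulepath" | sed 's|^.*/||')

# 2) MITRE ATT&CK technique id: filter the tags by ^attack\.t[0-9]+$,
#    then replace "attack.t" with "T" (tag values are assumed examples)
tags="attack.defense_evasion
attack.t1490"
technique=$(echo "$tags" | grep -E '^attack\.t[0-9]+$' | sed 's/^attack\.t/T/')

# 3) Web address: replace the local "./sigma/" prefix with a repository URL
#    (the URL below is an assumed example, not taken from the tutorial)
url=$(echo "$rulepath" | sed 's|^\./sigma/|https://github.com/SigmaHQ/sigma/blob/master/|')

echo "$filename"   # proc_creation_win_bcdedit_boot_conf_tamper.yml
echo "$technique"  # T1490
```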


To test your transformer, you can use the “Transform execution” tab. Here, the knowledge graph will not be cleared after each workflow or transform execution, because the option “clear graph before workflow” is disabled. However, while building this transformer, you can temporarily enable this option to see and test the final transformer. You only need to disable this option again when your transformer is finished.


Your example rule now exists in your knowledge graph:
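As a hypothetical sketch in Turtle, the triples produced for the example rule could look like the following; the base IRIs, the GitHub URL, and the technique id are assumptions, since the actual addresses are elided above.

```turtle
# Hypothetical sketch: base IRIs, URL, and technique id are assumed examples.
@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .
@prefix ctis: <https://example.org/vocab/ctis#> .

<https://example.org/rules/proc_creation_win_bcdedit_boot_conf_tamper.yml>
    a ctis:Rule ;
    ctis:filename "proc_creation_win_bcdedit_boot_conf_tamper.yml" ;
    rdfs:label "Boot Configuration Tampering Via Bcdedit.EXE" ;
    ctis:mitreAttackTechniqueId "T1490" ;
    rdfs:isDefinedBy <https://github.com/SigmaHQ/sigma/blob/master/rules/windows/process_creation/proc_creation_win_bcdedit_boot_conf_tamper.yml> .
```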

  1. Make the workflow “Import rules” with one input

And don’t forget to allow the replacement of the JSON dataset, because this makes it possible to replace this specific JSON file with every other rule during the execution of this workflow.

  1. Copy the workflow ID


In this example, the workflow ID is RulesHayabusaSigma_671e1f43d94bbc36:Importrules_6ccbc14b656c75c9

Apply the workflow to all files

We modify the first Bash script: we add a line to clear the knowledge graph before importing all the rules via our workflow, in which we substitute each rule file for the input JSON dataset.


Don’t forget to replace the workflow ID and the final folder before using this Bash script.


The CMEMC config file needs to be configured correctly before executing this Bash script, as in the previous tutorial.

For example:

You need to replace “johndo” with something else, “” with your login (the email used for the sandbox), and XXXXXXXXX with your password. Then save the file (in vi: :wq).
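As a sketch, a cmemc config.ini entry for the sandbox could look like the following; the section name and base URI are assumed examples in the spirit of the placeholders above, not values from this tutorial.

```ini
; ~/.config/cmemc/config.ini (section name and host are assumed examples)
[johndo.eccenca.my]
CMEM_BASE_URI=https://johndo.eccenca.my/
OAUTH_GRANT_TYPE=password
OAUTH_CLIENT_ID=cmemc
OAUTH_USER=johndo@example.com
OAUTH_PASSWORD=XXXXXXXXX
```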

Don’t forget to specify the default config for CMEMC to use.

#!/bin/bash -x


mkdir -p "${output_dir_rules}"
cd "${output_dir_rules}"
# clone the Sigma and Hayabusa rules repositories (URLs omitted here)
git clone --depth 1
git clone --depth 1

for file in $(find . -name '*.yml'); do
    [ -f "$file" ] || break
    yq ".rulePath = \"${file}\"" -o=json "$file" > "${file}.json"
done

cmemc graph delete

for file in $(find . -name '*.json'); do
    [ -f "$file" ] || break
    cmemc workflow io RulesHayabusaSigma_671e1f43d94bbc36:Importrules_6ccbc14b656c75c9 -i "${file}"
done

We can test this script:



Here, we learnt how to generate a knowledge graph from input files: Corporate Memory to prepare the workflow, and CMEMC to execute this workflow on all the files.


Tutorial: how to link Intrusion Detection Systems (IDS) to Open-Source INTelligence (OSINT)

Next chapter: Link IDS event to a knowledge graph in dashboards via queries

Previous chapter: Build a Knowledge Graph of MITRE ATT&CK® datasets