In-memory dataset¤

1. Purpose¤

The in-memory dataset is a small embedded RDF store that keeps all data in memory and exposes it via SPARQL. It is intended as a temporary working graph inside workflows, not as a large or persistent storage.

Typical use cases:

  • Collecting intermediate results during a workflow run.
  • Storing small lookup graphs used by downstream operators.
  • Testing or prototyping workflows without configuring an external RDF store.

2. Behaviour and lifecycle¤

  • The dataset maintains a single in-memory RDF model.
  • All read and write operations go through a SPARQL endpoint over this model.
  • Data exists only in memory:
    • It is not persisted to disk by this dataset.
    • After an application restart, the dataset contents are empty again.

Within a workflow:

  • The dataset can be used as both input and output:
    • Upstream operators can write triples/entities/links into it.
    • Downstream operators can read from it via SPARQL-based mechanisms.
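The lifecycle above can be sketched with a plain Python set standing in for the RDF model (a conceptual illustration only; the actual dataset is an embedded RDF store, and the names below are not its real API):

```python
# A plain set of triples stands in for the in-memory RDF model.

def new_in_memory_model() -> set:
    """Created empty on application start; never persisted to disk."""
    return set()

model = new_in_memory_model()

# An upstream operator writes triples during the workflow run...
model.add(("ex:a", "ex:relatedTo", "ex:b"))

# ...and a downstream operator reads them back in the same process.
assert ("ex:a", "ex:relatedTo", "ex:b") in model

# After an application restart, the model starts empty again.
model = new_in_memory_model()
assert len(model) == 0
```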

3. Reading data¤

  • When used as a source, the dataset exposes its data as a SPARQL endpoint.
  • Queries and retrievals behave as they would against a normal SPARQL dataset:
    • Entity retrieval, path/type discovery, sampling, etc. are executed via SPARQL.
  • There is no file backing this dataset; everything comes from what has been written into the in-memory model during the lifetime of the process.

4. Writing data¤

The in-memory dataset accepts RDF data through:

  • Entity sink

    • Entities written by upstream components are converted to RDF triples and stored in the in-memory model.
  • Link sink

    • Links are written as RDF triples in the same model.
  • Triple sink

    • Triples are directly added to the in-memory model via SPARQL operations.

All three sinks ultimately write into the same in-memory graph; there is no separate physical storage per sink type.

5. Configuration¤

Clear graph before workflow execution¤

  • Parameter: Clear graph before workflow execution (boolean)
  • Default: false

Behaviour:

  • If true:

    • Before the dataset is used in a workflow execution, the graph is cleared (for writes via this dataset).
    • The workflow sees a fresh, empty in-memory graph at the start of the run.
  • If false:

    • Existing data in the in-memory graph is preserved when the workflow starts.
    • New data is added on top of whatever is already stored in the model.

This parameter controls whether the dataset behaves as a fresh scratch graph per workflow run or as a longer-lived in-memory graph within the lifetime of the running application.
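The effect of the parameter can be sketched like this, with a set of triples standing in for the in-memory graph (function and variable names here are illustrative, not the real configuration API):

```python
def start_workflow(graph: set, clear_graph_before_execution: bool) -> set:
    """Prepare the in-memory graph before a workflow run."""
    if clear_graph_before_execution:
        graph.clear()  # fresh scratch graph for this run
    return graph

# clear = True: the run starts from an empty graph
g = {("ex:s", "ex:p", "ex:old")}
start_workflow(g, clear_graph_before_execution=True)
g.add(("ex:s", "ex:p", "ex:new"))
print(len(g))  # 1 -- only data written by this run

# clear = False: new data is added on top of existing triples
g2 = {("ex:s", "ex:p", "ex:old")}
start_workflow(g2, clear_graph_before_execution=False)
g2.add(("ex:s", "ex:p", "ex:new"))
print(len(g2))  # 2 -- old and new data coexist
```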

6. Limitations and recommendations¤

  • Memory-bound

    • All data is kept in memory; large graphs will increase memory usage and may impact performance.
    • For large or production RDF graphs, use an external RDF store and a SPARQL dataset instead.
  • No persistence

    • Contents are lost when the application/server is restarted.
    • Do not treat this dataset as long-term storage.
  • Scope

    • Best suited for:
      • small to medium intermediate results,
      • testing and prototyping,
      • temporary data that can be regenerated by re-running workflows.

7. Example usage scenarios¤

  • Use as a temporary integration graph:

    • Multiple sources write into the in-memory dataset.
    • A downstream SPARQL-based operator queries the combined graph.
  • Use as a scratch area for experimentation:

    • Quickly test mapping or linking logic by writing output into the in-memory dataset.
    • Inspect the result via SPARQL without configuring an external endpoint.
  • Use as a small lookup store:

    • Preload a small set of reference triples (e.g. codes or mappings).
    • Let workflows query these during execution.

Parameter¤

None

Advanced Parameter¤

Clear graph before workflow execution (deprecated)¤

This parameter is deprecated; use the ‘Clear dataset’ operator instead to clear a dataset in a workflow. If set to true, this dataset is cleared before it is used in a workflow execution.

  • ID: clearGraphBeforeExecution
  • Datatype: boolean
  • Default Value: false
