In-memory dataset
1. Purpose
The in-memory dataset is a small embedded RDF store that keeps all data in memory and exposes it via SPARQL. It is intended as a temporary working graph inside workflows, not as a large or persistent storage.
Typical use cases:
- Collecting intermediate results during a workflow run.
- Storing small lookup graphs used by downstream operators.
- Testing or prototyping workflows without configuring an external RDF store.
2. Behaviour and lifecycle
- The dataset maintains a single in-memory RDF model.
- All read and write operations go through a SPARQL endpoint over this model.
- Data exists only in memory:
- It is not persisted to disk by this dataset.
- After an application restart, the dataset contents are empty again.
Within a workflow:
- The dataset can be used as both input and output:
- Upstream operators can write triples/entities/links into it.
- Downstream operators can read from it via SPARQL-based mechanisms.
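The lifecycle above can be illustrated with a minimal conceptual sketch. This is not the product's actual API; it models the in-memory dataset as a plain Python set of (subject, predicate, object) tuples, with hypothetical `add`/`subjects_with` methods standing in for the sink and source sides:

```python
# Conceptual sketch only: the in-memory dataset modeled as a set of
# (subject, predicate, object) triples. All names are illustrative.

class InMemoryDataset:
    """A volatile triple store: data lives only for the process lifetime."""

    def __init__(self):
        self.triples = set()  # the single in-memory RDF model

    # -- write side (used by upstream operators) --
    def add(self, s, p, o):
        self.triples.add((s, p, o))

    # -- read side (used by downstream operators) --
    def subjects_with(self, p, o):
        return {s for (s, pp, oo) in self.triples if pp == p and oo == o}

# An upstream operator writes intermediate results...
ds = InMemoryDataset()
ds.add("ex:alice", "rdf:type", "ex:Person")
ds.add("ex:bob", "rdf:type", "ex:Person")

# ...and a downstream operator reads them back in the same run.
print(sorted(ds.subjects_with("rdf:type", "ex:Person")))
# A process restart would discard ds.triples entirely.
```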
3. Reading data
- When used as a source, the dataset exposes its data as a SPARQL endpoint.
- Queries and retrievals behave as they would against a regular SPARQL dataset:
- Entity retrieval, path/type discovery, sampling, etc. are executed via SPARQL.
- There is no file backing this dataset; everything comes from what has been written into the in-memory model during the lifetime of the process.
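For illustration, a downstream operator could retrieve entities with a query like the following; the `ex:` namespace, class, and property are hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# Retrieve entities of a (hypothetical) type together with their labels
SELECT ?entity ?label
WHERE {
  ?entity a ex:Person ;
          ex:label ?label .
}
LIMIT 100
```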
4. Writing data
The in-memory dataset accepts RDF data through:
- Entity sink: entities written by upstream components are converted to RDF triples and stored in the in-memory model.
- Link sink: links are written as RDF triples into the same model.
- Triple sink: triples are added directly to the in-memory model via SPARQL operations.
All three sinks ultimately write into the same in-memory graph; there is no separate physical storage per sink type.
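Conceptually, a write through any of these sinks corresponds to a SPARQL update against the in-memory graph, along the lines of the following sketch (the URIs are hypothetical):

```sparql
PREFIX ex: <http://example.org/>

INSERT DATA {
  ex:alice a ex:Person ;
           ex:label "Alice" .
}
```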
5. Configuration
Clear graph before workflow execution
- Parameter: Clear graph before workflow execution (boolean)
- Default: true
Behaviour:
- If true:
  - Before the dataset is used in a workflow execution, the graph is cleared (for writes via this dataset).
  - The workflow sees a fresh, empty in-memory graph at the start of the run.
- If false:
  - Existing data in the in-memory graph is preserved when the workflow starts.
  - New data is added on top of whatever is already stored in the model.
This parameter controls whether the dataset behaves as a fresh scratch graph per workflow run or as a longer-lived in-memory graph within the lifetime of the running application.
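The two modes can be sketched as follows. This is a conceptual model using a plain set of triples; `run_workflow` and its parameters are illustrative names, not the real API:

```python
# Conceptual sketch of the "Clear graph before workflow execution" flag.

def run_workflow(graph, new_triples, clear_before_execution):
    """Simulate one workflow run writing into the in-memory graph."""
    if clear_before_execution:
        graph.clear()          # fresh scratch graph per run
    graph.update(new_triples)  # new data is added to the model
    return graph

g = set()
run_workflow(g, {("ex:a", "ex:p", "1")}, clear_before_execution=True)
run_workflow(g, {("ex:b", "ex:p", "2")}, clear_before_execution=True)
print(len(g))  # 1: each run starts from an empty graph

run_workflow(g, {("ex:c", "ex:p", "3")}, clear_before_execution=False)
print(len(g))  # 2: the previous run's contents were preserved
```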
6. Limitations and recommendations
- Memory-bound:
  - All data is kept in memory; large graphs will increase memory usage and may impact performance.
  - For large or production RDF graphs, use an external RDF store and a SPARQL dataset instead.
- No persistence:
  - Contents are lost when the application/server is restarted.
  - Do not treat this dataset as long-term storage.
- Scope. Best suited for:
  - small to medium intermediate results,
  - testing and prototyping,
  - temporary data that can be regenerated by re-running workflows.
7. Example usage scenarios
- Use as a temporary integration graph:
  - Multiple sources write into the in-memory dataset.
  - A downstream SPARQL-based operator queries the combined graph.
- Use as a scratch area for experimentation:
  - Quickly test mapping or linking logic by writing output into the in-memory dataset.
  - Inspect the result via SPARQL without configuring an external endpoint.
- Use as a small lookup store:
  - Preload a small set of reference triples (e.g. codes or mappings).
  - Let workflows query these during execution.
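The lookup-store scenario might look like the following pair of SPARQL operations; the country-code vocabulary is hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# 1. Preload reference triples once (hypothetical country codes):
INSERT DATA {
  ex:DE ex:label "Germany" .
  ex:FR ex:label "France" .
}

# 2. During workflow execution, resolve a code to its label:
SELECT ?label WHERE { ex:DE ex:label ?label }
```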
Parameter
None
Advanced Parameter
Clear graph before workflow execution (deprecated)
This parameter is deprecated; use the ‘Clear dataset’ operator instead to clear a dataset in a workflow. If set to true, this dataset is cleared before it is used in a workflow execution.
- ID: clearGraphBeforeExecution
- Datatype: boolean
- Default Value: false