In-memory dataset
1. Purpose
The in-memory dataset is a small embedded RDF store that keeps all data in memory and exposes it via SPARQL. It is intended as a temporary working graph inside workflows, not as a large or persistent storage.
Typical use cases:
- Collecting intermediate results during a workflow run.
- Storing small lookup graphs used by downstream operators.
- Testing or prototyping workflows without configuring an external RDF store.
2. Behaviour and lifecycle
- The dataset maintains a single in-memory RDF model.
- All read and write operations go through a SPARQL endpoint over this model.
- Data exists only in memory:
- It is not persisted to disk by this dataset.
- After an application restart, the dataset contents are empty again.
Within a workflow:
- The dataset can be used as both input and output:
- Upstream operators can write triples/entities/links into it.
- Downstream operators can read from it via SPARQL-based mechanisms.
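The lifecycle above can be illustrated with a minimal conceptual sketch. This is not the product's actual API; it models the in-memory dataset as a plain Python set of (subject, predicate, object) tuples, with hypothetical `add`/`subjects_with` methods standing in for the sink and source sides:

```python
# Conceptual sketch only: the in-memory dataset modeled as a set of
# (subject, predicate, object) triples. All names are illustrative.

class InMemoryDataset:
    """A volatile triple store: data lives only for the process lifetime."""

    def __init__(self):
        self.triples = set()  # the single in-memory RDF model

    # -- write side (used by upstream operators) --
    def add(self, s, p, o):
        self.triples.add((s, p, o))

    # -- read side (used by downstream operators) --
    def subjects_with(self, p, o):
        return {s for (s, pp, oo) in self.triples if pp == p and oo == o}

# An upstream operator writes intermediate results...
ds = InMemoryDataset()
ds.add("ex:alice", "rdf:type", "ex:Person")
ds.add("ex:bob", "rdf:type", "ex:Person")

# ...and a downstream operator reads them back in the same run.
print(sorted(ds.subjects_with("rdf:type", "ex:Person")))
# A process restart would discard ds.triples entirely.
```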
3. Reading data
- When used as a source, the dataset exposes its data as a SPARQL endpoint.
- Queries and retrievals behave as they would against a regular SPARQL dataset:
- Entity retrieval, path/type discovery, sampling, etc. are executed via SPARQL.
- There is no file backing this dataset; everything comes from what has been written into the in-memory model during the lifetime of the process.
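For illustration, a downstream operator could retrieve entities with a query like the following; the `ex:` namespace, class, and property are hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# Retrieve entities of a (hypothetical) type together with their labels
SELECT ?entity ?label
WHERE {
  ?entity a ex:Person ;
          ex:label ?label .
}
LIMIT 100
```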
4. Writing data
The in-memory dataset accepts RDF data through:
- Entity sink: entities written by upstream components are converted to RDF triples and stored in the in-memory model.
- Link sink: links are written as RDF triples into the same model.
- Triple sink: triples are added directly to the in-memory model via SPARQL operations.
All three sinks ultimately write into the same in-memory graph; there is no separate physical storage per sink type.
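Conceptually, a write through any of these sinks corresponds to a SPARQL update against the in-memory graph, along the lines of the following sketch (the URIs are hypothetical):

```sparql
PREFIX ex: <http://example.org/>

INSERT DATA {
  ex:alice a ex:Person ;
           ex:label "Alice" .
}
```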
5. Configuration
Clear graph before workflow execution
- Parameter: Clear graph before workflow execution (boolean)
- Default: true
Behaviour:
- If true:
  - Before the dataset is used in a workflow execution, the graph is cleared (for writes via this dataset).
  - The workflow sees a fresh, empty in-memory graph at the start of the run.
- If false:
  - Existing data in the in-memory graph is preserved when the workflow starts.
  - New data is added on top of whatever is already stored in the model.
This parameter controls whether the dataset behaves as a fresh scratch graph per workflow run or as a longer-lived in-memory graph within the lifetime of the running application.
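The two modes can be sketched as follows. This is a conceptual model using a plain set of triples; `run_workflow` and its parameters are illustrative names, not the real API:

```python
# Conceptual sketch of the "Clear graph before workflow execution" flag.

def run_workflow(graph, new_triples, clear_before_execution):
    """Simulate one workflow run writing into the in-memory graph."""
    if clear_before_execution:
        graph.clear()          # fresh scratch graph per run
    graph.update(new_triples)  # new data is added to the model
    return graph

g = set()
run_workflow(g, {("ex:a", "ex:p", "1")}, clear_before_execution=True)
run_workflow(g, {("ex:b", "ex:p", "2")}, clear_before_execution=True)
print(len(g))  # 1: each run starts from an empty graph

run_workflow(g, {("ex:c", "ex:p", "3")}, clear_before_execution=False)
print(len(g))  # 2: the previous run's contents were preserved
```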
6. Limitations and recommendations
- Memory-bound:
  - All data is kept in memory; large graphs will increase memory usage and may impact performance.
  - For large or production RDF graphs, use an external RDF store and a SPARQL dataset instead.
- No persistence:
  - Contents are lost when the application/server is restarted.
  - Do not treat this dataset as long-term storage.
- Scope. Best suited for:
  - small to medium intermediate results,
  - testing and prototyping,
  - temporary data that can be regenerated by re-running workflows.
7. Example usage scenarios
- Use as a temporary integration graph:
  - Multiple sources write into the in-memory dataset.
  - A downstream SPARQL-based operator queries the combined graph.
- Use as a scratch area for experimentation:
  - Quickly test mapping or linking logic by writing output into the in-memory dataset.
  - Inspect the result via SPARQL without configuring an external endpoint.
- Use as a small lookup store:
  - Preload a small set of reference triples (e.g. codes or mappings).
  - Let workflows query these during execution.
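The lookup-store scenario might look like the following pair of SPARQL operations; the country-code vocabulary is hypothetical:

```sparql
PREFIX ex: <http://example.org/>

# 1. Preload reference triples once (hypothetical country codes):
INSERT DATA {
  ex:DE ex:label "Germany" .
  ex:FR ex:label "France" .
}

# 2. During workflow execution, resolve a code to its label:
SELECT ?label WHERE { ex:DE ex:label ?label }
```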
Parameter
None
Advanced Parameter
Clear graph before workflow execution (deprecated)
This parameter is deprecated; use the ‘Clear dataset’ operator instead to clear a dataset in a workflow. If set to true, this dataset is cleared before it is used in a workflow execution.
- ID: clearGraphBeforeExecution
- Datatype: boolean
- Default Value: false