Activity Reference¤
Project Activities¤
The following activities are available for each project.
Dataset matcher¤
Generates matches between schema paths and datasets based on the schema discovery and profiling information of the datasets.
| Parameter | Type | Description | Example |
|---|---|---|---|
| datasetUri | String | If set, run dataset matching only for this particular dataset. | |
The identifier for this plugin is DatasetMatcher.
It can be found in the package com.eccenca.di.datamatching.
Task Activities¤
The following activities are available for different types of tasks.
Custom¤
Execute REST Task¤
Executes the REST task.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteRestTask.
It can be found in the package com.eccenca.di.workflow.operators.rest.
Dataset¤
Dataset profiler¤
Generates profiling data of a dataset, e.g. data types, statistics etc.
| Parameter | Type | Description | Example |
|---|---|---|---|
| datasetUri | String | Optional URI of the dataset resource that should be profiled. If not specified, a URI will be generated. | |
| uriPrefix | String | Optional URI prefix that is prepended to every generated URI, e.g. property URIs for every schema path. If not specified, a URI prefix will be generated. | |
| entitySampleLimit | String | How many entities should be sampled for the profiling. If left blank, all entities will be considered. | |
| timeLimit | String | The maximum time in milliseconds that the schema extraction step and the profiling step may each take. Leave blank for unlimited time. | |
| classProfilingLimit | int | The maximum number of classes that are profiled from the extracted schema. | |
| schemaEntityLimit | int | The maximum number of overall schema entities (types, properties/attributes) that will be extracted. | |
| executionType | String | The execution type to be used: SPARK or LEGACY. The legacy execution uses large in-memory maps and takes longer. | |
The identifier for this plugin is DatasetProfiler.
It can be found in the package com.eccenca.di.profiling.
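The parameter table above can be read as a small configuration object. The following is a minimal sketch, assuming the parameters are collected into a plain dictionary before being handed to the activity; the helper `profiler_params` and its argument names are hypothetical, and how the dictionary is actually submitted is not covered here. Only the parameter keys and the two execution types come from the table.

```python
from typing import Dict, Optional, Union

# Execution types named in the parameter table above.
EXECUTION_TYPES = ("SPARK", "LEGACY")


def profiler_params(
    dataset_uri: Optional[str] = None,
    uri_prefix: Optional[str] = None,
    entity_sample_limit: Optional[int] = None,
    time_limit_ms: Optional[int] = None,
    class_profiling_limit: Optional[int] = None,
    schema_entity_limit: Optional[int] = None,
    execution_type: Optional[str] = None,
) -> Dict[str, Union[str, int]]:
    """Hypothetical helper: collect DatasetProfiler parameters into a dict.

    Arguments left as None are omitted, mirroring the "leave blank" wording
    of the table. Submitting the dict is outside the scope of this sketch.
    """
    if execution_type is not None and execution_type not in EXECUTION_TYPES:
        raise ValueError(f"executionType must be one of {EXECUTION_TYPES}")

    candidates = {
        "datasetUri": dataset_uri,
        "uriPrefix": uri_prefix,
        # The table lists these two parameters as String, so numbers are converted.
        "entitySampleLimit": None if entity_sample_limit is None else str(entity_sample_limit),
        "timeLimit": None if time_limit_ms is None else str(time_limit_ms),
        "classProfilingLimit": class_profiling_limit,
        "schemaEntityLimit": schema_entity_limit,
        "executionType": execution_type,
    }
    # Keep only the parameters that were explicitly set.
    return {name: value for name, value in candidates.items() if value is not None}


params = profiler_params(entity_sample_limit=1000, time_limit_ms=60000, execution_type="SPARK")
print(params)
```

Omitting unset values instead of sending empty strings keeps the "blank means unlimited" semantics explicit on the caller's side.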
SQL endpoint status¤
Shows the SQL endpoint status.
This plugin does not require any parameters.
The identifier for this plugin is SqlEndpointStatus.
It can be found in the package com.eccenca.di.sql.endpoint.activity.
Types cache¤
Holds the most frequent types in a dataset.
This plugin does not require any parameters.
The identifier for this plugin is TypesCache.
It can be found in the package org.silkframework.workspace.activity.dataset.
LinkSpecification¤
Active learning¤
Executes an active learning iteration.
| Parameter | Type | Description | Example |
|---|---|---|---|
| fixedRandomSeed | boolean | No description | |
The identifier for this plugin is ActiveLearning.
It can be found in the package org.silkframework.learning.active.
Evaluate linking¤
Evaluates the linking task by generating links.
| Parameter | Type | Description | Example |
|---|---|---|---|
| includeReferenceLinks | boolean | Do not generate a link for which there is a negative reference link while always generating positive reference links. | |
| useFileCache | boolean | Use a file cache. This avoids memory overflows for big files. | |
| partitionSize | int | The number of entities in a single partition in the cache. | |
| generateLinksWithEntities | boolean | Generate detailed information about the matched entities. If set to false, the generated links won’t be shown in the Workbench. | |
| writeOutputs | boolean | Write the generated links to the configured output of this task. | |
| linkLimit | int | If defined, the execution will stop after the configured number of links is reached. This is just a hint and the execution may produce slightly fewer or more links. | |
| timeout | int | Timeout in seconds after which the matching task of an evaluation is aborted. Set to 0 or a negative value to disable the timeout. | |
The identifier for this plugin is EvaluateLinking.
It can be found in the package org.silkframework.workspace.activity.linking.
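The `timeout` and `linkLimit` rules in the table above can be sketched as two small predicates. This is only an illustration of how the wording can be read, not the plugin's implementation; the function names are hypothetical, and because `linkLimit` is documented as a hint, a real run may stop slightly before or after the configured number.

```python
from typing import Optional


def effective_timeout(timeout_seconds: int) -> Optional[int]:
    """Return the timeout in seconds, or None when it is disabled (0 or negative)."""
    return timeout_seconds if timeout_seconds > 0 else None


def limit_reached(link_count: int, link_limit: Optional[int]) -> bool:
    """Return True once the configured number of links is reached.

    A link_limit of None stands for "not defined", i.e. no limit at all.
    """
    return link_limit is not None and link_count >= link_limit


print(effective_timeout(30), effective_timeout(0), limit_reached(10, 10))
```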
Execute linking¤
Executes the linking task using the configured execution.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteLinking.
It can be found in the package org.silkframework.workspace.activity.linking.
Linking paths cache¤
Holds the most frequent paths for the selected entities.
This plugin does not require any parameters.
The identifier for this plugin is LinkingPathsCache.
It can be found in the package org.silkframework.workspace.activity.linking.
Reference entities cache¤
For each reference link, the reference entities cache holds all values of the linked entities.
This plugin does not require any parameters.
The identifier for this plugin is ReferenceEntitiesCache.
It can be found in the package org.silkframework.workspace.activity.linking.
Supervised learning¤
Executes the supervised learning.
This plugin does not require any parameters.
The identifier for this plugin is SupervisedLearning.
It can be found in the package org.silkframework.learning.active.
Scheduler¤
Activate¤
Executes the scheduler.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteScheduler.
It can be found in the package com.eccenca.di.scheduler.
ScriptTask¤
Execute Script¤
Executes the script.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteScript.
It can be found in the package com.eccenca.di.scripting.scala.
TransformSpecification¤
Execute transform¤
Executes the transformation.
| Parameter | Type | Description | Example |
|---|---|---|---|
| limit | IntOptionParameter | Limits the maximum number of entities that are transformed. | |
The identifier for this plugin is ExecuteTransform.
It can be found in the package org.silkframework.workspace.activity.transform.
Transform paths cache¤
Holds the most frequent paths for the selected entities.
This plugin does not require any parameters.
The identifier for this plugin is TransformPathsCache.
It can be found in the package org.silkframework.workspace.activity.transform.
Target vocabulary cache¤
Holds the target vocabularies.
This plugin does not require any parameters.
The identifier for this plugin is VocabularyCache.
It can be found in the package org.silkframework.workspace.activity.transform.
Workflow¤
Execute locally¤
Executes the workflow locally.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteLocalWorkflow.
It can be found in the package org.silkframework.workspace.activity.workflow.
WorkflowExecution¤
Generate Spark assembly¤
Generates project and Spark assembly artifacts and deploys them using the specified configuration settings: type, artifact, and options such as the destination in case of a simple copy.
| Parameter | Type | Description | Example |
|---|---|---|---|
| executeStaging | boolean | Execute staging phase | |
| executeTransform | boolean | Execute transform phase | |
| executeLoading | boolean | Execute loading phase | |
The identifier for this plugin is DeploySparkWorkflow.
It can be found in the package com.eccenca.di.spark.
Default execution¤
Executes a workflow with the executor defined in the configuration.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteDefaultWorkflow.
It can be found in the package com.eccenca.di.spark.
Execute operator¤
Executes a workflow with an executor that uses Apache Spark. Depending on the Spark configuration, it can run on a single local machine or on a cluster.
| Parameter | Type | Description | Example |
|---|---|---|---|
| operator | TaskReference | The workflow to execute. | |
The identifier for this plugin is ExecuteSparkOperator.
It can be found in the package com.eccenca.di.spark.
Execute on Spark¤
Executes a workflow with an executor that uses Apache Spark. Depending on the Spark configuration, it can run on a single local machine or on a cluster.
This plugin does not require any parameters.
The identifier for this plugin is ExecuteSparkWorkflow.
It can be found in the package com.eccenca.di.spark.
Execute with payload¤
Executes a workflow with custom payload.
| Parameter | Type | Description | Example |
|---|---|---|---|
| configuration | MultilineStringParameter | No description | |
| configurationType | String | No description | |
The identifier for this plugin is ExecuteWorkflowWithPayload.
It can be found in the package org.silkframework.workbench.workflow.
Generate view¤
Generate and share a view on a workflow executed by the Spark executor. Executes a workflow on Spark and generates a SparkSQL temporary table instead of serializing the result. The table can be accessed via JDBC.
| Parameter | Type | Description | Example |
|---|---|---|---|
| caching | boolean | Optional parameter that enables caching (default=false). | |
| userDefinedName | String | Optional View name that is used when a view on a non virtual is generated (default = [TASK-ID]_generated_view). | |
The identifier for this plugin is GenerateSparkView.
It can be found in the package com.eccenca.di.sql.virtual.
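The default naming rule for `userDefinedName` can be illustrated with a short sketch. It assumes that `[TASK-ID]` is replaced by the task identifier as-is; the helpers `view_name` and `select_all` are hypothetical, and the query is only an example of what a JDBC client might send, with the identifier quoted in backticks as Spark SQL allows.

```python
from typing import Optional


def view_name(task_id: str, user_defined_name: Optional[str] = None) -> str:
    """Return userDefinedName if set, otherwise the documented default name."""
    return user_defined_name if user_defined_name else f"{task_id}_generated_view"


def select_all(name: str, limit: int = 10) -> str:
    """Build a simple query against the view (illustrative only)."""
    # Backticks guard against identifiers that contain characters such as hyphens.
    return f"SELECT * FROM `{name}` LIMIT {limit}"


print(select_all(view_name("myWorkflow")))
```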