Activity Reference¤

Project Activities¤

The following activities are available for each project.

Dataset matcher¤

Generates matches between schema paths and datasets based on the schema discovery and profiling information of the datasets.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| datasetUri | String | If set, run dataset matching only for this particular dataset. | |

The identifier for this plugin is DatasetMatcher.

It can be found in the package com.eccenca.di.datamatching.

Task Activities¤

The following activities are available for different types of tasks.

Custom¤

Execute REST Task¤

Executes the REST task.

This plugin does not require any parameters. The identifier for this plugin is ExecuteRestTask.

It can be found in the package com.eccenca.di.workflow.operators.rest.

Dataset¤

Dataset profiler¤

Generates profiling data of a dataset, e.g. data types, statistics etc.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| datasetUri | String | Optional URI of the dataset resource that should be profiled. If not specified, a URI will be generated. | |
| uriPrefix | String | Optional URI prefix that is prepended to every generated URI, e.g. property URIs for every schema path. If not specified, a URI prefix will be generated. | |
| entitySampleLimit | String | How many entities should be sampled for profiling. If left blank, all entities are considered. | |
| timeLimit | String | The time in milliseconds that the schema extraction step and the profiling step may each take. Leave blank for unlimited time. | |
| classProfilingLimit | int | The maximum number of classes that are profiled from the extracted schema. | |
| schemaEntityLimit | int | The maximum number of overall schema entities (types, properties/attributes) that will be extracted. | |
| executionType | String | The execution type to be used: SPARK or LEGACY. The legacy execution uses large in-memory maps and takes longer. | |

The identifier for this plugin is DatasetProfiler.

It can be found in the package com.eccenca.di.profiling.
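
As a rough illustration of how the profiler parameters fit together, the following Python sketch assembles a parameter map for the DatasetProfiler activity. The helper name `profiler_params` and the idea of passing all values as strings in a map are assumptions made for this example, not a documented API. Only the parameter names, the two execution types, and the "blank means unlimited" behaviour are taken from the parameter list above.

```python
from typing import Dict, Optional


def profiler_params(
    execution_type: str,
    entity_sample_limit: Optional[int] = None,
    time_limit_ms: Optional[int] = None,
) -> Dict[str, str]:
    """Build a hypothetical string-valued parameter map for DatasetProfiler."""
    # The parameter list names exactly two execution types.
    if execution_type not in ("SPARK", "LEGACY"):
        raise ValueError("executionType must be SPARK or LEGACY")
    return {
        # A blank value means: consider all entities.
        "entitySampleLimit": "" if entity_sample_limit is None else str(entity_sample_limit),
        # A blank value means: no time limit (the value is in milliseconds).
        "timeLimit": "" if time_limit_ms is None else str(time_limit_ms),
        "executionType": execution_type,
    }


# Sample 1000 entities on Spark, with no time limit.
params = profiler_params("SPARK", entity_sample_limit=1000)
```

How such a map is actually handed to the activity depends on the deployment and is not covered by this sketch.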

SQL endpoint status¤

Shows the SQL endpoint status.

This plugin does not require any parameters. The identifier for this plugin is SqlEndpointStatus.

It can be found in the package com.eccenca.di.sql.endpoint.activity.

Types cache¤

Holds the most frequent types in a dataset.

This plugin does not require any parameters. The identifier for this plugin is TypesCache.

It can be found in the package org.silkframework.workspace.activity.dataset.

LinkSpecification¤

Active learning¤

Executes an active learning iteration.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| fixedRandomSeed | boolean | No description | |

The identifier for this plugin is ActiveLearning.

It can be found in the package org.silkframework.learning.active.

Evaluate linking¤

Evaluates the linking task by generating links.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| includeReferenceLinks | boolean | Do not generate a link for which there is a negative reference link, while always generating positive reference links. | |
| useFileCache | boolean | Use a file cache. This avoids memory overflows for big files. | |
| partitionSize | int | The number of entities in a single partition in the cache. | |
| generateLinksWithEntities | boolean | Generate detailed information about the matched entities. If set to false, the generated links won’t be shown in the Workbench. | |
| writeOutputs | boolean | Write the generated links to the configured output of this task. | |
| linkLimit | int | If defined, the execution will stop after the configured number of links is reached. This is just a hint and the execution may produce slightly fewer or more links. | |
| timeout | int | Timeout in seconds after which the matching task of an evaluation is aborted. Set to 0 or negative to disable the timeout. | |

The identifier for this plugin is EvaluateLinking.

It can be found in the package org.silkframework.workspace.activity.linking.
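
The `timeout` rule above (values of 0 or below disable the timeout) can be sketched in Python as follows. The function name `effective_timeout_seconds` is hypothetical and only illustrates the documented semantics. It is not taken from the plugin's implementation.

```python
from typing import Optional


def effective_timeout_seconds(timeout: int) -> Optional[int]:
    """Return the timeout in seconds, or None when the timeout is disabled.

    Mirrors the documented rule: 0 or a negative value disables the timeout.
    """
    return timeout if timeout > 0 else None


enabled = effective_timeout_seconds(30)    # a 30 second timeout
disabled = effective_timeout_seconds(0)    # timeout switched off
```

Representing "disabled" as `None` is a design choice of this sketch. It keeps callers from accidentally treating 0 as "abort immediately".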

Execute linking¤

Executes the linking task using the configured execution.

This plugin does not require any parameters. The identifier for this plugin is ExecuteLinking.

It can be found in the package org.silkframework.workspace.activity.linking.

Linking paths cache¤

Holds the most frequent paths for the selected entities.

This plugin does not require any parameters. The identifier for this plugin is LinkingPathsCache.

It can be found in the package org.silkframework.workspace.activity.linking.

Reference entities cache¤

For each reference link, the reference entities cache holds all values of the linked entities.

This plugin does not require any parameters. The identifier for this plugin is ReferenceEntitiesCache.

It can be found in the package org.silkframework.workspace.activity.linking.

Supervised learning¤

Executes the supervised learning.

This plugin does not require any parameters. The identifier for this plugin is SupervisedLearning.

It can be found in the package org.silkframework.learning.active.

Scheduler¤

Activate¤

Executes the scheduler.

This plugin does not require any parameters. The identifier for this plugin is ExecuteScheduler.

It can be found in the package com.eccenca.di.scheduler.

ScriptTask¤

Execute Script¤

Executes the script.

This plugin does not require any parameters. The identifier for this plugin is ExecuteScript.

It can be found in the package com.eccenca.di.scripting.scala.

TransformSpecification¤

Execute transform¤

Executes the transformation.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| limit | IntOptionParameter | Limits the maximum number of entities that are transformed. | |

The identifier for this plugin is ExecuteTransform.

It can be found in the package org.silkframework.workspace.activity.transform.

Transform paths cache¤

Holds the most frequent paths for the selected entities.

This plugin does not require any parameters. The identifier for this plugin is TransformPathsCache.

It can be found in the package org.silkframework.workspace.activity.transform.

Target vocabulary cache¤

Holds the target vocabularies.

This plugin does not require any parameters. The identifier for this plugin is VocabularyCache.

It can be found in the package org.silkframework.workspace.activity.transform.

Workflow¤

Execute locally¤

Executes the workflow locally.

This plugin does not require any parameters. The identifier for this plugin is ExecuteLocalWorkflow.

It can be found in the package org.silkframework.workspace.activity.workflow.

WorkflowExecution¤

Generate Spark assembly¤

Generates project and Spark assembly artifacts and deploys them using the specified configuration settings: type, artifact, and options such as the destination in case of a simple copy.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| executeStaging | boolean | Execute staging phase | |
| executeTransform | boolean | Execute transform phase | |
| executeLoading | boolean | Execute loading phase | |

The identifier for this plugin is DeploySparkWorkflow.

It can be found in the package com.eccenca.di.spark.

Default execution¤

Executes a workflow with the executor defined in the configuration.

This plugin does not require any parameters. The identifier for this plugin is ExecuteDefaultWorkflow.

It can be found in the package com.eccenca.di.spark.

Execute operator¤

Executes a workflow with an executor that uses Apache Spark. Depending on the Spark configuration, it can run on a single local machine or on a cluster.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| operator | TaskReference | The workflow to execute. | |

The identifier for this plugin is ExecuteSparkOperator.

It can be found in the package com.eccenca.di.spark.

Execute on Spark¤

Executes a workflow with an executor that uses Apache Spark. Depending on the Spark configuration, it can run on a single local machine or on a cluster.

This plugin does not require any parameters. The identifier for this plugin is ExecuteSparkWorkflow.

It can be found in the package com.eccenca.di.spark.

Execute with payload¤

Executes a workflow with custom payload.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| configuration | MultilineStringParameter | No description | |
| configurationType | String | No description | |

The identifier for this plugin is ExecuteWorkflowWithPayload.

It can be found in the package org.silkframework.workbench.workflow.

Generate view¤

Generates and shares a view on a workflow executed by the Spark executor. Executes a workflow on Spark and generates a SparkSQL temporary table instead of serializing the result. The table can be accessed via JDBC.

| Parameter | Type | Description | Example |
| --- | --- | --- | --- |
| caching | boolean | Optional parameter that enables caching (default = false). | |
| userDefinedName | String | Optional view name that is used when a view on a non-virtual dataset is generated (default = [TASK-ID]_generated_view). | |

The identifier for this plugin is GenerateSparkView.

It can be found in the package com.eccenca.di.sql.virtual.
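
The default naming rule for `userDefinedName` can be sketched in Python as below. The function `view_name` and the example task ID are hypothetical. The sketch only shows how the documented default `[TASK-ID]_generated_view` would be derived when no name is supplied, and it assumes that a blank name is treated the same as a missing one.

```python
from typing import Optional


def view_name(task_id: str, user_defined_name: Optional[str] = None) -> str:
    """Return the user-defined view name, or the documented default pattern."""
    # Assumption: an empty string counts as "not set", like None.
    if user_defined_name:
        return user_defined_name
    return f"{task_id}_generated_view"


default_name = view_name("myWorkflow")               # falls back to the default
custom_name = view_name("myWorkflow", "sales_view")  # explicit name wins
```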
