Plugin Reference¤
Plugin Tasks¤
The following plugin tasks are available:
Cancel Workflow¤
Cancels a workflow if a specified condition is fulfilled.
Parameter | Type | Description | Default |
---|---|---|---|
typeUri | Uri | The entity type to check the condition on. | |
condition | Enum | The cancellation condition | empty |
invertCondition | boolean | If true, the specified condition will be inverted, i.e., the workflow execution will be cancelled if the condition is not fulfilled. | false |
The identifier for this plugin is CancelWorkflow
.
It can be found in the package com.eccenca.di.workflow.operators.cancel
.
SQL query¤
Executes a custom SQL query on the first input dataset and returns the result as its output.
Parameter | Type | Description | Default |
---|---|---|---|
command | MultilineStringParameter | SQL command. The name of the table in the statement must be ‘dataset’, regardless the input. |
The identifier for this plugin is CustomSQLExecution
.
It can be found in the package com.eccenca.di.spark.operator
.
Parse JSON¤
Takes exactly one input and reads either the defined inputPath or the first value of the first entity as a JSON document. Then executes incoming requests as if this were a JSON dataset, e.g. form a transformation task.
Parameter | Type | Description | Default |
---|---|---|---|
inputPath | String | The Silk path expression of the input entity that contains the JSON document. If not set, the value of the first defined property will be taken. | empty string |
basePath | String | The path to the elements to be read, starting from the root element, e.g., ‘/Persons/Person’. If left empty, all direct children of the root element will be read. | empty string |
uriSuffixPattern | String | A URI pattern that is relative to the base URI of the input entity, e.g., /{ID}, where {path} may contain relative paths to elements. This relative part is appended to the input entity URI to construct the full URI pattern. | empty string |
The identifier for this plugin is JsonParserOperator
.
It can be found in the package org.silkframework.plugins.dataset.json
.
Join tables¤
Joins a set of inputs into a single table. Expects a list of entity tables and links. All entity tables are joined into the first entity table using the provided links.
This plugin does not require any parameters.
The identifier for this plugin is Merge
.
It can be found in the package com.eccenca.di.merge
.
Merge tables¤
Stores sets of instance and mapping inputs as relational tables with the mapping as an n:m relation. Expects a list of entity tables and links. All entity tables have a relation to the first entity table using the provided links.
Parameter | Type | Description | Default |
---|---|---|---|
multiTableOutput | boolean | test | true |
pivotTableName | String | Name of the pivot table. | empty string |
mappingNames | String | Name of the mapping tables. Comma separated list. | empty string |
instanceSetNames | String | Name of the tables joined to the pivot. Comma separated list. | empty string |
The identifier for this plugin is MultiTableMerge
.
It can be found in the package com.eccenca.di.merge
.
Pivot¤
The pivot operator takes data in separate rows, aggregates it and converts it into columns. This operator can be used in a workflow right after a mapping task.
Parameter | Type | Description | Default |
---|---|---|---|
pivotProperty | String | The pivot column. | no default |
firstGroupProperty | String | The name of the first group column in the range. | no default |
lastGroupProperty | String | The name of the last group column in the range. If left empty, only the first column is used. | no default |
valueProperty | String | The property that contains the values that will be aggregated. | no default |
aggregationFunction | Enum | The aggregation function used to aggregate values. | sum |
uriPrefix | String | Prefix to prepend to all generated pivot columns. | empty string |
The identifier for this plugin is Pivot
.
It can be found in the package com.eccenca.di.pivot
.
REST request¤
Executes a REST request based on fixed configuration and/or input parameters and returns the result as entity.
Parameter | Type | Description | Default |
---|---|---|---|
url | String | The URL to execute this request against. This can be overwritten at execution time via input. | empty string |
method | String | The HTTP method. One of GET, PUT or POST | GET |
accept | String | The accept header String. | empty string |
requestTimeout | int | Request timeout in ms. The overall maximum time the request should take. | 10000 |
connectionTimeout | int | Connection timeout in ms. The time until which a connection with the remote end must be established. | 5000 |
readTimeout | int | Read timeout in ms. The max. time a request stays idle, i.e. no data is send or received. | 10000 |
contentType | String | The content-type header String. This can be set in case of PUT or POST. If another content type comes back, the task will fail. | empty string |
content | String | The content that is send with a POST or PUT request. For handling this payload dynamically this parameter must be overwritten via the task input. | empty string |
httpHeaders | MultilineStringParameter | Configure additional HTTP headers. One header per line. Each header entry follows the curl syntax. | |
readParametersFromInput | boolean | If this is set to true, specific parameters can be overwritten at execution time. Else inputs are ignored. Parameters that can currently be overwritten: url, content | false |
multipartFileParameter | String | If set to a non-empty String then instead of a normal POST a multipart/form-data file upload request is executed. This value is used as the form parameter name. | empty string |
authorizationHeader | String | The authorization header. This is usually either ‘Authorization’ or ‘Proxy-Authorization’If left empty, no authorization header is sent. | empty string |
authorizationHeaderValue | PasswordParameter | The authorization header value. Usually this has the form ‘type secret’, e.g. for OAuth ‘bearer |
|
acceptAnySslCertificate | boolean | If enabled this will accept any SSL certificate, i.e. make SSL connections unsecure. Only enable if you know what you are doing! | false |
The identifier for this plugin is RestOperator
.
It can be found in the package com.eccenca.di.workflow.operators.rest
.
Scheduler¤
Executes a workflow at specified intervals.
Parameter | Type | Description | Default |
---|---|---|---|
task | TaskReference | The name of the workflow to be executed | no default |
interval | Duration | The interval at which the scheduler should run the referenced task. Must be in ISO-8601 duration format PnDTnHnMn.nS | PT15M |
startTime | String | The time when the scheduled task is run for the first time, e.g., 2017-12-03T10:15:30. If no start time is set, midnight on the day the scheduler is started is assumed. | empty string |
enabled | boolean | Enables or disables the scheduler. | true |
stopOnError | boolean | If true, this will stop the scheduler, so the failed task is not scheduled again for execution. | false |
The identifier for this plugin is Scheduler
.
It can be found in the package com.eccenca.di.scheduler
.
Search addresses¤
Looks up locations from textual descriptions using the configured geocoding API. Outputs results as RDF.
Parameter | Type | Description | Default |
---|---|---|---|
searchAttributes | StringTraversableParameter | List of attributes that contain search terms. Multiple attributes (comma-separated) will be concatenated into a single search. | no default |
limit | IntOptionParameter | Optionally limits the number of results for each search. | |
jsonLdContext | ResourceOption | Optional JSON-LD context to be used for converting the returned JSON to RDF. If not provided, a default context will be used. | |
additionalParameters | String | Additional URL parameters to be attached to each HTTP search request. Example: ‘&countrycodes=de&addressdetails=1’. Consult the API documentation for a list of available parameters. | empty string |
The identifier for this plugin is SearchAddresses
.
It can be found in the package com.eccenca.di.geo
.
Configuration
The geocoding service to be queried for searches can be set up in the configuration. The default configuration is as follows:
com.eccenca.di.geo = {
# The URL of the geocoding service
# url = "https://nominatim.eccenca.com/search"
url = "https://photon.komoot.de/api"
# url = https://api-adresse.data.gouv.fr/search
# Additional URL parameters to be attached to all HTTP search requests. Example: '&countrycodes=de&addressdetails=1'.
# Will be attached in addition to the parameters set on each search operator directly.
searchParameters = ""
# The minimum pause time between subsequent queries
pauseTime = 1s
# Number of coordinates to be cached in-memory
cacheSize = 10
}
In general, all services adhering to the Nominatim search API should be usable. Please note that when using public services, the pause time should be set to avoid overloading.
Logging
By default, individual requests to the geocoding service are not logged. To enable logging each request, the following configuration option can be set:
logging.level {
com.eccenca.di.geo=DEBUG
}
Send eMail¤
Sends an eMail using an SMTP server. If connected to a dataset that is based on a file in a workflow, it will send that file whenever the workflow is executed It can be used to send the result of a workflow via Mail.
Parameter | Type | Description | Default |
---|---|---|---|
host | String | The SMTP host, e.g, mail.myProvider.com | no default |
port | int | The SMTP port | 587 |
user | String | Username | empty string |
password | PasswordParameter | Password | |
from | String | The sender eMail address | empty string |
receiver | String | The email addresses of the receivers. Email addresses are comma separated. Names must be quoted when containing commas.Example: john.smith@example.com, “Doe, John” john.doe@example.com, needs no quoting needs.no.quoting@example.com | empty string |
cc | String | The CC-receiver eMail address. Email addresses are comma separated. Names must be quoted when containing commas.Example: john.smith@example.com, “Doe, John” john.doe@example.com, needs no quoting needs.no.quoting@example.com | empty string |
bcc | String | The BCC-receiver eMail address. Email addresses are comma separated. Names must be quoted when containing commas.Example: john.smith@example.com, “Doe, John” john.doe@example.com, needs no quoting needs.no.quoting@example.com | empty string |
subject | String | The eMail subject | Dataset |
message | MultilineStringParameter | The eMail text message | |
withAttachment | boolean | If enabled a file from the input is attached to the email. A single input to this operator is expected that provides a file, e.g. a file based dataset (XML, JSON etc.). | true |
sslConnection | boolean | When enabled a SSL/TLS connection will be forced from the start without negotiation with the server. Not to be confused with STARTTLS which upgrades an insecure connection to a SSL/TLS connection, which is done by default. | false |
timeout | int | Timeout in milliseconds to establish a connection or wait for a server response. Setting it to 0 or negative number will disable the timeout. | 10000 |
readParametersFromInput | boolean | When enabled this allows to send multiple e-mails. All e-mail configurations are input via the first operator input with each entry representing a different e-mail. The optional second input can be a file based dataset for the attachment. E-mail parameters that can be overwritten are: from, receiver, cc, bcc, subject and message. | false |
nrRetries | int | The number of retries per email when send errors are encountered. | 2 |
delayBetweenDeliveriesMS | int | The delay in milliseconds between sending two consecutive e-mails. This applies to the retry mechanism, but also to sending multiple e-mails. | 2 |
The identifier for this plugin is SendEMail
.
It can be found in the package com.eccenca.di.mail
.
Execute Spark function¤
Applies a specified Scala function to a specified field. E.g. when the inputField is ‘name’, the inputFunction is ‘any => “Arrrrgh!” and the alias is ‘xxx’,)’ a query corresponding to ‘Function existingField1, existingFiled2, … “Arrrrgh!” as “xxx”’ will be generated. If alias is empty the inputField will be overwritten, otherwise a new field will be added and the rest of the schema stays the same.
Parameter | Type | Description | Default |
---|---|---|---|
function | MultilineStringParameter | Scala function expression. | |
inputField | String | Input field. | empty string |
alias | String | Alias. | no default |
The identifier for this plugin is SparkFunction
.
It can be found in the package com.eccenca.di.spark.operator
.
Evaluate template¤
Evaluates a template on a sequence of entities. Can be used after a transformation or directly after datasets that output a single table, such as CSV or Excel. For each input entity, a output entity is generated that provides a single output attribute, which contains the evaluated template.
Parameter | Type | Description | Default |
---|---|---|---|
template | MultilineStringParameter | The template | no default |
language | Enum | The template language. Currently, Jinja is supported. | Jinja |
outputAttribute | String | The attribute in the output that will hold the evaluated template. | output |
forwardInputAttributes | boolean | If true, the input attributes will be forwarded to the output. | false |
The identifier for this plugin is Template
.
It can be found in the package com.eccenca.di.templating.operators
.
The template operator supports the Jinja templating language. Documentation about Jinja can be found in the official Template Designer Documentation.
Currently, the template operator does have the following limitations: - As Jinja does not support special characters, such as colons, in variable names, RDF properties cannot be accessed. For this reason, the transformation that precedes the template operator needs to make sure that it generates attributes that are valid Jinja variable names. - Accessing nested paths is not supported. If the preceding transformation contains hierarchical mappings, only the attributes from the root mapping can be accessed.
Unpivot¤
Given a list of table columns, transforms those columns into attribute-value pairs. This operator can be used in a workflow right after a mapping task.
Parameter | Type | Description | Default |
---|---|---|---|
firstPivotProperty | String | The name of the first pivot column in the range. | no default |
lastPivotProperty | String | the name of the last pivot column in the range. If left empty, all columns starting with the first pivot column are used. | no default |
attributeProperty | String | The URI of the output column used to hold the attribute. | attribute |
valueProperty | String | The URI of the output column used to hold the value. | value |
pivotColumns | String | Comma separated list of pivot column names. This property will override all inferred columns of the first two arguments. | empty string |
The identifier for this plugin is Unpivot
.
It can be found in the package com.eccenca.di.unpivot
.
Parse XML¤
Takes exactly one input and reads either the defined inputPath or the first value of the first entity as XML document. Then executes the given output entity schema similar to the XML dataset to construct the result entities.
Parameter | Type | Description | Default |
---|---|---|---|
inputPath | String | The Silk path expression of the input entity that contains the XML document. If not set, the value of the first defined property will be taken. | empty string |
basePath | String | The path to the elements to be read, starting from the root element, e.g., ‘/Persons/Person’. If left empty, all direct children of the root element will be read. | empty string |
uriSuffixPattern | String | A URI pattern that is relative to the base URI of the input entity, e.g., /{ID}, where {path} may contain relative paths to elements. This relative part is appended to the input entity URI to construct the full URI pattern. | empty string |
The identifier for this plugin is XmlParserOperator
.
It can be found in the package org.silkframework.plugins.dataset.xml
.
Upload File to Knowledge Graph¤
Uploads an N-Triples file from the file repository to a ‘Knowledge Graph’ dataset. The output of this operatorcan be the input of datasets that support graph store file upload, e.g. ‘Knowledge Graph’. The file will be uploaded to the graph specified in that dataset.
Parameter | Type | Description | Default |
---|---|---|---|
fileNT | Resource | N-Triples file from the resource repository that should be uploaded to the Knowledge Graph. | no default |
maxChunkSizeInMB | int | The N-Triples file will be split into multiple chunks if the file size exceeds the max chunk size. | no default |
The identifier for this plugin is eccencaDataPlatformGraphStoreFileUploadOperator
.
It can be found in the package com.eccenca.di.plugins.dataplatform
.
SPARQL Construct query¤
A task that executes a SPARQL Construct query on a SPARQL enabled data source and outputs the SPARQL result. If the result should be written to the same RDF store it is read from, the SPARQL Update operator is preferable.
Parameter | Type | Description | Default |
---|---|---|---|
query | MultilineStringParameter | A SPARQL 1.1 construct query | no default |
tempFile | boolean | When copying directly to the same SPARQL Endpoint or when copying large amounts of triples, set to True by default | true |
The identifier for this plugin is sparqlCopyOperator
.
It can be found in the package org.silkframework.plugins.dataset.rdf.tasks
.
SPARQL Select query¤
A task that executes a SPARQL Select query on a SPARQL enabled data source and outputs the SPARQL result. If the SPARQL source is defined on a specific graph, a FROM clause will be added to the query at execution time, except when there already exists a GRAPH or FROM clause in the query. FROM NAMED clauses are not injected.
Parameter | Type | Description | Default |
---|---|---|---|
selectQuery | MultilineStringParameter | A SPARQL 1.1 select query | no default |
limit | String | If set to a positive integer, the number of results is limited | empty string |
optionalInputDataset | SparqlEndpointDatasetParameter | An optional SPARQL dataset that can be used for example data, so e.g. the transformation editor shows mapping examples. | |
sparqlTimeout | int | SPARQL query timeout (select/update) in milliseconds. A value of zero means that there is no timeout set explicitly. If a value greater zero is specified this overwrites possible default timeouts. | 0 |
The identifier for this plugin is sparqlSelectOperator
.
It can be found in the package org.silkframework.plugins.dataset.rdf.tasks
.
SPARQL Update query¤
A task that outputs SPARQL Update queries for every entity from the input based on a SPARQL Update template. The output of this operator should be connected to the SPARQL datasets to which the results should be written. In contrast to the SPARQL select operator, no FROM clause gets injected into the query.
Parameter | Type | Description | Default |
---|---|---|---|
sparqlUpdateTemplate | MultilineStringParameter | \This operator takes a SPARQL Update Query Template that depending on the templating mode (Simple/Velocity Engine) supports\a set of templating features, e.g. filling in input values via placeholders in the template.\Example for the ‘Simple’ mode:\ DELETE DATA { ${ |
no default |
batchSize | int | How many entities should be handled in a single update request. | 1 |
templatingMode | Enum | The templating mode. ‘Simple’ only allows simple URI and literal insertions, whereas ‘Velocity Engine’ supports complex templating. See ‘Sparql Update Template’ parameter description for examples and http://velocity.apache.org for details on the Velocity templates. | simple |
The identifier for this plugin is sparqlUpdateOperator
.
It can be found in the package org.silkframework.plugins.dataset.rdf.tasks
.
Request RDF triples¤
A task that requests all triples from an RDF dataset.
This plugin does not require any parameters.
The identifier for this plugin is tripleRequestOperator
.
It can be found in the package com.eccenca.di.workflow.operators.tripleRequest
.
Normalize units of measurement¤
Custom task that will substitute numeric values and pertaining unit symbols with a SI-system-unit normalized representation in three columns: * The normalized numeric value. * The unit symbol of the SI-system-unit pertaining to the value. * The origin unit symbol from which it was normalized (so we are able to reverse this action).
Parameter | Type | Description | Default |
---|---|---|---|
valueProperties | String | The names (comma-separated) of columns containing numeric values interpreted as quantities of the dimension indicated by the pertaining unit. | no default |
unitProperties | String | The names (comma-separated) of dedicated columns containing the unit symbol for the pertaining value in the value column (the positions in this list have to align with the pertaining value columns). Either this param or ‘static unit’ has to be set. | empty string |
staticUnits | String | Unit symbols (comma-separated) defining the unit for all values in the pertaining value column. If set, the ‘unitProperty’ param will be ignored and all values of the value column have to be numbers without unit symbols (the positions in this list have to align with the pertaining value columns). | empty string |
targetUnits | String | Unit symbols (comma-separated) defining the target unit to which the value column will be converted (Note: Make sure the input unit can be converted to the target unit). By default the pertaining SI-base unit will be used as normalization unit (the positions in this list have to align with the pertaining value columns) | empty string |
suppressErrors | boolean | If true, will ignore any parsing or value conversion error and return an empty result (might happen because of unknown unit symbols or non-numbers as values). Beware, the value will be lost completely! | false |
configFilePath | WritableResource | An absolute file path for a unit CSV configuration file (for syntax see ‘configuration’ param). If set, the ‘configuration’ param will be ignored. | EmptyResource |
configuration | MultilineStringParameter | While all SI units and decimal prefixes are supported by default, custom or obsolete units have to be added via this configuration.\ NOTE: when constructing formulae depending on other units defined in the configuration, make sure to order them dependently.\ ALSO: Rational numbers are not supported by the UCUM syntax, express them as a fraction (see ‘grain’ example below).\ | # Example configuration, don’t forget to remove the ‘#’ in front of each row.# CSV COLUMNS:# * unit name - the human readable name of the unit# * override - (true |
The identifier for this plugin is ucumNormalizationTask
.
It can be found in the package com.eccenca.di.measure
.
XSLT¤
A task that converts an XML resource via an XSLT script and writes the transformed output into a file resource.
Parameter | Type | Description | Default |
---|---|---|---|
file | Resource | The XSLT file to be used for transforming XML. | no default |
The identifier for this plugin is xsltOperator
.
It can be found in the package org.silkframework.plugins.dataset.xml
.
Dataset Plugins¤
The following dataset plugins are available:
Deprecated¤
SparkSQL view¤
Please use SQL endpoint (embedded) instead.
Parameter | Type | Description | Default |
---|---|---|---|
viewName | String | The name of the view. This specifies the table that can be queried by another virtual dataset or via JDBC (the ‘default’ schema is used for all virtual datasets). | no default |
query | String | Optional SQL query on the selected table. Has no effect when used as an output dataset. | empty string |
cache | boolean | Optional boolean option that selects if the table should be cached by Spark or not (default = true). | true |
uriPattern | String | A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
properties | String | Comma-separated list of URL-encoded properties. If not provided, the list of properties is read from the first line. | empty string |
charset | String | The source internal encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
arraySeparator | String | The character that is used to separate the parts of array values. Write “back slash t” to specify the tab character. | |
useCompatibleTypes | boolean | If true, basic types will be used for types that otherwise would result in client errors. This mainly that arrays will be stored as Strings separated by the separator defined above. If the view is only for use within a SparkContext, this can be set to false. | true |
The identifier for this plugin is sparkView
.
It can be found in the package com.eccenca.di.sql.virtual
.
Uncategorized¤
Text¤
Reads and writes plain text files.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | The plain text file. | no default |
charset | String | The file encoding, e.g., UTF-8, UTF-8-BOM, ISO-8859-1 | UTF-8 |
typeName | String | A type name that represents this file. | type |
property | String | The single property that holds the text. | text |
The identifier for this plugin is text
.
It can be found in the package org.silkframework.plugins.dataset.text
.
embedded¤
Hive database¤
Read from or write to an embedded Apache Hive endpoint.
Parameter | Type | Description | Default |
---|---|---|---|
schema | String | Name of the hive schema or namespace. | empty string |
table | String | Name of the hive table. | no default |
query | String | Optional query for projection and selection (e.g. ” SELECT * FROM table WHERE x = true”. | empty string |
uriPattern | String | A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
properties | String | Comma-separated list of URL-encoded properties. If not provided, the list of properties is read from the first line. | empty string |
charset | String | The source internal encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
The identifier for this plugin is Hive
.
It can be found in the package com.eccenca.di.spark.dataset
.
Knowledge Graph¤
Read RDF from or write RDF to a Knowledge Graph embedded in Corporate Memory.
Parameter | Type | Description | Default |
---|---|---|---|
endpoint | String | The named endpoint within the eccenca DataPlatform. | default |
graph | String | The URI of the named graph. | no default |
pageSize | int | The number of solutions to be retrieved per SPARQL query. | 100000 |
pauseTime | int | The number of milliseconds to wait between subsequent query | 0 |
retryCount | int | The number of retries if a query fails | 3 |
retryPause | int | The number of milliseconds to wait until a failed query is retried. | 1000 |
strategy | Enum | The strategy use for retrieving entities: simple: Retrieve all entities using a single query; subQuery: Use a single query, but wrap it for improving the performance on Virtuoso; parallel: Use a separate Query for each entity property. | parallel |
clearGraphBeforeExecution | boolean | If set to true this will clear the specified graph before executing a workflow that writes to it. | false |
entityList | MultilineStringParameter | A list of entities to be retrieved. If not given, all entities will be retrieved. Multiple entities are separated by whitespace. | |
sparqlTimeout | int | SPARQL query timeout (select/update) in milliseconds. A value of zero means that there is no timeout. If a value greater zero is specified this overwrites possible default timeouts. This timeout is also propagated to DataPlatform and may overwrite default timeouts there. | 0 |
optimizedRetrieve | boolean | Optimized retrieval method to remove load from the underlying triple store. Query parallelism is limited and cheaper queries are executed against the backend. By putting the main work on DataIntegration side, the RDF backend is kept responsive. | true |
The identifier for this plugin is eccencaDataPlatform
.
It can be found in the package com.eccenca.di.plugins.dataplatform
.
In-memory dataset¤
A Dataset that holds all data in-memory.
Parameter | Type | Description | Default |
---|---|---|---|
clearGraphBeforeExecution | boolean | If set to true this will clear this dataset before it is used in a workflow execution. | true |
The identifier for this plugin is inMemory
.
It can be found in the package org.silkframework.plugins.dataset.rdf.datasets
.
Internal dataset¤
Dataset for storing entities between workflow steps.
Parameter | Type | Description | Default |
---|---|---|---|
graphUri | String | The RDF graph that is used for storing internal data | null |
The identifier for this plugin is internal
.
It can be found in the package org.silkframework.plugins.dataset
.
SQL endpoint¤
Provides a JDBC endpoint that exposes workflow or transformation results as tables, which can be queried using SQL.
Parameter | Type | Description | Default |
---|---|---|---|
tableNamePrefix | String | Prefix of the table that will be shared. In the case of complex mappings more than one table will be created. If one name is given it will be used as a prefix for table names. If left empty the table names will be generated from the user name and time stamps and start with ‘root’, ‘object-mapping’ | empty string |
cache | boolean | Optional boolean option that selects if the table should be cached by Spark or not (default = true). | true |
arraySeparator | String | The character that is used to separate the parts of array values. Write \t to specify the tab character. | |
useCompatibleTypes | boolean | If true, basic types will be used for unusual data types that otherwise may result in client errors. Try switching this on, if a client has weird error messages. (Default = true) | true |
map | Map | Mapping of column names. Similar to aliases E.g. ‘c1:c2’ would rename column c1 into c2. |
The identifier for this plugin is sqlEndpoint
.
It can be found in the package com.eccenca.di.sql.endpoint
.
SQL endpoint dataset parameters
The dataset only requires that the tableNamePrefix parameter is given. This will be used as the prefix for the names of the generated tables. When a set of entities is written to the endpoint a view is generated for each entity type (defined by an ‘rdf_type’ attribute). That means that the mapping or data source that are used as input for the SQL endpoint need to have a type or require a user defined type mapping.
The operator has a compatibility mode. This mode will avoid complex types such as Arrays. When arrays exist in the input they are converted to a String using the given arraySeparator. This avoids errors and warnings in some Jdbc clients that are unable to handle typed arrays and may make working with software like Excel easier.
The parameter aliasMap of the endpoint allows the specification of column aliases. The map is a comma separated list of key-value pairs.
Each key and value is denoted by key:value
. An example for renaming 2 columns (source1, source2 to target1, target2) in the result would be:
source1:target1,source2:target2
Note: Table and column (mapping target) names will be automatically converted to be valid in as many databases as possible. Table names will be shortened to 128 characters. Only a-z, A-Z, 0-9 and _ are allowed. Others will be replaced with an underscore. Column names undergo the same transformation but will be converted to lower case as well. The log will inform about changes. The table names will be generated based on the target type of each mapping. The user needs to make sure that each object mapping specifies a unique type. If two object mappings define the same type, only the last one will be written.
SQL endpoint activity
See [ActivityDocumentation] for a general description of the Data Integration activities. The activity will start automatically, when the SQL endpoint is used as a data sink and Data Integration is configured to make the SQL endpoint accessible remotely.
When the activity is started and running it returns the server status and JDBC URL as its value.
Stopping the activity will drop all views generated by the activity. It can be restarted by rerunning the workflow containing it as a sink.
Remote client configuration (via JDBC and ODBC)
Within Data Integration the SQL endpoint can be used as a source or sink like any other dataset. If the startThriftServer option is set to ‘true’ access via JDBC or ODBC is possible.
ODBC and JDBC drivers can be used to connect to relational databases.
When selecting a version of a driver the client operating system and its type (32bit/64 bit) are the most important factors. The version of the client drivers sometimes is the same as the server’s. If no version of a driver is given, the newest driver of the vendor should work, as it should be backwards compatible.
Any JDBC or ODBC client can connect to an SQL endpoint dataset. SparkSQL uses the same query processing as Hive, therefore the requirements for the client are:
- A JDBC driver compatible with Hive 1.2.11 (platform independent driver org.apache.hive.jdbc.HiveDriver is needed) or
- A JDBC driver compatible with Spark 2.3.3
- A Hive ODBC driver (ODBC driver for the client architecture and operating system needed)
A detailed instruction to connect to a Hive or SparkSQL endpoint with various tools (e.g. SQuirreL, beeline, SQL Developer, …) can be found at Apache HiveServer2 Clients. The database client DBeaver can connect to the SQL endpoint out of the box.
Variable dataset¤
Dataset that acts as a placeholder in workflows and is replaced at request time.
This plugin does not require any parameters.
The identifier for this plugin is variableDataset
.
It can be found in the package org.silkframework.dataset
.
file¤
Alignment¤
Writes the alignment format specified at http://alignapi.gforge.inria.fr/format.html.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | The alignment file. | no default |
The identifier for this plugin is alignment
.
It can be found in the package org.silkframework.plugins.dataset.rdf.datasets
.
Avro¤
Read from or write to an Apache Avro file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | Path (e.g. relative like ‘path/filename.avro’ or absolute ‘hdfs:///path/filename.avro’). | no default |
uriPattern | String | A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
properties | String | Comma-separated list of URL-encoded properties. If not provided, the list of properties is read from the first line. | empty string |
charset | String | The file encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
The identifier for this plugin is avro
.
It can be found in the package com.eccenca.di.spark.dataset
.
CSV¤
Read from or write to an CSV file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | The CSV file. This may also be a zip archive of multiple CSV files that share the same schema. | no default |
properties | String | Comma-separated list of properties. If not provided, the list of properties is read from the first line. Properties that are no valid (relative or absolute) URIs will be encoded. | empty string |
separator | String | The character that is used to separate values. If not provided, defaults to ‘,’, i.e., comma-separated values. “\t” for specifying tab-separated values, is also supported. | , |
arraySeparator | String | The character that is used to separate the parts of array values. Write “\t” to specify the tab character. | empty string |
quote | String | Character used to quote values. | “ |
uri | String | Deprecated A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
charset | String | The file encoding, e.g., UTF-8, UTF-8-BOM, ISO-8859-1 | UTF-8 |
regexFilter | String | A regex filter used to match rows from the CSV file. If not set all the rows are used. | empty string |
linesToSkip | int | The number of lines to skip in the beginning, e.g. copyright, meta information etc. | 0 |
maxCharsPerColumn | int | The maximum characters per column. Warning: System will request heap memory of that size (2 bytes per character) when reading the CSV. If there are more characters found, the parser will fail. | 128000 |
ignoreBadLines | boolean | If set to true then the parser will ignore lines that have syntax errors or do not have to correct number of fields according to the current config. | false |
quoteEscapeCharacter | String | Escape character to be used inside quotes, used to escape the quote character. It must also be used to escape itself, e.g. by doubling it, e.g. “”. If left empty, it defaults to quote. | “ |
zipFileRegex | String | If the input resource is a ZIP file, files inside the file are filtered via this regex. | .*\.csv$ |
The identifier for this plugin is csv
.
It can be found in the package org.silkframework.plugins.dataset.csv
.
Excel¤
Read from or write to an Excel workbook in Open XML format (XLSX).
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | File name inside the resources directory. | no default |
streaming | boolean | Streaming enables reading and writing large Excels files. Warning: Be careful to disable streaming for large datasets (> 10MB), because of high memory consumption. | true |
linesToSkip | int | The number of lines to skip in the beginning when reading files. | 0 |
hasHeader | boolean | If true, the first line will be read as the table header, which defines the column names. If false, the first line will be read as data. In that case, the columns need to be adressed using #A, #B, etc. | true |
outputObjectValues | boolean | Output results from object rules (URIs). | true |
The identifier for this plugin is excel
.
It can be found in the package com.eccenca.di.excel
.
RDF¤
Dataset which retrieves and writes all entities from/to an RDF file. The dataset is loaded in-memory and thus the size is restricted by the available memory. Large datasets should be loaded into an external RDF store and retrieved using the SPARQL dataset instead.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | The RDF file. This may also be a zip archive of multiple RDF files. | no default |
format | String | Optional RDF format. If left empty, it will be auto-detected based on the file extension. N-Triples is the only format that can be written, while other formats can only be read. | empty string |
graph | String | The graph name to be read. If not provided, the default graph will be used. Must be provided if the format is N-Quads. | empty string |
entityList | MultilineStringParameter | A list of entities to be retrieved. If not given, all entities will be retrieved. Multiple entities are separated by whitespace. | |
zipFileRegex | String | If the input resource is a ZIP file, files inside the file are filtered via this regex. | .* |
The identifier for this plugin is file
.
It can be found in the package org.silkframework.plugins.dataset.rdf.datasets
.
JSON¤
Read from or write to a JSON file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | Json file. | no default |
template | MultilineStringParameter | Template for writing JSON. The term {{output}} will be replaced by the written JSON. | {{output}} |
basePath | String | The path to the elements to be read, starting from the root element, e.g., ‘/Persons/Person’. If left empty, all direct children of the root element will be read. | empty string |
uriPattern | String | A URI pattern, e.g., http://namespace.org/{ID}, where {path} may contain relative paths to elements | empty string |
maxDepth | int | Maximum depth of written JSON. This acts as a safe guard if a recursive structure is written. | 15 |
streaming | boolean | Streaming allows for reading large JSON files. If streaming is enabled, backward paths are not supported. | true |
The identifier for this plugin is json
.
It can be found in the package org.silkframework.plugins.dataset.json
.
Typically, this dataset is used to transform an JSON file to another format, e.g., to RDF.
It supports a number of special paths:
- #id
Is a special syntax for generating an id for a selected element. It can be used in URI patterns for entities which do not provide an identifier. Examples: http://example.org/{#id}
or http://example.org/{/pathToEntity/#id}
.
- #text
retrieves the text of the selected node.
- The backslash can be used to navigate to the parent JSON node, e.g., \parent/key
. The name of the backslash key (here parent
) is ignored.
When storing entities in Json format all entities will be stored in an array at the top-level of the Json document. The option makeFirstEntityJsonObject (false by default) can change this. If activated a top level object will be used. To preserve valid Json, only the first entity will be stored in this case.
Multi CSV ZIP¤
Reads from or writes to multiple CSV files from/to a single ZIP file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | Zip file name inside the resources directory/repository. | no default |
separator | String | The character that is used to separate values. If not provided, defaults to ‘,’, i.e., comma-separated values. “\t” for specifying tab-separated values, is also supported. | , |
arraySeparator | String | The character that is used to separate the parts of array values. Write “\t” to specify the tab character. | empty string |
quote | String | Character used to quote values. | “ |
charset | String | The file encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
linesToSkip | int | The number of lines to skip in the beginning, e.g. copyright, meta information etc. | 0 |
maxCharsPerColumn | int | The maximum characters per column. If there are more characters found, the parser will fail. | 128000 |
ignoreBadLines | boolean | If set to true then the parser will ignore lines that have syntax errors or do not have to correct number of fields according to the current config. | false |
quoteEscapeCharacter | String | Escape character to be used inside quotes, used to escape the quote character. It must also be used to escape itself, e.g. by doubling it, e.g. “”. If left empty, it defaults to quote. | “ |
append | boolean | If ‘True’ then files in the ZIP archive are only added or updated, all other files in the ZIP stay untouched. If ‘False’ then a new ZIP file will be created on every dataset write. | true |
zipFileRegex | String | Filter file paths inside the ZIP file via this regex. By default sub folders or files not ending with .csv are ignored. | ^[^/]*\.csv$ |
The identifier for this plugin is multiCsv
.
It can be found in the package com.eccenca.di.plugins.csv
.
ORC¤
Read from or write to an Apache ORC file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | Path (e.g. relative like ‘path/filename.orc’ or absolute ‘hdfs:///path/filename.orc’). | no default |
uriPattern | String | A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
properties | String | Comma-separated list of URL-encoded properties. If not provided, the list of properties is read from the first line. | empty string |
partition | String | Optional specification of the attribute for output partitioning | empty string |
compression | String | Optional compression algorithm (e.g. snappy, zlib) | snappy |
charset | String | The file encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
The identifier for this plugin is orc
.
It can be found in the package com.eccenca.di.spark.dataset
.
Parquet¤
Read from or write to an Apache Parquet file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | Path (e.g. relative like ‘path/filename.orc’ or absolute ‘hdfs:///path/filename.parquet’). | no default |
uriPattern | String | A pattern used to construct the entity URI. If not provided the prefix + the line number is used. An example of such a pattern is ‘urn:zyx:{id}’ where id is a name of a property. | empty string |
properties | String | Comma-separated list of URL-encoded properties. If not provided, the list of properties is read from the first line. | empty string |
partition | String | Optional specification of the attribute for output partitioning | empty string |
compression | String | Optional compression algorithm (e.g. snappy, zlib) | empty string |
charset | String | The file encoding, e.g., UTF8, ISO-8859-1 | UTF-8 |
The identifier for this plugin is parquet
.
It can be found in the package com.eccenca.di.spark.dataset
.
XML¤
Read from or write to an XML file.
Parameter | Type | Description | Default |
---|---|---|---|
file | WritableResource | The XML file. This may also be a zip archive of multiple XML files that share the same schema. | no default |
basePath | String | The base path when writing XML. For instance: /RootElement/Entity. Should no longer be used for reading XML! Instead, set the base path by specifying it as input type on the subsequent transformation or linking tasks. | empty string |
uriPattern | String | A URI pattern, e.g., http://namespace.org/{ID}, where {path} may contain relative paths to elements | empty string |
outputTemplate | MultilineStringParameter | The output template used for writing XML. Must be valid XML. The generated entity is identified through a processing instruction of the form <?MyEntity?>. | |
streaming | boolean | Streaming allows for reading large XML files. | true |
maxDepth | int | Maximum depth of written XML. This acts as a safe guard if a recursive structure is written. | 15 |
zipFileRegex | String | If the input resource is a ZIP file, files inside the file are filtered via this regex. | .*\.xml$ |
The identifier for this plugin is xml
.
It can be found in the package org.silkframework.plugins.dataset.xml
.
Typically, this dataset is used to transform an XML file to another format, e.g., to RDF. When this dataset is used as an input for another task (e.g., a transformation task), the input type of the consuming task selects the path where the entities to be read are located.
Example:
<Persons>
<Person>
<Name>John Doe</Name>
<Year>1970</Year>
</Person>
<Person>
<Name>Max Power</Name>
<Year>1980</Year>
</Person>
</Persons>
A transformation for reading all persons of the above XML would set the input type to /Person
.
The transformation iterates all entities matching the given input path.
In the above example the first entity to be read is:
<Person>
<Name>John Doe</Name>
<Year>1970</Year>
</Person>
All paths used in the consuming task are relative to this, e.g., the person name can be addressed with the path /Name
.
Path examples:
- The empty path selects the root element.
/Person
selects all persons./Person[Year = "1970"]
selects all persons which are born in 1970./#id
Is a special syntax for generating an id for a selected element. It can be used in URI patterns for entities which do not provide an identifier. Examples:http://example.org/{#id}
orhttp://example.org/{/pathToEntity/#id}
.- The wildcard * enumerates all direct children, e.g.,
/Persons/*/Name
. - The wildcard ** enumerates all direct and indirect children.
- The backslash can be used to navigate to the parent XML node, e.g.,
\Persons/SomeHeader
. #text
retrieves the text of the selected node.
remote¤
JDBC endpoint¤
Connect to an existing JDBC endpoint.
Parameter | Type | Description | Default |
---|---|---|---|
url | String | JDBC URL, must contain the database as parameter, i.g. with ;database=DBNAME or /database depending on the vendor. | no default |
table | String | Table name. Can be empty if the read-strategy is not set to read the full table. If non-empty it has to contain at least an existing table. | empty string |
sourceQuery | MultilineStringParameter | Source query (e.g. ‘SELECT TOP 10 * FROM table WHERE x = true’. Warning: Uses Driver (mySql, HiveQL, MSSql, Postgres) specific syntax. Can be left empty when full tables are loaded. Note: Even if columns with spaces/special characters are named in the query, they need to be referred to URL-encoded in subsequent transformations. | |
groupBy | String | Comma separated list of attributes appearing in the outer SELECT clause that should be grouped by. The attributes are matched case-insensitive. All other attributes will be grouped via an aggregation function that depends on the supported DBMS, e.g. (JSON) array aggregation. | empty string |
orderBy | String | Optional column to sort the result set. | empty string |
limit | IntOptionParameter | Optional limit of returned records. This limit should be pushed to the source. No value implies that no limit will be applied. | 10 |
queryStrategy | Enum | Query strategy. The strategy decides how the source system is queried. Possible values are: ‘access-complete-table’ and ‘query’. | access-complete-table |
writeStrategy | Enum | Write strategy. If this dataset is a sink, it can be selected if data is overwritten or appended. Possible values are: ‘update-table’ and ‘overwrite-table’ | default |
clearTableBeforeExecution | boolean | If set to true this will clear the specified table before executing a workflow that writes to it. | false |
user | String | Username. Must be empty in some cases e.g. if secret key and client id are used | empty string |
password | PasswordParameter | Password. Can be empty in some cases e.g. if secret key and client id are used | |
tokenEndpoint | String | URL for retrieving tokens, when using MS SQL Active Directory token based authentication. Can be found in the Azure AD Admin Center under OAuth2 endpoint or cab be constructed with the general endpoint URL combined with the tenant id and the suffix /outh/v2/authortized. | empty string |
spnName | String | Service Principal Name identifying the resource. Usually a static URL like https://database.windows.net. | empty string |
clientId | String | Client id or application id. Client id used for MS SQL token based authentication. String seperated by - char. | empty string |
clientSecret | PasswordParameter | Client secret. Client secret used for MS SQL token based authentication. Can be generated in Azure AD admin center. | |
restriction | String | An SQL WHERE clause to filter the records to be retrieved. | empty string |
retries | int | Optional number of retries per query | 0 |
pause | int | Optional pause between queries in ms. | 2000 |
charset | String | The source internal encoding, e.g., UTF-8, ISO-8859-1 | UTF-8 |
forceSparkExecution | boolean | If set to true, Spark will be used for querying the database, even if the local execution manager is configured. | false |
The identifier for this plugin is Jdbc
.
It can be found in the package com.eccenca.di.sql.jdbc
.
General usage
The JDBC dataset supports connections to Hive, Microsoft SQL Server, MySQL, Oracle Database, DB2 and PostgreSQL databases. A login and password and JDBC URL need to be provided. This dataset supports queries or simply schema and table names to define what should be retrieved from a source DB. If the dataset is used as a sink, queries are ignored and only schema and table parameters are used. If the dataset is used as a sink for a hierarchical mapping it behaves similar to the SqlEndpoint: One table is generated per entity type.
The names of the written tables are generated as follows:
- The table name of the root mapping is defined by the table parameter of the dataset. If the table name is empty, a name is generated from the first type of the mapping. Special characters are removed and the name shortened to maximum of 128 characters.
- For each object mapping, the table name is generated from its type.
JDBC Connnection Strings/URLs
Most of the dataset prameters are directly forwrded to the respective driver. Please make sure to use the correct syntax for each DBS as rather unintuitive errors might occur otherwise.
Here are templates for supported database systems:
oracle (external driver needed):
jdbc:oracle:thin:@{host}[:{port}]/{database}
postgres (integrated):
jdbc:postgresql://{host}[:{port}]/[{database}]
MySQL/MAriaDB (integrated):
jdbc:{mysql|mariadb}://{host}[:{port}]/[{database}]
SnowSQL (external driver needed):
jdbc:snowflake://}AWSAccount}.{AWS region}.snowflakecomputing.com?db={database}&schema={schema}
MSSqlServer (integrated):
jdbc:sqlserver://{host}[:{port}];databaseName={database}
H2 (integrated):
jdbc:h2:{file} or jdbc:h2:tcp://{host}:[{port}][/{database}]
DB2 (external driver needed):
jdbc:db2//{host}[:{port}]/{database}
Read and write strategies
There are multiple read and write strategies which can be selected depending on the purpose of the dataset in a workflow.
Read strategies decide how the database is queried:
- full-table: Queries or wraps a complete table. Only the DB schema and table name need to be set
- query: The given source query is passed along to the database. The table name is not necessary in this case but a valid query in the SQL-dialect of the source database system must be provided.
Write strategies decide how a new table is written:
- default: An error will occur if the table exists. If not a new one will be created.
- overwrite: The old table will be removed and a new one will be created.
- append: Data will be appended to the existing table. The schema of the data written has to be the same as the existing table schema.
Optimized Writing
Usually specific database systems have custom commands for loading large amounts of data, e.g. from a CSV file into a database table. For some DBMS and specific JDBC dataset configurations we support these optimized methods of loading data.
Supported DBMS:
- MySQL and MariaDB (full support for versions 8.0.19+ and 10.4+, resp.):
- if older DBMS versions are used some dataset options like ‘groupBy’ might not be supported but equivalent queries will
- the same is true when older driver jars then the one provided by eccenca are used
- both use the MariaDB JDBC driver
- uses
LOAD DATA LOCAL INFILE
internally - only applies when appending data to an existing table and having
Force Spark Execution
disabled - Both the server parameter
local_infile
and the client parameterallowLoadLocalInfile
must be enabled, e.g. by addingallowLoadLocalInfile=true
to the JDBC URL. For MySQL starting with version 8 thelocal_infile
parameter is by default disabled!
Registering JDBC drivers
More 3rd party databases are supported via adding their JDBC drivers to the classpath of Data Integration. Drivers are usually provided by the database manufactures.
If 32 bit and 64 bit versions are provided the latter is usually needed and should aways equal the bit-level of the JVM.
To make sure that the drivers are loaded correctly their class name (in case are jar contains multiple drivers) and location in the file system can be
set with the spark.sql.options.jdbc option in the dataintegration.conf
configuration file.
An example for adding both the DB2 and MySQL drivers to Data Integration configuration file spark.sql.options.*
section:
spark.sql.options {
...
# List of database identifiers to specify user provided JDBC drivers. The second part of the protocol of a JDBC URI (e.g. db2 from
# jdbc:db2://host:port) is used to specify the driver. For each protocol on the list a jar classname and optional download
# location can be provided.
jdbc.drivers = "db2,mysql"
# Some database systems use licenses that are to loose or restrictive for us to ship the drivers. Therefore a path
# to a jar file containing the driver and the name of driver can be specified here.
jdbc.db2.jar = "/home/user/Jars/db2jcc-db2jcc4.jar"
jdbc.mysql.jar = "/home/user/drivers/mysql.jar"
# Name of the actual driver class for each db
jdbc.db2.name = "com.ibm.db2.jcc.DB2Driver"
jdbc.mysql.name = "com.mysql.jdbc.Driver"
}
Recommended DBMS versions:
Microsoft SQL Server 2017: Older versions might work, but do not support the groupBy
parameter.
PostgreSQL 9.5: The groupBy
parameter needs at least version 8.4.
MySQL v8.0.19: Older versions do not support the groupBy
parameter.
DB2 v11.5.x: The groupBy
feature needs at least version 9.7 to function.
Oracle 12.2.x: The groupBy
feature does not work for versions prior to 11g Release 2.
These limitations are the same for JDBC drivers that are older than the fully supported databases.
Queries can achieve a similar outcome if groupBy
is not supported.
Excel (Google Drive)¤
Read data from a remote Google Spreadsheet.
Parameter | Type | Description | Default |
---|---|---|---|
url | String | Link to the document (‘share with anyone having a link’ must be enabled, URL parameters will be removed and corrected automatically). | no default |
streaming | boolean | Streaming enables reading and writing large Excels files. Warning: Be careful to disable streaming for large datasets (> 10MB), because of high memory consumption. | true |
invalidateCacheAfter | Duration | Duration until file based cache is invalidated. | PT5M |
linesToSkip | int | The number of lines to skip in the beginning when reading files. | 0 |
The identifier for this plugin is googlespreadsheet
.
It can be found in the package com.eccenca.di.gdrive
.
The dataset needs the document id of a “share via url” sheet on Google Drive as input. It will automatically correct the URL and add the “export as xlsx” option to a new URL that will be used to download an Excel Spreadsheet. The download will be cached and treated the same way as an xlsx file in the Excel Dataset.
Caching¤
The advanced parameter invalidateCacheAfter
allows the user to specify a duration of the file cache
after which it is refreshed.
A file based cache is created to avoid CAPTCHAs. During the caching and validation of the URL
access occurs with random wait times between 1 and 5 seconds.
The cache is invalidated after 5 minutes by default.
Neo4j¤
Neo4j graph
Parameter | Type | Description | Default |
---|---|---|---|
uri | String | The URL to the Neo4j instance | bolt://localhost:7687 |
user | String | The Neo4j username for basic authentication. | user |
password | PasswordParameter | The Neo4j password for basic authentication. | PASSWORD_PARAMETER:7LtZjhIrbTu9wze0gA4hPg== |
nodeLabel | String | Neo4j label for all entities to be covered by this dataset. When reading, all nodes with this label will be read. When writing, this label will be added to all generated nodes. If the dataset is cleared, only nodes with this label will be deleted. | Any |
clearBeforeExecution | boolean | If set to true, all nodes with the specified label will be removed, before executing a workflow that writes to this graph. | false |
The identifier for this plugin is neo4j
.
It can be found in the package com.eccenca.di.plugins.neo4j
.
Supports reading and writing Neo4j graphs. The following sections outline how graphs are generated and read back.
For more information about Neo4j, please refer to the Neo4j documentation.
Nodes¤
For each entity that is written to a Neo4j dataset, a node will be created.
A property uri
will be added to each generated node, which holds the URI of the original entity.
In applications, the URI property should be used instead of the node identifiers, which are auto-generated in Neo4j and do not represent stable URIs.
When reading nodes, the entity URIs will be generated based on that property.
At the moment, it’s not supported to read nodes that do not provide a uri
property.
Labels¤
Labels in Neo4j are used to group nodes into sets where all nodes that have a certain label belongs to the same set. Neo4j labels are comparable with classes in RDF (not to be confused with labels in RDF).
When writing entities to the Neo4j dataset, the following labels will be added to each generated node:
- For each entity type (such as the type set in a mapping), a label will be added to the node in Neo4j. Since types in eccenca DataIntegration are usually URIs, they will be converted according to the rules further down.
- The label as configured by the label parameter on the Neo4j dataset itself. This is typically used to identify all entities that have been written by a certain Neo4j dataset specification in the project. For instance, if two Neo4j dataset specifications are added to a project - both writing to the same Neo4j database - different labels can be set to distinguish both sets of entities. In that respect it may be used to model a similar concept as graphs in RDF.
Relationships¤
A relationship connects two nodes in Neo4j. Hierarchical mappings will generate relationships for all object mappings.
Relationships can be addressed with property paths in mappings. At the moment, only paths of length 1 are supported, i.e., it’s not possible to use non-property paths.
Handling of URIs¤
In eccenca DataIntegration, URIs are typically used to uniquely identify classes and properties. While URIs are central in RDF, Neo4j does allow arbitrary names and does not have any special support for URIs.
When generating Neo4j labels, properties and relationships, URIs will be shortened according to the following rules.
- If a registered project prefix matches a URI, a name {prefixName}_{localPart}
will be generated. For instance, http://xmlns.com/foaf/0.1/name
will become foaf_name
.
Note that underscores (_
) are used instead of colons (:
) to separate the namespace and the local name.
The reason is that colons are reserved in the Cypher query language and some tools don’t escape properly and fail on databases that use colons in names.
- If no project prefix matches a URI, the URI will be used verbatim. This will look ugly in Neo4j tools, so generally it’s recommended to define prefixes for all used namespaces.
When reading generated entities, the URIs of the classes and properties will be reconstructed based on the prefix table of the project. If the prefixes change between writing and reading, different URIs will be generated.
RDF vs. Neo4j terminology¤
Neo4j uses a different terminology than RDF or description logic. For users familiar with RDF, the following table shows the correspondent terms for some central concepts. This is meant to help understanding and does not aim to provide a precise mapping as there are semantic differences between Neo4j and RDF.
RDF | Neo4j |
---|---|
resource | node |
class | label |
datatype property | property |
object property | relationship |
graph | Do not exist in Neo4j, but labels can be used to mimic graphs. |
Excel (OneDrive, Office365)¤
Read data from a remote onedrive or Office365 Spreadsheet.
Parameter | Type | Description | Default |
---|---|---|---|
url | String | Link to the document (‘share with anyone having a link’ must be enabled). | no default |
streaming | boolean | Streaming enables reading and writing large Excels files. Warning: Be careful to disable streaming for large datasets (> 10MB), because of high memory consumption. | true |
invalidateCacheAfter | Duration | Duration until file based cache is invalidated. | PT5M |
linesToSkip | int | The number of lines to skip in the beginning when reading files. | 0 |
The identifier for this plugin is office365preadsheet
.
It can be found in the package com.eccenca.di.office365
.
The dataset needs the URL of a “share via link” sheet on Office 365/OneDrive as input. It will automatically construct a direct download URL, cache the download file handle it like an XLSX file in the Excel Dataset.
Notes¤
There are 2 types of URLs that can be shared:
- Onedrive links look like
https://1drv.ms/x/s!AucULvzmJ-dsdfsfgaIcyWP_XY_G4w?e=yx65uu
- Onedrive (based one sharepoint, for businesses) links look like
https://eccencagmbh-my.sharepoint.com/:x:/g/personal/person_eccenca_com/EdEMTEw1dclHiEZXyvy8P4YBit8wSyGsiwU5Kt__sQOZzw
The first type should always work as input for this dataset. The second type requires to set up an application in Azure Active Directory. Instructions can be found here: https://github.com/Azure-Samples/ms-identity-msal-java-samples/tree/main/4.%20Spring%20Framework%20Web%20App%20Tutorial/3-Authorization-II/protect-web-api#register-the-service-app-java-spring-resource-api
After following the steps access to sharepoint can be setup in the application.conf file for eccenca DataIntegration.
Example:
com.eccenca.di.office365 = {
authority = "https://login.microsoftonline.com/a0907dd1-f981-4c98-a8b9-1deb27bcf2cc/"
clientId = "4d14959d-3c62-4f90-a072-a96ca4b3fa9f"
secret = "Ceb8Q~QkMMV7TBK-ggB3nh22nUnqoDB1KTmkjj"
scope = "https://graph.microsoft.com/.default"
tenantId = "a0907dd1-f981-4c98-a8b9-1deb27bcf2cc"
}
Caching¤
The advanced parameter invalidateCacheAfter
allows the user to specify a duration of the file cache
after which it is refreshed.
A file based cache is created to avoid CAPTCHAs. During the caching and validation of the URL
access occurs with random wait times between 1 and 5 seconds.
The cache is invalidated after 5 minutes by default.
SPARQL endpoint¤
Connect to an existing SPARQL endpoint.
Parameter | Type | Description | Default |
---|---|---|---|
endpointURI | String | The URI of the SPARQL endpoint, e.g., http://dbpedia.org/sparql | no default |
login | String | Login required for authentication | null |
password | PasswordParameter | Password required for authentication | |
graph | String | Only retrieve entities from a specific graph | null |
pageSize | int | The number of solutions to be retrieved per SPARQL query. | 1000 |
entityList | MultilineStringParameter | A list of entities to be retrieved. If not given, all entities will be retrieved. Multiple entities are separated by whitespace. | |
pauseTime | int | The number of milliseconds to wait between subsequent query | 0 |
retryCount | int | The number of retries if a query fails | 3 |
retryPause | int | The number of milliseconds to wait until a failed query is retried. | 1000 |
queryParameters | String | Additional parameters to be appended to every request e.g. &soft-limit=1 | empty string |
strategy | Enum | The strategy use for retrieving entities: simple: Retrieve all entities using a single query; subQuery: Use a single query, but wrap it for improving the performance on Virtuoso; parallel: Use a separate Query for each entity property. | parallel |
useOrderBy | boolean | Include useOrderBy in queries to enforce correct order of values. | true |
clearGraphBeforeExecution | boolean | If set to true this will clear the specified graph before executing a workflow that writes to it. | false |
sparqlTimeout | int | SPARQL query timeout (select/update) in milliseconds. A value of zero means that the timeout configured via property is used (e.g. configured via silk.remoteSparqlEndpoint.defaults.read.timeout.ms). To overwrite the configured value specify a value greater than zero. | 0 |
The identifier for this plugin is sparqlEndpoint
.
It can be found in the package org.silkframework.plugins.dataset.rdf.datasets
.
Distance Measures¤
The following distance measures are available:
Characterbased¤
Character-based distance measures compare strings on the character level. They are well suited for handling typographical errors.
Is substring¤
Checks if a source value is a substring of a target value.
Parameter | Type | Description | Default |
---|---|---|---|
reverse | boolean | Reverse source and target inputs | false |
The identifier for this plugin is isSubstring
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Jaro distance¤
String similarity based on the Jaro distance metric.
This plugin does not require any parameters.
The identifier for this plugin is jaro
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Jaro-Winkler distance¤
String similarity based on the Jaro-Winkler distance measure.
This plugin does not require any parameters.
The identifier for this plugin is jaroWinkler
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Normalized Levenshtein distance¤
Normalized Levenshtein distance.
Parameter | Type | Description | Default |
---|---|---|---|
minChar | char | The minimum character that is used for indexing | 0 |
maxChar | char | The maximum character that is used for indexing | z |
The identifier for this plugin is levenshtein
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Levenshtein distance¤
Levenshtein distance. Returns a distance value between zero and the size of the string.
Parameter | Type | Description | Default |
---|---|---|---|
minChar | char | The minimum character that is used for indexing | 0 |
maxChar | char | The maximum character that is used for indexing | z |
The identifier for this plugin is levenshteinDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
qGrams¤
String similarity based on q-grams (by default q=2).
Parameter | Type | Description | Default |
---|---|---|---|
q | int | No description | 2 |
minChar | char | No description | 0 |
maxChar | char | No description | z |
The identifier for this plugin is qGrams
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Starts with¤
Returns success if the first string starts with the second string, failure otherwise.
Parameter | Type | Description | Default |
---|---|---|---|
reverse | boolean | Reverse source and target values | false |
minLength | int | The minimum length of the string being contained. | 2 |
maxLength | int | The potential maximum length of the strings that must match. If the max length is greater than the length of the string to match, the full string must match. | 2147483647 |
The identifier for this plugin is startsWith
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Substring comparison¤
Return 0 to 1 for strong similarity to weak similarity. Based on the paper: Stoilos, Giorgos, Giorgos Stamou, and Stefanos Kollias. “A string metric for ontology alignment.” The Semantic Web-ISWC 2005. Springer Berlin Heidelberg, 2005. 624-637.
Parameter | Type | Description | Default |
---|---|---|---|
granularity | String | The minimum length of a possible substring match. | 3 |
The identifier for this plugin is substringDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.characterbased
.
Equality¤
Constant¤
Always returns a constant similarity value.
Parameter | Type | Description | Default |
---|---|---|---|
value | double | No description | 1.0 |
The identifier for this plugin is constantDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
String equality¤
Checks for equality of the string representation of the given values. Returns success if string values are equal, failure otherwise. For a numeric comparison of values use the ‘Numeric Equality’ comparator.
This plugin does not require any parameters.
The identifier for this plugin is equality
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Greater than¤
Checks if the source value is greater than the target value.
Parameter | Type | Description | Default |
---|---|---|---|
orEqual | boolean | Accept equal values | false |
order | Enum | Per default, if both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used. Choose a more specific order for improved performance. | Autodetect |
reverse | boolean | Reverse source and target inputs | false |
The identifier for this plugin is greaterThan
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Inequality¤
Returns success if values are not equal, failure otherwise.
This plugin does not require any parameters.
The identifier for this plugin is inequality
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Lower than¤
Checks if the source value is lower than the target value.
Parameter | Type | Description | Default |
---|---|---|---|
orEqual | boolean | Accept equal values | false |
order | Enum | Per default, if both strings are numbers, numerical order is used for comparison. Otherwise, alphanumerical order is used. Choose a more specific order for improved performance. | Autodetect |
reverse | boolean | Reverse source and target inputs | false |
The identifier for this plugin is lowerThan
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Numeric equality¤
Compares values numerically instead of their string representation as the ‘String Equality’ operator does. Allows to set the needed precision of the comparison. A value of 0.0 means that the values must represent exactly the same (floating point) value, values higher than that allow for a margin of tolerance. Example: With a precision of 0.1, the following pairs of values will be considered equal: (1.3, 1.35), (0.0, 0.9999), (0.0, -0.90001), but following pairs will NOT match: (1.2, 1.30001), (1.0, 1.10001), (1.0, 0.89999).
Parameter | Type | Description | Default |
---|---|---|---|
precision | double | The range of tolerance in floating point number comparisons. Must be 0 or a non-negative number smaller than 1. | 0.0 |
The identifier for this plugin is numericEquality
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Relaxed equality¤
Return success if strings are equal, failure otherwise. Lower/upper case and differences like ö/o, n/ñ, c/ç etc. are treated as equal.
This plugin does not require any parameters.
The identifier for this plugin is relaxedEquality
.
It can be found in the package org.silkframework.rule.plugins.distance.equality
.
Language¤
CJK reading distance¤
CJK Reading Distance.
Parameter | Type | Description | Default |
---|---|---|---|
minChar | char | No description | 0 |
maxChar | char | No description | z |
The identifier for this plugin is cjkReadingDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.asian
.
Korean phoneme distance¤
Korean phoneme distance.
Parameter | Type | Description | Default |
---|---|---|---|
minChar | char | No description | 0 |
maxChar | char | No description | z |
The identifier for this plugin is koreanPhonemeDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.asian
.
Korean translit distance¤
Transliterated Korean distance.
Parameter | Type | Description | Default |
---|---|---|---|
minChar | char | No description | 0 |
maxChar | char | No description | z |
The identifier for this plugin is koreanTranslitDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.asian
.
Numeric¤
Compare physical quantities¤
Computes the distance between two physical quantities. The distance is normalized to the SI base unit of the dimension. For instance for lengths, the distance will be in metres. Comparing incompatible units will yield a validation error.
Parameter | Type | Description | Default |
---|---|---|---|
numberFormat | String | The IETF BCP 47 language tag, e.g., ‘en’. | en |
The identifier for this plugin is PhysicalQuantitiesDistance
.
It can be found in the package com.eccenca.di.measure
.
SI units and common derived units are supported. The following section lists all supported units. By default, all quantities are normalized to their base unit. For instance, lengths will be normalized to metres.
Time
Time is expressed in seconds (s). The following alternative units are supported: mo_s, mo_g, a, min, a_g, mo, mo_j, a_j, h, a_t, d.
Length
Length is expressed in metres (m). The following alternative units are supported: in, nmi, Ao, mil, yd, AU, ft, pc, fth, mi, hd.
Mass
Mass is expressed in kilograms (kg). The following alternative units are supported: lb, ston, t, stone, u, gr, lcwt, oz, g, scwt, dr, lton.
Electric current
Electric current is expressed in amperes (A). The following alternative units are supported: Bi, Gb.
Temperature
Temperature is expressed in kelvins (K). The following alternative units are supported: Cel.
Amount of substance
Amount of substance is expressed in moles (mol).
Luminous intensity
Luminous intensity is expressed in candelas (cd).
Area
Area is expressed in square metres (m²). The following alternative units are supported: m2, ar, syd, cml, b, sft, sin.
Volume
Volume is expressed in cubic metres (㎥). The following alternative units are supported: st, bf, cyd, cr, L, l, cin, cft, m3.
Energy
Energy is expressed in joules (J). The following alternative units are supported: cal_IT, eV, cal_m, cal, cal_th.
Angle
Angle is expressed in radians (rad). The following alternative units are supported: circ, gon, deg, ‘, ‘’.
Others
- 1/m, derived units: Ky
- kg/(m·s), derived units: P
- bit/s, derived units: Bd
- bit, derived units: By
- Sv
- N
- Ω, derived units: Ohm
- T, derived units: G
- sr, derived units: sph
- F
- C/kg, derived units: R
- cd/m², derived units: sb, Lmb
- Pa, derived units: bar, atm
- kg/(m·s²), derived units: att
- m²/s, derived units: St
- A/m, derived units: Oe
- kg·m²/s², derived units: erg
- kg/m³, derived units: g%
- mho
- V
- lx, derived units: ph
- m/s², derived units: Gal, m/s2
- m/s, derived units: kn
- m·kg/s², derived units: gf, lbf, dyn
- m²/s², derived units: RAD, REM
- C
- Gy
- Hz
- H
- lm
- W
- Wb, derived units: Mx
- Bq, derived units: Ci
- S
Date¤
The distance in days between two dates (‘YYYY-MM-DD’ format).
Parameter | Type | Description | Default |
---|---|---|---|
requireMonthAndDay | boolean | If true, no distance value will be generated if months or days are missing (e.g., 2019-11). If false, missing month or day fields will default to 1. | false |
The identifier for this plugin is date
.
It can be found in the package org.silkframework.rule.plugins.distance.numeric
.
DateTime¤
Distance between two date time values (xsd:dateTime format) in seconds.
This plugin does not require any parameters.
The identifier for this plugin is dateTime
.
It can be found in the package org.silkframework.rule.plugins.distance.numeric
.
Inside numeric interval¤
Checks if a number is contained inside a numeric interval, such as ‘1900 - 2000’.
Parameter | Type | Description | Default |
---|---|---|---|
separator | String | No description | — |
The identifier for this plugin is insideNumericInterval
.
It can be found in the package org.silkframework.rule.plugins.distance.numeric
.
Numeric similarity¤
Computes the numeric distance between two numbers.
Parameter | Type | Description | Default |
---|---|---|---|
minValue | double | No description | -Infinity |
maxValue | double | No description | Infinity |
The identifier for this plugin is num
.
It can be found in the package org.silkframework.rule.plugins.distance.numeric
.
Geographical distance¤
Computes the geographical distance between two points. Author: Konrad Höffner (MOLE subgroup of Research Group AKSW, University of Leipzig)
Parameter | Type | Description | Default |
---|---|---|---|
unit | String | No description | km |
The identifier for this plugin is wgs84
.
It can be found in the package org.silkframework.rule.plugins.distance.numeric
.
Tokenbased¤
While character-based distance measures work well for typographical errors, there are a number of tasks where token-base distance measures are better suited:
- Strings where parts are reordered e.g. “John Doe” and “Doe, John”
- Texts consisting of multiple words
Cosine¤
Cosine Distance Measure.
Parameter | Type | Description | Default |
---|---|---|---|
k | int | No description | 3 |
The identifier for this plugin is cosine
.
It can be found in the package org.silkframework.rule.plugins.distance.tokenbased
.
Dice coefficient¤
Dice similarity coefficient.
This plugin does not require any parameters.
The identifier for this plugin is dice
.
It can be found in the package org.silkframework.rule.plugins.distance.tokenbased
.
Jaccard¤
Jaccard similarity coefficient.
This plugin does not require any parameters.
The identifier for this plugin is jaccard
.
It can be found in the package org.silkframework.rule.plugins.distance.tokenbased
.
Soft Jaccard¤
Soft Jaccard similarity coefficient. Same as Jaccard distance but values within an levenhstein distance of ‘maxDistance’ are considered equivalent.
Parameter | Type | Description | Default |
---|---|---|---|
maxDistance | int | No description | 1 |
The identifier for this plugin is softjaccard
.
It can be found in the package org.silkframework.rule.plugins.distance.tokenbased
.
Token-wise distance¤
Token-wise string distance using the specified metric.
Parameter | Type | Description | Default |
---|---|---|---|
ignoreCase | boolean | No description | true |
metricName | String | No description | levenshtein |
splitRegex | String | No description | [\s\d\p{Punct}]+ |
stopwords | String | No description | empty string |
stopwordWeight | double | No description | 0.01 |
nonStopwordWeight | double | No description | 0.1 |
useIncrementalIdfWeights | boolean | No description | false |
matchThreshold | double | No description | 0.0 |
orderingImpact | double | No description | 0.0 |
adjustByTokenLength | boolean | No description | false |
The identifier for this plugin is tokenwiseDistance
.
It can be found in the package org.silkframework.rule.plugins.distance.tokenbased
.
Transformations¤
The following transform and normalization functions are available:
Combine¤
Concatenate¤
Concatenates strings from multiple inputs.
Parameter | Type | Description | Default |
---|---|---|---|
glue | String | Separator to be inserted between two concatenated strings. | empty string |
missingValuesAsEmptyStrings | boolean | Handle missing values as empty strings. | false |
The identifier for this plugin is concat
.
It can be found in the package org.silkframework.rule.plugins.transformer.combine
.
Examples
Returns [] for parameters [] and input values [].
Returns [a] for parameters [] and input values [[a]].
Returns [ab] for parameters [] and input values [[a], [b]].
Returns [First-Last] for parameters [glue -> -] and input values [[First], [Last]].
Returns [First-Second, First-Third] for parameters [glue -> -] and input values [[First], [Second, Third]].
Returns [First–Second] for parameters [glue -> -] and input values [[First], [], [Second]].
Returns [] for parameters [glue -> -] and input values [[First], [], [Second]].
Returns [First–Second] for parameters [glue -> -, missingValuesAsEmptyStrings -> true] and input values [[First], [], [Second]].
Concatenate multiple values¤
Concatenates multiple values received for an input. If applied to multiple inputs, yields at most one value per input. Optionally removes duplicate values.
Parameter | Type | Description | Default |
---|---|---|---|
glue | String | No description | empty string |
removeDuplicates | boolean | No description | false |
The identifier for this plugin is concatMultiValues
.
It can be found in the package org.silkframework.rule.plugins.transformer.combine
.
Examples
Returns [] for parameters [] and input values [].
Returns [a] for parameters [] and input values [[a]].
Returns [ab] for parameters [] and input values [[a, b]].
Returns [axb] for parameters [glue -> x] and input values [[a, b]].
Returns [ab, 12] for parameters [] and input values [[a, b], [1, 2]].
Merge¤
Merges the values of all inputs.
This plugin does not require any parameters.
The identifier for this plugin is merge
.
It can be found in the package org.silkframework.rule.plugins.transformer.combine
.
Examples
Returns [] for parameters [] and input values [].
Returns [a, b, c] for parameters [] and input values [[a, b], [c]].
Conditional¤
Contains all of¤
Accepts two inputs. If the first input contains all of the second input values it returns ‘true’, else ‘false’ is returned.
This plugin does not require any parameters.
The identifier for this plugin is containsAllOf
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Examples
Returns [true] for parameters [] and input values [[A, B, C], [A, B]].
Returns [false] for parameters [] and input values [[A, B, C], [A, D]].
Returns [false] for parameters [] and input values [[A, B, C], [D]].
Returns [true] for parameters [] and input values [[A, B, C], [A, B, C]].
Fails validation and thus returns [] for parameters [] and input values [[A, B, C], []].
Fails validation and thus returns [] for parameters [] and input values [[A], [A], [A]].
Fails validation and thus returns [] for parameters [] and input values [[A]].
Contains any of¤
Accepts two inputs. If the first input contains any of the second input values it returns ‘true’, else ‘false’ is returned.
This plugin does not require any parameters.
The identifier for this plugin is containsAnyOf
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Examples
Returns [true] for parameters [] and input values [[A, B, C], [A, B]].
Returns [true] for parameters [] and input values [[A, B, C], [A, D]].
Returns [false] for parameters [] and input values [[A, B, C], [D]].
Returns [true] for parameters [] and input values [[A, B, C], [A, B, C]].
Fails validation and thus returns [] for parameters [] and input values [[A, B, C], []].
Fails validation and thus returns [] for parameters [] and input values [[A], [A], [A]].
Fails validation and thus returns [] for parameters [] and input values [[A]].
If contains¤
Accepts two or three inputs. If the first input contains the given value, the second input is forwarded. Otherwise, the third input is forwarded (if present).
Parameter | Type | Description | Default |
---|---|---|---|
search | String | No description | no default |
The identifier for this plugin is ifContains
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Examples
Returns [this is a match] for parameters [search -> match] and input values [[matching string], [this is a match]].
Returns [] for parameters [search -> match] and input values [[different string], [this is a match]].
Returns [this is no match] for parameters [search -> match] and input values [[different string], [this is a match], [this is no match]].
If exists¤
Accepts two or three inputs. If the first input provides a value, the second input is forwarded. Otherwise, the third input is forwarded (if present).
This plugin does not require any parameters.
The identifier for this plugin is ifExists
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Examples
Returns [yes] for parameters [] and input values [[value], [yes], [no]].
Returns [no] for parameters [] and input values [[], [yes], [no]].
Returns [] for parameters [] and input values [[value], []].
If matches regex¤
Accepts two or three inputs.
If any value of the first input matches the regex, the second input is forwarded.
Otherwise, the third input is forwarded (if present).
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | No description | no default |
negate | boolean | No description | false |
The identifier for this plugin is ifMatchesRegex
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Negate binary (NOT)¤
Accepts one input, which is either ‘true’, ‘1’ or ‘false’, ‘0’ and negates it.
This plugin does not require any parameters.
The identifier for this plugin is negateTransformer
.
It can be found in the package org.silkframework.rule.plugins.transformer.conditional
.
Examples
Returns [1, 0, true, false, true, false] for parameters [] and input values [[0, 1, false, true, False, True]].
Fails validation and thus returns [] for parameters [] and input values [[falsee, true]].
Fails validation and thus returns [] for parameters [] and input values [[]].
Conversion¤
Convert charset¤
Convert the string from “sourceCharset” to “targetCharset”.
Parameter | Type | Description | Default |
---|---|---|---|
sourceCharset | String | No description | ISO-8859-1 |
targetCharset | String | No description | UTF-8 |
The identifier for this plugin is convertCharset
.
It can be found in the package org.silkframework.rule.plugins.transformer.conversion
.
Clean HTML¤
Cleans HTML using a tag white list and allows selection of HTML sections with xPath or cssSelector expressions. If the tag or attribute white lists are left empty default white lists will be used. The operator takes two inputs: the page HTML and (optional) the page Url which may be needed to resolve relative links in the page HTML.
Parameter | Type | Description | Default |
---|---|---|---|
tagWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
attributeWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
selectors | MultilineStringParameter | CSS or XPath queries for selection of content (or reference to a configuration). Comma separated. CssSelectors can be pipe separated for non-sequential execution. | no default |
method | Enum | Selects use of xPath or css selectors (‘xPath’ or ‘cssSelectors’). | xPath |
The identifier for this plugin is htmlCleaner
.
It can be found in the package com.eccenca.di.plugins.html
.
Date¤
Parse date¤
Parses and normalizes dates in different formats.
Parameter | Type | Description | Default |
---|---|---|---|
inputDateFormatId | Enum | The input date/time format used for parsing the date/time string. | w3c Date |
alternativeInputFormat | String | An input format string that should be used instead of the selected input format. Java DateFormat string. | empty string |
outputDateFormatId | Enum | The output date/time format used for parsing the date/time string. | w3c Date |
alternativeOutputFormat | String | An output format string that should be used instead of the selected output format. Java DateFormat string. | empty string |
The identifier for this plugin is DateTypeParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Examples
Returns [1999-03-20] for parameters [inputDateFormatId -> German style date format, outputDateFormatId -> w3c Date] and input values [[20.03.1999]].
Returns [20.03.1999] for parameters [inputDateFormatId -> w3c Date, outputDateFormatId -> German style date format] and input values [[1999-03-20]].
Returns [2017-04-04] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> w3c Date] and input values [[2017-04-04T00:00:00.000+02:00]].
Returns [2017-04-04] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> w3c Date] and input values [[2017-04-04T00:00:00+02:00]].
Returns [24-Jun-2021 14:50:05 +02:00] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> dateTime with month abbr. (US)] and input values [[2021-06-24T14:50:05.895+02:00]].
Returns [24-Dez.-2021 14:50:05 +02:00] for parameters [inputDateFormatId -> dateTime with month abbr. (US), outputDateFormatId -> dateTime with month abbr. (DE)] and input values [[24-Dec-2021 14:50:05 +02:00]].
Returns [1999-03-20T20:34.44] for parameters [alternativeInputFormat -> dd.MM.yyyy HH:mm.ss, alternativeOutputFormat -> yyyy-MM-dd’T’HH:mm.ss] and input values [[20.03.1999 20:34.44]].
Returns [12:20:00.000] for parameters [inputDateFormatId -> excelDateTime, outputDateFormatId -> xsdTime] and input values [[12:20:00.000]].
Returns [–01] for parameters [inputDateFormatId -> w3c YearMonth, outputDateFormatId -> w3c Month] and input values [[2020-01]].
Returns [—31] for parameters [inputDateFormatId -> w3c MonthDay, outputDateFormatId -> w3c Day] and input values [[–12-31]].
Returns [–12-31] for parameters [inputDateFormatId -> w3c Date, outputDateFormatId -> w3c MonthDay] and input values [[2020-12-31]].
Fails validation and thus returns [] for parameters [inputDateFormatId -> w3c MonthDay, outputDateFormatId -> w3c Date] and input values [[–12-31]].
Returns [2020-02-22T16:34:14] for parameters [alternativeInputFormat -> yyyy-MM-dd HHss.SSS, outputDateFormatId -> w3cDateTime] and input values [[2020-02-22 16:34:14.000]].
Compare dates¤
Compares two dates. Returns 1 if the comparison yields true and 0 otherwise. If there are multiple dates in both sets, the comparator must be true for all dates. For instance, {2014-08-02,2014-08-03} < {2014-08-03} yields 0 as not all dates in the first set are smaller than in the second.
Parameter | Type | Description | Default |
---|---|---|---|
comparator | Enum | No description | < |
The identifier for this plugin is compareDates
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Examples
Returns [1] for parameters [comparator -> <] and input values [[2017-01-01], [2017-01-02]].
Returns [0] for parameters [comparator -> <] and input values [[2017-01-02], [2017-01-01]].
Returns [1] for parameters [comparator -> >] and input values [[2017-01-02], [2017-01-01]].
Returns [0] for parameters [comparator -> >] and input values [[2017-01-01], [2017-01-02]].
Returns [1] for parameters [comparator -> =] and input values [[2017-01-01], [2017-01-01]].
Returns [0] for parameters [comparator -> =] and input values [[2017-01-02], [2017-01-01]].
Current date¤
Outputs the current date.
This plugin does not require any parameters.
The identifier for this plugin is currentDate
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Date to timestamp¤
Convert an xsd:dateTime to a timestamp. Returns the passed time since the Unix Epoch (1970-01-01).
Parameter | Type | Description | Default |
---|---|---|---|
unit | Enum | No description | milliseconds |
The identifier for this plugin is datetoTimestamp
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Examples
Returns [1499117572000] for parameters [] and input values [[2017-07-03T21:32:52Z]].
Returns [1499113972000] for parameters [] and input values [[2017-07-03T21:32:52+01:00]].
Returns [1499113972] for parameters [unit -> seconds] and input values [[2017-07-03T21:32:52+01:00]].
Returns [1499040000000] for parameters [] and input values [[2017-07-03]].
Duration¤
Computes the time difference between two data times.
This plugin does not require any parameters.
The identifier for this plugin is duration
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Duration in days¤
Converts an xsd:duration to days.
This plugin does not require any parameters.
The identifier for this plugin is durationInDays
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Duration in seconds¤
Converts an xsd:duration to seconds.
This plugin does not require any parameters.
The identifier for this plugin is durationInSeconds
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Duration in years¤
Converts an xsd:duration to years.
This plugin does not require any parameters.
The identifier for this plugin is durationInYears
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Number to duration¤
Converts a number to an xsd:duration.
Parameter | Type | Description | Default |
---|---|---|---|
unit | Enum | No description | day |
The identifier for this plugin is numberToDuration
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Parse date pattern¤
Parses a date based on a specified pattern, returning an xsd:date.
Parameter | Type | Description | Default |
---|---|---|---|
format | String | The date pattern used to parse the input values | dd-MM-yyyy |
lenient | boolean | If set to true, the parser tries to use heuristics to parse dates with invalid fields (such as a day of zero). | false |
The identifier for this plugin is parseDate
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Examples
Returns [2015-04-03] for parameters [format -> dd.MM.yyyy] and input values [[03.04.2015]].
Returns [2015-04-03] for parameters [format -> dd.MM.yyyy] and input values [[3.4.2015]].
Returns [2015-04-03] for parameters [format -> yyyyMMdd] and input values [[20150403]].
Fails validation and thus returns [] for parameters [format -> yyyyMMdd, lenient -> false] and input values [[20150000]].
Timestamp to date¤
Convert a timestamp to xsd:date format. Expects an integer that denotes the passed time since the Unix Epoch (1970-01-01)
Parameter | Type | Description | Default |
---|---|---|---|
format | String | Custom output format (e.g., ‘yyyy-MM-dd’). If left empty, a full xsd:dateTime (UTC) is returned. | empty string |
unit | Enum | No description | milliseconds |
The identifier for this plugin is timeToDate
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Examples
Returns [2017-07-03T21:32:52Z] for parameters [] and input values [[1499117572000]].
Returns [2017-07-03] for parameters [format -> yyyy-MM-dd] and input values [[1499040000000]].
Returns [2017-07-03] for parameters [format -> yyyy-MM-dd, unit -> seconds] and input values [[1499040000]].
Validate date after¤
Validates if the first input date is after the second input date. Outputs the first input if the validation is successful.
Parameter | Type | Description | Default |
---|---|---|---|
allowEqual | boolean | Allow both dates to be equal. | false |
The identifier for this plugin is validateDateAfter
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Examples
Fails validation and thus returns [] for parameters [] and input values [[2015-04-02], [2015-04-03]].
Returns [2015-04-04] for parameters [] and input values [[2015-04-04], [2015-04-03]].
Returns [2015-04-03] for parameters [allowEqual -> true] and input values [[2015-04-03], [2015-04-03]].
Fails validation and thus returns [] for parameters [allowEqual -> false] and input values [[2015-04-03], [2015-04-03]].
Validate date range¤
Validates if dates are within a specified range.
Parameter | Type | Description | Default |
---|---|---|---|
minDate | String | Earliest allowed date in YYYY-MM-DD | no default |
maxDate | String | Latest allowed data in YYYY-MM-DD | no default |
The identifier for this plugin is validateDateRange
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Validate numeric range¤
Validates if a number is within a specified range.
Parameter | Type | Description | Default |
---|---|---|---|
min | double | Minimum allowed number | no default |
max | double | Maximum allowed number | no default |
The identifier for this plugin is validateNumericRange
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Excel¤
Abs¤
Excel ABS(number): Returns the absolute value of the given number.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ABS |
The identifier for this plugin is Excel_ABS
.
It can be found in the package com.eccenca.di.excel
.
Acos¤
Excel ACOS(number): Returns the inverse cosine of the given number in radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ACOS |
The identifier for this plugin is Excel_ACOS
.
It can be found in the package com.eccenca.di.excel
.
Acosh¤
Excel ACOSH(number): Returns the inverse hyperbolic cosine of the given number in radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ACOSH |
The identifier for this plugin is Excel_ACOSH
.
It can be found in the package com.eccenca.di.excel
.
And¤
Excel AND(argument1; argument2 …argument30): Returns TRUE if all the arguments are considered TRUE, and FALSE otherwise.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | AND |
The identifier for this plugin is Excel_AND
.
It can be found in the package com.eccenca.di.excel
.
Asin¤
Excel ASIN(number): Returns the inverse sine of the given number in radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ASIN |
The identifier for this plugin is Excel_ASIN
.
It can be found in the package com.eccenca.di.excel
.
Asinh¤
Excel ASINH(number): Returns the inverse hyperbolic sine of the given number in radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ASINH |
The identifier for this plugin is Excel_ASINH
.
It can be found in the package com.eccenca.di.excel
.
Atan¤
Excel ATAN(number): Returns the inverse tangent of the given number in radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ATAN |
The identifier for this plugin is Excel_ATAN
.
It can be found in the package com.eccenca.di.excel
.
Atan2¤
Excel ATAN2(number_x; number_y): Returns the inverse tangent of the specified x and y coordinates. Number_x is the value for the x coordinate. Number_y is the value for the y coordinate.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ATAN2 |
The identifier for this plugin is Excel_ATAN2
.
It can be found in the package com.eccenca.di.excel
.
Atanh¤
Excel ATANH(number): Returns the inverse hyperbolic tangent of the given number. (Angle is returned in radians.)
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ATANH |
The identifier for this plugin is Excel_ATANH
.
It can be found in the package com.eccenca.di.excel
.
Avedev¤
Excel AVEDEV(number1; number2; … number_30): Returns the average of the absolute deviations of data points from their mean. Displays the diffusion in a data set. Number_1; number_2; … number_30 are values or ranges that represent a sample. Each number can also be replaced by a reference.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | AVEDEV |
The identifier for this plugin is Excel_AVEDEV
.
It can be found in the package com.eccenca.di.excel
.
Average¤
Excel AVERAGE(number_1; number_2; … number_30): Returns the average of the arguments. Number_1; number_2; … number_30 are numerical values or ranges. Text is ignored.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | AVERAGE |
The identifier for this plugin is Excel_AVERAGE
.
It can be found in the package com.eccenca.di.excel
.
Ceiling¤
Excel CEILING(number; significance; mode): Rounds the given number to the nearest integer or multiple of significance. Significance is the value to whose multiple of ten the value is to be rounded up (.01, .1, 1, 10, etc.). Mode is an optional value. If it is indicated and non-zero and if the number and significance are negative, rounding up is carried out based on that value.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | CEILING |
The identifier for this plugin is Excel_CEILING
.
It can be found in the package com.eccenca.di.excel
.
Choose¤
Excel CHOOSE(index; value1; … value30): Uses an index to return a value from a list of up to 30 values. Index is a reference or number between 1 and 30 indicating which value is to be taken from the list. Value1; … value30 is the list of values entered as a reference to a cell or as individual values.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | CHOOSE |
The identifier for this plugin is Excel_CHOOSE
.
It can be found in the package com.eccenca.di.excel
.
Clean¤
Excel CLEAN(text): Removes all non-printing characters from the string. Text refers to the text from which to remove all non-printable characters.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | CLEAN |
The identifier for this plugin is Excel_CLEAN
.
It can be found in the package com.eccenca.di.excel
.
Code¤
Excel CODE(text): Returns a numeric code for the first character in a text string. Text is the text for which the code of the first character is to be found.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | CODE |
The identifier for this plugin is Excel_CODE
.
It can be found in the package com.eccenca.di.excel
.
Combin¤
Excel COMBIN(count_1; count_2): Returns the number of combinations for a given number of objects. Count_1 is the total number of elements. Count_2 is the selected count from the elements. This is the same as the nCr function on a calculator.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | COMBIN |
The identifier for this plugin is Excel_COMBIN
.
It can be found in the package com.eccenca.di.excel
.
Cos¤
Excel COS(number): Returns the cosine of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | COS |
The identifier for this plugin is Excel_COS
.
It can be found in the package com.eccenca.di.excel
.
Cosh¤
Excel COSH(number): Returns the hyperbolic cosine of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | COSH |
The identifier for this plugin is Excel_COSH
.
It can be found in the package com.eccenca.di.excel
.
Count¤
Excel COUNT(value_1; value_2; … value_30): Counts how many numbers are in the list of arguments. Text entries are ignored. Value_1; value_2; … value_30 are values or ranges which are to be counted.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | COUNT |
The identifier for this plugin is Excel_COUNT
.
It can be found in the package com.eccenca.di.excel
.
Counta¤
Excel COUNTA(value_1; value_2; … value_30): Counts how many values are in the list of arguments. Text entries are also counted, even when they contain an empty string of length 0. If an argument is an array or reference, empty cells within the array or reference are ignored. value_1; value_2; … value_30 are up to 30 arguments representing the values to be counted.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | COUNTA |
The identifier for this plugin is Excel_COUNTA
.
It can be found in the package com.eccenca.di.excel
.
Degrees¤
Excel DEGREES(number): Converts the given number in radians to degrees.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | DEGREES |
The identifier for this plugin is Excel_DEGREES
.
It can be found in the package com.eccenca.di.excel
.
Devsq¤
Excel DEVSQ(number_1; number_2; … number_30): Returns the sum of squares of deviations based on a sample mean. Number_1; number_2; … number_30 are numerical values or ranges representing a sample.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | DEVSQ |
The identifier for this plugin is Excel_DEVSQ
.
It can be found in the package com.eccenca.di.excel
.
Even¤
Excel EVEN(number): Rounds the given number up to the nearest even integer.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | EVEN |
The identifier for this plugin is Excel_EVEN
.
It can be found in the package com.eccenca.di.excel
.
Exact¤
Excel EXACT(text_1; text_2): Compares two text strings and returns TRUE if they are identical. This function is case- sensitive. Text_1 is the first text to compare. Text_2 is the second text to compare.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | EXACT |
The identifier for this plugin is Excel_EXACT
.
It can be found in the package com.eccenca.di.excel
.
Exp¤
Excel EXP(number): Returns e raised to the power of the given number.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | EXP |
The identifier for this plugin is Excel_EXP
.
It can be found in the package com.eccenca.di.excel
.
Fact¤
Excel FACT(number): Returns the factorial of the given number.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | FACT |
The identifier for this plugin is Excel_FACT
.
It can be found in the package com.eccenca.di.excel
.
False¤
Excel FALSE(): Set the logical value to FALSE. The FALSE() function does not require any arguments.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | FALSE |
The identifier for this plugin is Excel_FALSE
.
It can be found in the package com.eccenca.di.excel
.
Find¤
Excel FIND(find_text; text; position): Looks for a string of text within another string. Where to begin the search can also be defined. The search term can be a number or any string of characters. The search is case-sensitive. Find_text is the text to be found. Text is the text where the search takes place. Position (optional) is the position in the text from which the search starts.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | FIND |
The identifier for this plugin is Excel_FIND
.
It can be found in the package com.eccenca.di.excel
.
Floor¤
Excel FLOOR(number; significance; mode): Rounds the given number down to the nearest multiple of significance. Significance is the value to whose multiple of ten the number is to be rounded down (.01, .1, 1, 10, etc.). Mode is an optional value. If it is indicated and non-zero and if the number and significance are negative, rounding up is carried out based on that value.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | FLOOR |
The identifier for this plugin is Excel_FLOOR
.
It can be found in the package com.eccenca.di.excel
.
Fv¤
Excel FV(rate; NPER; PMT; PV; type): Returns the future value of an investment based on periodic, constant payments and a constant interest rate. Rate is the periodic interest rate. NPER is the total number of periods. PMT is the annuity paid regularly per period. PV (optional) is the present cash value of an investment. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | FV |
The identifier for this plugin is Excel_FV
.
It can be found in the package com.eccenca.di.excel
.
Geomean¤
Excel GEOMEAN(number_1; number_2; … number_30): Returns the geometric mean of a sample. Number_1; number_2; … number_30 are numerical arguments or ranges that represent a random sample.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | GEOMEAN |
The identifier for this plugin is Excel_GEOMEAN
.
It can be found in the package com.eccenca.di.excel
.
If¤
Excel IF(test; then_value; otherwise_value): Returns different values based on the test value. Note that in this implementation it will not actually evaluate logical conditions. Then_value is the value that is returned if the test is TRUE. Otherwise_value (optional) is the value that is returned if the test is FALSE.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | IF |
The identifier for this plugin is Excel_IF
.
It can be found in the package com.eccenca.di.excel
.
Int¤
Excel INT(number): Rounds the given number down to the nearest integer.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | INT |
The identifier for this plugin is Excel_INT
.
It can be found in the package com.eccenca.di.excel
.
Intercept¤
Excel INTERCEPT(data_Y; data_X): Calculates the y-value at which a line will intersect the y-axis by using known x-values and y-values. Data_Y is the dependent set of observations or data. Data_X is the independent set of observations or data. Names, arrays or references containing numbers must be used here. Numbers can also be entered directly.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | INTERCEPT |
The identifier for this plugin is Excel_INTERCEPT
.
It can be found in the package com.eccenca.di.excel
.
Ipmt¤
Excel IPMT(rate; period; NPER; PV; FV; type): Calculates the periodic amortization for an investment with regular payments and a constant interest rate. Rate is the periodic interest rate. Period is the period for which the compound interest is calculated. NPER is the total number of periods during which annuity is paid. Period=NPER, if compound interest for the last period is calculated. PV is the present cash value in sequence of payments. FV (optional) is the desired value (future value) at the end of the periods. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | IPMT |
The identifier for this plugin is Excel_IPMT
.
It can be found in the package com.eccenca.di.excel
.
Irr¤
Excel IRR(values; guess): Calculates the internal rate of return for an investment. The values represent cash flow values at regular intervals; at least one value must be negative (payments), and at least one value must be positive (income). Values is an array containing the values. Guess (optional) is the estimated value. If you can provide only a few values, you should provide an initial guess to enable the iteration.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | IRR |
The identifier for this plugin is Excel_IRR
.
It can be found in the package com.eccenca.di.excel
.
Large¤
Excel LARGE(data; rank_c): Returns the Rank_c-th largest value in a data set. Data is the cell range of data. Rank_c is the ranking of the value (2nd largest, 3rd largest, etc.) written as an integer.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | LARGE |
The identifier for this plugin is Excel_LARGE
.
It can be found in the package com.eccenca.di.excel
.
Left¤
Excel LEFT(text; number): Returns the first character or characters in a text string. Text is the text where the initial partial words are to be determined. Number (optional) is the number of characters for the start text. If this parameter is not defined, one character is returned.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | LEFT |
The identifier for this plugin is Excel_LEFT
.
It can be found in the package com.eccenca.di.excel
.
Ln¤
Excel LN(number): Returns the natural logarithm based on the constant e of the given number.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | LN |
The identifier for this plugin is Excel_LN
.
It can be found in the package com.eccenca.di.excel
.
Log¤
Excel LOG(number; base): Returns the logarithm of the given number to the specified base. Base is the base for the logarithm calculation.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | LOG |
The identifier for this plugin is Excel_LOG
.
It can be found in the package com.eccenca.di.excel
.
Log10¤
Excel LOG10(number): Returns the base-10 logarithm of the given number.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | LOG10 |
The identifier for this plugin is Excel_LOG10
.
It can be found in the package com.eccenca.di.excel
.
Max¤
Excel MAX(number_1; number_2; … number_30): Returns the maximum value in a list of arguments. Number_1; number_2; … number_30 are numerical values or ranges.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MAX |
The identifier for this plugin is Excel_MAX
.
It can be found in the package com.eccenca.di.excel
.
Maxa¤
Excel MAXA(value_1; value_2; … value_30): Returns the maximum value in a list of arguments. Unlike MAX, text can be entered. The value of the text is 0. Value_1; value_2; … value_30 are values or ranges.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MAXA |
The identifier for this plugin is Excel_MAXA
.
It can be found in the package com.eccenca.di.excel
.
Median¤
Excel MEDIAN(number_1; number_2; … number_30): Returns the median of a set of numbers. Number_1; number_2; … number_30 are values or ranges, which represent a sample. Each number can also be replaced by a reference.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MEDIAN |
The identifier for this plugin is Excel_MEDIAN
.
It can be found in the package com.eccenca.di.excel
.
Mid¤
Excel MID(text; start; number): Returns a text segment of a character string. The parameters specify the starting position and the number of characters. Text is the text containing the characters to extract. Start is the position of the first character in the text to extract. Number is the number of characters in the part of the text.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MID |
The identifier for this plugin is Excel_MID
.
It can be found in the package com.eccenca.di.excel
.
Min¤
Excel MIN(number_1; number_2; … number_30): Returns the minimum value in a list of arguments. Number_1; number_2; … number_30 are numerical values or ranges.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MIN |
The identifier for this plugin is Excel_MIN
.
It can be found in the package com.eccenca.di.excel
.
Mina¤
Excel MINA(value_1; value_2; … value_30): Returns the minimum value in a list of arguments. Here text can also be entered. The value of the text is 0. Value_1; value_2; … value_30 are values or ranges.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MINA |
The identifier for this plugin is Excel_MINA
.
It can be found in the package com.eccenca.di.excel
.
Mirr¤
Excel MIRR(values; investment; reinvest_rate): Calculates the modified internal rate of return of a series of investments. Values corresponds to the array or the cell reference for cells whose content corresponds to the payments. Investment is the rate of interest of the investments (the negative values of the array) Reinvest_rate is the rate of interest of the reinvestment (the positive values of the array).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MIRR |
The identifier for this plugin is Excel_MIRR
.
It can be found in the package com.eccenca.di.excel
.
Mod¤
Excel MOD(dividend; divisor): Returns the remainder after a number is divided by a divisor. Dividend is the number which will be divided by the divisor. Divisor is the number by which to divide the dividend.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MOD |
The identifier for this plugin is Excel_MOD
.
It can be found in the package com.eccenca.di.excel
.
Mode¤
Excel MODE(number_1; number_2; … number_30): Returns the most common value in a data set. Number_1; number_2; … number_30 are numerical values or ranges. If several values have the same frequency, it returns the smallest value. An error occurs when a value does not appear twice.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | MODE |
The identifier for this plugin is Excel_MODE
.
It can be found in the package com.eccenca.di.excel
.
Not¤
Excel NOT(logical_value): Reverses the logical value. Logical_value is any value to be reversed.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | NOT |
The identifier for this plugin is Excel_NOT
.
It can be found in the package com.eccenca.di.excel
.
Nper¤
Excel NPER(rate; PMT; PV; FV; type): Returns the number of periods for an investment based on periodic, constant payments and a constant interest rate. Rate is the periodic interest rate. PMT is the constant annuity paid in each period. PV is the present value (cash value) in a sequence of payments. FV (optional) is the future value, which is reached at the end of the last period. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | NPER |
The identifier for this plugin is Excel_NPER
.
It can be found in the package com.eccenca.di.excel
.
Npv¤
Excel NPV(Rate; value_1; value_2; … value_30): Returns the net present value of an investment based on a series of periodic cash flows and a discount rate. Rate is the discount rate for a period. Value_1; value_2;… value_30 are values representing deposits or withdrawals.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | NPV |
The identifier for this plugin is Excel_NPV
.
It can be found in the package com.eccenca.di.excel
.
Odd¤
Excel ODD(number): Rounds the given number up to the nearest odd integer.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ODD |
The identifier for this plugin is Excel_ODD
.
It can be found in the package com.eccenca.di.excel
.
Or¤
Excel OR(logical_value_1; logical_value_2; …logical_value_30): Returns TRUE if at least one argument is TRUE. Returns the value FALSE if all the arguments have the logical value FALSE. Logical_value_1; logical_value_2; …logical_value_30 are conditions to be checked. All conditions can be either TRUE or FALSE. If a range is entered as a parameter, the function uses the value from the range that is in the current column or row.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | OR |
The identifier for this plugin is Excel_OR
.
It can be found in the package com.eccenca.di.excel
.
Percentile¤
Excel PERCENTILE(data; alpha): Returns the alpha-percentile of data values in an array. Data is the array of data. Alpha is the percentage of the scale between 0 and 1.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PERCENTILE |
The identifier for this plugin is Excel_PERCENTILE
.
It can be found in the package com.eccenca.di.excel
.
Pi¤
Excel PI(): Returns the value of PI to fourteen decimal places.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PI |
The identifier for this plugin is Excel_PI
.
It can be found in the package com.eccenca.di.excel
.
Pmt¤
Excel PMT(rate; NPER; PV; FV; type): Returns the periodic payment for an annuity with constant interest rates. Rate is the periodic interest rate. NPER is the number of periods in which annuity is paid. PV is the present value (cash value) in a sequence of payments. FV (optional) is the desired value (future value) to be reached at the end of the periodic payments. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PMT |
The identifier for this plugin is Excel_PMT
.
It can be found in the package com.eccenca.di.excel
.
Poisson¤
Excel POISSON(number; mean; C): Returns the Poisson distribution for the given Number. Mean is the middle value of the Poisson distribution. C = 0 calculates the density function, and C = 1 calculates the distribution.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | POISSON |
The identifier for this plugin is Excel_POISSON
.
It can be found in the package com.eccenca.di.excel
.
Power¤
Excel POWER(base; power): Returns the result of a number raised to a power. Base is the number that is to be raised to the given power. Power is the exponent by which the base is to be raised.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | POWER |
The identifier for this plugin is Excel_POWER
.
It can be found in the package com.eccenca.di.excel
.
Ppmt¤
Excel PPMT(rate; period; NPER; PV; FV; type): Returns for a given period the payment on the principal for an investment that is based on periodic and constant payments and a constant interest rate. Rate is the periodic interest rate. Period is the amortization period. NPER is the total number of periods during which annuity is paid. PV is the present value in the sequence of payments. FV (optional) is the desired (future) value. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PPMT |
The identifier for this plugin is Excel_PPMT
.
It can be found in the package com.eccenca.di.excel
.
Product¤
Excel PRODUCT(number 1 to 30): Multiplies all the numbers given as arguments and returns the product. Number 1 to number 30 are up to 30 arguments whose product is to be calculated, separated by semi-colons.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PRODUCT |
The identifier for this plugin is Excel_PRODUCT
.
It can be found in the package com.eccenca.di.excel
.
Proper¤
Excel PROPER(text): Capitalizes the first letter in all words of a text string. Text is the text to be converted.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PROPER |
The identifier for this plugin is Excel_PROPER
.
It can be found in the package com.eccenca.di.excel
.
Pv¤
Excel PV(rate; NPER; PMT; FV; type): Returns the present value of an investment resulting from a series of regular payments. Rate defines the interest rate per period. NPER is the total number of payment periods. PMT is the regular payment made per period. FV (optional) defines the future value remaining after the final installment has been made. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | PV |
The identifier for this plugin is Excel_PV
.
It can be found in the package com.eccenca.di.excel
.
Radians¤
Excel RADIANS(number): Converts the given number in degrees to radians.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | RADIANS |
The identifier for this plugin is Excel_RADIANS
.
It can be found in the package com.eccenca.di.excel
.
Rand¤
Excel RAND(): Returns a random number between 0 and 1.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | RAND |
The identifier for this plugin is Excel_RAND
.
It can be found in the package com.eccenca.di.excel
.
Rank¤
Excel RANK(value; data; type): Returns the rank of the given Value in a sample. Data is the array or range of data in the sample. Type (optional) is the sequence order, either ascending (0) or descending (1).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | RANK |
The identifier for this plugin is Excel_RANK
.
It can be found in the package com.eccenca.di.excel
.
Rate¤
Excel RATE(NPER; PMT; PV; FV; type; guess): Returns the constant interest rate per period of an annuity. NPER is the total number of periods, during which payments are made (payment period). PMT is the constant payment (annuity) paid during each period. PV is the cash value in the sequence of payments. FV (optional) is the future value, which is reached at the end of the periodic payments. Type (optional) defines whether the payment is due at the beginning (1) or the end (0) of a period. Guess (optional) determines the estimated value of the interest with iterative calculation.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | RATE |
The identifier for this plugin is Excel_RATE
.
It can be found in the package com.eccenca.di.excel
.
Replace¤
Excel REPLACE(text; position; length; new_text): Replaces part of a text string with a different text string. This function can be used to replace both characters and numbers (which are automatically converted to text). The result of the function is always displayed as text. To perform further calculations with a number which has been replaced by text, convert it back to a number using the VALUE function. Any text containing numbers must be enclosed in quotation marks so it is not interpreted as a number and automatically converted to text. Text is text of which a part will be replaced. Position is the position within the text where the replacement will begin. Length is the number of characters in text to be replaced. New_text is the text which replaces text..
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | REPLACE |
The identifier for this plugin is Excel_REPLACE
.
It can be found in the package com.eccenca.di.excel
.
Rept¤
Excel REPT(text; number): Repeats a character string by the given number of copies. Text is the text to be repeated. Number is the number of repetitions. The result can be a maximum of 255 characters.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | REPT |
The identifier for this plugin is Excel_REPT
.
It can be found in the package com.eccenca.di.excel
.
Right¤
Excel RIGHT(text; number): Defines the last character or characters in a text string. Text is the text of which the right part is to be determined. Number (optional) is the number of characters from the right part of the text.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | RIGHT |
The identifier for this plugin is Excel_RIGHT
.
It can be found in the package com.eccenca.di.excel
.
Roman¤
Excel ROMAN(number; mode): Converts a number into a Roman numeral. The value range must be between 0 and 3999; the modes can be integers from 0 to 4. Number is the number that is to be converted into a Roman numeral. Mode (optional) indicates the degree of simplification. The higher the value, the greater is the simplification of the Roman numeral.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ROMAN |
The identifier for this plugin is Excel_ROMAN
.
It can be found in the package com.eccenca.di.excel
.
Round¤
Excel ROUND(number; count): Rounds the given number to a certain number of decimal places according to valid mathematical criteria. Count (optional) is the number of the places to which the value is to be rounded. If the count parameter is negative, only the whole number portion is rounded. It is rounded to the place indicated by the count.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ROUND |
The identifier for this plugin is Excel_ROUND
.
It can be found in the package com.eccenca.di.excel
.
Rounddown¤
Excel ROUNDDOWN(number; count): Rounds the given number. Count (optional) is the number of digits to be rounded down to. If the count parameter is negative, only the whole number portion is rounded. It is rounded to the place indicated by the count.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ROUNDDOWN |
The identifier for this plugin is Excel_ROUNDDOWN
.
It can be found in the package com.eccenca.di.excel
.
Roundup¤
Excel ROUNDUP(number; count): Rounds the given number up. Count (optional) is the number of digits to which rounding up is to be done. If the count parameter is negative, only the whole number portion is rounded. It is rounded to the place indicated by the count.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | ROUNDUP |
The identifier for this plugin is Excel_ROUNDUP
.
It can be found in the package com.eccenca.di.excel
.
Search¤
Excel SEARCH(find_text; text; position): Returns the position of a text segment within a character string. The start of the search can be set as an option. The search text can be a number or any sequence of characters. The search is not case-sensitive. The search supports regular expressions. Find_text is the text to be searched for. Text is the text where the search will take place. Position (optional) is the position in the text where the search is to start.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SEARCH |
The identifier for this plugin is Excel_SEARCH
.
It can be found in the package com.eccenca.di.excel
.
Sign¤
Excel SIGN(number): Returns the sign of the given number. The function returns the result 1 for a positive sign, 1 for a negative sign, and 0 for zero.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SIGN |
The identifier for this plugin is Excel_SIGN
.
It can be found in the package com.eccenca.di.excel
.
Sin¤
Excel SIN(number): Returns the sine of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SIN |
The identifier for this plugin is Excel_SIN
.
It can be found in the package com.eccenca.di.excel
.
Sinh¤
Excel SINH(number): Returns the hyperbolic sine of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SINH |
The identifier for this plugin is Excel_SINH
.
It can be found in the package com.eccenca.di.excel
.
Slope¤
Excel SLOPE(data_Y; data_X): Returns the slope of the linear regression line. Data_Y is the array or matrix of Y data. Data_X is the array or matrix of X data.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SLOPE |
The identifier for this plugin is Excel_SLOPE
.
It can be found in the package com.eccenca.di.excel
.
Small¤
Excel SMALL(data; rank_c): Returns the Rank_c-th smallest value in a data set. Data is the cell range of data. Rank_c is the rank of the value (2nd smallest, 3rd smallest, etc.) written as an integer.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SMALL |
The identifier for this plugin is Excel_SMALL
.
It can be found in the package com.eccenca.di.excel
.
Sqrt¤
Excel SQRT(number): Returns the positive square root of the given number. The value of the number must be positive.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SQRT |
The identifier for this plugin is Excel_SQRT
.
It can be found in the package com.eccenca.di.excel
.
Stdev¤
Excel STDEV(number_1; number_2; … number_30): Estimates the standard deviation based on a sample. Number_1; number_2; … number_30 are numerical values or ranges representing a sample based on an entire population.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | STDEV |
The identifier for this plugin is Excel_STDEV
.
It can be found in the package com.eccenca.di.excel
.
Substitute¤
Excel SUBSTITUTE(text; search_text; new text; occurrence): Substitutes new text for old text in a string. Text is the text in which text segments are to be exchanged. Search_text is the text segment that is to be replaced (a number of times). New text is the text that is to replace the text segment. Occurrence (optional) indicates how many occurrences of the search text are to be replaced. If this parameter is missing, the search text is replaced throughout.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUBSTITUTE |
The identifier for this plugin is Excel_SUBSTITUTE
.
It can be found in the package com.eccenca.di.excel
.
Sum¤
Excel SUM(number_1; number_2; … number_30): Adds all the numbers in a range of cells. Number_1; number_2;… number_30 are up to 30 arguments whose sum is to be calculated. You can also enter a range using cell references.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUM |
The identifier for this plugin is Excel_SUM
.
It can be found in the package com.eccenca.di.excel
.
Sumproduct¤
Excel SUMPRODUCT(array 1; array 2; …array 30): Multiplies corresponding elements in the given arrays, and returns the sum of those products. Array 1; array 2;…array 30 are arrays whose corresponding elements are to be multiplied. At least one array must be part of the argument list. If only one array is given, all array elements are summed.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUMPRODUCT |
The identifier for this plugin is Excel_SUMPRODUCT
.
It can be found in the package com.eccenca.di.excel
.
Sumsq¤
Excel SUMSQ(number_1; number_2; … number_30): Calculates the sum of the squares of numbers (totaling up of the squares of the arguments) Number_1; number_2;… number_30 are up to 30 arguments, the sum of whose squares is to be calculated.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUMSQ |
The identifier for this plugin is Excel_SUMSQ
.
It can be found in the package com.eccenca.di.excel
.
Sumx2my2¤
Excel SUMX2MY2(array_X; array_Y): Returns the sum of the difference of squares of corresponding values in two arrays. Array_X is the first array whose elements are to be squared and added. Array_Y is the second array whose elements are to be squared and subtracted.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUMX2MY2 |
The identifier for this plugin is Excel_SUMX2MY2
.
It can be found in the package com.eccenca.di.excel
.
Sumx2py2¤
Excel SUMX2PY2(array_X; array_Y): Returns the sum of the sum of squares of corresponding values in two arrays. Array_X is the first array whose arguments are to be squared and added. Array_Y is the second array, whose elements are to be added and squared.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUMX2PY2 |
The identifier for this plugin is Excel_SUMX2PY2
.
It can be found in the package com.eccenca.di.excel
.
Sumxmy2¤
Excel SUMXMY2(array_X; array_Y): Adds the squares of the variance between corresponding values in two arrays. Array_X is the first array whose elements are to be subtracted and squared. Array_Y is the second array, whose elements are to be subtracted and squared.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | SUMXMY2 |
The identifier for this plugin is Excel_SUMXMY2
.
It can be found in the package com.eccenca.di.excel
.
Tan¤
Excel TAN(number): Returns the tangent of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | TAN |
The identifier for this plugin is Excel_TAN
.
It can be found in the package com.eccenca.di.excel
.
Tanh¤
Excel TANH(number): Returns the hyperbolic tangent of the given number (angle in radians).
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | TANH |
The identifier for this plugin is Excel_TANH
.
It can be found in the package com.eccenca.di.excel
.
True¤
Excel TRUE(): Sets the logical value to TRUE. The TRUE() function does not require any arguments.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | TRUE |
The identifier for this plugin is Excel_TRUE
.
It can be found in the package com.eccenca.di.excel
.
Trunc¤
Excel TRUNC(number; count): Truncates a number to an integer by removing the fractional part of the number according to the precision specified in Tools > Options > OpenOffice.org Calc > Calculate. Number is the number whose decimal places are to be cut off. Count is the number of decimal places which are not cut off.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | TRUNC |
The identifier for this plugin is Excel_TRUNC
.
It can be found in the package com.eccenca.di.excel
.
Var¤
Excel VAR(number_1; number_2; … number_30): Estimates the variance based on a sample. Number_1; number_2; … number_30 are numerical values or ranges representing a sample based on an entire population.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | VAR |
The identifier for this plugin is Excel_VAR
.
It can be found in the package com.eccenca.di.excel
.
Varp¤
Excel VARP(Number_1; number_2; … number_30): Calculates a variance based on the entire population. Number_1; number_2; … number_30 are numerical values or ranges representing an entire population.
Parameter | Type | Description | Default |
---|---|---|---|
functionName | String | The name of the Excel function | VARP |
The identifier for this plugin is Excel_VARP
.
It can be found in the package com.eccenca.di.excel
.
Extract¤
Regex extract¤
Extracts occurrences of a regex “regex” in a string. If there is at least one capture group, it will return the string of the first capture group instead.
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | Regular expression | no default |
extractAll | boolean | If true, all matches are extracted. If false, only the first match is extracted. | false |
The identifier for this plugin is regexExtract
.
It can be found in the package org.silkframework.rule.plugins.transformer.extraction
.
Examples
returns the first match Returns [afe123] for parameters [regex -> [a-z]{2,4}123] and input values [[afe123_abc123]].
returns all matches, if extractAll = true Returns [afe123, abc123] for parameters [regex -> [a-z]{2,4}123, extractAll -> true] and input values [[afe123_abc123]].
returns an empty list if nothing matches Returns [] for parameters [regex -> ^[a-z]{2,4}123] and input values [[abcdef123]].
returns the match of the first capture group that matches Returns [abcd] for parameters [regex -> ^([a-z]{2,4})123([a-z]+)] and input values [[abcd123xyz]].
Filter¤
Filter by length¤
Removes all strings that are shorter than ‘min’ characters and longer than ‘max’ characters.
Parameter | Type | Description | Default |
---|---|---|---|
min | int | No description | 0 |
max | int | No description | 2147483647 |
The identifier for this plugin is filterByLength
.
It can be found in the package org.silkframework.rule.plugins.transformer.filter
.
Filter by regex¤
Removes all strings that do NOT match a regex. If ‘negate’ is true, only strings will be removed that match the regex.
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | No description | no default |
negate | boolean | No description | false |
The identifier for this plugin is filterByRegex
.
It can be found in the package org.silkframework.rule.plugins.transformer.filter
.
Remove empty values¤
Removes empty values.
This plugin does not require any parameters.
The identifier for this plugin is removeEmptyValues
.
It can be found in the package org.silkframework.rule.plugins.transformer.filter
.
Examples
Returns [value1, value2] for parameters [] and input values [[value1, , value2]].
Returns [] for parameters [] and input values [[, ]].
Remove stopwords (remote stopword list)¤
Removes stopwords from all values. The stopword list is retrieved via a http connection (e.g. https://sites.google.com/site/kevinbouge/stopwords-lists/stopwords_de.txt). Each line in the stopword list contains a stopword. The separator defines a regex that is used for detecting words.
Parameter | Type | Description | Default |
---|---|---|---|
stopWordListUrl | String | No description | no default |
separator | String | No description | [\s-]+ |
The identifier for this plugin is removeRemoteStopwords
.
It can be found in the package org.silkframework.rule.plugins.transformer.filter
.
Remove stopwords¤
Removes stopwords from all values. Each line in the stopword list contains a stopword. The separator defines a regex that is used for detecting words.
Parameter | Type | Description | Default |
---|---|---|---|
stopwordList | Resource | No description | no default |
separator | String | No description | [\s-]+ |
The identifier for this plugin is removeStopwords
.
It can be found in the package org.silkframework.plugins.filter
.
Remove values¤
Removes values that contain words from a blacklist. The blacklist values are separated with commas.
Parameter | Type | Description | Default |
---|---|---|---|
blacklist | String | No description | no default |
The identifier for this plugin is removeValues
.
It can be found in the package org.silkframework.rule.plugins.transformer.filter
.
Geo¤
Retrieve coordinates¤
Retrieves geographic coordinates using Nominatim.
Parameter | Type | Description | Default |
---|---|---|---|
additionalParameters | String | Additional URL parameters to be attached to each HTTP search request. Example: ‘&countrycodes=de&addressdetails=1’. Consult the API documentation for a list of available parameters. | empty string |
The identifier for this plugin is RetrieveCoordinates
.
It can be found in the package com.eccenca.di.geo
.
Configuration
The geocoding service to be queried for searches can be set up in the configuration. The default configuration is as follows:
com.eccenca.di.geo = {
# The URL of the geocoding service
# url = "https://nominatim.eccenca.com/search"
url = "https://photon.komoot.de/api"
# url = https://api-adresse.data.gouv.fr/search
# Additional URL parameters to be attached to all HTTP search requests. Example: '&countrycodes=de&addressdetails=1'.
# Will be attached in addition to the parameters set on each search operator directly.
searchParameters = ""
# The minimum pause time between subsequent queries
pauseTime = 1s
# Number of coordinates to be cached in-memory
cacheSize = 10
}
In general, all services adhering to the Nominatim search API should be usable. Please note that when using public services, the pause time should be set to avoid overloading.
Logging
By default, individual requests to the geocoding service are not logged. To enable logging each request, the following configuration option can be set:
logging.level {
com.eccenca.di.geo=DEBUG
}
Retrieve latitude¤
Retrieves geographic coordinates using Nominatim and returns the latitude.
Parameter | Type | Description | Default |
---|---|---|---|
additionalParameters | String | Additional URL parameters to be attached to each HTTP search request. Example: ‘&countrycodes=de&addressdetails=1’. Consult the API documentation for a list of available parameters. | empty string |
The identifier for this plugin is RetrieveLatitude
.
It can be found in the package com.eccenca.di.geo
.
Configuration
The geocoding service to be queried for searches can be set up in the configuration. The default configuration is as follows:
com.eccenca.di.geo = {
# The URL of the geocoding service
# url = "https://nominatim.eccenca.com/search"
url = "https://photon.komoot.de/api"
# url = https://api-adresse.data.gouv.fr/search
# Additional URL parameters to be attached to all HTTP search requests. Example: '&countrycodes=de&addressdetails=1'.
# Will be attached in addition to the parameters set on each search operator directly.
searchParameters = ""
# The minimum pause time between subsequent queries
pauseTime = 1s
# Number of coordinates to be cached in-memory
cacheSize = 10
}
In general, all services adhering to the Nominatim search API should be usable. Please note that when using public services, the pause time should be set to avoid overloading.
Logging
By default, individual requests to the geocoding service are not logged. To enable logging each request, the following configuration option can be set:
logging.level {
com.eccenca.di.geo=DEBUG
}
Retrieve longitude¤
Retrieves geographic coordinates using Nominatim and returns the longitude.
Parameter | Type | Description | Default |
---|---|---|---|
additionalParameters | String | Additional URL parameters to be attached to each HTTP search request. Example: ‘&countrycodes=de&addressdetails=1’. Consult the API documentation for a list of available parameters. | empty string |
The identifier for this plugin is RetrieveLongitude
.
It can be found in the package com.eccenca.di.geo
.
Configuration
The geocoding service to be queried for searches can be set up in the configuration. The default configuration is as follows:
com.eccenca.di.geo = {
# The URL of the geocoding service
# url = "https://nominatim.eccenca.com/search"
url = "https://photon.komoot.de/api"
# url = https://api-adresse.data.gouv.fr/search
# Additional URL parameters to be attached to all HTTP search requests. Example: '&countrycodes=de&addressdetails=1'.
# Will be attached in addition to the parameters set on each search operator directly.
searchParameters = ""
# The minimum pause time between subsequent queries
pauseTime = 1s
# Number of coordinates to be cached in-memory
cacheSize = 10
}
In general, all services adhering to the Nominatim search API should be usable. Please note that when using public services, the pause time should be set to avoid overloading.
Logging
By default, individual requests to the geocoding service are not logged. To enable logging each request, the following configuration option can be set:
logging.level {
com.eccenca.di.geo=DEBUG
}
Linguistic¤
NYSIIS¤
NYSIIS phonetic encoding. Provided by the StringMetric library: http://rockymadden.com/stringmetric/.
Parameter | Type | Description | Default |
---|---|---|---|
refined | boolean | No description | true |
The identifier for this plugin is NYSIIS
.
It can be found in the package org.silkframework.rule.plugins.transformer.linguistic
.
Metaphone¤
Metaphone phonetic encoding. Provided by the StringMetric library: http://rockymadden.com/stringmetric/.
This plugin does not require any parameters.
The identifier for this plugin is metaphone
.
It can be found in the package org.silkframework.rule.plugins.transformer.linguistic
.
Normalize chars¤
Replaces diacritical characters with non-diacritical ones (eg, ö -> o), plus some specialities like transforming æ -> ae, ß -> ss.
This plugin does not require any parameters.
The identifier for this plugin is normalizeChars
.
It can be found in the package org.silkframework.rule.plugins.transformer.linguistic
.
Soundex¤
Soundex algorithm. Provided by the StringMetric library: http://rockymadden.com/stringmetric/.
Parameter | Type | Description | Default |
---|---|---|---|
refined | boolean | No description | true |
The identifier for this plugin is soundex
.
It can be found in the package org.silkframework.rule.plugins.transformer.linguistic
.
Stem¤
Stems a string using the Porter Stemmer.
This plugin does not require any parameters.
The identifier for this plugin is stem
.
It can be found in the package org.silkframework.rule.plugins.transformer.linguistic
.
Normalize¤
Strip non-alphabetic characters¤
Strips all non-alphabetic characters from a string. Spaces are retained.
This plugin does not require any parameters.
The identifier for this plugin is alphaReduce
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Capitalize¤
Capitalizes the string i.e. converts the first character to upper case. If ‘allWords’ is set to true, all words are capitalized and not only the first character.
Parameter | Type | Description | Default |
---|---|---|---|
allWords | boolean | No description | false |
The identifier for this plugin is capitalize
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Examples
Returns [Capitalize me] for parameters [allWords -> false] and input values [[capitalize me]].
Returns [Capitalize Me] for parameters [allWords -> true] and input values [[capitalize me]].
Extract physical quantity¤
Extracts physical quantities, such as length or weight values. Values are expected of the form ‘{Number}{UnitPrefix}{Symbol}’ and are converted to the base unit.
Example:
- Given a value ‘10km, 3mg’.
- If the symbol parameter is set to ‘m’, the extracted value is 10000.
- If the symbol parameter is set to ‘g’, the extracted value is 0.001.
Parameter | Type | Description | Default |
---|---|---|---|
symbol | String | The symbol of the dimension, e.g., ‘m’ for meter. | empty string |
numberFormat | String | The IETF BCP 47 language tag, e.g. ‘en’. | en |
filter | String | Only extracts from values that contain the given regex (case-insensitive). | empty string |
index | int | If there are multiple matches, retrieve the value with the given index (zero-based). | 0 |
The identifier for this plugin is extractPhysicalQuantity
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Clean HTML¤
Cleans HTML using a tag white list and allows selection of HTML sections with xPath or cssSelector expressions. If the tag or attribute white lists are left empty default white lists will be used. The operator takes two inputs: the page HTML and (optional) the page Url which may be needed to resolve relative links in the page HTML.
Parameter | Type | Description | Default |
---|---|---|---|
tagWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
attributeWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
selectors | MultilineStringParameter | CSS or XPath queries for selection of content (or reference to a configuration). Comma separated. CssSelectors can be pipe separated for non-sequential execution. | no default |
method | Enum | Selects use of xPath or css selectors (‘xPath’ or ‘cssSelectors’). | xPath |
The identifier for this plugin is htmlCleaner
.
It can be found in the package com.eccenca.di.plugins.html
.
Lower case¤
Converts a string to lower case.
This plugin does not require any parameters.
The identifier for this plugin is lowerCase
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Remove blanks¤
Remove whitespace from a string.
This plugin does not require any parameters.
The identifier for this plugin is removeBlanks
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Remove duplicates¤
Removes duplicated values, making a value sequence distinct.
This plugin does not require any parameters.
The identifier for this plugin is removeDuplicates
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Remove parentheses¤
Remove all parentheses including their content, e.g., transforms ‘Berlin (City)’ -> ‘Berlin’.
This plugin does not require any parameters.
The identifier for this plugin is removeParentheses
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Remove special chars¤
Remove special characters (including punctuation) from a string.
This plugin does not require any parameters.
The identifier for this plugin is removeSpecialChars
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Strip URI prefix¤
Strips the URI prefix and decodes the remainder. Leaves values unchanged which are not a valid URI.
This plugin does not require any parameters.
The identifier for this plugin is stripUriPrefix
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [value] for parameters [] and input values [[http://example.org/some/path/to/value]].
Returns [value] for parameters [] and input values [[urn:scheme:value]].
Returns [encoded välue] for parameters [] and input values [[http://example.org/some/path/to/encoded%20v%C3%A4lue]].
Returns [value] for parameters [] and input values [[value]].
Trim¤
Remove leading and trailing whitespaces.
This plugin does not require any parameters.
The identifier for this plugin is trim
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Upper case¤
Converts a string to upper case.
This plugin does not require any parameters.
The identifier for this plugin is upperCase
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Fix URI¤
Generates valid absolute URIs from the given values. Already valid absolute URIs are left untouched.
Parameter | Type | Description | Default |
---|---|---|---|
uriPrefix | String | No description | urn:url-encoded-value: |
The identifier for this plugin is uriFix
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Examples
Returns [urn:url-encoded-value:ab] for parameters [] and input values [[ab]].
Returns [urn:url-encoded-value:a%26b] for parameters [] and input values [[a&b]].
Returns [http://example.org/some/path] for parameters [] and input values [[http://example.org/some/path]].
Returns [http://example.org/path?query=some+stuff#hashtag] for parameters [] and input values [[http://example.org/path?query=some+stuff#hashtag]].
Returns [urn:valid:uri] for parameters [] and input values [[urn:valid:uri]].
Returns [http://www.broken%20domain.com/broken%20weird%20path%20%C3%A4%C3%B6%C3%BC/nice/path/andNowSomeFragment#fragment%C3%A4%C3%B6%C3%BC] for parameters [] and input values [[http://www.broken domain.com/broken weird path äöü/nice/path/andNowSomeFragment#fragmentäöü]].
Returns [http://domain/#%23path%23] for parameters [] and input values [[http://domain/##path#]].
Returns [urn:url-encoded-value:http+%3A+invalid+URI] for parameters [] and input values [[http : invalid URI]].
Encode URL¤
URL encodes the string.
Parameter | Type | Description | Default |
---|---|---|---|
encoding | String | The character encoding. | UTF-8 |
The identifier for this plugin is urlEncode
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Examples
Returns [ab] for parameters [] and input values [[ab]].
Returns [a%26b] for parameters [] and input values [[a&b]].
Returns [http%3A%2F%2Fexample.org%2Fsome%2Fpath] for parameters [] and input values [[http://example.org/some/path]].
Numeric¤
Normalize physical quantity¤
Normalizes physical quantities. Can either convert to a configured unit or to SI base units. For instance for lengths, values will be converted to metres if no target unit is configured. Will output the pure numeric value without the unit. If one input is provided, the physical quantities are parsed from the provided strings of the form “1 km”. If two inputs are provided, the numeric values are parsed from the first input and the units are parsed from the second inputs.
Parameter | Type | Description | Default |
---|---|---|---|
targetUnit | String | Target unit. Can be left empty to convert to the respective SI base units. | empty string |
numberFormat | String | The IETF BCP 47 language tag, e.g., ‘en’. | en |
The identifier for this plugin is PhysicalQuantitiesNormalizer
.
It can be found in the package com.eccenca.di.measure
.
SI units and common derived units are supported. The following section lists all supported units. By default, all quantities are normalized to their base unit. For instance, lengths will be normalized to metres.
Time
Time is expressed in seconds (s). The following alternative units are supported: mo_s, mo_g, a, min, a_g, mo, mo_j, a_j, h, a_t, d.
Length
Length is expressed in metres (m). The following alternative units are supported: in, nmi, Ao, mil, yd, AU, ft, pc, fth, mi, hd.
Mass
Mass is expressed in kilograms (kg). The following alternative units are supported: lb, ston, t, stone, u, gr, lcwt, oz, g, scwt, dr, lton.
Electric current
Electric current is expressed in amperes (A). The following alternative units are supported: Bi, Gb.
Temperature
Temperature is expressed in kelvins (K). The following alternative units are supported: Cel.
Amount of substance
Amount of substance is expressed in moles (mol).
Luminous intensity
Luminous intensity is expressed in candelas (cd).
Area
Area is expressed in square metres (m²). The following alternative units are supported: m2, ar, syd, cml, b, sft, sin.
Volume
Volume is expressed in cubic metres (㎥). The following alternative units are supported: st, bf, cyd, cr, L, l, cin, cft, m3.
Energy
Energy is expressed in joules (J). The following alternative units are supported: cal_IT, eV, cal_m, cal, cal_th.
Angle
Angle is expressed in radians (rad). The following alternative units are supported: circ, gon, deg, ‘, ‘’.
Others
- 1/m, derived units: Ky
- kg/(m·s), derived units: P
- bit/s, derived units: Bd
- bit, derived units: By
- Sv
- N
- Ω, derived units: Ohm
- T, derived units: G
- sr, derived units: sph
- F
- C/kg, derived units: R
- cd/m², derived units: sb, Lmb
- Pa, derived units: bar, atm
- kg/(m·s²), derived units: att
- m²/s, derived units: St
- A/m, derived units: Oe
- kg·m²/s², derived units: erg
- kg/m³, derived units: g%
- mho
- V
- lx, derived units: ph
- m/s², derived units: Gal, m/s2
- m/s, derived units: kn
- m·kg/s², derived units: gf, lbf, dyn
- m²/s², derived units: RAD, REM
- C
- Gy
- Hz
- H
- lm
- W
- Wb, derived units: Mx
- Bq, derived units: Ci
- S Examples
Returns [1000.0] for parameters [] and input values [[1 km]].
Returns [0.3048] for parameters [] and input values [[1.0000 ft]].
Returns [0.45359237] for parameters [] and input values [[1.0lb]].
Returns [1.0] for parameters [] and input values [[1000000000.0 nm]].
Returns [-1000000.0] for parameters [] and input values [[-1E6 m]].
Returns [1000.5] for parameters [numberFormat -> de] and input values [[1.000,5 m]].
Returns [1000.5] for parameters [] and input values [[1,000.5 m]].
Returns [0.621371192237334] for parameters [targetUnit -> mi] and input values [[1 km]].
Fails validation and thus returns [] for parameters [targetUnit -> m] and input values [[1 kg]].
Fails validation and thus returns [] for parameters [] and input values [[100.0]].
Returns [1000.0] for parameters [] and input values [[1], [km]].
Returns [1000.0, 10.0] for parameters [] and input values [[1, 10000], [km, mm]].
Fails validation and thus returns [] for parameters [] and input values [[1, 10000, 10], [km, mm]].
Aggregate numbers¤
Aggregates all numbers in this set using a mathematical operation.
Parameter | Type | Description | Default |
---|---|---|---|
operator | String | One of ‘+’, ‘*’, ‘min’, ‘max’, ‘average’. | no default |
The identifier for this plugin is aggregateNumbers
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Compare numbers¤
Compares the numbers of two sets. Returns 1 if the comparison yields true and 0 otherwise. If there are multiple numbers in both sets, the comparator must be true for all numbers. For instance, {1,2} < {2,3} yields 0 as not all numbers in the first set are smaller than in the second.
Parameter | Type | Description | Default |
---|---|---|---|
comparator | Enum | No description | < |
The identifier for this plugin is compareNumbers
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Count values¤
Counts the number of values.
This plugin does not require any parameters.
The identifier for this plugin is count
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Examples
Returns [1] for parameters [] and input values [[value1]].
Returns [2] for parameters [] and input values [[value1, value2]].
Extract physical quantity¤
Extracts physical quantities, such as length or weight values. Values are expected of the form ‘{Number}{UnitPrefix}{Symbol}’ and are converted to the base unit.
Example:
- Given a value ‘10km, 3mg’.
- If the symbol parameter is set to ‘m’, the extracted value is 10000.
- If the symbol parameter is set to ‘g’, the extracted value is 0.001.
Parameter | Type | Description | Default |
---|---|---|---|
symbol | String | The symbol of the dimension, e.g., ‘m’ for meter. | empty string |
numberFormat | String | The IETF BCP 47 language tag, e.g. ‘en’. | en |
filter | String | Only extracts from values that contain the given regex (case-insensitive). | empty string |
index | int | If there are multiple matches, retrieve the value with the given index (zero-based). | 0 |
The identifier for this plugin is extractPhysicalQuantity
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Format number¤
Formats a number according to a user-defined pattern. The pattern syntax is documented at: https://docs.oracle.com/javase/8/docs/api/java/text/DecimalFormat.html
Parameter | Type | Description | Default |
---|---|---|---|
pattern | String | No description | no default |
locale | String | No description | en |
The identifier for this plugin is formatNumber
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Examples
Returns [001] for parameters [pattern -> 000] and input values [[1]].
Returns [000123.780] for parameters [pattern -> 000000.000] and input values [[123.78]].
Returns [123,456.789] for parameters [pattern -> ###,###.###] and input values [[123456.789]].
Returns [123.456,789] for parameters [pattern -> ###.###,###, locale -> de] and input values [[123456.789]].
Returns [10 apples] for parameters [pattern -> # apples] and input values [[10]].
Returns [0010] for parameters [pattern -> 000‘0’] and input values [[1]].
Returns [1] for parameters [pattern -> 0] and input values [[1.0]].
Logarithm¤
Transforms all numbers by applying the logarithm function. Non-numeric values are left unchanged.
Parameter | Type | Description | Default |
---|---|---|---|
base | int | No description | 10 |
The identifier for this plugin is log
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Numeric operation¤
Applies a numeric operation to the values of multiple input operators. Uses double-precision floating-point numbers for computation.
Parameter | Type | Description | Default |
---|---|---|---|
operator | String | The operator to be applied to all values. One of ‘+’, ‘-‘, ‘*’, ‘/’ | no default |
The identifier for this plugin is numOperation
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Examples
Returns [2.0] for parameters [operator -> +] and input values [[1], [1]].
Returns [0.0] for parameters [operator -> -] and input values [[1], [1]].
Returns [30.0] for parameters [operator -> *] and input values [[5], [6]].
Returns [2.5] for parameters [operator -> /] and input values [[5], [2]].
Returns [] for parameters [operator -> +] and input values [[1], [no number]].
Returns [1.0] for parameters [operator -> *] and input values [[1], []].
Returns [3.0] for parameters [operator -> +] and input values [[1, 1], [1]].
Numeric reduce¤
Strip all non-numeric characters from a string.
Parameter | Type | Description | Default |
---|---|---|---|
keepPunctuation | boolean | No description | true |
The identifier for this plugin is numReduce
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Examples
Returns [12] for parameters [keepPunctuation -> false] and input values [[some1.2Value]].
Returns [1.2] for parameters [keepPunctuation -> true] and input values [[some1.2Value]].
Parser¤
Parse date¤
Parses and normalizes dates in different formats.
Parameter | Type | Description | Default |
---|---|---|---|
inputDateFormatId | Enum | The input date/time format used for parsing the date/time string. | w3c Date |
alternativeInputFormat | String | An input format string that should be used instead of the selected input format. Java DateFormat string. | empty string |
outputDateFormatId | Enum | The output date/time format used for parsing the date/time string. | w3c Date |
alternativeOutputFormat | String | An output format string that should be used instead of the selected output format. Java DateFormat string. | empty string |
The identifier for this plugin is DateTypeParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Examples
Returns [1999-03-20] for parameters [inputDateFormatId -> German style date format, outputDateFormatId -> w3c Date] and input values [[20.03.1999]].
Returns [20.03.1999] for parameters [inputDateFormatId -> w3c Date, outputDateFormatId -> German style date format] and input values [[1999-03-20]].
Returns [2017-04-04] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> w3c Date] and input values [[2017-04-04T00:00:00.000+02:00]].
Returns [2017-04-04] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> w3c Date] and input values [[2017-04-04T00:00:00+02:00]].
Returns [24-Jun-2021 14:50:05 +02:00] for parameters [inputDateFormatId -> common ISO8601, outputDateFormatId -> dateTime with month abbr. (US)] and input values [[2021-06-24T14:50:05.895+02:00]].
Returns [24-Dez.-2021 14:50:05 +02:00] for parameters [inputDateFormatId -> dateTime with month abbr. (US), outputDateFormatId -> dateTime with month abbr. (DE)] and input values [[24-Dec-2021 14:50:05 +02:00]].
Returns [1999-03-20T20:34.44] for parameters [alternativeInputFormat -> dd.MM.yyyy HH:mm.ss, alternativeOutputFormat -> yyyy-MM-dd’T’HH:mm.ss] and input values [[20.03.1999 20:34.44]].
Returns [12:20:00.000] for parameters [inputDateFormatId -> excelDateTime, outputDateFormatId -> xsdTime] and input values [[12:20:00.000]].
Returns [–01] for parameters [inputDateFormatId -> w3c YearMonth, outputDateFormatId -> w3c Month] and input values [[2020-01]].
Returns [—31] for parameters [inputDateFormatId -> w3c MonthDay, outputDateFormatId -> w3c Day] and input values [[–12-31]].
Returns [–12-31] for parameters [inputDateFormatId -> w3c Date, outputDateFormatId -> w3c MonthDay] and input values [[2020-12-31]].
Fails validation and thus returns [] for parameters [inputDateFormatId -> w3c MonthDay, outputDateFormatId -> w3c Date] and input values [[–12-31]].
Returns [2020-02-22T16:34:14] for parameters [alternativeInputFormat -> yyyy-MM-dd HHss.SSS, outputDateFormatId -> w3cDateTime] and input values [[2020-02-22 16:34:14.000]].
Parse float¤
Parses and normalizes float values.
Parameter | Type | Description | Default |
---|---|---|---|
commaAsDecimalPoint | boolean | No description | false |
thousandSeparator | boolean | No description | false |
bracketsForNegative | boolean | No description | false |
The identifier for this plugin is FloatTypeParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Parse geo coordinate¤
Parses and normalizes geo coordinates.
This plugin does not require any parameters.
The identifier for this plugin is GeoCoordinateParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Parse geo location¤
Parses and normalizes geo locations like continents, countries, states and cities.
Parameter | Type | Description | Default |
---|---|---|---|
parseTypeId | Enum | What type of location should be parsed. | no default |
fullStateName | boolean | Set to true if the full state name should be output instead of the 2-letter code. | true |
The identifier for this plugin is GeoLocationParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Parse integer¤
Parses integer values.
Parameter | Type | Description | Default |
---|---|---|---|
commaAsDecimalPoint | boolean | Use comma as decimal point (uses a point, otherwise) | false |
thousandSeparator | boolean | Use comma or point to separate digits | false |
The identifier for this plugin is IntegerParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Parse ISIN¤
Parses International Securities Identification Numbers (ISIN) values and fails if the String is no valid ISIN.
This plugin does not require any parameters.
The identifier for this plugin is IsinParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Parse SKOS term¤
Parses values from a SKOS ontology.
Parameter | Type | Description | Default |
---|---|---|---|
surfaceFormToRepresentationMapping | Map | No description | no default |
The identifier for this plugin is SkosTypeParser
.
It can be found in the package com.eccenca.di.schema.discovery.discoverer
.
Parse string¤
Parses string values, basically an identity function.
This plugin does not require any parameters.
The identifier for this plugin is StringParser
.
It can be found in the package com.eccenca.di.schema.discovery.parser
.
Clean HTML¤
Cleans HTML using a tag white list and allows selection of HTML sections with xPath or cssSelector expressions. If the tag or attribute white lists are left empty default white lists will be used. The operator takes two inputs: the page HTML and (optional) the page Url which may be needed to resolve relative links in the page HTML.
Parameter | Type | Description | Default |
---|---|---|---|
tagWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
attributeWhiteList | String | Tags to keep in the cleaned Text (or reference to a configuration). | empty string |
selectors | MultilineStringParameter | CSS or XPath queries for selection of content (or reference to a configuration). Comma separated. CssSelectors can be pipe separated for non-sequential execution. | no default |
method | Enum | Selects use of xPath or css selectors (‘xPath’ or ‘cssSelectors’). | xPath |
The identifier for this plugin is htmlCleaner
.
It can be found in the package com.eccenca.di.plugins.html
.
Replace¤
Excel map¤
Replaces values based on a map of values read from a file in Open XML format (XLSX). The XLSX file may contain several sheets of the form:
mapFrom,mapTo
,
An empty string can be created in Excel and alternatives by inserting =”” in the input line of a cell.
If there are multiple values for a single key, all values will be returned for the given key.
Note that the mapping table will be cached in memory. If the Excel file is updated (even while transforming), the map will be reloaded within seconds.
Parameter | Type | Description | Default |
---|---|---|---|
excelFile | Resource | Excel file inside the resources directory containing one or more sheets with mapping tables. | no default |
sheetName | String | The sheet that contains the mapping table or empty if the first sheet should be taken. | empty string |
skipLines | int | How many rows to skip before reading the mapping table. By default the expected header row is skipped. | 1 |
strict | boolean | If set to true the operator throws validation errors for values it cannot map. If set to false it will output them unchanged. | true |
conflictStrategy | Enum | The strategy how to cope with map conflicts when in strict-mode. Current strategies are to retain the values and to remove them. | retain |
The identifier for this plugin is excelMap
.
It can be found in the package com.eccenca.di.excel
.
Map¤
Replaces values based on a map of values.
Parameter | Type | Description | Default |
---|---|---|---|
map | Map | A map of values | no default |
default | String | Default if the map defines no value | no default |
The identifier for this plugin is map
.
It can be found in the package org.silkframework.rule.plugins.transformer.replace
.
Examples
Returns [Value1] for parameters [map -> Key1:Value1,Key2:Value2, default -> Undefined] and input values [[Key1]].
Returns [Undefined] for parameters [map -> Key1:Value1,Key2:Value2, default -> Undefined] and input values [[Key1X]].
Map with default¤
Takes two inputs. Tries to map the first input based on the map of values parameter config. If the input value is not found in the map, it takes the value of the second input. The indexes of the mapped value and the default value match. If there are less default values than values to map, the last default value is replicated to match the count.
Parameter | Type | Description | Default |
---|---|---|---|
map | Map | A map of values | no default |
The identifier for this plugin is mapWithDefaultInput
.
It can be found in the package org.silkframework.rule.plugins.transformer.replace
.
Regex replace¤
Replace all occurrences of a regex “regex” with “replace” in a string.
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | No description | no default |
replace | String | No description | no default |
The identifier for this plugin is regexReplace
.
It can be found in the package org.silkframework.rule.plugins.transformer.replace
.
Replace¤
Replace all occurrences of a string “search” with “replace” in a string.
Parameter | Type | Description | Default |
---|---|---|---|
search | String | No description | no default |
replace | String | No description | no default |
The identifier for this plugin is replace
.
It can be found in the package org.silkframework.rule.plugins.transformer.replace
.
Selection¤
Coalesce (first non-empty input)¤
Forwards the first non-empty input, i.e. for which any value(s) exist. A single empty string is considered a value.
This plugin does not require any parameters.
The identifier for this plugin is coalesce
.
It can be found in the package org.silkframework.rule.plugins.transformer.selection
.
Examples
Returns [] for parameters [] and input values [[], [], []].
Returns [] for parameters [] and input values [[], []].
Returns [] for parameters [] and input values [].
Returns [first] for parameters [] and input values [[], [first], [second]].
Returns [first A, first B] for parameters [] and input values [[], [first A, first B], [second]].
Returns [first] for parameters [] and input values [[first], [second]].
Regex selection¤
This transformer takes 3 inputs.
The first input should have exactly one value that should be passed out again untouched.
The second input has at least two Regex values - two in order to make sense.
The third input should have exactly one value which is checked against the regexes.
The result of the transformer is a sequence with the same length of number of regexes.
For the output value (of the first input) is set to each position in this sequence where
the related regex also matched.
If oneOnly is true only the position of the <strong>first</strong> matching regex will be set to the output value.
Parameter | Type | Description | Default |
---|---|---|---|
oneOnly | boolean | No description | false |
The identifier for this plugin is regexSelect
.
It can be found in the package org.silkframework.rule.plugins.transformer.selection
.
Sequence¤
Count values¤
Counts the number of values.
This plugin does not require any parameters.
The identifier for this plugin is count
.
It can be found in the package org.silkframework.rule.plugins.transformer.numeric
.
Examples
Returns [1] for parameters [] and input values [[value1]].
Returns [2] for parameters [] and input values [[value1, value2]].
Get value by index¤
Returns the value found at the specified index. Fails or returns an empty result depending on failIfNoFound is set or not. Please be aware that this will work only if the data source supports some kind of ordering like XML or JSON. This is probably not a good idea to do with RDF models.
If emptyStringToEmptyResult is true then instead of a result with an empty String, an empty result is returned.
Parameter | Type | Description | Default |
---|---|---|---|
index | int | No description | no default |
failIfNotFound | boolean | No description | false |
emptyStringToEmptyResult | boolean | No description | false |
The identifier for this plugin is getValueByIndex
.
It can be found in the package org.silkframework.rule.plugins.transformer.sequence
.
Sequence values to indexes¤
Transforms the sequence of values to their respective indexes in the sequence. Example: - (“a”, “b”, “c”) becomes (0, 1, 2)
If there is more than one input, the values are numbered from the first input on and continued for the next inputs. Applied against an RDF source the order might not be deterministic.
This plugin does not require any parameters.
The identifier for this plugin is toSequenceIndex
.
It can be found in the package org.silkframework.rule.plugins.transformer.sequence
.
Substring¤
Strip postfix¤
Strips a postfix of a string.
Parameter | Type | Description | Default |
---|---|---|---|
postfix | String | No description | no default |
The identifier for this plugin is stripPostfix
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [value] for parameters [postfix -> Postfix] and input values [[valuePostfix]].
Returns [Value] for parameters [postfix -> Postfix] and input values [[Value]].
Strip prefix¤
Strips a prefix of a string.
Parameter | Type | Description | Default |
---|---|---|---|
prefix | String | No description | no default |
The identifier for this plugin is stripPrefix
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [Value] for parameters [prefix -> prefix] and input values [[prefixValue]].
Returns [ValueWithoutPrefix] for parameters [prefix -> prefix] and input values [[ValueWithoutPrefix]].
Strip URI prefix¤
Strips the URI prefix and decodes the remainder. Leaves values unchanged which are not a valid URI.
This plugin does not require any parameters.
The identifier for this plugin is stripUriPrefix
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [value] for parameters [] and input values [[http://example.org/some/path/to/value]].
Returns [value] for parameters [] and input values [[urn:scheme:value]].
Returns [encoded välue] for parameters [] and input values [[http://example.org/some/path/to/encoded%20v%C3%A4lue]].
Returns [value] for parameters [] and input values [[value]].
Substring¤
Returns a substring between ‘beginIndex’ (inclusive) and ‘endIndex’ (exclusive). If ‘endIndex’ is 0 (default), it is ignored and the entire remaining string starting with ‘beginIndex’ is returned. If ‘endIndex’ is negative, -endIndex characters are removed from the end.
Parameter | Type | Description | Default |
---|---|---|---|
beginIndex | int | The beginning index, inclusive. | 0 |
endIndex | int | The end index, exclusive. Ignored if set to 0, i.e., the entire remaining string starting with ‘beginIndex’ is returned. If negative, -endIndex characters are removed from the end | 0 |
stringMustBeInRange | boolean | If true, only strings will be accepted that are within the start and end indices, throwing a validating error if an index is out of range. | true |
The identifier for this plugin is substring
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [a] for parameters [beginIndex -> 0, endIndex -> 1] and input values [[abc]].
Returns [c] for parameters [beginIndex -> 2, endIndex -> 3] and input values [[abc]].
Returns [] for parameters [beginIndex -> 3, endIndex -> 3] and input values [[abc]].
Fails validation and thus returns [c] for parameters [beginIndex -> 2, endIndex -> 4] and input values [[abc]].
Returns [c] for parameters [beginIndex -> 2, endIndex -> 4, stringMustBeInRange -> false] and input values [[abc]].
Returns [] for parameters [beginIndex -> 10, endIndex -> 20, stringMustBeInRange -> false] and input values [[abc]].
Returns [ab] for parameters [beginIndex -> 0, endIndex -> -1] and input values [[abc]].
Returns [bc] for parameters [beginIndex -> 1, endIndex -> 0] and input values [[abc]].
Trim¤
Remove leading and trailing whitespaces.
This plugin does not require any parameters.
The identifier for this plugin is trim
.
It can be found in the package org.silkframework.rule.plugins.transformer.normalize
.
Until character¤
Extracts the substring until the character given.
Parameter | Type | Description | Default |
---|---|---|---|
untilCharacter | char | No description | no default |
The identifier for this plugin is untilCharacter
.
It can be found in the package org.silkframework.rule.plugins.transformer.substring
.
Examples
Returns [ab] for parameters [untilCharacter -> c] and input values [[abcde]].
Returns [abab] for parameters [untilCharacter -> c] and input values [[abab]].
Template¤
Evaluate template¤
Evaluates a template. Input values can be addressed using the variables ‘input1’, ‘input2’, etc.
Parameter | Type | Description | Default |
---|---|---|---|
template | MultilineStringParameter | The template | no default |
language | Enum | The template language. Currently, Jinja is supported. | Jinja |
The identifier for this plugin is TemplateTransformer
.
It can be found in the package com.eccenca.di.templating.operators
.
Examples
Returns [Hello John Doe,
How are you today?] for parameters [template -> Hello {{input1}} {{input2}},
How are you today?] and input values [[John], [Doe]].
Fails validation and thus returns [] for parameters [template -> Hello {{badVariable}} {{input1}}] and input values [[John], [Doe]].
Fails validation and thus returns [] for parameters [template -> Hello {{input01}}] and input values [].
Fails validation and thus returns [] for parameters [template -> Hello {{input1}}] and input values [].
Returns [Hello AB] for parameters [template -> Hello {{input1}}] and input values [[A, B]].
Returns [Hello Bob, Eve, how are you doing?] for parameters [template -> Hello {% for value in input1 %}{{value}}, {% endfor %}how are you doing?] and input values [[Bob, Eve]].
Tokenization¤
Camel case tokenizer¤
Tokenizes a camel case string. That is it splits strings between a lower case characted and an upper case character.
This plugin does not require any parameters.
The identifier for this plugin is camelcasetokenizer
.
It can be found in the package org.silkframework.rule.plugins.transformer.tokenization
.
Examples
Returns [camel, Case, String] for parameters [] and input values [[camelCaseString]].
Returns [nocamelcase] for parameters [] and input values [[nocamelcase]].
Tokenize¤
Tokenizes all input values.
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | The regular expression used to split values. | \s |
The identifier for this plugin is tokenize
.
It can be found in the package org.silkframework.rule.plugins.transformer.tokenization
.
Examples
Returns [Hello, World] for parameters [] and input values [[Hello World]].
Returns [.175, .050] for parameters [regex -> ,] and input values [[.175,.050]].
Validation¤
Validate date after¤
Validates if the first input date is after the second input date. Outputs the first input if the validation is successful.
Parameter | Type | Description | Default |
---|---|---|---|
allowEqual | boolean | Allow both dates to be equal. | false |
The identifier for this plugin is validateDateAfter
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Examples
Fails validation and thus returns [] for parameters [] and input values [[2015-04-02], [2015-04-03]].
Returns [2015-04-04] for parameters [] and input values [[2015-04-04], [2015-04-03]].
Returns [2015-04-03] for parameters [allowEqual -> true] and input values [[2015-04-03], [2015-04-03]].
Fails validation and thus returns [] for parameters [allowEqual -> false] and input values [[2015-04-03], [2015-04-03]].
Validate date range¤
Validates if dates are within a specified range.
Parameter | Type | Description | Default |
---|---|---|---|
minDate | String | Earliest allowed date in YYYY-MM-DD | no default |
maxDate | String | Latest allowed data in YYYY-MM-DD | no default |
The identifier for this plugin is validateDateRange
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Validate number of values¤
Validates that the number of values lies in a specified range.
Parameter | Type | Description | Default |
---|---|---|---|
min | int | Minimum allowed number of values | 0 |
max | int | Maximum allowed number of values | 1 |
The identifier for this plugin is validateNumberOfValues
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Examples
Returns [value1] for parameters [min -> 0, max -> 1] and input values [[value1]].
Fails validation and thus returns [] for parameters [min -> 0, max -> 1] and input values [[value1, value2]].
Validate numeric range¤
Validates if a number is within a specified range.
Parameter | Type | Description | Default |
---|---|---|---|
min | double | Minimum allowed number | no default |
max | double | Maximum allowed number | no default |
The identifier for this plugin is validateNumericRange
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Validate regex¤
Validates if all values match a regular expression.
Parameter | Type | Description | Default |
---|---|---|---|
regex | String | regular expression | \w* |
The identifier for this plugin is validateRegex
.
It can be found in the package org.silkframework.rule.plugins.transformer.validation
.
Value¤
Constant¤
Generates a constant value.
Parameter | Type | Description | Default |
---|---|---|---|
value | String | The constant value to be generated | empty string |
The identifier for this plugin is constant
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Constant URI¤
Generates a constant URI.
Parameter | Type | Description | Default |
---|---|---|---|
value | Uri | The constant URI to be generated |
The identifier for this plugin is constantUri
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Current date¤
Outputs the current date.
This plugin does not require any parameters.
The identifier for this plugin is currentDate
.
It can be found in the package org.silkframework.rule.plugins.transformer.date
.
Dataset parameter¤
Reads a meta data parameter from a dataset in Corporate Memory. If authentication is enabled, workbench.superuser must be configured.
Parameter | Type | Description | Default |
---|---|---|---|
project | ProjectReference | The project of the dataset. | cmem |
dataset | TaskReference | The dataset the meta data parameter is read from. | no default |
key | String | No description | no default |
lang | String | No description | empty string |
The identifier for this plugin is datasetParameter
.
It can be found in the package com.eccenca.di.workflow.operators.datasetParam
.
Default Value¤
Generates a default value, if the input values are empty. Forwards any non-empty values.
Parameter | Type | Description | Default |
---|---|---|---|
value | String | The default value to be generated, if input values are empty | default |
The identifier for this plugin is defaultValue
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Examples
Returns [input value] for parameters [] and input values [[input value]].
Returns [default value] for parameters [value -> default value] and input values [[]].
Empty value¤
Generates an empty value value.
This plugin does not require any parameters.
The identifier for this plugin is emptyValue
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Random number¤
Generates a set of random numbers.
Parameter | Type | Description | Default |
---|---|---|---|
min | double | The smallest number that could be generated. | 0.0 |
max | double | The largest number that could be generated. | 100.0 |
minCount | int | The minimum number of values to generate in each set. | 1 |
maxCount | int | The maximum number of values to generate in each set. | 1 |
The identifier for this plugin is randomNumber
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Read parameter¤
Reads a parameter from a Java Properties file.
Parameter | Type | Description | Default |
---|---|---|---|
resource | Resource | The Java properties file to read the parameter from. | no default |
parameter | String | The name of the parameter. | no default |
The identifier for this plugin is readParameter
.
It can be found in the package org.silkframework.plugins.transformer.value
.
UUID¤
Generates UUIDs. If no input value is provided, a random UUID (type 4) is generated using a cryptographically strong pseudo random number generator. If input values are provided, a name-based UUID (type 3) is generated for each input value. Each input value will generate a separate UUID. For building a UUID from multiple inputs, the Concatenate operator can be used.
This plugin does not require any parameters.
The identifier for this plugin is uuid
.
It can be found in the package org.silkframework.rule.plugins.transformer.value
.
Examples
Returns [cee963a2-8f70-3e97-b51a-85ef732e66dd] for parameters [] and input values [[input value]].
Returns [690802dd-a317-335f-807c-e4e1e32b7b5b, 925cbd7f-377b-3fbd-8f4c-ca41529b74ad] for parameters [] and input values [[üöä!, êéè]].
Aggregations¤
The following aggregation functions are available:
Average¤
Computes the weighted average.
This plugin does not require any parameters.
The identifier for this plugin is average
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Geometric mean¤
Compute the (weighted) geometric mean.
This plugin does not require any parameters.
The identifier for this plugin is geometricMean
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Handle missing values¤
Generates a default similarity score, if no similarity score is provided (e.g., due to missing values). Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation.
Parameter | Type | Description | Default |
---|---|---|---|
defaultValue | double | The default value to be generated, if no similarity score is provided. Must be a value between -1 (inclusive) and 1 (inclusive). ‘1’ represents boolean true and ‘-1’ represents boolean false. | -1.0 |
The identifier for this plugin is handleMissingValues
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Or¤
At least one input score must be within the threshold. Selects the maximum score.
This plugin does not require any parameters.
The identifier for this plugin is max
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
And¤
All input scores must be within the threshold. Selects the minimum score.
This plugin does not require any parameters.
The identifier for this plugin is min
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Negate¤
Negates the result of the input comparison. A single input is expected. Using this operator can have a performance impact, since it lowers the efficiency of the underlying computation.
This plugin does not require any parameters.
The identifier for this plugin is negate
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Euclidian distance¤
Calculates the Euclidian distance.
This plugin does not require any parameters.
The identifier for this plugin is quadraticMean
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.
Scale¤
Scales the result of the first input. All other inputs are ignored.
Parameter | Type | Description | Default |
---|---|---|---|
factor | double | All input similarity values are multiplied with this factor. | 1.0 |
The identifier for this plugin is scale
.
It can be found in the package org.silkframework.rule.plugins.aggegrator
.