Create Custom Transformations with Python
Caution
This section describes the obsolete Python 2 plugin system. We recommend migrating to the new Python 3 plugin system: Python Plugins
Introduction
Although more than 180 built-in operators are available for your data transformations, at some point you may need a special operator for a specific problem that the built-ins cannot solve, or that is simply easier to solve in code. This page gives an overview of the Script Transform Operator and how to use it to create Python-based custom transformations.
General Working Model
The Python script operator has two parameters: a multi-line text field for the script and a function field for the name of the function to execute. The operator performs the following two steps:
- first, it loads the script
- then, it executes the function for each “row” of the transformation.
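The two steps can be mimicked locally, which is handy for trying out a script before pasting it into the operator. This is a minimal sketch, not the operator's actual implementation; run_script_operator and the sample function shout are hypothetical names chosen for illustration, and the code runs under both Python 2.7 and Python 3.

```python
def run_script_operator(script_source, function_name, rows):
    """Hypothetical local stand-in for the script operator (not its real implementation).

    rows is a list of tuples; each tuple holds one list of strings
    per incoming connection of the transformation operator.
    """
    namespace = {}
    # Step 1: load the script once.
    exec(script_source, namespace)
    function = namespace[function_name]
    # Step 2: call the configured function once per row.
    return [function(*row) for row in rows]


SCRIPT = '''
def shout(values):
    return [value.upper() for value in values]
'''

rows = [(["Eve", "Alice", "Bob"],), ([],)]
print(run_script_operator(SCRIPT, "shout", rows))
# prints [['EVE', 'ALICE', 'BOB'], []]
```

Loading the script once and calling the function per row means that module-level code in the script (imports, compiled regular expressions, constants) is evaluated only once.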
Each input value is always given as an array of strings (such as ["Eve", "Alice", "Bob"]; more specifically, as an instance of org.python.core.PyArray). If there are no values for the current iteration, an empty array is given. The number of input arrays depends on the number of incoming connections of the transformation operator. The connections are ordered: the first connected building block delivers parameter one, the second building block delivers parameter two, and so on, each as an array of strings.
Like the input parameters, the result value should be a list of strings. The operator tries to convert whatever is returned into a proper list of strings, but this conversion can fail, so it is safest to return a list of strings explicitly.
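A minimal sketch of functions that follow this contract, assuming two incoming connections for full_name and one for value_count; both are hypothetical example functions, written to run under Python 2.7 and Python 3.

```python
def full_name(first_names, last_names):
    """Combine two inputs: parameter one comes from the first connection,
    parameter two from the second. Both are lists of strings."""
    # zip() pairs values by position and stops at the shorter list,
    # so an empty input yields an empty result.
    return [first + " " + last for first, last in zip(first_names, last_names)]


def value_count(values):
    """Return the number of input values as a one-element list of strings."""
    # Convert explicitly instead of relying on the operator's conversion.
    return [str(len(values))]


print(full_name(["Eve", "Alice"], ["Smith", "Jones"]))  # prints ['Eve Smith', 'Alice Jones']
print(full_name(["Bob"], []))                          # prints []
print(value_count(["a", "b", "c"]))                    # prints ['3']
```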
Preliminaries
Enabling the script operators
Because the script operators allow potentially unsafe operations (such as writing to the file system), they are disabled by default. They must be enabled explicitly in the configuration before they can be used.
The following script operators are available:
- python2Script: Python 2 transform operator.
- scalaScript: Scala transform operator.
- script: Scala script operator to be used in workflows.
Using Python libraries
External Python libraries are loaded from a configurable folder, which is set by default as follows:
com.eccenca.di.scripting.transformer.Python2ScriptTransformer = {
modulePath = ${elds.home}"/etc/dataintegration/pythonModules/"
}
The configured modulePath is added to the Python sys.path.
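The effect of modulePath can be illustrated outside the operator. This is a minimal sketch, assuming the operator simply appends the folder to sys.path as described above; the temporary folder, add_module_path and the module greetings_demo are hypothetical stand-ins for the real pythonModules folder and a library placed in it. Inside an operator script, a plain import of such a module would then be enough.

```python
import importlib
import os
import sys
import tempfile


def add_module_path(module_path):
    """Append module_path to sys.path once (hypothetical helper mirroring the described behaviour)."""
    if module_path not in sys.path:
        sys.path.append(module_path)


# Hypothetical stand-in for ${elds.home}/etc/dataintegration/pythonModules/
module_dir = tempfile.mkdtemp()
with open(os.path.join(module_dir, "greetings_demo.py"), "w") as handle:
    handle.write("def greet(name):\n    return 'Hello, ' + name\n")

add_module_path(module_dir)
greetings_demo = importlib.import_module("greetings_demo")
print(greetings_demo.greet("Eve"))  # prints Hello, Eve
```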
Parameter Validation
Parameter values should be validated inside the defined function. This includes:
- testing how many strings are in the list
- testing whether these strings have a specific format
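Both checks fit into a small helper. This is a minimal sketch; check_values, postal_code and the five-digit format are hypothetical choices for illustration, and raising ValueError is one possible way to reject invalid input. The code runs under Python 2.7 and Python 3.

```python
import re

# Example format (an assumption for illustration): exactly five digits.
FIVE_DIGITS = re.compile(r"\d{5}\Z")


def check_values(values, min_count, max_count, pattern):
    """Raise ValueError unless values has an allowed length and every string matches pattern."""
    # Test how many strings are in the list.
    if not min_count <= len(values) <= max_count:
        raise ValueError("expected %d to %d values, got %d"
                         % (min_count, max_count, len(values)))
    # Test whether the strings have the expected format.
    for value in values:
        if not pattern.match(value):
            raise ValueError("unexpected format: %r" % (value,))
    return list(values)


def postal_code(values):
    """Hypothetical transform: accept zero or one five-digit code."""
    return check_values(values, 0, 1, FIVE_DIGITS)
```

Allowing zero values (min_count=0) keeps rows without input from failing, since the operator passes an empty array in that case.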
Error Handling and Logging
Syntax errors are shown immediately in the transformation editor, while execution errors and exceptions are shown in the evaluation and execution report. Both types of errors are also logged.
In addition to error logging, the function can produce output with print, which is added to the log as well.
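A function can either tolerate bad values and note them in the log, or fail so that the problem appears in the report. This is a minimal sketch of both styles; lenient_integers and strict_integer are hypothetical example functions, compatible with Python 2.7 and Python 3.

```python
def lenient_integers(values):
    """Keep values that parse as integers; report the others via print (added to the log)."""
    result = []
    for value in values:
        try:
            result.append(str(int(value)))
        except ValueError:
            print("lenient_integers: skipping %r" % (value,))
    return result


def strict_integer(values):
    """Expect exactly one integer; anything else raises an exception for the report."""
    if len(values) != 1:
        raise ValueError("strict_integer: expected one value, got %d" % len(values))
    # int() raises ValueError itself if the string is not an integer.
    return [str(int(values[0]))]
```

The lenient style keeps the transformation running but only leaves a trace in the log; the strict style makes the problem visible in the evaluation and execution report.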
Example: Days between Two Dates
The following well-commented and very verbose code example calculates the difference between two dates and returns the number of days as a result:
Based on this example, the following pytest test suite passes (it is given here to clarify the operator's behaviour):
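A minimal sketch of such a function together with pytest-style tests, assuming both inputs are ISO 8601 dates (YYYY-MM-DD); days_between and the test names are hypothetical, and returning an empty list for missing values is a design choice of this sketch. The tests use plain asserts, so they run with pytest or by calling them directly, under Python 2.7 or Python 3.

```python
from datetime import datetime

# Assumption: dates arrive as ISO 8601 strings such as "2020-01-31".
DATE_FORMAT = "%Y-%m-%d"


def days_between(start_dates, end_dates):
    """Return the days from start to end as a one-element list of strings.

    start_dates comes from the first connection, end_dates from the second.
    """
    # Without exactly one value per input there is nothing sensible to compute.
    if len(start_dates) != 1 or len(end_dates) != 1:
        return []
    start = datetime.strptime(start_dates[0].strip(), DATE_FORMAT)
    end = datetime.strptime(end_dates[0].strip(), DATE_FORMAT)
    # A negative result means the end date lies before the start date.
    return [str((end - start).days)]


def test_same_month():
    assert days_between(["2020-01-01"], ["2020-01-31"]) == ["30"]


def test_leap_year():
    assert days_between(["2020-02-28"], ["2020-03-01"]) == ["2"]


def test_reversed_dates():
    assert days_between(["2020-01-31"], ["2020-01-01"]) == ["-30"]


def test_missing_value():
    assert days_between([], ["2020-01-01"]) == []


def test_invalid_format():
    try:
        days_between(["01/02/2020"], ["2020-01-01"])
    except ValueError:
        return
    raise AssertionError("expected ValueError for a non-ISO date")
```

Raising on a malformed date (instead of returning an empty list) makes bad input visible in the evaluation and execution report.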
This well-tested operator can now be used in your transformation (left: the transformation flow; right: the evaluation report).
Special environment variables
A number of useful variables are injected and can be accessed from the Python script.
The following variables are available:
- CMEM_BASE_URI: The base URI of the current Corporate Memory deployment.
- OAUTH_ACCESS_TOKEN: The current super user token. Note that this is only available if a super user is configured.
- OAUTH_GRANT_TYPE: The corresponding OAuth grant type. Set to: prefetched_token
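A minimal sketch of reading these variables, assuming they are exposed to the script as process environment variables through os.environ; cmem_settings and authorization_header are hypothetical helpers, and using the token as an HTTP bearer token is an assumption about how it would be consumed. The code runs under Python 2.7 and Python 3.

```python
import os


def cmem_settings():
    """Read the injected variables; any variable that is not set comes back as None.

    Assumption: the variables are visible to the script via os.environ.
    """
    return {
        "base_uri": os.environ.get("CMEM_BASE_URI"),
        # Only injected if a super user is configured.
        "access_token": os.environ.get("OAUTH_ACCESS_TOKEN"),
        "grant_type": os.environ.get("OAUTH_GRANT_TYPE"),
    }


def authorization_header(settings):
    """Build an HTTP bearer-token header, or an empty dict if no token is available."""
    token = settings.get("access_token")
    if not token:
        return {}
    return {"Authorization": "Bearer " + token}
```

Using os.environ.get instead of indexing keeps the script working when no super user is configured and OAUTH_ACCESS_TOKEN is therefore missing.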