Download file¤
1. Purpose¤
The Download File operator downloads a single file from a remote URL and exposes it as a file output that can be consumed by downstream operators.
Typical use cases:
- Importing external datasets into a workflow.
- Downloading configuration files or archives.
- Fetching files from internal HTTP endpoints.
2. Input and output¤
Input¤
- Inputs: none
The operator does not consume any upstream entities.
Output¤
- Output: one file
- A single file is created from the HTTP response body.
- The file is exposed as a file output that downstream operators can read.
- The MIME type is taken from the HTTP response
Content-Typeif available.
3. Configuration notes¤
The detailed list of parameters is shown in the UI (auto-generated). This section explains how to think about the most important ones.
-
URL
- Remote endpoint from which the file is downloaded.
- The operator is intended for
http://andhttps://URLs.
-
Timeouts
- Request timeout (in milliseconds): upper bound for the whole request, including streaming the response.
- Connection timeout: how long to wait for the TCP connection to be established.
- Read timeout: maximum idle time during data transfer (no bytes sent or received).
- For large files or slow servers, increase the request/read timeouts as needed.
-
HTTP headers
- One header per line, curl-style syntax:
Header-Name: Header-Value. - Headers are sent as-is with the request.
- One header per line, curl-style syntax:
-
Authorization header / value
- Authorization header: name of the header (e.g.
Authorization,Proxy-Authorization). - Authorization value: header value (e.g.
Bearer <access-token>). - The value is stored encrypted in the backend.
- Authorization header: name of the header (e.g.
4. Behaviour¤
When executed, the operator:
- Builds an HTTP request from the configured parameters.
- Sends an HTTP GET request to the configured URL.
- Streams the response body into a temporary file on disk.
- Emits that file as the operator’s output.
- Reports the execution via the standard task/execution reporting mechanisms.
File handling:
- The file is created with a temporary name.
- The file extension is determined as follows:
- For ZIP-like content types (e.g.
application/zip), the extension.zipis used. - Otherwise, the extension is derived from the URL path if possible.
- If no extension can be determined, a generic fallback (for example
.tmp) is used.
- For ZIP-like content types (e.g.
Only a single file is produced per execution. If the request fails, no file is emitted.
5. Supported URLs and protocols¤
The operator sends an HTTP request to the configured URL.
-
Intended behaviour
- Use
http://andhttps://URLs. - These are the supported schemes for downloading files.
- Use
-
Other schemes (e.g.
ftp://)- Not supported.
- Using non-HTTP(S) URLs may result in an error and should not be relied on.
If reliable FTP or other protocols are needed, they should be handled by a dedicated operator or external tooling.
6. Error handling and failure modes¤
Typical failure scenarios:
- Invalid URL / DNS / connection issues
- The operator fails the execution; no file is produced.
- Non-2xx HTTP status codes (e.g. 404, 500)
- The request fails and the file is not created.
- Timeouts
- If connection or read timeouts are exceeded, the request is aborted.
- Streaming / I/O errors
- If writing to the temporary file fails, the execution fails and the partially written file is not exposed.
Errors are reported via the standard task and execution reporting mechanisms.
7. Examples¤
7.1 Simple HTTP download¤
- URL:
https://example.com/data.csv - Parameters:
- Accept header:
text/csv(optional) - Timeouts: defaults or slightly increased for large files.
- Accept header:
Result:
- One file containing the downloaded CSV data, which can be passed to downstream file-processing operators.
7.2 Authenticated download¤
- URL:
https://internal.example.com/report.json - Parameters:
- Authorization header:
Authorization - Authorization value:
Bearer <access-token>
- Authorization header:
Result:
- One JSON file containing the report, assuming the token is valid and the server returns a successful response.
Parameter¤
URL¤
The URL of the file to be downloaded.
- ID:
url - Datatype:
string - Default Value:
None
Accept¤
The accept header String.
- ID:
accept - Datatype:
string - Default Value:
None
Request timeout¤
Request timeout in ms. The overall maximum time the request should take.
- ID:
requestTimeout - Datatype:
int - Default Value:
10000
Connection timeout¤
Connection timeout in ms. The time until which a connection with the remote end must be established.
- ID:
connectionTimeout - Datatype:
int - Default Value:
5000
Read timeout¤
Read timeout in ms. The max. time a request stays idle, i.e. no data is send or received.
- ID:
readTimeout - Datatype:
int - Default Value:
10000
HTTP headers¤
Configure additional HTTP headers. One header per line. Each header entry follows the curl syntax.
- ID:
httpHeaders - Datatype:
multiline string - Default Value:
None
Authorization header¤
The authorization header. This is usually either ‘Authorization’ or ‘Proxy-Authorization’If left empty, no authorization header is sent.
- ID:
authorizationHeader - Datatype:
string - Default Value:
None
Authorization header value¤
The authorization header value. Usually this has the form ‘type secret’, e.g. for OAuth bearer <insert secret access token>.This config parameter will be encrypted in the backend.
- ID:
authorizationHeaderValue - Datatype:
password - Default Value:
None
Advanced Parameter¤
None