This page describes proven deployment scenarios for eccenca Corporate Memory.

All Corporate Memory components are distributed as Docker images and can be obtained from eccenca’s Artifactory service. To run them you need a Docker-enabled Linux server. In addition, eccenca provides distribution archives for all components, which contain configuration examples (YAML) as well as JAR/WAR artifacts.
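
Access to the images requires credentials provided by eccenca. The following sketch shows the general pull workflow; the registry host, repository path, and tag are placeholders, not the actual Artifactory coordinates:

# log in to the registry with the credentials provided by eccenca
# (the registry host is a placeholder)
docker login artifactory.example.com

# pull a component image (repository path and tag are placeholders)
docker pull artifactory.example.com/eccenca/dataintegration:latest
BASH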

Operating Systems (OS)

Corporate Memory is tested on Ubuntu 18.04 (backward compatible with 16.04 and 14.04) and RHEL 7.7.
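
Which release a server actually runs can be checked with the usual commands, for example:

# on Ubuntu
lsb_release -a

# on RHEL
cat /etc/redhat-release
BASH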

Special note on RHEL SELinux support: there are no limitations with Red Hat SELinux. We recommend keeping SELinux in enforcing mode; you can keep the default settings of the /etc/selinux/config file.

sample /etc/selinux/config

# This file controls the state of SELinux on the system.
# SELINUX= can take one of these three values:
#     enforcing - SELinux security policy is enforced.
#     permissive - SELinux prints warnings instead of enforcing.
#     disabled - No SELinux policy is loaded.
SELINUX=enforcing
# SELINUXTYPE= can take one of three values:
#     targeted - Targeted processes are protected,
#     minimum - Modification of targeted policy. Only selected processes are protected.
#     mls - Multi Level Security protection.
SELINUXTYPE=targeted
BASH
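
Whether SELinux is actually running in enforcing mode can be verified with the standard tools:

# print the current SELinux mode (expected output: Enforcing)
getenforce

# detailed status, including the loaded policy type
sestatus
BASH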

Docker Compose Based Orchestration Deployment

Docker Compose is a convenient way to provision several Docker containers locally for development setups or on remote servers for single-node setups.

eccenca uses Docker Compose heavily for all kinds of internal and customer deployments. For more details on Docker Compose based orchestration, refer to Scenario: Local Installation and Scenario: Single Node Cloud Installation.
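
Assuming a docker-compose.yml from one of the distribution archives is present in the current directory, a minimal workflow looks like this:

# start all services defined in docker-compose.yml in the background
docker-compose up -d

# show the state of the orchestrated containers
docker-compose ps

# follow the logs of all services
docker-compose logs -f

# stop and remove the containers again
docker-compose down
BASH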

Running DataIntegration on a Spark Cluster

eccenca DataIntegration supports the execution of DataIntegration workflows in a cluster environment with Apache Spark.

Prerequisites

For the execution of DataIntegration in a Spark cluster, the following software components from the Hadoop ecosystem are recommended:

  • Scala 2.11 or 2.10
  • Apache Spark 2.1.2 (compiled for Scala 2.11)
  • Apache Hadoop 2.7 (HDFS)
  • Apache Hive 1.2, with a relational database as metastore (e.g. Derby)

Recent versions of the following Hadoop distributions are generally supported as well (a quick way to check the installed versions is sketched after this list):

  • Hortonworks (HDP 2.5)
  • Cloudera (CDH 5.8)
  • Oracle Big Data Lite (4.6)
  • Microsoft HDInsight (based on HDP)
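
Assuming the corresponding command-line tools are on the PATH of the cluster nodes, the installed versions can be checked as follows:

# Scala version (expected: 2.11 or 2.10)
scala -version

# Spark version (expected: 2.1.2, built for Scala 2.11)
spark-submit --version

# Hadoop version (expected: 2.7)
hadoop version

# Hive version (expected: 1.2)
hive --version
BASH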

Installation

A Spark application can run in three different modes:

  • local mode
  • client mode
  • cluster mode

The local mode runs the Spark application on a single local machine. In client mode, the DataIntegration application runs outside of the cluster and creates Spark jobs that are executed in the cluster at runtime. Cluster mode requires that the application using Spark runs completely in the cluster and is managed by the software running on the cluster (e.g. Spark standalone, Apache YARN, Apache Mesos). DataIntegration supports local mode (for testing), client mode (for production, only with clusters managed by Spark) and cluster mode on YARN (for production; it integrates best with other distributed applications).
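
In Spark itself, the three modes correspond to the --master and --deploy-mode options of spark-submit. The sketch below illustrates generic Spark usage only; the master hostname and the application JAR are placeholders, and it is not an eccenca-specific invocation:

# local mode: Spark runs in a single local JVM (testing)
spark-submit --master "local[*]" <application-jar>

# client mode against a Spark standalone cluster (production)
spark-submit --master spark://spark-master.example.com:7077 \
  --deploy-mode client <application-jar>

# cluster mode on YARN (production)
spark-submit --master yarn --deploy-mode cluster <application-jar>
BASH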

When running DataIntegration in a cluster, the same installation procedure and prerequisites apply as for the local installation. The application can be installed outside the cluster or on any cluster node. A number of configuration options must be set in order to connect to and use a Spark cluster; the necessary options are described in DataIntegration.