Installation¤
This page describes proven deployment scenarios for eccenca Corporate Memory.
All Corporate Memory components are distributed as Docker images and can be obtained from eccenca’s Artifactory service. To run them you need a Docker-enabled Linux server. In addition, eccenca provides distribution archives for all components that contain configuration examples (YAML) as well as JAR/WAR artifacts.
Operating Systems (OS)¤
Corporate Memory is tested on Ubuntu 18.04 (backward compatible with 16.04 and 14.04) and RHEL 7.7.
Special note on RHEL SELinux support: there are no limitations for Red Hat SELinux. We recommend keeping SELinux in enforcing mode. You can keep the default settings of the /etc/selinux/config file.
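For reference, the relevant default settings in /etc/selinux/config typically look like this; the exact contents may vary between RHEL releases:

```
# /etc/selinux/config (typical RHEL defaults, shown for illustration)
SELINUX=enforcing
SELINUXTYPE=targeted
```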
Docker-compose based Orchestration deployment¤
Docker Compose is a convenient way to provision several Docker containers, either locally for development setups or on remote servers for single-node setups.
eccenca makes heavy use of Docker Compose for all kinds of internal and customer deployments. For more details on Docker Compose-based orchestration, refer to Scenario: Local Installation and Scenario: Single Node Cloud Installation.
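As a rough sketch of the general shape of such an orchestration, a Compose file follows the pattern below. The service names, image references, and ports are placeholders, not the actual values used by eccenca; use the YAML configuration examples shipped with the distribution archives for a real deployment.

```yaml
# docker-compose.yml: illustrative skeleton only; images, versions, and ports are placeholders
version: "3"
services:
  dataplatform:
    image: <artifactory-host>/dataplatform:<version>      # placeholder image reference
    ports:
      - "8080:8080"                                        # placeholder port mapping
  dataintegration:
    image: <artifactory-host>/dataintegration:<version>    # placeholder image reference
    ports:
      - "9090:9090"                                        # placeholder port mapping
```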
DataIntegration¤
Running on a Spark Cluster¤
eccenca DataIntegration supports the execution of DataIntegration workflows in a cluster environment with Apache Spark.
Prerequisites¤
To execute DataIntegration in a Spark cluster, the following software components from the Hadoop ecosystem are recommended:
- Scala 2.11 or 2.10
- Apache Spark 2.1.2 (compiled for Scala 2.11)
- Apache Hadoop 2.7 (HDFS)
- Apache Hive 1.2, with a relational database as metastore (e.g. Derby)
Recent versions of the following Hadoop distributions are generally supported as well:
- Hortonworks (HDP 2.5)
- Cloudera (CDH 5.8)
- Oracle Big Data Lite (4.6)
- Microsoft HDInsight (based on HDP)
Installation¤
A Spark application can run in three different modes:
- local mode
- client mode
- cluster mode
The local mode runs Spark applications on a single local machine. In client mode, the DataIntegration application runs outside of the cluster and creates Spark jobs that are executed in the cluster at runtime. The cluster mode requires that the application using Spark runs entirely in the cluster and is managed by the software running on the cluster (e.g. Spark standalone, Apache YARN, Mesos). DataIntegration supports local mode (for testing), client mode (for production, only with clusters managed by Spark), and cluster mode on YARN (for production, integrates best with other distributed applications).
When running DataIntegration in a cluster, the same installation procedure and prerequisites apply as for the local installation. The application can be installed outside the cluster or on any cluster node. A number of configuration options have to be set to connect to and use a Spark cluster; the necessary configuration options are described in DataIntegration.
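For orientation, the listing below shows the generic Spark properties that distinguish the three modes (spark.master and spark.submit.deployMode are standard Spark settings); the DataIntegration-specific configuration keys that correspond to them are documented in DataIntegration and are not reproduced here.

```properties
# Generic Spark settings in spark-defaults.conf style; values are examples only.

# Local mode (testing): Spark runs inside the application process
spark.master              local[*]

# Client mode (production, Spark-managed cluster): driver runs outside the cluster
#spark.master             spark://<spark-master-host>:7077
#spark.submit.deployMode  client

# Cluster mode on YARN (production): driver runs inside the cluster
#spark.master             yarn
#spark.submit.deployMode  cluster
```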
DataPlatform¤
Scaling¤
Run multiple DataPlatform instances with the same configuration to enable high-availability and/or high-performance setups.
Prerequisites¤
For running multiple DataPlatform instances the following prerequisites apply:
- The same application configuration properties must be used by all scaled instances.
- If access control for any SPARQL endpoint is active, a shared Redis cache used by all DataPlatform instances is required.
Limitations¤
When running multiple DataPlatform instances it is not possible to use a shared Virtuoso backend with provisioned access control active.
Troubleshooting¤
If DataPlatform fails to start, check the logs for error messages pointing to faulty parameters in the configuration. Since not every fault is apparent from the logs, the following checks can help you verify the configuration:
- Check the http(s)://&lt;servername:port&gt;/actuator/health/ endpoint to verify that the SPARQL proxy service endpoints are configured properly.
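For example, a quick check from the command line could look like this; hostname and port are placeholders for your deployment:

```bash
# Query the Spring Boot actuator health endpoint; a healthy instance reports "status": "UP"
curl -s http://<servername>:<port>/actuator/health/
```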
Note: Refer to the Spring documentation on how to set active profiles.
Plugins¤
In some cases, DataPlatform needs to be extended with plugins. Extensions are necessary when drivers cannot be included due to licensing restrictions or when plugins are delivered separately.
In these cases, you have to update the DataPlatform .war file, either by placing the plugin .jar files in the same directory or by specifying the path to them via an option; both variants are described below.
To include plugins that are located in the same directory as the eccenca-DataPlatform.war file, execute the .war file with the -u or --update-war option:
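The original command listing is not reproduced here; based on the description above, the invocation is roughly as follows, assuming the .war file is executed with java -jar:

```bash
# Repackage the .war, including all plugin .jar files located next to it
java -jar eccenca-DataPlatform.war --update-war
```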
If the plugins to be included are not located in the same folder as the .war file, you can specify a directory containing the plugins as the argument of the -u or --update-war option:
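Again as a sketch under the same assumption, with /path/to/plugins as a placeholder directory:

```bash
# Repackage the .war, including all plugin .jar files from the given directory
java -jar eccenca-DataPlatform.war --update-war /path/to/plugins
```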
The last command repackages the eccenca-DataPlatform.war file by including all plugins (.jar) located in the specified directory.
Note: Make sure that only the eccenca-DataPlatform.war file is in the directory, since multiple .war files can cause problems.
Note: During the update procedure, the directory WEB-INF is created. Due to security concerns, the update mechanism does not delete this directory. You can delete it after the update process is finished.