Administer Big Data Extensions
Overview
KNIME Big Data Extensions integrate Apache Spark and the Apache Hadoop ecosystem with KNIME Analytics Platform.
This guide supports IT professionals who need to connect KNIME Analytics Platform to an existing Hadoop or Spark environment.
Complete the steps in this guide so that users of KNIME Analytics Platform can run Spark workflows. Running Spark workflows on KNIME Server requires additional steps described in the Secured Cluster Connection Guide for KNIME Server.
Overall architecture

The KNIME Extension for Apache Spark requires Apache Livy to run as a REST service on an edge or front-end node of the cluster. Review the compatibility lists below and learn how to install Livy.
Compatibility
General compatibility
The KNIME Extension for Apache Spark is compatible with:
- Spark 2.x - 3.5
- Livy 0.4 - 0.7
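To compare an installed version against the ranges above, a small helper can do the check. The following is a minimal sketch, not part of the KNIME tooling: the function name `version_in_range` is hypothetical, it compares only the major.minor part of a version string, and it assumes GNU `sort` with the `-V` (version sort) option is available.

```bash
# Return 0 if the major.minor part of version $1 lies within [$2, $3].
# Assumption: GNU sort with -V (version sort) is installed.
version_in_range() {
    min=$2
    max=$3
    # Keep only major.minor, so "0.7.1-incubating" is treated as "0.7".
    version=$(printf '%s\n' "$1" | cut -d. -f1,2)
    lowest=$(printf '%s\n' "$min" "$version" | sort -V | head -n 1)
    highest=$(printf '%s\n' "$version" "$max" | sort -V | tail -n 1)
    [ "$lowest" = "$min" ] && [ "$highest" = "$max" ]
}

# Example values; replace them with the versions reported by your cluster.
if version_in_range 3.5.1 2.0 3.5; then
    echo "Spark version is within the supported range"
fi
```

The platform-specific lists in the following sections are narrower than this general range, so a passing check here does not replace a look at the list for your distribution.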
Cloudera CDP compatibility
The KNIME Extension for Apache Spark is compatible with:
- Spark 3.3 on CDP 7.1.8 as provided by Cloudera CDS 3.3
- Spark 3.2 on CDP 7.1.7 as provided by Cloudera CDS 3.2
- Spark 3.1 on CDP 7.1.6 as provided by Cloudera CDS 3.1
- Spark 3.0 on CDP 7.1.5 as provided by Cloudera CDS 3.0
- Spark 2.4 as included in CDP 7
Cloudera CDH compatibility
The KNIME Extension for Apache Spark is compatible with:
- Spark 2.x on CDH 5 as provided by Cloudera CDS
- Spark 2.x as included in CDH 6
WARNING
Cloudera CDH 5 and 6 do not include Livy. KNIME therefore provides CSDs and parcels for Livy. See the Cloudera CDH section for download and installation details.
Cloudera HDP compatibility
The KNIME Extension for Apache Spark is compatible with:
- Spark 2.x as included in HDP 2.6.3 - 2.6.5
- Spark 2.x as included in HDP 3.0.0 - 3.1.5
Amazon EMR compatibility
The KNIME Extension for Apache Spark is compatible with:
- EMR 7.x with Spark 3.5 and Livy 0.7 - 0.8
- EMR 6.x with Spark 3.x and Livy 0.6 - 0.7
- EMR 5.9+ with Spark 2.x and Livy 0.4 - 0.7
Phase Out of Support for H2O Sparkling Water
Starting with KNIME Analytics Platform version 5.3, support for H2O Sparkling Water is being phased out. The integration will not be updated for upcoming Spark versions, and Sparkling Water is no longer supported on clusters with Spark 3.4 or higher (for example, Databricks). The Create Local Big Data Environment node remains available for now, but support will be removed soon.
Apache Livy Setup
Cloudera CDP
Cloudera Runtime 7.0 - 7.1 includes Spark 2.4 and Livy as a service. Cloudera provides Spark 3.x as a Custom Service Descriptor that can coexist with the included Spark version. See Installing CDS Powered by Apache Spark in the CDS 3.3, CDS 3.2, CDS 3.1, or CDS 3.0 documentation for more information. Livy in the Spark 3.x CSD uses port 28998 instead of the default 8998.
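Because the port differs between the two Livy services, it can help to confirm which one answers before configuring KNIME. The sketch below queries Livy's session listing endpoint (`GET /sessions`) with `curl`. The function names and the host name `edge-node.example.com` are hypothetical, and the sketch assumes plain HTTP without authentication; a Kerberos-secured or TLS-enabled cluster needs additional `curl` options that are not shown here.

```bash
# Build the URL of Livy's session listing endpoint (GET /sessions).
livy_sessions_url() {
    host=$1
    port=${2:-8998}   # 8998 is the Livy default; the Spark 3.x CSD uses 28998
    printf 'http://%s:%s/sessions\n' "$host" "$port"
}

# Return 0 if the endpoint answers with HTTP 200 within 10 seconds.
livy_is_reachable() {
    status=$(curl --silent --output /dev/null --max-time 10 \
        --write-out '%{http_code}' "$(livy_sessions_url "$@")") || status=000
    [ "$status" = "200" ]
}

# Example with a hypothetical host name:
# livy_is_reachable edge-node.example.com 28998 && echo "Livy for Spark 3 is up"
```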
TIP
If you plan to run Spark workflows on KNIME Server, review the Secured Cluster Connection Guide for KNIME Server to enable user impersonation.
Cloudera CDH
KNIME provides a CSD and parcel for Cloudera CDH so that you can install Livy as an add-on service. The current Livy version is {livy-release}.
TIP
The following steps describe how to install Livy as a managed service through Cloudera Manager using a parcel. Consult the official Cloudera parcel documentation if you need additional guidance.
Prerequisites
- A cluster with CDH 5.8 or newer, or CDH 6.0 or newer
- Only on CDH 5: Spark 2.2 or higher installed as an add-on service (provided by Cloudera CDS)
- Root shell access (for example, via SSH) on the machine that hosts Cloudera Manager
- Full administrative access to the Cloudera Manager web UI
Installation steps
In a root shell on the machine where Cloudera Manager is installed:
- Download a matching CSD from CSDs for Cloudera CDH to /opt/cloudera/csd/.
- If Cloudera Manager cannot access the public internet, copy the matching .parcel and .sha1 files from Parcels for Cloudera CDH to /opt/cloudera/parcel-repo.
- Restart Cloudera Manager, for example:
systemctl restart cloudera-scm-server
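If you copied the parcel and checksum files manually, you may want to confirm that they match before activating the parcel. The following is a minimal sketch under several assumptions: the function name `parcel_checksum_ok` is hypothetical, the checksum file is assumed to hold the hex SHA-1 digest as its first field, the files are assumed to be named `LIVY*.parcel` with a matching `.parcel.sha1` companion, and `sha1sum` from GNU coreutils is assumed to be installed.

```bash
# Return 0 if the SHA-1 digest of parcel $1 equals the first field of checksum file $2.
parcel_checksum_ok() {
    parcel=$1
    sha_file=$2
    expected=$(awk '{ print $1; exit }' "$sha_file")
    actual=$(sha1sum "$parcel" | awk '{ print $1 }')
    [ -n "$expected" ] && [ "$expected" = "$actual" ]
}

# Check every Livy parcel in the local repository (file naming is an assumption).
for parcel in /opt/cloudera/parcel-repo/LIVY*.parcel; do
    [ -e "$parcel" ] || continue   # the pattern matched nothing
    if parcel_checksum_ok "$parcel" "$parcel.sha1"; then
        echo "OK: $parcel"
    else
        echo "Checksum mismatch: $parcel" >&2
    fi
done
```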
In the Cloudera Manager web UI:
- Open the Parcel manager and locate the LIVY parcel.
- Download (if needed), distribute, and activate the LIVY parcel.
- Add the Livy service to your cluster. See the Cloudera documentation on adding services.
- In the HDFS service configuration, add the following settings to the Cluster-wide Advanced Configuration Snippet (Safety Valve) for core-site.xml:
hadoop.proxyuser.livy.hosts=*
hadoop.proxyuser.livy.groups=*
- If your cluster uses HDFS Transparent Encryption, add the following settings to the Key Management Server Advanced Configuration Snippet (Safety Valve) for kms-site.xml:
hadoop.kms.proxyuser.livy.hosts=*
hadoop.kms.proxyuser.livy.groups=*
- If you plan to run Spark workflows on KNIME Server, review the Secured Cluster Connection Guide for KNIME Server to enable user impersonation.
- Restart every service affected by your configuration changes.
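The proxy user settings in the steps above are written as name=value pairs, while Hadoop configuration files such as core-site.xml and kms-site.xml store them as `<property>` elements. If you enter a snippet in XML form, a conversion along the following lines may help. This is a sketch only; the function name `props_to_xml` is hypothetical and the input is assumed to contain one name=value pair per line.

```bash
# Convert name=value lines on stdin into Hadoop <property> XML elements.
props_to_xml() {
    while IFS='=' read -r name value; do
        [ -n "$name" ] || continue   # skip empty lines
        printf '<property><name>%s</name><value>%s</value></property>\n' \
            "$name" "$value"
    done
}

# The core-site.xml settings from the steps above.
props_to_xml <<'EOF'
hadoop.proxyuser.livy.hosts=*
hadoop.proxyuser.livy.groups=*
EOF
```

After the affected services have restarted, running `hdfs getconf -confKey hadoop.proxyuser.livy.hosts` on a node with a deployed client configuration should print the configured value.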
Cloudera HDP
Hortonworks Data Platform already includes compatible versions of Apache Livy and Spark 2. Review the Cloudera HDP compatibility section for supported versions. Follow the Hortonworks documentation to install Spark with the Livy for Spark2 Server component.
NOTE
The KNIME Extension for Apache Spark only supports Livy for Spark2 Server, which uses Spark 2. The Livy for Spark Server component is not supported because it relies on Spark 1.
Amazon EMR
Amazon EMR already includes compatible versions of Apache Livy and Spark. Confirm the supported versions in the Amazon EMR compatibility section and select Livy in the software configuration of your cluster.
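When you create the cluster from the command line instead of the console, the software configuration corresponds to the `--applications` option of `aws emr create-cluster`. The sketch below only prints such a command so that you can review it before running it. The function name `emr_create_command`, the default cluster name, and the instance type and count are illustrative assumptions; choose a release label that matches the compatibility list.

```bash
# Print (do not run) an AWS CLI command that creates an EMR cluster
# with both Spark and Livy selected as applications.
emr_create_command() {
    release_label=$1                 # for example emr-6.15.0
    cluster_name=${2:-knime-spark}   # hypothetical default name
    printf 'aws emr create-cluster --name %s --release-label %s %s %s\n' \
        "$cluster_name" "$release_label" \
        '--applications Name=Spark Name=Livy' \
        '--instance-type m5.xlarge --instance-count 3 --use-default-roles'
}

# Review the generated command, then run it yourself if it fits your account setup.
emr_create_command emr-6.15.0
```

Printing the command rather than executing it keeps the sketch free of side effects; networking, security, and role settings for a real cluster are intentionally left out.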
Downloads
Apache Livy downloads
CSDs for Cloudera CDH
Parcels for Cloudera CDH
CDH 5
| Platform | Parcel | SHA |
|---|---|---|
| RHEL 7 | parcel | sha |
| RHEL 6 | parcel | sha |
| RHEL 5 | parcel | sha |
| SLES 12 | parcel | sha |
| SLES 11 | parcel | sha |
| Ubuntu 16 (Xenial) | parcel | sha |
| Ubuntu 14 (Trusty) | parcel | sha |
| Ubuntu 12 (Precise) | parcel | sha |
| Debian 8 (Jessie) | parcel | sha |
| Debian 7 (Wheezy) | parcel | sha |
CDH 6
| Platform | Parcel | SHA |
|---|---|---|
| RHEL 7 | parcel | sha |
| RHEL 6 | parcel | sha |
| RHEL 5 | parcel | sha |
| SLES 12 | parcel | sha |
| SLES 11 | parcel | sha |
| Ubuntu 16 (Xenial) | parcel | sha |
| Ubuntu 14 (Trusty) | parcel | sha |
| Ubuntu 12 (Precise) | parcel | sha |
| Debian 8 (Jessie) | parcel | sha |
| Debian 7 (Wheezy) | parcel | sha |