Introduction

KNIME Business Hub is a customer-managed KNIME Hub instance.

Once you have a license and have completed the installation, you will have access to Hub resources and will be able to customize specific features, give your employees access to these resources, organize them into Teams, and give them the ability to manage specific resources.

Once you have access to a KNIME Business Hub instance at your company, you can use it to perform a number of tasks, such as:

  • collaborate with your colleagues,

  • test execution of workflows,

  • create and share data apps, schedules, and API services, and

  • keep track of changes with versioning.

The following is a guide for installing KNIME Business Hub into a single-node Kubernetes cluster running on a supported Linux distribution.

To administer a KNIME Business Hub instance, please refer instead to the following guide:

Software Prerequisites

  • kubectl: only required if installing into an existing cluster or when remotely managing a cluster. When installing the embedded cluster with kURL, kubectl is automatically installed on the host machine.

  • Helm: only required if uninstalling KNIME Business Hub.
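As a quick sanity check, you can verify that both tools are available on the machine you will manage the cluster from:

# Check that kubectl and Helm are installed and on the PATH.
kubectl version --client
helm version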

Hardware Prerequisites

This guide covers the installation of KNIME Business Hub into a single-node cluster. For this, a single instance running a supported Linux distribution (Ubuntu in the examples below) is required, with certain ports exposed to inbound traffic.

Network ports for SSH, kubectl, and KOTS Admin Console should only be exposed to the IP address(es) of the system administrator(s) of the Ubuntu instance.

  • Operating System

    • Ubuntu Server 20.04 LTS

    • Ubuntu Server 22.04 LTS

    • RHEL 8.1, 8.2, 8.3, 8.4, 8.5, 8.6, 8.7

    • Amazon Linux 2

  • Resources

    • Single Node Installation

    • Highly-Available Multinode Installation (Three or more instances)

      • CPU Cores: 8+ per instance

      • Memory: 16GB+ RAM per instance

      • Disk: 100GB+ per instance for the root volume

      • Additional Attached Disks: one or more additional, unformatted disks must be attached to each instance in multinode installations to handle data replication between nodes

      • See the advanced installation guide for configuring highly-available clusters and for installing on instances with undersized root volumes

  • Network Ports

    • 80 (HTTP)

    • 443 (HTTPS)

    • 22 (SSH) ADMIN USE ONLY

    • 6443 (kubectl) ADMIN USE ONLY

    • 8800 (KOTS Admin Console) ADMIN USE ONLY
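Once the instance is running, you can check which of these ports are actually listening by inspecting the open sockets on the host, for example:

# List listening TCP sockets for the relevant ports.
sudo ss -tlnp | grep -E ':(22|80|443|6443|8800)\s'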

Security Warnings:

  • Ports 22, 6443, and 8800 are vulnerable access points for a KNIME Hub installation. If a malicious actor gained access to any of those ports, they would be able to perform destructive actions on the cluster and/or gain access to sensitive configuration. Access to these ports must be restricted to only the IP address(es) of the machine(s) which will administer the installation.

  • Security-Enhanced Linux (SELinux) is not currently supported. If enabled, the installer script will notify the user via a prompt and disable SELinux before proceeding.

Networking Prerequisites

The following domains need to be accessible from servers performing online installations:

Trusted Host   Domain
KNIME          *.knime.com
Replicated     See the Firewall Openings for Online Installations guide.
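To confirm outbound connectivity from the server before an online installation, you can request one of these endpoints and check for an HTTP status code, for example:

# Expects an HTTP status code such as 200 or 301.
curl -sS -o /dev/null -w '%{http_code}\n' https://kurl.sh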

KNIME Business Hub Installation

For the commands demonstrated below, replace anything shown in <brackets> with real values.

Connect to your Ubuntu instance

The first step is to connect to your Ubuntu instance and update it. If you are connecting via SSH, ensure that the machine you are using is permitted to connect to port 22 of the instance. Also ensure that the user you connect to the instance with has permissions to run commands as the superuser (i.e. sudo).

# Connect to your Ubuntu instance. This process/command may differ.
ssh -i "some-identity-key.pem" ubuntu@<instance-ip-address>

# Update the Ubuntu instance.
sudo apt-get update && sudo apt-get upgrade

Install the embedded cluster for KNIME Business Hub

The command below executes a hands-free installation of all of the supporting Kubernetes architecture required to host KNIME Business Hub. It takes approximately 10-15 minutes to run in its entirety and outputs a significant amount of logs as the process installs all necessary dependencies.

curl -sSL https://kurl.sh/knime-hub | sudo bash

For more advanced installation options with kURL, please consult the kURL documentation. Note: if you execute this command with any additional flags or environment variables set, write them down in a document. The same flags and environment variables need to be present again when you update the Kubernetes cluster version or the KOTS Admin Console.

Once the process is complete, you should see something similar to the following output. This output contains very important URLs, usernames, passwords, and commands for your instance. Ensure that you save this output somewhere secure before proceeding.

image1

Access the KNIME Business Hub Admin Console

Navigate to the KOTS Admin Console URL provided in the embedded cluster installation output and take note of the password.

image2

The first page that will display is a warning regarding Transport Layer Security (TLS) configuration. Follow the on-screen instructions to proceed.

image3

You will then be prompted to provide your own TLS cert to secure traffic to the admin console, if desired.

image4

You should then see a prompt for a password. Enter the admin console password from the embedded cluster installation output to proceed (this password can be changed later).

image5

Provide a Replicated .yaml license file

After logging in, you should be prompted for a license file. This is the Replicated license file that your KNIME customer care representative has provided to you and has a .yaml extension. Please contact your customer care representative if you need assistance with your license.

image6

Configure the installation

If all prior steps were successful, you should now be prompted to configure your KNIME Business Hub installation. A number of settings will display for you to customize. Please note that all configuration settings in this view can be changed post-installation, except for the settings under “Initialization of KNIME Business Hub”.

image8

Provide a KNIME Business Hub .xml license file

In the "Global" section you can choose your KNIME Businesse Hub Deployment Name and Mountpoint ID, or leave the default values. Here you are also required to upload your KNIME Business Hub license. This is a different file than the Replicated .yaml license file. The KNIME Business Hub license file is a .xml file that is also provided to you by your KNIME customer care representative.

Configure KNIME Business Hub URLs

URLs for KNIME Business Hub need to have the following structure:

  • Base URL

    • <base-url> (e.g. hub.example.com).

    • The URL scheme (http:// or https://) should not be included in the Base URL.

    • The <base-url> must include the top-level domain (e.g. .com), and cannot be an IP address.

    • This is the URL you use to view the KNIME Business Hub in your browser.

    • Valid examples:

      • hub.example.com

      • example.com

    • Invalid examples:

      • https://hub.example.com (includes the URL scheme)

      • 192.168.0.1 (an IP address)

  • Subdomains

    • api.<base-url>

    • apps.<base-url>

    • auth.<base-url>

    • storage.<base-url>

The Base URL is the only URL that can be customized. The rest of the URLs are generated automatically.

image9

If you are testing KNIME Business Hub without DNS configured, it is recommended to create /etc/hosts entries on your local machine pointing to the public IPv4 address of the instance running the cluster. This will redirect traffic from your local machine to the appropriate IPv4 address when you enter URLs such as http://hub.example.com/ into your browser.

Notice that the values in /etc/hosts below are for hub.example.com. The values must match the config in the URLs section of the Config tab in the KNIME Business Hub Admin Console, as demonstrated above. You can always use hub.example.com as the Base URL for local test installations.

<public ip> hub.example.com
<public ip> api.hub.example.com
<public ip> auth.hub.example.com
<public ip> storage.hub.example.com
<public ip> apps.hub.example.com

On Windows machines, the equivalent of the /etc/hosts file can be found at <windows dir>\system32\drivers\etc\hosts.
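After adding the entries, you can check from your local machine that the hostnames resolve to the intended address, for example:

# Confirm that the hostname resolves to the instance's public IP.
ping -c 1 hub.example.com

# Confirm that the webapp answers on that address.
curl -I http://hub.example.com/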

Initialization of KNIME Business Hub

During the very first installation of KNIME Business Hub a number of one-time initializations are made, such as creating an admin user, team, space, and execution context. Changing fields in this section after installation won’t have any effect on the deployed application; the admin user can change these settings in the browser after the installation.

image10

The execution context has minimal resources (1 CPU, 2GB memory) and a default executor provided by KNIME, to enable basic execution. For any production use you should configure the execution context to assign more resources, or use a different executor Docker image.

Preflight checks

The final step before installing is the preflight checks, which are a set of automated tests that help identify whether KNIME Business Hub is ready for installation. They check the Kubernetes distribution, Kubernetes version, resources available to the cluster, and other mission-critical settings.

It is highly recommended never to skip the preflight checks during installation or upgrades.

image11

Wait for the installation to complete

If the preflight checks all passed and you opted to continue, the only thing left to do is wait a few minutes until KNIME Business Hub finishes installing. You should see the Ready status (top left) turn green after 15-20 minutes.

If you cannot access the KNIME Business Hub Webapp URL after the Ready status has turned green, the first troubleshooting step would be to check the Config tab in the KNIME Business Hub Admin Console and ensure the URLs are configured properly.

image13

Navigating to the Webapp URL should display the KNIME Business Hub landing page.

image14

Post-installation steps

Connecting kubectl or other external tools to your cluster

Executing the following command on the Ubuntu instance on which KNIME Business Hub is installed will output the kubeconfig file, which is required for accessing your cluster from another machine.

Sometimes the KUBECONFIG environment variable is not set automatically after installation. Running bash -l will reload the shell and likely solve the issue. Otherwise, you can run kubectl config view --raw which is equivalent to cat $KUBECONFIG.

cat $KUBECONFIG

Note that the .clusters[0].cluster.server property is almost certainly set to the private IPv4 address of the cluster (incorrect) and not the public IPv4 address (correct). Update the property to match the public IPv4 address of the Ubuntu instance hosting KNIME Business Hub.

apiVersion: v1
clusters:
- cluster:
    certificate-authority-data: ...
    server: https://<replace-with-public-ip>:6443
  name: kubernetes
contexts:
- context:
    cluster: kubernetes
    user: kubernetes-admin
  name: kubernetes-admin@kubernetes
current-context: kubernetes-admin@kubernetes
kind: Config
preferences: {}
users:
- name: kubernetes-admin
  user:
    client-certificate-data: ...
    client-key-data: ...
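One possible way to work with the cluster from your local machine, assuming SSH access as shown earlier, is to dump the kubeconfig over SSH, update the server address as described above, and point kubectl at the resulting file:

# On your local machine: export the kubeconfig from the instance
# (bash -l ensures KUBECONFIG is set in the remote shell).
ssh -i "some-identity-key.pem" ubuntu@<instance-ip-address> 'bash -lc "kubectl config view --raw"' > knime-hub-kubeconfig.yaml

# After replacing the server address with the public IP, verify connectivity.
export KUBECONFIG="$PWD/knime-hub-kubeconfig.yaml"
kubectl get nodes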

Version updates & rollbacks

If you save any changes in the Config tab of the KNIME Business Hub Admin Console, or check for updates and see a new version that you can upgrade to, then the new version will be visible in the Version history tab. New versions of KNIME Business Hub will not be deployed automatically unless automatic updates have been configured. Preflight checks will execute prior to deployment and the deployment itself can be triggered by clicking the Deploy button.

image15

User registration

After initial installation, start the process of creating the first user by clicking the Sign In button.

image16

Next, click the Register button to proceed with creating a new account. You will be prompted for user information and will be logged in automatically.

image17

Users can change their password by going to auth.<base-url>/auth/realms/knime/account (e.g. http://auth.hub.example.com/auth/realms/knime/account) and navigating to Account Security → Signing In.

image18

Keycloak setup (IDP)

You can manage your Keycloak setup by going to auth.<base-url>/auth/ (e.g. http://auth.hub.example.com/auth/), clicking Administration Console and logging in with the Keycloak admin credentials. These credentials are stored in a kubernetes secret called credential-knime-keycloak in the knime namespace.
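The credentials can be read with kubectl, for example as shown below; the data keys used here (ADMIN_USERNAME, ADMIN_PASSWORD) are the typical ones for this secret, but list the keys first if your instance differs:

# Inspect the secret to see which data keys it contains.
kubectl get secret credential-knime-keycloak -n knime -o jsonpath='{.data}'

# Decode the admin username and password (key names may vary).
kubectl get secret credential-knime-keycloak -n knime -o jsonpath='{.data.ADMIN_USERNAME}' | base64 -d
kubectl get secret credential-knime-keycloak -n knime -o jsonpath='{.data.ADMIN_PASSWORD}' | base64 -d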

For configuring your Keycloak setup, e.g. for adding User Federation, consult the Keycloak Server Administration Guide: https://www.keycloak.org/docs/12.0/server_admin/.

Notifications

In order to configure the Notification Service to send emails, you have to supply configuration properties in the Mail Server Configuration field in the KNIME Business Hub Config. The table below shows some of the possible options. The Notification Service uses Jakarta Mail, see the Jakarta Mail API documentation for all possible parameters.

Name                       Value
mail.from                  Address from which all mails are sent; required
mail.smtp.host             SMTP server host address
mail.smtp.port             SMTP port; default 25
mail.smtp.auth             Set to true if the mail server requires authentication
mail.smtp.user             Username for SMTP authentication; optional
mail.password              Password for authentication; optional
mail.smtp.starttls.enable  If true, enables the use of the STARTTLS command (if supported by the server) to switch the connection to a TLS-protected connection before issuing any login commands
mail.smtp.ssl.enable       If true, use SSL to connect and use the SSL port by default
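For illustration, a minimal configuration for an SMTP server that requires authentication and STARTTLS on port 587 might look like the following; all values are placeholders to be replaced with your own:

mail.from=hub@example.com
mail.smtp.host=smtp.example.com
mail.smtp.port=587
mail.smtp.auth=true
mail.smtp.user=hub-mailer
mail.password=<smtp-password>
mail.smtp.starttls.enable=true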

image19

Enabling custom logos and other branding options

You can change the name of your KNIME Business Hub deployment from the global settings.

image23

To enable other branding options for your KNIME Business Hub instance, find the "Branding" section below and enable them.

image24

If customizing the logo, the file being uploaded must be an .svg file in XML format such as the one below.

example.svg

<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<!DOCTYPE svg PUBLIC "-//W3C//DTD SVG 1.1//EN" "http://www.w3.org/Graphics/SVG/1.1/DTD/svg11.dtd">
<svg width="100%" height="100%" viewBox="0 0 183 48" version="1.1" xmlns="http://www.w3.org/2000/svg" xmlns:xlink="http://www.w3.org/1999/xlink" xml:space="preserve" xmlns:serif="http://www.serif.com/" style="fill-rule:evenodd;clip-rule:evenodd;stroke-linejoin:round;stroke-miterlimit:2;">
    <g transform="matrix(0.673983,0,0,0.673983,-2.4399,8.02946)">
        <text x="6.739px" y="43.245px" style="font-family:'Arial-BoldMT', 'Arial', sans-serif;font-weight:700;font-size:54.619px;">EXAMPLE</text>
    </g>
</svg>

Once the configuration changes to the logo have been applied and deployed, the KNIME Business Hub webapp should automatically restart with the new branding configuration.

image26

Restart a node

Before rebooting a node, please run the shutdown script suggested here on the node.

Otherwise, after a VM restart, old pods might be left in a Failed or Shutdown state. If that is the case, delete the failed pods after the restart with the following command:

kubectl delete pod --field-selector=status.phase==Failed --all-namespaces

Update your KNIME Business Hub license

In order to deploy a new KNIME Business Hub license, go to the Replicated console. There, navigate to the Config tab and find your current license file.

update license 1

Click “select a different file” and choose the .xml file provided by your KNIME customer care representative. Afterwards, scroll to the bottom to confirm the configuration change. Next, click “go to updated version”; this brings you to the “Version history” tab, where you need to click “deploy” to switch to the new license.

update license 2

Advanced installation guide

This section covers advanced installation topics for detailed networking requirements, setting up highly-available (HA) clusters and other environmental considerations.

Highly-Available (HA) Embedded Cluster installation

A highly-available (HA) cluster consists of running multiple primary nodes which share the responsibility of acting as the control-plane, meaning any primary instance can ensure that all nodes in the cluster are properly managed and work is evenly distributed across them.

In an HA configured cluster where three or more nodes are running, any node can become unavailable without impacting the overall stability and health of the cluster. Furthermore, any processes running on a node that becomes unavailable will be automatically moved to an available node, allowing the cluster to automatically self heal.

Additionally, for a cluster to be highly-available, all data must be replicated dynamically between all nodes to ensure any migrated processes have access to all needed data. This is enabled by configuring the Rook volume provider add-on on nodes during the installation process. Rook requires that additional, unformatted block devices (disks) be attached to each node, which it leverages for volume management. Additional information can be found here.

Installation overview

In the most basic HA scenario, three or more nodes are installed where each node is configured to act as a primary node. A primary node is a node that additionally runs processes for the Kubernetes control-plane. A secondary node is a node that only runs non control-plane processes.

Having all nodes configured as primary nodes ensures any node can become unavailable without affecting the stability of the Kubernetes cluster.

If more than three nodes are planned to be configured in a cluster, a minimum of three of them must be primary nodes. All additional nodes can be installed as secondary nodes if desired.

When installing the kURL embedded Kubernetes cluster in an HA configuration, the installation process is fully run on an initial instance, creating the first primary node. Upon completion of the install, output will be generated and printed to the console which includes a "join command" that can be run on each subsequent instance to configure it as a new node and cluster it with existing nodes.

Note all relevant ports that should be open for nodes to communicate with each other in the advanced Networking Requirements.

Installing the first node

When installing the kURL embedded cluster, the command line script needs to be modified to pass in additional parameters to configure that node to install the Rook Volume Management add-on as well as configure the node to enable additional HA components.

This is achieved by setting certain additional flags on the installer command and using a YAML installer-patch.yaml file to alter the installation requirements.

To start, first create the following file on your instance where kURL will be installed.

installer-patch.yaml

## Installer Patch file to install Rook as the default storage provider
## Specify patch file via '-s installer-spec-file="./installer-patch.yaml"'
apiVersion: cluster.kurl.sh/v1beta1
kind: Installer
metadata:
  name: "knime-hub-installer-patch-rook"
spec:
  rook:
    version: "1.10.11"
#    blockDeviceFilter: sd[b-z]
#    cephReplicaCount: 3
    isBlockStorageEnabled: true
    storageClassName: "distributed"
    hostpathRequiresPrivileged: false
    bypassUpgradeWarning: false

In the installer-patch.yaml file, blockDeviceFilter is included but commented out. When mounting additional block storage (disk) devices to a Linux instance, Rook will automatically look for any available, unformatted devices. You can further constrain which devices it looks for by enabling this filter. See the Rook add-on documentation for additional configuration options.
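Before running the installer, you can check which block devices are attached and whether they are unformatted:

# Devices with an empty FSTYPE column are unformatted and can be picked up by Rook.
lsblk -f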

With the installer-patch.yaml file created, run the following install command to initialize the first node.

curl -sSL https://kurl.sh/knime-hub | sudo bash -s installer-spec-file="./installer-patch.yaml" ha ekco-enable-internal-load-balancer

The installer may prompt the user to indicate when additional configurations are being enabled or installed, but will otherwise proceed as normal.

This install command is similar to the one for the single-node install, but has three additional parameters (listed after the -s flag): the installer-spec-file="./installer-patch.yaml" installer override file, the ha configuration option, and ekco-enable-internal-load-balancer, which enables an internal load balancer that makes the Kubernetes control-plane API a highly-available endpoint.

Installing additional nodes

Once the first node has completed installation, the standard output will be printed to the terminal, describing how to access the KOTS Admin Console and more.

Among this output is a join command (including a dynamically generated token) which can be run on subsequent instances to install them as nodes and join them to the cluster.

The join command will specify a token after the -s flag, which is used to authenticate new nodes to the cluster during installation. The same installer-patch.yaml file and parameters (installer-spec-file="./installer-patch.yaml" ha ekco-enable-internal-load-balancer) will need to be appended to this command as additional flags when run on each subsequent instance.
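For illustration only, the command run on each additional node takes the following shape; replace the placeholder with the exact join command (including its generated token) from your installer output:

# Run on each additional node; substitute the real join command from the installer output.
<join-command-from-installer-output> installer-spec-file="./installer-patch.yaml" ha ekco-enable-internal-load-balancer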

Once all nodes have been initialized, the KOTS Admin Console can be used to proceed with the installation as normal.

Each node and its status are viewable from the admin console’s Cluster Management tab once the KNIME Business Hub install is complete.

Networking requirements

Firewall openings for online installations

The following domains need to be accessible from servers performing online kURL installs. IP addresses for these services can be found in replicatedhq/ips.

Host Description

amazonaws.com

tar.gz packages are downloaded from Amazon S3 during embedded cluster installations. The IP ranges to allowlist for accessing these can be scraped dynamically from the AWS IP Address Ranges documentation.

k8s.gcr.io

Images for the Kubernetes control plane are downloaded from the Google Container Registry repository used to publish official container images for Kubernetes. For more information on the Kubernetes control plane components, see the Kubernetes documentation.

k8s.kurl.sh

Kubernetes cluster installation scripts and artifacts, including Bash scripts and binary executables, are served from kurl.sh. This domain is owned by Replicated, Inc., which is headquartered in Los Angeles, CA.

No outbound internet access is required for airgapped installations.

Host firewall rules

The kURL install script will prompt to disable firewalld. Note that firewall rules can affect communications between containers on the same machine, so it is recommended to disable these rules entirely for Kubernetes. Firewall rules can be added after or preserved during an install, but because installation parameters like pod and service CIDRs can vary based on local networking conditions, there is no general guidance available on default requirements. See Advanced Options for installer flags that can preserve these rules.

The following ports must be open between nodes for multi-node clusters:

Primary Nodes:

Protocol  Direction  Port Range  Purpose                      Used By
TCP       Inbound    6443        Kubernetes API server        All
TCP       Inbound    2379-2380   etcd server client API       Primary
TCP       Inbound    10250       kubelet API                  Primary
UDP       Inbound    8472        Flannel VXLAN                All
TCP       Inbound    6783        Weave Net control            All
UDP       Inbound    6783-6784   Weave Net data               All
TCP       Inbound    9090        Rook CSI RBD Plugin Metrics  All

Secondary Nodes:

Protocol  Direction  Port Range  Purpose                      Used By
TCP       Inbound    10250       kubelet API                  Primary
UDP       Inbound    8472        Flannel VXLAN                All
TCP       Inbound    6783        Weave Net control            All
UDP       Inbound    6783-6784   Weave Net data               All
TCP       Inbound    9090        Rook CSI RBD Plugin Metrics  All

These ports are required for Kubernetes and Weave Net.

Available ports

In addition to the ports listed above that must be open between nodes, the following ports should be available on the host for components to start TCP servers accepting local connections.

Port   Purpose
2381   etcd health and metrics server
6781   weave network policy controller metrics server
6782   weave metrics server
10248  kubelet health server
10249  kube-proxy metrics server
9100   prometheus node-exporter metrics server
10257  kube-controller-manager health server
10259  kube-scheduler health server

Installation on hosts with undersized root volumes

By default, the kURL embedded cluster uses OpenEBS for volume provisioning, which leverages the host's disk for persistence. This location defaults to /var/openebs/local. Additionally, the host disk is used for caching container images and other artifacts.

If the host disk does not have sufficient capacity for installation, an additional disk can be mounted and configured for use.

Follow the recommended best practice for your hardware, infrastructure provider and Linux distribution to add a new disk and ensure a partition and filesystem have been created on it and that it is mounted.
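As a sketch, assuming the new device appears as /dev/sdb and an ext4 filesystem created directly on the device (without a partition table) is acceptable, formatting and mounting it at /var/lib/replicated (the location used in the steps below) could look like this:

# Format the new disk and mount it.
sudo mkfs.ext4 /dev/sdb
sudo mkdir -p /var/lib/replicated
sudo mount /dev/sdb /var/lib/replicated

# Persist the mount across reboots.
echo '/dev/sdb /var/lib/replicated ext4 defaults 0 0' | sudo tee -a /etc/fstab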

The following steps can then be used to configure that disk for persistence with KNIME Business Hub.

Update installer script

Download the installer script, but don’t execute it yet.

curl -sSL https://kurl.sh/knime-hub > kurl-installer.sh

Inside the installer script is a line starting with Environment="KUBELET_CONFIG_ARGS=, which specifies flags to be passed to the kubelet process that runs on the node. An additional flag (--root-dir) needs to be appended to these arguments to point to /var/lib/replicated/kubelet.

This line in the installer script can be manually updated, or the following sed command can be run to automatically apply the change.

sed -i 's/\/var\/lib\/kubelet\/config\.yaml/\/var\/lib\/kubelet\/config\.yaml --root-dir=\/var\/lib\/replicated\/kubelet/g' kurl-installer.sh

Once edited, the complete line should look like the following:

Environment="KUBELET_CONFIG_ARGS=--config=/var/lib/docker/kubelet/config.yaml --root-dir=/var/lib/replicated/kubelet"

Install kURL with argument overrides

To complete the install, chmod is used to make the shell script executable, then the script is executed with any needed arguments.

The kurl-install-directory="/var/lib/replicated/kurl" argument must be specified. Other arguments can be added as needed. Note that, unlike the standard install command, which uses curl to download the installation script and pipe it directly into a bash shell, the -s flag is not needed before specifying arguments when the script is executed directly.

chmod +x kurl-installer.sh
./kurl-installer.sh kurl-install-directory="/var/lib/replicated/kurl"

Uninstalling KNIME Business Hub

Uninstalling KNIME Business Hub is a highly destructive action that can have permanent consequences. Please ensure you are connected to the correct cluster and are completely sure you want to uninstall all resources related to KNIME Business Hub before proceeding. Also ensure you have retrieved from the cluster all data and backups that you want to preserve.

To completely remove a KNIME Business Hub instance, the following commands can be run on the Ubuntu instance hosting KNIME Business Hub.

Both kubectl and Helm must be installed to successfully run the commands, and please ensure that the proper Kubernetes context is set before executing.

# Remove the KNIME Business Hub app from the KOTS Admin Console.
kubectl kots remove knime-hub -n default --force

# Delete all helm releases in the default namespace.
helm ls -a -n default | awk 'NR > 1 { print "-n "$2, $1}' | xargs -L1 helm delete

# Delete all namespaces associated with KNIME Business Hub.
kubectl delete namespace istio-system hub hub-execution knime

Afterwards, all KNIME Business Hub resources will have been removed from the cluster. You can then re-install KNIME Business Hub by going to the KOTS Admin Console in a browser and following the steps above again.

Removing Kubernetes from a VM

You can uninstall all KOTS resources, or remove everything related to Kubernetes from a VM, by following the documentation provided by Replicated under "Delete the Admin Console".

Additional resources

Further documentation can be found here: