KNIME Integrated Deployment Guide

Introduction

The traditional data science process starts with raw data and ends with the creation of a model. That model is then usually moved into daily production. Integrated Deployment allows not only the model but also all the associated preparation and post-processing steps to be identified and reused in production automatically, without having to switch between different tools.

The Integrated Deployment nodes allow you to capture the fragments of a workflow that are needed to run in a production environment: the model or library itself as well as the data preparation. These captured subsets are saved automatically as workflows with all the relevant settings and transformations and can be run at any time, both on KNIME Analytics Platform for model validation and on KNIME Server for model deployment.

In this guide we will explain how to use the nodes that are part of the KNIME Integrated Deployment Extension, as well as other useful nodes that allow the use of this feature in different environments.

You can access more material about Integrated Deployment on this website section.

Installation

You can install KNIME Integrated Deployment Extension from:

KNIME Hub: go to the KNIME Integrated Deployment Extension page on KNIME Hub. From there, drag and drop the square yellow icon to the workbench of KNIME Analytics Platform.
KNIME Analytics Platform: go to File → Install KNIME Extensions… in the toolbar and find KNIME Integrated Deployment under KNIME Labs Extensions or type Integrated Deployment in the search bar.

For more detailed instructions refer to the Installing Extensions and Integrations section of the KNIME Analytics Platform Installation Guide.

Now the nodes are available in the node repository under KNIME Labs → Integrated Deployment, as shown in Figure 1.

Figure 1. The Integrated Deployment nodes in the node repository

Create a workflow

In this section we will explain how to build a workflow, capture some fragments, and finally combine them into a new workflow. In this new workflow you can also store the input and output data.

All the workflows shown in this section are available on KNIME Hub.

Capture fragments of a workflow

With the Integrated Deployment nodes you can capture the fragments of the workflow that you need. To do this you can use Capture Workflow Start and Capture Workflow End nodes, as shown in Figure 2.

Figure 2. A fragment of a workflow is captured

The Capture Workflow Start node marks the start of a workflow fragment to be captured, and the Capture Workflow End node marks its end. The entire workflow fragment within the scope of these two nodes is then available at the workflow output port of the Capture Workflow End node. Nodes that are not part of the scope themselves but have outgoing connections to a node inside the scope are not captured; they are represented as static inputs instead.
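The scoping rule described above can be illustrated with a small conceptual sketch in Python. This is not KNIME code or a KNIME API: the workflow is modeled as a plain list of connections, the node names are made up for the example, and the sketch assumes that the scope consists of the nodes lying on a path between the start and end nodes.

```python
from collections import defaultdict


def reachable(adjacency, source):
    """Return all nodes reachable from source (source itself excluded)."""
    seen = set()
    stack = [source]
    while stack:
        node = stack.pop()
        for neighbor in adjacency[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return seen


def capture_scope(edges, start, end):
    """Split a workflow graph into captured nodes and static inputs.

    Assumption for this sketch: a node is in scope when it is downstream
    of the start node and upstream of the end node.
    """
    forward = defaultdict(list)
    backward = defaultdict(list)
    for src, dst in edges:
        forward[src].append(dst)
        backward[dst].append(src)

    scope = reachable(forward, start) & reachable(backward, end)
    # Nodes outside the scope that feed a node inside it: static inputs.
    static_inputs = {
        src for src, dst in edges
        if dst in scope and src not in scope and src != start
    }
    return scope, static_inputs


# Hypothetical workflow: a learner outside the scope feeds a predictor inside it.
edges = [
    ("Table Reader", "Capture Workflow Start"),
    ("Capture Workflow Start", "Row Filter"),
    ("Row Filter", "Predictor"),
    ("Learner", "Predictor"),
    ("Predictor", "Capture Workflow End"),
    ("Capture Workflow End", "Workflow Writer"),
]
scope, static_inputs = capture_scope(
    edges, "Capture Workflow Start", "Capture Workflow End"
)
```

In this example the Row Filter and Predictor nodes end up in the captured scope, while the Learner node is reported as a static input because it only feeds a node inside the scope.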

Figure 3 shows an example of the corresponding captured workflow. For an explanation of how to generate the captured workflow please refer to the Export a captured workflow section.

Figure 3. The captured workflow

Configuration of the Capture Workflow End node

Input data

You can store input data with your captured workflow. These data will be used by default when running the captured workflow unless you provide a different input. Storing input data will add data to the Container Input (Table) node in the captured workflow. In this way data are saved with the workflow when writing it out or deploying it to KNIME Server and are available when executing the workflow locally or on KNIME Server.

To store input tables, right-click the Capture Workflow End node and choose Configure… from the context menu. In the node configuration dialog that opens, shown in Figure 4, check the Store input tables option. Here you can also choose the maximum number of rows to store with the captured workflow.

Figure 4. The Capture Workflow End node configuration dialog
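The behavior of stored input data can be summarized with a short conceptual sketch in Python. It is not KNIME code: the function names are hypothetical, the table is a plain list of dictionaries, and the sketch assumes that the row limit keeps the first rows of the table.

```python
def store_input_rows(rows, max_rows):
    """Keep at most max_rows rows with the captured workflow.

    Assumption for this sketch: the first max_rows rows are kept.
    """
    return list(rows)[:max_rows]


def run_captured_workflow(transform, stored_rows, provided_rows=None):
    """Run the captured fragment on the provided input if there is one,
    otherwise fall back to the rows stored with the workflow."""
    rows = stored_rows if provided_rows is None else provided_rows
    return [transform(row) for row in rows]


def double_value(row):
    """Stand-in for the captured workflow fragment."""
    return {"value": row["value"] * 2}


table = [{"value": i} for i in range(5)]
stored = store_input_rows(table, max_rows=3)

default_run = run_captured_workflow(double_value, stored)
custom_run = run_captured_workflow(double_value, stored, [{"value": 10}])
```

Here `default_run` is computed from the three stored rows, while `custom_run` ignores them and uses the single row that was passed in explicitly.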

Add or remove input or output ports in Capture Workflow nodes

In the Capture Workflow Start and Capture Workflow End nodes you can add as many input and output ports of different types as necessary, as shown in Figure 5.

Figure 5. Add/remove input/output ports to and from Capture Workflow Start/End nodes

To add input ports:
1. Click the three dots in the bottom-left corner of the Capture Workflow Start/End node
2. Choose Add Captured workflow inputs port from the context menu that opens
3. Choose the port type from the drop-down menu in the window that opens

To remove input ports:
1. Click the three dots in the bottom-left corner of the Capture Workflow Start/End node
2. Click Remove Captured workflow inputs port

Rename captured workflow and input/output ports

In the Capture Workflow End node you can change the name of the captured workflow fragment, as well as the names of its input and output ports. Right-click the Capture Workflow End node and choose Configure… from the context menu. In the node configuration dialog that opens, you can give the captured workflow a custom name in the Settings tab, as shown in Figure 4. To rename input and output ports, go to the Input and Output IDs tab instead, as shown in Figure 6.

Figure 6. The Capture Workflow End node configuration dialog, Input and Output IDs tab.

Combine captured workflow fragments

You can capture more than one fragment of the workflow with multiple pairs of Capture Workflow Start/End nodes and combine all the captured fragments into a single new workflow. To combine the captured fragments use the Workflow Combiner node. The Workflow Combiner node takes as many workflow models as necessary as input and generates a combined captured workflow. To add or remove input model ports, click the three dots in the bottom-left corner of the Workflow Combiner node and choose Add (Remove) workflow model port in the context menu. An example with the corresponding captured workflow is shown in Figure 7.

Figure 7. Different fragments of a workflow are captured and combined in a new workflow

Input/output port mapping

Free output ports of one workflow fragment are connected to the free input ports of the next workflow fragment. However, the pairing of output and input ports can also be configured manually.

Workflow fragments connected to consecutive ports of the Workflow Combiner node form a pair. For each pair you can select if and how to connect the outputs of the first workflow fragment to the inputs of the subsequent workflow fragment. By default the inputs of each workflow fragment are automatically connected to the outputs of its predecessor. If the default pairing cannot be applied, e.g. because of a non-matching number of inputs and outputs or incompatible port types, the node requires manual configuration before it can be executed.
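The default pairing rule can be sketched conceptually in Python. This is only an illustration, not the Workflow Combiner implementation: the function name is hypothetical, ports are modeled as `(name, type)` tuples, outputs and inputs are assumed to be paired by position, and port types are treated as compatible only when they are equal.

```python
def default_pairing(outputs, inputs):
    """Pair each output of the first fragment with an input of the next one.

    Raises ValueError when the default pairing cannot be applied, which
    corresponds to the case where manual configuration is required.
    """
    if len(outputs) != len(inputs):
        raise ValueError(
            f"{len(outputs)} outputs cannot be paired with {len(inputs)} inputs"
        )
    pairs = []
    for (out_name, out_type), (in_name, in_type) in zip(outputs, inputs):
        # Simplification: types are compatible only when they are identical.
        if out_type != in_type:
            raise ValueError(
                f"incompatible port types: {out_name} ({out_type}) "
                f"-> {in_name} ({in_type})"
            )
        pairs.append((out_name, in_name))
    return pairs


# Hypothetical fragments with matching ports: the default pairing works.
first_outputs = [("preprocessed_out1", "table")]
second_inputs = [("string_to_number_in1", "table")]
pairs = default_pairing(first_outputs, second_inputs)

# Two outputs but only one input: the default pairing cannot be applied.
try:
    default_pairing(
        [("preprocessed_out1", "table"), ("preprocessed_out2", "table")],
        second_inputs,
    )
    needs_manual_configuration = False
except ValueError:
    needs_manual_configuration = True
```

The second call models the situation in which the numbers of outputs and inputs do not match, so the pairing has to be chosen by hand in the node configuration dialog.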

If the default configuration is not applicable or you want to configure the pairing manually, right-click the Workflow Combiner node and choose Configure… from the context menu to open the node configuration dialog, shown in Figure 8.

Figure 8. The Workflow Combiner node configuration dialog

Here, you can choose the pairing manually. For example, the first pane in Figure 8 shows the pairing between a workflow fragment (Preprocessed) that has two data output ports, preprocessed_out1 and preprocessed_out2, and a workflow fragment (String To Number) connected to the subsequent port of the Workflow Combiner node. From the drop-down menu you can choose to connect any of the output ports of the Preprocessed workflow fragment, or <NONE>, to the input port of the String To Number workflow fragment.

Please note that only compatible port types are shown and can be connected.

Save and execute a captured workflow

In this section we will explain how to save the captured workflow to a local or remote file system, deploy it to KNIME Server, and execute it.

All the workflows shown in this section are available on KNIME Hub.

The captured workflow can be:

Written locally or to a file system using the Workflow Writer node
Deployed to KNIME Server with the Deploy Workflow to Server node
Directly executed with the Workflow Executor node

Export a captured workflow

Export a captured workflow to local workspace

You can write the newly captured workflow locally or to a remote file system using the Workflow Writer node. Connect the Workflow Writer node to the output workflow model port (square solid black port) of the workflow you want to write, i.e. the output port of the Workflow Combiner node or the Capture Workflow End node, as shown in Figure 9.

Figure 9. An example workflow to write a captured workflow to the local file system

Now, you can open the Workflow Writer node configuration dialog, shown in Figure 10.

Figure 10. The Workflow Writer node configuration dialog

Here, you can set up:

Output location: You can choose to write the captured workflow to a specific path relative to the local file system, a specific mountpoint, the current mountpoint, the current workflow or the current workflow data area.
Workflow: You can choose to overwrite the workflow if one with the same name already exists in the chosen location or to fail. You can also give the captured workflow a custom name.
Deployment options: You can choose between writing the workflow, writing the workflow and opening it in the explorer, or exporting the workflow as a knwf archive.

In the Inputs and outputs tab of the Workflow Writer node configuration dialog, shown in Figure 11, you can also choose to add input and output nodes to a captured workflow.

Figure 11. The Workflow Writer node configuration dialog, Inputs and outputs tab

Export a captured workflow to a remote file system

You can also connect the Workflow Writer node to a remote file system, e.g. SharePoint, Amazon S3 or Google Cloud Storage. First you need to authenticate to the chosen remote file system and then connect to it. Then, click the three dots in the bottom-left corner of the Workflow Writer node and choose Add File System Connection port from the context menu. A square solid cyan connection port will appear, which can be connected to the corresponding output file system connection port of the chosen remote file system connector node. An example workflow with a SharePoint remote file system is shown in Figure 12.

Figure 12. An example workflow to write a captured workflow to a remote file system

Now the Workflow Writer node configuration dialog shows the remote file system in the Output location section of the Settings tab and you can choose to save the captured workflow into a location on that file system.

Please note that if you are writing to a remote file system, the workflow can only be written or exported as a knwf archive; it is not possible to have it opened in a new explorer tab in KNIME Analytics Platform.

Deploy a captured workflow to KNIME Server

You can deploy a captured workflow to KNIME Server with the Deploy Workflow to Server node, following these steps:

Set up a connection to the server with the KNIME Server Connection node
Connect the KNIME Server Connection node to the Deploy Workflow to Server node via the square solid light blue KNIME Server connection port
Connect the Deploy Workflow to Server node to the output workflow model port of the workflow you want to deploy, i.e. the square solid black output port of the Workflow Combiner node or the Capture Workflow End node, as shown in Figure 13.

Figure 13. An example workflow to deploy a workflow to KNIME Server

Open the Deploy Workflow to Server configuration dialog, shown in Figure 14, by right-clicking the node and choosing Configure… from the context menu.

Figure 14. The Deploy Workflow to Server node configuration dialog

In the Deploy Workflow to Server node configuration dialog you can set up the following options:

Choose folder on KNIME Server: You can choose the folder on the server file system to which the newly captured workflow is deployed.
Workflow: You can choose to overwrite a workflow if one with the same name already exists in the chosen location or to fail. You can also give the written workflow a custom name.
Deployment options: You can choose to create a snapshot and add an optional snapshot comment in order to keep track of the changes made to the workflow on the server.

In the Inputs and outputs tab of the Deploy Workflow to Server node configuration dialog, shown in Figure 15, you can also choose to add input and output nodes to the workflow. This will add Container Input/Output (Table) nodes to the captured workflow deployed to KNIME Server.

Figure 15. The Deploy Workflow to Server node configuration dialog, Inputs and outputs tab

If the captured workflow deployed to KNIME Server is executed on the server via the Call Workflow node (see the Execute a captured workflow deployed to KNIME Server section), the Container Input/Output (Table) nodes are mandatory.

Execute a captured workflow deployed to KNIME Server

You can execute a captured workflow:

On KNIME Server or on KNIME WebPortal
On KNIME Server from the KNIME Analytics Platform client. This way you can also provide additional data if you want to run the captured workflow on new data. To do this, use a Call Workflow node connected to the KNIME Server Connection node via the KNIME Server connection port. If no input data was stored with the captured workflow, input data has to be provided, as shown in Figure 16.

Figure 16. An example workflow to execute the captured workflow from KNIME Server with some input data

Execute a captured workflow locally

You can also execute the workflow locally by using the Workflow Executor node.

The Workflow Executor node can only execute a workflow that is captured from the same workflow the Workflow Executor node is part of.

The node ports match the input and output ports of the workflow. Connect the Workflow Executor node to the output workflow model port of the workflow you want to execute, i.e. the output workflow model port of the Workflow Combiner node or the Capture Workflow End node, and, if needed, connect the input data port to a data table. An example is shown in Figure 17.

Figure 17. An example workflow to execute the captured workflow locally

You can adjust the input and output ports to match those of the workflow to be executed by opening the node configuration dialog, shown in Figure 18, and clicking Auto-adjust ports (carried out on apply).

Figure 18. The Workflow Executor node configuration dialog