This guide describes the first steps to take after starting KNIME Analytics Platform and points you to the resources available in the KNIME Workbench for building workflows. It also explains how to customize the workbench and configure KNIME Analytics Platform to best suit specific needs. In the last part of this guide we introduce data tables.
When you start KNIME Analytics Platform, the KNIME Analytics Platform launcher window appears and you are asked to define the KNIME workspace, as shown in Figure 1.
|The KNIME workspace is a folder on the local computer that stores KNIME workflows, node settings, and the data produced by workflows.|
The workflows and data stored in the workspace are available through the KNIME Explorer in the upper left corner of the KNIME Workbench.
After selecting a workspace for the current project, click Launch. The KNIME Analytics Platform user interface - the KNIME Workbench - opens.
It is typically organized as shown in Figure 2.
In the next few sections we explain the functionality of these components of the workbench:
The welcome page shown in Figure 3 is located in the middle of the KNIME Workbench.
This page links to information such as available updates, the latest KNIME news, upcoming events, and tips and tricks.
After closing the welcome page, if no previously created workflows are available, you need to create an empty workflow editor, as explained in the next section.
Workflow editor & nodes
The workflow editor is where workflows are assembled. Workflows are made up of individual tasks, represented by nodes.
Create a new workflow editor by going to File → New… and selecting the New KNIME Workflow option in the window that opens. Then click Next, enter a name for the new workflow in the Name of the workflow to create: field, and click Finish. Other options are available as explained in the Building workflows section.
In the new empty workflow editor, create a workflow by dragging nodes from the node repository to the workflow editor, then connecting, configuring, and executing them.
In KNIME Analytics Platform, individual tasks are represented by nodes. Nodes can perform all sorts of tasks, including reading/writing files, transforming data, training models, creating visualizations, and so on.
Facts about nodes
Each node is displayed as a colored box with input and output ports and a status, as shown in Figure 4.
The input port(s) hold the data that the node processes, and the output port(s) hold the resulting datasets of the operation.
The data is transferred over a connection from the output port of one node to the input port of another node.
|For simplicity we refer to data when we refer to node input and output ports, but nodes can also have input and output ports that hold a model, a database query, or another type explained in Node Ports.|
Changing the status of a node
The status of a node can be changed by configuring, executing, or resetting it. All these options can be found in the context menu of a node, shown in Figure 5.
Open the context menu by right clicking a node. From the context menu it is also possible to open output tables and views, as well as copy nodes, along with some more advanced node options.
Identifying the node status
The traffic light below each node shows the status of the node. When a node is configured, the traffic light changes from red to yellow, i.e. from "not configured" to "configured".
When a new node is first added to the workflow editor, its status is "not configured" - shown by the red traffic light below the node.
Configuring the node
The node can be configured by adjusting the settings in its configuration dialog.
Open the configuration dialog of a node by either:
Double clicking the node
Right clicking a node and selecting Configure… in the context menu
Or, selecting the node and pressing F6
In addition to the task specific settings, each node configuration dialog has a:
Executing the node
Some nodes have the status "configured" already when they are created. These nodes are executable without adjusting any of the default settings.
Execute a node by either:
Right clicking the node and selecting Execute
Or, selecting the node and pressing F7
If execution is successful, the node status becomes "executed", which corresponds to a green traffic light. If the execution fails, an error sign will be shown on the traffic light, and the node settings and inputs will have to be adjusted as necessary.
Canceling execution of the node
To cancel the execution of a node, right click it and select Cancel or select it and press F9.
Resetting the node
To reset a node, right click it and select Reset or select it and press F8.
|Resetting a node also resets all of its subsequent nodes in the workflow. The status of the reset node(s) changes from "executed" to "configured", and their outputs are cleared.|
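The status transitions described above, including the cascading reset, can be sketched as a small state model. This is a hypothetical, simplified illustration and not KNIME's implementation: the class and method names are invented, the error state is left out, and executing a node here does not automatically execute its upstream nodes.

```python
from enum import Enum


class Status(Enum):
    """Traffic-light states of a node."""
    NOT_CONFIGURED = "red"
    CONFIGURED = "yellow"
    EXECUTED = "green"


class ToyWorkflow:
    """Hypothetical, simplified model of node status; not KNIME's implementation."""

    def __init__(self) -> None:
        self.status: dict[str, Status] = {}
        self.successors: dict[str, list[str]] = {}

    def add_node(self, name: str, configured_by_default: bool = False) -> None:
        # Some nodes are already "configured" (yellow) when they are created.
        self.status[name] = Status.CONFIGURED if configured_by_default else Status.NOT_CONFIGURED
        self.successors[name] = []

    def connect(self, source: str, target: str) -> None:
        self.successors[source].append(target)

    def configure(self, name: str) -> None:
        if self.status[name] is Status.NOT_CONFIGURED:
            self.status[name] = Status.CONFIGURED

    def execute(self, name: str) -> None:
        # Simplification: upstream nodes are not executed automatically here.
        if self.status[name] is Status.NOT_CONFIGURED:
            raise RuntimeError(f"{name} is not configured")
        self.status[name] = Status.EXECUTED

    def reset(self, name: str) -> None:
        # Resetting a node also resets all of its downstream (subsequent) nodes.
        stack, seen = [name], set()
        while stack:
            node = stack.pop()
            if node in seen:
                continue
            seen.add(node)
            if self.status[node] is Status.EXECUTED:
                self.status[node] = Status.CONFIGURED
            stack.extend(self.successors[node])
```

Resetting the middle node of an executed chain leaves its predecessor executed and turns it and every node after it back to "configured".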
A node may have multiple input ports and multiple output ports. A collection of interconnected nodes, using the input ports on the left and output ports on the right, constitutes a workflow. The input ports consume the data from the output ports of the predecessor nodes, and the output ports provide data to the successor nodes in the workflow.
An output port can only be connected to an input port of the same type - data to data, model to model, and so on.
Some input ports can be empty, like the data input port of the Decision Tree View node in Figure 6. This means that the input is optional, and the node can be executed without the input. The mandatory inputs, shown by filled input ports, have to be provided to execute the node.
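The two port rules above (same-type connections only, and mandatory inputs must be connected before execution) can be sketched in Python. This is a hypothetical model, not KNIME's API: the port type labels are illustrative strings, and the port order used for the Decision Tree View example (model first, optional data second) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Port:
    port_type: str          # illustrative labels, e.g. "data", "model", "database"
    optional: bool = False  # optional inputs are drawn as empty ports


def can_connect(output_port: Port, input_port: Port) -> bool:
    """An output port may only feed an input port of the same type."""
    return output_port.port_type == input_port.port_type


def missing_mandatory_inputs(inputs: list[Port], connected: set[int]) -> list[int]:
    """Return indices of mandatory (filled) input ports that have no connection."""
    return [i for i, port in enumerate(inputs) if not port.optional and i not in connected]


# Assumed port layout loosely modeled on the Decision Tree View node.
tree_view_inputs = [Port("model"), Port("data", optional=True)]
```

With only the model port connected, no mandatory input is missing, so the node could be executed without the optional data input.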
A tooltip gives a short explanation of the input and output ports. If the node is executed, the tooltip of its data output port also shows the dimensions of the outgoing data. A more detailed explanation of the input and output ports is in the node description.
How to select, move, copy, and replace nodes in a workflow
Nodes can be moved into the workflow editor by dragging and dropping them. To copy nodes between workflows, select the chosen nodes, right click the selection, and select Copy in the menu. In the destination workflow, right click the workflow editor, and select Paste in the menu.
To select a node in the workflow editor, click it once, and it will be surrounded by a border. To select multiple nodes, either press "Ctrl" and select nodes by mouse click, or draw a rectangle over the nodes with the mouse.
Replace a node by dragging a new node onto an existing node. Now the existing node will be covered with a colored box with an arrow and boxes inside as shown in Figure 7. Releasing the mouse replaces the node.
Workflow editor settings
Change the visual properties of the workflow editor by clicking the "Workflow Editor Settings" button in the toolbar shown in Figure 9.
In the dialog that opens you can change the size of the grid or remove the grid lines completely. You can also change the connection style from angular to curved, and make the connections thicker or narrower.
The changes will only apply to the currently active workflow editor. To change the default workflow editor settings, go to File → Preferences → KNIME → KNIME GUI → Workflow Editor.
To view a full list of keyboard shortcuts, choose Help → Show Active Keybindings from the toolbar. Here, it is also possible to modify the bindings, and create personalized shortcuts.
The KNIME Explorer is where you can manage workflows, workflow groups, and server connections. By default only the local workspace, the EXAMPLES server and the link to connect to your personal KNIME Hub space are visible in the KNIME Explorer.
Mount points are workflow repositories that are accessible from KNIME Analytics Platform. They can be displayed as root directories in the KNIME Explorer view.
Each mount point consists of the location of the workflow repository, and a mount ID. For a local workflow repository, the location is the path to the folder, and for a server it is the address of the server. The mount ID is used to reference files and workflows under the mount point.
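As a rough illustration of how a mount ID stands in for a repository location, the sketch below keeps a small table of mount points and combines a mount point's location with a path below it. The class, the function, and the example locations (including the server address) are hypothetical; in KNIME Analytics Platform, mount points are managed through the KNIME Explorer preferences.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class MountPoint:
    mount_id: str  # e.g. "LOCAL"; used to reference files under the mount point
    location: str  # folder path (local repository) or address (server)


def locate(mount_points: dict[str, MountPoint], mount_id: str, path: str) -> str:
    """Combine a mount point's location with a path below it (hypothetical helper)."""
    if mount_id not in mount_points:
        raise KeyError(f"unknown mount ID: {mount_id}")
    location = mount_points[mount_id].location
    relative = path.lstrip("/")
    if location.startswith(("http://", "https://")):
        # Server mount point: append the path to the server address.
        return location.rstrip("/") + "/" + relative
    # Local mount point: join the path with the folder.
    return str(PurePosixPath(location) / relative)


# Hypothetical mount table.
mounts = {
    "LOCAL": MountPoint("LOCAL", "/home/user/knime-workspace"),
    "MyServer": MountPoint("MyServer", "https://knime.example.com/"),
}
```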
KNIME Explorer toolbar
At the top of the KNIME Explorer are several icons arranged in a toolbar shown in Figure 11.
The functions of the icons are explained in Table 1 below:
Refreshes the view, in case it is out of sync with the underlying file system
Selects the workflow that is open in the workflow editor
Add text to the field and press "Enter". The KNIME Explorer will only show items that contain the text in their name or are in a workflow group containing the text in its name.
Opens the explorer preference page, where you can add, remove, or edit mount points
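The filter rule in Table 1 (show an item if its own name, or the name of a workflow group containing it, contains the text) can be expressed as a small predicate over item paths. This sketch assumes items are given as slash-separated paths such as Sales/Forecast and that matching is case-insensitive; it does not model how the tree view displays the groups around a match.

```python
from pathlib import PurePosixPath


def explorer_filter(items: list[str], text: str) -> list[str]:
    """Keep items whose name, or the name of an enclosing workflow group,
    contains the filter text. Case-insensitive matching is an assumption."""
    needle = text.lower()
    kept = []
    for item in items:
        # e.g. "Marketing/Sales Dashboard" -> ("Marketing", "Sales Dashboard")
        names = PurePosixPath(item.strip("/")).parts
        if any(needle in name.lower() for name in names):
            kept.append(item)
    return kept


items = ["Sales", "Sales/Forecast", "Marketing", "Marketing/Sales Dashboard", "Marketing/Churn"]
```

Filtering these hypothetical items for "sales" keeps the Sales group, the Forecast workflow inside it, and the Sales Dashboard workflow.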
KNIME Explorer content
The types of content that you can see in the KNIME Explorer are described in Table 2.
A collection of nodes used to analyze data in KNIME
A folder within the KNIME Explorer, which can be used to store workflows, data files, components, and metanodes.
Dragging a data file from the KNIME Explorer to the workflow editor automatically creates the correct node to read the file type. Storing data files in the currently active workspace allows for defining file paths relative to their location in the KNIME Explorer.
Components and metanodes contain a pre-configured sub-workflow, which can be integrated in any part of a workflow. Components are nodes that encapsulate and abstract functionality. Metanodes, by contrast, are used to organize a workflow by collapsing part of it, which hides that part of the workflow’s functionality.
|A more comprehensive overview on components and metanodes is available in the KNIME Components Guide.|
Dragging and dropping elements
In the KNIME Explorer, elements can be moved between repositories in the same way as in any other file explorer. In addition, drag and drop can be used to create nodes that read different file types, and to use a component or a metanode within a workflow. These operations are summarized in Table 3.
To move an item, simply drag it and drop it to the desired location
Copying an item is the same process as moving it. Keep the "Ctrl"-key pressed during the drag and drop step. A small plus-sign next to the mouse cursor indicates the copy operation. Additionally, "Ctrl" + "c"/"v" shortcuts can be used to copy and paste elements from one repository to another.
Drop a data file into the workflow editor. KNIME automatically creates the appropriate file reader node and preconfigures it.
A component or a metanode can be saved in the KNIME Explorer for later reuse. To do this, right click any component or metanode and select Component (or Metanode) → Share… . The resulting dialog gives you the possibility to choose a destination and the link type. To use a component or metanode stored in the KNIME Explorer, drag and drop it to the workflow editor.
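The idea behind dropping a data file can be sketched as choosing a reader node from the file extension and presetting the file location. The node names are existing KNIME reader nodes, but the extension-to-node table and the returned dictionary are assumptions made for illustration, not the mapping KNIME applies internally.

```python
from pathlib import PurePath
from typing import Optional

# Illustrative mapping only (assumption): existing KNIME reader node names,
# but not necessarily the mapping KNIME applies when a file is dropped.
READER_FOR_EXTENSION = {
    ".csv": "CSV Reader",
    ".xls": "Excel Reader",
    ".xlsx": "Excel Reader",
    ".table": "Table Reader",
    ".json": "JSON Reader",
    ".parquet": "Parquet Reader",
}


def reader_for_dropped_file(file_name: str) -> Optional[dict]:
    """Return a reader node name plus a preset file setting, or None if unknown."""
    node = READER_FOR_EXTENSION.get(PurePath(file_name).suffix.lower())
    if node is None:
        return None
    # "Preconfigured": the reader already points at the dropped file.
    return {"node": node, "file": file_name}
```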
Creating a new workflow
To create an empty workflow, right click anywhere in the local workspace and select New KNIME Workflow… in the menu, or use one of the options explained in Building Workflows.
Give the workflow a name, and define the destination of the new workflow.
Click Finish, and the new workflow will appear in the selected workflow group in the KNIME Explorer.
To learn how to build a workflow, take a look at the next section Building Workflows, follow the steps in the Quickstart Guide, or check the video Workflows and Workflow Groups.
To create a workflow, you need an empty workflow editor. To create a new empty workflow editor take any of these actions:
Navigate to File → New…, and select New KNIME Workflow
Click the leftmost icon in the toolbar
Right click in the local workspace and select New KNIME Workflow…
A workflow is built by dragging nodes from the node repository to the workflow editor and connecting them. To add a node from the node repository or from the workflow coach to the workflow editor, you have two options as shown in Figure 13:
Drag and drop the node into the workflow editor
Double click the node
Once two nodes are added to the workflow editor, they can be connected in any of these three ways:
Click the output port of the first node, drag to the input port of the second node, and release the mouse there. Now, the nodes are connected.
Select a node in the workflow editor, and then double click the next node in the node repository. This double click creates a new node, and connects it to the selected node in the workflow editor.
Select the nodes to connect in the workflow editor and press "Ctrl + L"
To add a node between two nodes in a workflow, drag the node from the node repository onto the connection between them. When the connection turns red, as shown in Figure 14, it is ready to accept the new node. Release the mouse and the node is put in place.
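In terms of connections, dropping a node onto an existing connection replaces one edge with two. The sketch below represents a workflow as a set of (source, target) node-name pairs and assumes the new node has one input and one output port of a compatible type; both are simplifications, and the function name is hypothetical.

```python
def insert_between(
    connections: set[tuple[str, str]], source: str, target: str, new_node: str
) -> set[tuple[str, str]]:
    """Return a new connection set where source -> target becomes
    source -> new_node -> target (ports are not modeled)."""
    if (source, target) not in connections:
        raise ValueError(f"no connection from {source} to {target}")
    updated = set(connections)
    updated.discard((source, target))
    updated.update({(source, new_node), (new_node, target)})
    return updated
```

The original set is left unchanged, so the edit can be inspected or undone by keeping the old value.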
Multiple workflows can be organized into workflow groups. Workflow groups are folders in the KNIME workspace that can include multiple workflows, as well as associated data files, shared components and metanodes, and even other workflow groups.
The workflow groups are in the currently active local workspace under the LOCAL mount point in the KNIME Explorer.
You have three ways to create a new, empty workflow group:
Right click in the local workspace in the KNIME Explorer, and select New Workflow Group… in the menu
Click the arrow next to the leftmost icon in the toolbar and select New KNIME Workflow Group
Navigate to File → New…, select New KNIME Workflow Group in the list, and click Next.
In the dialog that opens, give the folder a name, and define where to save the folder in the local workspace. Click Finish. Now the new folder will appear in the selected destination in the KNIME Explorer.
Import/export workflows and workflow groups
You have three options to export a workflow or a workflow group:
Export it as a file
Save it into your personal KNIME Hub space, either in your public or in your private space
Or, deploy it to a server (requires a license)
To save workflows into your personal KNIME Hub space, you first need to sign in. In the KNIME Explorer, right click My-KNIME-Hub (hub.knime.com) and click Connect to KNIME Hub. Be aware that if you save a workflow group that includes data to your public space on the KNIME Hub, the data becomes publicly available.
Similarly, you can import a workflow into your local workspace in the following ways:
Import a file containing a workflow to your local workspace
Save a workflow that is on a server to your local workspace. For example, you can access the EXAMPLES server (no credentials required) and save any workflow located there to your local workspace.
How to import and export a workflow (or workflow group)
You can import or export a workflow or a workflow group in the following ways:
Right click anywhere in the local KNIME workspace, and select Import (Export) KNIME Workflow…, as shown in Figure 15.
Go to the File menu and select Import (Export) KNIME Workflow…
The dialog shown in Figure 16 opens.
Importing a workflow
In the upper part of the "Import" dialog, select the items to import, i.e. define the file or folder path to import. In the "Destination" field underneath, define the destination folder in the KNIME workspace to import to.
Importing a workflow group will show a list of elements inside the workflow group in the lower part of the dialog. Here you can select single elements to import them.
Exporting a workflow
In the upper part of the "Export" dialog, select the workflow (or group) to export. In the "Destination" field underneath, define the path to the destination folder on the local system, and the name of the file.
In "Options" you can choose to reset the workflow(s) before exporting. After resetting a node, the node status changes from "executed" to "configured", and the output of the node is no longer available. When exporting a workflow in an executed state, the data used in the workflow are exported as well. See the section on Reset and Logging for more information.
When exporting a workflow group, you can select the elements that you want the exported file to contain.
Importing and exporting workflows are also introduced in this video: Import/Export Workflows.
The file extension for a KNIME workflow is .knwf (KNIME workflow file).
The file extension for a workflow group is .knar (KNIME archive file).
KNIME Workflow Comparison
The Workflow Comparison feature provides tools and views that compare workflow structures and node settings. Workflow Difference allows you to view changes in different versions of a workflow. The feature allows users to spot insertions, deletions, substitutions or similar/combined changes of nodes. The node settings comparison makes it possible to track changes in the configuration of a node.
A Workflow Comparison can be triggered from every view that shows multiple workflows, e.g. KNIME Explorer and Server History.
In order to compare two workflows, select them in the KNIME Explorer, with "Ctrl"+click, and select Compare from the context menu, as shown in Figure 17.
It is also possible to compare a workflow with itself. With this option, you are given a list of nodes from which you can choose the two nodes you want to compare (see Node comparison section below).
Comparing two workflows, or a workflow with itself, opens a tab in the KNIME Workbench, as shown in Figure 18.
To make it easier to see which changes have been identified there are three buttons in the upper right corner of the Workflow Comparison view.
These buttons (from left to right) filter the list to:
Perform a node comparison of the selected nodes (see Node comparison section below)
Show the added or removed nodes only
Hide the nodes with equal settings
Additionally, you can use the search field to check a specific node or node type for changes. The last icon clears the search field and displays all nodes.
Workflow Comparison is structure based. This means that not only workflows but also components, snapshots, and metanodes, i.e. all items that act as a workflow, can be compared with each other. In practice, this includes everything that can be seen in the KNIME Explorer and Server History (except data).
When comparing workflows, however, components inside a workflow are treated as normal nodes, and their content does not appear in the view. Metanodes inside a workflow, by contrast, are expanded when the workflow is compared. Figure 21 shows the comparison between two workflows, one containing a component and the other a metanode.
On the left column the component contained in the first workflow is shown as a node while the metanode contained in the second workflow, on the right column, is expanded.
If a component has been changed, it is highlighted in the Workflow Compare view, and the node comparison will show two additional entries: "Component Content Hash" and "Component Internal Settings Hash". These two numbers change whenever the content of the Component (e.g. node insertion/substitution) or the settings of an internal node change, respectively.
|The structural comparison is based on a sequential alignment with respect to the attributes (neighborhood, settings, etc.) of a node. It is designed to identify changes in a workflow. It might still be used to find similarities/common parts in any two workflows, however the usefulness of the results of these comparisons is often limited.|
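To give an intuition for alignment-based comparison, the sketch below aligns two linear node sequences with Python's difflib and reports insertions, deletions, and substitutions. This is a swapped-in technique: difflib only compares node names, while the alignment described above also takes a node's neighborhood and settings into account, so treat it as an illustration rather than KNIME's algorithm. The node sequences are hypothetical.

```python
from difflib import SequenceMatcher


def compare_node_sequences(old: list[str], new: list[str]) -> list[tuple[str, list[str], list[str]]]:
    """Align two node sequences and report the non-matching regions.

    Each change is (tag, old_nodes, new_nodes), where tag is "replace"
    (substitution), "delete", or "insert".
    """
    matcher = SequenceMatcher(a=old, b=new, autojunk=False)
    return [
        (tag, old[i1:i2], new[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]


old_version = ["CSV Reader", "Row Filter", "Table View"]
new_version = ["CSV Reader", "Column Filter", "Table View", "Scorer"]
```

For these two versions, the sketch reports Row Filter substituted by Column Filter and Scorer inserted at the end.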
Node Comparison is an additional view to Workflow Comparison. To compare two nodes select the nodes in the Workflow Comparison view and then click either the first button in the view or right-click and select Compare Highlighted in the context menu.
The selected nodes are written in bold and include a green checkmark on the icon.
Node Comparison shows the settings of the nodes in two trees. Values that differ, and settings that are present in only one of the nodes, are highlighted in red. If a changed setting is nested, its parent setting is highlighted in gray to indicate the hidden change. Click the arrow to expand the setting and show all nested settings.
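The red and gray highlighting can be sketched as a recursive comparison of two nested settings dictionaries. The setting names in the example are hypothetical, and KNIME stores node settings in its own format rather than as Python dictionaries.

```python
def diff_settings(left: dict, right: dict, prefix: tuple = ()) -> tuple[set, set]:
    """Compare two nested settings trees.

    Returns (changed, parents): `changed` holds paths of values that differ or
    exist in only one tree (red); `parents` holds paths of nested settings
    that contain a change somewhere below them (gray).
    """
    changed, parents = set(), set()
    for key in left.keys() | right.keys():
        path = prefix + (key,)
        if key not in left or key not in right:
            changed.add(path)  # present in only one node
        elif isinstance(left[key], dict) and isinstance(right[key], dict):
            sub_changed, sub_parents = diff_settings(left[key], right[key], path)
            if sub_changed:
                parents.add(path)  # nested change: mark the parent
            changed |= sub_changed
            parents |= sub_parents
        elif left[key] != right[key]:
            changed.add(path)  # differing value
    return changed, parents


# Hypothetical settings of two filter nodes.
node_a = {"column": "age", "filter": {"lower": 18, "upper": 65}}
node_b = {"column": "age", "filter": {"lower": 21, "upper": 65}, "invert": True}
```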
To look for a specific setting the user can type the name into the search field in the upper right corner. This filters the list to show only those settings that contain the search query.
To compare the settings of two nodes in the same workflow (for example to compare two similar branches) the user can either compare the workflow with itself to retrieve the list of nodes, or open the workflow in the editor, select the two nodes of interest, right-click and select Compare Nodes in the context menu. This opens the Node Comparison view as a tab next to the Console and Node Monitor views, as shown in Figure 22. This view is identical to the one in Workflow Comparison, but is an independent view.
A subtle but powerful difference between Node Comparison in Workflow Comparison and the independent view is the toolbar, which for the independent view has two additional buttons. The right button refreshes the view, retrieving the settings from the workflow again. This is useful for example if you find a difference between two nodes which should actually be identical. After identifying and changing the setting, click the refresh button to show the new settings and confirm the new equality. The second button enables you to find the compared nodes in the workflow. If the workflow is still open in an editor, the nodes are selected and scrolled into the viewport.
|Please note that starting from KNIME Analytics Platform version 4.3 most of the nodes have been updated to work with a new File Handling framework and the approach described below has been substituted by a new way of addressing standard file systems.|
The knime:// protocol is a KNIME-specific protocol that allows you to specify file paths relative to the KNIME workspace or even to the location of the currently executing workflow.
The first element in the file path after knime:// is the base for the path. It is either the workflow itself, the current mount point, or a specific mount point like LOCAL in the following example:
The portable file path options are explained in the subsections below and in this video: The knime:// Protocol.
Absolute URLs are defined relative to a specific mount point. The following file path is defined using the absolute path to the file based on the mount point LOCAL:
This file path works on any system where the workflow is saved in the local workspace and the path of the file inside the local workspace folder is the same.
Because of the LOCAL term in the absolute path, the file cannot be accessed with the absolute URL if the workflow is deployed to a server.
To enable access to a data file both locally and on a server, select the path to the file relative to the currently active mount point.
To do this, change the LOCAL term in the file path to knime.mountpoint, as in this file path:
In the mountpoint-relative file path, knime.mountpoint refers to the uppermost folder level, which can be LOCAL or the mount ID of a server.
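The difference between absolute and mountpoint-relative URLs can be illustrated with a small resolver. To keep it runnable, every mount point, including a server, is modeled as a local folder; the mount IDs, folder paths, and function name are hypothetical. Only the absolute (knime://LOCAL/…) and mountpoint-relative (knime://knime.mountpoint/…) forms from this section are handled.

```python
from pathlib import PurePosixPath
from urllib.parse import unquote, urlsplit


def resolve_knime_url(url: str, mount_roots: dict[str, str], current_mount_id: str) -> str:
    """Resolve an absolute (knime://<mount ID>/...) or mountpoint-relative
    (knime://knime.mountpoint/...) URL to a folder path (illustrative only)."""
    parts = urlsplit(url)
    if parts.scheme != "knime":
        raise ValueError(f"not a knime:// URL: {url}")
    base = parts.netloc  # first element after knime:// is the base of the path
    if base == "knime.workflow":
        raise ValueError("workflow-relative URLs are not handled in this sketch")
    if base == "knime.mountpoint":
        # Mountpoint-relative: use whichever mount point the workflow runs under.
        base = current_mount_id
    # Absolute: the mount ID is fixed in the URL and must exist on this system.
    if base not in mount_roots:
        raise KeyError(f"mount ID {base!r} is not available here")
    return str(PurePosixPath(mount_roots[base]) / unquote(parts.path).lstrip("/"))


# Hypothetical mount points, both modeled as local folders.
mount_roots = {"LOCAL": "/home/user/knime-workspace", "MyServer": "/srv/knime-repo"}
```

With the absolute URL the result stays tied to LOCAL wherever the workflow runs; the knime.mountpoint form follows the mount point of the running workflow.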
The most flexible portable file path is the workflow-relative path. A workflow-relative path defines the file path relative to the currently executing workflow. Using this file path you can access data files in workflows in local workspaces on different systems, or on a server, as long as the folder structure between the workflow and the data file is the same.
Compared to the absolute path and the mountpoint-relative path, the name of the folder containing the workflow does not have to be the same in the different locations. That’s because an upper folder level is denoted by /../ instead of the name of the folder.
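Resolving a workflow-relative path amounts to joining it to the folder of the running workflow and collapsing each /../. The sketch below does this with posixpath; the function name and the workflow folders are hypothetical, and the two folders deliberately have different names to show that only the relative layout matters.

```python
import posixpath
from urllib.parse import unquote, urlsplit


def resolve_workflow_relative(url: str, workflow_dir: str) -> str:
    """Resolve knime://knime.workflow/... against the folder of the running
    workflow. Each /../ climbs one folder level, so only the layout matters."""
    parts = urlsplit(url)
    if parts.scheme != "knime" or parts.netloc != "knime.workflow":
        raise ValueError(f"not a workflow-relative knime:// URL: {url}")
    relative = unquote(parts.path).lstrip("/")
    return posixpath.normpath(posixpath.join(workflow_dir, relative))


url = "knime://knime.workflow/../data/sales.csv"
```

The same URL finds the data file next to the workflow in both hypothetical locations, even though the workflow folders are named differently.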
Save workflows with data
|Please note that starting from KNIME Analytics Platform version 4.3 most of the nodes have been updated to work with a new File Handling framework and the below described process has been substituted by a new way of addressing standard file systems. For example, with the nodes of the new File Handling framework you can easily save your workflows with data by using a Writer node and choosing Relative to > Current workflow data area as the output location when configuring the node.|
You can easily include data in your workflow by using the workflow-relative paths described above. First, access the workflow in your KNIME workspace from your operating system, then manually create a folder called data inside it and place your data in this folder. You can then reference your data within nodes using the workflow-relative path, which makes sure that your data remains with your workflow whenever you archive it, export it, or upload it to a KNIME Server or the KNIME Hub.
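The manual steps above can be scripted, for example when preparing several workflows. This is a sketch: the function name is hypothetical, it assumes the workflow folder already exists, and it only mirrors what you would otherwise do in your operating system's file manager before referencing the file with a workflow-relative path.

```python
import shutil
from pathlib import Path


def bundle_data_file(workflow_dir: Path, data_file: Path) -> str:
    """Copy a data file into a 'data' folder inside the workflow folder and
    return the workflow-relative URL a reader node could use (hypothetical helper)."""
    data_dir = workflow_dir / "data"
    data_dir.mkdir(exist_ok=True)  # assumes workflow_dir itself already exists
    shutil.copy2(data_file, data_dir / data_file.name)
    return f"knime://knime.workflow/data/{data_file.name}"
```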
You can explore the example workflows, which also include some real-world use cases, on the public EXAMPLES server.
Inspect the workflow groups for different categories by expanding the EXAMPLES mount point in the KNIME Explorer, and then double clicking the text below as shown in Figure 24.
You can download an example workflow by dragging and dropping, or by copying and pasting, the workflow into the local workspace. Double click the downloaded copy of the example workflow to open and edit it like any other workflow.
Alternatively, double click the example workflow directly on the EXAMPLES server to open it in the workflow editor. Save it to the local workspace via "File" and then "Save As…".
The video The EXAMPLES server provides a more detailed introduction to the EXAMPLES server.
The workflow coach shown in Figure 25 provides node recommendations. If a node is selected in the workflow editor, the workflow coach shows the most popular nodes to follow the selected node. Otherwise, the recommendations represent the most popular nodes to start a workflow.
The recommendations are based on KNIME community usage statistics about workflows built in KNIME Analytics Platform. Nodes can be added from the workflow coach to the workflow editor in the same way as from the node repository, by drag and drop, or by a double click.
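To illustrate the idea behind the recommendations, the sketch below counts which nodes follow a selected node across a collection of workflows, and which nodes start workflows when nothing is selected. The input format (each workflow as a list of connections), the ranking by raw counts, and the example data are assumptions; the actual statistics KNIME uses are not described here.

```python
from collections import Counter
from typing import Optional


def recommend(workflows: list[list[tuple[str, str]]], selected: Optional[str], k: int = 3) -> list[str]:
    """Rank nodes by how often they follow `selected`, or by how often they
    start a workflow when no node is selected."""
    counts = Counter()
    for connections in workflows:
        if selected is None:
            targets = {t for _, t in connections}
            # Start nodes: sources that are never the target of a connection.
            counts.update({s for s, _ in connections} - targets)
        else:
            counts.update(t for s, t in connections if s == selected)
    return [node for node, _ in counts.most_common(k)]


# Hypothetical usage statistics: four workflows given as connections.
usage = [
    [("CSV Reader", "Row Filter"), ("Row Filter", "Bar Chart")],
    [("CSV Reader", "Column Filter"), ("Column Filter", "Row Filter")],
    [("Excel Reader", "Row Filter")],
    [("CSV Reader", "Row Filter")],
]
```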
Note: you can start or stop sending anonymous usage data at any time by checking or unchecking the option Yes, help improve KNIME. in the "KNIME" dialog in Preferences.
Customizing node recommendations
Customize the node recommendations in the "Workflow Coach" dialog, under File → Preferences → KNIME → Workflow Coach. You have the following three options:
Add node recommendations based on workflows in the currently active local workspace by enabling the Workspace Node Recommendations option in the "Workspace Recommendations" dialog
Add node recommendations based on the workflows on a server by selecting the KNIME Server in the "Server Recommendations" dialog
Disable the node recommendations by the community by unchecking the Node Recommendations by the Community option in the "Workflow Coach" dialog
The video Workflow Coach: The Wisdom of the KNIME Crowd provides a more detailed introduction to node recommendations.
Currently installed nodes are available in the node repository where they are organized under different categories. You can add a node from the node repository into the workflow editor by drag and drop, or by a double click, as explained in the section Building Workflows.
Search for a node by expanding the categories or by typing a search term in the search field on top of the node repository, as shown in Figure 26. The default search mode is crisp search. Using this search mode, the interface returns all the nodes that either have the search term in their names, or are in a subcategory whose name includes the search term.
Switch the search mode to fuzzy search by clicking the icon next to the search field. In this search mode the interface returns all the nodes that are related to the search term.
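The crisp search rule can be written as a simple filter. The fuzzy variant below is only a stand-in that ranks node names by string similarity with difflib; the relatedness measure used by the node repository is not described in this guide. The category paths, the case-insensitive matching, and the similarity threshold are hypothetical.

```python
from difflib import SequenceMatcher


def crisp_search(nodes: dict[str, str], term: str) -> list[str]:
    """nodes maps node name -> category path, e.g. "Manipulation/Row/Filter".
    Keep nodes whose name contains the term or that sit in a (sub)category
    whose name contains it (case-insensitive by assumption)."""
    t = term.lower()
    return sorted(
        name for name, category in nodes.items()
        if t in name.lower() or any(t in part.lower() for part in category.split("/"))
    )


def fuzzy_search(nodes: dict[str, str], term: str, threshold: float = 0.6) -> list[str]:
    """Stand-in for fuzzy search: rank node names by string similarity to the term."""
    t = term.lower()
    scored = [(SequenceMatcher(None, t, name.lower()).ratio(), name) for name in nodes]
    return [name for score, name in sorted(scored, reverse=True) if score >= threshold]


nodes = {
    "Row Filter": "Manipulation/Row/Filter",
    "Rule-based Row Filter": "Manipulation/Row/Filter",
    "Column Filter": "Manipulation/Column/Filter",
    "CSV Reader": "IO/Read",
}
```

A misspelled term such as "row filtr" finds nothing with the crisp rule but still ranks Row Filter first with the similarity stand-in.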
An introduction to the node repository is also available in the video Node Repository.
KNIME Hub view
Within the KNIME Hub view you have access to the following features:
Search: Enter a search term or sentence in the search field on the top of the view, press "Enter" and navigate the KNIME Hub. The search on the KNIME Hub looks for nodes, extensions, components, and workflows, among KNIME example workflows and components as well as workflows and components built and uploaded by the community. The search results display detailed information, e.g. where to find a specific node, links to documentation or external links to useful blog posts.
You can filter your search results based on the following categories:
Nodes or components: you can add a node or a component to a currently open workflow, configure it, and run it. If the node is part of an extension that is not yet installed in the KNIME Analytics Platform in use, a message appears, as shown in Figure 28, and you can proceed to install the missing extension automatically. The same happens if a component you drag and drop to the workflow editor contains a node from an extension that is not installed.
Figure 28. Message prompted in case a node dragged from the KNIME Hub is missing an extension
Workflows: you can download workflows or drag and drop them (use the icon) to your local workspace to open them directly in the workflow editor
Extensions: you can drag and drop the extension you want to install (use the icon). In case the update site required for the installation of the extension is not enabled you will be asked to enable it.
Sign in: Click the Sign in button on the top right and enter your Username or email address and Password, or click Create account if you do not have one yet. Once you sign in, you have access from the view to your own profile and spaces. Click your icon on the top right of the KNIME Hub view and choose Profile or Spaces from the drop-down menu.
Signing in to the KNIME Hub from this view is independent of signing in to the KNIME Hub from the browser and of signing in to your KNIME Hub mount point in the KNIME Explorer.
For more information on how to use KNIME Hub please refer to the KNIME Hub About page.
The description panel on the right of the KNIME Workbench shown in Figure 2 provides a description of the currently active workflow, or a node selected in the node repository or workflow editor. For a workflow, the first part is a general description, followed by tags and links to other resources related to the workflow. For a node, the first part is a general description, followed by the available setting options, and finally a list of input and output ports.
The Node Monitor tab is located in the same panel as the console tab, in the bottom part of the KNIME Workbench shown in Figure 29. It is especially useful for inspecting intermediate output tables in the workflow.
|The Node Monitor tab is shown by default since KNIME Analytics Platform version 4.2. For versions earlier than 4.2, or in case you closed the Node Monitor tab and want to restore it, go to View in the menu bar and select Node Monitor from the menu.|
Here you can choose to show the flow variables or a preview of the output data at any port of a selected node in the active workflow. To choose what kind of output you want to visualize, click the three vertical dots in the upper-right corner of the Node Monitor tab and select the output you want. You can also pin the monitor view to a specific node independently of the selection in the workflow editor, for example to follow the evolution of the data flow and help with debugging. To do this, select the node you want to pin and click the green pin symbol in the upper-right corner of the Node Monitor tab.
The outline, in the bottom part of the KNIME Workbench shown in Figure 30, gives an overview of the currently active workflow. If the whole workflow does not fit in the workflow editor, you can change the visible area by dragging the blue, transparent rectangle.
The console tab in the bottom part of the KNIME Workbench, shown in Figure 30, shows all warning and error messages related to workflow execution. To have debug and info messages reported in the console as well, change the console log level in File → Preferences → KNIME → KNIME GUI.
Customizing the KNIME Workbench
Reset and logging
When a node is reset, the node status changes from "executed" to "configured" and the output of the node is no longer available. When a workflow is saved in an executed state, the data used in the workflow are saved as well: the larger the dataset, the larger the file size. Therefore, resetting workflows before saving them is recommended if the dataset can be accessed again without any restrictions.
A reset workflow only saves the node configurations, not any results.
However, resetting a node does not undo the operations executed before. All operations performed during the creation, configuration, and execution of a workflow are reported in the knime.log file.
To inspect the knime.log file, go to View → Open KNIME log. The log file opens in the workflow editor. The knime.log file has a limited size; once this size is reached, the rows are overwritten from the top.
The knime.log file is also located in the knime folder inside the .metadata folder of the KNIME workspace folder defined when launching KNIME Analytics Platform.
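If you want to look at the log outside of KNIME Analytics Platform, the location described above can be resolved programmatically. The following Java sketch builds the path <workspace>/.metadata/knime/knime.log and prints the last lines of the file. The class and method names and the default workspace location used in main are hypothetical, and the file is assumed to be UTF-8 encoded.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class KnimeLogTail {
    // Resolves <workspace>/.metadata/knime/knime.log as described in the text.
    static Path knimeLogPath(Path workspace) {
        return workspace.resolve(".metadata").resolve("knime").resolve("knime.log");
    }

    // Returns at most the last n lines of the file (assumes UTF-8 content).
    static List<String> lastLines(Path file, int n) throws IOException {
        List<String> lines = Files.readAllLines(file, StandardCharsets.UTF_8);
        return lines.subList(Math.max(0, lines.size() - n), lines.size());
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical default: ~/knime-workspace, or the workspace folder passed as first argument.
        Path workspace = args.length > 0
                ? Path.of(args[0])
                : Path.of(System.getProperty("user.home"), "knime-workspace");
        Path log = knimeLogPath(workspace);
        if (Files.isRegularFile(log)) {
            lastLines(log, 20).forEach(System.out::println);
        } else {
            System.out.println("No knime.log found at " + log);
        }
    }
}
```

Reading the whole file before taking the tail is acceptable here because, as noted above, knime.log has a limited size.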
Show heap status
The heap status panel shows the memory usage during the execution of a workflow, and helps to monitor memory usage for the project. To add the heap status panel to the workbench, go to File → Preferences. In the dialog that opens, click General, select Show heap status, and click Apply and Close.
A heap status bar showing the memory usage appears in the bottom right part of the status bar, directly below the console panel. Next to the heap status bar is the "Run Garbage Collector" button. Click it to manually trigger a garbage collection and free up memory that is no longer in use.
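The heap status bar reports how much of the Java heap is in use relative to the maximum heap size, which roughly corresponds to the -Xmx limit in the knime.ini file. As a hypothetical illustration (not KNIME's own implementation), the same figures can be read from java.lang.Runtime in any Java program. Note that System.gc() only requests a garbage collection; the JVM decides what is actually freed.

```java
class HeapStatus {
    private static final long MB = 1024L * 1024L;

    // Heap currently in use: memory the JVM has reserved minus the free part of it.
    static long usedBytes(Runtime rt) {
        return rt.totalMemory() - rt.freeMemory();
    }

    // Formats used vs. maximum heap; maxMemory() roughly corresponds to the -Xmx limit.
    static String describe(Runtime rt) {
        return (usedBytes(rt) / MB) + "M of " + (rt.maxMemory() / MB) + "M";
    }

    public static void main(String[] args) {
        Runtime rt = Runtime.getRuntime();
        byte[][] garbage = new byte[20][];
        for (int i = 0; i < garbage.length; i++) garbage[i] = new byte[1024 * 1024]; // allocate ~20 MB
        System.out.println("Allocated: " + describe(rt));
        garbage = null; // make the arrays unreachable
        // Roughly comparable to "Run Garbage Collector": System.gc() only requests a collection.
        System.gc();
        System.out.println("After GC:  " + describe(rt));
    }
}
```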
Configuring KNIME Analytics Platform
In Preferences you can adjust the default settings of KNIME Analytics Platform. Go to File → Preferences, and a list of subcategories is displayed in the dialog that opens. Each category contains a separate dialog for specific settings like database drivers, available update sites, and appearance.
Selecting KNIME in the list of subcategories allows you to define the log level of the log file. By default it is set to DEBUG. This log level helps developers find the reasons for any unexpected behavior.
Directly below, you can define the maximum number of threads for all nodes. Separate branches of the workflow are distributed to several threads to optimize the overall execution time. By default the number of threads is set to twice the number of CPUs on the running machine.
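The default thread limit is simple arithmetic on the number of CPUs. In the Java sketch below, Runtime.availableProcessors() stands in for "the number of CPUs on the running machine", and two independent tasks submitted to a fixed-size pool stand in for separate workflow branches. This illustrates the idea under those assumptions; it is not how KNIME Analytics Platform actually schedules nodes, and the class and method names are hypothetical.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class DefaultThreadCount {
    // Default from the text: twice the number of CPUs on the running machine.
    static int defaultMaxThreads(int cpus) {
        return 2 * cpus;
    }

    // Stand-in for the work done in one workflow branch.
    static long sum(long from, long to) {
        long s = 0;
        for (long i = from; i <= to; i++) s += i;
        return s;
    }

    public static void main(String[] args) throws Exception {
        int cpus = Runtime.getRuntime().availableProcessors();
        int maxThreads = defaultMaxThreads(cpus);
        System.out.println(cpus + " CPUs -> at most " + maxThreads + " threads");

        // Two independent "branches" run concurrently in a pool capped at maxThreads.
        ExecutorService pool = Executors.newFixedThreadPool(maxThreads);
        try {
            Future<Long> branchA = pool.submit(() -> sum(1, 1_000_000));
            Future<Long> branchB = pool.submit(() -> sum(1_000_001, 2_000_000));
            System.out.println("Combined result: " + (branchA.get() + branchB.get()));
        } finally {
            pool.shutdown();
        }
    }
}
```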
In the same dialog, you can also define the folder for temporary files.
Check the last option, "Yes, help improve KNIME.", to agree to send us anonymous usage data. This agreement activates the community-based node recommendations in the Workflow Coach.
The KNIME category contains a subcategory, KNIME GUI. In this dialog, you can define the log level of the console view. By default it is set to "WARN", because more detailed information is only useful for diagnostic purposes.
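The log file and the console apply the same kind of threshold: a message is reported only if its level is at least as severe as the configured level. The Java sketch below assumes the four levels DEBUG, INFO, WARN, and ERROR in that order (only DEBUG and WARN are named in this guide) and shows why the log file, at DEBUG, receives more messages than the console, at WARN. The class names and messages are hypothetical.

```java
import java.util.List;
import java.util.stream.Collectors;

class LogLevelFilter {
    // Assumed ordering, from most verbose to most severe.
    enum Level { DEBUG, INFO, WARN, ERROR }

    static final class Message {
        final Level level;
        final String text;
        Message(Level level, String text) { this.level = level; this.text = text; }
    }

    // Keeps messages whose level is at least as severe as the threshold.
    static List<Message> visibleAt(Level threshold, List<Message> messages) {
        return messages.stream()
                .filter(m -> m.level.compareTo(threshold) >= 0)
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Hypothetical messages, one per level.
        List<Message> messages = List.of(
                new Message(Level.DEBUG, "debug detail"),
                new Message(Level.INFO, "info note"),
                new Message(Level.WARN, "warning"),
                new Message(Level.ERROR, "error"));
        System.out.println("log file (DEBUG): " + visibleAt(Level.DEBUG, messages).size() + " messages"); // 4
        System.out.println("console (WARN):   " + visibleAt(Level.WARN, messages).size() + " messages");  // 2
    }
}
```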
Further below, you can select which confirmation dialogs are shown when using KNIME Analytics Platform. Choose from the following:
Confirmation after resetting a node
Deleting a node or connection
Replacing a connection
Saving and executing workflow
Loading workflows created with a nightly build
In the same dialog, you can define what happens if an operation requires executing the previous nodes in the workflow. You have these three options:
Execute the nodes automatically
Always reject the node execution
Show a dialog to execute or not
The following options allow you to define whether workflows should be saved automatically and at what time interval, as well as whether linked components and metanodes should be updated automatically. You can also define visual properties such as the border width of workflow annotations.
Any credentials used in a workflow can be encrypted using a master key. For example, once you enter credentials for different database connections in a workflow, you do not need to save them together with the workflow, nor do you need to enter them every time the workflow is opened. Instead, you only need to provide the master key.
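To illustrate the general idea of protecting credentials with a master key, the following Java sketch derives an AES key from the master key with PBKDF2 and encrypts a credential with AES-GCM, using only JDK classes. This is an assumed, generic scheme and not the algorithm KNIME Analytics Platform uses; the class name, iteration count, and example values are hypothetical.

```java
import java.nio.charset.StandardCharsets;
import java.security.GeneralSecurityException;
import java.security.SecureRandom;
import javax.crypto.Cipher;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.GCMParameterSpec;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

class MasterKeyVault {
    private static final int ITERATIONS = 210_000; // hypothetical work factor
    private static final SecureRandom RANDOM = new SecureRandom();

    // What gets stored: salt and IV are not secret; the master key is never stored.
    static final class Sealed {
        final byte[] salt, iv, ciphertext;
        Sealed(byte[] salt, byte[] iv, byte[] ciphertext) { this.salt = salt; this.iv = iv; this.ciphertext = ciphertext; }
    }

    // Derives a 256-bit AES key from the master key with PBKDF2.
    private static SecretKeySpec deriveKey(char[] masterKey, byte[] salt) throws GeneralSecurityException {
        PBEKeySpec spec = new PBEKeySpec(masterKey, salt, ITERATIONS, 256);
        try {
            byte[] key = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA256").generateSecret(spec).getEncoded();
            return new SecretKeySpec(key, "AES");
        } finally {
            spec.clearPassword();
        }
    }

    static Sealed seal(char[] masterKey, String secret) throws GeneralSecurityException {
        byte[] salt = new byte[16];
        byte[] iv = new byte[12];
        RANDOM.nextBytes(salt);
        RANDOM.nextBytes(iv);
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.ENCRYPT_MODE, deriveKey(masterKey, salt), new GCMParameterSpec(128, iv));
        return new Sealed(salt, iv, cipher.doFinal(secret.getBytes(StandardCharsets.UTF_8)));
    }

    // A wrong master key fails the GCM tag check and throws AEADBadTagException.
    static String open(char[] masterKey, Sealed sealed) throws GeneralSecurityException {
        Cipher cipher = Cipher.getInstance("AES/GCM/NoPadding");
        cipher.init(Cipher.DECRYPT_MODE, deriveKey(masterKey, sealed.salt), new GCMParameterSpec(128, sealed.iv));
        return new String(cipher.doFinal(sealed.ciphertext), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws GeneralSecurityException {
        char[] master = "correct horse battery staple".toCharArray(); // hypothetical master key
        Sealed dbPassword = seal(master, "db-secret");                  // hypothetical credential
        System.out.println(open(master, dbPassword));
    }
}
```

In this sketch only the salt, IV, and ciphertext would need to be saved; the master key is supplied at runtime, and a wrong master key makes decryption fail instead of returning garbage.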
In KNIME Analytics Platform, cell elements in a table are represented by Java objects, which is not always an efficient use of main memory. To optimize memory usage, the underlying data representation was reviewed, and starting with KNIME Analytics Platform version 4.3 a new Columnar Table Backend is introduced. This extension addresses these issues by using a different underlying data layer (backed by Apache Arrow), which is based on a columnar representation.
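The difference between the two representations can be sketched in plain Java. In the hypothetical example below, the row-based layout stores every numeric cell as a separate Double object, while the columnar layout stores each column as one contiguous array of primitive values, so an operation on one column only touches that column's data. The real Columnar Table Backend uses Apache Arrow buffers rather than Java arrays; this sketch only shows the layout idea.

```java
class RowVsColumnar {
    // Row-based, object-per-cell layout: each row is an Object[] and each number a boxed Double.
    static Object[][] rowTable(int rows) {
        Object[][] table = new Object[rows][];
        for (int r = 0; r < rows; r++) {
            table[r] = new Object[] {"Row" + r, Double.valueOf(r * 0.5)};
        }
        return table;
    }

    static double sumRows(Object[][] table) {
        double s = 0;
        for (Object[] row : table) s += (Double) row[1]; // unbox one object per cell
        return s;
    }

    // Column-based layout: one array per column, numbers stored as primitive doubles.
    static final class ColumnarTable {
        final String[] rowKeys;
        final double[] values;

        ColumnarTable(int rows) {
            rowKeys = new String[rows];
            values = new double[rows];
            for (int r = 0; r < rows; r++) {
                rowKeys[r] = "Row" + r;
                values[r] = r * 0.5;
            }
        }

        // Aggregating a column scans a single contiguous array.
        double sum() {
            double s = 0;
            for (double v : values) s += v;
            return s;
        }
    }

    public static void main(String[] args) {
        int rows = 1_000;
        System.out.println(sumRows(rowTable(rows)) + " == " + new ColumnarTable(rows).sum());
    }
}
```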
To work with the Columnar Table Backend, you first need to install the extension. In KNIME Analytics Platform, go to File → Install KNIME Extensions… and select the KNIME Columnar Table Backend extension under the KNIME Labs Extensions category.
The type of table backend used is defined at the workflow level. Right click any workflow in the KNIME Explorer and select Configure… from the context menu, as shown in Figure 31.
The parameters related to the memory usage of the Columnar Table Backend can also be configured. Go to File → Preferences and select Table Backend → Columnar Storage (Labs) under KNIME in the left pane of the preferences window, as shown in Figure 32.
Some default values are already set automatically, based on the system specifications where the current KNIME Analytics Platform is installed. However, unchecking the Use default values option activates the fields below, where the advanced configuration options can be set. Be aware that changes to these settings can seriously impact the performance of KNIME Analytics Platform and overall system stability.
The Columnar Table Backend currently has three caches, whose size and behavior can be configured via the Columnar Table Backend preference page and possibly also through the knime.ini file.
Caching strategy for complex data: The Complex Data Cache holds data in the Java Virtual Machine’s heap region of memory and can be configured to minimize memory usage or maximize performance.
Size of small table cache (in MB) and Size up to which table is considered small (in MB): The Small Table Cache holds recently used small tables in the off-heap memory region. Both the threshold up to which a table is considered small and the size of the cache (in megabytes) can be configured through the preference page.
Size of data cache (in MB): The General Data Cache holds recently used chunks of arbitrarily-sized tables up to a configurable total size (in megabytes) in the off-heap memory region.
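The General Data Cache is described as holding recently used chunks up to a configurable total size. A size-bounded least-recently-used (LRU) cache is one common way to get that behavior. The Java sketch below assumes LRU eviction and uses on-heap byte arrays as stand-ins for chunks, whereas the actual caches live in the off-heap memory region; the class name and eviction policy are assumptions for illustration.

```java
import java.util.Iterator;
import java.util.LinkedHashMap;
import java.util.Map;

class SizeBoundedLruCache<K> {
    private final long capacityBytes;
    private long usedBytes = 0;
    // accessOrder = true: iteration runs from least recently to most recently used entry.
    private final LinkedHashMap<K, byte[]> entries = new LinkedHashMap<>(16, 0.75f, true);

    SizeBoundedLruCache(long capacityMb) {
        this.capacityBytes = capacityMb * 1024L * 1024L;
    }

    void put(K key, byte[] chunk) {
        byte[] previous = entries.remove(key);
        if (previous != null) usedBytes -= previous.length;
        if (chunk.length > capacityBytes) return; // larger than the whole cache: do not cache
        entries.put(key, chunk);
        usedBytes += chunk.length;
        // Evict least recently used chunks until the total size fits again.
        Iterator<Map.Entry<K, byte[]>> it = entries.entrySet().iterator();
        while (usedBytes > capacityBytes && it.hasNext()) {
            usedBytes -= it.next().getValue().length;
            it.remove();
        }
    }

    byte[] get(K key) { return entries.get(key); } // a hit marks the chunk as recently used
    long usedBytes() { return usedBytes; }

    public static void main(String[] args) {
        SizeBoundedLruCache<String> cache = new SizeBoundedLruCache<>(1); // hypothetical 1 MB limit
        cache.put("chunk-a", new byte[400 * 1024]);
        cache.put("chunk-b", new byte[400 * 1024]);
        cache.get("chunk-a");                       // chunk-b is now least recently used
        cache.put("chunk-c", new byte[400 * 1024]); // exceeds 1 MB, so chunk-b is evicted
        System.out.println(cache.get("chunk-b") == null); // true
    }
}
```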
Note that the caches that reside in the off-heap memory region require memory in addition to whatever memory you have allotted to the heap space of KNIME’s Java Virtual Machine via the -Xmx parameter in the knime.ini file. When altering the sizes of these caches via the preference page, make sure not to exceed your system’s physical memory size, as otherwise you might encounter system instability or even crashes.
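Because the off-heap caches come on top of the -Xmx heap, a quick sanity check is to add them up and compare the total with the physical memory. The Java sketch below does only this arithmetic; the machine size and cache sizes in main are hypothetical, and whatever remains must still cover the operating system and any other programs.

```java
class MemoryBudget {
    // Physical memory left after the -Xmx heap and all off-heap caches (negative = over budget).
    static long remainingMb(long physicalMb, long xmxMb, long... offHeapCachesMb) {
        long total = xmxMb;
        for (long cache : offHeapCachesMb) total += cache;
        return physicalMb - total;
    }

    public static void main(String[] args) {
        // Hypothetical machine: 16 GB RAM, -Xmx8g, 1 GB small table cache, 4 GB data cache.
        long left = remainingMb(16_384, 8_192, 1_024, 4_096);
        System.out.println(left >= 0
                ? left + " MB left for the operating system and other programs" // 3072 MB
                : "Over budget by " + (-left) + " MB");
    }
}
```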
|For a more detailed explanation of the technical background of the Columnar Table Backend, please refer to this post on the KNIME Blog.|
Setting up knime.ini
When installing KNIME Analytics Platform, configuration options are set to their defaults. The configuration options, i.e. options used by KNIME Analytics Platform, range from memory settings to system properties required by some extensions.
You can change the default settings in the knime.ini file, which is located in the installation folder of KNIME Analytics Platform.
You can open and edit the knime.ini file with any plaintext editor, such as Notepad (Windows), TextEdit (MacOS), or gedit (Linux).
The entry -Xmx1024m in the knime.ini file specifies how much memory KNIME Analytics Platform is allowed to use. The setting for this value depends on how much memory is available on the running machine. We recommend setting it to approximately one half of the available memory, but this value can be adjusted to your needs. For example, if the computer has 16GB of memory, the entry might be set to -Xmx8g.
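The one-half recommendation is easy to compute. The following Java sketch converts an -Xmx entry to megabytes (supporting the usual JVM k, m, and g suffixes) and suggests a value of about half the physical memory; the class and helper names are hypothetical. For a 16GB machine it suggests -Xmx8192m, which is the same amount as -Xmx8g.

```java
import java.util.Locale;

class XmxRecommendation {
    // Converts an -Xmx entry such as -Xmx1024m or -Xmx8g to megabytes (k, m, g suffixes; no suffix = bytes).
    static long parseXmxMb(String entry) {
        if (!entry.startsWith("-Xmx")) throw new IllegalArgumentException("not an -Xmx entry: " + entry);
        String value = entry.substring(4).toLowerCase(Locale.ROOT);
        char unit = value.charAt(value.length() - 1);
        if (Character.isDigit(unit)) return Long.parseLong(value) / (1024 * 1024);
        long amount = Long.parseLong(value.substring(0, value.length() - 1));
        switch (unit) {
            case 'k': return amount / 1024;
            case 'm': return amount;
            case 'g': return amount * 1024;
            default: throw new IllegalArgumentException("unknown unit: " + unit);
        }
    }

    // About one half of the physical memory, as recommended above.
    static String recommendedXmx(long physicalMemoryMb) {
        return "-Xmx" + (physicalMemoryMb / 2) + "m";
    }

    public static void main(String[] args) {
        System.out.println(parseXmxMb("-Xmx1024m"));     // 1024
        System.out.println(recommendedXmx(16L * 1024)); // -Xmx8192m, the same as -Xmx8g
    }
}
```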
Besides the available memory, you can define many other settings in the knime.ini file. Find an overview of some of the most common settings in Table 4 or in this list of the configuration options.
-Xmx: Sets the maximum amount of memory available for KNIME Analytics Platform.
-Dknime.compress.io: Determines which compression algorithm (if any) to use when writing temporary tables to disk.
This setting defines the size of a "small table". KNIME Analytics Platform attempts to keep small tables in memory, independent of the table caching strategy. Increasing the size of a small table can limit the number of swaps to disk, at the cost of reducing the memory available for other operations.
-Dknime.table.cache: Determines whether to attempt to cache large tables (i.e., tables that are not considered to be "small"; see the previous setting) in memory.
When trying to connect to or read data from a URL, this value defines a timeout for the request. Increase the value if a reader node fails. Setting the timeout too high may cause slow websites to block dialogs in KNIME Analytics Platform.
A data table is organized by columns and rows, and it contains a number of equal-length rows. Elements in each column must have the same data type.
The data table shown in Figure 34 is produced by the File Reader node, which is one of the many nodes with a black triangle output port for data output. To open the table, right click the node and select the last item File Table in the menu. The output table has unique row IDs and column headers. The row IDs are automatically created by the reader node, but they can also be defined manually. The row IDs and the column headers can therefore be used to identify each data cell in the table. Missing values in the data are shown by a question mark.
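The table rules above (equal-length rows, one data type per column, unique row IDs, missing values shown as a question mark) can be expressed as a small Java class. This is a hypothetical sketch for illustration only; it is not KNIME's table API, and it models a missing value as null.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class SimpleDataTable {
    private final List<String> columnNames;
    private final List<Class<?>> columnTypes;
    private final List<String> rowIds = new ArrayList<>();
    private final List<Object[]> rows = new ArrayList<>();
    private final Set<String> knownIds = new HashSet<>();

    SimpleDataTable(List<String> columnNames, List<Class<?>> columnTypes) {
        if (columnNames.size() != columnTypes.size()) throw new IllegalArgumentException("one type per column");
        this.columnNames = List.copyOf(columnNames);
        this.columnTypes = List.copyOf(columnTypes);
    }

    // Enforces equal row length, one data type per column, and unique row IDs; null is a missing value.
    void addRow(String rowId, Object... cells) {
        if (cells.length != columnNames.size())
            throw new IllegalArgumentException("expected " + columnNames.size() + " cells");
        for (int c = 0; c < cells.length; c++) {
            if (cells[c] != null && !columnTypes.get(c).isInstance(cells[c]))
                throw new IllegalArgumentException(columnNames.get(c) + " expects " + columnTypes.get(c).getSimpleName());
        }
        if (!knownIds.add(rowId)) throw new IllegalArgumentException("duplicate row ID " + rowId);
        rowIds.add(rowId);
        rows.add(cells.clone());
    }

    // Tab-separated view; missing values are shown as "?" like in the table view.
    String render() {
        StringBuilder sb = new StringBuilder("Row ID\t" + String.join("\t", columnNames) + "\n");
        for (int r = 0; r < rows.size(); r++) {
            sb.append(rowIds.get(r));
            for (Object cell : rows.get(r)) sb.append('\t').append(cell == null ? "?" : cell);
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        SimpleDataTable table = new SimpleDataTable(
                List.of("Name", "Age"), List.<Class<?>>of(String.class, Integer.class));
        table.addRow("Row0", "Alice", 34);
        table.addRow("Row1", "Bob", null); // missing age
        System.out.print(table.render());
    }
}
```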
Besides the data table, the node output contains the following tabs:
The "Table" tab shows the contents of the table
The "Spec" tab shows the meta information of the table, including the column name, column type, and optional properties like the domain of the values in the column
The "Properties" tab shows metadata related to some columns, for example the width of the histogram in the "Histogram" column produced by the Statistics node
The "Flow Variables" tab shows the available flow variables in the node output and their current values
In the video Data Table Structure we introduce the data organization and data representation in KNIME Analytics Platform in more detail.
The basic data types in KNIME Analytics Platform include String, and further supported data types include, for example, Blob. KNIME Analytics Platform also supports customized data types, for example, a representation of a molecule.
Click the "Spec" tab of an output table to see the data types of the columns in the data table, as well as the domain of the values in the columns, as shown in Figure 35. For numerical values, only the range of the values in the data is shown. For string values, the distinct values appearing in the data are shown.
The reader nodes in KNIME Analytics Platform assign a data type to each column based on their interpretation of the content. If the correct data type of a column is not recognized by the reader node, the data type can be corrected afterwards. There are nodes available to convert data types. For example: String to Number, Number to String, Double to Int, String to Date&Time, String to JSON, and String to URI.
Many of the special data types are recognized as String by the reader nodes. To convert these String columns to their correct data types, use the Column Type Auto Cast node.
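A reader node's type detection can be pictured as choosing, for each column, the narrowest type that all of its values can be parsed as. The Java sketch below assumes just three candidate types (Integer, Double, String) and treats empty cells as missing; it is a simplified, hypothetical illustration, not the algorithm used by the reader nodes. It also shows why special types end up as String: a value such as 2024-01-31 is not a number, so without a date check the whole column falls back to String.

```java
import java.util.List;

class ColumnTypeGuesser {
    enum Guess { INTEGER, DOUBLE, STRING }

    // Narrowest type that every non-missing value parses as; empty or null cells count as missing.
    static Guess guess(List<String> values) {
        boolean seen = false, allInt = true, allDouble = true;
        for (String v : values) {
            if (v == null || v.isBlank()) continue;
            seen = true;
            String s = v.trim();
            if (allInt && !isInt(s)) allInt = false;
            if (allDouble && !isDouble(s)) allDouble = false;
        }
        if (!seen) return Guess.STRING; // nothing to go on
        return allInt ? Guess.INTEGER : allDouble ? Guess.DOUBLE : Guess.STRING;
    }

    private static boolean isInt(String s) {
        try { Integer.parseInt(s); return true; } catch (NumberFormatException e) { return false; }
    }

    private static boolean isDouble(String s) {
        try { Double.parseDouble(s); return true; } catch (NumberFormatException e) { return false; }
    }

    public static void main(String[] args) {
        System.out.println(guess(List.of("1", "2", "3")));       // INTEGER
        System.out.println(guess(List.of("1", "2.5", "")));      // DOUBLE
        System.out.println(guess(List.of("2024-01-31", "n/a"))); // STRING
    }
}
```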
When you use the File Reader node to read a file you can convert the column types directly via the node configuration dialog. To do this double click a column header in the preview and change the column type in the dialog that opens, as shown in Figure 36.
Rows in the table view output can be sorted by values in one column by clicking the column header and selecting Sort Descending or Sort Ascending as shown in Figure 37. Note that this sorting only affects the current output view and has no effect on the node output.
To sort rows in an output table permanently, use the Sorter node. Use the Column Resorter node to reorder columns.
In a table view output, you can also change the way in which numeric values are displayed in a data table. For example, it is possible to display numeric values as percentages, with full precision, or replace digits by a color scale or bars. To see these and other rendering options for a column, right click the column header, and select Available Renderers as shown in Figure 38. Note that these changes are temporary and have no effect on the node output.
When executed, many KNIME nodes generate and provide access to tabular data at their output ports. These tables might be small or large and, therefore, might fit into the main memory of the executing machine or not. Several options are available for configuring which tables to hold in memory as well as when and how to write tables to disk. These options are outlined in this section.
KNIME Analytics Platform differentiates between small and large tables. A table is considered small if it is composed of up to 5000 cells, and large if it is composed of more than 5000 cells. This threshold of 5000 cells can be adjusted via a parameter in the knime.ini file.
KNIME Analytics Platform always attempts to hold small tables in memory, flushing them to disk only when memory becomes scarce.
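Whether a table counts as small depends only on its number of cells, that is, rows times columns. The Java sketch below applies the default threshold of 5000 cells; because small means "up to" the threshold, a table with exactly 5000 cells is still small. The class and method names are hypothetical.

```java
class TableSizeClass {
    static final long DEFAULT_SMALL_TABLE_CELLS = 5000; // default threshold from the text

    // "Up to" the threshold counts as small, so exactly 5000 cells is still a small table.
    static boolean isSmall(long rows, long columns, long thresholdCells) {
        return rows * columns <= thresholdCells;
    }

    public static void main(String[] args) {
        System.out.println(isSmall(1_000, 5, DEFAULT_SMALL_TABLE_CELLS)); // true  (5000 cells)
        System.out.println(isSmall(1_001, 5, DEFAULT_SMALL_TABLE_CELLS)); // false (5005 cells)
        System.out.println(isSmall(100, 60, DEFAULT_SMALL_TABLE_CELLS));  // false (6000 cells)
    }
}
```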
In addition, KNIME Analytics Platform attempts to keep recently used large tables in memory while sufficient memory is available. However, it writes these tables asynchronously to disk in the background, such that they can be dropped from memory when they have not been accessed for some time or when memory becomes scarce. You can configure the memory consumption of a specific node to never attempt to hold its tables in memory and, instead, write them to disk on execution. This is helpful if you know that a node will generate a table that cannot be held in memory or if you want to reduce the memory footprint of a node.
Alternatively, by putting the line -Dknime.table.cache=SMALL into the knime.ini file, KNIME Analytics Platform can be globally configured to use a less memory-consuming, albeit much slower, caching strategy. This strategy only ever keeps small tables in memory.
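The rules above can be summarized as a decision function. The Java sketch below is a deliberately simplified, hypothetical model: DEFAULT is an illustrative name for the standard behavior, SMALL corresponds to -Dknime.table.cache=SMALL, and the per-node option to write tables to disk is modeled as a boolean flag. Timing details, such as dropping large tables that have not been accessed for some time, are left out.

```java
class TableCachingStrategy {
    // DEFAULT is a hypothetical label for the standard behavior; SMALL corresponds to -Dknime.table.cache=SMALL.
    enum Strategy { DEFAULT, SMALL }

    // Simplified model of whether a table is currently held in memory.
    static boolean holdInMemory(boolean smallTable, Strategy strategy,
                                boolean nodeWritesToDisk, boolean memoryScarce) {
        if (nodeWritesToDisk) return false;  // per-node option: write tables to disk on execution
        if (memoryScarce) return false;      // tables are flushed or dropped when memory becomes scarce
        if (smallTable) return true;         // small tables are always attempted in memory
        // Large tables: only under the default strategy (and also written to disk in the background).
        return strategy == Strategy.DEFAULT;
    }

    public static void main(String[] args) {
        System.out.println(holdInMemory(false, Strategy.DEFAULT, false, false)); // true
        System.out.println(holdInMemory(false, Strategy.SMALL, false, false));   // false
        System.out.println(holdInMemory(true, Strategy.SMALL, false, false));    // true
    }
}
```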
KNIME Analytics Platform compresses tables written to disk to reduce the amount of occupied disk space.
By default, KNIME Analytics Platform uses the Snappy compression algorithm to compress its tables. However, you can configure KNIME Analytics Platform to use GZIP compression or no compression at all via the -Dknime.compress.io parameter in the knime.ini file.
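The trade-off behind compression can be tried out with the JDK alone. Snappy is not part of the JDK, so the Java sketch below compares GZIP with no compression on hypothetical, repetitive CSV-like data. Snappy is designed to favor speed over compression ratio, so it typically compresses less than GZIP but faster. The class name and sample data are assumptions.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

class CompressionComparison {
    static byte[] gzip(byte[] data) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        try (GZIPOutputStream gz = new GZIPOutputStream(out)) {
            gz.write(data);
        } // closing the stream writes the GZIP trailer
        return out.toByteArray();
    }

    static byte[] gunzip(byte[] data) throws IOException {
        try (InputStream in = new GZIPInputStream(new ByteArrayInputStream(data))) {
            return in.readAllBytes();
        }
    }

    public static void main(String[] args) throws IOException {
        // Hypothetical, repetitive table content in CSV form.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10_000; i++) sb.append("Row").append(i).append(",Berlin,42.0\n");
        byte[] raw = sb.toString().getBytes(StandardCharsets.UTF_8);

        long start = System.nanoTime();
        byte[] packed = gzip(raw);
        long micros = (System.nanoTime() - start) / 1_000;

        System.out.println("none: " + raw.length + " bytes");
        System.out.println("gzip: " + packed.length + " bytes, " + micros + " microseconds to compress");
        System.out.println("round trip ok: " + Arrays.equals(raw, gunzip(packed)));
    }
}
```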
Columnar Table Backend
Starting with KNIME Analytics Platform version 4.3, a new Columnar Table Backend is introduced. This extension optimizes the use of main memory by using a different underlying data layer (backed by Apache Arrow), which is based on a columnar representation.
For information on how to set up this type of backend please refer to the Table backend section.