
Connect to Databases

Introduction

The KNIME Database Extension provides a set of KNIME nodes that allow connecting to JDBC-compliant databases. These nodes reside in the DB category in the Node Repository, where you can find a number of database access, manipulation and writing nodes.

The database nodes are part of every KNIME Analytics Platform installation. It is not necessary to install any additional KNIME Extensions.

This guide describes the KNIME Database extension, and shows, among other things, how to connect to a database, and how to perform data manipulation inside the database.


Figure 1. Example workflow using DB nodes

Port Types


Figure 2. Two types of Database port

There are two types of ports in the Database extension, the DB Connection port (red) and the DB Data port (dark red).

The DB Connection port stores information about the current DB Session, e.g. data types, connection properties, JDBC properties, driver information, etc.

The DB Data port gives you access to a preview of the data.

Outport views

After executing a DB node, you can inspect the result in the outport view by right-clicking the node and selecting the outport to inspect at the bottom of the menu. For more information on how to execute a node, please refer to the Quickstart Guide.

DB Connection outport view

The outport view of a DB Connection has the DB Session tab, which contains the information about the current database session, such as database type, and connection URL.

DB Data outport view

When you execute a database manipulation node that has a DB Data outport, for example a DB GroupBy node, the node builds the SQL query needed to perform the selected GroupBy operation and forwards it to the next node in the workflow. It does not actually execute the query. However, it is possible to inspect a preview of a subset of the intermediate result and its specification.

To do so, select the node and click Fetch 100 table rows in the node monitor at the bottom of the UI.

By default only the first 100 rows are cached, but you can also select other options by opening the dropdown menu of the Fetch button. However, be aware that, depending on the complexity of the SQL query, even caching only the first 100 rows might take a long time.
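To illustrate the query building described above, a DB GroupBy node typically wraps the incoming query into a sub-query. A generated statement could look like the following sketch (table and column names here are hypothetical):

SELECT "column1", COUNT(*) FROM (SELECT * FROM "my_table") AS "table" GROUP BY "column1"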


Figure 3. DB Outport View with retrieved rows

The table specification can be inspected by using the DB Data Spec Extractor node. The output of this node will show the list of columns in the table, with their database types and the corresponding KNIME data types.

For more information on the type mapping between database types and KNIME types, please refer to the Type Mapping section.

The generated SQL query can be extracted by using the DB Query Extractor node.

Session Handling

The DB Session life cycle is managed by the Connector nodes. Executing a Connector node will create a DB Session, and resetting the node or closing the workflow will destroy the corresponding DB Session and with it the connection to the database.

To close a DB Session during workflow execution, use the DB Connection Closer node. This is also the preferred way to free up database resources as soon as they are no longer needed by the workflow. Simply connect a DB Session to the DB Connection Closer node, which destroys the DB Session, and with it the connection to the database, as soon as it is executed. Use the input flow variable port of the DB Connection Closer node to execute it once it is safe to destroy the DB Session.

Connecting to a database

The DB Connection subcategory in the Node Repository contains a set of database-specific connector nodes for commonly used databases such as Microsoft SQL Server, MySQL, PostgreSQL, H2, etc. as well as the generic DB Connector node.

A Connector node creates a connection to a database via its JDBC driver. In the configuration dialog of a Connector node you need to provide information such as the database type, the driver, the location of the database, and the authentication method if available.

Most of the database-specific connector nodes already contain the necessary JDBC drivers and provide a configuration dialog that is tailored to the specific database. It is recommended to use these nodes over the generic DB Connector node, if possible.

Connecting to predefined databases

The following are some databases that have their own dedicated Connector node:

Amazon Redshift
Amazon Athena
Google BigQuery
H2
Microsoft Access
Microsoft SQL Server
MySQL
Oracle
PostgreSQL
Snowflake
SQLite
Vertica

Some dedicated Connector nodes, such as Google BigQuery or Amazon Redshift, come without a JDBC driver due to licensing restrictions. If you want to use these nodes, you need to register the corresponding JDBC driver first. Please refer to the Register your own JDBC drivers section on how to register your own driver. For Amazon Redshift, please refer to the Third-party Database Driver Plug-in section.

If no dedicated connector node exists for your database, you can use the generic DB Connector node. For more information on this please refer to the Connecting to other databases section.

After you find the right Connector node for your database, double-click on the node to open the configuration dialog. In the Connection Settings window you can provide the basic parameters for your database, such as the database driver, location, or authentication. Then click OK and execute the node to establish a connection.

If you select "Use latest driver version available", the node will automatically use the driver with the latest (highest) version available for the current database type upon execution. This has the advantage that you do not need to touch the workflow after a driver update. However, the workflow might break in the rare case that the behavior of the driver, e.g. type mapping, changes with the newer version.

KNIME Analytics Platform in general provides three different types of connector nodes: the file-based Connector nodes, the server-based Connector nodes, and the generic Connector nodes, which are explained in the following sections.

File-based Connector node

The figure below shows an example of the node dialog for a file-based database, such as SQLite, H2, or MS Access. The most important node settings are described below:

Configuration: In the configuration window you can choose the registered database dialect and driver.

Location: The location of the database. You can provide either the path to an existing database, or choose in-memory to create a temporary database that is kept in memory, if the database supports this feature.

Figure 4. H2 Connector configuration dialog

Server-based Connector node

The figure below shows an example of the node dialog for a server-based database, such as MySQL, Oracle, or PostgreSQL. The most important node settings are described below:

Configuration: In the configuration window you can choose the registered database dialect and driver.

Location: The location of the database. You should provide the hostname and the port of the machine that hosts the database, and also the name of the database, which might be optional depending on the database.

Authentication: Login credentials can either be provided via credential flow variables, or directly in the configuration dialog in the form of username and password. Kerberos authentication is also provided for databases that support this feature, e.g. Hive or Impala. For more information on Kerberos authentication, please refer to the Kerberos User Guide.

Figure 5. MS SQL Server Connector configuration dialog

For more information on the JDBC Parameters and Advanced tab, please refer to the JDBC Parameters and Advanced Tab section. The Type Mapping tabs are explained in the Type Mapping section.

Third-party Database Driver Plug-in

As previously mentioned, the dedicated database-specific connector nodes already contain the necessary JDBC drivers. However, some databases require special licensing that prevents us from automatically installing or even bundling the necessary JDBC drivers with the corresponding connector nodes. For example, KNIME provides additional plug-ins to install the Oracle Database driver, the official Microsoft SQL Server driver, or the Amazon Redshift driver, which require special licenses.

To install the plug-ins, go to File → Install KNIME Extensions. In the Install window, search for the driver that you need (Oracle, MS SQL Server, or Redshift), and you will see something similar to the figure below. Then select the plug-in to install it. If you don't see the plug-in in this window, it is already installed. After installing the plug-in, restart KNIME. After that, when you open the configuration dialog of the dedicated Connector node, you should see that the installed driver of the respective database is available in the database driver list.


Figure 6. Install Window

Connecting to other databases

The generic DB Connector node can connect to arbitrary JDBC compliant databases. The most important node settings are described below.


Figure 7. Database Connector configuration dialog

Database Type: Select the type of the database the node will connect to. For example, if the database is a PostgreSQL derivative, select Postgres as the database type. If you don't know the type, select the default type.

Database Dialect: Select the database dialect which defines how the SQL statements are generated.

Database Driver: Select an appropriate driver for your specific database, or select the "Use latest driver version available" option, in which case the node will automatically use the driver with the latest (highest) version available for the current database type upon execution. If there is no matching JDBC driver, it first needs to be registered, see Register your own JDBC drivers. Only drivers that have been registered for the selected database type will be available for selection.

Database URL: A driver-specific JDBC URL. Enter the database information in the placeholders, such as the host, port, and database name (see the example after this list).

Authentication: Login credentials can either be provided via credential flow variables, or directly in the configuration dialog in the form of username and password. Kerberos authentication is also provided for databases that support this feature, e.g. Hive or Impala. For more information on Kerberos authentication, please refer to the Kerberos User Guide.
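As an illustration of the Database URL field: for a PostgreSQL database the JDBC URL follows the pattern jdbc:postgresql://<host>:<port>/<database>, so a database named flights on a local server would be reached via jdbc:postgresql://localhost:5432/flights (host, port, and database name here are purely illustrative).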

The selected database type and dialect determine which data types, statements (such as insert or update), and aggregation functions are supported.

If you encounter an error while connecting to a third-party database, you can enable the JDBC logger option in the Advanced Tab. If this option is enabled, all JDBC operations are written into the KNIME log, which might help you to identify the problem. To tweak how KNIME interacts with your database, e.g. how it quotes identifiers, you can change the default settings under the Advanced Tab according to the settings of your database. For example, KNIME uses " as the default identifier quoting, which is not supported by default by some databases (e.g. Informix). To solve this, simply change or remove the value of the identifier delimiter setting in the Advanced Tab.

Register your own JDBC drivers

For some databases KNIME Analytics Platform does not contain a ready-to-use JDBC driver. In these cases, it is necessary to first register a vendor-specific JDBC driver in KNIME Analytics Platform. Please consult your database vendor to obtain the JDBC driver. A list of some of the most popular JDBC drivers can be found below.

The JDBC driver has to be JDBC 4.1 or above compliant.

To set up JDBC drivers on KNIME Server, please refer to the section JDBC drivers on KNIME Hub and KNIME Server.

To register your vendor-specific JDBC driver, go to File → Preferences → KNIME → Databases.


Figure 8. DB Preference page

Clicking Add will open a new database driver window where you can provide the JDBC driver path and all necessary information, such as:

ID: The unique ID of the JDBC driver, consisting only of alphanumeric characters and underscore.

Name: The unique name of the JDBC driver.

Database type: The database type. If you select a specific database type, e.g. MySQL, the driver will be available for selection in the dedicated connector node, e.g. MySQL Connector. However, if your database is not on the list, you can choose default, which will provide you with all available parameters in the Advanced Tab. Drivers that are registered for the default type are only available in the generic DB Connector node.

Description: Optional description of the JDBC driver.

URL template: The JDBC driver connection URL format which is used in the dedicated connector nodes. If you select a database other than default in the Database type, the URL template will be preset with the default template for the selected database. Please refer to the URL Template syntax information below or the JDBC URL Template section for more information.

URL Template syntax information: Clicking on the question mark will open an infobox which provides information about the URL template syntax in general. Additionally, if you select a database other than default in the Database type, one or more possible URL template examples will be provided for the selected database type, which you can copy and paste in the URL template field.

Classpath: The path to the JDBC driver. Click Add file if the driver is provided as a single .jar file, or Add directory if the driver is provided as a folder that contains several .jar files. Some vendors offer a .zip file for download, which needs to be unpacked to a folder first.

If the JDBC driver requires native libraries, e.g. DLLs, you need to put all of them into a single folder and then register this folder via the Add directory button in addition to the JDBC driver .jar file.

Driver class: The JDBC driver class and version will be detected automatically by clicking Find driver classes. Please select the appropriate class after clicking the button.

If your database is available in the Database type drop-down list, it is better to select it instead of setting it to default. Setting the Database type to default will allow you to use only the generic DB Connector node to connect to the database, even if there is a dedicated Connector node for that database.


Figure 9. Register new database driver window

KNIME Server can distribute JDBC drivers automatically to all connected KNIME Analytics Platform clients (see JDBC drivers on KNIME Hub and KNIME Server).

JDBC URL Template

When registering a JDBC driver, you need to specify its JDBC URL template, which will be used by the dedicated Connector node to create the final database URL. For example, jdbc:oracle:thin:@<host>:<port>/<database> is a valid driver URL template for the Oracle thin driver. For most databases you don't have to work out the suitable URL template yourself, because the URL Template syntax information provides at least one URL template example per database.

The values of the variables in the URL template, e.g <host>, <port>, or <database> can be specified in the configuration dialog of the corresponding Connector node.
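For example, with the Oracle template above, entering localhost in the Hostname field, 1521 in the Port field, and orcl as the database name (values purely illustrative) would produce the final URL jdbc:oracle:thin:@localhost:1521/orcl.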

Tokens:

Mandatory value (e.g. <database>): The referenced token must have a non-blank value. The name between the brackets must be a valid token name (see below for a list of supported tokens).

Optional value (e.g. [database]): The referenced token may have a blank value. The name between the brackets must be a valid token name (see below for a list of supported tokens).

Conditions (e.g. [location=in-memory?mem:<database>]): This is applicable for file-based databases, such as H2 or SQLite. The first ? character separates the condition from the content that will only be included in the URL if the condition is true. The only explicit operator currently available is =, which tests the exact value of a variable. The left operand must be a valid variable name, and the right operand the value the variable is required to have for the content to be included. The content may include mandatory and/or optional tokens (<database>/[database]), but no conditional parts. It is also possible to test whether a variable is present. To do so, specify the variable name, e.g. database, as the condition. For example, jdbc:mydb://<host>:<port>[database?/databaseName=<database>] will result in jdbc:mydb://localhost:10000/databaseName=db1 if the database name is specified in the node dialog, otherwise it will be jdbc:mydb://localhost:10000.

For server-based databases, the following tokens are expected:

host: The value of the Hostname field on the Connection Settings tab of a Connector node.
port: The value of the Port field on the Connection Settings tab of a Connector node.
database: The value of the Database name field on the Connection Settings tab of a Connector node.

For file-based databases, the following tokens are expected:

location: The Location choice on the Connection Settings tab of a Connector node. The file value corresponds to the radio button next to Path being selected, and in-memory to the radio button next to In-memory. This variable can only be used in conditions.

file: The value of the Path field on the Connection Settings tab of a Connector node. This variable is only valid if the value of location is file.

database: The value of the In-memory field on the Connection Settings tab of a Connector node. This variable is only valid if the value of location is in-memory.
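Putting these tokens together, a URL template for a file-based database such as H2 could look like the following sketch (an illustration of the syntax, not necessarily the exact template shipped with KNIME):

jdbc:h2:[location=in-memory?mem:<database>][location=file?<file>]

With the In-memory option and database name db1 this resolves to jdbc:h2:mem:db1, while with a path such as /path/to/db it resolves to jdbc:h2:/path/to/db.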

Field validation in the configuration dialog of a Connector node depends on whether the (included) tokens referencing them are mandatory or optional (see above).

List of common JDBC drivers

Below is a selected list of common database drivers you can add among others to KNIME Analytics Platform:

Apache Derby
Exasol
Google BigQuery
IBM DB2 / Informix
SAP HANA

The list above only shows some examples of database drivers that you can add. If your driver is not in the list above, it is still possible to add it to KNIME Analytics Platform.

Deprecated JDBC Drivers

With version 5.3.0, KNIME Analytics Platform introduced the concept of deprecating JDBC drivers. Older versions of JDBC drivers become deprecated if they are no longer supported by the database vendor or if they contain security vulnerabilities. Deprecated drivers are clearly marked in the node dialog and, if selected, will display a warning in the workflow editor.


Figure 10. Tooltip of a deprecated driver in the driver list

Figure 11. Node warning if a deprecated driver is selected

After a certain time, deprecated drivers will be removed from the platform. If such a driver is still referenced in a workflow, KNIME Analytics Platform will display an exception indicating which extension needs to be installed to make the driver available again.

To install a deprecated driver extension, open the installation dialog as described in the Third-party Database Driver Plug-in section. In the installation dialog, deselect the Group items by category option, search for the mentioned extension, and install it. After restarting KNIME Analytics Platform, the deprecated driver will be available again.

In the example below the deprecated Snowflake driver is installed.


Figure 12. Install deprecated Snowflake drivers

Advanced Database Options

JDBC Parameters

The JDBC parameters allow you to define custom JDBC driver connection parameters. The value of a parameter can be a constant, variable, credential user, credential password, KNIME URL, or path flow variable. In the case of a path flow variable, only standard file systems are supported, not connected file systems. For more information about the supported connection parameters, please refer to your database vendor's documentation.

The figure below shows an example of SSL JDBC parameters with different variable types. You can set a Boolean value to enable or disable SSL, use a KNIME relative URL to point to the SSLTrustStore location, or use a credential input for the trustStorePassword parameter.


Figure 13. JDBC Parameters Tab

Please be aware that when connecting to PostgreSQL with SSL, the key file first has to be converted to either pkcs12 or pkcs8 format. For more information about the supported driver properties, see the PostgreSQL documentation.

Advanced Tab

The settings in the Advanced tab allow you to define KNIME framework properties such as connection handling, advanced SQL dialect settings, or query logging options. This is the place where you can tweak how KNIME interacts with the database, e.g. how the queries that are sent to the database should be created. In the Metadata section you can also disable metadata fetching during node configuration, or alter the timeout when doing so, which might be necessary if you are connected to a database that needs more time to compute the metadata of a created query, or if you are connected to it via a slow network.


Figure 14. Advanced Tab

The available options are described as follows:

Connection

Automatically close idle connection timeout: Time interval in seconds that a database connection can remain idle before it gets closed automatically. A value of 0 disables the automatic closing of idle connections.

Automatically reconnect to database: Enables or disables reconnection to the database if the connection is invalid. Connection-dependent objects will no longer exist after reconnection.

Reconnect to database timeout: Time interval in seconds to wait before canceling the reconnection to the database. A value of 0 indicates the standard connection timeout.

Restore database connection: Enables or disables restoration of the database connection when an executed connector node is loaded.

Validation query: The query to be executed to validate that a connection is ready for use. If no query is specified, KNIME calls the Connection.isValid() method to validate the connection. Only errors are checked, no result is required.

Dialect capabilities

CASE expressions: Whether CASE expressions are allowed in generated statements.

CREATE TABLE CONSTRAINT name: Whether names can be defined for CONSTRAINT definitions in CREATE TABLE statements.

DROP TABLE statement: Whether DROP TABLE statements are part of the language.

Derived table reference: Whether table references can be derived tables.

Insert into table from query: Whether insertion into a table via a select statement is supported, e.g. INSERT INTO T1 (C1) (SELECT C1 FROM T2).

Dialect syntax

CREATE "temporary" TABLE syntax: The keyword or keywords for creating temporary tables.

CREATE TABLE "if not exists" syntax: The syntax for the table creation statement condition "if not exists". If empty, no such statement will automatically be created, though the same behavior may still be non-atomically achieved by nodes.

Delimit only identifier with spaces: If selected, only identifiers, e.g. column or table names, with spaces are delimited.

Identifier delimiter (closing): Closing delimiter for identifiers such as column and table names.

Identifier delimiter (opening): Opening delimiter for identifiers such as column and table names.

Identifier non-word character replacement: The replacement for non-word characters in identifiers when their replacement is enabled. An empty value results in the removal of non-word characters.

Replace non-word characters in identifiers: Whether to replace non-word characters in identifiers, e.g. table or column names. Non-word characters include all characters other than alphanumeric characters (a-z, A-Z, 0-9) and underscore (_).

Table reference keyword: The keyword before correlation names in table references.
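As an illustration of the identifier delimiter settings: with the default double quote as opening and closing delimiter, a generated statement for the hypothetical names my table and my column might look like the line below, whereas changing both delimiters to a backtick would produce `my column` and `my table` instead.

SELECT "my column" FROM "my table"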

JDBC logger

Enable: Enables or disables logger for JDBC operations.

JDBC parameter

Append JDBC parameter to URL: Enables or disables appending of parameters to the JDBC URL instead of passing them as properties.

Append user name and password to URL: Enables or disables appending of the user name and password to the JDBC URL instead of passing them as properties.

JDBC URL initial parameter separator: The character that indicates the start of the parameters in the JDBC URL.

JDBC URL parameter separator: The character that separates two JDBC parameters in the JDBC URL.

JDBC statement cancellation

Enable: Enables or disables JDBC statement cancellation attempts when node execution is canceled.

Node cancellation polling interval: The number of milliseconds to wait between two checks of whether the node execution has been canceled. Valid range: [100, 5000].

Metadata

Flatten sub-queries where possible: Enables or disables sub-query flattening. If enabled, sub-queries such as SELECT * FROM (SELECT * FROM table) WHERE COL1 > 1 will become SELECT * FROM table WHERE COL1 > 1. By default this option is disabled, since query flattening is usually the job of the database query optimizer. However, some databases either have performance problems when executing sub-queries or do not support sub-queries at all; in this case enabling the option might help. Note that not all queries can be flattened, so sub-queries might be sent to the database even when this option is enabled.

List of table types to show in metadata browser: Comma-separated list of table types to show in the metadata browser. Some databases, e.g. SAP HANA, support more than the standard TABLE and VIEW types, such as CALC VIEW, HIERARCHY VIEW and JOIN VIEW.

Retrieve in configure: Enables or disables metadata retrieval in the configure method of database nodes.

Retrieve in configure timeout: Time interval in seconds to wait before canceling a metadata retrieval in the configure method. Valid range: [1, ).

Transaction

Enabled: Enables or disables JDBC transaction operations.

Misc

Fail if WHERE clause contains any missing value: Check every value of a WHERE clause (e.g., update, delete or merge) and fail if one is missing.

Fetch size: Hint for the JDBC driver about the number of rows that should be fetched from the database when more rows are needed. Valid range: [0, ).

Support multiple databases: Enables or disables support for multiple databases in a single statement.

Dedicated DB connectors (e.g. the Microsoft SQL Server Connector) and built-in drivers usually show only a subset of the above-mentioned options, since most options are predefined, such as whether the database supports CASE statements.

Examples

In this section we provide examples of how to connect to some widely-known databases.

Connecting to Oracle

The first step is to install the Oracle Database JDBC driver which is provided as a separate plug-in due to license restrictions. Please refer to Third-party Database Driver Plug-in for more information about the plug-in and how to install it.

It is also possible to use your own Oracle Database driver if required. For more details refer to the Register your own JDBC drivers section.

Once the driver is installed you can use the dedicated Oracle Connector node. Please refer to Connecting to predefined databases on how to connect using dedicated Connector nodes.

Kerberos authentication

To use this mode, you need to select Kerberos as authentication method in the Connection Settings tab of the Oracle Connector. For more information on Kerberos authentication, please refer to the Kerberos User Guide. In addition, you need to specify the following entry in the JDBC Parameters tab: oracle.net.authentication_services with value (KERBEROS5). Please do not forget to put the value in parentheses. For more details see the Oracle documentation.


Figure 15. JDBC Parameters tab with Kerberos settings

Reading from a database


Figure 31. Reading from a database

The figure above is an example of how to read from a database. In this example we want to read the flights dataset stored in an H2 database into a KNIME data table.

First you need a connector node to establish a connection to the database; in the example above it is an H2 database. There are several dedicated connector nodes depending on which database we want to connect to. For further details on how to connect to a database, refer to the Connecting to a database section.

After the connection is established, the next step is to use the DB Table Selector node, which allows selecting a table or a view interactively based on the input database connection. The figure below shows the configuration dialog of the DB Table Selector node. At the top you can enter the schema and the table/view name that you want to select; in this example we want to select the "flights" table.

Figure 32. DB Table Selector configuration dialog

Pressing the Select a table button will open a Database Metadata Browser window that lists available tables/views in the database.

In addition, ticking the Custom Query checkbox will allow you to write your own custom SQL query to narrow down the result. It accepts any SELECT statement, and the placeholder #table# can be used to refer to the table selected via the Select a table button.
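For example, the following custom query (an illustrative sketch; the Dest column belongs to the flights dataset) would keep only flights into Atlanta:

SELECT * FROM #table# WHERE "Dest" = 'ATL'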

The Input Type Mapping tab allows you to define mapping rules from database types to KNIME types. For more information on this, please refer to the section Type Mapping.

The output of this node is a DB Data connection that contains the database information and the SQL query automatically built by the framework, which selects the entered table, or the user-entered custom query.

To read the selected table or view into KNIME Analytics Platform, you can use the DB Reader node. Executing this node will run the input SQL query in the database, and the result will be stored in a KNIME data table on the machine where KNIME Analytics Platform is running.

Database Metadata Browser

The Database Metadata Browser shows the database schema, including all tables/views and their corresponding columns and column data types. When first opened, it fetches the metadata from the database and caches it for subsequent use. By clicking on an element (schema/table/view) it shows the contained elements. To select a table or view, select the name and click OK, or double-click the element. The search box at the top of the window allows you to search for any table or view inside the database. At the bottom there is a refresh button to re-fetch the schema list, with a time reference on how long ago the schema was last refreshed.

Figure 33. Database Metadata Browser

If you have just created a table and you cannot find it in the schema list, the metadata browser cache might not be up to date. Please try to refresh the list by clicking the refresh button at the lower right corner.

Query Generation

Once you have successfully connected to your database, there is a set of nodes that provide in-database data manipulation, such as aggregating, filtering, joining, etc. The database nodes come with a visual user interface and automatically build a SQL query in the background according to the user settings in the configuration window, so no coding is required to interact with the database. The output of each node is a SQL query that corresponds to the operation(s) that are performed within the node. The generated SQL query can be extracted by using the DB Query Extractor node.

Figure 34. DB Query nodes

Visual Query Generation


Figure 35. Example of a workflow that performs in-database data manipulation

The figure above shows an example of in-database data manipulation. In this example, we read the flights dataset from an H2 database. First we filter the rows so that we take only the flights that fulfil certain conditions. Then we calculate the average air time to each unique destination airport. Finally we join the average values together with the original values and then read the result into KNIME Analytics Platform.

The first step is to connect to a database and select the appropriate table we want to work with.

DB Row Filter


Figure 36. DB Row Filter configuration dialog

After selecting the table, you can start working with the data. First we use the DB Row Filter node to filter rows according to certain conditions. The figure above shows the configuration dialog of the DB Row Filter. On the left side there is a Preview area that lists all conditions of the filter to apply to the input data. Filters can be combined and grouped via logical operators such as AND or OR. Only rows that fulfil the specified filter conditions will be kept in the output data table. At the bottom there are options to:

Add Condition: Add more conditions to the list.
Group: Create a new logical operator (AND or OR).
Ungroup: Delete the currently selected logical operator.
Delete: Delete the selected condition from the list.

To create a new condition, click on the Add Condition button. To edit a condition, select it in the condition list, which will show the selected condition in the condition editor on the right. The editor consists of at least two dropdown lists. The leftmost one contains the columns from the input data table, and the one next to it contains the operators that are compatible with the selected column type, such as =, !=, <, >. Depending on the selected operator, a third and maybe fourth input field will be displayed to enter or select the filter values. The button next to the value fields fetches all possible values for the selected column, which will then be available for selection in the value field.

Clicking on a logical operator in the Preview list allows you to switch between AND and OR, or to delete the operator by clicking Ungroup.

In our example, we want to return all rows that fulfil the following conditions:

Originate from the Chicago O'Hare airport (ORD) OR Hartsfield-Jackson Atlanta Airport (ATL)
AND occur during the month of June 2017
AND have a mild arrival delay between 5 and 45 minutes
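A sketch of the WHERE clause such a filter could generate is shown below; the column names (Origin, Year, Month, ArrDelay) are assumptions based on the flights dataset and may differ:

WHERE ("Origin" = 'ORD' OR "Origin" = 'ATL') AND "Year" = 2017 AND "Month" = 6 AND "ArrDelay" BETWEEN 5 AND 45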

DB GroupBy


Figure 37. DB GroupBy: Manual Aggregation

The next step is to calculate the average air time to each unique destination airport using the DB GroupBy node. To retrieve the number of rows per group, tick the Add Count(*) checkbox in the Advanced Settings. The name of the group count column can be changed via the result column name field.


Figure 38. DB GroupBy: Group Settings

To calculate the average air time for each destination airport, we group by the Dest column in the Groups tab, and in the Manual Aggregation tab we select the ActualElapsedTime column (air time) and AVG as the aggregation method.
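The query generated by these settings might resemble the following sketch, where (...) stands for the wrapped query of the predecessor nodes (the exact statement depends on the database dialect):

SELECT "Dest", AVG("ActualElapsedTime") FROM (...) AS "table" GROUP BY "Dest"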

DB Joiner


Figure 39. DB Joiner: Joiner Settings

To join the result back to the original data, we use the DB Joiner node, which joins two database tables based on joining column(s) of both tables. In the Joiner Settings tab, there are options to choose the join mode, such as Inner Join, Full Outer Join, etc., and the joining column(s).


Figure 40. DB Joiner: Column Selection

In the Column Selection tab you can select which columns from each of the tables you want to include in the output table. By default, the joining columns from the bottom input will not show up in the output table.
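For illustration, the join generated by this node could resemble the following sketch, where (...) again stands for the wrapped input queries (a hypothetical statement; quoting, aliases, and the exact join syntax depend on the database dialect):

SELECT * FROM (...) AS "t1" INNER JOIN (...) AS "t2" ON "t1"."Dest" = "t2"."Dest"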

Advanced Query Building

Sometimes, using the predefined DB nodes for manipulating data in the database is not enough. This section explains some of the DB nodes that allow users to write their own SQL queries, such as the DB Query, DB Query Reader, and Parameterized DB Query Reader nodes.


Figure 41. Example workflow with advanced query nodes

Each DB manipulation node that takes a DB data object as input and returns a DB data object as output wraps the incoming SQL query into a sub-query. However, some databases don't support sub-queries; if that is the case, please use the DB Query Reader node to read data from the database.

The figure below shows the configuration dialog of the DB Query node. The configuration dialogs of other advanced query nodes that allow users to write SQL statements provide a similar user experience. There is a text area to write your own SQL statement, which provides syntax highlighting and code completion by hitting Ctrl+Space. On the lower side there is an Evaluate button with which you can evaluate the SQL statement and return the first 10 rows of the result. If there is an error in the SQL statement, an error message will be shown in the Evaluate window. On the left side there is the Database Metadata Browser window that allows you to browse the database metadata, such as the tables and views and their corresponding columns. The Database Column List contains the columns that are available in the connected database table. Double-clicking any of the items will insert its name at the current cursor position in the SQL statement area.


Figure 42. Configuration dialog of the DB Query node

DB Query

The DB Query node modifies the input SQL query from an incoming database data connection. The SQL query from the predecessor is represented by the placeholder #table# and will be replaced during execution. The modified input query is then available at the outport.
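For example, the following statement (an illustrative sketch, assuming a Month column in the incoming data) would keep only June rows of the incoming query:

SELECT * FROM #table# WHERE "Month" = 6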

DB Query Reader

Executes an entered SQL query and returns the result as a KNIME data table. This node does not alter or wrap the query, and thus supports all kinds of statements that return data.

This node supports other SQL statements beside SELECT, such as DESCRIBE TABLE.

Parameterized DB Query Reader

This node allows you to execute a SQL query with different parameters. It loops over the input KNIME table and takes the values from the input table to parameterize the input SQL query. Since the node has a KNIME data table input, it provides a type mapping tab that allows you to change the mapping rules. For more information on the Type Mapping tab, please refer to the Type Mapping section.

DB Looping

This node runs SQL queries in the connected database, restricted by the possible values given by the input table. Each SQL query is restricted so that only rows matching the possible values from the input table are retrieved, and the number of values per query can be defined. This node is typically used to execute IN queries, e.g.

SELECT * FROM table WHERE Col1 IN ($)

During execution, the column placeholder $ will be replaced by a comma-separated list of values from the input table.
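For instance, if the input table provides the string values a, b, and c (purely illustrative), the executed query would become:

SELECT * FROM table WHERE Col1 IN ('a', 'b', 'c')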

Since the node has a KNIME data table input, it provides a type mapping tab that allows you to change the mapping rules. For more information on the Type Mapping tab, please refer to the Type Mapping section.

An example of the usage of this node is available on KNIME Hub.