Joins: Connecting the Dots for Data Clarity
So far, we’ve explored how to query data using Q-SQL and how to translate those queries into their corresponding functional forms. Now, let’s take it a step further and dive into joining datasets with KDB/Q. Joins are one of the most powerful tools in KDB/Q, and they play a significant role in its unmatched efficiency for big data analysis. Compared to other languages, KDB/Q joins are not only faster and more efficient but also uniquely versatile, featuring specialized joins like the asof join and window join.
While other databases are beginning to incorporate these advanced join types into their capabilities, KDB/Q has been perfecting them for decades, enabling you to answer complex questions about your data that would otherwise be challenging to tackle. Without further ado, let’s explore this fascinating topic!
Why we use Joins
Before we start exploring joins in greater detail, let’s take a step back and consider why we use them in the first place. Why not just store all our data in one massive table and eliminate the need for joins entirely? While possible, such an approach would violate many best practices in software development and lead to significant inefficiencies. The schema for such a table would be absolutely horrendous, with potentially hundreds of columns, making it difficult to manage and query.
This brings us to one of the core principles of database design: data normalization. Normalization is the process of organizing a relational database into structured tables following specific rules, called normal forms, to reduce redundancy and improve data integrity. While a deep dive into normalization is beyond the scope of this post, the gist is simple - data is divided into smaller, related tables using keys and reassembled when needed using joins.
Normalization eliminates duplicate data, saving storage space and ensuring consistency. Joins, in turn, restore the flat, rectangular structure needed for effective data analysis. For example, trading data might be stored in one table, while reference data like instrument details resides in another. Joins allow you to combine these datasets seamlessly, enabling meaningful analysis while adhering to good database design principles.
One of the key reasons KDB/Q excels in big data analytics is its exceptional efficiency in performing joins. Additionally, KDB/Q has featured as-of joins for nearly three decades - a groundbreaking innovation introduced by Arthur Whitney that enables users to answer questions that would otherwise remain unresolved.
In KDB/Q, joins can be broadly categorized into two main types: Equi (or Exact) Joins and As-of Joins. In the following sections we will look at both categories in more detail and walking through numerous examples to demonstrate their power and versatility.
Basic Joins
Technically, there’s a third category of joins in KDB/Q, though it’s more accurately described as a concatenation of tables or columns. Since a table in KDB/Q is essentially a list of conforming dictionaries (if this concept is unfamiliar, check out my dedicated blog post on tables here) , and because tables are first-class citizens in KDB/Q, you can use the simple concatenate operator , to merge tables or leverage the each-both iterator ' in combination with , to join all columns of two tables using ,'.
Let's look at some basic examples to illustrate how this works. We are using the simplified trade and quote tables for these examples:
Merge Tables Effortlessly: Joining Records with Concatenate ,
q)show trade:([] sym:`AAPL`MSFT`GOOG`IBM; price:123.0 45.9 234.4 20.0; qty: 100 45 200 300)
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
q)show quote:([] sym:`AAPL`MSFT`GOOG`IBM; bid:123.1 46.0 245.9 21.9; ask:124.8 46.1 246.2 23.9)
sym bid ask
----------------
AAPL 123.1 124.8
MSFT 46 46.1
GOOG 245.9 246.2
IBM 21.9 23.9
The first thing you’ll notice when attempting to use the concatenate operator to combine the trade table with the quote table is that it results in a mismatch error. This behavior is expected. Recall that a table in KDB/Q is essentially a list of conforming column dictionaries, and in this case, the trade and quote tables do not conform. The trade table has the columns sym, price and qty, while the quote table contains sym, bid, and ask.
q)trade,quote
'mismatch
[0] trade,quote
^
If we attempt to concatenate or join the trade table with itself, it works as expected. Since we are effectively joining the same table to itself, the schemas align perfectly, resulting in a table where each record appears twice. It’s important to note that the records are not sorted in any specific order—they appear sequentially based on how the two individual tables are joined.
q)trade,trade
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
The 0N! operator is a powerful tool when it comes to debugging or inspecting the underlying structure of a KBD/Q data type
The compatibility of the schemas becomes more apparent when you use the 0N! operator to inspect the underlying data structure of the trade table.
q)0N!trade,trade
+`sym`price`qty!(`AAPL`MSFT`GOOG`IBM`AAPL`MSFT`GOOG`IBM;123 45.9 234.4 20 123 45.9 234.4 20;100 45 200 300 100 45 200 300)
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
q)(trade;trade)
+`sym`price`qty!(`AAPL`MSFT`GOOG`IBM;123 45.9 234.4 20;100 45 200 300)
+`sym`price`qty!(`AAPL`MSFT`GOOG`IBM;123 45.9 234.4 20;100 45 200 300)
As long as the table schemas align, you can concatenate more than two tables without any issues.
q)trade,trade,trade
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
This type of join is particularly useful when combining two tables with identical schemas. However, it’s important to note that this join does not match on any keys, which means it can result in duplicate records.
Combining Columns: Mastering Concatenate and Each-Both Joins
Another useful technique is joining all columns of two tables. To achieve this, we use the concatenate operator , in combination with the each-both iterator ', resulting in ,'. Since a table in KDB/Q is essentially a list of conforming dictionaries, the concatenation is applied element-wise to each record from both tables. By leveraging the each-both iterator, the first record of the trade table (a dictionary) is joined with the first record of the quote table (also a dictionary), and so on, resulting in the desired merged output.
If you need a quick refresher on KDB/Q dictionaries and tables, check out my blog post here
Let's take a closer look at how this works in practice. First, let's confirm that a table is indeed a list of conform dictionaries. We can do so by simply creating a list of conform dictionaries:
q)(`sym`price`qty!(`AAPL;123;100);`sym`price`qty!(`MSFT;45.9;45);`sym`price`qty!(`GOOG;234.4;200);`sym`price`qty!(`IBM;20;300))
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
alternatively, a table can be seen as a flipped column dictionary.
q)flip (`sym`price`qty!(`AAPL;123;100);`sym`price`qty!(`MSFT;45.9;45);`sym`price`qty!(`GOOG;234.4;200);`sym`price`qty!(`IBM;20;300))
sym | AAPL MSFT GOOG IBM
price| 123 45.9 234.4 20
qty | 100 45 200 300
Building on the concept above, we can use the concatenate operator , alongside the each-both iterator ' to join each record from both tables using ,'. This operation adds any keys and values from the second dictionary that are missing in the first. However, if a key exists in both dictionaries, the value from the right dictionary takes precedence. The result of this operation is a list of conforming dictionaries, which, as we know, forms a valid table.
q)trade,'quote
sym price qty bid ask
--------------------------
AAPL 123 100 123.1 124.8
MSFT 45.9 45 46 46.1
GOOG 234.4 200 245.9 246.2
IBM 20 300 21.9 23.9
There are two key observations to make:
- If a key exists in both dictionaries, the value from the right dictionary takes precedence.
- For this type of join to work, both tables must have the same number of records, which is logical given the element-wise nature of the operation.
// Reverse quote to illustrate that the sym column values of the right table take precedence
q)(update symTrade:sym from trade),'reverse quote
sym price qty symTrade bid ask
-----------------------------------
IBM 123 100 AAPL 21.9 23.9
GOOG 45.9 45 MSFT 245.9 246.2
MSFT 234.4 200 GOOG 46 46.1
AAPL 20 300 IBM 123.1 124.8
// Use non conform lengths
q)trade,'2#quote
'length
[0] trade,'2#quote
^
With these two fundamental implicit joins covered, let’s move on to the real strength of KDB/Q native joins.
Equi (or Exact) Joins
The first category of joins, Equi (or Exact) Joins, is used when you have one or more columns that uniquely identify the records in a table. These identifiers serve as the basis for matching and combining data from two different datasets. The shape and content of the resulting table depend on the specific type of join used. In the following sections, we’ll explore all the exact joins: left join (lj), inner join (ij), union join (uj), plus join (pj), and equi join (ej). The common pattern across all these joins is the fact that they rely on one or more key columns to match and align records.
In our examples, we’ll reuse the trade and quote tables from the previous section. To enhance understanding, we’ll illustrate the outcome of each join with graphical representations, making it easier to grasp the mechanics behind each join.
q)trade
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
q)quote
sym bid ask
----------------
AAPL 123.1 124.8
MSFT 46 46.1
GOOG 245.9 246.2
IBM 21.9 23.9

Left Join lj
If you’re familiar with set theory, the concept of a left join (lj) will feel intuitive. However, for completeness, let’s break it down. Given an unkeyed or keyed table as the left operand and a keyed table as the right operand, a left join matches rows from the right table where the key column(s) align with those in the left table. The result includes all rows from the left table, with matching data from the right table added to it.

Syntax
table1 lj table2
where
- table1 is either a table or keyed table
- table2 is a keyed table
If the right table lacks data for any record in the left table, the corresponding new columns will contain nulls. However, if the source and target tables have duplicate non-key columns, the operation follows upsert semantics. This means the values in the right operand (target) columns will take precedence over those in the left operand (source). A key detail to remember is that the resulting table will always have the same number of records as the left table, neither expanding nor shrinking.
A practical use case for a left join might involve appending bid and ask prices to trade records or enriching trade data with reference information.
Let's look at the example where we join bid and ask prices to trade records:
Remember: The right operand needs to be a keyed table
// we key the quote table on the first column (sym)
q)trade lj 1!quote
sym price qty bid ask
--------------------------
AAPL 123 100 123.1 124.8
MSFT 45.9 45 46 46.1
GOOG 234.4 200 245.9 246.2
IBM 20 300 21.9 23.9
As mentioned above, if there are common non-key columns, the left join has upset semantics and the values of the right table prevail. Let's illustrate this concept:
// Adding a qty column to quote so we have a common column
q)trade lj 1!select sym,bid,ask,qty:10000 from quote where sym in `AAPL`GOOG
sym price qty bid ask
----------------------------
AAPL 123 10000 123.1 124.8
MSFT 45.9 45
GOOG 234.4 10000 245.9 246.2
IBM 20 300

// Adding bid and ask column to trades for illustration
q)(update bid:10.0,ask:11.0 from trade) lj 1!select sym,bid,ask,qty:10000 from quote where sym in `AAPL`GOOG
sym price qty bid ask
----------------------------
AAPL 123 10000 123.1 124.8
MSFT 45.9 45 10 11
GOOG 234.4 10000 245.9 246.2
IBM 20 300 10 11

In the first example above, we added the qty column to the quotes table to create a shared non-key column. As observed, the resulting join updated the values in the qty column wherever the keys matched (to ensure not all keys matched, we selected only Apple and Google records from the quotes table). For records without a match, the corresponding columns were populated with null values (evident in the bid and ask columns for Microsoft and IBM).
In the second example, we added bid and ask columns to the trade table to demonstrate that only records with matching key columns are updated, while all non-matching records remain unchanged.
When performing a left join on two keyed tables, the result is as expected, with the distinction that the resulting table remains keyed rather than unkeyed.
q)(1!trade)lj 1!quote
sym | price qty bid ask
----| ---------------------
AAPL| 123 100 123.1 124.8
MSFT| 45.9 45 46 46.1
GOOG| 234.4 200 245.9 246.2
IBM | 20 300 21.9 23.9
Although KDB/Q does not have a dedicated right join, you can achieve the same result as a SQL right join by simply swapping the arguments of a left join.
// Swapping the arguments to obtain a right join
q)quote lj 1!trade
sym bid ask price qty
--------------------------
AAPL 123.1 124.8 123 100
MSFT 46 46.1 45.9 45
GOOG 245.9 246.2 234.4 200
IBM 21.9 23.9 20 300
Inner Join ij
The Inner Join ij works similarly to the left join, with one key difference: the resulting table includes only the records present in both the left (source) and right (target) table. As with a left join, the right operand must be a keyed table (target), while the left operand can be a table or keyed table (source) with column(s) that serve as foreign key(s) to the target or match the target's key column(s) in name and type. Matching is performed on the common column name(s) between the source and the key column(s) of the target, returning only those records with matching keys in both tables.

Syntax
table1 ij table2
where
- table1 is either a table or keyed table
- table2 is a keyed table
q)trade ij 1!update sym:`X from quote where sym in `AAPL`IBM
sym price qty bid ask
--------------------------
MSFT 45.9 45 46 46.1
GOOG 234.4 200 245.9 246.2

Union Join uj
As the name suggests, a union join merges data from two tables. In terms of set theory, the union join is analogous to the union of two sets. It vertically and horizontally combines two tables or keyed tables into a single table. The resulting table is expanded to include new columns and rows from the right operand that do not exist in the left operand, with the same name and type. Records from the left operand appear first in the result, with null values in any newly added columns. Records from the right operand follow, with their field values placed in the corresponding columns. Let’s explore this concept with some examples — it will make things clearer.

Syntax
Unlike left and inner joins, the union join operates exclusively on two unkeyed tables or two keyed tables - it does not support combining a keyed table with an unkeyed table.
table1 uj table2
keyedTable1 uj keyedTable2
Union Join for unkeyed table
When neither table has a key, the union join appends the right table to the left table, filling non-common columns with null values.
q)trade uj quote
sym price qty bid ask
--------------------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 123.1 124.8
MSFT 46 46.1
GOOG 245.9 246.2
IBM 21.9 23.9
q)quote uj trade
sym bid ask price qty
--------------------------
AAPL 123.1 124.8
MSFT 46 46.1
GOOG 245.9 246.2
IBM 21.9 23.9
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
// Add a common column qty to quote
q)trade uj update qty:10 20 30 40 from quote
sym price qty bid ask
--------------------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
AAPL 10 123.1 124.8
MSFT 20 46 46.1
GOOG 30 245.9 246.2
IBM 40 21.9 23.9

As demonstrated in the examples above, the union join of two unkeyed tables combines the tables both vertically and horizontally, filling any missing cells with null values. This behavior becomes particularly evident when we examine the first record of the resulting table.
q)first trade uj update qty:10 from quote
sym | `AAPL
price| 123f
qty | 100
bid | 0n
ask | 0n
Union Join for keyed tables
When performing a union join on two keyed tables, the operation follows upsert semantics for both rows and columns. This behavior is best understood through an example.
q)(1!trade) uj 1!quote
sym | price qty bid ask
----| ---------------------
AAPL| 123 100 123.1 124.8
MSFT| 45.9 45 46 46.1
GOOG| 234.4 200 245.9 246.2
IBM | 20 300 21.9 23.9
Similar to other join operations, if there are common non-key columns, the values from the right operand (table) take precedence.
q)trade
sym price qty
--------------
AAPL 123 100
MSFT 45.9 45
GOOG 234.4 200
IBM 20 300
q)quote
sym bid ask
----------------
AAPL 123.1 124.8
MSFT 46 46.1
GOOG 245.9 246.2
IBM 21.9 23.9
q)update qty:10 20 from quote where sym in `MSFT`GOOG
sym bid ask qty
--------------------
AAPL 123.1 124.8
MSFT 46 46.1 10
GOOG 245.9 246.2 20
IBM 21.9 23.9
q)(1!trade) uj 1!update qty:10 20 from quote where sym in `MSFT`GOOG
sym | price qty bid ask
----| ---------------------
AAPL| 123 123.1 124.8
MSFT| 45.9 10 46 46.1
GOOG| 234.4 20 245.9 246.2
IBM | 20 21.9 23.9

The Danger of Union Joins: Handle with Care

Like Uncle Ben wisely told Peter Parker in Spider-Man: "With great power comes great responsibility.", the same principle applies to union joins. While incredibly powerful, they require careful handling. The upsert semantics of a union join do not enforce data type conformity, meaning it will upsert values regardless of their type. Let’s examine this in detail.
q)show t1:([] a:"ABC"; b:1 2 3; c:("tick";"tack";"toe"))
a b c
----------
A 1 "tick"
B 2 "tack"
C 3 "toe"
q)show t2:([] a:`A`B`C; b:1 2 3f; c:3?0Ng)
a b c
----------------------------------------
A 1 8c6b8b64-6815-6084-0a3e-178401251b68
B 2 5ae7962d-49f2-404d-5aec-f7c8abbae288
C 3 5a580fb6-656b-5e69-d445-417ebfe71994
q)t1 uj t2
a b c
-------------------------------------------
"A" 1 "tick"
"B" 2 "tack"
"C" 3 "toe"
`A 1f 8c6b8b64-6815-6084-0a3e-178401251b68
`B 2f 5ae7962d-49f2-404d-5aec-f7c8abbae288
`C 3f 5a580fb6-656b-5e69-d445-417ebfe71994
As shown in the example above, a union join on two unkeyed tables completely disregards data types. If not approached carefully, this can result in a production run-time nightmare. To avoid such pitfalls, be the (super)hero of your KDB/Q application! Use your power wisely by opting for upsert instead of a union join when your primary goal is to upsert data. Upsert operates by using the amend functionality, either modifying the existing data structure in place or creating a new copy. In doing so, it ensures data type conformity and will throw an error if there is a type mismatch.
q)`t1 upsert t2
'type
[0] `t1 upsert t2
^
q)upsert
.[;();,;]
q)1 2 3 upsert 3 4 5f
'type
[0] 1 2 3 upsert 3 4 5f
^
q).[1 2 3;();,;4 5 6f]
'type
[0] .[1 2 3;();,;4 5 6f]
^
q)1 2 3 union 4 5 6f
1
2
3
4f
5f
6f
q)union
?,
q)1 2 3,4 5 6f
1
2
3
4f
5f
6f
The examples above clearly illustrate that upsert enforces data type conformity and raises a type error if any violations occur. A union join on keyed tables ensures data type conformity for the key column and will only succeed if the key column's data types are consistent. Unfortunately, data type conformity for non-key columns is not enforced.
q)show t3:([] a:"ABC"; b:3 4 5f; c:3?0Ng)
a b c
----------------------------------------
A 3 409031f3-b19c-6770-ee84-6e9369c98697
B 4 52cb20d9-f12c-9963-2829-3c64d8d8cb14
C 5 cddeceef-9ee9-3847-9172-3e3d7ab39b26
q)(1!t1) uj 1!t3
a| b c
-| --------------------------------------
A| 3 409031f3-b19c-6770-ee84-6e9369c98697
B| 4 52cb20d9-f12c-9963-2829-3c64d8d8cb14
C| 5 cddeceef-9ee9-3847-9172-3e3d7ab39b26
q)meta t1
c| t f a
-| -----
a| c
b| j
c| C
q)meta t2
c| t f a
-| -----
a| s
b| f
c| g
q)(1!t1) uj (1!t2)
'type
[0] (1!t1) uj (1!t2)
^