Introduction
Subqueries, also known as sub-selects or nested-selects, are queries nested inside another SQL query statement. They are a powerful tool in SQL, particularly when handling intricate data and logic. Subqueries offer a structured methodology, reducing code repetition, allowing dynamic filtering, facilitating complex operations, and enabling filtering before joining. They also present a modular structure that can potentially enhance query performance. This article delves into the world of subqueries, their benefits, and how to master them to write more effective SQL queries.
Benefits of Using Subqueries
Structured approach: Subqueries break down complex logic into smaller, bite-sized pieces, making the code easier to understand and follow. This is especially helpful for collaborative work or revisiting code later.
Reduced repetition: You can avoid duplicating complex calculations or conditions by extracting them into subqueries. This keeps the code concise and reduces the risk of errors.
Dynamic conditions: Subqueries allow you to filter data based on dynamic criteria derived from other tables or calculations. This enables sophisticated data analysis and manipulation.
Complex operations: You can perform calculations or comparisons on data returned by the subquery, achieving results that might not be possible with single-query methods.
Filtering before joining: When properly optimized, subqueries can help avoid unnecessary joins on large datasets. Filtering data through subqueries before joining can lead to faster query execution.
Modular structure: Subqueries provide modularity, which can help the database engine independently optimize different parts of the query. This can sometimes lead to improved performance compared to complex single-query approaches.
Types of Sub Queries
There are three main types of subqueries:
Expression Subqueries
Table Subqueries
Volatile Subqueries
Expression Subquery
These are used in a query whenever expressions are valid. They return a single value as opposed to a column or table. There are four types of expression subqueries:
Scalar Subquery
Array Subquery
In Subquery
Exists Subquery
Scalar Subquery
These are used in common table expressions. It is essentially a nested query within a larger SQL query that produces a single result, often used as a value within an expression or condition.
For illustration, I will use the flight dataset that I used in this post.
select sub.origin_city,sub.destination_city
from
(select * from flight f where destination_city ='Atlanta' ) sub
where sub.airline_code ='DL'
How this works is the sql script runs the 'inner query'- the one in the parenthesis and then uses the result of the inner query as a table to run the outer query.
Array Subquery
An array subquery is a SQL query that is nested within another query and returns its results as an array.
SELECT (ARRAY(SELECT origin_city FROM flight))[1:3] AS Origin_city;
In this subqery, it is to return an array of three cities from the flight table, from the first to the third city in that order.
IN Subquery
An IN subquery is a special type of subquery used in SQL to filter data based on whether it matches values returned by another query. Think of it as comparing your data against a "whitelist" created by the subquery. It usually returns a boolean value.
Here's how it works:
The outer query selects specific data.
Within the WHERE clause, it uses the IN keyword.
Following IN, a subquery is enclosed in parentheses. This subquery retrieves a set of values.
The outer query's data is filtered based on whether each row's relevant column value is present in the subquery's results.
Let us see an example
select 'Atlanta' in (select origin_city from flight f)
Exists Subquery
This returns true if the subquery returns one or more rows and returns false if the subquery returns zero rows.
select exists(select origin_city from flight where origin_city='Detriot');
Table Subquery
A table subquery, also known as a nested query or inner query, is a query that is embedded within another SQL (Structured Query Language) query, typically in the FROM or WHERE clause. The result of the table subquery is used by the outer query to perform further operations or filtering.
select results.origin_city
from(
select
day:: Date as day, origin_city,count(*)as num_flights from flight
group by day, origin_city)results
Volatile Subquery
This type does not always produce the same result over the same input. That it is non-deterministic.
Conclusion
Mastering subqueries can greatly enhance your SQL querying capabilities. Subqueries provide a structured approach to writing complex queries, reducing repetition, enabling dynamic conditions, facilitating complex operations, and allowing for filtering before joining. They also offer a modular structure that can potentially improve query performance. Understanding the different types of subqueries - expression, table, and volatile - and their uses can unlock new levels of data analysis and manipulation.