I have written A LOT of SQL in my career in data, and have picked up a bunch of tricks to help me work faster and more efficiently. Here are three of them for beginners:
1. Use “1=1” as your first ‘WHERE’ clause.
I learned this one about five years into my career. After reviewing a co-workers code, I asked him why he always starts his WHERE clause with 1=1. After he explained why I immediately understood it and adopted it from that day on.
In essence, using WHERE 1=1 allows for more convenient development and makes the query look more visually appealing.
The reason it’s convenient is because any time you are developing a new query, conducting an ad-hoc analysis, or inspecting new data, you are constantly tinkering with your WHERE clause by adding and removing conditions. Since every condition other than the first one is preceded by an AND, it helps to make your first condition 1=1 so that your first real condition has the same structure as the rest of them, making it easier to back out of clauses.
For example, consider the following query that doesn’t have 1=1 in the WHERE clause:
SELECT *
FROM [table] as t
WHERE t.column_a = 'condition_a'
AND t.column_b = 'condition_b'
AND t.column_c <> 'condition_c'
Because the first clause (t.column_a = ‘condition_a’) is the only clause not preceded with an AND, it becomes inconvenient and messy looking when you need to back out of it:
SELECT *
FROM [table] as t
WHERE /*t.column_a = 'condition_a'
AND */ t.column_b = 'condition_b'
AND t.column_c <> 'condition_c'
Compared with the same query using 1=1 in the WHERE clause, you can see how it might be more convenient as well as better looking:
SELECT *
FROM [table] as t
WHERE 1=1
--AND t.column_a = 'condition_a'
AND t.column_b = 'condition_b'
AND t.column_c <> 'condition_c'
Additionally, adding this 1=1 clause does not effect query performance.
2. Use more temporary tables.
Hopefully most SQL developers know this, but some junior developers might not. I didn’t when I first started. If you are tinkering with a complex or computationally expensive query, you will save a ton of time by creating a temp table of the data you know you need, opposed to constantly running an expensive query, tinkering with it, running it again, etc.
The temp table temporary caches the results of your query, meaning you can now just query against the temp table rather than the source tables. If a query takes 10 minutes to run for example, instead of running the query multiple times as you tinker with it, just create a temp table that does all of the work and then query against that. It will save you a lot of time.
Of course, if you need to put this code into production, you will not be able to use a temp table, as they only last the duration of your connection instance.
CREATE OR REPLACE TEMP TABLE [table_name] as
[Your Query Here]
3. Predicate Pushdown
I learned this one the hard way when I was a junior business intelligence analyst, and I wrote a query containing several nested subqueries. Except all of the filtering was being done in the outer-most query, resulting in the database working about 100 times harder than it needed to. When doing code review with my data architect, he gave me the same look my dad used to when instead of trying to learn the piano during lessons he paid for, I would just smash the keys violently.
Queries are computationally evaluated from the inside-out. So you want to “push down” parts of the query to the inner-most query where the data is being drawn from. This will make the query much more efficient and save you time and resources.
For example, in the following query, filtering is unnecessarily being applied in the outer query instead of the inner query, and is not using predicate pushdown:
SELECT *
FROM
(
SELECT *
FROM [table_a] t1
) t2
JOIN [table_b] t3 on [join conditions]
WHERE 1=1
AND t2.column_a = 'condition'
In this example, the WHERE clause is outside of the subquery, resulting in unnecessary excessive time and computation. We should “push” that clause down to the “predicate” query to fix this:
SELECT *
FROM
(
SELECT *
FROM [table_a] t1
WHERE 1=1
AND t1.column_a = 'condition ) t2
JOIN [table_b] t3 on [join conditions]
Thank you for reading, hope this helps!