Offset window functions are very useful tools you should know when querying databases in Microsoft SQL Server.
There are several different window functions available to us. You should know all of them if you want to be the world’s best database developer.
In this tutorial, we’re going to discuss the four Offset window functions: FIRST_VALUE, LAST_VALUE, LAG and LEAD.
If you missed the introduction to window functions, you should definitely check that out. We’ll be using terms like “window partition” and “window frame”, which are all discussed in the full introduction to window functions in SQL Server. You can find the intro here:
SQL Server Window Functions: An introduction for beginners
Also, make sure you download your FREE Ebook:
FREE Ebook on SQL Server Window Functions!
This FREE EBook discusses absolutely everything you need to know about window functions in SQL Server, including everything discussed in this tutorial. This EBook will definitely be a great resource for you to reference throughout your career as a data professional. Get it today!
In this tutorial, we will discuss the following topics about offset window functions:
- What is an Offset window function?
- Setting up some data
- Understanding FIRST_VALUE and LAST_VALUE
- Understanding LAG and LEAD
- Tips, tricks, and links
Let’s get into it:
1. What is an Offset window function?
Offset functions allow us to return an element from a row that is at the beginning or end of a window frame, or is a defined offset from the current row.
Here are some rules you should know about offset window functions:
- All Offset functions require a window order clause
- The FIRST_VALUE and LAST_VALUE functions require a window frame clause. You can choose to leave out the frame and SQL will use
RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
by default. - LAG and LEAD do not support using a window frame
- For all Offset functions, a window partition clause is optional.
Not sure what I’m talking about? It’s because you didn’t read the introduction to window functions!
There are four offset window functions available to us: FIRST_VALUE, LAST_VALUE, LAG, and LEAD.
2. Setting up some data
To understand how the different offset window functions work, we need to set up some data. If you have been following along with any of the other window function tutorials, you will already have this data set up in your environment.
Run these SELECT statements to see if you already have the data:
SELECT * FROM Customers SELECT * FROM Products SELECT * FROM Orders
The data should look like this:
If you don’t have the data, let’s go ahead and add it. Here’s the Customers table:
--Create Customers table CREATE TABLE Customers ( CustID INT PRIMARY KEY IDENTITY(50,5), FirstName VARCHAR(20), LastName VARCHAR(20) ) --Populate Customers it with data INSERT INTO Customers(FirstName, LastName) VALUES ('Joshua','Porter'), ('Andrew','Bluefield'), ('Jack','Donovan'), ('Cindy','Thatcher'), ('Gordon','Acres'), ('Gretchen','Hamilton')
The Products table:
--Create Products table CREATE TABLE Products ( ProductID INT PRIMARY KEY IDENTITY(20,2), ProductName VARCHAR(20), Price DECIMAL(5,2) ) --Populate Products table with data INSERT INTO Products (ProductName, Price) VALUES ('Large Bench',198.00), ('Small Bench',169.40), ('Coffee Table',220.00), ('Side Tables',265.20), ('Coat Rack',45.00)
Finally, the Orders table:
--Create Orders table CREATE TABLE Orders ( OrderID INT PRIMARY KEY IDENTITY(100,10), CustID INT, ProdID INT, Qty TINYINT, Orderdate DATETIME ) --Populate Orders table with data INSERT INTO Orders(CustID, ProdID, Qty, Orderdate) VALUES (55, 22, 1, '6/1/2021'), (60, 28, 2, '6/6/2021'), (75, 26, 1, '6/13/2021'), (50, 20, 1, '7/1/2021'), (55, 28, 1, '7/6/2021'), (65, 24, 1, '7/14/2021'), (55, 26, 1, '7/18/2021'), (50, 26, 1, '7/24/2021'), (70, 24, 1, '8/6/2021'), (70, 26, 1, '8/6/2021'), (70, 22, 3, '9/1/2021')
Notice the Orders table contains unofficial foreign key links to both the Customers and Products tables. I say they are “unofficial” because I didn’t actually create a foreign key constraint between the tables. It’s not necessary for this tutorial.
Let’s run a quick query with a couple of JOINS to see all the data laid out nicely for us:
SELECT C.CustID, C.FirstName, C.LastName, P.ProductName, P.Price as 'Product Price', O.Qty, O.OrderDate FROM Orders as O INNER JOIN Customers AS C ON O.CustID = C.CustID INNER JOIN Products as P ON O.ProdID = P.ProductID
This query will give us the following result:
Nice. In this result set, we can see some really great details about each order. We can see the following details:
- Who made the order
- What they bought
- How much that product costs
- How many of that product they bought
- The date they made the order
Ok, now that we have some data set up, we can show you examples of using offset window functions.
3. Understanding FIRST_VALUE and LAST_VALUE
The FIRST_VALUE and LAST_VALUE window functions will return an element from the first or last row in a window frame, respectively.
An element is simply a column value.
The FIRST_VALUE and LAST_VALUE functions take one argument, which is the name of the column for which you want to see details (a.k.a. the element).
Understanding FIRST_VALUE
Let’s think about an example of where the FIRST_VALUE function might be useful. What if we want to know the very first product each customer purchased from us? In other words, we want to see what product originally peaked their interest and got them in the door. We might consider marketing those products better so that we get more people in the door.
As mentioned earlier, the FIRST_VALUE function takes one argument, which is the name of the column for which you want to see details (a.k.a. the element).
Here’s an example:
Let’s talk about what we have outlined in red in the query above:
- The FIRST_VALUE function takes one argument: The name of the element you want to see. In our case, we want to see the name of the product the customer bought.
- In the OVER clause, we have a PARTITION clause because we want to see a breakdown by customer (we want to partition the data by customer).
- Also in the OVER clause, we have our window order clause, which in our case is “ORDER BY O.OrderDate“. If you think about it, the task of getting a first value doesn’t make sense if your data isn’t ordered in some way. This is why a window order clause is required when writing a query using FIRST_VALUE (or any offset window function).
- Finally, we also have a window frame in the OVER clause. The FIRST_VALUE function requires that you use a window frame.
Understanding the window frame when using FIRST_VALUE
If you ask me, the only frame that makes sense is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
(which can be written in a shorthand version:Â ROWS UNBOUNDED PRECEDING
) . When working with a window function, SQL will process rows one at a time. When using FIRST_VALUE, you want to make sure when it’s processing a row, it considers everything before the current row. We do this by specifying UNBOUNDED PRECEDING
as the “preceding” delimiter. And we don’t care about anything after the current row, which is why we specify CURRENT ROW
as the “following” delimiter and not UNBOUNDED FOLLOWING
. Why force SQL Server consider those rows when we know it won’t gather anything from them anyway?
Also, you can either use the ROWS window unit or the RANGE window unit depending on your needs. You’ll see me use them interchangeably in this tutorial, but you should definitely understand the difference between them (read the intro tutorial!)
Examining the results
So for our customer Joshua Porter, for example, it looks like the first product he purchased was the Large Bench. Of course, the product name is repeated for every row in his partition, but that’s ok.
For our customer Gordon Acres, it looks like the first product he purchased was the Coffee Table.
You get the idea.
I want to point out that we could have left out the PARTITION clause and the entire result set would have been treated as one large partition. In that case, our “FirstItemPurchased” column would show the first product ever purchased. Period. Here’s what it looks like:
(I’ve also demonstrated how you can leave off the window frame and SQL defaults to using the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
).
As you can see, the first value of “Small Bench” is repeated for all rows in the partition, which is ok!
Understanding LAST_VALUE
Folks, LAST_VALUE is the same idea, except is returns details about the last row in the frame.
You need to think about the window frame you want to use with LAST_VALUE. You basically want to make sure the ending delimiter is UNBOUNDED FOLLOWING
. Remember, when working with a window function, SQL will process rows one at a time. You want to make sure when it’s on a row, it considers everything after the current row too. We do this by specifying UNBOUNDED FOLLOWING
.
Here’s an example:
Notice the order of the OrderDates. Now, they are in descending order. In other words, in each partition, it’s showing the last (a.k.a. most recent) item purchased at the top.
Again, notice the frame. I could have used ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
and gotten the same results, but when it comes to LAST_VALUE, we don’t really care about anything before the current row, right? So why make SQL Server consider those rows when we know it won’t gather anything from them anyway?
Gnomesayin?
4. Understanding LAG and LEAD
LAG and LEAD will operate on each row in a partition, starting from the top and working it’s way down. They will return an element that is a specified offset from before or after the current row, respectively.
An element is simply a column value.
The LAG and LEAD functions take three arguments:
- The element you want to return
- Offset value, which is just an integer. You can leave out this argument and it will default to 1.
- A default value if there is no element at the specified offset. You can leave out this argument and it will default to NULL.
Understanding LAG
Let’s think of an example of some data we might want to see. What if we want to see a comparison between how much money a customer spent during a visit to our store versus how much money they spent during their previous visit to our store.
Ideally, we want to see that customers are spending more money every time they come to our store. We want to see that a customer spent more this time than they did last time.
LAG and LEAD are great for doing trend analysis like that.
Ok, let’s take a look at a good query to show us that information:
There are a few details to discuss about this query:
- The element in our LAG function is the Price column. This means the function will return the Price value from the offset row.
- I omitted the offset and default arguments in our LAG function call. They become 1 and NULL respectively. This means for every row in a partition, SQL will look for the Price value located 1 row behind the current row we’re on. If there simply is no row 1 row behind the current, NULL will be returned.
- We’re partitioning by CustID. This means we’re seeing a breakdown for each individual customer.
- Remember, neither LAG nor LEAD support a window frame.
LAG and LEAD operate on one row at a time, from top to bottom, in each partition
Let’s look specifically at rows for our customer Joshua Porter:
Let’s walk down these rows, one at a time, like SQL will do. It starts at the first row:
So remember, our LAG function is looking for the Price value 1 row behind the current row. Notice there simply is no row behind the current, so that’s why NULL is put as the value in our “Cost of previous Product purchased” column.
Then SQL moves on to the next row:
For this row, we once again want to grab the Price value 1 row behind us. This time, there is a value there, and it’s $198.00. We plug that value into our “Cost of previous product purchased” column.
Then we’re done with this partition. SQL will move on to the next one!
For giggles, let’s talk about the first row in the next partition (which contains rows for Andrew Bluefield)
We understand why we see NULL in our LAG column, right?
This is the first row in this partition. There simply is no row behind this row in this partition. Therefore, NULL is used!
Understanding LEAD
Folks, LEAD is very similar to LAG, except of course the difference is LEAD will return an element that is a specified offset from after (a.k.a in front of) the current row.
Let’s look at an example. This time, I’ll specify all three arguments:
I still want an offset of just 1, but instead of NULL as the default value, I chose to use 0.00.
(IMPORTANT: You should know that the default value you specify needs to be the same data type as your element. In my case, the element is Price, which is a decimal data type).
Again folks, it’s same same idea, but now we’re looking at the next row in the partition.
Let’s go down the list of orders for our customer Gordon Acres, starting with his first row:
If we look 1 row ahead of the current row, the Price is $265.20. We put that value in our LEAD column, which is called “Cost of next Product purchased“.
Move on to the next row:
One row ahead of this row has a Price of $169.40. Plug that value into our column.
Finally, the last row:
In this partition, there simply is no row one row ahead of the row we’re on, so the default value of 0.00 get’s plugged into our column.
5. Tips, tricks, and links
Here is a list of a few tips and tricks you should know when working with the different offset window functions in SQL Server:
Tip # 1: FIRST_VALUE and LAST_VALUE were introduced in SQL Server 2012
If you try to use FIRST_VALUE or LAST_VALUE in a SQL Server version that is older than 2012, it won’t work. These functions were introduced with SQL Server 2012, so they will only work with that version or higher.
There is, however, a roundabout way of getting a first value without using the FIRST_VALUE function. It involves creating a table expression (like a CTE) that uses the ROW_NUMBER window function, then filtering the results of that table expression to only display rows where the value in the ROW_NUMBER column is 1.
Here is a great Stack Overflow article that talks about how to do it: Alternate to FIRST_VALUE in SQL Server 2008.
Tip # 2: When using FIRST_VALUE or LAST_VALUE, you should always specify a window frame
Earlier, I demonstrated the use of FIRST_VALUE without a window frame. I discussed how when the frame is omitted from your OVER clause, SQL will use the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
by default. The thing about the RANGE
window unit is that it’s not as efficient as the ROWS
window unit. Microsoft makes that very clear. So do yourself a favor and just always specify your own frame!
Tip # 3: When using LAG or LEAD, you can leave out a default value if you want to
This will work fine:
I specified the element name and the offset value, but NOT the default value. That’s fine.
Understand you can’t specify just an element and a default value, however.
Tip # 4: Offset functions can only be used in the SELECT list or ORDER BY clause.
You can only use FIRST_VALUE, LAST_VALUE, LAG, or LEAD in the SELECT list or an ORDER BY clause. This is actually true for all window functions.
Links:
Everything discussed in this tutorial can be found in the following FREE Ebook:
Download your FREE SQL Server Window Functions EBook!
This FREE EBook discusses absolutely everything you need to know about window functions in SQL Server, including everything discussed in this tutorial. A proper understanding of window functions is essential for anyone looking to enter the field of data science. This Ebook will definitely be a great resource for you to reference throughout your career. Get it today!
Next Steps:
Leave a comment if you found this tutorial helpful!
Now that you know the four Offset window functions, you should learn all about the four ranking window functions available to us in SQL Server. You should know all the different window functions if you want to be a great data scientist! Check out the tutorial here:
SQL Server Ranking Window Functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE
Also, if you missed the introduction to window functions, it discusses the different aggregate window functions you can use in SQL Server. That tutorial is a MUST-READ, because it discusses all the parts involved in a window function that you need to know in order to write a window function query correctly. Don’t skip that tutorial, you can find it here:
SQL Server Window Functions: An introduction for beginners
Thank you very much for reading!
Make sure you subscribe to my newsletter to receive special offers and notifications anytime a new tutorial is released!
If you have any questions, or if you are struggling with a different topic related to SQL Server, I’d be happy to discuss it. Leave a comment or visit my contact page and send me an email!