SQL Server Offset Window Functions: FIRST_VALUE, LAST_VALUE, LAG, LEAD

offset window functions featured image
Reading Time: 12 minutes

Offset window functions are very useful tools you should know when querying databases in Microsoft SQL Server.



There are several different window functions available to us. You should know all of them if you want to be the world’s best database developer.

In this tutorial, we’re going to discuss the four Offset window functions: FIRST_VALUE, LAST_VALUE, LAG and LEAD.

If you missed the introduction to window functions, you should definitely check that out. We’ll be using terms like “window partition” and “window frame”, which are all discussed in the full introduction to window functions in SQL Server. You can find the intro here:

SQL Server Window Functions: An introduction for beginners

Also, make sure you download your FREE Ebook:

FREE Ebook on SQL Server Window Functions!

This FREE EBook discusses absolutely everything you need to know about window functions in SQL Server, including everything discussed in this tutorial. This EBook will definitely be a great resource for you to reference throughout your career as a data professional. Get it today!

In this tutorial, we will discuss the following topics about offset window functions:

  1. What is an Offset window function?
  2. Setting up some data
  3. Understanding FIRST_VALUE and LAST_VALUE
  4. Understanding LAG and LEAD
  5. Tips, tricks, and links

Let’s get into it:

1. What is an Offset window function?

Offset functions allow us to return an element from a row that is at the beginning or end of a window frame, or is a defined offset from the current row.

Here are some rules you should know about offset window functions:

  • All Offset functions require a window order clause
  • The FIRST_VALUE and LAST_VALUE functions require a window frame clause. You can choose to leave out the frame and SQL will use RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default.
  • LAG and LEAD do not support using a window frame
  • For all Offset functions, a window partition clause is optional.

Not sure what I’m talking about? It’s because you didn’t read the introduction to window functions!

There are four offset window functions available to us: FIRST_VALUE, LAST_VALUE, LAG, and LEAD.



2. Setting up some data

To understand how the different offset window functions work, we need to set up some data. If you have been following along with any of the other window function tutorials, you will already have this data set up in your environment.

Run these SELECT statements to see if you already have the data:

SELECT * FROM Customers
SELECT * FROM Products
SELECT * FROM Orders

The data should look like this:

first_value setting up some data

If you don’t have the data, let’s go ahead and add it. Here’s the Customers table:

--Create Customers table
CREATE TABLE Customers
(
CustID INT PRIMARY KEY IDENTITY(50,5),
FirstName VARCHAR(20),
LastName VARCHAR(20)
)

--Populate Customers it with data
INSERT INTO Customers(FirstName, LastName)
VALUES
('Joshua','Porter'),
('Andrew','Bluefield'),
('Jack','Donovan'),
('Cindy','Thatcher'),
('Gordon','Acres'),
('Gretchen','Hamilton')

The Products table:

--Create Products table
CREATE TABLE Products
(
ProductID INT PRIMARY KEY IDENTITY(20,2),
ProductName VARCHAR(20),
Price DECIMAL(5,2)
)

--Populate Products table with data
INSERT INTO Products (ProductName, Price)
VALUES
('Large Bench',198.00),
('Small Bench',169.40),
('Coffee Table',220.00),
('Side Tables',265.20),
('Coat Rack',45.00)

Finally, the Orders table:

--Create Orders table
CREATE TABLE Orders
(
OrderID INT PRIMARY KEY IDENTITY(100,10),
CustID INT,
ProdID INT,
Qty TINYINT,
Orderdate DATETIME
)

--Populate Orders table with data
INSERT INTO Orders(CustID, ProdID, Qty, Orderdate)
VALUES
(55, 22, 1, '6/1/2021'),
(60, 28, 2, '6/6/2021'),
(75, 26, 1, '6/13/2021'),
(50, 20, 1, '7/1/2021'),
(55, 28, 1, '7/6/2021'),
(65, 24, 1, '7/14/2021'),
(55, 26, 1, '7/18/2021'),
(50, 26, 1, '7/24/2021'),
(70, 24, 1, '8/6/2021'),
(70, 26, 1, '8/6/2021'),
(70, 22, 3, '9/1/2021')

Notice the Orders table contains unofficial foreign key links to both the Customers and Products tables. I say they are “unofficial” because I didn’t actually create a foreign key constraint between the tables. It’s not necessary for this tutorial.

Let’s run a quick query with a couple of JOINS to see all the data laid out nicely for us:

SELECT C.CustID, C.FirstName, C.LastName, P.ProductName, P.Price as 'Product Price', 
O.Qty, O.OrderDate
FROM Orders as O
INNER JOIN Customers AS C
ON O.CustID = C.CustID
INNER JOIN Products as P
ON O.ProdID = P.ProductID

This query will give us the following result:

first_value window functions first query

Nice. In this result set, we can see some really great details about each order. We can see the following details:

  • Who made the order
  • What they bought
  • How much that product costs
  • How many of that product they bought
  • The date they made the order

Ok, now that we have some data set up, we can show you examples of using offset window functions.



3. Understanding FIRST_VALUE and LAST_VALUE

The FIRST_VALUE and LAST_VALUE window functions will return an element from the first or last row in a window frame, respectively.

An element is simply a column value.

The FIRST_VALUE and LAST_VALUE functions take one argument, which is the name of the column for which you want to see details (a.k.a. the element).

Understanding FIRST_VALUE

Let’s think about an example of where the FIRST_VALUE function might be useful. What if we want to know the very first product each customer purchased from us? In other words, we want to see what product originally peaked their interest and got them in the door. We might consider marketing those products better so that we get more people in the door.

As mentioned earlier, the FIRST_VALUE function takes one argument, which is the name of the column for which you want to see details (a.k.a. the element).

Here’s an example:

window function first value example

Let’s talk about what we have outlined in red in the query above:

  1. The FIRST_VALUE function takes one argument: The name of the element you want to see. In our case, we want to see the name of the product the customer bought.
  2. In the OVER clause, we have a PARTITION clause because we want to see a breakdown by customer (we want to partition the data by customer).
  3. Also in the OVER clause, we have our window order clause, which in our case is “ORDER BY O.OrderDate“. If you think about it, the task of getting a first value doesn’t make sense if your data isn’t ordered in some way. This is why a window order clause is required when writing a query using FIRST_VALUE (or any offset window function).
  4. Finally, we also have a window frame in the OVER clause. The FIRST_VALUE function requires that you use a window frame.

Understanding the window frame when using FIRST_VALUE

If you ask me, the only frame that makes sense is ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW(which can be written in a shorthand version:  ROWS UNBOUNDED PRECEDING ) . When working with a window function, SQL will process rows one at a time. When using FIRST_VALUE, you want to make sure when it’s processing a row, it considers everything before the current row. We do this by specifying UNBOUNDED PRECEDINGas the “preceding” delimiter. And we don’t care about anything after the current row, which is why we specify  CURRENT ROW as the “following” delimiter and not UNBOUNDED FOLLOWING. Why force SQL Server consider those rows when we know it won’t gather anything from them anyway?

Also, you can either use the ROWS window unit or the RANGE window unit depending on your needs. You’ll see me use them interchangeably in this tutorial, but you should definitely understand the difference between them (read the intro tutorial!)

Examining the results

So for our customer Joshua Porter, for example, it looks like the first product he purchased was the Large Bench. Of course, the product name is repeated for every row in his partition, but that’s ok.

For our customer Gordon Acres, it looks like the first product he purchased was the Coffee Table.

You get the idea.

I want to point out that we could have left out the PARTITION clause and the entire result set would have been treated as one large partition. In that case, our “FirstItemPurchased” column would show the first product ever purchased. Period. Here’s what it looks like:

first_value no partition

(I’ve also demonstrated how you can leave off the window frame and SQL defaults to using the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW).

As you can see, the first value of “Small Bench” is repeated for all rows in the partition, which is ok!

Understanding LAST_VALUE

Folks, LAST_VALUE is the same idea, except is returns details about the last row in the frame.

You need to think about the window frame you want to use with LAST_VALUE. You basically want to make sure the ending delimiter is UNBOUNDED FOLLOWING. Remember, when working with a window function, SQL will process rows one at a time. You want to make sure when it’s on a row, it considers everything after the current row too. We do this by specifying UNBOUNDED FOLLOWING.

Here’s an example:

first_value window functions last value

Notice the order of the OrderDates. Now, they are in descending order. In other words, in each partition, it’s showing the last (a.k.a. most recent) item purchased at the top.

Again, notice the frame. I could have used ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING and gotten the same results, but when it comes to LAST_VALUE, we don’t really care about anything before the current row, right? So why make SQL Server consider those rows when we know it won’t gather anything from them anyway?

Gnomesayin?



4. Understanding LAG and LEAD

LAG and LEAD will operate on each row in a partition, starting from the top and working it’s way down. They will return an element that is a specified offset from before or after the current row, respectively.

An element is simply a column value.

The LAG and LEAD functions take three arguments:

  1. The element you want to return
  2. Offset value, which is just an integer. You can leave out this argument and it will default to 1.
  3. A default value if there is no element at the specified offset. You can leave out this argument and it will default to NULL.

Understanding LAG

Let’s think of an example of some data we might want to see. What if we want to see a comparison between how much money a customer spent during a visit to our store versus how much money they spent during their previous visit to our store.

Ideally, we want to see that customers are spending more money every time they come to our store. We want to see that a customer spent more this time than they did last time.

LAG and LEAD are great for doing trend analysis like that.

Ok, let’s take a look at a good query to show us that information:

window functions lag 2

There are a few details to discuss about this query:

  • The element in our LAG function is the Price column. This means the function will return the Price value from the offset row.
  • I omitted the offset and default arguments in our LAG function call. They become 1 and NULL respectively. This means for every row in a partition, SQL will look for the Price value located 1 row behind the current row we’re on. If there simply is no row 1 row behind the current, NULL will be returned.
  • We’re partitioning by CustID. This means we’re seeing a breakdown for each individual customer.
  • Remember, neither LAG nor LEAD support a window frame.

LAG and LEAD operate on one row at a time, from top to bottom, in each partition

Let’s look specifically at rows for our customer Joshua Porter:

window functions lag joshua porter

Let’s walk down these rows, one at a time, like SQL will do. It starts at the first row:

window functions lag joshua porter first row

So remember, our LAG function is looking for the Price value 1 row behind the current row. Notice there simply is no row behind the current, so that’s why NULL is put as the value in our “Cost of previous Product purchased” column.

Then SQL moves on to the next row:

window functions lag joshua porter second row

For this row, we once again want to grab the Price value 1 row behind us. This time, there is a value there, and it’s $198.00. We plug that value into our “Cost of previous product purchased” column.

Then we’re done with this partition. SQL will move on to the next one!

For giggles, let’s talk about the first row in the next partition (which contains rows for Andrew Bluefield)

lag and lead first row for andrew bluefield

We understand why we see NULL in our LAG column, right?

This is the first row in this partition. There simply is no row behind this row in this partition. Therefore, NULL is used!

Understanding LEAD

Folks, LEAD is very similar to LAG, except of course the difference is LEAD will return an element that is a specified offset from after (a.k.a in front of) the current row.

Let’s look at an example. This time, I’ll specify all three arguments:

window functions lead

I still want an offset of just 1, but instead of NULL as the default value, I chose to use 0.00.

(IMPORTANT: You should know that the default value you specify needs to be the same data type as your element. In my case, the element is Price, which is a decimal data type).

Again folks, it’s same same idea, but now we’re looking at the next row in the partition.

Let’s go down the list of orders for our customer Gordon Acres, starting with his first row:

window functions lead gordon first

If we look 1 row ahead of the current row, the Price is $265.20. We put that value in our LEAD column, which is called “Cost of next Product purchased“.

Move on to the next row:

window functions lead gordon acres second

One row ahead of this row has a Price of $169.40. Plug that value into our column.

Finally, the last row:

window functions lead gordon acres third row

In this partition, there simply is no row one row ahead of the row we’re on, so the default value of 0.00 get’s plugged into our column.



5. Tips, tricks, and links

Here is a list of a few tips and tricks you should know when working with the different offset window functions in SQL Server:

Tip # 1: FIRST_VALUE and LAST_VALUE were introduced in SQL Server 2012

If you try to use FIRST_VALUE or LAST_VALUE in a SQL Server version that is older than 2012, it won’t work. These functions were introduced with SQL Server 2012, so they will only work with that version or higher.

There is, however, a roundabout way of getting a first value without using the FIRST_VALUE function. It involves creating a table expression (like a CTE) that uses the ROW_NUMBER window function, then filtering the results of that table expression to only display rows where the value in the ROW_NUMBER column is 1.

Here is a great Stack Overflow article that talks about how to do it: Alternate to FIRST_VALUE in SQL Server 2008.

Tip # 2: When using FIRST_VALUE or LAST_VALUE, you should always specify a window frame

Earlier, I demonstrated the use of FIRST_VALUE without a window frame. I discussed how when the frame is omitted from your OVER clause, SQL will use the frame RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW by default. The thing about the RANGE window unit is that it’s not as efficient as the ROWS window unit. Microsoft makes that very clear. So do yourself a favor and just always specify your own frame!

Tip # 3: When using LAG or LEAD, you can leave out a default value if you want to

This will work fine:

lag and lead no default value

I specified the element name and the offset value, but NOT the default value. That’s fine.

Understand you can’t specify just an element and a default value, however.

Tip # 4: Offset functions can only be used in the SELECT list or ORDER BY clause.

You can only use FIRST_VALUE, LAST_VALUE, LAG, or LEAD in the SELECT list or an ORDER BY clause. This is actually true for all window functions.

Links:

Everything discussed in this tutorial can be found in the following FREE Ebook:

Download your FREE SQL Server Window Functions EBook!

This FREE EBook discusses absolutely everything you need to know about window functions in SQL Server, including everything discussed in this tutorial. A proper understanding of window functions is essential for anyone looking to enter the field of data science. This Ebook will definitely be a great resource for you to reference throughout your career. Get it today!

Next Steps:

Leave a comment if you found this tutorial helpful!

Now that you know the four Offset window functions, you should learn all about the four ranking window functions available to us in SQL Server. You should know all the different window functions if you want to be a great data scientist! Check out the tutorial here:

SQL Server Ranking Window Functions: ROW_NUMBER, RANK, DENSE_RANK, NTILE

Also, if you missed the introduction to window functions, it discusses the different aggregate window functions you can use in SQL Server. That tutorial is a MUST-READ, because it discusses all the parts involved in a window function that you need to know in order to write a window function query correctly. Don’t skip that tutorial, you can find it here:

SQL Server Window Functions: An introduction for beginners



Thank you very much for reading!

Make sure you subscribe to my newsletter to receive special offers and notifications anytime a new tutorial is released!

If you have any questions, or if you are struggling with a different topic related to SQL Server, I’d be happy to discuss it. Leave a comment or visit my contact page and send me an email!

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *