The difference between COUNT(*) and COUNT(column): Answered!

Count vs count column
Reading Time: 6 minutes

When I was first starting out with SQL Server, I had a hard time understanding the difference between COUNT(*) and COUNT(column), and how they are used with the GROUP BY clause.



Many times, the COUNT(*) function would return the same results as the COUNT(<column>) function, so I just told myself “Who cares what the difference is, they’re returning the same results anyway“.

Here is “who cares”: Your boss, and your clients if your queries end up returning inaccurate information.

It is definitely important to understand the difference between COUNT(*) and COUNT(<column>) in the real world.

In this tutorial, we’ll talk about the difference between the two aggregates and when one should be used over the other. We’ll cover these topics:

  1. What’s the difference between COUNT(*) and COUNT(<column>)?
  2. Examples of COUNT(*) vs COUNT(<column>)
  3. Tips, tricks, and links

So, let’s do it.

(By the way, not sure why there is a lady vampire as the featured image? According to ancient folklore, vampires suffer from an obsessive compulsive condition known as Arithmomania, which is an obsession with counting. Link)



1. What’s the difference between COUNT(*) and COUNT(<column>)

COUNT(*) is used when you simply want to count the number of rows per group, no matter what values exist in the columns. COUNT(<column>) will count the number of rows per group that have a non-NULL value in the specified column.

Most books on the topic of querying databases using T-SQL will say the difference is simply this: COUNT(<column>) excludes NULL while COUNT(*) does not.

When I was first learning T-SQL, this wasn’t thorough enough for me. Now that I do understand the difference, I see what they mean.

We’re going to give you some examples next, which is honestly the best way to understand the difference.

2. Examples of COUNT(*) vs COUNT(<column>)

Let’s look at an example of each

The COUNT(*) aggregate function

As I said in the last point, the COUNT(*) aggregate function will simply count the number of rows per group.

It really doesn’t care what values are in your other columns.

(When I say “other” columns, I mean columns you are NOT grouping by)

Say we have a table called EmployeeDetails with the following data:

query before count

Let’s say you want to write a simple query to see the number of Employees you have in each Country.

Pop Quiz: What would we be “grouping by” in a query like that?

Answer: The different Country values.

Need a rundown on the GROUP BY clause? Click the link!

When thinking about the COUNT(*) function, you really have to put blinders on to ignore the other column values.

What if we look at the same result set from the query above, but we hide the other column values besides the Country column:

result set with blinders

This is how the COUNT(*) function, grouping by the Country column, would see the data.

It just counts the number of rows per group.

We already established the “groups” are the different Country values. So it looks like we have the following breakdown:

USA – 3 rows

UK – 2 rows

India – 3 rows

NULL – 1 row

So if we run our query with the GROUP BY clause, grouping by Country and with the COUNT(*) function, we should see the same breakdown:

count star

Nice.

The COUNT(<column>) aggregate function

When you use the COUNT(<column>) aggregate function, you are basically saying: For each group, count the number of rows that don’t have NULL in the specified column.

Let’s think about a good example.

What if you want to count the number of people in each Country that have a middle name?

Since we’re still counting something, we still need to use one of our COUNT functions and a GROUP BY clause, but now we are, in fact, concerned with what values are in another column (the MiddleName column).

The query looks like this:

count column query

Going back to our blinders, SQL Server will look at the data in the EmployeeDetails table like this now:

count column result set with blinders

We’re still going to group the results by Country. But now, for each country, we’re going to count the number of rows that have a NON-NULL value in the MiddleName column.

So, for the group USA, how many rows have a MiddleName that is not NULL?

Answer: 2

For group India, how many rows have a MiddleName that is not NULL?

Answer: 1

You get the idea.

Here is the output of our new query:

number of middle names per country

I hope you’re not thrown off by the Country ‘NULL‘.

In the EmployeeDetails table, we saw there is an employee with a Country of NULL. We still include that as one of our groups. And for that employee, there is a MiddleName value, so we count it.

AKA: The NULL group has 1 person with a middle name.

As you can see, it’s important to know the difference between the two COUNT functions. They can definitely give you different results if you don’t know how they work!



3. Tips, Tricks and Links

Here is a list of a few tricks and tricks you should know when working with COUNT(*) and COUNT(<column>)

Tip # 1: When using either COUNT aggregate function, you actually don’t need to use it with the GROUP BY clause

If the only expression in your SELECT list is the aggregate function (and no other table columns), you don’t need a GROUP BY clause. The “group” in that case would be the one entire result set.

For example, what if you simply want to know the number of people with Middle Names in your EmployeeDetails table? You could definitely run a query like this:

no group by clause with count

This factoid is true for other aggregate functions, too, and not just the COUNT aggregates.

Tip # 2: When using COUNT(<column>), you can include the words ALL or DISTINCT next to the column name.

The syntax would be either COUNT(ALL <column>)or  COUNT(DISTINCT <column>)

COUNT(ALL <column>) is actually the default behavior. When you use COUNT(<column>),  it is actually doing a COUNT(ALL <column>)behind the scenes. It will count the number of non-NULL rows for each group, even if there are multiple rows with the same column value.

But what if you wanted to not count duplicate values? In that case, you should choose to use COUNT(DISTINCT <column>)

Let’s look at an example. What if you want to know how many different countries your Employees are in? You start with this:

count all

Remember, COUNT(<column>) is actually doing a COUNT(ALL <column>) behind the scenes. So this query is just counting the number of rows in our table with a non-NULL value in the Country column.

Period.

But the number 8 isn’t really the answer to our question, is it?

Sure, there are 8 non-NULL Country values, but many of them are repeated. We’re really looking for the number of distinctly different Countries in our EmployeeDetails table.

This is where you would use the keyword DISTINCT:

count distinct

That’s better.

Links

There is a great book called T-SQL Fundamentals written by Itzik Ben-Gan that goes over several core concepts you should know about SQL Server, including a discussion about aggregate functions. You won’t regret owning this book, trust me. Definitely get it today!

Next Steps:

Leave a comment if you found this tutorial helpful!

Are you making these 7 mistakes with NULL? Click the link to find out?

If you’re not already familiar with the HAVING clause, you should read up on that, too. We basically use it to filter the results of an aggregation. Click the link below to learn more:

The HAVING Clause: How it works

Thank you very much for reading!



Make sure you subscribe to my newsletter to receive special offers and notifications anytime a new tutorial is released!

If you have any questions, please leave a comment. Or better yet, send me an email!

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *