Second Normal Form: A Beginner’s Guide

Second normal form featured image
Reading Time: 9 minutes

Normalization is a very important process of designing tables in such a way that they maintain data integrity and reduce data redundancy. We were introduced to the concept of Normalization in our previous tutorial about First Normal Form. If you missed that tutorial, definitely check it out!



In this tutorial, we’re going to learn about Second Normal Form, which introduces a new set of rules you should follow when designing tables in your database.

Also, after reading this tutorial, don’t forget to learn about the final normal form: Third Normal Form.

In this tutorial, we’ll discuss the following key points you should know about Second Normal Form:

  1. A table must first be in First Normal Form
  2. What is a “dependency”?
  3. Table must not contain any partial dependencies
  4. A table with a single-column primary key is automatically in Second Normal Form
  5. Links

Everything discussed in this tutorial can also be found in the following FREE Ebook:

FREE Ebook on SQL Server Table Normalization!

This FREE Ebook provides an excellent summary of all the rules you need to know when normalizing your tables to First, Second, and Third Normal Form. It will definitely be a great resource to keep and reference throughout your career as a data professional. Make sure you download the guide today!

Let’s jump right in:

1. A table must first be in First Normal Form

The first rule to remember to satisfy Second Normal Form is that your table must first satisfy First Normal Form.

Seems logical, right? You can’t just skip step 1.

First Normal Form basically states the following rules:

  1. Each row is a table must be unique
  2. Columns must be “atomic”
  3. There should be no repeating column groups
  4. Values in a column must all be the same data type

Everything you need to know about First Normal Form (and also why we need to normalize our tables to begin with) is discussed in the full beginner-friendly tutorial found here:

First Normal Form: An Introduction to Table Normalization

Definitely check it out!



2. What is a “dependency”?

To understand the idea of Second Normal Form, we need to understand what a dependency is.

The idea is basically this: If you change a primary key value, the non key attributes also need to change.

Let’s look at an example. Take the following Employees table, where we say the Primary Key is the SSN column:

second normal form SSN

Since the SSN column is the primary key, it needs to contain unique values for each row. This shouldn’t be a problem because every person alive has a SSN that is unique to them.

Think about what we would need to do if we changed someone’s SSN. For example, what if we changed 222-54-0009 to 222-54-0008? Would we be talking about a completely different person? Yes, we would.

Therefore, we would also need to change the FirstName and LastName values.

That is what a dependency is. If you change a primary key value, all other non key columns also need to change.

If we change a SSN, even just a little, we need to completely change the rest of the data. The values for FirstName and LastName depend on what the SSN is.



3. Table must not contain any partial dependencies

The rules for Second Normal Form apply to a table with a composite primary key, meaning a primary key that is composed of multiple columns. Let’s set up some interesting tables to demonstrate Second Normal Form.

Let’s say we are in the business of Hollywood, and we need some tables to track celebrity and movie information.

We’ll create an Actors table and add a few rows to it:

CREATE TABLE Actors
(
ActorID INT PRIMARY KEY IDENTITY,
FirstName VARCHAR(20),
LastName VARCHAR(20)
)

INSERT INTO Actors (FirstName, LastName)
VALUES
('Brad','Pitt'),
('Leonardo','Dicaprio'),
('Jamie','Fox'),
('Jennifer','Lawrence'),
('Matt','Damon'),
('Ben','Affleck'),
('Christian','Bale')

Nice, so we have an Actors table where we created a simple ActorID column we’ll use as our primary key. It uses the handy IDENTITY property to populate the table with a new number for us on inserts.

Ok, then we’ll create a Movies table. It has a MovieID column we’ll use as the primary key. It also uses the IDENTITY property:

CREATE TABLE Movies
(
MovieID INT PRIMARY KEY IDENTITY(10,5),
MovieTitle VARCHAR(50)
)

INSERT INTO Movies (MovieTitle)
VALUES
('Once Upon a Time in Hollywood'),
('Django Unchained'),
('The Revenant'),
('The Hunger Games'),
('American Hustle'),
('Good Will Hunting')

Ok, let’s run a couple of SELECT statements to see the data in our tables:

second normal form movies and actors

Nice. Now let’s set up one more table with a composite primary key.

A linker table

Let’s say we want a table to sort-of link the Actors and Movies tables together. Let’s create a table that keeps track of the role each actor played in a movie. We’ll outline the actors role (like ‘Leading Actor’ or ‘Supporting Actor’) and the name of the character they played. Here’s the table, called CharacterDetails:

CREATE TABLE CharacterDetails
(
ActorID INT,
MovieID INT,
Role VARCHAR(20),
CharacterName VARCHAR(40),
CONSTRAINT pk_CharacterDetails PRIMARY KEY(ActorID, MovieID)
)

INSERT INTO CharacterDetails (ActorID, MovieID, Role, CharacterName)
VALUES (2, 10, 'Leading Actor', 'Rick Dalton')

Notice the primary key we created for this table. It’s on the ActorID and MovieID columns.

This means that each row is uniquely identified by the combination of ActorID and MovieID. There will never be two or more rows with exactly the same ActorID/MovieID combination.

(This makes sense, right? An actor has only one role in a movie. An actor doesn’t play multiple parts)

(And we’re not gonna bring up Eddie Murphy….)

Here’s the data in the table so far:

second normal form characterdetails table start

So if we know what’s in our Actors and Movies tables, we basically understand that we’re looking at Leonardo DiCaprio’s role in Once Upon a Time in Hollywood.

Leonardo DiCaprio was the Leading Actor in that movie, and he played a character named Rick Dalton. Those are the details we see in our CharacterDetails table.

Great.

This is an example of a table that is already in Second Normal Form

Most tutorials first show you a table that violates Second Normal Form, then show you how to fix it. I think it would be more helpful to show you a table that satisfies Second Normal Form first.

Our CharacterDetails table is already in Second Normal Form.

Remember, our primary key is the combination of ActorID and MovieID. Earlier, we said a dependency means that if you change a primary key value, all other non key columns also need to change.

So let’s think about what would happen if we changed our primary key value. We’ll only change part of it to start.

Let’s leave the MovieID alone, but let’s change the ActorID to 1:

second normal form update actorID

Again, remember the data in our Actors table. ActorID # 1 is Brad Pitt.

Knowing that, do the Role and CharacterName values need to change? Yes, they freaking do.

Brad Pitt wasn’t the lead actor in that movie. He was the supporting actor. And his character name was Cliff Booth.

So, since we changed the primary key (even just part of it), it means we need to also change all non key columns:

second normal form brad pitt role

So again folks, since all non key columns depend on the entire primary key, we can say this table is in Second Normal Form.

Driving the point home

I’m going to put the data back how it was originally to do another demonstration:

second normal form back to original

So now we’re back to Leonardo DiCaprio’s role in “Once Upon a Time in Hollywood”.

What if we just changed the MovieID? Again, this is part of the primary key. What if we changed it to 15, which is the MovieID for Django Unchained.

Well, we would need to change the Role and CharacterName. We’re still talking about Leonardo DiCaprio, but it’s for a different movie now.

For this Actor/Movie combination, the Role would need to be ‘Supporting Actor‘, and the CharacterName would need to be ‘Calvin Candie‘.

So again, since we changed the primary key (even just part of it), we need to also update all the non key columns.

And of course, what if we changed both the ActorID and the MovieID?

Still, the Role and CharacterNames need to change.

For example, if we changed ActorID to 4 and MovieID to 30, we would be talking about Jennifer Lawrence’s role in American Hustle.

She was a ‘Supporting Actor‘ in that movie, and her character was ‘Roselyn Rosenfeld‘.

Get it? Again folks, this table is already in Second Normal Form. Now that you know what it means for a table to be in Second Normal Form, it will be easy to spot a table that violates Second Normal Form.

Violating Second Normal Form

I’m going to set up the data to match that last example, where we have details for Jennifer Lawrence’s role in American Hustle:

second normal form american hustle set up

Let’s add another column to our CharacterDetails table called MovieDuration. This column will represent the length of a movie in minutes. We’ll go ahead and update our row in the table to populate this column:

ALTER TABLE CharacterDetails ADD MovieDuration INT

UPDATE CharacterDetails SET MovieDuration = 138
WHERE ActorID = 4 and MovieID = 30

Here’s the table:

second normal form movie duration

Ok, so let’s think about what would happen if we changed the ActorID again. What if we changed it to 7, which is the ID for Christian Bale? We know the role would change to ‘Leading Actor‘, and the character would change to ‘Irving Rosenfeld‘…..

But what about the new MovieDuration column? That has nothing to do with the Actor. That would remain unchanged.

Ok, well what if we left the actor as Jennifer Lawrence, but we changed the MovieID to 25? That’s the ID for The Hunger Games.

In that case, yes, the MovieDuration would change (along with the Role and CharacterName).

So the value of the MovieDuration column really only depends on the value of the MovieID, right?

Folks, this is what we call a partial dependency. The value in the MovieDuration column only depends on the value of MovieID, which is only part of the primary key.

For a table to be in Second Normal Form, there can be no partial dependencies. In other words, all non key columns must functionally depend on the entire primary key.

So, how do we fix it? Maybe you guessed we should move the MovieDuration column to the Movie table. After all, a MovieDuration is an attribute of a movie, right?

ALTER TABLE CharacterDetails DROP COLUMN MovieDuration

ALTER TABLE Movies ADD MovieDuration INT

When you are normalizing a table to satisfy Second Normal Form, it will probably be necessary to move columns around like we have done, or maybe create a new table altogether. The goal is to make sure there are no partial dependencies!

4. A table with a single-column primary key is automatically in Second Normal Form

The concept of Second Normal Form only applies to tables that contain a composite primary key. That is, a table that has multiple columns that make up the primary key.

If your tables all have single-column primary keys, Second Normal Form will be satisfied. But realize there could be other normalization violations happening. This is why it’s important to know the other normal forms: First Normal Form and Third Normal Form (which we’ll discuss next week! Subscribe to my email list to be notified when the tutorial goes LIVE).



5. Links

Here is a link to a great YouTube video that discusses Second Normal Form. It helped me tremendously when I was first learning about normalization: Database Normalisation: Second Normal Form

Next Steps:

If you found this tutorial helpful, make sure you download your FREE Ebook:

FREE Ebook on SQL Server Table Normalization!

This FREE Ebook provides an excellent summary of all the rules you need to know when normalizing your tables to First, Second, and Third Normal Form. It will definitely be a great resource to keep and reference throughout your career as a data professional. Make sure you download the guide today!

Now that you are a master at Second Normal Form, it’s time to move on to the third and final normal form: Third Normal Form. It introduces the last set of rules you should follow to make sure your tables preserve data integrity and reduce data redundancy. Check it out here:

Third Normal Form: A Guide for Beginners



Thank you very much for reading!

Make sure you subscribe to my newsletter to receive special offers and notifications anytime a new tutorial is released!

If you have any questions, or if you are struggling with a different topic related to SQL Server, I’d be happy to discuss it. Leave a comment or visit my contact page and send me an email!

Related Post

Leave a Reply

Your email address will not be published. Required fields are marked *