Normalization of Relational Databases

This page will try to summarize the normalization theory for BCNF and 3NF normal forms. We will start at the beginning so that all of our terms and concepts will be well defined. In fact, we will begin with the definition of an attribute and work our way, carefully, up to the two desired normal forms and their decomposition algorithms.

1.) An attribute is a string of letters from a specific alphabet.

The choice of alphabet does not matter here. For most of our discussions, the attributes will be single capital letters, like "A", "B" and "C". But in real applications, the attributes are usually names like "AccountNo", "StudentID", or "SSN".

2.) A type is a set of values.

Some examples of types are,

  • the set of all integers,
  • the set of all positive integers,
  • the set of all strings of length 3,
  • the set of integers {1, 2, 4, 6, 8, 10},
  • the set of characters {'r', 'g', 'b'}.
Any set of values can be a type. We will not need to make much use of types in the following discussion.

3.) A schema is a set of pairs where each pair contains an attribute and a type. No two pairs can contain the same attribute.

A simple example of a schema might be S={ A::Integer, B::Char, C::{1,2,3} }. This schema has three attributes, A, B and C, and each attribute has associated with it a type. Notice that the condition that no two attribute::type pairs can contain the same attribute means that you can't have the attribute A with type Integer and in the same schema also have the attribute A with the type Char.

In what follows, the types are not crucial, so we will often omit them (or just assume that all the attributes have type "Positive Integer"). We will usually abbreviate the notation for a schema as S={A, B, C}, or even S(A, B, C). But remember, each attribute really does have a type.

4.) A tuple over a schema S is a set of pairs where each pair contains an attribute from S and a value from that attribute's type. A tuple over the schema S should have as many pairs as S has attributes, and each attribute in S should also be in the tuple.

For the schema example from paragraph (3), some example tuples are { (A,5), (B,c), (C,1) }, or { (C,3), (A,2008), (B,m) }. Notice that since a tuple is a set, the order of the pairs is not important (and that is why the pairs need attributes in them, so that you can tell who the values belong to). If we are willing to take the order of the attributes in a schema's definition as a kind of "preferred order", then we can abbreviate the notation for a tuple by dropping the attributes from the tuple (this is pretty much what SQL does in its INSERT INTO statement). So the last two tuples could be written as (5, c, 1) and (2008, m, 3).

5.) A functional dependency (abbreviated as FD) over a schema S is a symbol of the form X->Y where X and Y are subsets of the attributes from S.

We will refrain in this paragraph from saying what functional dependencies mean. For now, they are just symbols. For the schema S(A, B, C), some examples of functional dependencies are {A}->{B,C} or {A,C}->{A}. When our attributes are just single letters, it is convenient to adopt an abbreviated notation for functional dependencies and not explicitly use the set notation. So the last two FD's could be written as A->BC and AC->A.

6.) Given a schema S, a functional dependency X->Y over S, and two tuples t1 and t2 over S, we say that the tuples are constrained by the FD if the following statement is true. If for each attribute in X the tuples t1 and t2 have the same value, then the tuples have the same value for each of the attributes in Y.

Here is another way to write this definition. Let the expression t1[X] = t2[X] mean that for every attribute in X, the value of that attribute in tuple t1 is equal to the value of the same attribute in t2. Then tuples t1 and t2 are constrained by the FD X->Y if

t1[X] = t2[X] ⇒ t1[Y] = t2[Y]

For example, given the schema S(A,B,C) and the FD A->C, the tuples (1,2,3) and (1,1,3) are constrained by the FD (but they would not be constrained by the FD A->B).

Notice that if you are given just one tuple, it does not make any sense to say that the tuple is constrained by a FD. It takes two (or more) tuples to make a FD constraint meaningful.
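The constraint test from this paragraph can be checked mechanically. Here is a small sketch in Python (not part of the original notes); representing a tuple as a dictionary from attribute names to values is our own choice:

```python
def constrained(t1, t2, X, Y):
    """Two tuples (dicts) are constrained by the FD X->Y when agreement
    on every attribute of X forces agreement on every attribute of Y."""
    if all(t1[a] == t2[a] for a in X):
        return all(t1[a] == t2[a] for a in Y)
    return True  # the tuples differ somewhere on X, so the FD says nothing

# The example from paragraph (6): schema S(A,B,C)
t1 = {"A": 1, "B": 2, "C": 3}
t2 = {"A": 1, "B": 1, "C": 3}
print(constrained(t1, t2, {"A"}, {"C"}))  # True
print(constrained(t1, t2, {"A"}, {"B"}))  # False
```

Note the vacuous case: if the tuples already disagree on X, the implication holds no matter what Y is.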

7.) A relation R(S, F) is a schema S and a set F of functional dependencies over that schema.

An example of a simple relation would be R( S(A,B,C), {AB->C, C->A} ). This relation has the schema S(A,B,C) with three attributes and the relation has two functional dependencies. Another simple relation would be R( S(A,B,C), {A->BC, C->A} ). This relation has the same schema, S(A,B,C), and two FD's, but one of the FD's is different than in our first example. This second relation is in fact a very different relation than the first one. As we will soon see, these two relations have very different properties.

8.) An instance of a relation R(S, F) (or, a table over R) is a set of tuples over the schema S such that any two of the tuples are constrained by each of the FD's in F.

For example, given the first relation R( S(A,B,C), {AB->C, C->A} ) from paragraph (7), the table on the left is an example of an instance of R.
A B C        A B C
1 2 4        1 2 4
2 3 5        2 2 5
1 4 4        3 4 6
2 1 5        4 4 7
Notice however that the table on the left is not an instance of the second relation R( S(A,B,C), {A->BC, C->A} ) from paragraph (7) (which shows that these two relations are not the same). The table on the right however is an instance of this second relation.
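Checking that a table really is an instance amounts to testing every pair of tuples against every FD. A Python sketch (our own helper names; the data is the left-hand table above):

```python
from itertools import combinations

def constrained(t1, t2, X, Y):
    # t1[X] = t2[X] implies t1[Y] = t2[Y]
    return (not all(t1[a] == t2[a] for a in X)) or all(t1[a] == t2[a] for a in Y)

def is_instance(table, fds):
    """A set of tuples is an instance when every pair of tuples is
    constrained by every FD in fds (a list of (X, Y) set pairs)."""
    return all(constrained(t1, t2, X, Y)
               for t1, t2 in combinations(table, 2)
               for X, Y in fds)

# The left-hand table above, over S(A,B,C)
T = [{"A": 1, "B": 2, "C": 4}, {"A": 2, "B": 3, "C": 5},
     {"A": 1, "B": 4, "C": 4}, {"A": 2, "B": 1, "C": 5}]
print(is_instance(T, [({"A", "B"}, {"C"}), ({"C"}, {"A"})]))  # True
print(is_instance(T, [({"A"}, {"B", "C"}), ({"C"}, {"A"})]))  # False
```

The second call is False because the first and third tuples agree on A but not on B, violating A->BC.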

9.) When we defined FD's in paragraph (5) we did not say anything about what they "mean", we just defined them as symbols. Functional dependencies gain their "meaning" from the above definition of tables over a relation. First of all, functional dependencies are constraints since they constrain the values that can be placed in a table that is an instance of the relation that the functional dependencies come from. But here is another, more "semantic", way to think about functional dependencies. Given a functional dependency X->Y from a relation R(S,F) and a tuple t1 from a table over the relation R(S,F), we imagine that the attribute values from X identify some particular entity and the attribute values from Y provide some information about that entity. Now suppose that some other tuple t2 from the same table agrees with tuple t1 on the set X. So both of the tuples are in the table and both of the tuples are referring to the exact same entity. So it would be strange if the two tuples contained in their Y attributes conflicting information about that entity. So it makes sense to say that if two tuples refer to the same entity (that is, if t1[X] = t2[X]) then we should require that (i.e., constrain that) they provide the same information about that entity (that is, t1[Y] = t2[Y]).

10.) Given a relation R(S,F) and a functional dependency f over S, we say that F entails f if the relations R(S, F) and R(S, F ∪ {f}) have the exact same set of tables. That is, adding the FD f to the FD's in the relation R(S, F) does not change the set of allowed tables.

11.) Given a relation R(S,F), the closure F+ of the set F of FD's is the set of all FD's entailed by F.

This and the previous definition are very difficult to work with since they imply the need to check an infinite number of conditions. For example, to determine if a FD f is entailed by the set F of FD's we would need to check every one of a (possibly) infinite set of tables and determine if each table is an instance of the new set of FD's F ∪ {f}. This is not practical. The next few paragraphs make working with entailment and closures a bit easier.

12.) Theorem: (The Soundness of Armstrong's Axioms) Let R(S,F) be a relation and let X, Y and Z be subsets of attributes from the schema S. Then

  1. If Y ⊆ X then the FD X->Y is entailed by F. (That is, trivial FD's are always in F+.)
  2. If X->Y and Y->Z are entailed by F, then X->Z is entailed by F. (This is the transitivity axiom.)
  3. If X->Y is entailed by F, then XZ->YZ is entailed by F. (This is the augmentation axiom.)

Rather than proving this theorem, let us see an example of how it can be used. In paragraphs (7) and (8) we considered the relations

R( S(A,B,C), {AB->C, C->A} )
R( S(A,B,C), {A->BC, C->A} ).
Let us show that the FD AB->C from the first relation is entailed by the FD's F={A->BC, C->A} from the second relation.
  1. We have AB->A entailed by F since this is a trivial FD.
  2. Since we have AB->A and A->BC entailed by F, we have AB->BC entailed by transitivity.
  3. We have BC->C entailed by F since this is a trivial FD.
  4. Since we have AB->BC and BC->C entailed by F, we have AB->C entailed by transitivity.
This result proves that every table of the second relation is also a table for the first relation. (But not the other way around, as we saw in paragraph (8).)

13.) (Three more "axioms") Let R(S,F) be a relation and let X, Y, Z and W be subsets of attributes from the schema S. Then

  1. If X->YZ is entailed by F, then X->Y and X->Z are entailed by F. (This is the decomposition axiom.)
  2. If X->Y and X->Z are entailed by F, then X->YZ is entailed by F. (This is the union axiom.)
  3. If X->Y and ZY->W are entailed by F, then XZ->W is entailed by F. (This is the pseudotransitivity axiom.)

Try to derive these axioms from the three Armstrong axioms. (Note: The notation XZ is just an abbreviation of the notation X∪Z for the union of X and Z.)

14.) Theorem: (The Completeness of Armstrong's Axioms) Let R(S,F) be a relation and let f be a FD in F+. Then we can derive, in a finite number of applications of the three Armstrong Axioms, f from the FD's in F.

This theorem does not tell you how to derive f from the FD's in F, it just tells you that it can always be done. Discovering the right sequence of axioms can be challenging. (We will not be using this theorem, so we will not prove it.)

15.) Let R(S,F) be a relation and let X be a subset of the attributes in the schema S. The attribute closure of X, denoted by X+F, is the set of all attributes A from S such that the FD X->A is entailed by F (that is X->A ∈ F+).

The following theorem shows how closely related are the notions of closure and attribute closure. Its proof is an exercise in using Armstrong's axioms.

Theorem: Let R(S,F) be a relation. Then X->Y ∈ F+ if and only if Y ⊆ X+F.

Attribute closure turns out to be a very useful way to work with the entailment and closure of FD's. The reason why is that there is a simple algorithm for computing attribute closures.

16.) Let R(S,F) be a relation and let X be a subset of the attributes in the schema S. Then the following Attribute Closure Algorithm computes X+F.

   closure = X;
   repeat
      previous_closure = closure;
   if there exist sets Z, Y such that (Z->Y ∈ F) && (Z ⊆ closure) && (Y ⊈ closure) then
      closure = closure ∪ Y
   until ( closure == previous_closure );
   return closure;
To see how this algorithm works, first notice that if A is an attribute in X, then X->A is entailed by F by Armstrong's first axiom, since X->A is a trivial FD. So we see that X ⊆ X+F. So the algorithm starts by letting the closure begin as X. The algorithm then grows the closure one step at a time by finding an FD Z->Y where all the attributes of Z are already in the closure but not all the attributes of Y are in the closure yet. We need to show that every attribute in Y is in fact in the attribute closure X+F. If A is an attribute from Y, then Y->A is entailed by F since this is a trivial FD. We are assuming that for every attribute B ∈ Z we have X->B entailed by F. So by the Union axiom in paragraph (13) above we have X->Z entailed by F. So now we have X->Z, Z->Y, and Y->A, so by transitivity, X->A, so A is in the closure for every attribute A in Y. The algorithm halts when we can no longer find another FD Z->Y with all the attributes of Z already in the closure but some attribute of Y not in the closure. (Note: So far, we have really only proven that the attribute closure algorithm gives us a set that is a subset of the attribute closure, not the whole closure.)

17.) Let R(S,F) be a relation. A subset X of attributes from the schema S is a super key for the relation R if X+F = S, that is, the attribute closure of X is the whole schema S.

18.) Let R(S,F) be a relation. A subset X of attributes from S is a key for the relation R if X is a super key for R and no proper subset of X is also a super key. In other words, keys are minimal super keys. If you remove even a single attribute from a key, the result is not even a super key, let alone a key.

Note that many textbooks make the incorrect statement that the values of a tuple on a key determine the values of the tuple on all the other attributes. Here is an example that shows this to be wrong. Suppose we have the simple relation R( S(A,B,C), {A->BC} ). Notice that the single FD says that the single attribute set {A} is a key. Suppose we begin to fill in an instance of this relation and we start with the following partially filled in tuple.
A B C
1 - -
Does the value of 1 for the key attribute A determine the values for the other two attributes? No, not at all. The other two attributes are not at all constrained (except by their respective types). Suppose that we fill in these two attributes.
A B C
1 8 4
Suppose that we now wish to update this tuple. Does the key value of 1 influence in any way how we can update this tuple? Again, the answer is no. We can change the values in this tuple any way we wish. The key value of 1 in no way determines the values of the other attributes.

19.) The following theorem could be taken as the definition of a super key. However, the definition for super key that we gave above will be easier for us to compute with. As an exercise in using some of the above ideas, you should try to prove this theorem.

Theorem: If X is a subset of the attributes from a relation R(S,F), then X is a super key for the relation if and only if the FD X->S is entailed by F. In other words, X is a super key if and only if X->S ∈ F+.

Notice that in this theorem you cannot replace "super key" with "key". The resulting theorem will not be true. However, it will be true in one direction (which direction?) and false in the other direction.

20.) Here is an algorithm that determines if a subset X of attributes from a relation R(S,F) is a key for the relation. First, compute the attribute closure X+F (using the algorithm from paragraph 16). If the attribute closure is not S, then X is not a super key, and so it is not a key. If the attribute closure is S, then X is a super key, and we have to determine if it is in fact a key. For each attribute A ∈ X, compute the attribute closure of X - {A}. If any one of these attribute closures is S, then X is not a key (since X contains a super key). If every one of these attribute closures is not S, then X is a key.
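The key-testing algorithm of this paragraph, sketched in Python on top of the attribute closure algorithm from paragraph (16) (helper names are our own):

```python
def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Z, Y in fds:
            if Z <= closure and not Y <= closure:
                closure |= Y
                changed = True
    return closure

def is_superkey(X, S, fds):
    # X is a super key when its attribute closure is the whole schema S.
    return attribute_closure(X, fds) == set(S)

def is_key(X, S, fds):
    # A key is a super key none of whose one-attribute-smaller
    # subsets is still a super key.
    return (is_superkey(X, S, fds) and
            not any(is_superkey(set(X) - {a}, S, fds) for a in X))

S = {"A", "B", "C"}
F = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]
print(is_key({"A", "B"}, S, F))       # True
print(is_key({"A", "C"}, S, F))       # False: not even a super key
print(is_key({"A", "B", "C"}, S, F))  # False: a super key, but not minimal
```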

21.) Let us show that every super key contains a key. We prove this by providing an algorithm for finding a key contained in any super key. Let X be a super key for the relation R(S,F). Use the algorithm from the last paragraph to determine if X is a key. If it is, we are done. If X is not a key, then there is an attribute A in X such that X - {A} is a super key. Let X - {A} be our new super key and apply this algorithm to it. Eventually we must find a key, since our super key keeps on shrinking and it can't shrink forever. In particular, if the super key shrinks to a single attribute, then the super key is a key since the empty set cannot be a super key (the attribute closure of the empty set is the empty set).

22.) By using the algorithm for finding a key in a super key, we can prove that every relation R(S,F) has a key. Since the attribute closure of S is S, the set S is always a super key for the relation R(S,F). And since every super key must contain a key, the relation R(S,F) must have a key.

23.) Here is an algorithm that finds all the keys of a relation R(S,F). For each attribute A ∈ S, determine if the set {A} is a key for the relation. After finding all the single attribute keys, find all the double attribute keys. That is, for each pair of attributes {A,B} where neither {A} nor {B} is a key, determine if the set {A,B} is a key. After finding all the double attribute keys, find all the triple attribute keys. That is, for each triple of attributes {A,B,C} such that no single attribute and no pair of attributes from the triple is a key, determine if the set {A,B,C} is a key. Keep on going this way until you have checked for every possible size of key.
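This enumeration can be sketched in Python; the "skip any set containing a smaller key" test implements the pruning described above:

```python
from itertools import combinations

def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Z, Y in fds:
            if Z <= closure and not Y <= closure:
                closure |= Y
                changed = True
    return closure

def all_keys(S, fds):
    """Enumerate candidate subsets by increasing size; a set containing a
    previously found (smaller) key is not minimal, so it is skipped."""
    keys = []
    for size in range(1, len(S) + 1):
        for combo in combinations(sorted(S), size):
            X = set(combo)
            if any(k <= X for k in keys):
                continue
            if attribute_closure(X, fds) == set(S):
                keys.append(X)
    return keys

print(all_keys({"A", "B", "C"}, [({"A", "B"}, {"C"}), ({"C"}, {"A"})]))
print(all_keys({"A", "B", "C"}, [({"A"}, {"B", "C"}), ({"C"}, {"A"})]))
```

On the two example relations this reproduces the keys computed by hand in paragraph (24): {A,B} and {B,C} for the first relation, {A} and {C} for the second.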

24.) Let us do a concrete example involving the calculation of keys. Let us compute the keys for the two relations from paragraphs (7), (8) and (12)

R( S(A,B,C), {AB->C, C->A} )
R( S(A,B,C), {A->BC, C->A} ).
Let us start with the first relation. Notice that the attribute closure of {A} is {A} and the attribute closure of {B} is {B}, so neither A nor B is a key. Now let us compute the attribute closure of {C}. The FD C->A allows us, after one iteration of the attribute closure algorithm, to grow the set {C} to the set {A,C}. But we cannot iterate the attribute closure algorithm again, so the attribute closure of {C} is {A,C}, so {C} is not a key. So this relation does not have any single attribute keys. Let us now look at pairs of attributes. The set {A,B} is a super key since the FD AB->C all by itself implies that the attribute closure of {A,B} is all of the attributes. Since neither of the individual attributes is a key, the super key {A,B} is in fact a key. It is not hard to see that the attribute closure of the set {A,C} is itself, so {A,C} is not a key. Now let us compute the attribute closure of {B,C}. The FD C->A allows us, after one iteration of the attribute closure algorithm, to grow the set {B,C} to the set {A,B,C}, so {B,C} is a super key. Since neither of the individual attributes is a key, the super key {B,C} is in fact a key. So this relation has two keys, the sets {A,B} and {B,C}.

Now let us find the keys for the second relation. The set {A} is a key since the FD A->BC all by itself implies that the attribute closure of {A} is all of the attributes. Now let us compute the attribute closure of {C}. The FD C->A allows us, after one iteration of the attribute closure algorithm, to grow the set {C} to the set {A,C}. Then the FD A->BC allows us, after another iteration of the attribute closure algorithm, to grow the set {A,C} to the set {A,B,C}. So now we know that the attribute closure of {C} is S. So {C} is also a key for the second relation. Now, notice that the attribute closure of {B} is {B}, so {B} is not a key. So we know that the two attributes A and C are keys. There are no pairs of attributes which do not contain a key, so none of the pairs of attributes is a key. So A and C are the only two keys for the second relation.

25.) A relation R(S,F) is in Third Normal Form, or 3NF, if, for every functional dependency X->Y ∈ F, one of the following three statements is true.

  1. Y ⊂ X. (In other words, this is a trivial functional dependency.)
  2. X is a super key.
  3. Each attribute in Y - X is contained in some key.
Note that these three statements are not mutually exclusive. So for some functional dependencies it may be that more than one of them is true. The requirement for 3NF is that for each functional dependency in F at least one of these three statements is true.

26.) A relation R(S,F) is in Boyce-Codd Normal Form, or BCNF, if, for every functional dependency X->Y ∈ F, one of the following two statements is true.

  1. Y ⊂ X. (In other words, this is a trivial functional dependency.)
  2. X is a super key.
Note that these two statements are the first two statements from the definition of 3NF. So any relation that is in BCNF is certainly in 3NF.

27.) Notice that both the definition of 3NF and the definition of BCNF is a statement about a relation. It is not a statement about tables over a relation. It doesn't make sense to state that a table is in 3NF or BCNF. Those are statements that we make about the relation that the table is an instance of.

It is worth thinking for a moment about what it would take to determine if a relation is in 3NF or BCNF. Given a relation R(S,F), the question of whether it is in 3NF or BCNF is mostly a question about the functional dependencies in F. In particular, it is mostly a question about the super keys and keys for the relation. To test if R(S,F) is in BCNF, for each functional dependency in F we first should check if it is trivial, and this is easy to do. If the FD X->Y is not trivial, then we should check if the left hand side X of the FD is a super key. This is not too hard to do. We just use the Attribute Closure Algorithm to compute X+F and see if it is S. So testing if a relation is in BCNF is quite doable. To test if a relation is in 3NF, we might need to test for the third of the three statements. To test the third statement, we need to take each attribute that is in Y, but not in X, and determine if that attribute is in a key. To determine if an attribute A is in some key, we either need to know what all the keys are (which is very difficult to compute) or we need to take every subset of S that contains A and determine if that subset is a key. This is a lot of work (though not as much work as finding all of the keys). So BCNF is computationally an easier normal form to check for than 3NF.
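The easy BCNF test described above can be sketched in Python (the 3NF test would additionally need the keys, as just discussed, so only BCNF is shown here):

```python
def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Z, Y in fds:
            if Z <= closure and not Y <= closure:
                closure |= Y
                changed = True
    return closure

def is_bcnf(S, fds):
    S = set(S)
    for X, Y in fds:
        if Y <= set(X):                     # trivial FD: allowed
            continue
        if attribute_closure(X, fds) == S:  # left side is a super key
            continue
        return False                        # this FD violates BCNF
    return True

F1 = [({"A", "B"}, {"C"}), ({"C"}, {"A"})]  # 3NF but not BCNF
F2 = [({"A"}, {"B", "C"}), ({"C"}, {"A"})]  # BCNF
print(is_bcnf({"A", "B", "C"}, F1))  # False: C->A has no super key on the left
print(is_bcnf({"A", "B", "C"}, F2))  # True
```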

28.) Let us determine if the two relations from paragraphs (7), (8), (12) and (24) are in 3NF or BCNF.

R( S(A,B,C), {AB->C, C->A} )
R( S(A,B,C), {A->BC, C->A} ).
Let us do the second relation first. We showed that it has only two keys, {A} and {C}. But this shows us right away that each of the two FD's in the relation has a key as its left hand side and therefore satisfies the second statement in the definition of BCNF. So the second relation is in BCNF.

For the first of the two relations, recall that it has two keys, {A,B} and {B,C}. So the first of the two FD's in the relation satisfies the second statement in the definitions of both 3NF and BCNF. But the second FD in this relation is not a trivial FD and it does not satisfy the second statement in the definition of BCNF, so this relation is not in BCNF. But the second FD in this relation does have a right hand side that is contained in the key {A,B}. So the second FD satisfies the third statement in the definition of 3NF. So the first relation is in 3NF.

29.) A decomposition of the schema from a relation R(S,F) is a pair of proper subsets S1,S2 of S such that S1 ∪ S2 = S.

A decomposition of a schema is said to be a lossless decomposition if the following identity holds for every table T over the relation R(S,F).

T = proj(S1, T) join proj(S2, T)
where join is the relational algebra natural join operator and proj is the relational algebra projection operator.

30.) Lemma: Let S1, S2 be a decomposition of the schema from the relation R(S,F). Then for every table T over the relation R(S,F) we have the following.

T ⊆ proj(S1, T) join proj(S2, T)
In other words, the join of the two projections will always contain all of the original tuples from T, but it might also contain extra tuples. To prove this fact, let t be a tuple from T, let t[S1] denote the tuple in proj(S1,T) that t projects onto, and let t[S2] denote the tuple in proj(S2,T) that t projects onto. Recall that the first step in computing the natural join is to take the product of the tables proj(S1,T) and proj(S2,T). In this product table there will be a tuple that is the concatenation of the tuples t[S1] and t[S2]. The next step in computing the natural join is to remove from the product table those tuples that do not agree in the duplicated attributes that come from S1 ∩ S2. But since t[S1] and t[S2] are projected from the same tuple t, these two tuples will agree on the attributes in S1 ∩ S2. So the tuple in the product that is the concatenation of these two tuples will not be deleted from the product. The last step in computing the natural join is to remove from the product table one copy of each duplicate attribute from S1 ∩ S2. But this step will make the concatenation of t[S1] and t[S2] back into the tuple t. This shows that t is in the join, and so it proves that T is a subset of the join.

To see that T might be a proper subset of the above join, let us look at a specific example. Consider the relation R( S(A,B,C), {A->BC, C->A} ) and let us use the decomposition S1={A,B} and S2={B,C}. A table T over this relation is
A B C
1 2 4
2 2 5
The projections of T look like the following tables
A B        B C
1 2        2 4
2 2        2 5
And proj(S1, T) join proj(S2, T), the join of these two projections, looks like this.
A B C
1 2 4
1 2 5
2 2 4
2 2 5
Notice two facts about this last table. Not only is it larger than the original table T, but this table is not even an instance of the relation R(S,F) that we started with. In the process of taking an instance of the relation R(S,F), decomposing it, and then joining the two projections, some constraint (i.e., some FD) from R(S,F) ended up getting lost.
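This lossy example can be reproduced with a small Python sketch; project and natural_join below are our own minimal versions of the relational operators, with rows represented as dictionaries:

```python
def project(table, attrs):
    """Projection: keep only attrs, removing duplicate rows."""
    rows = []
    for t in table:
        row = {a: t[a] for a in attrs}
        if row not in rows:
            rows.append(row)
    return rows

def natural_join(t1, t2):
    """Natural join: combine rows that agree on the shared attributes."""
    common = set(t1[0]) & set(t2[0])
    return [{**r1, **r2}
            for r1 in t1 for r2 in t2
            if all(r1[a] == r2[a] for a in common)]

# The table T from this example, over S(A,B,C)
T = [{"A": 1, "B": 2, "C": 4}, {"A": 2, "B": 2, "C": 5}]
joined = natural_join(project(T, {"A", "B"}), project(T, {"B", "C"}))
print(len(joined))  # 4: the two original tuples plus two extra ones
```

The join on the shared attribute B produces four rows, confirming that T is a proper subset of the join of its projections.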

31.) Theorem: If S1, S2 is a decomposition of the schema from the relation R(S,F), and if either of the following two statements is true

  1. S1 ⊆ (S1 ∩ S2)+F, or
  2. S2 ⊆ (S1 ∩ S2)+F
then the decomposition is a lossless decomposition.

Notice that the test for a lossless decomposition is that the attribute closure of S1∩S2 contains either S1 or S2. Some books state this condition as "the set S1∩S2 must contain a key for S1 or a key for S2". The problem with this statement is that neither S1 nor S2 is a relation, so they do not have keys (but one of the steps below shows how, under certain circumstances, to make S1 and S2 into relations).
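This lossless test is a one-line check on top of the attribute closure algorithm. A Python sketch (function names are our own):

```python
def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Z, Y in fds:
            if Z <= closure and not Y <= closure:
                closure |= Y
                changed = True
    return closure

def is_lossless(S1, S2, fds):
    # Lossless when the closure of the intersection covers S1 or S2.
    cl = attribute_closure(set(S1) & set(S2), fds)
    return set(S1) <= cl or set(S2) <= cl

F = [({"A"}, {"B", "C"}), ({"C"}, {"A"})]
print(is_lossless({"A", "B"}, {"B", "C"}, F))  # False: the lossy example of paragraph (30)
print(is_lossless({"A", "B"}, {"A", "C"}, F))  # True: {A} is a key, so its closure covers everything
```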

32.) Let S1,S2 be a decomposition of the schema from the relation R(S,F). Suppose we have a way of choosing functional dependencies F1 and F2 for the schemas S1 and S2. Then we can define the relations R1(S1,F1) and R2(S2,F2). Once we have the relations R1(S1,F1) and R2(S2,F2) we can talk about tables which are instances of those relations. This lets us ask the following two questions (which are really questions about the two sets of functional dependencies F1 and F2).

  1. If table T is an instance of R(S,F), is proj(S1,T) an instance of R1(S1,F1), and is proj(S2,T) an instance of R2(S2,F2)?
  2. If T1 is an instance of R1(S1,F1) and T2 is an instance of R2(S2,F2), is T1 join T2 an instance of R(S,F)?
Notice, for example, that if the set F1 contains a large number of FD's (more than in F say), then there are a lot of constraints placed on the instances of the relation R1(S1,F1), and so it is unlikely that proj(S1,T) will be an instance of R1(S1,F1). On the other hand, if the sets F1 and F2 contain very few FD's, then there are very few constraints placed on the instances T1 and T2 of the relations R1(S1,F1) and R2(S2,F2), and so it is unlikely that T1 join T2 will be an instance of R(S,F). What we want to do in the next few paragraphs is discuss conditions on the sets of functional dependencies F1 and F2 under which the answers to the above two questions are yes.

33.) Let S1,S2 be a decomposition of the schema from the relation R(S,F) and let F1 and F2 be sets of functional dependencies on the schemas S1,S2. We say that the relations R1(S1,F1) and R2(S2,F2) are a dependency-preserving decomposition of R(S,F) if T1 join T2 is an instance of R(S,F) for any instances T1 and T2 of relations R1(S1,F1) and R2(S2,F2).

34.) Theorem: Let S1,S2 be a decomposition of the schema from the relation R(S,F) and let F1 and F2 be sets of functional dependencies on the schemas S1,S2. Then R1(S1,F1) and R2(S2,F2) is a dependency-preserving decomposition of R(S,F) if and only if (F1 ∪ F2)+ = F+.

Unfortunately, this theorem is not very useful. Because working with closures is so difficult, this theorem does not give us a practical way to determine the sets of functional dependencies F1 and F2.

35.) Whenever we decompose a relation, we definitely need the decomposition to be lossless and we would like the decomposition to be dependency preserving. However, not all relations have a lossless, dependency preserving decomposition. Here is an example of one. In paragraph (28) we showed that the relation

R(S,F) = R({A,B,C}, {AB->C, C->A})
is in 3NF but, as we also showed, not in BCNF. Because the relation is not in BCNF, its tables can contain redundancy which we would like to remove by decomposing the relation. Unfortunately, there is no lossless, dependency-preserving decomposition of this relation. To show this, notice that any schema with exactly three attributes has only three decompositions into two two-attribute subsets, S1={A,B}, S2={A,C} or S1={A,B}, S2={B,C} or S1={C,A}, S2={C,B}. (A decomposition in which one subset is a single attribute must have an empty intersection, and since no FD in F has an empty left hand side, such a decomposition cannot be lossless.) It is not hard to use the condition for a lossless decomposition from paragraph (31) to see that only S1={C,A}, S2={C,B} works as a lossless decomposition. Let us now show that this cannot be made into a dependency preserving decomposition. So far we have just the schemas for a decomposition. We need to show that for any choice of functional dependencies F1 and F2, the relations R1(S1,F1) and R2(S2,F2) are not dependency preserving. Consider the following two tables, which are both instances of R(S,F).
A B C        A B C
3 1 1        1 4 1
3 2 2        2 4 2
Here are the projections of these two tables onto the schemas S1={C,A} and S2={C,B} respectively. Notice that no matter what choices we make for the functional dependencies F1 and F2, these two tables must be instances of the relations R1(S1,F1) and R2(S2,F2) respectively, otherwise we cannot even make the decompositions.
A C        B C
3 1        4 1
3 2        4 2
Here is the natural join of the above two tables. Notice that this table is not an instance of R(S,F). This shows that there is no way to make our decomposition S1={C,A}, S2={C,B} into a dependency-preserving decomposition.
A B C
3 4 1
3 4 2
As we will see shortly, the fact that some relations do not have lossless, dependency-preserving decompositions means that the algorithm for decomposing relations into BCNF cannot guarantee that the resulting decompositions are dependency-preserving (but the algorithm for decomposing relations into 3NF will guarantee that the decompositions are dependency-preserving).

NOTE: You should check that the example we did in this paragraph is the HasAccount example from page 217 of our textbook.

36.) Let S1,S2 be a decomposition of the schema from the relation R(S,F). Here is one way to choose the functional dependencies F1 and F2 for the schemas S1 and S2. The projection of F onto S1 is the set of all functional dependencies from F+ that involve attributes only from S1. Similarly, the projection of F onto S2 is the set of all functional dependencies from F+ that involve attributes only from S2.

Notice the use of F+ in the definition of the projection of F onto a subset of attributes. If we replace F+ in this definition with just F, then our decomposed relations R1(S1,F1) and R2(S2,F2) may not contain enough functional dependencies and the resulting decomposition may not be dependency-preserving (remember, the fewer constraints there are in F1 and F2, the more likely it is that when we join tables from R1(S1,F1) and R2(S2,F2), the resulting table will not obey all the constraints in R(S,F) ).

It is important to realize that giving the decomposition the functional dependencies defined by the above projection does not guarantee that the decomposition will be dependency-preserving. Some relations just do not have dependency-preserving decompositions. (We will see later what the 3NF decomposition algorithm has to do in order to guarantee that its decompositions are dependency-preserving.)

37.) Let R(S,F) be a relation that is not in BCNF. Since the relation is not in BCNF, we can find a functional dependency of the form X->A where A is a single attribute and X is not a super key. Let

S1=S-{A}
S2=X ∪ {A}.
Let F1 and F2 be the projections of F onto S1 and S2. Then the relations R1(S1,F1) and R2(S2,F2) are a lossless decomposition of R(S,F).

To prove that the decomposition is lossless, we need to compute (S1 ∩ S2)+F. Since the FD X->A violates BCNF, it is not a trivial FD, so A is not in X. Therefore, S1 ∩ S2 = X. Since X->A, one iteration of the attribute closure algorithm grows X to X ∪ {A}, so we have S2 ⊆ X+F. Therefore S2 ⊆ (S1 ∩ S2)+F, which by paragraph (31) means that our decomposition is lossless.

38.) BCNF Decomposition Algorithm Let R(S,F) be a relation that is not in BCNF. Apply the step in paragraph (37) to get the lossless decomposition R1(S1,F1) and R2(S2,F2). If one (or both) of the relations R1(S1,F1) and R2(S2,F2) is not in BCNF, then recursively apply this step to that relation.
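The whole algorithm can be sketched in Python. The project_fds helper computes a cover of the projection of F from paragraph (36) by taking the attribute closure of every subset of the smaller schema; function names are our own, and this brute-force projection is exponential in the number of attributes:

```python
from itertools import combinations

def attribute_closure(X, fds):
    closure = set(X)
    changed = True
    while changed:
        changed = False
        for Z, Y in fds:
            if Z <= closure and not Y <= closure:
                closure |= Y
                changed = True
    return closure

def project_fds(fds, attrs):
    """A cover of the projection of F onto attrs: for every subset X of
    attrs, record X -> (closure(X) ∩ attrs) when that adds something."""
    attrs = set(attrs)
    out = []
    for size in range(1, len(attrs) + 1):
        for combo in combinations(sorted(attrs), size):
            X = set(combo)
            Y = (attribute_closure(X, fds) & attrs) - X
            if Y:
                out.append((X, Y))
    return out

def bcnf_decompose(S, fds):
    S = set(S)
    for X, Y in fds:
        if Y <= set(X) or attribute_closure(X, fds) == S:
            continue                    # this FD does not violate BCNF
        A = sorted(Y - set(X))[0]       # X->A is a BCNF violation
        S1, S2 = S - {A}, set(X) | {A}
        return (bcnf_decompose(S1, project_fds(fds, S1)) +
                bcnf_decompose(S2, project_fds(fds, S2)))
    return [S]                          # every FD is fine: S is in BCNF

print(bcnf_decompose({"A", "B", "C", "D"}, [({"A"}, {"B"}), ({"C"}, {"D"})]))
```

On the relation of paragraph (39) this yields the schemas {A,C}, {C,D} and {A,B}; the splits happen in a different order than in the text (this sketch picks the violation A->B first), but the final schemas agree.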

39.) Example Let us decompose the relation R(S,F) = R({A,B,C,D}, {A->B, C->D}). You should check that this relation has just one key, {A,C}. So both of the FD's in R(S,F) violate BCNF. Let us apply one step of the BCNF decomposition algorithm using the FD C->D. Then S1={A,B,C} and S2={C,D}. And F1={A->B} and F2={C->D}.

You should check that the relation R2(S2,F2) = R2({C,D}, {C->D}) is in BCNF. On the other hand, the relation R1(S1,F1) = R1({A,B,C}, {A->B}) has the single key {A,C}, so the FD A->B still violates BCNF. So we apply one step of the decomposition algorithm using A->B. Then S3={A,C} and S4={A,B}. And F3={} and F4={A->B}.

You should check that the relations R3(S3,F3) = R3({A,C}, {}) and R4(S4,F4) = R4({A,B}, {A->B}) are in BCNF.

So the final decomposition is
R2(S2,F2) = R2({C,D}, {C->D}),
R3(S3,F3) = R3({A,C}, {}),
R4(S4,F4) = R4({A,B}, {A->B}).

Let us apply some semantics to this example. Change the attribute A to SupplierID, change B to ZIP, change C to PartNo, and change D to Weight. An instance of this relation might look like this.
SupplierID  ZIP    PartNo  Weight
1           46323  10      3
2           46394  20      7
2           46394  10      3
1           46323  20      7
There is a lot of redundancy in this table. The normalized tables look like this.
SupplierID  PartNo
1           10
1           20
2           10
2           20

SupplierID  ZIP
1           46323
2           46394

PartNo  Weight
10      3
20      7
Notice that this is just the famous Suppliers-Parts database. The normalization algorithm factored the original table into two "entity" tables and one "relationship" table.

40.) As a partial summary of the above material, here is a list of calculations that you should be able to do when given a relation R(S,F).

  1. Given a subset X of S, compute the attribute closure X+F.
  2. Given a subset X of S, determine if X is a super key.
  3. Given a subset X of S, determine if X is a key.
  4. Compute all the keys of R(S,F).
  5. Determine if the relation R(S,F) is in 3NF or BCNF (or neither).
  6. Given a decomposition of the schema S, determine if the decomposition is lossless.
  7. Given a decomposition S1, S2 of the schema S, compute the projections F1, F2 of F onto S1 and S2.
  8. Given a decomposition of the schema S, determine if the decomposition is dependency-preserving.
    If it is not, be able to give an example of how a specific FD is not preserved in a natural join.
  9. Given a FD from F that violates BCNF, compute the lossless decomposition R1(S1,F1) and R2(S2,F2).
  10. Compute the complete BCNF decomposition of a relation that is not in BCNF.

