Multivalued dependiency

Recall that when we discussed database modelling using the E-R Modelling technique, we noted difficulties that can arise when an entity has multivalue attributes. It was because in the relational model, if all of the information AbOUT such entity is to be represented in one relation, it will be necessary to repeat all the information other than the multivalue attribute value to represent all the information that we wish to represent. This results in many tuples about the same instance of the entity in the relation and the relation having a composite key (the entity id and the mutlivalued attribute). Of course the other option suggested was to represent this multivalue information in a separate relation. The situation of course becomes much worse if an entity has more than one multivalued attributes and these values are represented in one relation by a number of tuples for each entity instance such that every value of one the multivalued attributes appears with every value of the second multivalued attribute to maintain consistency. The multivalued dependency relates to this problem when more than one multivalued attributes exist. Consider the following relation that represents an entity employee that has one mutlivalued attribute proj:

emp (e#, dept, salary, proj)

We have so far considered normalization based on functional dependencies; dependencies that apply only to single-valued facts. For example, e# -> dept implies only one dept value for each value of e#. Not all information in a database is single-valued, for example, proj in an employee relation May Be the list of all projects that the employee is currently working on. Although e# determines the list of all projects that an employee is working on, e# -> proj is not a functional dependency.

So far we have dealt with multivalued facts about an entity by having a separate relation for that multivalue attribute and then inserting a tuple for each value of that fact. This resulted in composite keys since the multivalued fact must form part of the key. In none of our examples so far have we dealt with an entity having more than one multivalued attribute in one relation. We do so now.

The fourth and fifth normal forms deal with multivalued dependencies. Before discussing the 4NF and 5NF we discuss the following example to illustrate the concept of multivalued dependency.

programmer (emp_name, qualifications, languages)

The above relation includes two multivalued attributes of entity programmer; qualifications and languages. There are no functional dependencies.

The attributes qualifications and languages are assumed independent of each other. If we were to consider qualifications and languages separate entities, we would have two relationships (one between employees and qualifications and the other between employees and programming languages). Both the above relationships are many-to-many i.e. one programmer could have several qualifications and may know several programming languages. Also one qualification may be obtained by several programmers and one programming language may be known to many programmers.

The above relation is therefore in 3NF (even in BCNF) but it still has some disadvantages. Suppose a programmer has several qualifications (B.Sc, Dip. Comp. Sc, etc) and is ProFicient in several programming languages; how should this information be represented? There are several possibilities.

emp_name qualifications languages SMITH SMITH SMITH SMITH SMITH SMITH B.Sc B.Sc B.Sc Dip.CS Dip.CS Dip.CS FORTRAN COBOL PASCAL FORTRAN COBOL PASCAL

emp_name qualifications languages SMITH SMITH SMITH SMITH SMITH B.Sc Dip.CS NULL NULL NULL

   NULL

NULL FORTRAN COBOL PASCAL

emp_name qualifications languages SMITH SMITH SMITH B.Sc Dip.CS NULL FORTRAN COBOL PASCAL

Other variations are possible (we remind the reader that there is no relationship between qualifications and programming languages). All these variations have some disadvantages. If the information is repeated we face the same problems of repeated information and anomalies as we did when second or third normal form conditions are violated. If there is no repetition, there are still some difficulties with search, insertions and deletions. For example, the role of NULL values in the above relations is confusing. Also the candidate key in the above relations is (emp name, qualifications, language) and existential integrity requires that no NULLs be specified. These problems may be overcome by decomposing a relation like the one above as follows:

emp_name qualifications SMITH SMITH B.Sc Dip.CS

emp_name languages SMITH SMITH SMITH FORTRAN COBOL PASCAL

The basis of the above decomposition is the concept of multivalued dependency (MVD). Functional dependency A -> B relates one value of A to one value of B while multivalued dependency A ->> B defines a relationship in which a set of values of attribute B are determined by a single value of A.

The concept of multivalued dependencies was developed to provide a basis for decomposition of relations like the one above. Therefore if a relation like enrolment(sno, subject#) has a relationship between sno and subject# in which sno uniquely determines the values of subject#, the dependence of subject# on sno is called a trivial MVD since the relation enrolment cannot be decomposed any further. More formally, a MVD X ->> Y is called trivial MVD if either Y is a subset of X or X and Y together form the relation R. The MVD is trivial since it results in no constraints being placed on the relation. Therefore a relation having non-trivial MVDs must have at least three attributes; two of them multivalued. Non-trivial MVDs result in the relation having some constraints on it since all possible combinations of the multivalue attributes are then required to be in the relation.

Let us now define the concept of multivalued dependency. The multivalued dependency X ->> Y is said to hold for a relation R(X, Y, Z) if for a given set of value (set of values if X is more than one attribute) for attributes X, there is a set of (zero or more) associated values for the set of attributes Y and the Y values depend only on X values and have no dependence on the set of attributes Z.

In the example above, if there was some dependence between the attributes qualifications and language, for example perhaps, the language was related to the qualifications (perhaps the qualification was a training certificate in a particular language), then the relation would not have MVD and could not be decomposed into two relations as abve. In the above situation whenever X ->> Y holds, so does X ->> Z since the role of the attributes Y and Z is symmetrical.

Consider two different situations.

(a) Z is a single valued attribute. In this situation, we deal with R(X, Y, Z) as before by entering several tuples about each entity. (b) Z is multivalued.

Now, more formally, X ->> Y is said to hold for R(X, Y, Z) if t1 and t2 are two tuples in R that have the same values for attributes X and therefore with t1[x] = t2[x] then R also contains tuples t3 and t4 (not necessarily distinct) such that

t1[x] = t2[x] = t3[x] = t4[x] t3[Y] = t1[Y] and t3[Z] = t2[Z] t4[Y] = t2[Y] and t4[Z] = t1[Z]

In other words if t1 and t2 are given by

t1 = [X, Y1, Z1], and t2 = [X, Y2, Z2]

then there must be tuples t3 and t4 such that

t3 = [X, Y1, Z2], and t4 = [X, Y2, Z1]

We are therefore insisting that every value of Y appears with every value of Z to keep the relation instances consistent. In other words, the above conditions insist that Y and Z are determined by X alone and there is no relationship between Y and Z since Y and Z appear in every possible pair and hence these pairings present no information and are of no significance. Only if some of these pairings were not present, there would be some significance in the pairings.