Relational Database Cheatsheet
Cheatsheet Content
### Functional Dependencies (FDs) A functional dependency $X \to Y$ between two sets of attributes $X$ and $Y$ in a relation $R$ means that for any two tuples in $R$, if they agree on the values of all attributes in $X$, then they must also agree on the values of all attributes in $Y$. - $X$ is the **determinant**, $Y$ is the **dependent**. - If $X \to Y$ and $Y \to X$, then $X \leftrightarrow Y$. #### Normal Forms Normal forms are rules for database design that prevent anomalies (insertion, deletion, update) and data redundancy. ##### First Normal Form (1NF) A relation is in 1NF if and only if all attribute values are atomic (indivisible) and single-valued. - No multi-valued attributes. - No composite attributes that are not broken down. - Each column contains values of the same type. ##### Second Normal Form (2NF) A relation is in 2NF if it is in 1NF and every non-key attribute is fully functionally dependent on the primary key. - No partial dependencies: A non-key attribute cannot be dependent on only a part of a composite primary key. - **Example:** `(StudentID, CourseID) -> Grade`, `StudentID -> StudentName`. Here, `StudentName` is partially dependent on `StudentID` (part of the composite key). To achieve 2NF, decompose into `(StudentID, CourseID, Grade)` and `(StudentID, StudentName)`. ##### Third Normal Form (3NF) A relation is in 3NF if it is in 2NF and there are no transitive dependencies. - No transitive dependencies: A non-key attribute cannot be dependent on another non-key attribute. If $A \to B$ and $B \to C$, then $A \to C$ is a transitive dependency. 3NF states that if $A \to B$ and $B \to C$, then either $B$ must be a superkey or $C$ must be a prime attribute (part of *some* candidate key). - **Example:** `StudentID -> DepartmentID`, `DepartmentID -> DepartmentName`. Here, `DepartmentName` is transitively dependent on `StudentID` via `DepartmentID`. To achieve 3NF, decompose into `(StudentID, DepartmentID)` and `(DepartmentID, DepartmentName)`. ### Armstrong’s Axioms & Canonical Cover #### Armstrong’s Axioms A set of inference rules used to infer all FDs that logically follow from a given set of FDs (called a cover). 1. **Reflexivity:** If $Y \subseteq X$, then $X \to Y$. (If $Y$ is a subset of $X$, then $X$ determines $Y$.) 2. **Augmentation:** If $X \to Y$, then $XZ \to YZ$ (for any $Z$). (Adding attributes to the determinant does not invalidate the dependency.) 3. **Transitivity:** If $X \to Y$ and $Y \to Z$, then $X \to Z$. (If $X$ determines $Y$ and $Y$ determines $Z$, then $X$ determines $Z$.) **Derived Rules:** - **Decomposition:** If $X \to YZ$, then $X \to Y$ and $X \to Z$. - **Union:** If $X \to Y$ and $X \to Z$, then $X \to YZ$. - **Pseudotransitivity:** If $X \to Y$ and $YW \to Z$, then $XW \to Z$. #### Closure of a Set of Attributes ($X^+$) The set of all attributes that are functionally determined by $X$, given a set of FDs $F$. - Used to find candidate keys and check dependencies. #### Canonical Cover ($F_c$) A minimal set of functional dependencies that is equivalent to a given set of FDs $F$. - No redundant FDs: If an FD $X \to A$ is in $F_c$, it cannot be inferred from $F_c - \{X \to A\}$. - No redundant attributes in the left-hand side (determinant) of any FD: If $XY \to A$ is in $F_c$, $X \to A$ cannot be inferred from $F_c$. - All FDs in $F_c$ have a single attribute on the right-hand side. **Algorithm to find Canonical Cover:** 1. **Split right-hand sides:** Replace $X \to YZ$ with $X \to Y$ and $X \to Z$. 2. **Remove redundant FDs:** For each $FD_i: X \to A$ in $F$, check if $A \in (X)^+_{F - \{FD_i\}}$. If yes, remove $FD_i$. 3. **Remove redundant attributes from left-hand sides:** For each $FD_i: X \to A$ in $F$ and for each attribute $B \in X$, check if $A \in (X - \{B\})^+_F$. If yes, replace $X \to A$ with $(X - \{B\}) \to A$. ### Decomposition and its Properties Decomposition is the process of breaking down a relation into smaller relations to eliminate redundancy and anomalies. #### Desirable Properties of Decomposition: 1. **Lossless-join Decomposition:** - Ensures that no spurious (extra) tuples are generated when the decomposed relations are joined back together. - If a relation $R$ is decomposed into $R_1$ and $R_2$, the decomposition is lossless if $R = R_1 \bowtie R_2$. - **Condition:** The decomposition of $R(A, B, C)$ into $R_1(A, B)$ and $R_2(A, C)$ is lossless-join if $A \to B$ or $A \to C$ holds (where $A$ is the common attribute). - More generally, for a decomposition of $R$ into $R_1, ..., R_n$, it's lossless if applying the natural join operation on all $R_i$ reconstructs the original relation $R$. 2. **Dependency-preserving Decomposition:** - Ensures that all functional dependencies of the original relation can be enforced by simply enforcing the functional dependencies in the decomposed relations. - No need to perform joins to check dependencies. - If $F$ is the set of FDs for $R$, and $F_i$ is the set of FDs for $R_i$, then the union of $F_i$ should be equivalent to $F$. More precisely, $(F_1 \cup F_2 \cup ... \cup F_n)^+ = F^+$. - Achieving both lossless-join and dependency-preserving decomposition simultaneously is often a goal for 3NF. ### Boyce–Codd Normal Form (BCNF) A relation is in BCNF if and only if for every non-trivial functional dependency $X \to Y$, $X$ is a superkey of the relation. - A superkey is any set of attributes that uniquely identifies a tuple. - BCNF is a stricter form of 3NF. If a relation is in BCNF, it is also in 3NF. - **Difference from 3NF:** 3NF allows $X \to Y$ if $Y$ is a prime attribute (part of *some* candidate key), even if $X$ is not a superkey. BCNF does not allow this exception. - **Problem with BCNF:** Sometimes, achieving BCNF might lead to a lossy decomposition or a decomposition that is not dependency-preserving. If dependency preservation is critical, 3NF might be preferred. ### Fourth and Fifth Normal Form ##### Fourth Normal Form (4NF) A relation is in 4NF if and only if it is in BCNF and contains no multi-valued dependencies (MVDs). - **Multi-valued Dependency (MVD):** $A \twoheadrightarrow B$ means that for a given value of $A$, there is a set of values for $B$, and this set is independent of the values of other attributes. - MVDs arise when two or more independent multi-valued facts about an entity are stored in the same relation. - **Decomposition for 4NF:** If $A \twoheadrightarrow B$ is a non-trivial MVD, decompose the relation into two relations: one containing $A$ and $B$, and another containing $A$ and the remaining attributes. ##### Fifth Normal Form (5NF) / Project-Join Normal Form (PJNF) A relation is in 5NF if and only if it is in 4NF and contains no join dependencies that are not implied by the candidate keys. - **Join Dependency (JD):** A relation $R$ satisfies a join dependency $JD(R_1, R_2, ..., R_n)$ if $R = R_1 \bowtie R_2 \bowtie ... \bowtie R_n$. - 5NF deals with cases where a relation can be decomposed into smaller relations and then rejoined without loss of information, even though no MVDs or FDs are violated. This usually happens with complex many-to-many relationships. - Most database designs do not go beyond 3NF or BCNF, as 4NF and 5NF are rarely necessary and can introduce too many tables, increasing query complexity.