### Unit 1: Introduction to DBMS #### DBMS Overview - **Definition:** A Database Management System (DBMS) is software that interacts with end-users, applications, and the database itself to capture and analyze data. - **Purpose:** To store, retrieve, and manage data efficiently and reliably. - **Key Characteristics:** Data independence, efficient data access, data integrity & security, concurrent access, backup & recovery. #### Data vs. Information vs. Knowledge - **Data:** Raw, unprocessed facts (e.g., `25`, `Anna`). - **Information:** Processed data that is meaningful (e.g., `Anna is 29 yrs old`). - **Knowledge:** Awareness or conscious understanding of information, enabling practical use (e.g., `Ben, Anna, and Kathy are young and prospective leaders`). #### Data/Information Processing - The process of converting raw data into meaningful information. $$\text{Data} \xrightarrow{\text{Processed}} \text{Information} \xrightarrow{\text{Processed}} \text{Knowledge}$$ #### File-Based Systems - **Definition:** Collection of data stored in files with specific formats, managed by a file system. - **Problems:** - No standards, data duplication, data dependence. - No way to generate ad hoc queries. - No provision for security, recovery, concurrency. #### File System vs. DBMS Comparison | Basis | File System | DBMS | |------------------|-------------------------------------------------|-------------------------------------------| | **Structure** | Software managing files in storage medium | Software managing the database | | **Data Redundancy** | Redundant data can be present | No redundant data | | **Backup & Recovery** | Doesn't provide backup/recovery | Provides backup/recovery | | **Query Processing** | No efficient query processing | Efficient query processing | | **Consistency** | Less data consistency | More data consistency (normalization) | | **Complexity** | Less complex | More complexity in handling | | **Security** | Less security | More security mechanisms | | **Cost** | Less expensive | Comparatively higher cost | | **Data Independence** | No data independence | Data independence exists | | **User Access** | Only one user at a time | Multiple users at a time | #### Database System - **Definition:** An organized collection of structured data, typically stored electronically, controlled by a DBMS. - **Components:** Data (database), Software (DBMS), Hardware, Users. - **Benefits:** Reduced redundancy, avoided inconsistency, transaction support, maintained integrity, enforced security, balanced conflicting requirements, enforced standards. - **Applications:** Railway reservation, banking, manufacturing, library management, online shopping, education, finance, telecommunication, social media, credit card transactions. #### Database Users - **Actors on the Scene:** - **Database Administrator (DBA):** Defines schema, controls access, security, backup/recovery, repairs damage. - **Naive / End Users:** Use applications without DBMS knowledge (e.g., railway ticket users). - **System Analyst:** Analyzes requirements for parametric end users. - **Sophisticated Users:** Familiar with database, use SQL queries (engineers, scientists). - **Database Designers:** Design database structure (tables, indexes, views, constraints). - **Application Programmer:** Writes code for application programs. - **Casual Users / Temporary Users:** Occasionally access the database. - **Workers Behind the Scene:** - DBMS system designers and implementers. - Tool developers. - Operators and maintenance personnel. #### Concepts of Schema, Instance, and Data Independence ##### Schema - **Definition:** The overall description or design of a database (blueprint/skeleton). - **Types:** 1. **Physical Schema:** Describes physical storage structure, how data is stored in blocks, low-level data structures. 2. **Logical Schema (Conceptual Schema):** Defines logical constraints, tables, views, and integrity constraints. 3. **View Schema (External Schema):** Describes the database at the view level for end-user interaction. ##### DBMS Instance - **Definition:** The data stored in the database at a particular moment in time. - The values of variables defined by the schema at a given moment. (e.g., a `student` table with 100 records now has an instance of 100 records). ##### Data Independence - **Definition:** Ability to change the schema at one level without affecting higher levels. - **Types:** 1. **Physical Data Independence:** Changes in physical storage (file organization, indexing) do not affect the conceptual schema or external views. - **Examples of changes:** New storage device, modifying file organization, switching data structures, changing access method, modifying indexes, compression techniques, changing database location. 2. **Logical Data Independence:** Changes in the conceptual schema (adding/removing attributes) do not affect external views or application programs. - **Examples of changes:** Add/Modify/Delete attribute, entity, or relationship; merging or breaking records. - **Importance:** Improves data quality, makes maintenance affordable, enforces standards, improves security, allows developers to focus on structure, enables performance improvements. ### Unit 1 Part B: Entity Relationship Model #### E-R Model - **Definition:** Entity-Relationship (ER) Model, also known as E-R Diagram, proposed by Peter Chen (1971) to describe relational databases. - **Purpose:** Systematically analyze data requirements to produce a well-designed database; define the conceptual view. - **Why use ER Diagrams?** - Describes entities, attributes, relationships. - Previews table connections and fields. - Translatable into relational tables for quick database building. - Blueprint for database implementation. - Helps designers understand the database. - **Advantages:** Simple, effective communication tool, easy conversion to other models. - **Disadvantages:** No industry standard for notation, hidden information (high-level view may hide details). #### Components of the ER Diagram 1. **Entities** 2. **Attributes** 3. **Relationships** #### Notations (Symbols used in E-R Model) - **Rectangle:** Entity types. - **Double Rectangle:** Weak Entity. - **Ellipse:** Attributes. - **Dashed Ellipse:** Derived attributes. - **Double Ellipses:** Multi-valued attributes. - **Diamond:** Relationship types. - **Double Diamond:** Identifying relationship (for weak entities). - **Lines:** Links attributes to entities and entities to relationships. - **Underlined Attribute:** Primary key. - **Dashed Underlined Attribute:** Discriminator / Weak primary key. #### Entity and Entity Set - **Entity:** A real-world distinguishable thing (person, place, object, event, concept). Represented by a rectangle. - **Strong Entity:** Can be uniquely identified by its own attributes (has a primary key). Represented by a single rectangle. - **Weak Entity:** Cannot be uniquely identified by its own attributes; depends on a strong entity (identifying owner) for existence and unique identification. Represented by a double rectangle. - **Discriminator (Partial Key):** Set of attributes that uniquely distinguishes entities within a weak entity set. Underlined with a dashed line. - The primary key of a weak entity set is formed by the primary key of its strong entity owner plus its discriminator. - The relationship between a weak entity type and a strong entity type is called an **identifying relationship**, represented by a double outlined diamond. It must be a total participation, one-to-many relationship from strong to weak entity. - **Entity Set:** A collection of similar types of entities (e.g., `Employee` set). #### Attributes - **Definition:** Property or characteristic of an entity. Represented by an ellipse/oval. - **Domain:** A specific set of values from which an attribute can take its values. - **Types of Attributes:** 1. **Simple Attribute:** Atomic values, cannot be divided further (e.g., `gender`). 2. **Composite Attribute:** Made up of more than one simple attribute, can be further divisible (e.g., `Name` into `f_name, m_name, l_name`). 3. **Single-Valued Attribute:** Takes only one value at a time (e.g., `dob`). 4. **Multi-Valued Attribute:** Can take more than one value at a time. Represented by a double ellipse (e.g., `Mob_no`, `Email_id`). 5. **Derived Attribute:** Can be derived from other attributes. Represented by a dashed ellipse (e.g., `age` from `dob`). 6. **Key Attribute:** Uniquely identifies an entity in an entity set. Underlined (e.g., `roll_no`). #### Relationship and Relationship Set - **Relationship:** An association among two or more entities. Represented by a diamond shape. - **Descriptive Attributes:** Attributes of a relationship (e.g., `Since` in `Works_at`). - **Relationship Set:** A set of relationships of similar type. #### Degree of a Relationship - The number of entity sets participating in a relationship set. - **Types of Relationships (based on degree):** 1. **Unary Relationship (Degree 1):** Relationship between entities of the same type (recursive relationship). (e.g., `Employee Supervises Employee`). 2. **Binary Relationship (Degree 2):** Relationship between exactly two entity sets (e.g., `Student Enrolls Course`). 3. **Ternary Relationship (Degree 3):** Relationship between exactly three entity sets (e.g., `Doctor Prescribes Medicine to Patient`). 4. **N-ary Relationship (Degree n):** Relationship between more than three entity sets. #### Mapping Cardinalities/Constraints - **Definition:** Defines how many entities in one entity set can be related to entities in another entity set in a relationship set. It defines the number of instances participating in a relationship. - **Types (for binary relationships):** 1. **One-to-One (1:1):** An entity in A associated with at most one entity in B, and vice-versa. 2. **One-to-Many (1:M):** An entity in A associated with zero or more entities of B, but an entity in B is associated with at most one entity in A. 3. **Many-to-One (M:1):** An entity in A associated with at most one entity in B, but an entity in B is associated with zero or more entities in A. 4. **Many-to-Many (M:N):** An entity in A associated with any number of entities in B, and vice-versa. #### Participation Constraints - **Definition:** Specifies the minimum participation of an entity set in a relationship. - **Types:** - **Total Participation:** Every entity in the entity set must participate in at least one relationship in the relationship set. Represented by a double line. - **Partial Participation:** Only some entities in the entity set participate in relationships. Represented by a single line. - **Cardinality Constraints:** Specify maximum participation (i.e., "Atmost"). - **Participation Constraints:** Specify minimum participation (i.e., "At least" or "Minimum"). #### Extended Features of E-R Model (EER Model) - Introduced to handle complex applications. - **Features:** More accurate design, reflects data properties/constraints precisely, includes all ER modeling concepts, diagrammatic technique, includes specialization and generalization. - **Concepts:** 1. **Generalization:** A bottom-up design approach where two or more lower-level entities with common attributes are combined to form a higher-level, more generalized entity (e.g., `Teacher` and `Student` generalized to `Person`). 2. **Specialization:** A top-down design approach where a higher-level entity is broken down into two or more lower-level entities based on distinct attributes (e.g., `Account` specialized into `Current Account` and `Savings Account`). 3. **Aggregation:** A process where a relationship between two entities is treated as a single entity, or a higher-level entity set, to participate in other relationships. This is an abstraction used to express relationships among other relationships. ### Unit 2: Relational Model & SQL Basics #### Relational Model (RM) - **Definition:** Represents the database as a collection of relations (tables). - **Proposed by:** E.F. Codd. - **Purpose:** Convert the conceptual ER model into a relational model for implementation using RDBMS languages (e.g., Oracle SQL, MySQL). #### Relational Model Terminology - **Relation (Table):** Organized into rows and columns, stores data. - **Tuple (Row):** A single record in a table. - **Attribute (Column):** A specific piece of information for each record. - **Domain:** Set of possible valid values for an attribute. - **Relation Schema:** Describes a relation with its attributes (e.g., `STUDENT (ROLL_NO, NAME, ADDRESS, PHONE, AGE)`). - **Relation Instance:** The set of tuples in a relation at a particular moment. - **Degree (Arity):** Number of attributes in the relation. - **Cardinality:** Number of tuples (rows) in the relation. - **NULL Values:** Represents unknown or inapplicable values. Different from zero. - **Interpretations:** Value unknown, attribute not applicable. #### Constraints in DBMS - **Definition:** Rules enforced on data columns to ensure accuracy and reliability. - **Types of Constraints:** 1. **Domain Constraints:** Restrict values a column can hold. Ensures data type match and other constraints (NOT NULL, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, DEFAULT) are fulfilled. 2. **Key Constraints:** Ensure uniqueness of tuples. - **Superkey:** A set of attributes that uniquely identifies tuples. - **Candidate Key:** A minimal superkey (no redundant attributes). - Properties: Unique values, multiple attributes possible, must not contain NULL values. - **Primary Key:** A chosen candidate key that uniquely identifies each row. - Rules: Unique, not NULL, cannot be modified if referenced by foreign key. Only one primary key per table. 3. **Integrity Constraints:** Maintain data quality and guard against accidental damage. - **Entity Integrity Constraint:** Primary key values cannot be NULL. - **Referential Integrity Constraint:** Ensures consistency between relations via foreign keys. A foreign key in R1 must either match a primary key in R2 or be NULL. #### Relational Algebra - **Definition:** A procedural query language that takes relations as input and produces relations as output. - **Fundamental Operators:** 1. **Selection ($\sigma$):** Filters rows (tuples) based on a condition. (e.g., `$\sigma_{C>3}$(R)`). 2. **Projection ($\pi$):** Selects columns (attributes). Removes duplicate tuples by default. (e.g., `$\pi_{B,C}$(R)`). 3. **Union ($\cup$):** Combines rows from two relations. Relations must have the same number and data types of columns. 4. **Set Difference (-):** Returns rows present in one relation but not in another. Relations must have the same attributes. 5. **Set Intersection ($\cap$):** Returns rows common to two relations. Relations must have the same attributes. 6. **Rename ($\rho$):** Gives a temporary new name to a relation or its attributes. (e.g., `$\rho_{a/b}$(R)` renames attribute `b` to `a`). 7. **Cartesian Product (X):** Combines every row of one table with every row of another. If A has 'n' tuples and B has 'm' tuples, A X B has 'n*m' tuples. - **Derived Operators:** 1. **Natural Join ($\bowtie$):** Combines tuples from two or more relations based on equal common attributes. Duplicate common attributes are removed. 2. **Conditional Join ($\Join$):** Similar to natural join but allows arbitrary conditions (>,