Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 1 1 Database Systems: Design, Implementation, and Management, 14e Module 6: Normalization of Database Tables Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part..
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 2 Chapter Objectives By the end of this chapter, you should be able to: 1. Explain normalization and its role in the database design process 2. Identify each of the normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, and 5 NF 3. Explain how normal forms can be transformed from lower normal forms to higher normal forms 4. Apply normalization rules to evaluate and correct table structures 5. Identify situations that require denormalization to generate information efficiently 6. Use a data-modeling checklist to check that the ERD meets a set of minimum requirements.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 3 Database Tables and Normalization (1 of 2) • Normalization is a process for evaluating and correcting table structures to minimize data redundancies − It reduces the likelihood of data anomalies − Assigns attributes to tables based on determination • Normalization works through a series of stages called normal forms and the first three are described as follows: − First normal form (1NF) − Second normal form (2NF) − Third normal form (3NF).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 4 Database Tables and Normalization (2 of 2) • From a structural point of view, higher normal forms are better than lower normal forms − For most purposes in business database design, 3NF is as high as you need to go in the normalization process • Denormalization produces a lower normal form − The result of denormalization is increased performance but greater data redundancy.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 5 The Need for Normalization • Database designers commonly use normalization in the following two situations: − When designing a new database structure − To analyze the relationship among the attributes within each entity and determine if the structure can be improved through normalization • The main goal of normalization is to eliminate data anomalies by eliminating unnecessary or unwanted data redundancies • Normalization uses the concept of functional dependencies to identify which attribute determines other attributes.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 6 The Normalization Process (1 of 4) • The objective of normalization is to ensure that each table conforms to the concept of well-formed relations and has the following characteristics: − Each table represents a single subject − Each row/column intersection contains only one value and not a group of values − No data item will be unnecessarily stored in more than one table − All nonprime attributes in a table are dependent on the primary key − Each table has no insertion, update, or deletion anomalies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 7 The Normalization Process (2 of 4) • An objective is to ensure that all tables are in at least in 3NF − Higher forms, such as 5NF and DKNF are not likely to be encountered in business environment • The normalization process works one relation at a time, identifying the dependencies of a relation (table).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 8 The Normalization Process (3 of 4) Table 6.2 Normal Form Characteristic Section First normal form (1NF) Table format, no repeating groups, and PK identified 6-3a Second normal form (2NF) 1NF and no partial dependencies 6-3b Third normal form (3NF) 2NF and no transitive dependencies 6-3c Boyce-Codd normal form (BCNF) 3NF and every determinant is a candidate key (special case of 3NF) 6-6a Fourth normal form (4NF) BCNF and no independent multivalued dependencies 6-6b Fifth normal form (5NF or PJNF) 4NF and cannot have lossless decomposition into smaller tables 6-6c Normal Forms.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 9 The Normalization Process (4 of 4) • A partial dependency exists when there is a functional dependence in which the determinant is only part of the primary key − The assumption is that there is only one candidate key − Partial dependencies tend to be straightforward and easy to identify • A transitive dependency exists when the attribute is dependent on another attribute that is not part of the primary key − Transitive dependencies are more difficult to identify among a set of data − They occur only when a functional dependence exists among non-prime attributes.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 10 Conversion to First Normal Form (1NF) (1 of 3) • A repeating group derives its name from the fact that a group of multiple entries of the same type can exist for any single key attribute occurrence − Normalizing the table structure will reduce data redundancies • Normalization starts with the following three-step procedure: − Eliminate the repeating groups − Identify the primary key − Identify all dependencies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 11 Conversion to First Normal Form (1NF) (2 of 3) • A dependency diagram depicts all dependencies found within given table structure − It helps to get an overview of all relationships among table’s attributes − Their use makes it less likely that an important dependency will be overlooked • The term 1NF describes the tabular format in which the following occur: − All key attributes are defined − There are no repeating groups in the table − All attributes are dependent on the primary key • All relational tables satisfy 1NF requirements • Some tables contain partial dependencies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 12 Conversion to First Normal Form (1NF) (3 of 3) Figure 6.3 First Normal Form (1NF) Dependency Diagram.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 13 Conversion to Second Normal Form (2NF) (1 of 2) • Conversion to 2NF occurs only when the 1NF has a composite primary key − If the 1NF has a single-attribute primary key, then the table is automatically in 2NF • The 1NF-to-2NF conversion is simple, you take the following steps: − Step 1: Make new tables to eliminate partial dependencies − Step 2: Reassign corresponding dependent attributes • A table is in 2NF under the following circumstances : − When it is in 1NF − When it includes no partial dependencies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 14 Figure 6.4 Second Normal Form (2NF) Conversion Results Conversion to Second Normal Form (2NF) (2 of 2).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 15 Conversion to Third Normal Form (3NF) (1 of 2) • The data anomalies created by the database organization shown in Figure 6.4 are easily eliminated by completing the following two steps: − Step 1: Make new tables to eliminate transitive dependencies − Step 2: Reassign corresponding dependent attributes • A table is in 3NF under the following circumstances: − When it is in 2NF − When it contains no transitive dependencies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 16 Conversion to Third Normal Form (3NF) (2 of 2) Figure 6.5 Third Normal Form (3NF) Conversion Results.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 17 Improving the Design (1 of 2) • The following are various types of issues you need to address to produce a good normalized set of table: − Minimize data entry errors − Evaluate naming conventions − Refine attribute atomicity ▪ An atomic attribute is an attribute that cannot be further subdivided ▪ Atomicity is a characteristic an attribute that cannot be divided into smaller units − Identify new attributes − Identify new relationships.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 18 Improving the Design (2 of 2) • The following are various types of issues you need to address to produce a good normalized set of table (continued): − Refine primary keys as required for data granularity ▪ Granularity refers to the level of detail represented by the values stored in a table’s row − Maintain historical accuracy − Evaluate using derived attributes.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 19 Surrogate Key Considerations • Surrogate keys are used by designers when the primary key is considered to be unsuitable • A surrogate key is a system-defined attribute generally created and managed via the DBMS • Usually it is a numeric value which is automatically incremented for each new row.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 20 Higher-Level Normal Forms • Tables in 3NF will perform suitably for business transactional databases • Higher normal forms are sometimes useful • This section covers the following higher-level normal forms: − Boyce-Codd normal form (BCNF) − Fourth normal form (4NF) − Fifth normal form (5NF).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 21 The Boyce-Codd Normal Form (1 of 4) • A table is in BCNF when it is in 3NF and every determinant in the table is a candidate key − Recall from Chapter 3 that a candidate key has the same characteristics as primary key but was not chosen to be the primary key • When a table contains only one candidate key, the 3NF and the BCNF are equivalent • BCNF can be violated only when the table contains more than one candidate key • BCNF is considered to be a special case of 3NF.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 22 The Boyce-Codd Normal Form (2 of 4) • Figure 6.7 Tables with Multiple Candidate Keys.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 23 The Boyce-Codd Normal Form (2 of 4) • Figure 6.8 A Table That Is in 3NF but not in BCNF.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 24 The Boyce-Codd Normal Form (3 of 4) Figure 6.9 Decomposition to BCNF.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 25 The Boyce-Codd Normal Form (4 of 4) Table 6.5 Stu_ID Staff_ID Class_code Enroll_grade 125 25 21334 A 125 20 32456 C 135 20 28458 B 144 25 27563 C 144 20 32456 B Sample Data for a BCNF Conversion.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 26 Fourth Normal Form (4NF) (1 of 2) • The discussion of 4NF is academic if you make sure that your tables conform to the following two rules: − All attributes must be dependent on the primary key, but they must be independent of each other − No row may contain two or more multivalued facts about an entity • A table is in 4NF under the following circumstances: − When it is in 3NF − When it has no multivalued dependencies.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 27 Figure 6.12 A Set of Tables in 4NF Fourth Normal Form (4NF) (2 of 2).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 28 Fifth Normal Form (5NF) (1 of 2) • Fifth normal form, also known as project join normal form (PJNF), addresses the issue in which a table cannot be decomposed anymore without losing data or creating incorrect information • Lossless decomposition occurs when the decomposed tables are joined and the original table is recreated • Higher normal forms can provide value, however, the value is limited by the additional processing necessary to work with the data • The lower normal forms are generally highly desirable and should always be considered during the database design process.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 29 Fifth Normal Form (5NF) (2 of 2) Figure 6.13 Fifth Normal Form (5NF).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 30 Normalization and Database Design (1 of 4) • Normalization should be part of the design process − Proposed entities must meet the required normal form before table structures are created • Principles and normalization procedures to be understood to redesign and modify databases include the following: − ERD is created through an iterative process − Normalization focuses on the characteristics of specific entities.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 31 Normalization and Database Design (2 of 4) • Figure 6.14 Initial Contracting Company ERD.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 32 • Figure 6.15 Modified Contracting Company ERD Normalization and Database Design (2 of 4).
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 33 Normalization and Database Design (3 of 4) • Figure 6.16 Incorrect M:N Relationship Representation.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 34 Normalization and Database Design (3 of 4) • Figure 6.17 Final Contracting Company ERD.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 35 Normalization and Database Design (4 of 4) Figure 6.18 The Implemented Database.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 36 Denormalization (1 of 2) • Important database design goals include the following: − Creation of normalized relations − Considering processing requirements and speed • A problem with normalization is that as tables are decomposed to conform to normalization requirements, the number of database tables expands • Joining a larger number of tables takes additional input/output (I/O) operations and processing logic, thereby reducing system speed.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 37 Denormalization (2 of 2) • Defects in unnormalized tables include the following: − Data updates are less efficient because tables are larger − Indexing is more cumbersome − There are no simple strategies for creating virtual tables known as views • Unnormalized database tables often lead to various data redundancy disasters in production databases.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 38 Data-Modeling Checklist (1 of 5) • Business rules − Properly document and verify all business rules with the end users − Ensure that all business rules are written precisely, clearly, and simply ▪ The business rules must help identify entities, attributes, relationships, and constraints − Identify the source of all business rules, and ensure that each business rule is justified, dated, and signed off by an approving authority.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 39 Data-Modeling Checklist (2 of 5) • Data modeling − Naming conventions: all names should be limited in length (database-dependent size) − Entity names: ▪ Should be nouns that are familiar to business and should be short and meaningful ▪ Should document abbreviations, synonyms, and aliases for each entity ▪ Should be unique within the model ▪ Composite entities may include a combination of abbreviated names of the entities linked through the composite entity.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 40 Data-Modeling Checklist (3 of 5) • Data modeling (continued) − Entities: ▪ Each entity should represent a single subject ▪ Each entity should represent a set of distinguishable entity instances ▪ All entities should be in 3NF or higher Any entities below 3NF should be justified ▪ The granularity of the entity instance should be clearly defined ▪ PK should be clearly defined and support the selected data granularity.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 41 Data-Modeling Checklist (4 of 5) • Data modeling (continued) − Attributes: ▪ Should be simple and single-valued (atomic data) ▪ Should document default values, constraints, synonyms, and aliases ▪ Derived attributes should be clearly identified and include source(s) ▪ Should not be redundant unless this is required for transaction accuracy, performance, or maintaining a history ▪ Nonkey attributes must be fully dependent on the PK attribute.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 42 Data-Modeling Checklist (5 of 5) • Data modeling (continued) − Relationships: ▪ Should clearly identify relationship participants ▪ Should clearly define participation, connectivity, and document cardinality − ER model: ▪ Should be validated against expected processes: inserts, updates, and deletions ▪ Should evaluate where, when, and how to maintain a history ▪ Should not contain redundant relationships except as required (see attributes) ▪ Should minimize data redundancy to ensure single-place updates.
Coronel, Carlos and Morris, Steven, Database Systems: Design, Implementation, and Management, 14 Edition. © 2023 Cengage. All Rights Reserved. May not be scanned, copied or duplicated, or posted to a publicly accessible website, in whole or in part. 43 Summary Now that the lesson has ended, you should be able to: 1. Explain normalization and its role in the database design process 2. Identify each of the normal forms: 1NF, 2NF, 3NF, BCNF, 4NF, and 5 NF 3. Explain how normal forms can be transformed from lower normal forms to higher normal forms 4. Apply normalization rules to evaluate and correct table structures 5. Identify situations that require denormalization to generate information efficiently 6. Use a data-modeling checklist to check that the ERD meets a set of minimum requirements.