Monday, August 27, 2007

Codd Rules

Following is a list of Codd's 12 rules, including his original name for each rule and a simplified description. I also have included a note where certain rules are problematic to implement. Don't worry if some of these items are confusing to you, as we move further through this newsletter series we will fill in the details.

Rule 1: The Information Rule
All data should be presented to the user in table form. Last week's newsletter already discussed the basics of this rule.

Rule 2: Guaranteed Access Rule
All data should be accessible without ambiguity. This can be accomplished through a combination of the table name, primary key, and column name.

Rule 3: Systematic Treatment of Null Values
A field should be allowed to remain empty. This involves the support of a null value, which is distinct from an empty string or a number with a value of zero. Of course, this can't apply to primary keys. In addition, most database implementations support the concept of a nun- null field constraint that prevents null values in a specific table column.

Rule 4: Dynamic On-Line Catalog Based on the Relational Model A relational database must provide access to its structure through the same tools that are used to access the data. This is usually accomplished by storing the structure definition within special system tables.

Rule 5: Comprehensive Data Sublanguage Rule
The database must support at least one clearly defined language that includes functionality for data definition, data manipulation, data integrity, and database transaction control. All commercial relational databases use forms of the standard SQL (Structured Query Language) as their supported comprehensive language.

Rule 6: View Updating Rule
Data can be presented to the user in different logical combinations, called views. Each view should support the same full range of data manipulation that direct-access to a table has available. In practice, providing update and delete access to logical views is difficult and is not fully supported by any current database.

Rule 7: High-level Insert, Update, and Delete
Data can be retrieved from a relational database in sets constructed of data from multiple rows and/or multiple tables. This rule states that insert, update, and delete operations should be supported for any retrievable set rather than just for a single row in a single table.

Rule 8: Physical Data Independence
The user is isolated from the physical method of storing and retrieving information from the database. Changes can be made to the underlying architecture ( hardware, disk storage methods ) without affecting how the user accesses it.

Rule 9: Logical Data Independence
How a user views data should not change when the logical structure (tables structure) of the database changes. This rule is particularly difficult to satisfy. Most databases rely on strong ties between the user view of the data and the actual structure of the underlying tables.

Rule 10: Integrity Independence
The database language (like SQL) should support constraints on user input that maintain database integrity. This rule is not fully implemented by most major vendors. At a minimum, all databases do preserve two constraints through SQL.

No component of a primary key can have a null value. (see rule 3)
If a foreign key is defined in one table, any value in it must exist as a primary key in another table.
Rule 11: Distribution Independence
A user should be totally unaware of whether or not the database is distributed (whether parts of the database exist in multiple locations). A variety of reasons make this rule difficult to implement; I will spend time addressing these reasons when we discuss distributed databases.

Rule 12: Nonsubversion Rule
There should be no way to modify the database structure other than through the multiple row database language (like SQL). Most databases today support administrative tools that allow some direct manipulation of the datastructure.

Saturday, July 7, 2007

SQL Structured Query Language or Standard Query Language

SQL is the only language through which you can access database.

Advantages of Using SQL

1) Standard Query langauge used by all DBMS.
2) English like Language.
3) Non-procedural language.
4) Batch processing, processes multiple records rather than one at a time.

Sub languages of SQL include the following.
DDL Data Defination Language
Create,Alter,Drop,Rename and Truncate
DML Data Manipulation Language
Select( although belongs to query language),Insert,Update,Delete
DCL Data Control Language
Grant,Revoke
TCL Transaction Control Language
Commit,Rollback,Savepoint and Rollback to Save point
DDL Data Defination Language
Is used for defining all the objects in the database. This contains three commands Create, Alter and Drop.

Create is used for create all the objects in the database. The objects that can be created in the database are Table, views, Indexes, sequences, synonyms.

Table is the primary object that needs to be created prior to creating any other objects in the database. Table is the object, which is used for storing data in the database. All other objects i.e views, indexes, sequences and synonyms are dependent on tables. All the database objects are logical storage objects with are physically mapped to the datafiles.

Tables are storage objects for Data
Views are used for security and are used for storing complex queries
Indexes are used for improving performance of the queries
Synonyms are used for creating shortcuts to database objects
Sequences are used for generating automatic sequencing

Evolution of Databases

Database in simple terms is a collection of related information.Prior to the invention of computing systems ,organizational data or information was stored in manual files. These manual or paper files had data in an organized sequence. Each file consisted of related information with or without being indexed.

Disadvantage of Storing data in paper or manual files :
1) These paper files needed huge storage area.
2) paper files are indexed on particular characteristic, where as in case of software we could index it on many permutations and combinations.
3) Since these are manual files same information might need to be reflected in many files. Any modifications to these require editing of all the files manually ,which is a huge task. In case of Software you could write the rules once and that will be reflected automatically everywhere.
4) Searching for related information also requires manual searching in all the paper files.

Database Management Systems.
Database Management System is a piece of software to maintain data in an organized manner and with user friendly interface. The user interface could be software subprograms or functions or else it could be a GUI interface.

Prior to the advent of DBMS software high level languages like C were used for developing database management software. languages like c were designed for scientific purpose but not for Database management. In order to design a software using c for Database management system it takes millons of lines of code. Cobol was designed for Database specific applications.

Types of Databases

1) DBMS (Database Management Systems)
2) RDBMS(Relational Database Management Systems)
3) ORDBMS (Object Relational Database Management Systems)

History of Computing

1)First Generation Languages
First Generation language was a binary language and is a machine-level programming language. The instruction set consists of 1s and 0s.

Disadvantages
1) Difficult to write/edit errors in the program.
2) The code is not protable as it is machine specific.
3) Incorrect instruction typed by programmer may damage equipment and data.

Advantages
1) Its faster and efficient as it is executed directly by the CPU
2) Does not need any compiler or interpreter.

2)Second Generation Languages
In order to overcome some of the disadvantages of the first generation langauge, Assembly level language was developed. Assembly level language was developed. Assembly level language consists of mnemonics , small set of instruction set. These instruction set consists of key words which are human readable but must be converted to machine-readable form in order to execute by the computer.

Disadvantages
1) This still requires a Assembler, a small piece of software to convert mnemonics to binary language.
2) The code is not protable as it is machine specific.

Advantages over 1GL:
1) Human readable for better understanding.

3)Third Generation Languages
3GL or High level languages are human readable instructions. These are also called as structured programming languages. Its easy to write programs using these languages ,however you still need software called compilers or interpreters to convert the code into machine specific instructions. As it is structured programming language instruction set is well organized and human readable.3GLs are character based languages.

Disadvantages:
1) It takes aditional steps to compile from source code to the executable code.
2) Ever single change needs the program/piece of code to be recompiled.
3) Developing a Graphical User Interface needs lots of programming

Ex : C,COBOL,FORTRAM,PASCAL etc

4)Fourth Generation Languages
4GLS includes Packages and GUI ( Graphical User Interfaces). In order to over come the limitations of 3GLs they have developed packaged softwares and GUI's.
Packages helps us the reusuability of the code. GDI ( Graphical Developer Interfaces) allows us the functionally for reusuability. GUI/GDIS will help us to develop software programs at ease and faster pace. In order tp develop a data entry screen you can use the components like buttons,windows,checkboxes , which are already been tested and ported to us. It has advantage of both component/code resuability.

example : Visual Basic,Visual C++, VJ++, Developer 2000,Designer 2000

5)Fifth Generation Languages/Internet Languages
5GL internet languages helps us in deploying applications over web. These languages include HTML,XML,VB Script,Java Script,ASP,JSP,PHP,PSP etc