Guideline, Oracle SQL Date: Jul-2002 Copyright © 2000 Jerry Mattsson / Timehole.com All Rights Reserved. CG_jm1.php
Contents:
SQL is a set oriented language and is used to define and change data in a relational database.
The SQL functionality together with a well designed database can eliminate a lot of
application and client side code.
In a perfect world it could probably possible to put more than 90% of the code on the database side
in SQL. You could do data verification, validation, data formatting and data manipulation that also
could go into the application side code. But if put in the database it will minimize the amount
of code you write in the application. Data will be consistently verified and you could easily use
different end-user interfaces to handle the code. If you do, what is left for the application code
is normally only user-interaction activities and some formatting and error handling.
The drawback could be that you must know how to write efficient, complex and long SQL functions.
And you must have to have a well designed and normalized data structure.
An often seen sign of a bad database design is the absence of more complex SQL statement and the
view that it is hard to write the SQL-functions that does what you want it to do.
The opposite is also true, a well normalized database makes it easier to write SQL-functions that does
a lot of the work that you otherwise would put in your application.
When the design quality and normalization level of the database is questioned, the argument is often that
te design is done for performance reasons, witch almost never are true.
A well normalized database schema in an Oracle database does not normally degrade performance,
the contrary is more often true.
The most common error in the design phase is that the performance reasons are introduced to early.
This in turn is a little bit strange when we often in the next breath defend our bad design with the
argument that we have enough computer power to do it this way anyway.
The logical design of the database should be made in a very idealistic way. If you in implementation
phase have to compromise or store data in a non-normalized way that should be a very well thought through
decision, with both flexibility and performance in mind.
In addition to this constraint definition is seldom used to the extent that it should. If used,
this also means that application code is eliminated. The total number of access to the database decreases
and performance increases. Further to this the database normally makes checks faster and more
concise than if you put this in the application code, especially primary and foreign key relationship checks.
All this is positive and time saving results if you do a good database design and make use the SQL language
in the way that it should.
If we then start to use the procedural capabilities of the database then an extra level of functionality is
introduced with much of the same advantages as well. But this is not covered in this document.
When we create applications today, we have the possibility to use many different environments and tools
to develop, test and execute our code. Actual access to an the database could be hidden from us or done
from different interfaces and programs, but the access will always be made through the SQL-language.
Different tools is often optimized to generate one or a couple of different types of applications, reports,
data-loading or manipulative programs. It is therefore important to use the right type of tool
( forms-generators, report-generators, data loaders, 3-GL etc. ) for the specific problem we are trying to
solve. If you are developing large applications, or systems with high performance expectations, it will
be important to utilize the power of tool in the best possible way. This is to efficiently use the power
of the computers that will handle the database request, both on the client and the server side.
Even if it today sometimes feels like we have access to infinite CPU and memory resources, it is
easy to try to use more than you have. Using 3-GL-type interfaces ( like C, C++, Fortran, Java etc.)
gives us more control and more fine tuning options. It also poses more responsibilities on us,
to code efficient and to create good and trouble free multi-user programs.
<Up to Contents>
Names of all database objects, tables, views and synonyms should be consistently constructed and named
so it will be possible to "guess" or deduce the name when writing code. It's not really
important how this is set up as long as you use it and makes it easy to find and understand what
kind of objects you are dealing with. Simple is good.
Hopefully you are using tools to design and create your database schema. These tools has their own naming
conventions and it should be wise to accept the tools standard.
If not, define a standard to all objects types that you will use. Make your own list, something like this:
| Type
| Format / Sample
| Description
|
| Tables
| Adm_Personnel Persons
|
Short unique descriptive name. Uniqueness may be enforced with
application system prefix.
|
| Views
| Adm_Persons
| Same as Tables
|
| Synonyms
| Adm_Persons
| Same as Tables
|
| Table Aliases
| PER
|
Three to five character of initial table name.
|
| Columns
| Ename
|
Short descriptive name. Preferable without special characters. Used as required.
|
| Column Aliases
| "Family Name"
|
If special characters are used this should be a double quoted String. Used as required.
|
| Column Prefix
| PER
|
Use table Alias, Used as required.
|
| Constraints
|
|
Composed from the tablename + constraint type + column short name
+ type of constraint or number
|
| - Primary key
| Adm_Per_PK
|
|
| - Foreign key
| Adm_Per_FK_Dept_Dname
|
|
| - Unique key
| Adm_Per_UK_Ename
|
|
| - Check
| Adm_Per_CHK_Ename_UC
| Check ename is upper case
|
| - Check
| Adm_Per_CHK_Ename1
|
|
| Index
| Adm_Per_IX_Dname
|
Appl. + Table alias or index type + IX + Col names
|
<Up to Contents>
Public objects are references to tables, synonyms, packages, procedures, functions and views globally
and commonly accessible for users or programs that the user run within a database.
Private objects are references within a schema to objects that are hidden from the other users and
used only by programs or views that access the schema directly.
This can be used to hide the actual physical objects from the user or application and give you the
possibility to exchange the actual underlying physical object without changing the application.
Create public synonyms for all object that should be commonly available, for reporting or ad-hoq access.
The name of the synonym could be the same as the actual object but the importance here is that the synonyms
are the easy-to-remember object names.
To create public synonyms requires the specific right to do this. This right is by default only are
owned by database administrators ( DBA's ), but this right could be given to the application owner
accounts, consult your DBA for advice.
All public database objects must by definition be created with a database unique name.
Creating public synonyms will enforce object name uniqueness in a database.
You create your public synonyms with the create synonym statement:
CREATE public synonym EMP for SCOTT.EMP;
All users and programs can there after use the synonym EMP to refer to the table SCOTT.EMP.
Recreating the synonym after that for another object will redirect all accesses to that
"new" object.
CREATE public synonym EMP for SCOTT.EMP2;
This is not the same thing as to give everybody access to the (public) object.
The rights to objects are defined separately and the access right should be granted to the users
directly or via roles. The access structure and the object rights should be defined and set up in
the development phase in the development system in the same way as it will be in the production system.
This will let you define and test your security structure throughout the development phase.
There might also be a number of private synonyms pointing on other objects with the name of EMP.
Private synonyms override public and could be used in the development schema for different users pointing
to private temporary objects during development.
<Up to Contents>
Name constraints consistently. This gives you the possibility to track the source of the problem to
the underlying objects and to understand what kind of rule that was violated in a program.
Whenever there is an error in the database, caused by violating a constraint, the database reports this
with a message containing the name of the constraint. The constraint name should be assembled from the
table short name and a shortening of the constraint type, plus the column that the constraint is defined for.
Some constraint types may be declared multiple times for columns and should therefore be numbered or
otherwise further defined.
CREATE TABLE EMP (
EMPNO NUMBER(4,0) Constraint EMP_CHK_EMPNO_NN check(EMPNO NOT NULL),
ENAME VARCHAR(10) Constraint EMP_CHK_ENAME_NN check(ENAME NOT NULL),
EMP_CHK_ENAME_UC check(ENAME=upper(ENAME)),
JOB VARCHAR(9) Constraint EMP_CHK_JOB_NN check(JOB NOT NULL),
EMP_CHK_JOB_UC check(JOB = upper(JOB)),
MGR NUMBER(4,0) REFERENCES EMP (EMPNO)
Constraint EMP_FK_EMP,
HIREDATE DATE Constraint EMP_CHK_HIRED_NN check(HIREDATE NOT NULL),
SAL NUMBER(7,2) Constraint EMP_CHK_SAL_Range check(SAL > 0),
COMM NUMBER(7,2) Constraint EMP_CHK_COMM_Range check(COMM > 0),
DEPTNO NUMBER(2,0) Constraint EMP_CHK_DEPTNO_NN check(DEPTNO NOT NULL),
FOREIGN KEY (DEPTNO) REFERENCES DEPT (DEPTNO)
Constraint EMP_FK_DEPT,
PRIMARY KEY (EMPNO) Constraint EMP_PK_EMPNO);
<Up to Contents>
2.2.1 General
Programs (source, files, modules) should be named according to your general program naming standard.
CASE-tools usually use a naming standard and could generate names like APP0123 for a program module.
This is a good practice to adopt for a general programming naming standard. The application system
short name would be used as a prefix and added to that, a running numbering of the program or a short
description.
Use short program/module names with 8 character or less. If this program is a file containing SQL statements,
intended to be run as a ( e.g. maintenance ) program, this should be named APP0123.SQL, where APP is a
short identification for the application system, and 0123 the number of the program. All program file names
should be short, preferably at the most 8 characters with a 3 letter suffix, to allow portability to
simple file systems.
Files used to create or manipulate database objects can be included in or run from a number of different tools.
The naming of these files should be in conformance to the tools convention. If this is not appropriate or
if the file is intended for general use, the following file name schema could be used:
| Type
| Name
| Suffix
| Sample
|
| General SQL command file
| AppShortName + Number +
| sql
| PER0012.sql
|
| Table definitions
| AppShortName + Number +
| tab
| PER0013.tab
|
| Constraints definitions
| AppShortName + Number +
| con
| PER0014.con
|
| Grant commands
| AppShortName + Number +
| grt
| PER0015.grt
|
<Up to Contents>
All source code should at least contain information about who wrote the code,
what it was intended to do, and when the code was created. Header information could look like:
REM Title: Administration of user grants.
REM File: Adm0033.SQL
REM Version: 1.0
REM
REM Description: This program is intended to be used for......
REM Author: Jerry Mattsson / Timehole.com.
REM
REM Create date: 5-apr-99
REM Modified date: 6-may-00
REM Modification purpose: Added functions in .....
REM Modified by: JM
Scripts that are intended to be used from or within a specific interface or program, should be documented
in accordance to that program standard. Like all other source code it should be version controlled and well
documented.
If the code is generated from a code generator, the generator can include comments, and history information
in the code. Enter the appropriate information in the tools Description, Notes, Modification History, etc.
This will then hopefully be used in the documentation part of the generated code.
Remember to put your modification information in the tool, and to reverse engineer the code when you are
modifying the code manually.
<Up to Contents>
The SQL language can be divided into:
DDL-statements - data definition language.
DML-statements - data manipulation language.
Control-statements - controls the environment or session.
DML-statements are used to process or manipulate data in the database. This type of operations should
be used in your application programs. The basic functions for this is:
SELECT - Retrieves data from the database
INSERT - Create data in the database
UPDATE - Change data in the database
DELETE - Remove data from the database
DDL-statements are used to create objects in the database, tables, indexes, procedures etc.
This type of operations should NOT be used in your application programs.
DDL-commands are:
CREATE - Create tables, views, procedures, tablespaces.....
ALTER - Changes objects like tables, database ....
DROP - Removes tables, views, procedures.....
GRANT - Gives rights to roles or users.
REVOKE - Removes rights from users or roles.
Remember that SQL-scripts containing DDL-statements to create or modify database objects, including
tables and views should be regarded as source code and should be treated as such.
Keep the valid DDL-code up-to-date and deliverable when new application code is produced.
<Up to Contents>
A transaction is a series of data manipulations that we want to regard and handle as one un dividable unit.
When we have done our changes to one or more tables or set of data, we issue a COMMIT, to finalize and store
the changes in the database.
When the COMMIT is accepted by the database, it is confirmed, with a success return code or a message
from the database. The transaction is only completed in it's whole or not at all.
Changes in the database can not be viewed by other users, until the transaction is committed.
The transaction starts at the last commit or full rollback.
Some tools starts the transaction as the first action after the log-on event, some explicitly set the first
transaction, after some initial work, or on user request. Check the tools behavior to be sure.
We always commit the total number of changes to the database, and if one part in this fails,
the transaction fails. If the first update in a transaction is possible, but the second one fails,
we can undo the changes made, with the ROLLBACK statement. This will back all the changes made since the
transaction began and reset data to the state it was when we started our transaction.
COMMIT; End Point for last transaction, start new transaction.
DELETE FROM Emp Delete a row in one table.
WHERE Empno = 7139;
DELETE FROM Dept; Delete of all rows by mistake
ROLLBACK; Undo to last commit.
This type of undo function can also be done to return to a earlier defined "bookmark", called savepoints.
This could be set somewhere after the start of the transaction and can then be used by us as a return point,
if things fails.
UPDATE Emp SET Sal = Sal + 10; Update before savepoint
SAVEPOINT BeforeDelete; Set a label to rollback to
DELETE FROM Dept Delete some row
WHERE Deptno = 10;
DELETE FROM Emp Delete some other row
WHERE Deptno = 10;
ROLLBACK To BeforeDelete; Regret the two previous deletes,
but keeps the update.
<Up to Contents>
DML-statements is transaction dependent requires that the user or program issues a commit or rollback
for the transaction, whenever the operation is regarded as complete.
DDL-statements have a number of specific properties that requires special database rights and resource
privileges, these should not normally be available to a application users.
DDL-statements are not transaction dependent. Each statement makes an implicit COMMIT.
This means that it is not reversible as a data manipulating statement is, and that if there are uncommitted
transactions pending, they will be committed when a DDL-statement is executed. DDL-statements should normally
not exist in the application code.
Do not use DDL-statements in application code.
If you do:
INSERT into emp ( empno, ename, sal, hiredate )
VALUES ( 7303, 'SMITH', 3201, '12-MAY-94');
/* and after this do a */
DROP table dept;
The drop command would commit the inserted row(s), when the drop command is issued. No rollback option exist
for the insert statement anymore. DML-commands should be the only type of statements that are used in the
application program code.
Exceptions to this is control statements:
SET Set the current role, or set the "transaction read only".
ALTER In the "alter session"; command, to set NLS-or trace- functions.
<Up to Contents>
Read consistency is an important concept in the Oracle database. Statement level read consistency
guarantee that the information from one operation is consistent, during the time the statement is executed.
The data that is acted upon or retrieved to us is the same set of data, and in the same state
as when we started the operation.
When we do a:
SELECT sum(sal) FROM large_emp;
the sum is the actual sum from the moment that we started the query.
It will not change during that operation, and it is not dependent on what other users do during the time
the statement is executing.
If another user is deleting all the rows:
DELETE FROM large_emp;
the information does not disappear in some strange way for the first user, because the other user(s)
operation was faster, or the priority was higher than ours. This is always guaranteed for each statement
that we execute, and the read consistency, by default works without any "read locking".
To have read consistency on consecutive statements you have to issue the command:
SET transaction read only;
SELECT 'some information' FROM any_table;
Delete from another_table;
Insert into any_table select * from backup_of_another_table;
COMMIT;
Read consistency does not cause any locking. It is done automatically but it requires the database to
save changed information in the rollback segments, and if the time or size of the reads and writes done
simultaneously are large, or the time between the statements is long, these has to hold the active information
longer and these areas have to be dimensioned properly.
Always consider the effect of the read-consistency functions in the database.
<Up to Contents>
The select statements are the most commonly used and a versatile SQL statement.
There are a number of basic types of select-statements, and you should be familiar with the different
variations and the functions that you can use. E.g. outer joins, subqueries, exist functions,
group by functions, having clause, union, intersect, minus etc.
Simple one table searches:
SELECT e.ename Give me the name of the person having
FROM emp e Empno equal to 7139
WHERE e.empno = 7139;
Simple Join searches:
SELECT e.ename Give me all employee names that has a
FROM emp e, dept d department registered to them.
WHERE e.deptno = d.deptno; Eliminate the ones that don't have one.
Outer Joins:
SELECT e.ename, e.empno Give me only the name of the employees that
FROM emp e, dept d not are assigned any department,
WHERE e.deptno = d.deptno(+) or the department has vanished from
AND d.deptno is null; the department table.
Subqueries:
SELECT e.ename Give me the name of the highest paid employee.
FROM emp e
WHERE e.sal =
( SELECT max(e2.sal )
FROM emp e2 );
Correlated Sub query:
SELECT e1.ename Give me all names of the persons that have
FROM emp e1 a higher salary than the average on their
WHERE e1.sal > own department.
( SELECT avg(e2.sal)
FROM emp e2
WHERE e1.deptno = e2.deptno );
Set-oriented queries:
SELECT e1.ename,e1.sal Give me all employees names and salaries that
FROM emp e1 are above the average,
WHERE e1.sal > but not those with .....
( SELECT avg(e2.sal)
FROM emp e2 )
MINUS
SELECT e3.ename, e3.sal
FROM emp e3
WHERE e3.sal >
( SELECT avg(e4.sal) + avg( e4.com)
FROM emp e4 );
These small examples of select statement shows that the basic functions can be combined and linked
with one another to solve quite complex problems or questions ( this also applies to "update,
delete and insert" statements). It is easy to understand that when the complexity and the number of
tables involved increase, it will be more important to give the database as much information as possible,
so that it will perform it's task as good and efficient as possible.
Try to perform as much as possible of the logical work in each SQL-statement.
<Up to Contents>
In addition to the standard SQL functions that the Rdbms can handle there are a number of functions
in the Rdbms. There are about a hundred predefined functions for string handling,
numeric and date conversions, calculations etc. Here are some examples of commonly used function,
that you may use in your SQL-statement:
decode - a "case-test" construction,
nvl - tests and replaces null values,
substr - extracts a part of a string,
instr - locates a part of a string,
add_months - adds a number of months to a date,
to_date - converts a string to a date,
to_char - converts dates and numbers to strings,
To_number - converts strings to numbers
These can be used to manipulate, change or test the data retrieved on rows that are processed.
It also makes it possible to perform further data manipulating at the data access layer than with
the "pure-SQL" and it could be used instead of putting this code in the application.
This will minimize both the the amount of code that are needed in the application program and the number
of round trips that the program has to do to access the database.
E.g. This database function eliminates leading spaces and gives us the initial character of the name:
DECLARE
-- a one character variable in your host program to receive the initial
l_EnameInitial varchar(1);
BEGIN
-- some more code .....
SELECT substr(ltrim(P.Ename,' '),1,1)
INTO l_EnameInitial
FROM Adm_Pers P
WHERE P.Empno = 1234;
The alternative should be to write the host application code to perform the string manipulating functions.
Receive the string, search it, and move the first character to a separate variable.
In Oracle Rdbms it is possible to define your own functions in PL/SQL or in JAVA, and use these in your SQL
statements in addition to the predefined ones.
<Up to Contents>
Automatic type conversion is done whenever it is needed within the Oracle Rdbms-kernel.
Conditions like:
WHERE numeric_column = '123'
works fine although this test does not compare data of the same data type.
The database automatically converts either one of the two values to a common data type before comparison.
But this functionality should not be used by default.
You should always try to do an explicit data conversion in your statements if it is required.
This is because we don't know for certain that the database applies the conversion on
the "right" element. The result is normally ok, but it could also mean that indexes are not
used and we cause the database to chose a slower access path than necessary.
Explicit conversion:
WHERE numeric_column = to_number('123');
or
WHERE date_col = to_date('19-may-94');
instead of implicit conversion:
WHERE date_col = '19-may-94'
or even worse, do not explicitly convert columns that could
use indexes in the search:
WHERE to_char(date_col,'yymmdd') = '951212'
Be sure that the data types you are comparing are of the same type or do the conversion before you
submit the code to the database, or do it explicitly in the statement.
<Up to Contents>
SQL statements are created to perform a specific operation. The same operation can be written in a lot of
different ways, good or bad and there are few limitations on how you can write your SQL.
It should be written both for ease of understanding and efficiency.
Format your SQL so that it will be easily read.
The more complex your SQL is becoming, the more important it will be to include comments in it.
Later it will be hard, to figure out what some large and complex SQL- statement actually
is supposed to do, especially for someone else than the author.
Use comments whenever it might be tricky to understand, e.g. when there are complex parts or were
the SQL-code is hard to get to work in the way you want it to do. Comment the SQL-code, whenever necessary.
We don't want code like:
SELECT substr(text, instr(text,'(')+1,
instr(text,')') -instr(text,'(')-1 ) FROM
text_table;
without any comment on what this is supposed to do or why.
Comments in SQL are done with C/C++ -like syntax:
/* comment start --- end after this */.
The previous code should look something like this:
SELECT substr (
text, /* search text for position of*/
instr(text,'(')+1, /* left parentheses */
instr(text,')')-
instr(text,'(')-1 /* and right parentheses */
) /* and return the string inside */
FROM text_table;
If there is any doubt or difficulty to identify the precedence of an expression this should be coded
with surrounding parentheses.
This is an example, with comments and other functionality, taken from the dictionary view ALL_CATALOG:
create or replace view ALL_CATALOG
(OWNER, TABLE_NAME, TABLE_TYPE) as
select
u.name,
o.name,
decode(o.type,
0, 'NEXT OBJECT', 1, 'INDEX', 2, 'TABLE',
3, 'CLUSTER', 4, 'VIEW', 5, 'SYNONYM', 6, 'SEQUENCE',
'UNDEFINED')
from sys.user$ u, sys.obj$ o
where o.owner# = u.user#
and o.type in (2, 4, 5, 6)
and o.linkname is null
and (o.owner# in (uid, 1) /* public objects */
or obj# in ( select obj# /* directly granted privileges */
from sys.objauth$
where grantee# in ( select kzsrorol from x$kzsro )
)
or ( o.type in (7, 8, 9) /* prc, fcn, pkg */
and exists
(select null from v$enabledprivs where priv_number = -144
/* EXECUTE ANY PROCEDURE */ )
)
or ( o.type in (2, 4, 5) /* table, view, synonym */
and exists
(select null from v$enabledprivs
where priv_number in
(-45 /* LOCK ANY TABLE */,
-47 /* SELECT ANY TABLE */,
-48 /* INSERT ANY TABLE */,
-49 /* UPDATE ANY TABLE */,
-50 /* DELETE ANY TABLE */)
)
)
or ( o.type = 6 /* sequence */
and exists
(select null from v$enabledprivs
where priv_number = -109 /* SELECT ANY SEQUENCE */)))
<Up to Contents>
All base SQL keywords should be written in upper case and indented for ease of readability.
Keywords should start on a new line.
Select statements should be written with the following keywords starting on a new line:
SELECT, INTO, FROM WHERE,
AND, OR, GROUP by, HAVING,
CONNECT by, FOR UPDATE of, ORDER by.
SELECT t1.name,
t2.name
INTO l_first_name,
l_second_name
FROM parties t1,
party_roles t2,
pr_types t3
WHERE t1.code = t3.code
AND t2.code = t3.code
AND t3.code = l_code;
Indentation and alignment should be used to make the code more readable.
Columns should be prefixed with the table alias, or otherwise with the full table name.
SELECT E.Name "Name",
D.Deptno "Department",
D.Location "Location",
E.Sal + nvl(Comm,0) "TotalSalary"
FROM Emp E, Dept D
WHERE E.Deptno = D.Deptno;
Fully qualify the columns in all SQL statements. If this is done the database does not have to search
all the columns in all tables to find out in which table the column exists.
Use table aliases, column aliases if necessary and always use table prefixing for columns.
Always specify the column list in a select statement. If this is done, it will later be possible
to add columns to the table, without re-coding existing applications.
Do not use the asterisk as a shorthand to select all columns in your SQL-code:
SELECT * FROM emp;
This might be used in dynamic programming or in ad-hoc situations, but normally not in your application code.
The number of columns returned, must be placed in a the same number of receiving host program variables.
If the table changes, the matching of columns against variables will fail.
There are exceptions, it does not apply when you declare %ROWTYPE in PL/SQL, or if you are writing dynamic
SQL code in some 3-GL. The efficiency and the resource utilization will also be better if you just select
the columns that you need.
Use column name references in the "ORDER BY" clauses, do not use column numbering reference
( unless it is necessary as in the case of unions ).
SELECT p.name, 'Dept:'||p.department
FROM people p
ORDER by
p.name,p.department; /* do not use 1,2 it will not use indexes */
<Up to Contents>
Insert statements should be written in a way that the following keywords starts on a new line:
INSERT, VALUES
Indentation and alignment should be used to make the code more readable.
Insert statements should always name the columns that the statement is inserting data into.
This will eliminate re-coding of the statement if the table is expanded with new columns or the table
is recreated with the columns in a different order. If this is done then you don't have to specify
null values for columns that is inserted with a null value or if a default value for the column
will be inserted.
INSERT INTO emp ( ename, sal, empno, deptno )
VALUES ( l_ename, l_sal, l_empno, l_deptno);
<Up to Contents>
Update statements should be written with the following keywords starting on a new line:
UPDATE, SET, WHERE, OR, AND
Indentation and alignment should be used to make the code more readable.
UPDATE emp e
SET e.sal = e.sal + 10
WHERE e.sal < 1000
AND e.comm is null;
Update and delete operations is usually done on single rows or on a subset of rows in the table.
This implies that there are some search criteria's and a where-clause in the statement. This also
implies that most statements of this kind should be optimized for ( read / search) performance as well
as the select-statements.
<Up to Contents>
Delete statements should be written in a way that the following keywords starts on a new line:
DELETE, WHERE, AND, OR, CASCADE
Indentation and alignment should be used to make the code more readable.
DELETE FROM emp e
WHERE e.empno = 7139
AND e.deptno = 10;
To perform this type of operation ( deletes and updates ), you should previously have locked the row(s)
with a locking statement. You should have fetched the row-address (the ROWID), and consequently you
can use that in your search criteria instead of a value based search criteria again, when deleting or
updating.
<Up to Contents>
The RDBMS have a number of locking functions. The most important reason is to protect data in the database,
changed by one user, from being changed or overwritten by another user.
All SQL-statements that change data in the database will cause locking. This can be implicit (automatic)
or manually declared and controlled.
Always try to use explicit ( manual ) locking in programs where data is changed and there ever might be
more than one process or user changing the same data. Always consider the locking strategy when you are
updating or deleting data.
Locks is acquired when these commands are issued to the database or when a cursor like these are opened:
SELECT ename
FROM emp WHERE empno = 7139
FOR UPDATE of sal;
UPDATE emp SET sal = sal + 10;
DELETE FROM emp WHERE empno = 7139;
INSERT INTO emp VALUES ( 1212,'Smith', 1212 );
LOCK TABLE emp IN EXCLUSIVE MODE NOWAIT;
The select, update and delete statements all creates exclusive row-locks on the rows that are hit by
the statement. There are both exclusive locks and shared locks. Exclusive locks prevents other user to
attain the same level of lock or a lower level of lock on the object. Shared locks may be attained by
several users at the same time, and prevents other users to attain exclusive locks on the same object
(or row).
<Up to Contents>
The database can provide and use locks both on the table level and on row level. Programs can attain
both types of locks. Your programs should not use or cause exclusive table locking if it can be avoided.
Table-locking should only be used after very thorough consideration of the consequences. If done this
would increase the possibility of conflicts and decrease the concurrency between users.
Always use row-locking.
All statement causes shared table locks, these can be held by several users at the same time.
These locks are put on the table to prevent the data definition to change during use.
Table locks may be exclusive, but normal data manipulation operation does not require this type of locks.
They could be placed on the table with the:
LOCK TABLE ..... statement,
but this should be avoided.
<Up to Contents>
The database automatically ensures the integrity of the "foreign key relation", by locking
the master-detail related data. This is to prevent the key values in the relation to change during
operations on the related keys. This locking is done on the column level, using the index tree.
The locking mechanism does not prevent any user to query the locked rows.
<Up to Contents>
Manual locking of the rows that are to be changed or deleted, should follow a common strategy among
the applications, working on a set of tables, or competing of the same data. This is to avoid different
locking strategies to cause concurrency problems. The coding should therefore be done in a streamlined
and consistent way, in a fashion like Oracle*Forms does:
Step 1, Search and display information ( optional):
SELECT e.rowid,
e.ename,
e.sal
INTO l_ename,
l_rowid,
l_rowid
FROM emp e
WHERE e.empno = l_empno1;
Step 2, Lock the row(s):
SELECT e.sal
INTO l_sal
FROM emp e
WHERE e.rowid = l_rowid
FOR UPDATE of e.sal NOWAIT;
Rowid is the actual physical address to the row. This is always the fastest way to access a single row,
because you actually know where the row is, through the address.
By default the database waits for a lock until it is available, or the user sends a break to cancel the
lock request. The NOWAIT option is not default.
The use of this option allows us to control and handle locking conflicts, as desired in our code.
If the locking statement fails, the database will return the ORA-0054 error, and a "record locked"
message.
Step 3, Update or delete the row:
UPDATE emp e
SET e.sal = e.sal + l_AddSal
WHERE e.rowid = l_rowid;
Step 4, Commit or rollback to end the transaction and to release the locks:
COMMIT;
If this is coded in PL/SQL or another 3-GL like language then these statements should rather be coded
with cursors and "cursor for loops" than as exemplified above.
The "cursor for loop" construct uses an internal addressing of the row, which is as fast as the rowid
address, but without any explicit fetching and storing of the address value.
From sample4.sql
/* This program modifies the ACCOUNTS table based on instructions
** stored in the ACTION table. Each row of the ACTION table .......
*/
DECLARE
CURSOR c1 IS
SELECT a.account_id,
a.oper_type,
a.new_value
FROM action a
ORDER BY a.time_tag
FOR UPDATE of status NOWAIT;
BEGIN
FOR acct IN c1 LOOP -- process each row one at a time
acct.oper_type := upper(acct.oper_type);
/* Process an UPDATE.
If account does not exist, create a new account */
IF acct.oper_type = 'U' THEN
UPDATE accounts a
SET a.bal = acct.new_value
WHERE a.account_id = acct.account_id;
IF SQL%NOTFOUND THEN -- account did not exist. Create it.
INSERT INTO accounts ( account_id, bal )
VALUES ( acct.account_id, acct.new_value);
UPDATE action a
SET a.status = 'Update: ID not found. Value inserted.'
WHERE current of c1;
ELSE
UPDATE action a
SET a.status = 'Update:Success.'
WHERE current of c1;
END IF;
END LOOP;
EXCEPTION
WHEN TIMEOUT_ON_RESOURCE THEN -- locking conflict
raise_application_error(-20101,'Record(s) locked by other users');
END;
<Up to Contents>
If the actual SQL-code is hidden from us, generated by some tool then we will not have control
over the way code is generated, how it looks, or how it is sent to the server to be executed.
We must therefore, try to gain information on how the access is done, and try to learn the important
aspects of the tools processing. This knowledge should then be used to write code with the tool and
from other interfaces, in a way that co-operates well with the used tools. All this should be
done to avoid conflicts (e.g. locking ) or resource exhausting on the server. This means that the tool
must generate efficient and well designed code, and that it is well documented so that it is easy to
understand and analyze.
It is important that all SQL statement used from different programs are written in the same way,
using exactly the same strings, to take advantage of the SQL-caching, done in the Rdbms kernel.
The two statements:
SELECT e.ename
FROM emp e
WHERE e.empno = 7139;
SELECT e.ename
FROM emp e
WHERE e.empno = 7138;
is regarded as two different statements, the second one will be parsed and analyzed by the Rdbms
in the same way as the first one and consume the same amount of both i/o and cpu-power.
Try to create and set up "libraries" of the access code and reference/use this code in your
program code. This will guarantee that the same string are used every time you execute a SQL-statement.
use parameter variables instead of strings if possible. The database will cache and reuse the
information it has on those SQL-statements and consume less resources.
Reuse the actual string of the SQL-statement to take benefit of the SQL-caching.
If the example above instead is coded:
SELECT e.ename
FROM emp e
WHERE e.empno = :l_empno;
this could be sent to the database with different bind variable values (l_empno is my host variable with
the value 7139 and 7138 at different times). This would then be regarded as only one SQL statement by
the database, eliminating the analyze and parse overhead the second and subsequent times it
is handled by the rdbms.
This also implies that dynamic SQL should be avoided and that all variable information should be
communicated via bind variables from your host programs.
<Up to Contents>
To create efficient applications we must write our SQL-operations as efficient as possible.
A lot of things influence the applications performance. The database-design, the table layout and the
construct of indexes. The design of PK- and FK- constraints and the physical data storage is also of
great importance. This together with how data is used and actually accessed at the data layer in the
database will decide how much power the application will use in the database server.
We have to consider the effect of all this in terms of memory usage, CPU-utilization and i/o efficiency.
Sending a SQL-statement to the database requires the database to analyze the statement in several steps,
for accuracy,
access rights,
and to optimize
that statement for fastest access path.
Analyzing, opening and closing of cursors is therefore a relative expensive process that requires both
memory-, i/o and CPU- resource. Access optimization of the SQL statements should always be done.
There are several ways to analyze the work done on the database side.
By using the explain command, or The database trace functions and program utilities.
We should use one of these functions to verify that the statements we are sending to the database
are using the expected indexes and that access times are in the expected and required ranges.
<Up to Contents>
You can use the explain command to examine each SQL access that you submit o the database and look at the
result and optimization. To do this you execute the command:
EXPLAIN PLAN SET statement_id = 'jm1'
INTO plan_table
FOR select * from emp where sal > 1000;
This will results in an access analyze from the database optimizer without actually executing the
statement. The result is placed in your plan_table table, and you may then read the result with a SQL-statement.
To use this function regularly you should set up procedures and reports to handle the information
efficiently or use Oracle's or some other commercially available tool that does this for you.
<Up to Contents>
The database have a couple of ways to turn on a SQL trace function and direct commands to turn this on.
One way to do this in your applications is with the SQL command:
ALTER SESSION SET sql_trace = true;
and you turn it off with:
ALTER SESSION SET sql_trace = false;
These commands should be possible to execute from any of your programs if needed.
You should always provide a way to turn on the trace function in the application program. Make sure that
a "TRACE" switch is provided ( in the same way ) in all program and if possible, in a similar
way in all program environments.
Oracle tools often provides a switch for this purpose, so no extra coding will be required.
This is sometimes the STATISTICS switch and can be found among the run options.
What happens is that the database writes a trace file on the server, on behalf of the user.
This file may then be analyzed in different ways and with different programs. As a traditionalist I like
the " tkprof "-program that is supplied by Oracle since version 4 or something. The tkprof-program
is normally executed on the database server, where the trace files are created. The program can give
different information, depending on how the database is set up, and depending on the server operating
system. It is also possible to ask tkprof to execute the explain-command for all SQL-statements in the
trace file. This is really a useful and the simplest way to do an application performance analysis.
Command line syntax for tkprof is:
tkprof tracefile.trc outfile.prf [sort=option] [explain=usr/pwd]
File naming and place of the trace file is system dependent, so ask your system administrator to set this
up and give you information on where and how they can be accessed.
<Up to Contents>
The database engine have developed significantly during the years and have had a number of different
changes to the optimizer. A RULE-based optimizer was the only optimizer in a number of earlier Oracle
versions, and a STATISTIC optimizer that was introduced with version 7 of the database.
The rule optimizer had a small number of rules that it applies to the SQL statement to determine the
access path. These rules are not obsolete and are sometimes used if forced to or if no other access
information is available. It these cases it will happen that several rules end up with the same
weight or priority, and that the rule that was found first will be applied. The effect is that the order
of table names in the from clause, and the order of the where conditions may be important when the rule
optimizer is used.
The statistic optimizer has a more extensive set of rules and does take into account the statistic
information stored with the tables and indexes in the database. Using this optimizer requires statistic
information to be collected with the "analyze" command. This also means that it is
important to have the right amount of data in the tables during development and testing.
<Up to Contents>
There are a lot of hints that can be given in the SQL-code to give instructions to the database how to
optimize the code. This is done with a special type of comment format in C or ADA style,
"*+ Start of hint ... until *"
or
"--+ Hint until end of line".
SELECT /*+ index ( i_empno ) */ e.ename
FROM emp e
WHERE e.empno = 7139;
If you think you know the best way to do the actual access to data and that this never will change,
then you may use hints. Otherwise it should be avoided. It could case more damage than good.
Here are some hints possibilities:
ALL_ROWS The Intention should be to retrieve
all rows as fast as possible.
AND_EQUAL (tab.idx, tab.idx) Merge index in the search.
CLUSTER (table) Read the table via the cluster.
FIRST_ROWS The intention is to retrieve the
first rows as soon as possible.
FULL (table) Do a full scan on table.
HASH (table) Do read with a hash-scan.
INDEX (tab.index) Read the table though this index
INDEX_ASC (tab.index) Read the table through this index,
ascending, if range-scan.
INDEX_DESC (tab.index) Read the table through this index,
descending, if range-scan.
ORDERED Use the order of the tables in
the from clause to perform join.
ROWID (table) Read with the tables row-address.
RULE Use the RULE-based optimizer.
USE_MERGE (table) Do a sort-merge-join
USE_NL (table) Use a nested loop to do the join.
Use your best judgment to decide when to let the Rdbms do all of the optimization and when you know the
best way to perform the access.
<Up to Contents>
Index are created for several reasons, primarily to increase performance, to give faster access to data
for key values that you use and to enforce uniqueness in the tables. The decisions for where, on what and
how the indexes should be created should be done in the design phase. The design tool that you use should
help you define most indexes that you need but you should always be prepared to examine and evaluate the
outcome manually.
Create index to enforce uniqueness and primary keys.
Create index on foreign key columns.
Create indexes on search items in the tables.
Create indexes on columns that has a good selective impact.
Try to keep the number of indexes few ( < 5 ) on tables that are frequently updated.
Defining and creating the right indexes can speed up retrieval operations. It may eliminate sort
operations. But if the number of rows that are retrieved via an index is greater than 20 - 30% of the
total number of rows, a full table scan is probably faster than a scan via one or more indexes.
Index degrade performance in the database on update, delete and insert operations. The more indexes that
the database has to be maintained on a table, the more power and i/o is required in operations that change
the index tree.
Indexes takes up database space. The index space is often as large as the actual data in the database.
Avoid creating indexes on columns that contains few different values.
<Up to Contents>
To get the fastest possible speed on searches in the database you must provide good search conditions that
the database may use in the optimization. You always have a number of ways to write your SQL-statements,
and choose whether you think that a join or a sub query is the best way for the database to access the data.
Different ways of writing conditions, although they might be similar in appearance will probably be treated
and optimized differently. Every time you create or change a new SQL-statement you have to take the access
performance consequences in consideration, what columns that are indexed,
what primary key's that exist on the table, how selective your search criteria is etc.
The database has a number of different ways to retrieve the rows requested by a SQL-statement.
In order of efficiency the most important are:
By ROWID.
By hash-key-value.
By index search.
By a full table scan.
The ROWID is the actual address to the row, so if that is known that is a retrieval
"without a search". This is a direct fetch of the row from a known place, and therefore the fastest way
to fetch a row from the database.
This can often be used in the secondary operations, update and deletes, in your applications.
The rowid is stored in the index, so when an index search is done, the index tree is searched for the index
value. When this is found the row address is used to retrieve the row with it's associated data.
When there is a hash cluster the key values is used to calculate the place that the row
should be stored or retrieved from. This approach does not need any i/o find the address of the row,
it is always calculated and will only use CPU-power. This type of search is only efficient when you
search for distinct values.
Full table scans ( reading a table in full, sequentially ) are the most efficient way
to retrieve data when more than 20 - 30% of the tables rows are to be retrieved. Index searches are most
effective when single rows are to be retrieved in an operation.
"Where-clauses" that causes extensive full table scans and other non-expected behaviors should
identified in the applications and should be corrected if not intended.
When searches are made for specific values there are several comparison types that you will use:
Equality:
Equality searches ( comparing values that should be equal to
a constant or a value in another table ) of indexed columns is very efficient and easy to use is:
WHERE index_column = FIXED_VALUE;
This will make it possible for the database to do a direct index scan search.
Concatenated indexes:
If concatenated indexes are used, make sure that all columns from the beginning of the index is given
in the search criteria's.
WHERE index_column1 = FIXED_VALUE1
AND index_column2 = FIXED_VALUE2;
This will use an index if the index is created with index_column1 first and index_column2 next.
If the index has a column index_column0 as the first part in the concatenated index the index can
not be used. Likewise you will not cause an index scan if you only refer to index_column2 in your
search criteria.
Null values:
Be especially careful and avoid search criteria like:
WHERE col_x is null
or
WHERE col_x is not null
This will often result in a full table scans, because no index entry can exists for a value that
does not exist.
Functions:
Do not apply functions on columns that are used in the search criteria's that should use
indexes. If done, the database will not be able to use the index, because the index value is
modified by the function, and a full table scan will be done. The possibility to create index
on functions are seldom used in my experience.
Avoid unnecessary sorting to gain performance in:
Set operations:
Set operations like "union, intersect, and minus"
will cause a sort/merge of the sets retrieved. Be sure that
the number of rows retrieved from each set operation is
small.
Distinct:
The "DISTINCT" operator should be avoided, because it will always result in a sort to
ensure that distinct values will be returned.
Order by and Group by:
Order by and group by will use indexes if the columns referenced in the list are indexed.
The same rules as for searches applies for these "sort" operations.
Correlated subqueries:
Be careful when using correlated subqueries. The subquery will be executed for each record
retrieved in the master query and could cause an extensive amount of database access.
<Up to Contents>
Network communication is an other factor to take into consideration. Every SQL-call causes a
number of packages to be sent to and from the program/client to the database server.
The more information the SQL-statement contains the more calls have to be made. The number of
columns in a select statement might therefore be important in this respect. The size of the
data is another factor. The larger the data is the more packages. If Oracle's array interface
is used then more rows can be transferred in fewer packages. One SQL-statement normally causes
15 - 20 calls to be executed. In an client/server environment it is essential to try to keep the
total number of SQL-statements as low as possible.
This also supports the argument that you should write more complex SQL-functions and try to do
as much work as possible in each call to the database. If you split up your database
access to several distinct select statements and send them one after one to the database,
perform some data manipulation in between and interact with the database a number of times
instead of writing a more complex SQL-statement the result is probably a much slower application.
A major performance gain can often be archived by this approach.
<Up to Contents>
One goal when you try to create efficient applications, should be to keep the total number of
different SQL operations as low as possible both for the reason explained above and for memory
reasons.
If information is needed from two or more tables, then it is usually more efficient to join these
tables in one SQL-statement than to code a number of separate SQL cursors read the information from
them and combine the data from them in the application.
The database tries to reuse the information it has about the database, and the SQL-statements
through a SQL-caching mechanism. When the database receives a SQL statement it compares this string
to all SQL statements already in use, to see if it have information on how to perform the
SQL-statement. If found it will use the parse and access information already in the cache.
Each SQL statement needs a number of memory data areas to communicate with the database.
The "cursor area" holds information about the SQL-statement and the data that are to be
transferred between the database and the application. Each SQL-statement has it's own data areas
and the total size is typically in the range of 5kb - 50kb. The more cursors that are active and
open the more memory we consume, both on the server side and on the client side.
This areas are released when the cursors is closed. If your application requires hundreds of
simultaneously open cursors, you are probably using a lot more system resources than expected,
and this might be constraining in a resource limited system. So by closing the cursors that are no
longer needed you free some memory, If you need to reopen the cursor then you will cause the database
to reevaluate the statement and use more cpu and memory again. If this is the case you are not
using the power of the SQL language optimally.
<Up to Contents>
Programs both in PL/SQL and other environments should be written in a way to minimize the number
of cursors used. This is to improve performance and optimize the internal communication.
To access data in the database one should try to declare and use explicit cursor declaration and
not implicit cursors.
Use as few cursors as possible.
Implicit cursors:
SELECT e.ename, d.dname
INTO l_ename, l_dname
FROM emp e, dept d
WHERE e.deptno = d.deptno;
Explicit cursor:
DECLARE
CURSOR cr_emp1 is
SELECT e.ename, d.dname
FROM emp e, dept d
WHERE e.detpno = d.deptno;
l_ename emp.ename%TYPE;
l_dname dept.dname%TYPE;
BEGIN
OPEN cr_emp1;
FETCH cr_emp1 into l_ename, l_dname;
CLOSE cr_emp1;
END;
Because SQL is a set oriented language it should assume that when an implicit cursor is executed
it is supposed to fetch a set of records. If you are coding with implicit cursors, and you only
expect to fetch only one row, the database will fetch one row from the table and return that
data to you. It will then again try to fetch another row, at least to be sure that there are no
more rows to be fetched. If there were, it would return the error code for "TOO MANY ROWS"
to you. This means that if the aim is to get only one or no record back to the program,
the implicit cursor will do two accesses in the database and more work than if you use an explicit
cursor and stopped fetching data after the first row is returned.
<Up to Contents>
All non obvious or very simple SQL statements must be analyzed and tested for performance,
security and accuracy of the result.
To be able to do realistic tests we need to test all functions in a production like database.
We also need realistic data and a user-like database identity. This identity should not have
more privileges than is intended for the user that is going to run the application. Either do
the test under a specific user identity, given to you by the database administrator, or let
the database administrator give you the same rights as is intended for the user or create a
separate user to own, create and manipulate objects and run all applications and do
development as a different user.
Views can be used to enforce security and limit the data a user may access.
Views might be seen as the users API to a system and is therefore an important part of an open
system.
They can act as a filter for the data in the database to make it easier to access.
There may be performance consequences if the views are created and used in a way that makes
the optimizer to perform separate operations to construct the view before the users
queries might be applied to the view.
<Up to Contents>
Use good tools and methods to design and verify your database design. A bad design is hard to
optimize in respect of performance. We should perform both a logical and a physical
database design. It is almost always true, that a strictly normalized database perform better in
a mixed and evolving system, than a poorly normalized database. Deviation from the
ultimate database design can improve performance for certain programs or a set of programs,
but the compromises should not be made until the logical design has been made, requirements
is known and all aspects of the relationships between data and it's keys has been established.
When the logical design is done, a physical implementation should be done. Design compromises that
affect the performance may be implemented in the physical implementation. Do not start to compromise
in the analyze phase ( logical database design ), to achieve physical performance gains. The physical
storage of data and index will influence performance on critical or large tables in the database.
Always provide a storage clause to the table specification and do not accept the default storage
parameters.
<Up to Contents>
The declarative constraints can replace much application code if used right. It will help you
control data in a very thoroughly way. It allows you to define a lot of checks and
consistency controls directly on the table. It may influence performance and increases concurrency
problems.
Be sure that primary-key- and foreign-key- constraint columns are indexed, and that the index are
constructed in the same way as the constraint declaration. The index created on the
constraint columns are used by the database in the process of securing the relational constraints.
If an index do not exist the database will issue a table lock, to avoid a scan of the
whole table, and this will decrease the concurrency of the system.
The primary- and unique- key constraint declarations creates an index automatically for you,
but the foreign key constraint does not.
Check constraint ( and not null ) constraints is less harmful in these aspects and should be defined
whenever possible.
<Up to Contents>
To test for accuracy use a reasonable scaled database, with a realistic spread of data and key values.
Be sure that your test database contains data, scaled to at least 10 - 30% of the expected amount
of data. This is especially important for tables that will be larger than a couple of hundred
of rows.