Introduction
This document covers SQL best practices for developers who plan to work on SQL performance improvement. These recommendations may not apply to all kinds of databases.
Efficient use of the WHERE clause
To use the WHERE clause efficiently:
- Always use good filters, and ensure that the filtered columns are indexed. Choose filters based on the requirement so that they reduce execution time and resource use, for example, the number of blocks to read.
- Use good filter columns. A column is a good filter column if it has good selectivity, in other words, if it has many distinct values. See Selectivity.
- Use good predicates. Each good predicate you add further filters the result set and improves performance.
- Always use bind variables (placeholders prefixed with a colon, such as :deptno) in queries. Bind variables reduce the time the optimizer spends preparing execution plans and the parsing time for subsequent executions.
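As a minimal sketch of the bind-variable recommendation, the following SQL*Plus snippet reuses one statement text for different values. The emp table and its deptno and ename columns are assumed here for illustration, and other clients and databases use different placeholder syntax.

```sql
-- Literal values: each distinct value produces a new statement text to parse.
SELECT e.ename FROM emp e WHERE e.deptno = 10;
SELECT e.ename FROM emp e WHERE e.deptno = 20;

-- Bind variable (SQL*Plus syntax): the statement text stays the same,
-- so the parsed statement and its execution plan can be reused.
VARIABLE deptno NUMBER
EXEC :deptno := 10
SELECT e.ename FROM emp e WHERE e.deptno = :deptno;

EXEC :deptno := 20
SELECT e.ename FROM emp e WHERE e.deptno = :deptno;
```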
Selectivity
Selectivity is a value between 0 and 1 that represents the fraction of rows returned after a filter is applied to the table. For example, if a table has 10,000 rows and the query returns 2,601 rows, the selectivity is 2601/10000, which is 0.2601, or about 26 percent. The lower the value, the more selective the filter. Selectivity enables you (or the optimizer) to decide which data access method is optimal in the execution plan.
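The fraction described above can be measured directly with a query. The following is a sketch only: the emp table, its deptno column, and the filter value 10 are assumptions for illustration, and the second query assumes that values are evenly distributed.

```sql
-- Selectivity of the filter deptno = 10: matching rows / total rows.
-- Multiplying by 1.0 forces decimal division on databases that would
-- otherwise use integer division; NULLIF avoids a divide-by-zero error
-- on an empty table.
SELECT 1.0 * COUNT(CASE WHEN e.deptno = 10 THEN 1 END)
           / NULLIF(COUNT(*), 0) AS filter_selectivity
FROM emp e;

-- Rough estimate for an equality filter on any single deptno value:
-- 1 / number of distinct values. More distinct values give a lower
-- (better) selectivity.
SELECT 1.0 / NULLIF(COUNT(DISTINCT e.deptno), 0) AS avg_equality_selectivity
FROM emp e;
```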
Joins
Recommendations for selecting a better join
Following are general recommendations for selecting a better join:
- Avoid using an outer join to a view, because it can lead to a non-mergeable view.
- Avoid using concatenation and leading wildcard searches on columns.
- Avoid running COUNT queries only to display result set counts in the user interface.
- Check for implicit or incorrect data type conversions.
- Avoid using explicit data type conversions in joins and predicates.
- Avoid using functions and OR conditions on columns that are used in joins or in selective filter predicates. If a null-handling function is needed, try COALESCE instead of NVL or DECODE.
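Two of these recommendations, implicit data type conversion and leading wildcard searches, can be illustrated with a short sketch. The emp_code column is hypothetical, and the conversion behavior described is that of Oracle; other databases may handle the comparison differently or reject it.

```sql
-- Assumption: emp.emp_code is an indexed character column (for example VARCHAR2).

-- Implicit conversion: comparing a character column with a number makes the
-- database convert the column value for every row, which typically prevents
-- use of the index on emp_code.
SELECT e.ename FROM emp e WHERE e.emp_code = 1001;

-- Matching data types: the column is compared as-is.
SELECT e.ename FROM emp e WHERE e.emp_code = '1001';

-- Leading wildcard: an ordinary index on ename cannot narrow the search.
SELECT e.ename FROM emp e WHERE e.ename LIKE '%SON';

-- Trailing wildcard only: an index range scan on ename is possible.
SELECT e.ename FROM emp e WHERE e.ename LIKE 'JOHN%';
```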
Recommendations for selecting the best join order
When SQL statements include two or more tables:
- The order in which you join the tables is extremely important. The table from which the most columns are returned should be the first table in the join.
- Compare various join orders and select the best one after considering the number of rows returned by each join order.
- If a join is used only to filter, change it to an EXISTS condition in the WHERE clause.
- Do not join tables in the WHERE clause. Do all table joins in the JOIN clause.
- Be careful with table restrictions. SAP HANA applies WHERE conditions at a different time than join ON conditions.
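The difference between joining in the WHERE clause and in the JOIN clause, and between restricting in ON and in WHERE, can be sketched as follows. The dept and emp tables are those used elsewhere in this document; the job column and the value 'CLERK' are assumptions for illustration. The outer join behavior shown is standard SQL and is not specific to SAP HANA.

```sql
-- Join written in the WHERE clause (avoid).
SELECT d.deptno, e.ename
FROM dept d, emp e
WHERE d.deptno = e.deptno
  AND e.job = 'CLERK';

-- Same inner join in the JOIN clause; the filter stays in WHERE.
SELECT d.deptno, e.ename
FROM dept d
JOIN emp e ON e.deptno = d.deptno
WHERE e.job = 'CLERK';

-- Outer join, restriction in ON: every department is returned;
-- ename is NULL for departments without clerks.
SELECT d.deptno, e.ename
FROM dept d
LEFT JOIN emp e ON e.deptno = d.deptno AND e.job = 'CLERK';

-- Outer join, restriction in WHERE: rows with a NULL e.job are filtered out
-- after the join, so the result matches the inner join above.
SELECT d.deptno, e.ename
FROM dept d
LEFT JOIN emp e ON e.deptno = d.deptno
WHERE e.job = 'CLERK';
```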
Inbuilt SQL functions
The database engine usually ignores the index on a column if SQL functions (for example, SUBSTR, INSTR, TO_DATE, UPPER, LOWER, and TO_NUMBER) are applied to that column in the WHERE clause. Avoid creating function-based indexes as much as possible because they add DML overhead. Instead, look for alternatives, such as a query rewrite.
Example
In this example, though both queries are functionally the same, the rewritten query avoids the to_char function and therefore performs better than the original query.
SQL> CREATE TABLE TBL1(A TIMESTAMP);
Table created.
SQL> INSERT INTO TBL1(A) VALUES(to_date('2015-03-01T15:39:00', 'YYYY-MM-DD"T"HH24:MI:SS'));
1 row created.
SQL> INSERT INTO TBL1(A) VALUES(to_date('2020-11-13T15:39:00', 'YYYY-MM-DD"T"HH24:MI:SS'));
1 row created.
SQL> create index idx1 on TBL1(A);
Index created.
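For this table, an original and a rewritten query could look like the following sketch. The date being filtered (13 November 2020) is an assumption chosen to match the inserted rows.

```sql
-- Original: to_char is applied to column A, so the index idx1 on A
-- is usually not used.
SELECT A FROM TBL1 WHERE to_char(A, 'YYYY-MM-DD') = '2020-11-13';

-- Rewritten: column A is compared directly against a half-open range,
-- so the optimizer can use an index range scan on idx1.
SELECT A FROM TBL1
WHERE A >= TIMESTAMP '2020-11-13 00:00:00'
  AND A <  TIMESTAMP '2020-11-14 00:00:00';
```

Both queries return the row inserted for 2020-11-13; only the second leaves the indexed column free of function calls.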
IN vs. EXISTS conditions
An IN condition is a membership condition. It tests a value for membership in a list of values or a subquery.
An EXISTS condition tests for the existence of rows in a subquery and is always used in combination with a subquery. The condition is met if the subquery returns at least one row.
When to use IN and EXISTS
When the subquery (inner query) returns a small number of rows, the IN condition is usually more efficient: IN suits large outer queries with small inner queries. When the subquery returns a large number of rows, the EXISTS condition is usually more efficient: EXISTS suits small outer queries with large inner queries.
Example
Display details of the departments in which employees are working. Do not display a department if no employee is working in it.
Query using EXISTS
select d.deptno from dept d where exists (select 1 from emp e where d.deptno = e.deptno);
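For comparison, the same requirement can be written with an IN condition. This is a sketch of an equivalent form using the same dept and emp tables; which of the two performs better depends on the row counts discussed above and on the database optimizer.

```sql
-- Same result with IN: d.deptno is tested for membership in the set of
-- department numbers that appear in emp.
select d.deptno from dept d where d.deptno in (select e.deptno from emp e);
```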
Indexing
Indexing helps retrieve data as quickly as possible. Ensure that you always refer to database coding standards before creating indexes.
When and what to index
- Create as few indexes as possible but as many as you need. For example, if a table has four columns, you do not need four indexes.
- There is always a trade-off between query performance and DML performance. Check whether an index already exists on the table. You might only need to improve it.
- Avoid creating more than one index that starts with the same column. Doing so can confuse the optimizer, and it may not use the index you want it to use.
- Column order matters in an index. Ensure that the most selective columns appear first.
- Base indexes on the columns used in the WHERE clause.
- In SQL Server, it helps to have the join columns in the index as well, but they must come after the columns used in the WHERE clause.
- In SQL Server, it helps to have the ORDER BY and GROUP BY columns in the index, but they must come after the columns used in the WHERE clause or be placed in the INCLUDE part of the index.
- SQL Server supports CLUSTERED indexes and INCLUDE columns. Use both options with care.
- Index columns that are highly selective. See Selectivity.
- Index all important foreign keys.
- Index the columns used in table joins and predicates.
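The column-order guidance can be sketched for a single query. The orders table, its columns, and the index names are hypothetical, and customer_id is assumed to be more selective than status. The two statements are alternatives; create one or the other, not both.

```sql
-- Assumed query to support:
--   SELECT o.order_date, o.total_amount
--   FROM orders o
--   WHERE o.customer_id = :customer_id AND o.status = :status
--   ORDER BY o.order_date;

-- Composite index: WHERE columns first, with the more selective column
-- (assumed to be customer_id) leading, followed by the ORDER BY column.
CREATE INDEX orders_cust_status_date_idx
    ON orders (customer_id, status, order_date);

-- SQL Server alternative: the same key columns, with the remaining selected
-- column carried in INCLUDE so that the index covers the query.
CREATE NONCLUSTERED INDEX ix_orders_cust_status_date
    ON orders (customer_id, status, order_date)
    INCLUDE (total_amount);
```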
Guidelines and common pitfalls
Following are common guidelines and pitfalls to consider when rewriting queries:
- Always follow database coding standards.
- Filter as early as possible, and only then join to the tables that expand the result set.
- Always use column names. Do not use SELECT * or INSERT INTO without a column list.
- Use table aliases when there is more than one table in the SQL query.
- Use column aliases only when necessary (for example, for grouping columns).
- Do not create too many indexes or too many single-column indexes.
- Verify that DISTINCT is necessary.
- Check whether joins should be EXISTS conditions.
- Verify whether the code can handle duplicates.
- Do not keep the main filter in an OR condition.
- Do not keep an outer join on the main or leading filter table.
- Join columns of the same data type.
- Remove unnecessary outer joins.
- Try to remove tables from which no columns are selected.
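Several of these guidelines can be seen together in one small rewrite. The sketch below assumes that dept has the columns deptno (primary key) and dname, and that emp references dept through deptno; the inserted values are placeholders.

```sql
-- Before: SELECT *, a join used only to filter, and DISTINCT to hide the
-- duplicate department rows that the join produces.
SELECT DISTINCT d.*
FROM dept d
JOIN emp e ON e.deptno = d.deptno;

-- After: explicit columns, table aliases, and EXISTS instead of the join.
-- No table is joined without selecting columns from it, and DISTINCT is
-- no longer needed because each department is tested once.
SELECT d.deptno, d.dname
FROM dept d
WHERE EXISTS (SELECT 1 FROM emp e WHERE e.deptno = d.deptno);

-- INSERT with an explicit column list.
INSERT INTO dept (deptno, dname) VALUES (50, 'SUPPORT');
```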