QUERY OPTIMIZATION AND PROCESSING- THEORY

QUERY PROCESSING AND OPTIMIZATION

Query processing involves extracting data from a database through multiple steps.

This includes translating high-level queries into low-level expressions at the file system's physical level, optimizing queries, and executing them for actual results.

Query Processing

One of the types of DBMS is Relational Database Management System (RDBMS) where data is stored in the form of rows and columns (in other words, stored in tables) which have associations with each other.

The users (both applications and humans) have the liberty to select, insert, update and delete these rows and columns without violating the constraints provided at the time of defining these relational tables.

Let’s say you want the list of all the employees who have a salary of more than 10,000.

SQL Query

SELECT

emp_name

FROM

employee

WHERE

salary>10000;

The problem here is that DBMS won't understand this statement.

SQL being a High-Level Language makes it easier not just for the users to query data based on their needs but also bridges the communication gap between the DBMS which does not really understand human language.

In fact, the underlying system of DBMS won't even understand these SQL queries.

For them to understand and execute a query:

i) they first need to be converted to a Low-Level Language.

ii) then SQL queries go through a processing unit that converts them into low-level Language via Relational Algebra in DBMS.

Since relational algebra queries are a bit more complex than SQL queries, DBMS expects the user to write only the SQL queries.

It then processes the query before evaluating it.

Phases of Query Processing in DBMS

As mentioned in the above image, query processing can be divided into compile-time and run-time phases.

Compile-time phase includes:

Parsing and Translation (Query Compilation)
Query Optimization
Evaluation (code generation)

In the Runtime phase, the database engine is primarily responsible for interpreting and executing the hence generated query with physical operators and delivering the query output.

Parsing and Translation

The first step in query processing is Parsing and Translation.

In the first step, queries undergo lexical, syntactic, and semantic analysis. Essentially, the query gets broken down into different tokens and white spaces are removed along with the comments (Lexical Analysis).

In the next step, the query gets checked for the correctness, both syntax and semantic wise. The query processor first checks the query if the rules of SQL have been correctly followed or not (Syntactic Analysis).

Finally, the query processor checks if the meaning of the query is right or not. Things like if the table(s) mentioned in the query are present in the DB or not? if the column(s) referred from all the table(s) are actually present in them or not? (Semantic Analysis)

Once the above mentioned checks pass, the flow moves to convert all the tokens into relational expressions, graphs, and trees. This makes the processing of the query easier for the other parsers.

Let's consider a query :

SQL Query:

SELECT

emp_name

FROM

employee

WHERE

salary>10000;

The above query would be divided into the following tokens: SELECT, emp_name, FROM, employee, WHERE, salary, >, 10000.

The tokens (and hence the query) get validated for

The name of the queried table is looked into the data dictionary table.
The name of the columns mentioned (emp_name and salary) in the tokens are validated for existence.
The type of column(s) being compared have to be of the same type (salary and the value 10000 should have the same data type).

The next step is to translate the generated set of tokens into a relational algebra query. These are easy to handle for the optimizer in further processes.

∏ emp_name (σ salary > 10000)

Query Tree

A Query Tree is a tree data structure that represents the hierarchical structure of a database query. It is primarily used in query processing and optimization in databases.

Leaf nodes represent relations (tables).
Internal nodes represent relational algebra operations like selection (σ), projection (π), join (⨝), etc.

It helps in visualizing how a query will be executed step-by-step and is used by database systems to optimize queries.

Relations (Tables) → at the bottom (leaves)

Operators → in internal nodes:

σ: Selection (WHERE clause)
π: Projection (SELECT clause)
⨝: Join (JOIN clause)
×: Cartesian product
∪, −, ∩: Set operations (UNION, DIFFERENCE, INTERSECTION)

Example 1:

SQL Query:

SELECT name

FROM Student

WHERE age > 20;

Relational Algebra Equivalent:

π_name (σ_age > 20 (Student))

Query Tree:

π_name

σ_age > 20

Student

Example 2:

SQL Query

SELECT S.name, D.dept_name

FROM Student S, Department D

WHERE S.dept_id = D.id AND S.age > 20;

Relational Algebra Equivalent:

π_S.name, D.dept_name (σ_S.age > 20 (Student ⨝ Student.dept_id = Department.id Department))

Query Tree

π_S.name, D.dept_name

σ_S.age > 20

⨝_S.dept_id = D.id

/ \

Student Department

Example 3:

SQL Query

SELECT e.name, d.name

FROM Employee e, Department d

WHERE e.dept_id = d.dept_id;

Relational Algebra Equivalent

π_e.name, d.name (Employee ⨝_e.dept_id = d.dept_id Department)

Query Tree

π_e.name, d.name

⨝_e.dept_id = d.dept_id

/ \

Employee Department

Example 4:

SQL Query

SELECT e.name

FROM Employee e, Department d

WHERE e.dept_id = d.dept_id AND d.location = 'Kathmandu';

Relational Algebra Equivalent

π_e.name (Employee ⨝_e.dept_id = d.dept_id (σ_location = 'Kathmandu' (Department)))

Query Tree

π_e.name

⨝_e.dept_id = d.dept_id

/ \

Employee σ_location = 'Kathmandu'

Department

Query Evaluation

Once the query processor has the above-mentioned relational forms with it, the next step is to apply certain rules and algorithms to generate a few other powerful and efficient data structures.

These data structures help in constructing the query evaluation plans.

For example, if the relational graph was constructed, there could be multiple paths from source to destination.

A query execution plan will be generated for each of the paths.

i) One way could be first projecting followed by selection (on the right).

ii) Another way would be to do selection followed by projection (on the left).