Question 1

What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?

Accepted Answer

INNER JOIN returns only matching rows from both tables. LEFT JOIN returns all rows from the left table plus matching rows from the right (NULLs where no match). RIGHT JOIN is the mirror of LEFT JOIN. FULL OUTER JOIN returns all rows from both tables, with NULLs on the side that has no match. Draw a Venn diagram to visualize.
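As a rough illustration of the join types above, here is a minimal sketch using Python's built-in sqlite3 module. The `employees` and `departments` tables and their rows are invented for the example. RIGHT JOIN and FULL OUTER JOIN are only available in newer SQLite releases, so the sketch shows the mirror of a LEFT JOIN by swapping the table order instead.

```python
import sqlite3

# Hypothetical sample data: one employee has no department,
# and one department has no employees.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE departments (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, dept_id INTEGER);
    INSERT INTO departments VALUES (1, 'Engineering'), (2, 'Sales');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', NULL);
""")

# INNER JOIN: only employees that have a matching department.
inner_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    INNER JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# LEFT JOIN: every employee, with NULL (None in Python) where no department matches.
left_rows = conn.execute("""
    SELECT e.name, d.name
    FROM employees e
    LEFT JOIN departments d ON e.dept_id = d.id
    ORDER BY e.id
""").fetchall()

# Mirror image (what a RIGHT JOIN would return): every department,
# with NULL where no employee matches.
mirrored_rows = conn.execute("""
    SELECT d.name, e.name
    FROM departments d
    LEFT JOIN employees e ON e.dept_id = d.id
    ORDER BY d.id
""").fetchall()

print(inner_rows)
print(left_rows)
print(mirrored_rows)
```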

Question 2

Explain the difference between WHERE and HAVING clauses.

Accepted Answer

WHERE filters rows before grouping (operates on individual rows). HAVING filters groups after GROUP BY (operates on aggregate results). Example: WHERE salary > 50000 filters individual employees; HAVING AVG(salary) > 50000 filters departments whose average salary exceeds 50K. You cannot use aggregate functions in WHERE.
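The two filters can be compared side by side in a small sketch. It assumes a hypothetical `employees` table with a `department` column and made-up salaries, and runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, department TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 'Engineering', 90000),
        ('Ben', 'Engineering', 40000),
        ('Cy',  'Support',     45000),
        ('Di',  'Support',     48000);
""")

# WHERE is evaluated per row, before any grouping happens.
where_rows = conn.execute("""
    SELECT name FROM employees
    WHERE salary > 50000
    ORDER BY name
""").fetchall()

# HAVING is evaluated per group, after GROUP BY has computed the aggregates.
having_rows = conn.execute("""
    SELECT department, AVG(salary)
    FROM employees
    GROUP BY department
    HAVING AVG(salary) > 50000
""").fetchall()

print(where_rows)
print(having_rows)
```

Only Ana passes the row-level filter, while Engineering passes the group-level filter even though Ben, one of its members, earns less than 50000.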

Question 3

Write a query to find the second-highest salary in a table.

Accepted Answer

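Several approaches work. With a subquery: SELECT MAX(salary) FROM employees WHERE salary < (SELECT MAX(salary) FROM employees). This returns NULL when there is no second distinct salary. With the DENSE_RANK() window function: SELECT DISTINCT salary FROM (SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk FROM employees) sub WHERE rnk = 2. DISTINCT keeps the result to a single row when several employees share that salary. The window function approach generalizes to the Nth highest salary by changing the rank in the filter.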
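Both approaches can be tried against a small table with tied salaries. This is a sketch under assumptions: the `employees` rows are invented, the helper name `nth_highest_salary` is hypothetical, and the window-function query needs SQLite 3.25 or later.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (name TEXT, salary INTEGER);
    INSERT INTO employees VALUES
        ('Ana', 100), ('Ben', 100), ('Cy', 90), ('Di', 90), ('Eve', 80);
""")

# Approach 1: the largest salary strictly below the overall maximum.
second_via_max = conn.execute("""
    SELECT MAX(salary) FROM employees
    WHERE salary < (SELECT MAX(salary) FROM employees)
""").fetchone()[0]

def nth_highest_salary(conn, n):
    """Return the n-th highest distinct salary, or None if there is none."""
    row = conn.execute("""
        SELECT DISTINCT salary FROM (
            SELECT salary, DENSE_RANK() OVER (ORDER BY salary DESC) AS rnk
            FROM employees
        ) AS ranked
        WHERE rnk = ?
    """, (n,)).fetchone()
    return row[0] if row else None

second_via_rank = nth_highest_salary(conn, 2)
third_via_rank = nth_highest_salary(conn, 3)
missing_rank = nth_highest_salary(conn, 4)  # only three distinct salaries exist

print(second_via_max, second_via_rank, third_via_rank, missing_rank)
```

DENSE_RANK() is used rather than RANK() because tied rows at the top would otherwise leave a gap in the ranks, so no row would have rank 2 here.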

Question 4

What are window functions, and when would you use them?

Accepted Answer

Window functions perform calculations across a set of rows related to the current row without collapsing them into one row per group (unlike GROUP BY). The OVER clause defines that set of rows, usually with PARTITION BY and ORDER BY. Common functions: ROW_NUMBER(), RANK(), DENSE_RANK(), LAG(), LEAD(), and aggregates used as SUM() OVER() or AVG() OVER(). Use them for running totals, rankings, row numbering, and comparing a row to its neighbors.
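A running total and a comparison with the previous row might look like the following sketch. The `sales` table and its values are invented, and the queries assume SQLite 3.25 or later, which is when SQLite gained window functions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE sales (day INTEGER, amount INTEGER);
    INSERT INTO sales VALUES (1, 10), (2, 20), (3, 15);
""")

# Every input row is kept; each one gains two computed columns.
rows = conn.execute("""
    SELECT day,
           amount,
           SUM(amount) OVER (ORDER BY day) AS running_total,
           LAG(amount) OVER (ORDER BY day) AS previous_amount
    FROM sales
    ORDER BY day
""").fetchall()

for row in rows:
    print(row)
```

The first row has no predecessor, so LAG() yields NULL there (None in Python).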

Question 5

How would you optimize a slow SQL query?

Accepted Answer

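Run EXPLAIN ANALYZE (or your database's equivalent) to see the query plan; note that in PostgreSQL it actually executes the query. Look for sequential scans on large tables (add an index on the filtered or joined columns), large gaps between estimated and actual row counts (refresh the table statistics), and nested loops over large inputs (check whether a hash join would be cheaper). Other fixes: select only the columns you need instead of SELECT *, rewrite correlated subqueries as JOINs, add covering indexes, and check for lock contention.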
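The effect of adding an index can be observed even in a toy setup. The sketch below uses SQLite's EXPLAIN QUERY PLAN, which is a simpler relative of EXPLAIN ANALYZE and does not report timings. The `orders` table, the index name `idx_orders_customer`, and the `plan` helper are assumptions made for the example, and the exact plan wording varies between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL)"
)

def plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (detail is the 4th column)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[3] for row in rows)

query = "SELECT total FROM orders WHERE customer_id = ?"

# Without an index on customer_id, the planner has to scan the whole table.
before = plan(conn, query, (42,))

# With an index on the filtered column, it can search the index instead.
conn.execute("CREATE INDEX idx_orders_customer ON orders (customer_id)")
after = plan(conn, query, (42,))

print("before:", before)
print("after: ", after)
```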

Question 6

What is a Common Table Expression (CTE), and when is it useful?

Accepted Answer

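A CTE (WITH clause) defines a temporary named result set that you can reference within the main query. It improves readability for complex queries with multiple steps. Recursive CTEs can traverse hierarchical data (org charts, category trees). Note: in some databases, CTEs are optimization fences, meaning they are materialized rather than inlined into the main query. PostgreSQL, for example, always materialized CTEs before version 12 and can inline non-recursive CTEs that are referenced once since then. Check your database's behavior for performance-critical queries.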
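A recursive CTE walking an org chart could be sketched as follows. The `employees` rows and the CTE name `chain` are made up for the example, which runs on SQLite through Python's sqlite3 module.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT, manager_id INTEGER);
    INSERT INTO employees VALUES
        (1, 'Root', NULL),
        (2, 'Mid',  1),
        (3, 'Leaf', 2),
        (4, 'Peer', 1);
""")

rows = conn.execute("""
    WITH RECURSIVE chain(id, name, depth) AS (
        -- Anchor member: start from the employee with no manager.
        SELECT id, name, 0 FROM employees WHERE manager_id IS NULL
        UNION ALL
        -- Recursive member: add the direct reports of rows found so far.
        SELECT e.id, e.name, c.depth + 1
        FROM employees e
        JOIN chain c ON e.manager_id = c.id
    )
    SELECT name, depth FROM chain ORDER BY depth, name
""").fetchall()

print(rows)
```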

Question 7

Explain database normalization. What are the first three normal forms?

Accepted Answer

Normalization organizes tables to reduce data redundancy and avoid update anomalies. 1NF: each column holds atomic values (no arrays or repeating groups). 2NF: 1NF plus every non-key column depends on the entire primary key, not just part of it (this matters when the key is composite). 3NF: 2NF plus no non-key column depends on another non-key column (eliminate transitive dependencies). Denormalization is sometimes appropriate for read-heavy workloads.
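To make the 3NF rule concrete, here is a small sketch with an invented schema. Storing a department's location on every employee row would make it depend on the department rather than on the employee key, so the sketch keeps it in a separate `departments` table.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Facts about a department live in exactly one row.
    CREATE TABLE departments (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        location TEXT NOT NULL
    );
    -- Employees reference the department instead of copying its attributes.
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        dept_id INTEGER NOT NULL REFERENCES departments(id)
    );
    INSERT INTO departments VALUES (1, 'Engineering', 'Berlin');
    INSERT INTO employees VALUES (1, 'Ana', 1), (2, 'Ben', 1);
""")

# Moving the department is a single-row update, with no risk of
# leaving some employee rows on the old value.
conn.execute("UPDATE departments SET location = 'Lisbon' WHERE id = 1")

rows = conn.execute("""
    SELECT e.name, d.location
    FROM employees e
    JOIN departments d ON d.id = e.dept_id
    ORDER BY e.id
""").fetchall()

print(rows)
```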

Question 8

What is the difference between a clustered and non-clustered index?

Accepted Answer

A clustered index determines the physical order of rows in the table, so there can be only one per table. A non-clustered index is a separate structure with pointers back to the data rows, and a table can have many. PostgreSQL does not maintain clustered indexes; its CLUSTER command physically reorders a table by an index once, and later inserts and updates do not preserve that order. In SQL Server and MySQL (InnoDB), the primary key is the clustered index by default.

Question 9

Write a query to find employees who have a higher salary than their manager.

Accepted Answer

Self-join: SELECT e.name, e.salary, m.name AS manager_name, m.salary AS manager_salary FROM employees e JOIN employees m ON e.manager_id = m.id WHERE e.salary > m.salary. This joins the table to itself, matching each employee with their manager, then filters where the employee earns more.
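The self-join can be checked against a few sample rows. The data below is invented, and the sketch assumes the `employees` table has the `id`, `name`, `manager_id`, and `salary` columns that the query refers to.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE employees (
        id INTEGER PRIMARY KEY,
        name TEXT,
        manager_id INTEGER,
        salary INTEGER
    );
    INSERT INTO employees VALUES
        (1, 'Boss',   NULL, 100),
        (2, 'Star',   1,    120),
        (3, 'Newbie', 1,    80);
""")

# e is the employee side of the join, m is the manager side of the same table.
rows = conn.execute("""
    SELECT e.name, e.salary, m.name AS manager_name, m.salary AS manager_salary
    FROM employees e
    JOIN employees m ON e.manager_id = m.id
    WHERE e.salary > m.salary
""").fetchall()

print(rows)
```

Because this is an inner join, employees without a manager (here, Boss) drop out of the result before the salary comparison is applied.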

Question 10

What is a deadlock in a database, and how do you prevent it?

Accepted Answer

A deadlock occurs when two or more transactions each hold a lock that another one needs, creating a circular wait. The database detects this and rolls back one of the transactions (the victim) so the others can proceed; the application should be prepared to retry it. Prevent deadlocks by acquiring locks in a consistent order, keeping transactions short, using lower isolation levels when possible, and avoiding user interaction within transactions. Monitor deadlock frequency in your database logs.

Question 11

Explain the difference between UNION and UNION ALL.

Accepted Answer

UNION combines result sets and removes duplicates (performs a sort or hash for deduplication). UNION ALL combines result sets without removing duplicates, which is faster because it skips the deduplication step. Use UNION ALL when you know there are no duplicates or when duplicates are acceptable, and reserve UNION for cases where you need distinct rows.
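The difference shows up as soon as the two inputs overlap. A minimal sketch, with hypothetical single-column tables `a` and `b`:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE a (n INTEGER);
    CREATE TABLE b (n INTEGER);
    INSERT INTO a VALUES (1), (2);
    INSERT INTO b VALUES (2), (3);
""")

# UNION removes the duplicate 2 that appears in both tables.
union_rows = conn.execute(
    "SELECT n FROM a UNION SELECT n FROM b ORDER BY n"
).fetchall()

# UNION ALL keeps every row from both inputs.
union_all_rows = conn.execute(
    "SELECT n FROM a UNION ALL SELECT n FROM b ORDER BY n"
).fetchall()

print(union_rows)
print(union_all_rows)
```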

Question 12

How do transactions work, and what are the ACID properties?

Accepted Answer

A transaction groups multiple operations into an atomic unit. ACID: Atomicity (all or nothing), Consistency (data stays valid after the transaction), Isolation (concurrent transactions do not interfere), Durability (committed data survives crashes). Isolation levels (read uncommitted, read committed, repeatable read, serializable) trade isolation strength for concurrency performance.
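Atomicity can be demonstrated with a transfer that fails halfway. This is a sketch under assumptions: the `accounts` table, its CHECK constraint, and the `transfer` helper are invented, and it relies on Python's sqlite3 connection acting as a context manager that commits on success and rolls back on an exception.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE accounts (
        id INTEGER PRIMARY KEY,
        balance INTEGER NOT NULL CHECK (balance >= 0)
    )
""")
conn.execute("INSERT INTO accounts VALUES (1, 100), (2, 50)")
conn.commit()

def transfer(conn, src, dst, amount):
    """Move amount from src to dst; return False if the transaction was rolled back."""
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            # An overdraft violates the CHECK constraint and raises here,
            # after the credit above has already been applied.
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(conn):
    return conn.execute("SELECT balance FROM accounts ORDER BY id").fetchall()

ok = transfer(conn, 1, 2, 30)
after_ok = balances(conn)

failed = transfer(conn, 1, 2, 500)
after_failed = balances(conn)  # the partial credit has been undone

print(ok, after_ok)
print(failed, after_failed)
```

The credit is deliberately applied before the debit so that the failing transfer leaves a partial change for the rollback to undo.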

SQL interview questions

1. What is the difference between INNER JOIN, LEFT JOIN, RIGHT JOIN, and FULL OUTER JOIN?

2. Explain the difference between WHERE and HAVING clauses.

3. Write a query to find the second-highest salary in a table.

4. What are window functions, and when would you use them?

5. How would you optimize a slow SQL query?

6. What is a Common Table Expression (CTE), and when is it useful?

7. Explain database normalization. What are the first three normal forms?

8. What is the difference between a clustered and non-clustered index?

9. Write a query to find employees who have a higher salary than their manager.

10. What is a deadlock in a database, and how do you prevent it?

11. Explain the difference between UNION and UNION ALL.

12. How do transactions work, and what are the ACID properties?
