Nth Highest Salary and it's working process.

Nth Highest Salary and it's working process,AskHareesh Blog for OracleApps

 

To Find  2nd Highest Salary or Nth Highest Salary and it's working process.

Suppose that you are given the following simple database table called Employee that has 2 columns named Employee ID and Salary:Employee
Employee ID
Salary
3
200
4
800
7
450
Write aSQL query to get the second highest salary from the table above. Also write aquery to find the nth highest salary in SQL, where n can be any number.
Theeasiest way to start with a problem like this is to ask yourself a simplerquestion first. So, let’s ask ourselves how can we find the highest salary in atable? Well, you probably know that is actually really easy – we can just usethe MAX aggregate function:
select MAX(Salary) from Employee; Remember that SQL is basedon set theory
You shouldremember that SQL uses sets as the foundation for most of its queries. So, thequestion is how can we use set theory to find the 2nd highest salary in thetable above? Think about it on your own for a bit – even if you do not remembermuch about sets, the answer is very easy to understand and something that youmight be able to come up with on your own.
salary Figuring out the answer to find the 2nd highest
What if wetry to exclude the highest salary value from the result set returned by the SQLthat we run? If we remove the highest salary from a group of salary values,then we will have a new group of values whose highest salary is actually the2nd highest in the original Employee table.
So, if wecan somehow select the highest value from a result set that excludes thehighest value, then we would actually be selecting the 2nd highest salaryvalue. Think about that carefully and see if you can come up with the actualSQL yourself before you read the answer that we provide below. Here is a smallhint to help you get started: you will have to use the “NOT IN” SQL operator.
Solution to finding the 2nd highest salary in SQL
Now, hereis what the SQL will look like:
SELECTMAX(Salary) FROM Employee WHERE Salary NOT IN (SELECT MAX(Salary) FROM Employee)
 Running the SQL above would return us “450″,which is of course the 2nd highest salary in the Employee table.
Subscribeto our newsletter for more free interview questions.
An explanation of the solution
The SQLabove first finds the highest salary value in the Employee table using “(selectMAX(Salary) from Employee)”. Then, adding the “WHERE Salary NOT IN” in frontbasically creates a new set of Salary values that does not include the highestSalary value. For instance, if the highest salary in the Employee table is200,000 then that value will be excluded from the results using the “NOT IN”operator, and all values except for 200,000 will be retained in the results.
This nowmeans that the highest value in this new result set will actually be the 2ndhighest value in the Employee table. So, we then select the max Salary from thenew result set, and that gives us 2nd highest Salary in the Employee table. Andthat is how the query above works.
An alternative solution using the not equals SQL operator
We canactually use the not equals operator – the “<>” – instead of the NOT INoperator as an alternative solution to this problem. This is what the SQL wouldlook like:
selectMAX(Salary) from Employee WHERE Salary <> (select MAX(Salary) fromEmployee )
 Howwould you write a SQL query to find the Nth highest salary?
What wedid above was write a query to find the 2nd highest Salary value in theEmployee table. But, another commonly asked interview question is how can weuse SQL to find the Nth highest salary, where N can be any number whether it’sthe 3rd highest, 4th highest, 5th highest, 10th highest, etc? This is also aninteresting question – try to come up with an answer yourself before readingthe one below to see what you come up with.
The answer and explanation to finding the nth highest salaryin SQL
Here wewill present one possible answer to finding the nth highest salary first, andthe explanation of that answer after since it’s actually easier to understandthat way. Note that the first answer we present is actually not optimal from aperformance standpoint since it uses a subquery, but we think that it will beinteresting for you to learn about because you might just learn something newabout SQL. If you want to see the more optimal solutions first, you can skipdown to the sections that says “Find the nth highest salary without a subquery”instead.
The SQLbelow will give you the correct answer – but you will have to plug in an actualvalue for N of course. This SQL to find the Nth highest salary should work inSQL Server, MySQL, DB2, Oracle, Teradata, and almost any other RDBMS:
SELECT * /*Thisis the outer query part */ FROM EmployeeEmp1 WHERE (N-1) = ( /* Subquery starts here */ SELECT COUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHEREEmp2.Salary > Emp1.Salary)
How does the query above work?
The queryabove can be quite confusing if you have not seen anything like it before – payspecial attention to the fact that “Emp1″ appears in both the subquery (alsoknown as an inner query) and the “outer” query. The outer query is just thepart of the query that is not the subquery/inner query – both parts of thequery are clearly labeled in the comments.
The subquery is a correlated subquery
Thesubquery in the SQL above is actually a specific type of subquery known as acorrelated subquery. The reason it is called a correlated subquery is becausethe the subquery uses a value from the outer query in it’s WHERE clause. Inthis case that value is the Emp1 table alias as we pointed out earlier. Anormal subquery can be run independently of the outer query, but a correlatedsubquery can NOT be run independently of the outer query. If you want to readmore about the differences between correlated and uncorrelated subqueries youcan go here: Correlated vs Uncorrelated Subqueries.
The mostimportant thing to understand in the query above is that the subquery isevaluated each and every time a row is processed by the outer query. In otherwords, the inner query can not be processed independently of the outer querysince the inner query uses the Emp1 value as well.
Finding nth highest salary example and explanation
Let’s stepthrough an actual example to see how the query above will actually execute stepby step. Suppose we are looking for the 2nd highest Salary value in our tableabove, so our N is 2. This means that the query will look like this:
SELECT *FROM Employee Emp1 WHERE (1) = ( SELECT COUNT(DISTINCT(Emp2.Salary)) FROMEmployee Emp2 WHERE Emp2.Salary > Emp1.Salary)
You canprobably see that Emp1 and Emp2 are just aliases for the same Employee table –it’s like we just created 2 separate clones of the Employee table and gave themdifferent names.
Understanding and visualizing how the query above works
Let’s assume that we are using this data:
Employee
Employee ID
Salary
3
200
4
800
7
450
For the sake of our explanation,let’s assume that N is 2 – so the query is trying to find the 2nd highestsalary in the Employee table. The first thing that the query above does isprocess the very first row of the Employee table, which has an alias of Emp1.
The salaryin the first row of the Employee table is 200. Because the subquery iscorrelated to the outer query through the alias Emp1, it means that when thefirst row is processed, the query will essentially look like this – note thatall we did is replace Emp1.Salary with the value of 200:
SELECT * FROM Employee Emp1 WHERE (1) = ( SELECTCOUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > 200)
So, whatexactly is happening when that first row is processed? Well, if you pay specialattention to the subquery you will notice that it’s basically searching for thecount of salary entries in the Employee table that are greater than 200.Basically, the subquery is trying to find how many salary entries are greaterthan 200. Then, that count of salary entries is checked to see if it equals 1in the outer query, and if so then everything from that particular row in Emp1will be returned.
Note thatEmp1 and Emp2 are both aliases for the same table – Employee. Emp2 is onlybeing used in the subquery to compare all the salary values to the currentsalary value chosen in Emp1. This allows us to find the number of salaryentries (the count) that are greater than 200. And if this number is equal toN-1 (which is 1 in our case) then we know that we have a winner – and that wehave found our answer.
But, it’sclear that the subquery will return a 2 when Emp1.Salary is 200, because thereare clearly 2 salaries greater than 200 in the Employee table. And since 2 isnot equal to 1, the salary of 200 will clearly not be returned.
So, what happensnext? Well, the SQL processor will move on to the next row which is 800, andthe resulting query looks like this:
SELECT * FROM Employee Emp1 WHERE (1) = ( SELECTCOUNT(DISTINCT(Emp2.Salary)) FROM Employee Emp2 WHERE Emp2.Salary > 800)
Sincethere are no salaries greater than 800, the query will move on to the last rowand will of course find the answer as 450. This is because 800 is greater than450, and the count will be 1. More precisely, the entire row with the desiredsalary would be returned, and this is what it would look like:
EmployeeIDSalary 7 450
It’s alsoworth pointing out that the reason DISTINCT is used in the query above isbecause there may be duplicate salary values in the table. In that scenario, weonly want to count repeated salaries just once, which is exactly why we use theDISTINCT operator.
A high level summary of how the query works
Let’s gothrough a high level summary of how someone would have come up with the SQL inthe first place – since we showed you the answer first without really goingthrough the thought process one would use to arrive at that answer.
Think ofit this way – we are looking for a pattern that will lead us to the answer. Oneway to look at it is that the 2nd highest salary would have just one salary thatis greater than it. The 4th highest salary would have 3 salaries that aregreater than it. In more general terms, in order to find the Nth highestsalary, we just find the salary that has exactly N-1 salaries greater thanitself. And that is exactly what the query above accomplishes – it simply findsthe salary that has N-1 salaries greater than itself and returns that value asthe answer.
Find the nth highest salary using the TOP keyword in SQLServer
We canalso use the TOP keyword (for databases that support the TOP keyword, like SQLServer) to find the nth highest salary. Here is some fairly simply SQL thatwould help us do that:
SELECT TOP1 Salary FROM ( SELECT DISTINCT TOP N Salary FROM Employee ORDER BY Salary DESC) AS Emp ORDER BY Salary To understand the query above, first look at thesubquery, which simply finds the N highest salaries in the Employee table andarranges them in descending order. Then, the outer query will actuallyrearrange those values in ascending order, which is what the very last line“ORDER BY Salary” does, because of the fact that the ORDER BY Default is tosort values in ASCENDING order. Finally, that means the Nth highest salary willbe at the top of the list of salaries, which means we just want the first row,which is exactly what “SELECT TOP 1 Salary” will do for us!
Find the nth highest salary without using the TOP keyword
There aremany other solutions to finding the nth highest salary that do not need to usethe TOP keyword, one of which we already went over. Keep reading for moresolutions.
Find the nth highest salary in SQL without a subquery
Thesolution we gave above actually does not do well from a performance standpoint.This is because the use of the subquery can really slow down the query. Withthat in mind, let’s go through some different solutions to this problem fordifferent database vendors. Because each database vendor (whether it’s MySQL,Oracle, or SQL Server) has a different SQL syntax and functions, we will gothrough solutions for specific vendors. But keep in mind that the solutionpresented above using a subquery should work across different database vendors.
Find the nth highest salary in MySQL
In MySQL,we can just use the LIMIT clause along with an offset to find the nth highestsalary. If that doesn’t make sense take a look at the MySQL-specific SQL to seehow we can do this:
SELECTSalary FROM Employee ORDER BY Salary DESC LIMIT n-1,1
Note thatthe DESC used in the query above simply arranges the salaries in descendingorder – so from highest salary to lowest. Then, the key part of the query topay attention to is the “LIMIT N-1, 1″. The LIMIT clause takes two arguments inthat query – the first argument specifies the offset of the first row toreturn, and the second specifies the maximum number of rows to return. So, it’ssaying that the offset of the first row to return should be N-1, and the maxnumber of rows to return is 1. What exactly is the offset? Well, the offset isjust a numerical value that represents the number of rows from the very firstrow, and since the rows are arranged in descending order we know that the rowat an offset of N-1 will contain the (N-1)th highest salary.
Find the nth highest salary in SQL Server
In SQLServer, there is no such thing as a LIMIT clause. But, we can still use theoffset to find the nth highest salary without using a subquery – just like thesolution we gave above in MySQL syntax. But, the SQL Server syntax will be abit different. Here is what it would look like:
SELECT Salary FROM Employee ORDER BY Salary DESC OFFSET N-1ROW(S) FETCH FIRST ROW ONLY
Note thatI haven’t personally tested the SQL above, and I believe that it will only workin SQL Server 2012 and up. Let me know in the comments if you notice anythingelse about the query.
Find the nth highest salary in Oracle using rownum
Oraclesyntax doesn’t support using an offset like MySQL and SQL Server, but we canactually use the row_number analytic function in Oracle to solve this problem.Here is what the Oracle-specific SQL would look like to find the nth highestsalary:
select * from ( select Emp.*, row_number() over (order bySalary DESC) rownumb from Employee Emp ) where rownumb = n;
/*n is nthhighest salary*/
The firstthing you should notice in the query above is that inside the subquery thesalaries are arranged in descending order. Then, the row_number analyticfunction is applied against the list of descending salaries. Applying therow_number function against the list of descending salaries means that each rowwill be assigned a row number starting from 1. And since the rows are arrangedin descending order the row with the highest salary will have a 1 for the rownumber. Note that the row number is given the alias rownumb in the SQL above.
This meansthat in order to find the 3rd or 4th highest salary we simply look for the 3rdor 4th row. The query above will then compare the rownumb to n, and if they areequal will return everything in that row. And that will be our answer!
Find the nth highest salary in Oracle using RANK
Oraclealso provides a RANK function that just assigns a ranking numeric value (with 1being the highest) for some sorted values. So, we can use this SQL in Oracle tofind the nth highest salary using the RANK function:
select * FROM ( select EmployeeID, Salary ,rank() over(order by Salary DESC) ranking from Employee ) WHERE ranking = N;
 The rank function will assign a ranking toeach row starting from 1. This query is actually quite similar to the one wherewe used the row_number() analytic function, and works in the same way as well.
We’ve nowgone through many different solutions in different database vendors likeOracle, MySQL, and SQL Server. Hopefully now you understand how to solve aproblem like this, and you have improved your SQL skills in the process! Besure to leave a comment if you have any questions or observations.
*/