![]() |
|
|
By: Eli Leiba | Read Comments (11) | Print Eli is a senior application DBA, a teacher and a senior database consultant with 19 years of RDBMS experience. Related Tips: More |
|
The standard SQL language has a number of aggregate functions like: SUM, MIN, MAX, AVG, but a common statistics function that SQL Server does not have is a built-in aggregate function for median. The median is the value that falls in the middle of a sorted resultset with equal parts that are smaller and equal parts that are greater. Since there is no built-in implementation for the median, the following is a simple solution I put together to find the median.
The general solution I'm suggesting here uses a stored procedure I created called dbo.sp_calc_median. The need I had was to just find the median value for various resultsets, so I put together this stored procedure that could be used for just about anything. The procedure takes a table name and a column name to calculate the median value for any data type.
The procedure builds an ordered temporary table of the column values and by using a dynamic scrollable cursor, scrolls to the middle of the result set and outputs the result.
Here is the stored procedure:
create procedure dbo.sp_calc_median
(@tablename varchar(50),
@columnname varchar(50),
@result sql_variant OUTPUT)
as
begin
declare @sqlstmt varchar(200)
declare @midCount int
set nocount on
set @sqlstmt = 'insert #tempmedian select ' + @columnname +
' from ' + @tablename + ' order by 1 asc '
create table #tempmedian (col sql_variant)
exec (@sqlstmt)
declare c_med cursor scroll for select * from #tempmedian
select @midCount = round ( count(*) * 0.5,0 ) from #tempmedian
open c_med
fetch absolute @midCount from c_med into @result
close c_med
deallocate c_med
drop table #tempmedian
end
goHere is small sample data set and sample executions, so it is easy to see that this is working.
--sample table CREATE TABLE [dbo].[TestTable]( [id] [int] IDENTITY(1,1) NOT NULL, [testID] [int] NULL, [testName] [varchar](50) NULL, [testDate] [date] NULL, CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED ([id] ASC) ) -- sample data INSERT INTO TestTable (testID, testName, testDate) SELECT 1, 'Dave', '2000-11-01' UNION SELECT 2, 'Mike', '1995-01-11' UNION SELECT 3, 'Sue' , '1965-07-14' UNION SELECT 4, 'Jill', '2001-03-07' UNION SELECT 5, 'Abe' , '2005-09-13' -- sample run DECLARE @result sql_variant EXEC dbo.sp_calc_median 'dbo.TestTable', 'testID', @result OUTPUT SELECT @result EXEC dbo.sp_calc_median 'dbo.TestTable', 'testName', @result OUTPUT SELECT @result EXEC dbo.sp_calc_median 'dbo.TestTable', 'testDate', @result OUTPUT SELECT @result -- output from above run 3 Jill 2000-11-01
With this small dataset it is easy to see how these values are determined as shown below where the values are sorted.
Here is another sample run I did against the Northwind database. This table has more data than the sample above, but works just the same. Also note that I am passing in the database name along with the table, so this stored procedure could be created in one database on your SQL Server instance and used for any database and table.
DECLARE @res sql_variant -- declaring a sql_variant column EXEC sp_calc_median 'northwind..products','productname',@res OUTPUT PRINT convert(nvarchar,@res) EXEC sp_calc_median 'northwind..products','UnitPrice',@res OUTPUT PRINT convert(real,@res) -- output from above run Maxilaku 19.5
| Share: | Share | Tweet |
|
![]() |
![]() |
Free SQL Server Learning |
| Monday, October 31, 2011 - 8:26:21 AM - BJ | Read The Tip |
|
Thanks. This is a simple option. |
|
| Monday, October 31, 2011 - 11:23:04 AM - David Morton | Read The Tip |
|
This is a nifty piece of code but I can see one small issue with it. It works great for odd numbered sets of data but in the case of even numbered sets of data, you would want to take the two middle values and average them together. All said, if you don't need that level of granularity, then this works great. In the perfect world, if we had an even number set of data and the two center values were different, such as a set of data that has 50 elements: row 25: 50.00 row 26: 70.00 The actual median value would be (50 + 70) / 2 = 60. If the set of data had 49 elements then the median value would have been the 25th element or 50. Here is an example - I used a #tmp1 table to store my intermediate results. I could have used the ID Column because it exists but what if we didn't have an ID table or the values where not linear? Remember, Medians are calculated on an ordered set of values regarless of their order in the parent table. I also used the original code and modified the TestId values. CREATE TABLE [dbo].[TestTable]( ([id] ASC) ) UNION
Select Select @cntr = count(*) from TestTable Select @cntr2 = round ( count(*) * 0.5,0 ) from testtable set @cntr3 = @cntr2 + 1
--***** Play with this. Try commenting out the last Union and Insert Statement to create an even number set of data and you will see what -- I am talking about.
Cheers
|
|
| Monday, October 31, 2011 - 11:37:37 AM - Jeremy Kadlec | Read The Tip |
|
David, Thank you for the post with the example code and data. Thank you, |
|
| Tuesday, November 08, 2011 - 12:26:16 AM - Brahm | Read The Tip |
|
Hi Good and effectve solution.
Just not that the code will work correctly if count(*) is uneven. Remember that one has to get the average between the middle two terms if count(*) is even |
|
| Tuesday, November 08, 2011 - 12:27:30 AM - Brahm | Read The Tip |
|
Thanks David Morton for your solution.
|
|
| Tuesday, November 08, 2011 - 3:08:12 AM - Flemming Thor Hansen | Read The Tip |
|
Much faster and direct method without cursors. -- Calculate MEDIAN of FieldName in Table TableName |
|
| Tuesday, November 29, 2011 - 11:41:13 AM - Scott C | Read The Tip |
|
I agree that it is much better to solve this problem without resorting to temp tables or cursors. I have some minor quibbles however. Most aggregate functions should explicitly exclude NULL values. The main query is repeated in the SELECT COUNT(*) subquery, which could be ***bersome if it is more complex than a simple one-table SELECT. Using common table expressions avoids having to repeat the main query. This query has the same execution plan as the previous one, the only difference is style. WITH Domain AS ( |
|
| Tuesday, November 29, 2011 - 12:28:13 PM - David Morton | Read The Tip |
|
Wow! Nice code examples! I have to agree that the CTE is much better than using cursors. Cursors are my bane and slowly I'm coming around to using CTE's, but it's a slow process for me! :)
Thanks for the updated code.
Dave
|
|
| Wednesday, November 30, 2011 - 12:29:33 AM - BrahmB | Read The Tip |
|
Hi Flemming Thor Hansen and Scott C This is really helpful - about 2 and a half years ago I did the Median with Temp Tables, where I calculate the median over different categories. I even started to think of rather converting it to a .NET assembly. Your solutions really addresses the requirement regarding the even and uneven row counts. Great solutions. Scott C - Your last query is to to point - have not seen anything like this yet. Just one word to both of you guys -WOW! Kind regards Brahm |
|
| Wednesday, November 30, 2011 - 12:37:45 AM - BrahmB | Read The Tip |
|
Just one thing - remember that one must sort the results on the field that one calculates the MEDIAN on. SQL 2005 and further does not allow one to ORDER BY on derived tables and views. Will play with your suggested solutions. regards Brahm
|
|
| Wednesday, November 30, 2011 - 8:57:52 AM - David Morton | Read The Tip |
|
Brahm: You are absolutely correct - the order by in the following line must be the field that contains the values we want to calculate our median on. Ordinal = ROW_NUMBER() OVER (ORDER BY FieldName) By using this technique our concerns for ordering the data values has been met.
Here is the first example (the cusor one) re-coded using Scott's approach and I apologize for not using a CTE to begin with.
--drop table TestTable CREATE TABLE [dbo].[TestTable]( ; WITH Domain AS -- Query 2 If you look at the result of the second query, you will see that the data is actually sorted without the use of an additional sort clause. The Row_Number() Function has done that for us.
Thanks guys - your all steeley eyed rocket scientists!
Dave Morton
|
|
|
privacy | disclaimer | copyright | advertise | about authors | contribute | feedback | giveaways | user groups Some names and products listed are the registered trademarks of their respective owners. Edgewood Solutions LLC | MSSharePointTips.com | MSSQLTips.com |