The standard SQL language has a number of aggregate functions like: SUM, MIN, MAX, AVG, but a common statistics function that SQL Server does not have is a built-in aggregate function for median. The median is the value that falls in the middle of a sorted resultset with equal parts that are smaller and equal parts that are greater. Since there is no built-in implementation for the median, the following is a simple solution I put together to find the median.
The general solution I'm suggesting here uses a stored procedure I created called dbo.sp_calc_median. The need I had was to just find the median value for various resultsets, so I put together this stored procedure that could be used for just about anything. The procedure takes a table name and a column name to calculate the median value for any data type.
The procedure builds an ordered temporary table of the column values and by using a dynamic scrollable cursor, scrolls to the middle of the result set and outputs the result.
Here is the stored procedure:
create procedure dbo.sp_calc_median (@tablename varchar(50), @columnname varchar(50), @result sql_variant OUTPUT) as begin declare @sqlstmt varchar(200) declare @midCount int set nocount on set @sqlstmt = 'insert #tempmedian select ' + @columnname + ' from ' + @tablename + ' order by 1 asc ' create table #tempmedian (col sql_variant) exec (@sqlstmt) declare c_med cursor scroll for select * from #tempmedian select @midCount = round ( count(*) * 0.5,0 ) from #tempmedian open c_med fetch absolute @midCount from c_med into @result close c_med deallocate c_med drop table #tempmedian end go
Here is small sample data set and sample executions, so it is easy to see that this is working.
--sample table CREATE TABLE [dbo].[TestTable]( [id] [int] IDENTITY(1,1) NOT NULL, [testID] [int] NULL, [testName] [varchar](50) NULL, [testDate] [date] NULL, CONSTRAINT [PK_TestTable] PRIMARY KEY CLUSTERED ([id] ASC) ) -- sample data INSERT INTO TestTable (testID, testName, testDate) SELECT 1, 'Dave', '2000-11-01' UNION SELECT 2, 'Mike', '1995-01-11' UNION SELECT 3, 'Sue' , '1965-07-14' UNION SELECT 4, 'Jill', '2001-03-07' UNION SELECT 5, 'Abe' , '2005-09-13' -- sample run DECLARE @result sql_variant EXEC dbo.sp_calc_median 'dbo.TestTable', 'testID', @result OUTPUT SELECT @result EXEC dbo.sp_calc_median 'dbo.TestTable', 'testName', @result OUTPUT SELECT @result EXEC dbo.sp_calc_median 'dbo.TestTable', 'testDate', @result OUTPUT SELECT @result -- output from above run 3 Jill 2000-11-01
With this small dataset it is easy to see how these values are determined as shown below where the values are sorted.
- testID - 1, 2, 3, 4, 5
- testName - Abe, Dave, Jill, Mike, Sue
- testDate - 1965-07-14, 1995-01-11, 2000-11-01, 2001-03-07, 2005-09-13
Here is another sample run I did against the Northwind database. This table has more data than the sample above, but works just the same. Also note that I am passing in the database name along with the table, so this stored procedure could be created in one database on your SQL Server instance and used for any database and table.
DECLARE @res sql_variant -- declaring a sql_variant column EXEC sp_calc_median 'northwind..products','productname',@res OUTPUT PRINT convert(nvarchar,@res) EXEC sp_calc_median 'northwind..products','UnitPrice',@res OUTPUT PRINT convert(real,@res) -- output from above run Maxilaku 19.5
- Create the sp_calc_median stored procedure and test against your data.
- Look at other ways you could use this in your environment.
- There are several other ways to determine the median value, but this was one simple solution that satisfied my needs. Hopefully this solution gets the wheels turning.
- Check out these related tips
Last Update: 10/31/2011
About the author
View all my tips