Learn more about SQL Server tools

mssqltips logo
 

Tutorials          DBA          Dev          BI          Career          Categories          Webcasts          Whitepapers          Today's Tip          Join

Tutorials      DBA      Dev      BI      Categories      Webcasts

DBA    Dev    BI    Categories

 

SQL Server Data Aggregation for Data with Different Sampling Rates


By:   |   Read Comments (3)   |   Related Tips: More > Dates

Attend these FREE MSSQLTips webcasts >> click to register


Problem

In the emerging "Internet of Things", there are multitudes of devices collecting data at differing sampling rates. Integrating this data so the data has a common granularity in time is important to not only allow for accurate analysis and mining, but it will also aid in reducing the amount of data to be stored and processed.

Solution

In this tip, we will demonstrate how to use the T-SQL AVG function and GROUP BY clause to transform data collected from two devices sampling at 100Hz and 40Hz to one row per second. We will use T-SQL to generate simulated data for this example. Let's begin by creating two tables, one for Device A's data and another for Device B's data.

create table dbo.tblDeviceA 
(
  pkDeviceA integer identity(1,1) primary key,
  recordingTime datetime2,
  xAxis decimal(5,3),
  yAxis decimal(5,3),
  zAxis decimal(5,3)
)

create table dbo.tblDeviceB 
(
  pkDeviceB integer identity(1,1) primary key,
  recordingTime datetime2,
  temperature decimal(5,2),
  humidity decimal(5,2)
)

Next, we will populate the Device A table with our simulated data from a device sampling at 100Hz. The DATEADD function is incrementing the recording time by 10 milliseconds.

declare @i as integer
declare @dateTime as datetime2
set @i=0
set @dateTime='2014-11-15 12:00:00.000';
while @i<1000
begin
  insert into dbo.tblDeviceA values 
  (dateadd(ms,@i*10,@dateTime), 2.0+rand()-0.5, rand()-0.5, rand()/10)
  set @i=@i+1
end
go

After inserting the rows, we will select the top 10 rows from the table to verify.

Top 10 rows from device A

Now we will populate the Device B table with our simulated data from a device sampling at 40Hz. The DATEADD function is incrementing the recording time by 25 milliseconds.

declare @i as integer
declare @dateTime as datetime2
set @i=0
set @dateTime='2014-11-15 12:00:00.000';
while @i<400
begin
  insert into dbo.tblDeviceB values 
  (dateadd(ms,@i*25,@dateTime), 72.0+rand()/10, 45.0+rand()/10)
  set @i=@i+1
end
go

After inserting the rows, we will select the top 10 rows from the table to verify.

Top 10 rows from device B

In these next steps we will roll up the data for each device to one row of data per second. An accepted way of doing this is to average the recorded data values within each second. In the T-SQL below, we are using the CONVERT function to remove the milliseconds from the date/time value. The style value of 120 returns the date in the yyyy-mm-dd hh:mi:ss(24h) format. The same CONVERT function is used in the GROUP BY and ORDER BY clauses. The AVG function is applied to each of the data columns to get an average value per second, and the CAST function is used to limit the returned values to three decimal places.

select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, 
cast(avg(xAxis) as decimal(5,3)) as averageXAxis,
cast(avg(yAxis) as decimal(5,3)) as averageYAxis,
cast(avg(zAxis) as decimal(5,3)) as averageZAxis
from dbo.tblDeviceA
group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
order by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))

The results from this query for Device A are shown below. We have reduced the data from 100 rows per second to 1 row per second.

One row per second for Device A

For Device B's data, we apply the same CONVERT function. Notice how the query does not need the sampling rate.

select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, 
cast(avg(temperature) as decimal(5,2)) as averageTemperature,
cast(avg(humidity) as decimal(5,2)) as averageHumidity
from dbo.tblDeviceB
group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
order by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))

The results from this query for Device B are shown below. We have reduced the data from 40 rows per second to 1 row per second.

One row per second for Device B

The last step in this tip will be to display the data from both devices together on one row. There are myriad ways to do this, but we will show one way that involves two subqueries with an INNER JOIN on the recording time columns.

select
a.recordingTime,
a.averageXAxis,
a.averageYAxis,
a.averageZAxis,
b.averageTemperature,
b.averageHumidity  
from
(
   select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, 
   cast(avg(xAxis) as decimal(5,3)) as averageXAxis,
   cast(avg(yAxis) as decimal(5,3)) as averageYAxis,
   cast(avg(zAxis) as decimal(5,3)) as averageZAxis
   from dbo.tblDeviceA
   group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
)as A
inner join
( 
   select CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120)) as recordingTime, 
   cast(avg(temperature) as decimal(5,2)) as averageTemperature,
   cast(avg(humidity) as decimal(5,2)) as averageHumidity
   from dbo.tblDeviceB
   group by CONVERT(DATETIME, CONVERT(VARCHAR(20), recordingTime, 120))
) as B
on A.recordingTime=B.recordingTime
order by A.recordingTime

The results from the above query shows the data from both devices integrated on one row.

Reduced and integrated data between both devices
Next Steps

Temporary tables can be used to stage the data during the integration process. Also, this tip assumes that the clocks on the devices are synchronized. The DATEADD function can be used to adjust the times if the device clocks are not synchronized. Finally, read more about using the functions and clauses in T-SQL that are mentioned within this article in the following tips from MSSQLTips.com.

  • Add and Subtract Dates using DATEADD in SQL Server
  • Temporary Tables and Table Variables
  • Date and Time Conversions Using SQL Server
  • Decimal Conversions in SQL Server Change Datatype Format


  • Last Update:


    signup button

    next tip button



    About the author
    MSSQLTips author Dallas Snider Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.

    View all my tips





    Post a comment or let the author know this tip helped.

    All comments are reviewed, so stay on subject or we may delete your comment. Note: your email address is not published. Required fields are marked with an asterisk (*).

    *Name    *Email    Notify for updates 


    SQL tips:

    *Enter Code refresh code     



    Tuesday, February 10, 2015 - 8:11:59 PM - Swati Dharia Back To Top

    I am new in SQL world. I am asked to learn SSIS, SSAS and SSRS. With very basic knowledge of SQL how do I go about getting help on stallation or configuring these.

    Any help would be appreciated.

     

     


    Wednesday, December 24, 2014 - 7:49:47 AM - WiseOldMan Back To Top

    While I agree with your theory for combining the data (other than the inner join which loses the other data when one piece is missing), I don't necessarily agree with averaging the data over time.  By sampling at 100hz and 25hz then reducing your results to 1hz, the average would only be accurate if your data was "slowly moving" and didn't really have a need for oversampling.  Temperature and humidity are probably fine, but the x, y, and z axes measurements would have severe issues if the object was rotating at anything faster than 0.4 hz.  Without averaging, you have a risk of aliasing, but by averaging you've completely lost the amplitudes.  For example, if the object was rotating at a constant 10hz, your original sampling would have 10 records per rotation, but your average would show that it wasn't moving at all.  If the temperature and humidity were replaced by factors that were largely dependent on the values of the x,y,and z axes, then you could only draw incorrect conclusions by the analysis. 

    We don't often work with scientific data measurements, so averaging is probably an acceptable method for most of what we do on a day-to-day basis. My only caution is to "know your data." Know how it was recorded, what is expected from the results, then address it in an appropriate manner. 


    Thursday, December 04, 2014 - 1:54:59 AM - ramesh bob Back To Top

    Great post, I was actually doing some analysis using DateADD function to get the results by 15 minutes.

    But, Will adopt this method

    Thanks,

    Ramesh


    Learn more about SQL Server tools