SQL Server IOT Database Design Example

By:   |   Comments (1)   |   Related: > Database Design


Problem

There are many teams out there that are working with sensor data using Microsoft SQL Server. Flexibility of the relational model can be sometimes overwhelming leading to unnecessary complicated solutions that are near impossible to use in final reports. The purpose of this article is to provide an easy to understand and efficient Kimball (star-schema) data architecture for an IOT (sensor data) projects.

Solution

The IOT DataMart

Following are steps to build an IOT data mart.

Create the SQL Server Database

If you are planning a single node SQL server and do not require High Availability (you can afford a downtime of couple of hours once a month for patching or other issues) use "Simple Recovery Model". This recovery model combined with the TABLOCK insert query hint will minimize the use of the database log file and primarily write directly to the data file. In such a way you can significantly optimize IO performance of data insert operations.

T-SQL code for the database:

CREATE DATABASE DataMartIOT
GO

ALTER DATABASE DataMartIOT SET RECOVERY SIMPLE
GO

Once we have a database in place, we can focus on the Kimball data model. Complicated as it may sound, things are easier than they look. Star architecture or the Kimball model specify that measures (the values that sensors are reading) should go to fact tables. Entries that you use to filter that data with should go to dimension tables. There is a one-to-many relationship between any given dimension and a fact table. Now we need to define the facts (sensor readings) and Dimensions (Sensor names, Date, Time, etc.).

SQL Server DataMart Dimensions for IOT Data Model

Dimensions will be used as filters and drilldowns in Power BI reports. A typical sensor dimension will look as follows:

CREATE SCHEMA Dim
GO
 
CREATE TABLE Dim.Sensors
(
Id smallint IDENTITY(-32768,1) NOT NULL, -- allows for total of 65536 sensors  
SensorName varchar(255) NOT NULL,
Unit varchar(255),
CONSTRAINT PK_Dim_Sensors_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100),
CONSTRAINT UX_Dim_Sensors_SensorName UNIQUE (SensorName)
)
GO

The reason I have selected a smallint datatype for SensorId is its size of just 2 bytes and the fact that SQL Server has no unsigned integer datatype. Therefore, the seed starts with -32768 to make use of all the dataspace reserved for this column in a page. Remember if 65K+ number of sensors is not enough for your project this needs to be scaled up to INT or BIGINT datatype in both Dim.Sensors and Fact.Readings. Don’t overdo it, this column is used in clustered indexes and will be used in almost any join when you select data, the bigger it is the slower your system will be due to excessive IO.

Let's also add one more dimension called Groups. It will enable the end user to quickly switch between pre-defined sensor configuration.

CREATE TABLE Dim.Groups
(
Id tinyint IDENTITY(0,1) NOT NULL, -- allows for total of 256 groups; scale up if needed
GroupName varchar(255) NOT NULL,
CONSTRAINT PK_Dim_Groups_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100),
CONSTRAINT UX_Dim_Groups_GroupName UNIQUE (GroupName)
)
GO

In case your IOT project is distributed over multiple geographical locations add a locations dimension:

CREATE TABLE Dim.Locations
(
Id tinyint IDENTITY(0,1) NOT NULL, -- allows for total of 256 locations  ; scale up if needed
LocationName varchar(255) NOT NULL
CONSTRAINT PK_Dim_Locations_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100),
CONSTRAINT UX_Dim_Locations_LocationName UNIQUE (LocationName)
)
GO

To start I have included a simple date dimension below, if you require a more sophisticated version, at the end of the article I have provided a link to a very well detailed date dimension.

CREATE TABLE Dim.Dates
(
Id smallint IDENTITY(-32768,1) NOT NULL, -- allows for total of 65536 records or almost 180 years
DateValue Date NOT NULL,
CONSTRAINT PK_Dim_Dates_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100),
CONSTRAINT UX_Dim_Dates_DateValue UNIQUE (DateValue)
)
GO
 
-- Populates Date Dimension with dates from 30 days back in time to almost 180 years in the future
DECLARE @loop_iterator AS int = -30 --starting 30 days back in time
DECLARE @date_var AS Date = DATEADD(DAY,@loop_iterator,GETDATE())
WHILE @loop_iterator < 65506 -- maximum for small int; change if using different datatype
BEGIN
   DECLARE @temp_date AS date = DATEADD(DAY,@loop_iterator,@date_var)
   INSERT INTO Dim.Dates VALUES(@temp_date);
   SET @loop_iterator = @loop_iterator + 1
END
GO

Followed by the time dimension, that holds an entry for every second in a day:

CREATE TABLE Dim.Times
(
Id time(0) NOT NULL, -- time(0) for a nearest second precision.
HourValue tinyint NOT NULL, --for drill down reports
MinValue tinyint NOT NULL, --for drill down reports
SecValue tinyint NOT NULL, --for drill down reports
--MSValue smallint NOT NULL, --uncomment this line if you changed the precision to milliseconds
CONSTRAINT PK_Dim_Times_Id PRIMARY KEY (Id) WITH (FILLFACTOR = 100),
CONSTRAINT UX_Dim_Times_Values UNIQUE (HourValue,MinValue,SecValue) –add MSValue column if in use
)
GO
 
--Populating Time Dimension with values
DECLARE @loop_iterator AS int = 0
DECLARE @time_var AS time(0) = N'00:00:00'
WHILE @loop_iterator < 86400
BEGIN
   DECLARE @temp_id AS time(0) = DATEADD(second,@loop_iterator,@time_var)
   INSERT INTO Dim.Times VALUES
(@temp_id,DATEPART(HOUR,@temp_id),DATEPART(MINUTE,@temp_id),DATEPART(SECOND,@temp_id));
   SET @loop_iterator = @loop_iterator + 1
END
GO

SQL Server Fact Table for IOT Data Model

Once done with dimensions we can proceed to the fact table. The table that stores actual readings from sensors and references to every dimension using the foreign key constraint. It will be the core of our star schema, where arms are dimensions. The T-SQL table definition is below:

CREATE SCHEMA Fact
GO

CREATE TABLE Fact.Readings
(
DateId smallint NOT NULL, --Must be identical datatype to Date Dim
TimeId time(0) NOT NULL, -- Must be identical datatype to Time Dim
LocationId tinyint NOT NULL, -- Must be identical datatype to Location Dim
GroupId tinyint NOT NULL, -- Must be identical datatype to Group Dim
SensorId smallint NOT NULL, -- Must be identical datatype to Sensor Dim
SensorValue decimal(9,2), -- this will cover almost all types of numeric data
CONSTRAINT PK_Fact_Readings PRIMARY KEY (DateId, TimeId, LocationId, GroupId, SensorId),
CONSTRAINT FK_Fact_Readings_Dim_Date_Id FOREIGN KEY (DateId) REFERENCES Dim.Dates(Id),
CONSTRAINT FK_Fact_Readings_Dim_Time_Id FOREIGN KEY (TimeId) REFERENCES Dim.Times(Id),
CONSTRAINT FK_Fact_Readings_Dim_Sensors_Id FOREIGN KEY (SensorId) REFERENCES Dim.Sensors(Id),
CONSTRAINT FK_Fact_Readings_Dim_Groups_Id FOREIGN KEY (GroupId) REFERENCES Dim.Groups(Id),
CONSTRAINT FK_Fact_Readings_Dim_Locations_Id FOREIGN KEY (LocationId) REFERENCES Dim.Locations(Id)
)
GO 

The final database diagram should look like this:

Database Diagram

When you insert, modify or delete data SQL Server will automatically check that your data operation does not violate any relevant constrains. As good as it sounds, this requires additional CPU and IO resources, that’s why for best performance we will disable newly created foreign key constraints in the Fact.Readings table. When disabled, these still can be used for documentation purposes in both SSMS and Power BI:

ALTER TABLE Fact.readings
NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Date_Id
GO
 
ALTER TABLE Fact.readings
NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Time_Id
GO
 
ALTER TABLE Fact.Readings
NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Sensors_Id
GO
 
ALTER TABLE Fact.Readings
NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Groups_Id
GO
 
ALTER TABLE Fact.Readings
NOCHECK CONSTRAINT FK_Fact_Readings_Dim_Locations_Id
GO

Sample SQL Server Data for IOT Data Model

To test our newly created data mart, use the sample data script provided:

INSERT INTO Dim.Sensors VALUES -- CREATES 10 sensor entries for sensors dimension
(N'Temperature Sensor 01',N'Deg.C.'),
(N'Humidity Sensor 01',N'%RH'),
(N'Temperature Sensor 02',N'Deg.C.'),
(N'Humidity Sensor 02',N'%RH'),
(N'Temperature Sensor 03',N'Deg.C.'),
(N'Humidity Sensor 03',N'%RH'),
(N'Temperature Sensor 04',N'Deg.C.'),
(N'Humidity Sensor 04',N'%RH'),
(N'Temperature Sensor 05',N'Deg.C.'),
(N'Humidity Sensor 05',N'%RH')
GO
 
INSERT INTO Dim.Locations VALUES -- CREATES 5 location entries for locations dimension
(N'Oslo, Norway'),
(N'Stockholm, Sweden'),
(N'Copenhagen, Denmark'),
(N'Helsinki, Finland'),
(N'Reykjavik, Iceland')
 
INSERT INTO Dim.Groups VALUES -- CREATES 2 groups for groups dimension
(N'Temperature Sensors'),
(N'Humidity Sensors')
GO
 
-- GENERATING SAMPLE TEMPERATURE AND HUMIDITY DATA FOR THE PAST 7 DAYS
 
PRINT N'Generating Sample sensor data, this will take some minutes'
DECLARE @DateId smallint, @DateValue date, @TimeId time(0);
DECLARE @RandomSeed int = 12345;
DECLARE @SinAngle decimal(9,2) = 0;
 
DECLARE date_time_cursor CURSOR  
    FOR SELECT Dates.Id AS DateId, Dates.DateValue AS DateValue, Times.Id AS TimeId FROM Dim.Dates AS Dates CROSS JOIN Dim.Times AS Times WHERE Dates.DateValue BETWEEN DATEADD(DAY,-7,GETDATE()) AND GETDATE() ORDER BY Dates.Id, Times.Id  
OPEN date_time_cursor
 
FETCH NEXT FROM date_time_cursor
INTO @DateId, @DateValue, @TimeId;
WHILE @@FETCH_STATUS = 0
BEGIN
   SET @SinAngle = @SinAngle + 0.01;
   DECLARE @Temperature1 decimal (9,2) = 24.5 + RAND() + 5.0 * SIN(@SinAngle);
   DECLARE @RHValue1 decimal(9,2) = 50 + RAND() + 20.0 * SIN(@SinAngle);
   DECLARE @Temperature2 decimal (9,2) = 20.5 + RAND() + 6.0 * SIN(@SinAngle + 90);
   DECLARE @RHValue2 decimal(9,2) = 60 + RAND() + 15.0 * SIN(@SinAngle + 90);
   DECLARE @Temperature3 decimal (9,2) = 22.5 + RAND() + 3.0 * SIN(@SinAngle + 45);
   DECLARE @RHValue3 decimal(9,2) = 70 + RAND() + 10.0 * SIN(@SinAngle + 45);
   DECLARE @Temperature4 decimal (9,2) = 27.5 + RAND() + 4.0 * SIN(@SinAngle + 30);
   DECLARE @RHValue4 decimal(9,2) = 35 + RAND() + 17.0 * SIN(@SinAngle + 30);
   DECLARE @Temperature5 decimal (9,2) = 10.0 + RAND() + 7.0 * SIN(@SinAngle + 60);
   DECLARE @RHValue5 decimal(9,2) = 40 + RAND() + 25.0 * SIN(@SinAngle + 60);
 
   INSERT INTO Fact.Readings VALUES
   (@DateId,@TimeId,0,0,-32768,@Temperature1),
   (@DateId,@TimeId,0,1,-32767,@RHValue1),
   (@DateId,@TimeId,1,0,-32766,@Temperature2),
   (@DateId,@TimeId,1,1,-32765,@RHValue2),
   (@DateId,@TimeId,2,0,-32764,@Temperature3),
   (@DateId,@TimeId,2,1,-32763,@RHValue3),
   (@DateId,@TimeId,3,0,-32762,@Temperature4),
   (@DateId,@TimeId,3,1,-32761,@RHValue4),
   (@DateId,@TimeId,4,0,-32760,@Temperature5),
   (@DateId,@TimeId,4,1,-32759,@RHValue5)
 
   FETCH NEXT FROM date_time_cursor
   INTO @DateId, @DateValue, @TimeId;
END;
CLOSE date_time_cursor
DEALLOCATE date_time_cursor
GO

The RAND() function generates a random number between zero and one. In the above script it is used to add noise to the data. The SIN() function is used to add periodic fluctuations with a given amplitude.

Using data with T-SQL from IOT Data Model

The Fact.Readings table has a composite clustered index. For this index to work efficiently we need to specify in our WHERE clause a value for every column that is part of index PK_Fact_Readings (DateId, TimeId, LocationId, GroupId, SensorId). Below is an example of such a T-SQL query:

SELECT DimDates.DateValue, FactReadings.Id, FactReadings.SensorValue
FROM   Dim.Sensors AS DimSensors
  JOIN Fact.Readings AS FactReadings ON DimSensors.Id = FactReadings.SensorId
  JOIN Dim.Dates AS DimDates ON DimDates.Id = FactReadings.DateId
  JOIN Dim.Times AS DimTimes ON DimTimes.Id = FactReadings.TimeId
  JOIN Dim.Groups AS DimGroups ON DimGroups.Id = FactReadings.GroupId
  JOIN Dim.Locations AS DimLocations ON DimLocations.Id = FactReadings.LocationId
WHERE DimDates.DateValue = 'Your Date Value'
  AND DimTimes.Id = 'Your Time Value'
  AND DimLocations.LocationName = N'Your Location Name'
  AND DimGroups.GroupName = N'Your Group Name'
  AND DimSensors.SensorName = N'Your Sensor Name'
Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Semjon Terehhov Semjon Terehhov is an MCSE in Data Management and Analytics, Data consultant & Partner at Cloudberries Norway.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, May 26, 2020 - 10:31:26 AM - bob Back To Top (85768)
SensorValue decimal(9,2), -- this will cover almost all types of numeric data

But not strings, timestamps, binary data and use a LOT more space than specific types and often with a smaller range.

This really needs a different schema. 















get free sql tips
agree to terms