We have a set of live data that frequently changes, even though there isn't many data points and we had developers introduce other architectural layers, like caching mechanisms, but these introduced new bugs we hadn't seen before their introduction. If we had a situation where we were reporting a thousand records that were constantly changing and we wanted to keep these records fresh, what would be some native ways in SQL Server to handle this problem without introducing other tools?
I enjoy the challenge of mixing technologies, but we should always consider the end user, report in question, and the environment, as mixing technologies might be an experience we don't want clients to feel, even if we are learning how technologies interact. A caching layer, which uses other technologies on top of SQL Server - as an example - can significantly boost performance for some applications, reports, or API layers. It can also create unpleasant interactions that require additional troubleshooting time. If you want to stick with a native SQL Server design because it's easier to manage, let's look at a few alternatives in SQL Server.
In memory tables
These offer one of the best tools within later versions of SQL Server for handling frequently changing data. Provided that the application or ETL layer loads these tables or updates their values quickly and doesn't become the bottleneck, in-memory tables will handle reporting the most up-to-date information quickly. If I face a situation where I need to keep historic records as well, for some historic reports - I will save the latest information to a table and update the information in an in-memory table. The architectural reason for this is that a historic table that stores all values will generally receive new data and have data read (insert and select operations). Let's look at a contrived example where I update weather data in the in memory table, and save the record in the historic table:
---- For information in memory, I would recommend a Key-Value design as much as possible UPDATE tblInMemoryExample SET WeatherValue = 33 WHERE WeatherKey = 'Phoenix' ---- Disk design would be standard: INSERT INTO tblSavedExample (CityId,WeatherValue,WeatherDate) VALUES (4,33,GETDATE())
In the above example, the historic table will use the CityID to link to a key lookup table, but since the in-memory table only has the latest updated information, I skip this link in the ETL or application layer. The above assumes that I need the historic data and the latter step can be skipped if this does not apply. If you use a later edition of SQL Server, I highly recommend this feature if you need to provide live data to clients and you've seen OLTP conflicts with these live data. While it may change how you import data, such as requiring an additional step in the extract part of ETL, the quick time and no conflict between read and writes makes this a useful tool when needed. As you can see from the above, I think of the in-memory table from an update process (after first load), while the historic information is a standard insert and archive process. There are a few drawback and limitations to in memory tables:
- They are unavailable in SQL Server versions earlier than 2014.
- Only some data types are supported such as integer types, decimal, numeric, float, real, money types, char and nvarchar types, sysnames, datetime types, and time (see here for 2014 list).
- There are limitations on using in-memory tables, such as CHECKDB, replication, MERGE operations, etc.
For caching layer designs, in-memory tables make a great feature, though they may come with limitations if you're applying them elsewhere. In my viewpoint, they are not useful for historic data storage, so I wouldn't use them as a way to store historic data (as of the current versions of SQL Server). SQL Server 2014 and 2016 both come with data warehouse features, such as clustered column store indexes, which are more useful for historic data and historic reporting. In my experience, for live reporting of some data compatible with the in-memory feature, I've seen read-write speeds as fast as six multiples - but I only use them in reports that must be refreshed and are compatible with data types they offer, like a report of live commodity prices. The usefulness of in-memory tables highly depends on their use and if you seldom experience read-write conflicts, they may carry little value.
Small and scaled tables
If in memory tables are not an option, one technique that I've used in SQL Server is to scale tables in a manner where the amount of data is significantly reduced to allow for fast delete and update operations. In SQL Server, inserting data is one of the fastest operations, while deleting and updating can be the most expensive. If I only want to report the most recent data to the user, I can either insert the new data as I receive it, while removing the old, or update the record, if a key applies (for an example, if the key is "Boston" and the value is the weather measured in Celsius, I can run an update). If some records have frequent updates while others have a few, I can break these tables into smaller tables which allow for faster operations on the table level, using a load balancing technique where I spread the most active alongside the least active. Consider multiple tables only if your report involves enough records for scale; for an example, this design wouldn't make sense if I only need to show 50 records on a site.
With this design, watch for the wait ASYNC_NETWORK_IO. Provided that your indexing matches the live data requirements, statistics are being refreshed often to match the OLTP load, and other configurations allow for frequent changes, the bottleneck may appear on the communication between the machine that adds, updates or reads the data, not on SQL Server machine.
- If you're using SQL Server 2014 and beyond, I recommend testing in-memory tables for reporting your latest data, provided that the limitations don't apply in your situation.
- I recommend simulating a "scaling data" approach earlier than later. I've been the consultant in many environments that did not imagine horizontal scale early, found themselves more successful than they expected, then had to re-work their environment. Assume that you may need to horizontally scale your environment now; how would you do it and how could you make it easy now to do it later, if you need to?
- See this tip on Key-Value basics and where these may be useful; for OLTP in memory tables, these designs can make reporting data changes quick, when read-write speed must be at its maximum.
Last Update: 5/15/2017
About the author
View all my tips