Visually Tracking Buy-Sell Time Series Models with SQL Server and Excel
By: Rick Dobson | Updated: 2022-10-07 | Comments | Related: More > TSQL
I enjoy learning about data mining, building models to predict performance, and data visualization. However, I rarely encounter articles that address all three topics. Please present examples of how to develop data in SQL Server for models that are implemented and visually tracked in Excel.
This tip presents a use case for processing time series data in SQL Server that is then modeled and visually examined in Excel.
- This tip demonstrates how to create and invoke a stored procedure for computing exponential moving averages. SQL code is used for compiling a table of dates, exponential moving average values, and selected columns of underlying time series values for six different financial ticker symbols. Four period lengths are examined, including 10, 30, 50, and 200 period lengths. The final output from SQL Server consists of a table where each row has a date, a couple of columns of underlying time series values, and a set of exponential moving averages with different period lengths.
- Separate models are created and visually tracked in Excel for subsets of the results set that were configured in and copied from SQL Server. Line charts in Excel visually compare a set of exponential moving average values to underlying time series values. Also, Excel's conditional formatting capabilities are applied to values copied from SQL Server to color code selected rows that conform to a criterion expression. Visually analyzing the time series data with color-coded rows derived from conditional formatting and line charts offers significantly improved insights beyond what can be obtained from a basic listing of raw data values from SQL Server.
This tip presents a walkthrough of the use case example. The tip starts with mathematical expressions and SQL code for computing the exponential moving averages. Next, more SQL code shows an approach to compiling a table of values for transfer to Excel. Datasets for each of the six tickers are compiled in SQL Server. The datasets for one ticker symbol in one timeframe are visually analyzed in Excel across exponential moving averages with four different period lengths.
Expressions and SQL Code for Exponential Moving Averages with Different Period Lengths
A set of exponential moving average values is a sequence of average values for a corresponding set of underlying time series values. The average values are not arithmetic moving averages like simple moving averages that apply the same weight to all values in a time series. Instead, exponential moving averages are weighted moving averages that apply more weight to the most recent underlying time series values. Therefore, exponential moving averages match more recent underlying time series values more closely than simple moving averages.
Any underlying set of time series values can have multiple corresponding sets of exponential moving averages. Each corresponding set of exponential moving average values is for a different period length. There are two mathematical expressions for exponential moving average values – one for the current period's exponential moving average (ema) and another for a multiplier (alpha) used for computing the current period's ema.
current ema value = alpha*(current time series value) + (1 – alpha)*(prior ema) alpha = 2/(period length + 1)
The initial expression above indicates that ema values are iteratively computed. That is, the current ema depends, in part, on the prior ema value. The current ema also depends on the current time series value.
The value of alpha depends on the period length. The longer the period length, the smaller the value of alpha. Also, the value of alpha plus (1-alpha) is always 1. Therefore, ema values based on a shorter period length place relatively more weight on the current period. Conversely, ema values based on a longer period length place relatively more weight on the prior ema, which is, in turn, dependent on the underlying time series values from prior time periods.
The expression for the current ema above depends partly on a prior ema, but there is no prior ema for the first period. So, the previous ema for the first period has a null value. Furthermore, the preceding ema value for the second period is arbitrary because there is no prior ema value for the first period.
The goal for assigning an initial prior ema in the second period is to minimize the differences between subsequent current period ema values and subsequent period underlying time series values. This tip's approach to minimizing the differences is to set the prior ema in the second period to the initial underlying in the first time series period. As the number of periods from the second period increases, the effect of the second period prior ema value assignment diminishes in significance because of the iterative nature of the computations. See this related tip, Exponential Moving Average Calculation in SQL Server, for a more detailed discussion of computing practices with SQL for exponential moving averages.
Here is a stored procedure named usp_ema_computer in the dbo schema of the default database for computing ema values. The table pulls values from the yahoo_finance_ohlcv_values_with_symbol table in the same schema and database as the stored procedure. The table contains open, high, low, and close price values and volume counts of the shares traded across trading dates. Creating and populating the yahoo_finance_ohlcv_values_with_symbol table is described in an earlier tip, Migrating Time Series Data to SQL Server from Yahoo Finance and Stooq.com.
The procedure has three input parameters:
- @symbol designates a ticker symbol for which to compute exponential moving averages.
- @period is for the period length of the exponential moving average series.
- @alpha is for the weighting factor for the current time series value; implicitly (1 - @alpha) is for the weighting factor for the prior ema value.
drop proc if exists dbo.usp_ema_computer go create procedure dbo.usp_ema_computer -- Add the parameters for the stored procedure here @symbol nvarchar(10) -- for example, assign as 'GOLD' ,@period int -- for example, assign as 10 ,@alpha float -- for example, assign as 2.0/(10 + 1) as begin -- suppress row counts for output from usp set nocount on; -- parameters to the script are -- @symbol identifier for set of time series values -- @period number of periods for ema -- @alpha weight for current row time series value -- initially populate #temp_for_ema for ema calculations -- @ema_first is the seed declare @ema_first money = (select top 1 [close] from [dbo].[yahoo_finance_ohlcv_values_with_symbol] where symbol = @symbol order by [date]) -- create base table for ema calculations drop table if exists #temp_for_ema -- ema seed run to populate #temp_for_ema -- all rows have first close price for ema select [date] ,[symbol] ,[close] ,row_number() OVER (ORDER BY [Date]) [row_number] ,@ema_first ema into #temp_for_ema from [dbo].[yahoo_finance_ohlcv_values_with_symbol] where symbol = @symbol order by row_number -- NULL ema values for first period update #temp_for_ema set ema = NULL where row_number = 1 -- calculate ema for all dates in time series -- @alpha is the exponential weight for the ema -- start calculations with the 3rd period value -- seed is close from 1st period; it is used as ema for 2nd period -- set @max_row_number int and initial @current_row_number -- declare @today_ema declare @max_row_number int = (select max(row_number) from #temp_for_ema) declare @current_row_number int = 3 declare @today_ema float -- loop for computing successive ema values while @current_row_number <= @max_row_number begin set @today_ema = ( -- compute ema for @current_row_number select top 1 ([close] * @alpha) + (lag(ema,1) over (order by [date]) * (1 - @alpha)) ema_today from #temp_for_ema where row_number >= @current_row_number -1 and row_number <= @current_row_number order by row_number desc ) -- update current row in #temp_for_ema with @today_ema -- and increment @current_row_number update #temp_for_ema set ema = @today_ema where row_number = @current_row_number set @current_row_number = @current_row_number + 1 end -- display the results set from temp_for_ema with the calculated values select date ,symbol ,[close] ,@period period_length ,ema ema from #temp_for_ema where row_number < = @max_row_number order by row_number end go
Here is a create table statement for a table to store computed output from the usp_ema_computer stored procedure. The table's primary key speeds up the configuring and copying of its contents to Excel. Notice that the primary key for this table consists of column values for symbol, date, and period length. Therefore, it can store multiple ema values for the same date and symbol values. Each set of ema values for a symbol and date is for a different period length.
-- create table to store emas for multiple ticker symbols -- with different period lengths -- primary key orders table rows by symbol, date, period_lengh values drop table if exists dbo.ema_period_symbol create table dbo.ema_period_symbol ( date date not null, symbol nvarchar(10) not null, [close] money, period_length int not null, ema money ) alter table dbo.ema_period_symbol ADD PRIMARY KEY(symbol, date, period_length)
Here is a code excerpt for invoking usp_ema_computer and storing its output in the ema_period_symbol table. This excerpt is for computing ema values for six ticker symbols (AAPL, GOOGL, MSFT, SPXL, TQQQ, and UDOW) with a period length value of 10. Three additional sets of code statements are in this tip's download. These additional sets of code are for populating the ema_period_symbol table with ema values having period lengths of 30, 50, and 200.
-- compute emas for AAPL, GOOGL, MSFT, TQQQ, SPXL, UDOW -- with period lengths of 10, 30, 50, 200 -- compute emas with a period_length of 10 declare @alpha float = (2.0/(10 + 1)) insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'AAPL', 10, @alpha insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'GOOGL', 10, @alpha insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'MSFT', 10, @alpha insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'SPXL', 10, @alpha insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'TQQQ', 10, @alpha insert into dbo.ema_period_symbol exec dbo.usp_ema_computer 'UDOW', 10, @alpha
The ema_period_symbol table is normalized. This is a common way to store data in relational database engines, such as SQL Server. However, normalized tables are often not a natural way to view data in other environments, such as Excel (or other environments for visualizing data). The more common way to store data in these environments is as de-normalized tables.
Also, it is common to display time series data indexed by date and the entity to which the time series data apply. In this tip, the entities are ticker symbols for financial securities. Additional columns in de-normalized tables can contain values, such as underlying time series data and computed values, such as exponential moving averages with period lengths of 10, 30, 50, and 200.
The following screenshot shows a simple script for displaying selected columns from the ema_period_symbol table joined to the source_data table along with the first sixteen rows from the results set. The source_data table is an excerpt from the yahoo_finance_ohlcv_values_with_symbol table, which contains the original underlying source for this tip. The image below shows a relational view of the data.
With some more SQL code, it is possible to reconfigure the ema values so that the ema values for each period length appears in a separate column. This representation, shown in the following screenshot, provides a de-normalized image of values like those in the preceding table. Most analysts working with time series data find this display a lot easier to read and process than the relational tabular view in the preceding screenshot. One approach for transforming the normalized view to the de-normalized view appears in the download for this tip. The next two sections of this tip illustrate how to process excerpts from the complete de-normalized table in Excel.
Visually Modeling SQL Server Data in Excel with Conditional Formatting and Line Charts
This tip evaluates a simple model based on the ema values compared to open prices from the underlying time series data. If the open price rises from below to above the ema value for a trading day, buy the security on the next trading date. On the other hand, sell the security on the next trading day when the open price falls from being above the ema. When the sell price is greater than the buy price, the buy-sell cycle generates a profit. Otherwise, the buy-sell cycle does not generate a profit.
The visual analysis is performed with Excel's conditional formatting feature and with line charts:
- The conditional formatting feature assigns a green background color to table rows copied from SQL Server to Excel for rows with an open price greater than the ema value for the current row; otherwise, a background color is not assigned.
- The source data for the visual modeling is derived from a selection of rows
copied from the denormalized_emas_by_symbol_with_open_close table in SQL Server
to a worksheet in an Excel workbook. The selection of excerpted rows is
chosen so that
- The first several rows start with trading days for which an ema value is greater than the close price
- A stretch of rows towards the end of the trading days is greater than the close price
- Between the first set of rows and rows towards the end of the excerpt, there can be one or more contiguous blocks of trading days (or even just a stand-alone single trading day) with a green background
- The line charts for the excerpted rows have two lines displayed over the
dates along the horizontal axis.
- The ema values appear in a blue line without line markers
- The open prices appear as unconnected red points above or below the
- A red point is above the blue line when the open price is greater than the ema value for a trading day
- A red point is at or below the blue line when the open price is not greater than the ema value for a trading day
- The red points are line markers for the open price values; the line between the markers is assigned a 100% transparency property so that it does not show on the line chart
Visualizing Contiguous Trading Days When the Open Price is Greater Than or Not Greater Than the ema Value
The following screenshot shows the first 40 rows in a worksheet named AAPL_2011_ema_10 within an Excel workbook file. See the worksheet identifier on the tab in the lower-left corner of the screenshot.
- Rows 2 through 40 contain symbol, date, ema_10, and open column values for the first set of excerpted rows from the denormalized_emas_by_symbol_with_open_close table in SQL Server.
- Columns A, B, C, and G in the first worksheet row show column labels for the symbol, date, ema_10, and open column values from the SQL Server table.
- Columns D, E, F, and H are hidden columns with contents for other columns from the denormalized_emas_by_symbol_with_open_close table in SQL Server.
- The green background color for selected rows in columns B, C, and G is assigned by the Excel conditional formatting feature.
- To the right of columns A through G is an Excel line chart.
- The chart graphs the ema_10 values as a blue line without markers to depict the position of ema_10 points along the line.
- The red points are for open values.
- The dates along the chart's horizontal axis are for the date values in column B.
The following screenshot below shows the bottom of the worksheet that appears above. Some data rows after the bottom of the preceding screenshot and the beginning of the next screenshot are excluded to conserve space.
- There is no line chart in this screenshot because all the rows are plotted in the preceding line chart.
- However, columns A, B, C, and G contain values through the last dates in
the preceding line chart.
- From the preceding screenshot, you can tell that the first date is for June 21, 2011. This date value appears in row 2.
- From the following screenshot, you can tell that the last date is for December 2, 2011. This date value appears in row 117.
Invoking and Applying Excel's Conditional Formatting Feature
You can invoke Excel's conditional formatting feature by opening the Home ribbon and choosing Conditional Formatting. The following screenshot shows these selections.
- Hovering your cursor over the Conditional Formatting menu item will show a brief description of the feature and a link for using it.
- I found this YouTube video by TeachExcel particularly helpful in applying Excel conditional formatting for the requirements of this tip.
By opening the Excel workbook file in this tip's download, you can click the Conditional Formatting menu item and choose Manage Rules. These actions can open the following dialog. This dialog opens whatever conditional rules are in place for a worksheet in a workbook.
- In the dialog below, you can see a single rule for the current worksheet (AAPL_2011_ema_10) in the workbook file.
- The rule applies to cells $B$2:$G$117.
- The format for the rule is to make the background color green.
- The format is assigned to cells in the rule's scope ($B$2:$G$117).
- The rule for applying the format is $G2>$C2
- Column G is for open price values
- Column C is for ema_10 values
- The $ before G and C indicates the rule applies to row 2 through to row 117
Clicking the "Edit Rule…" button towards the top of the Conditional Formatting Rules Manager window opens the following dialog. You can edit or review the settings for a conditional formatting rule in this dialog.
- The "Format values where this formula is true:" text box allows you to change the formula for the rule.
- The "Format…" button allows you to edit the format applied to rows when the format formula is true.
Clicking the "Format…" button in the preceding image allows you to select either a pre-defined color or specify a custom color.
- The image below on the left shows the dialog box for choosing a pre-defined color.
- The image below on the right shows the dialog box for specifying a custom
- For the current rule, the value of 255 was assigned to the Green combo box.
- The Red and Blue combo boxes both have values of 0.
- This combination of Red, Green, and Blue combo box values results in a deep green color, like the one for the spreadsheet shown at the beginning of the preceding subsection.
- Be sure to click OK in either of the two dialogs to commit your color choice from the dialog.
Instead of clicking the "Edit Rule" button in the "Conditional Formatting Rules Manager" dialog, there are options to click other buttons to delete, duplicate, or create a new rule (for subsequent editing if appropriate).
Open Price Values and ema Values Plotted Across Dates in a Line Chart
This subsection reviews the graphical user interface steps for creating a line chart in Excel for a date, ema_10, and open column values. The steps start from SQL Server by copying a selected set of rows from the denormalized_emas_by_symbol_with_open_close SQL Server table to an Excel workbook like the one in this tip's download.
The following two screenshots show the first and last 10 highlighted rows for an excerpt from SQL Server to the Windows clipboard. The highlighted rows start with a date of 2011-06-21 and end with a date of 2011-12-02. Right-click anywhere in the highlighted rows and choose "Choose Copy with Headers".
Next, go to an empty Excel worksheet tab, such as "redo of AAPL_2011_ema_10" in the screenshot below, and paste the copied rows from the Windows clipboard. The screenshot below displays an excerpt of the beginning copied rows.
After copying the selected rows from SQL Server to Excel, you can hide columns not required for the line chart on a sheet or for identifying the data. The following screenshot shows the worksheet with ema_30, ema_50, and ema_200 columns already hidden, and the close column is in the process of being hidden.
The next screenshot shows the beginning rows for the symbol, date, ema_10, and open columns. The line chart displays ema_10 and open column values on the vertical axis versus date values on the horizontal axis. The symbol column values are not strictly necessary for the line chart, but including this column is helpful to easily verify which symbol the line chart displays data.
- Start to prepare the line chart by selecting all data rows in columns B, C, and G.
- Next, choose the Line or Area Chart menu item from the Excel Insert ribbon.
- Then, choose the 2-D Line chart icon to display a basic line chart for ema_10 and open column values versus dates.
After the basic line chart appears on a worksheet, you may want to move and resize the chart on the worksheet. The following screen image shows the basic line chart resized over the first 40 rows of the worksheet.
- Click on the outside border of the chart and drag the chart image to any location you wish, such as cell J4.
- You may resize the line chart image by selecting any circle along the outside chart border and dragging the border in any direction to increase, decrease, and/or re-proportion the chart image.
Additionally, conditional formatting is applied to the data in columns B, C, and G according to the steps described in the "Visualizing Contiguous Trading Days when the open prices are greater than the ema values" subsection. Recall that conditional formatting makes it easier to interpret when open prices are rising or falling relative to an exponential moving average.
Configuring a Basic Line Chart
It can take a moderate amount of configuration to transform a basic line chart into an image that readily reveals useful information in an attractive format. Visualization developers may not be responsible for picking the final format for a chart, but their responsibility may include preparing a chart in whatever format an end-user needs it. You can perform chart configuration tasks in Excel by selecting successive chart elements and then making formatting choices for the currently selected element. The chart elements that need to be formatted to transform the basic line chart in the "redo of AAPL_2011_ema_10" sheet to the fully formatted line chart in the "AAPL_2011_ema_10" sheet include:
- vertical axis values
- horizontal axis values
- ema_10 line values
- open line values
- the major and minor gridlines from the vertical and horizontal axes
- The "redo of AAPL_2011_ema_10" sheet in the workbook file in this tip's download does not have the recommended changes mentioned below. This allows you to practice making the changes without changing a fully formatted chart.
To select the vertical axis values, left-click any of the vertical axis values. Then, right-click within the box that appears around the full set of vertical axis values. Choose the Format Axis menu item to open a dialog for selecting formatting choices.
- Adjust the Minimum Bounds value and the Maximum Bounds value to provide the maximum separation between the lines in the chart. The default values for the vertical axis bounds are 0 and 16, but all the vertical values for the two lines in the chart are from 11 through 15.5. Therefore, update the default minimum and maximum values, respectively, with 11 and 15.5.
- This tip also assigns values to major units and minor units text boxes. The values in these two text boxes control the spacing between major and minor gridlines extending from the vertical axis. The vertical and horizontal gridline spacing can improve the visual discernment of values in a line chart. Update the default major units and minor units text values, respectively, from 2 and .4 to .5 and .1 for the basic line chart.
To format the horizontal axis values, left-click anywhere in the range of values along the horizontal axis. This draws a selection box around the horizontal axis values. Then, right-click the box and choose Format Axis.
- For the basic line chart in this example, the minimum and maximum bounds values of 6/21/2011 and 12/2/2011 do not need to be re-configured. These values are derived by default from the date column.
- The major units and minor units text boxes specify the spacing in date values between major and minor gridlines extending from the horizontal axis. There are also major units and minor units combo boxes. In the dialog box below, these are for days, which is correct for the values in the date column. However, it sometimes happens that combo box settings are not appropriately set for your display requirements. When this happens, you can use the combo boxes to change the unit measure to a day, a month, or a year.
The next step is to update the format settings for the ema_10 line:
- Start by left-clicking the ema_10 line. Default markers appear along the line to indicate it is selected.
- Then, right-click the line and choose the Format Data Series menu item. In this tip, all we want to do is assign a custom color to the ema_10 line.
Therefore, click the paint can icon at the top of the dialog to select a color. Then click the Color control in the dialog. The control lets you pick from a pre-defined set of colors, or you can click the More Colors… menu option to specify a custom color. The example below shows the selection of the fifth color in the top row of theme colors.
The next actions show a more elaborate use of line formatting for the open line. Recall that the line for open values should not actually show a line. Instead, a sequence of round red markers represents the open values line.
- Start the formatting process for the open values line by left-clicking the line. Again, special markers appear to show that the line is selected.
- Then, right-click the line and choose Format Data Series.
- Click the paint can icon.
- Then, choose the Marker control that is to the right of the Line control at the top of the dialog
- Next, click the Marker Options menu item
- Follow this by choosing the Built-in option control below Marker Options
- Choose the round type of marker
- Then, click the Color control and choose the red pre-defined standard color to reproduce the example in this tip appearing in the AAPL_2011_ema_10 worksheet tab
- Your last Marker Options selection for this example is to choose the No Line option from the Border options for the line markers; this setting specifies that round-type markers do not have a border around their circumference
- Finally, click the Line control to the left of the Marker control at the
top of the dialog.
- Left-click the first marker on the open line to select the open line.
- Use the controls on the Transparency combo box to change its value from 0 to 100.
- This setting makes the line between the markers disappear in the open line on the chart and in the legend for open values below the horizontal axis values.
- The next two dialog images show the selections for the marker options followed by the Transparency setting for the visibility of the open line
The major units and minor units setting for the horizontal and vertical axes designate the spacing between major and minor gridlines in a line chart. The gridlines can extend from the horizontal or vertical axes of a line chart. While the spacing between gridlines is controlled by settings in the Format Axis dialogs for horizontal axis and vertical axis values (see preceding screenshots), the visibility and style of the gridlines can be managed through the Format Major Gridlines and Format Minor Gridlines dialogs associated with the horizontal and vertical axis values. Invoke its corresponding format gridlines context menu to make the major gridlines or minor gridlines visible for an axis. Then, close the dialog.
This tip uses default Format Major Gridlines and Format Minor Gridlines dialog settings for gridlines. These dialog settings appear below. In addition to managing the visibility of the gridlines through these dialogs, you can also designate the color, width, and dash style for the gridlines along with other style features.
Summary of Average Change Percentage in Open Brice by ema Period Length
The model for this tip can be expressed in terms of the color assignments from conditional formatting. Consider the following screen excerpt showing color assignments when the open price rises above the ema_10 value. The excerpt is for the first buy-sell cycle.
- The model designates a buy on the first row after the assigned row color changes from white to green. The date for this row is June 27, 2011. The open price on this date is 11.6996.
- The model also designates a sell on the first row after the assigned row color changes from green to white. The date for this row is August 8, 2011. The open price on this date is 12.9175
- The change amount for the open price values between the buy and sell dates is 1.2179, and the change percentage is 10.41%.
The following screenshot displays a table that shows all nine buy-sell cycles based on the relationships between ema_10 values and open prices.
- The bold values in columns B, C, and G are dates, ema_10 values, and open price values for the buy ends of the ema_10 buy-sell cycles.
- The bold values in columns I, J, and N are dates, ema_10 values, and open price values for the sell ends of the ema_10 buy-sell cycles.
- Columns P and Q display computed change amount (change amt) and percentage change for change amt (change %).
- Cell R22 displays the average percentage change for change amt across the nine buy-sell cycles.
Comparable tables were computed for each of the other three period lengths (30, 50, and 200) in this tip. A summary appears below across all four period lengths for the AAPL ticker symbol. As the period length grows from 10 to 200, the number of buy-sell cycles declines. Also, the average change percentage (avg change %) is dramatically larger for the 200-period-length ema values versus the buy-sell cycles based on shorter period lengths.
A full treatment of the factors underlying the differences is beyond the scope of this tip. However, one reason for the pattern of results in the preceding table is that the 200-day period length ema more readily identifies periods with elevated open prices than ema lines with period lengths of 10, 30, or 50. The following comparison of line charts for 10-day moving averages versus 200-day moving averages highlights this point.
The download for this tip contains eight files:
- One T-SQL script file with all the T-SQL code in the tip
- An XLSX file with all the worksheet tabs referenced in the tip
- Six CSV files with historical time series data for the six ticker symbols
This tip provides a walkthrough of how to visually mine adherence to a model for AAPL historical time series data. Verify if you can find other timeframes to confirm the results from 2011 and 2012. Here's a hint inspect these two timeframes: May 28, 2019, through March 20, 2020, and January 29, 2014, through August 13, 2015. Does the 200-day exponential moving average stand out from the other period lengths for its ability to identify elevated open prices relative to ema values?
Another interesting next step is applying the walkthrough steps for the other five ticker symbols -- GOOGL, MSFT, SPXL, TQQQ, and UDOW. Investigate whether the 200-day period yields the same results relative to the shorter period lengths for AAPL as for the other ticker symbols. You may achieve superior confirmation results by using a slightly different inspection timeframe than the 2011-2012 inspection timeframe for AAPL in this tip. Apply conditional formatting for data copied from SQL Server and look for long runs of green cells to find good inspection timeframes.
About the author
View all my tips
Article Last Updated: 2022-10-07