Problem
This tip presents a walkthrough of an automated backtesting example implemented within SQL Server. Backtesting is a process that uses historical data to evaluate the likely outcomes of a strategy, an analytical model, or a predictive model.
Microsoft Copilot, a generative AI assistant available from Microsoft in free and paid versions, is used in this tip to assist in T-SQL code development for the automated backtesting.
Solution
Backtesting can be used in weather forecasting, AI algorithm evaluation, traffic modeling, and stock trading models, among other application domains. This tip illustrates backtesting for a stock trading model. Even if you have no interest in stock data or trading analytics, this tip can still benefit you. It describes how to transfer data from a Google Sheets worksheet to a SQL Server table and how to build a simple analytical model implemented via a computed column in that table. The tip also covers how to add a primary key constraint to a table, load data into a SQL Server table from a CSV file, and evaluate the model's performance with T-SQL against the loaded data. Together, the processing steps provide a case study on how to backtest an analytical model with standard T-SQL development techniques.
The Source Data
The model and backtesting in this tip are implemented with T-SQL code, with its development being assisted by Microsoft Copilot’s generative AI features. The following three screenshots are excerpts from a Google Sheets worksheet. The data are originally derived from the Google Finance website with the GOOGLEFINANCE function in Google Sheets.
- The top screenshot displays 11 rows that begin with the data column titles in row 1, followed by the first 10 rows of data for the SPY ticker, which tracks the daily performance of the S&P 500 index. The Date and Close columns hold the trading date and the close price for the SPY ticker on that date. The EMA_10 through EMA_200 columns hold exponential moving averages with period lengths of 10, 30, 50, and 200 periods. Exponential moving averages are smoothed versions of the close prices (strictly speaking, they are not averages); a short sketch after the screenshots below illustrates the underlying recurrence.
- The middle screenshot displays the last 10 rows of data for the SPY ticker, as well as the first 10 rows of data for another ticker (SPXL). The SPXL ticker is for an exchange-traded fund that aims to return 300 percent of the daily performance of the S&P 500 index.
- The bottom screenshot displays the last ten rows of data for the SPXL ticker.
- There are only two tickers (SPY and SPXL) in the source data for this backtesting case study. A normal backtesting evaluation would typically draw on a broader range of tickers so that the outcome generalizes beyond the two funds examined here.



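The EMA columns arrive precomputed from Google Sheets, so this tip never needs to calculate them in T-SQL. For readers curious about the recurrence behind the smoothing, the following sketch shows one way to derive a 10-period EMA from close prices with a recursive CTE. It assumes the FinancialData table created later in this tip; the @N and @alpha names and the overall approach are illustrative assumptions, not part of the tip's processing steps.
-- illustrative sketch only (not part of this tip's processing steps)
-- EMA(t) = @alpha * Close(t) + (1 - @alpha) * EMA(t - 1),
-- where @alpha = 2.0 / (@N + 1) and the first close price seeds the series
DECLARE @N INT = 10;  -- period length, as in the EMA_10 column
DECLARE @alpha DECIMAL(9, 6) = 2.0 / (@N + 1);

WITH Ordered AS (
    SELECT Ticker, [Date], [Close],
           ROW_NUMBER() OVER (PARTITION BY Ticker ORDER BY [Date]) AS rn
    FROM dbo.FinancialData
),
EMA AS (
    -- anchor member: seed each ticker's EMA with its first close price
    SELECT Ticker, [Date], rn, CAST([Close] AS DECIMAL(18, 6)) AS EMA_Value
    FROM Ordered
    WHERE rn = 1
    UNION ALL
    -- recursive member: apply the recurrence one trading day at a time
    SELECT o.Ticker, o.[Date], o.rn,
           CAST(@alpha * o.[Close] + (1 - @alpha) * e.EMA_Value AS DECIMAL(18, 6))
    FROM Ordered AS o
    JOIN EMA AS e
        ON e.Ticker = o.Ticker AND o.rn = e.rn + 1
)
SELECT Ticker, [Date], EMA_Value
FROM EMA
ORDER BY Ticker, [Date]
OPTION (MAXRECURSION 0);  -- the series is longer than the default 100 levels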
The next two screenshots show the first and last 10 rows of a CSV file (For Copy to SS ver 3.csv) exported from the Google Sheets worksheet whose excerpts appear in the preceding three screenshots.
- The first screenshot shows the header row of column names in the CSV file, followed by the first nine rows of data for the SPY ticker.
- The second screenshot shows the last 10 rows of data for the SPXL ticker.
- You can verify that the screenshots of the CSV file match the preceding screenshots of the source data in the Google Sheets worksheet.


T-SQL to Bulk Insert the CSV File
This section illustrates the two steps necessary to import the CSV file with a Bulk Insert statement:
- Invoke a create table statement based on the data in the CSV file.
- Invoke a Bulk Insert statement to load the data from the CSV file into the table created with the create table statement.
The next screenshot shows a Microsoft Copilot screen that can be used to attach the CSV file. Attaching the file makes its contents available to Microsoft Copilot so it can draft a create table statement with the same structure as the values in the CSV file.
After clicking the upload button, you can navigate to the location of the CSV file via a dialog box provided by Microsoft Copilot. The header for each of the preceding two screenshots displays the CSV file name and path as “For Copy to SS ver 3.csv” and “C:\Users\Public\Downloads\”, respectively.

After attaching the CSV file to Microsoft Copilot, you can enter a prompt to show a T-SQL create table statement for a table named FinancialData based on the attached file.
The following script shows a modified version of the create table statement returned by Microsoft Copilot pasted into SQL Server Management Studio (SSMS). You can test and refine the code returned by Microsoft Copilot in SSMS. The returned code did not designate a database for the table; developers are responsible for designating a default database. Also, Microsoft Copilot did not account for Close being a SQL Server reserved keyword (it is used to close a cursor). Enclosing the Close name in brackets is required so SQL Server recognizes Close as a column in the table instead of a command to close a cursor.
-- designate a database into which to insert the table
use test
go

-- manually added code to conditionally drop FinancialData table
-- if it already exists
IF OBJECT_ID('dbo.FinancialData', 'U') IS NOT NULL
    DROP TABLE dbo.FinancialData;

-- pasted statement from Microsoft Copilot
-- manual fix required to embrace Close in brackets
-- because it is a SQL Server reserved word
-- for cursor processing
CREATE TABLE FinancialData (
    Ticker NVARCHAR(50),
    Date DATE,
    [Close] FLOAT,
    EMA_10 FLOAT,
    EMA_30 FLOAT,
    EMA_50 FLOAT,
    EMA_200 FLOAT
);
If you examine the two screenshots at the end of the preceding section, you can see that the Close and EMA values are decimal values with up to two places after the decimal point. Yet the create table statement returned by Microsoft Copilot specifies these columns with a float data type, which exposes processing of the Close and EMA values to unnecessary rounding errors. Therefore, an additional prompt was made to Microsoft Copilot to “show me T-SQL code for converting close through ema_200 columns from float to a common decimal data type for currency values”. The returned T-SQL consists of alter table statements with alter column clauses that update the designated columns in the FinancialData table.
ALTER TABLE FinancialData
    ALTER COLUMN [Close] DECIMAL(18, 2);
ALTER TABLE FinancialData
    ALTER COLUMN EMA_10 DECIMAL(18, 2);
ALTER TABLE FinancialData
    ALTER COLUMN EMA_30 DECIMAL(18, 2);
ALTER TABLE FinancialData
    ALTER COLUMN EMA_50 DECIMAL(18, 2);
ALTER TABLE FinancialData
    ALTER COLUMN EMA_200 DECIMAL(18, 2);
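As a quick verification step (my own addition, not part of the Copilot output), a query against the INFORMATION_SCHEMA.COLUMNS view can confirm that the five columns now carry the decimal data type:
-- verify the column data types after the alter column changes
SELECT COLUMN_NAME, DATA_TYPE, NUMERIC_PRECISION, NUMERIC_SCALE
FROM INFORMATION_SCHEMA.COLUMNS
WHERE TABLE_NAME = 'FinancialData'
ORDER BY ORDINAL_POSITION;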
The rows in the FinancialData table are unique by Ticker and Date column values. Therefore, this tip examines the ability of Microsoft Copilot to add a primary key based on Ticker and Date column values to the FinancialData table.
When initially prompted to add a primary key to the FinancialData table, Microsoft Copilot replied with the following T-SQL code.
ALTER TABLE dbo.FinancialData
ADD CONSTRAINT PK_FinancialData_Ticker_Date PRIMARY KEY (Ticker, Date);
However, when testing the code, SSMS returned the following error message:
Msg 8111, Level 16, State 1, Line 69
Cannot define PRIMARY KEY constraint on nullable column in table 'FinancialData'.
Msg 1750, Level 16, State 0, Line 69
Could not create constraint or index. See previous errors.
Here is an approach that resolves the issue:
-- When prompted to show a fix for the error message,
-- Microsoft Copilot replied with a variation of the following code.
-- My variation to the returned code was the addition of the
-- go batch separators recognized by SSMS
ALTER TABLE dbo.FinancialData
    ALTER COLUMN Ticker NVARCHAR(50) NOT NULL;
ALTER TABLE dbo.FinancialData
    ALTER COLUMN Date DATE NOT NULL;
go

ALTER TABLE dbo.FinancialData
    ADD CONSTRAINT PK_FinancialData_Ticker_Date PRIMARY KEY (Ticker, Date);
go
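To confirm that the constraint now exists, a short query against the sys.key_constraints catalog view (a verification sketch I added; it is not part of the Copilot output) does the job:
-- confirm the primary key constraint on the FinancialData table
SELECT name, type_desc
FROM sys.key_constraints
WHERE name = 'PK_FinancialData_Ticker_Date';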
The last code excerpt in this section displays the bulk insert statement for loading the CSV file into the FinancialData table. After the bulk insert statement, a select statement echoes the data loaded into the table.
-- returned T-SQL for: show me bulk insert statement for populating
-- FinancialData table from C:\Users\Public\Downloads\For Copy to SS ver 3.csv path
-- It was convenient to rename the csv file downloaded from Google Sheets
BULK INSERT FinancialData
FROM 'C:\Users\Public\Downloads\For Copy to SS ver 3.csv'
WITH (
    FORMAT = 'CSV',        -- Specifies that the file is in CSV format
    FIRSTROW = 2,          -- Skips the header row
    FIELDTERMINATOR = ',', -- Specifies the delimiter between columns
    ROWTERMINATOR = '\n',  -- Specifies the end of a row
    TABLOCK,               -- Allows bulk insert with minimal locking
    CODEPAGE = 'ACP',      -- Adjusts encoding for compatibility
    KEEPNULLS              -- Keeps null values instead of default values
);

-- echo freshly populated table
select * from FinancialData
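Beyond echoing the whole table, a grouped query can summarize what was loaded. The following sketch (my own sanity check, not returned by Copilot) reports the row count and trading-date range per ticker:
-- sanity check: row counts and trading-date ranges by ticker
SELECT Ticker,
       COUNT(*) AS RowCnt,
       MIN([Date]) AS FirstDate,
       MAX([Date]) AS LastDate
FROM dbo.FinancialData
GROUP BY Ticker
ORDER BY Ticker;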
Adding a Computed Column to a Table to Implement an Analytical Model
The goal of this section is to demonstrate how to implement a simple analytical model by adding a fresh version of a computed column to a table.
Here is the code to add a fresh version of the computed column. The added column computes a value of TRUE or FALSE for each row in the table, depending on the close value relative to both the EMA_10 and EMA_200 values.
- A value of TRUE for a row indicates that the model deems it profitable to hold a security on the row’s date.
- A value of FALSE for a row indicates that the model deems it not profitable to hold a security on the row’s date.
-- adapted code from Microsoft Copilot for adding and populating a fresh
-- version of a computed column (CloseAboveEMA) to the FinancialData table

-- start by conditionally dropping any existing computed column with the
-- name of CloseAboveEMA from the FinancialData table
IF COLUMNPROPERTY(OBJECT_ID('FinancialData'), 'CloseAboveEMA', 'IsComputed') = 1
BEGIN
    ALTER TABLE FinancialData DROP COLUMN CloseAboveEMA;
END

-- then, insert into SSMS the Microsoft Copilot answer to this prompt:
-- show me T-SQL to return a computed column with TRUE and FALSE
-- values based on the relationship of close prices to ema values,
-- such as ema_10 and ema_200
ALTER TABLE FinancialData
ADD CloseAboveEMA AS
    CASE
        WHEN [Close] > EMA_10 AND [Close] > EMA_200 THEN 'TRUE'
        ELSE 'FALSE'
    END;
The following two screenshots show the Object Explorer view of the table structure before (top) and after (bottom) adding the computed column.


Before programmatically backtesting the analytical model for profitability, this tip inspects the rows for the first 30 trading days for the SPXL ticker to check whether the model behaves sensibly on a subset of the whole dataset. For the model to work properly, sets of consecutive rows with CloseAboveEMA column values of TRUE in the FinancialData table should generally have an ending close value greater than their beginning close value.
The following screenshot from SSMS shows the first 30 rows for the SPXL ticker (a sketch of the kind of query behind this screenshot appears after the list and screenshot below). In these rows, there are three sets of consecutive rows with TRUE values for the CloseAboveEMA column. Each set of consecutive rows has a green background for the Date, Close, and CloseAboveEMA columns.
- The CloseAboveEMA column values serve to mark the rows in each of the three sets. A value of TRUE indicates the row belongs to one of the three sets.
- The Date column identifies the beginning and ending dates for each of the three sets of rows. For the first set, the beginning date is 2024-01-08, and the ending date is 2024-01-16.
- The Close column allows you to compute the difference between the ending close value and the beginning close value. For the first set, the profitability criterion is not met because the ending close value (103.06) is less than the beginning close value (103.10). However, the profitability criterion is met for the second and third sets of consecutive rows with TRUE CloseAboveEMA values.

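A query along the following lines (an assumed reconstruction for illustration; the exact query behind the screenshot is not shown in this tip) returns the inspected rows:
-- return the first 30 trading days for the SPXL ticker
SELECT TOP (30) Ticker, [Date], [Close], EMA_10, EMA_200, CloseAboveEMA
FROM dbo.FinancialData
WHERE Ticker = 'SPXL'
ORDER BY [Date];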
Programmatically Backtesting the Model
The objective of this section is to review T-SQL code for identifying sets of consecutive rows in the FinancialData table with a CloseAboveEMA column value of TRUE. For each set, the code must capture both Date values and Close values. The Date values identify the beginning and ending dates for each set, and the Close values support computing the change in close values, which is the profitability, across a set of consecutive rows.
The key outputs from the analytical model backtested in this section depend on four blocks of T-SQL code. I found Microsoft Copilot especially helpful in returning draft code for identifying and enumerating sets of consecutive rows with a CloseAboveEMA column value of TRUE.
- First Code Block: Deletes any existing versions of two temp tables.
- Second Code Block: Creates and populates the #temp_FirstDate_LastDate table with an into clause in a select statement trailing a CTE (NumberedRuns). This table has two critical functions.
- First, it assigns a RunGroup identifier number to each set of consecutive rows with a CloseAboveEMA column value of TRUE.
- Second, it stores the first date and last date for each of these consecutive rowsets in the FinancialData table by Ticker and RunGroup.
- Third Code Block: Creates and populates the #temp_FirstClose_LastClose table. This code block performs an analogous pair of functions to the second code block, except it processes beginning and ending close values instead of beginning and ending dates for consecutive sets of rows with CloseAboveEMA column values of TRUE. While the names of the CTEs in the second and third code blocks are identical, the result sets returned by each CTE differ because the code for each CTE is different. The #temp_FirstClose_LastClose table stores the first and last close value for each distinct Ticker and RunGroup pair.
- Fourth Code Block: Joins the #temp_FirstDate_LastDate and #temp_FirstClose_LastClose tables by RunGroup and Ticker to compute the close change value for each distinct pair of Ticker and RunGroup values in the two temp tables.
First Code Block:
-- T-SQL for conditionally dropping #temp_FirstDate_LastDate and
-- #temp_FirstClose_LastClose
IF OBJECT_ID('tempdb..#temp_FirstDate_LastDate') IS NOT NULL
BEGIN
    DROP TABLE #temp_FirstDate_LastDate;
END

IF OBJECT_ID('tempdb..#temp_FirstClose_LastClose') IS NOT NULL
BEGIN
    DROP TABLE #temp_FirstClose_LastClose;
END
go
Second Code Block:
-- T-SQL code adapted from Microsoft Copilot
-- returns FirstDate and LastDate by Ticker and RunGroup
-- in a temp table named #temp_FirstDate_LastDate
WITH NumberedRuns AS (
    SELECT
        Ticker,
        [Date],
        CloseAboveEMA,
        ROW_NUMBER() OVER (PARTITION BY Ticker ORDER BY [Date]) -
        ROW_NUMBER() OVER (PARTITION BY Ticker, CloseAboveEMA ORDER BY [Date]) AS RunGroup
    FROM dbo.FinancialData
)
SELECT
    Ticker,
    RunGroup,
    MIN([Date]) AS FirstDate,
    MAX([Date]) AS LastDate
into #temp_FirstDate_LastDate
FROM NumberedRuns
WHERE CloseAboveEMA = 'TRUE'
GROUP BY Ticker, RunGroup
ORDER BY Ticker, RunGroup;
-- optionally echo values in #temp_FirstDate_LastDate
-- select * from #temp_FirstDate_LastDate order by Ticker, RunGroup
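The RunGroup expression in the preceding CTE is an instance of the classic gaps-and-islands technique: within a run of identical CloseAboveEMA values, the two ROW_NUMBER sequences advance in lockstep, so their difference stays constant and marks the run. The following toy example (invented dates and flag values, purely for illustration) makes the arithmetic visible:
-- toy data: watch the two row numbers and their difference per run
WITH Toy AS (
    SELECT * FROM (VALUES
        ('2024-01-02', 'FALSE'),
        ('2024-01-03', 'TRUE'),
        ('2024-01-04', 'TRUE'),
        ('2024-01-05', 'FALSE'),
        ('2024-01-08', 'TRUE')
    ) AS t([Date], Flag)
)
SELECT [Date], Flag,
       ROW_NUMBER() OVER (ORDER BY [Date]) AS rn_all,
       ROW_NUMBER() OVER (PARTITION BY Flag ORDER BY [Date]) AS rn_flag,
       ROW_NUMBER() OVER (ORDER BY [Date]) -
       ROW_NUMBER() OVER (PARTITION BY Flag ORDER BY [Date]) AS RunGroup
FROM Toy
ORDER BY [Date];
Note that the difference is only guaranteed to distinguish runs within one flag value, which is why the second and third code blocks filter on CloseAboveEMA = 'TRUE' before grouping by RunGroup.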
Third Code Block:
-- T-SQL code adapted from Microsoft Copilot
-- returns FirstClose and LastClose by Ticker and RunGroup
-- in a temp table named #temp_FirstClose_LastClose
WITH NumberedRuns AS (
    SELECT
        Ticker,
        [Date],
        [Close],
        CloseAboveEMA,
        ROW_NUMBER() OVER (PARTITION BY Ticker ORDER BY [Date]) -
        ROW_NUMBER() OVER (PARTITION BY Ticker, CloseAboveEMA ORDER BY [Date]) AS RunGroup
    FROM dbo.FinancialData
)
SELECT DISTINCT
    Ticker,
    RunGroup,
    FIRST_VALUE([Close]) OVER (PARTITION BY Ticker, RunGroup ORDER BY [Date]) AS FirstClose,
    LAST_VALUE([Close]) OVER (
        PARTITION BY Ticker, RunGroup
        ORDER BY [Date]
        ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
    ) AS LastClose
into #temp_FirstClose_LastClose
FROM NumberedRuns
WHERE CloseAboveEMA = 'TRUE'
ORDER BY Ticker, RunGroup;
-- optionally echo values in #temp_FirstClose_LastClose
-- select * from #temp_FirstClose_LastClose order by Ticker, RunGroup
Fourth Code Block:
-- custom code to join the two temp tables
-- to display matched contents from
-- #temp_FirstDate_LastDate and #temp_FirstClose_LastClose
-- plus a computed column named [Close Change]
select
    #temp_FirstDate_LastDate.Ticker
   ,#temp_FirstDate_LastDate.RunGroup
   ,#temp_FirstDate_LastDate.FirstDate
   ,#temp_FirstDate_LastDate.LastDate
   ,#temp_FirstClose_LastClose.FirstClose
   ,#temp_FirstClose_LastClose.LastClose
   ,#temp_FirstClose_LastClose.LastClose
      - #temp_FirstClose_LastClose.FirstClose AS [Close Change]
from #temp_FirstDate_LastDate
join #temp_FirstClose_LastClose
    on #temp_FirstDate_LastDate.Ticker = #temp_FirstClose_LastClose.Ticker
   and #temp_FirstDate_LastDate.RunGroup = #temp_FirstClose_LastClose.RunGroup
The following pair of screenshots shows two excerpts from the result set returned by the fourth code block. The top excerpt is for the first 10 rows for the SPXL ticker, and the bottom excerpt is for the last 10 rows for the SPY ticker. Each row in the result set is for a distinct ticker-RunGroup pair.
The Close Change column values assess the profitability of owning a ticker within a RunGroup. As you can see, the overwhelming majority of the Close Change column values are positive. This means the analytical model is generally successful at finding profitable times to own the SPXL and SPY tickers. This tip is meant as a demonstration of backtesting; more testing will likely be appropriate for important business analytical models. A summary query sketch follows the screenshots below.


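To quantify that impression rather than eyeball it, a summary query over the two temp tables can count the profitable runs per ticker. This is a sketch I added for illustration; it assumes the temp tables are still in scope:
-- summarize profitability of the model's runs by ticker
SELECT d.Ticker,
       COUNT(*) AS RunCount,
       SUM(CASE WHEN c.LastClose > c.FirstClose THEN 1 ELSE 0 END) AS ProfitableRuns,
       SUM(c.LastClose - c.FirstClose) AS TotalCloseChange
FROM #temp_FirstDate_LastDate AS d
JOIN #temp_FirstClose_LastClose AS c
    ON c.Ticker = d.Ticker
   AND c.RunGroup = d.RunGroup
GROUP BY d.Ticker
ORDER BY d.Ticker;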
Next Steps
This tip introduces the basics of backtesting for an analytical model implemented and evaluated in SQL Server with T-SQL. The case study presented in this tip demonstrates a variety of T-SQL processing techniques, making it a valuable resource for practical learning.
Be sure to bookmark this tip for future reference, as it provides clear examples of how to:
- Generate T-SQL code with Microsoft Copilot.
- Use T-SQL statements such as create table, alter table/alter column, bulk insert, and alter table/add constraint.
- Create and process a computed column.
- Create CTEs and use them with trailing select statements and an into clause to create and populate temp tables.
- Identify consecutive sets of rows in an underlying dataset with identifiers.
- Join temp tables and compute new columns from the joined result set.
Finally, it may also be beneficial to evaluate the model created and backtested in this tip for more tickers and/or different timeframes. This last step can help you get a feel for the robustness of the model as well as familiarize you with its code in ways that may suggest alternative model designs worth examining.