Download and Analyze Stooq Historical Stock Price and Volume Data in SQL Server


By:   |   Updated: 2021-09-02   |   Comments   |   Related: More > Import and Export


Problem

Show how to use Stooq for downloading daily historical price and volume data for financial securities to SQL Server. Pay special attention to two sub-topics: the Stooq user interface and the T-SQL code for transferring Stooq data into SQL Server. Also, please address how to confirm the data transferred from Stooq to SQL Server validly.

Solution

Stooq is an interesting alternative to the manual Yahoo Finance historical data downloading feature and the Pandas datareader used in combination with the Yahoo Finance application programming interface for downloading historical price and volume data to SQL Server. All three options are available without charge in contrast to other data providers who supply historical stock price and volume data for a fee.

  • Yahoo Finance offers a manual interface for downloading historical stock prices from its site. This manual interface works for one stock symbol at a time, and it offers Yahoo Finance the opportunity to expose users to the advertisements at its site.
  • The Pandas datareader works from within Python so that you can programmatically control the symbols and date ranges for which data get stored in SQL Server. The Pandas datareader returns historical price and volume data from a programmatic interface maintained by Yahoo Finance. This feature is especially handy when you seek to download historical stock prices for more than one or two stock symbols at a time.
  • Stooq is a site managed by a Polish firm, which became operational more recently than either Yahoo Finance or the Pandas datareader for Python. Despite Stooq’s Polish origin, historical stock prices in US dollars can be downloaded with symbols that are familiar to US stock traders and analysts. Stooq does not offer programmatic access to its treasure trove of historical prices, but it is easy to download several directories of zipped csv files with data for all the symbols you are likely to need.

This tip offers step-by-step instructions on how to access Stooq. Additionally, the tip provides several examples of how to download from Stooq csv files with daily historical price and volume stock data. The tip demonstrates the use of a very simple T-SQL technique for importing downloaded csv files into a SQL Server database.

Why learn about Stooq?

You would not need to learn about Stooq if Yahoo Finance offered through its manual interface the ability to download data for hundreds or thousands of stock symbols at a time. However, the Yahoo Finance manual interface only allows the downloading of historical data for one symbol at a time. Additionally, the Yahoo Finance manual interface requires you to reset the symbol and date range every time you attempt to download a batch of historical data for a symbol.

The Pandas datareader when used with the Python programming language can allow the programmatic download of data for many stock symbols in a single batch of data. Because the specifications for a set of symbols can be stored and edited in a program, there is no need to manually set the specifications for each new download unless you wish to collect data for a fresh set of symbols or date ranges.

  • Unfortunately, the connection between Pandas datareader and the Yahoo Finance application programming interface is not always operational. For example, as of the time that this tip is being prepared (July, 2021), the connection is not functioning. Over the years that MSSQLTips.com demonstrated how to program the Pandas datareader through Python, there have been at least a couple of outages that lasted for as long as a month or longer. Fixes were released for the outages from previous occasions, and that may happen this time as well.
  • Besides the stability of the connection, there is also the matter of having to program the download of historical data via Python. Some business professionals and SQL developers may not prefer to use Python just to obtain historical price and volume data for stocks and other financial securities.

Stooq solves all the problems itemized in this section. With Stooq,

  • A single download allows you to obtain historical price and volume data on over 11,000 security symbols
  • There is no need for Python programming
  • At least as of the current Pandas datareader outage for historical price and volume data, Stooq is fully operational

Learning the Stooq interface

The following screen shot shows a browser with the settings for an initial connection to the Stooq site for historical price and volume data. There are three critical elements to the interface. These elements are highlighted in the following screen shot.

  • First, the address box contains "stooq.com/db/h".
  • Second, the text Historical Data is selected and displays with a highlighted background just above a table of links. In the table of links,
    • The "Daily" header signifies the period of the data (daily).
    • The ""ASCII" sub-header below Daily denotes a column for downloading folders of text files. The text files within these folders have a csv format, but they have a txt file type.
    • The 333 MB link in the U.S. row within the ASCII sub-header column can be double-clicked to download a folder with sub-folders for zip (compressed) files.
Stooq_to_SQL_Server_fig_1

After clicking the 333 MB link and waiting several minutes for the transfer from Stooq to your workstation, the following screen opens in the Downloads folder.

  • The highlighted file is the name of the file downloaded from Stooq to your workstation.
  • The (7) in the filename signifies there are 6 prior versions of the d_us_txt.zip file in the Downloads folder.
  • The zip file type indicates the downloaded file is a zip (compressed) file.
Stooq_to_SQL_Server_fig_2

After selecting the zip file, use the Windows File Explorer Extract All… command to open the file path and folders in the compressed file. This may take 10 to 20 minutes depending on the amount of traffic at the Stooq server and the performance of your internet connection. By selecting the Show extracted files when complete checkbox in the Extract Compressed (Zipped) Folders dialog, you can add a path to your workstation’s Downloads folder for the extracted txt files in the compressed file from Stooq.

Stooq_to_SQL_Server_fig_3

The following screen shot shows the top-level folder extracted from the compressed file. Notice the returned file folder has the name d_us_txt (7).

Stooq_to_SQL_Server_fig_4

By clicking through the downloaded folder path, you can arrive at a set of six file folders with names appearing in the following screenshot. By right-clicking each file folder and selecting Properties, you can determine the number of files in a folder. Across all six folders, this process discovered over 11,000 files. Each file within a folder is for a stock or exchange traded fund (etf) symbol.

Stooq_to_SQL_Server_fig_5

You can drill down to the symbols for individual securities from the folders in the preceding screenshot. For example, double-clicking nasdaq etfs folder allows you to navigate to a file named of tqqq.us.txt. The tqqq.us.txt file is a csv file with price and volume data for the TQQQ symbol that seeks three times the daily performance of the NASDAQ-100 index.

Stooq_to_SQL_Server_fig_6

The following two screenshots display the first and last five rows of data for the TQQQ symbol from within a Notepad++ application.

  • Notice that the first of the following screen shots starts with a row of column headers starting with <TICKER> and running through <OPENINT>.
  • Not every column is for a classic display of price and volume data. The column headers for classic price and volume data are these.
    • TICKER is for symbol.
    • DATE is for date.
    • OPEN is for the open price on a trading date.
    • HIGH is for the high price on a trading date.
    • LOW is for the low price on a trading date.
    • CLOSE is the close price on a trading date.
    • VOL is for the trading volume of shares on a trading date.
Stooq_to_SQL_Server_fig_7
Stooq_to_SQL_Server_fig_8

As you can tell from the tabs in the preceding two screen shots, data were downloaded for six security symbols:

  • AAPL, GOOGL, and MSFT are, respectively, for the Apple, Alphabet, and Microsoft company stocks.
  • SPXL, TQQQ, AND UDOW are for three etfs.
    • SPXL seeks three times the daily return of the S&P 500 index.
    • TQQQ seeks three times the daily return of the NASDAQ-100 index.
    • UDOW seeks three times the daily return of the Dow Jones Industrial Average.

Reading the Stooq csv files into a SQL Server table

This section reviews the T-SQL code for transferring the downloaded Stooq data into SQL Server.

  • The range of dates with historical price and volume data varied by symbol. The range of dates with historical price and volume data depends on the initial date for when a security listed on an exchange.
  • No matter what the initial listing date on an exchange, this tip stores data starting at the first trading date in 2011 and running through June 30, 2021. These data will eventually be utilized in two subsequent tips that illustrate two related use cases for the data collected in this tip.

The first script segment for transferring the downloaded txt files to SQL Server appears below.

  • The full script resides in a default database named DataScience.
  • After the default database declaration, a drop table if exists statement conditionally removes the ohlcv_strings table. This table is reused for the set of rows matching each of the six symbols for which data is stored in this tip.
  • Next, a create table statement makes a fresh copy of the ohlcv_strings table. Notice that the table has more columns than those required for storing ohlcv data.
    • For example, there is a period_length column.
    • There is also an openint column which does not appear to be used in the download for this tip.
    • The ticker column is for the symbol of either a stock or etf.
  • A bulk insert statement is the last statement in the script segment.
    • The role of the statement is to transfer the contents of the aapl.us.txt file, which has a csv file format to the ohlcv_strings table. Instead of excluding unused columns in this tip, the bulk insert statement transfers all the columns from the aapl.us.txt file.
    • Within the context of this tip, the aapl.us.txt file resides in a subfolder (tickers_for_modeling_with_ohlcv) of the DataScienceSamples folder of the C drive.
    • The first row from the txt file is skipped because it contains column headers.
use DataScience
go
 
-- initialize dbo.datedimension table and dbo.stooq_prices
-- based on a symbol with data for all dates (AAPL)
-- to re-run this code after the first execution
-- delete with Object Explorer or T-SQL code these tables in this order:
-- dbo.stooq_prices followed by datedimension
 
-- read all data, except ohlcv, as strings for APPL
drop table if exists dbo.ohlcv_strings;
 
create table [dbo].[ohlcv_strings](
 ticker nvarchar(10)
,period_length nvarchar(1)
,date nvarchar(8)
,time nvarchar(8)
,[open] money
,high money
,low money
,[close] money
,volume bigint
,openint nvarchar(1)
)
 
-- bulk insert aapl.us.txt into empty [dbo].[ohlcv_strings]
bulk insert dbo.ohlcv_strings
from 'c:\DataScienceSamples\tickers_for_modeling_with_ohlcv\aapl.us.txt'
with( FORMAT='CSV' , FIRSTROW=2);

The following screenshot shows the first and last five rows from the ohlcv_strings table.

  • The first row is for the date: September 7, 1984.
  • The last row is for the date: July 16, 2021.
  • Notice these dates extend outside the date range for stored data by this tip, which starts on the first trading date in 2011 and extends through June 30, 2021.
  • A subsequent portion of the script extracts just the required dates and moves the rows to another table that stores just a subset of the full set of columns for all the symbols.
Stooq_to_SQL_Server_fig_9

The next script excerpt continues from the previous script segment, which creates and populates the ohlcv_strings table in the dbo schema of the DataScience database. The script segment below creates, populates, and configures two additional tables both of which are based to some degree on the ohlcv_strings table.

  • The datedimension table is created, populated, and configured first. This table keeps track of all the dates for which price and volume are saved.
    • The datedimension table derives all its date values from the ohlcv_strings table, which is, in turn, dependent on the aapl.us.txt file from Stooq. The results sets from a pair of select statements are concatenated with a union operator to generate the rows for the datedimension table
      • The first select statement for the datedimension rows has a criterion for all trading dates from the start of 2011 through the end of 2020.
      • The second select statement for the datedimension rows has a criterion for all trading dates from the start of 2021 through June 30, 2021.
    • Aside from the date column in the datedimension table, there are five additional columns that display the weekday name, day of the month, the month name, the month number, and the year that correspond to the date in the current row. These additional columns are meant to illustrate the kinds of columns that you may care to include in an application requiring a table like a datedimension table.
    • Two alter table statements trail the select statement to further configure the datedimension table.
      • The first alter table statement adds a non-null property to the date column. This is necessary because the date column serves as a primary key.
      • The second alter table statement adds a primary key constraint to the datedimension table based on the date column.
  • The stooq_prices table is created, populated, and configured by the select statement trailing the two alter table statements for the datedimension table within the following script segment.
    • The from clause designates the ohlcv_strings table inner joined to the datedimension table as the source for the select statement. The inner join restricts rows to the stooq_prices table to those with dates that match the dates in the datedimension table.
    • The select list items perform two key roles.
      • First, they restrict the column list to standard price and volume data.
      • The columns start with a date and a symbol.
        • Each date row value denotes a new row of time series data.
        • The symbol column is necessary because the stooq_prices table will eventually be populated for the remaining symbols that were downloaded from Stooq.com.
    • An alter table statement adds a foreign key reference in the stooq_prices table to the date column in the datedimension table.
-- create and populate a fresh copy of dbo.datedimension
-- with a non-nullable column and a primary key based on the column
drop table if exists dbo.datedimension
 
select
 [date]
,dayname
,daynumber
,monthname
,monthnumber
,year
into dbo.[datedimension]
from
(
select 
 convert(date, [date]) [date]
,datename(weekday,convert(date, [date])) dayname
,datepart(day, convert(date, [date])) daynumber
,datename(m,convert(date, [date])) monthname
,datepart(m,convert(date, [date])) monthnumber
,year(convert(date, [date])) year
from   [DataScience].[dbo].[ohlcv_strings]
where(year(date) >= 2011) and (year(date) <= 2020)
 
union
 
select 
 convert(date, [date]) [date]
,datename(weekday,convert(date, [date])) dayname
,datepart(day, convert(date, [date])) daynumber
,datename(m,convert(date, [date])) monthname
,datepart(m,convert(date, [date])) monthnumber
,year(convert(date, [date])) year
from   [DataScience].[dbo].[ohlcv_strings]
where year(date) = 2021 and month(date) <= 6
) for_datedimension
 
order by date
 
alter table [dbo].[datedimension] ALTER COLUMN [date] date NOT NULL
alter table [dbo].[datedimension]
add constraint PK_date PRIMARY KEY(date);
 
-- create and populate a fresh copy of dbo.stooq_prices
-- with a non-nullable column and a primary key based on the column
drop table if exists dbo.[stooq_prices]
select
 datedimension.date [date]
,left(ticker,CHARINDEX( '.US' , ticker)-1) symbol
,[open]
,high
,low
,[close]
,volume
into dbo.[stooq_prices]
from   [DataScience].[dbo].[ohlcv_strings]
inner join [DataScience].[dbo].[datedimension]
on [ohlcv_strings].date = [datedimension].date
 
alter table dbo.[stooq_prices]  WITH CHECK ADD  CONSTRAINT [fk_date] FOREIGN KEY([Date])
references [dbo].[datedimension]([Date])
on UPDATE CASCADE
on DELETE CASCADE

The following two results sets may help you follow the code for creating, populating, and configuring the datedimension and stooq_prices tables. The following screenshot displays the first five rows from each table.

  • The top pane shows the first five rows for the datedimension table.
  • The bottom pane shows the first five rows for the stooq_prices table.
  • Each table has 2641 rows of non-null column values for the AAPL symbol.
Stooq_to_SQL_Server_fig_10

The next script segment shows the code for adding the price and volume data for the GOOGL and MSFT symbols. This code only imports data, respectively, from the googl.us.txt and msft.us.txt files to the stooq_prices table.

  • Before bulk inserting data from either googl.us.txt or msft.us.txt into SQL Server, the code truncates the rows in the ohlcv_strings table.
  • After bulk inserting data from either googl.us.txt or msft.us.txt into SQL Server, the code runs an insert into statement based on a select statement that subsets columns from the ohlcv_strings table and transforms values for the symbol column.
  • Each of the stock symbols has 2641 rows of non-null data. This is because there is a row of price and volume data for each date associated with each stock symbol.
---------------------------------------------------------------------------------------
 
truncate table [dbo].[ohlcv_strings]
 
-- bulk insert googl.us.txt into empty [dbo].[ohlcv_strings]
BULK INSERT dbo.ohlcv_strings
FROM 'c:\DataScienceSamples\tickers_for_modeling_with_ohlcv\googl.us.txt'
WITH( FORMAT='CSV' , FIRSTROW=2);
 
-- add rows from googl.us.txt to dbo.stooq_prices
insert into dbo.stooq_prices
select
 datedimension.date [date]
,left(ticker,CHARINDEX( '.US' , ticker)-1) symbol
,[open]
,high
,low
,[close]
,volume
from   [DataScience].[dbo].[ohlcv_strings]
inner join [DataScience].[dbo].[datedimension]
on [ohlcv_strings].date = [datedimension].date
 
---------------------------------------------------------------------------------------------------
 
truncate table [dbo].[ohlcv_strings]
 
-- bulk insert msft.us.txt into empty [dbo].[ohlcv_strings]
BULK INSERT dbo.ohlcv_strings
FROM 'c:\DataScienceSamples\tickers_for_modeling_with_ohlcv\msft.us.txt'
WITH( FORMAT='CSV' , FIRSTROW=2);
 
-- add rows from msft.us.txt to dbo.stooq_prices
insert into dbo.stooq_prices
select
 datedimension.date [date]
,left(ticker,CHARINDEX( '.US' , ticker)-1) symbol
,[open]
,high
,low
,[close]
,volume
from   [DataScience].[dbo].[ohlcv_strings]
inner join [DataScience].[dbo].[datedimension]
on [ohlcv_strings].date = [datedimension].date

Notice that for the last two stock symbols (GOOGL and MSFT), the approach to transferring historical price and volume data from Stooq to SQL Server is the same, except for the Stooq filename with the historical prices. The same holds true for collecting historical price and volume data for the three etf symbols.

  • You just find the downloaded file with historical price and volume data. When looking for historical data for etfs, recall that you must search a folder with etf in its name, such as nasdaq etfs.
  • From within an etf folder, you must search for a filename that contains the symbols you want to transfer to SQL Server.
    • The historical data for two etf symbols (UDOW and SPXL) are in the nyse etf folder.
      • The filename for the UDOW symbol is udow.us.txt.
      • The filename for the SPXL symbol is spxl.us.txt.
    • The historical data for the third etf symbol (TQQQ) is in the nasdaq etfs folder. The filename for the symbol is tqqq.us.txt.

Here is a T-SQL script example for the SPXL symbol.

  • The source filename (spxl.us.txt) appears in the from clause of the bulk insert statement.
  • The table name in the insert into statement (stooq_prices) designates the SQL Server destination table for the SPXL historical price and volume data.
truncate table [dbo].[ohlcv_strings]
 
-- bulk insert spxl.us.txt into empty [dbo].[ohlcv_strings]
bulk insert dbo.ohlcv_strings
from 'c:\DataScienceSamples\tickers_for_modeling_with_ohlcv\spxl.us.txt'
with( FORMAT='CSV' , FIRSTROW=2);
 
-- add rows from spxl.us.txt to dbo.stooq_prices
insert into dbo.stooq_prices
select
 convert(date, [date]) [date]
,left(ticker,CHARINDEX( '.US' , ticker)-1) symbol
,[open]
,high
,low
,[close]
,volume
from   [DataScience].[dbo].[ohlcv_strings]
where((year([date]) >= 2011) and(year([date]) < 2022) and(month([date]) <= 6)) 
order by date

Did the data load validly from Stooq to SQL Server?

Whenever you are working with a relatively new data source as is the scenario for this tip, it is prudent to perform some unit testing to ascertain if the load went according to plan. In this tip, price and volume historical data are loaded from Stooq into SQL Server.

At a very top-line level, we can ask the question are data for six symbols in the stooq_prices table? The following query verifies the answer to this question by returning a results set that enumerates all distinct symbols in the stooq_prices table.

-- display distinct symbols in cleaned and imported data
select distinct symbol FROM [DataScience].[dbo].[stooq_prices]

Here is the results set from the preceding query. As you can see, there are six distinct symbols in the stooq_prices table within the dbo schema of the DataScience database. These symbols are the same ones for which data was loaded from downloaded Stooq files with historical price and volume data.

Stooq_to_SQL_Server_fig_11

Another unit test would be to verify that there are the same number of rows per symbol. In the context of this time series data sample, having the same number of rows per symbol means all symbols have data for the same set of dates. The following query counts the number of non-null symbol rows. Theses counts are for those rows loaded into the stooq_prices table from the downloaded Stooq data files. Recall the code for loading rows from Stooq files only gets data in the range from the first trading date in 2011 through June 30, 2021.

-- display count of time series rows by symbol
select symbol, count(*) [number of rows]
from DataScience.dbo.stooq_prices
group by symbol

Here is the results set for the count of symbol values in the stooq_prices table. The number of rows is not the same for all symbols. The stock symbols of AAPL, GOOGL, and MSFT have 2641 rows each. On the other hand, the etf symbols of SPXL, TQQQ, and UDOW have just 2640 rows each. The row counts for etf symbols is one less than the row counts for stock symbols.

With some queries and/or some visual examination of the values, it can be detected that the etf symbols do not have price and volume data for 2011-02-17 whereas the stock symbols do have price and volume data for 2011-02-17. The following query verifies the missing etf rows by showing the rows for each date from 2011-02-16 through 2011-02-18.

-- display all cleaned and imported data
-- around date (2011-02-17) with missing etf rows
select [date]
      ,[symbol]
      ,[open]
      ,[high]
      ,[low]
      ,[close]
      ,[volume]
from   [DataScience].[dbo].[stooq_prices]
where [date] >= '2011-02-16' and [date] <= '2011-02-18'
order by date,symbol

Here is the results set from the preceding query.

  • As you can see, the number of rows (6) is the same for both 2011-02-16 and 2011-02-18 for both stock and etf symbols.
  • However, the number of rows for 2011-02-17 is just 3, and these rows are exclusively for the stock symbols of AAPL, GOOGL, and MSFT.
  • Therefore, the following results set confirms that the missing rows are all for etf symbols on 2011-02-17.
Stooq_to_SQL_Server_fig_13

The preceding results set is for the cleaned and processed Stooq files, but what if there was something wrong with the cleaning or processing code? For your convenience, I am including the uncleaned and unprocessed Stooq files for the six symbols referenced in this tip’s download (see below). Therefore, you can verify for yourself the same outcome about missing data for etf symbols on 2011-02-17 holds for the uncleaned and unprocessed data. I ran the check for this tip, and they do confirm the missing symbols values are for etf files on 2011-02-17.

Reusing the script with fresh data

Those who care to re-run the code in this tip may discover that it fails on the second try. In my experience, this outcome is the result of a foreign key reference in the stooq_prices table to the datedimension table on the date column. There are multiple workarounds to this issue. Here is a short script that can be run before an attempt to re-run the T-SQL code for loading the Stooq historical data into SQL Server. By dropping the stooq_prices table before the datedimension table, the foreign key reference in the stooq_prices table to the datedimension table is removed before an attempt to drop the datedimension table. Therefore, this order of table removal (stooq_prices before datedimensiom) is necessary to avoid a foreign key reference error.

use DataScience
go
 
drop table if exists dbo.stooq_prices;
 
drop table if exists dbo.datedimension;
Next Steps

This tip’s download file contains nine files to help you get a hands-on feel for using T-SQL for downloading to SQL Server historical price and volume data from the Stooq website.

  • Six of the files are the txt files in a csv format from the Stooq website. These files contain historical price and volume data for six symbols (AAPL, GOOGL, MSFT, SPXL, TQQQ, UDOW) loaded into SQL Server by the code in this tip.
  • Three .sql file are also in the download for this tip.
    • The "create and populate datedimension and stooq_prices from stooq_dot_com.sql" file includes the T-SQL code for transferring a subset of the historical price and volume data from the Stooq website to SQL Server. Recall that the code
      • Has hard code for a default database named DataScience.
      • Also, the code expects to find the six data files from Stooq in this file path: "C:\DataScienceSamples\tickers_for_modeling_with_ohlcv". You may care to update the code to use a different path for security reasons.
    • The "unit tests for create and populate datedimension and stooq_prices.sql" file includes the T-SQL code to verify the data loaded properly.
    • The "refresh for create and populate datedimension and stooq_prices.sql" file contains a small script for dropping the foreign key reference in the stooq_prices table to the date column of the datedimension table before dropping the datedimension table. This script allows you bypass a potential error when re-running the code in "create and populate datedimension and stooq_prices from stooq_dot_com.sql" with a fresh set of data files from Stooq.

After verifying the code works as described with the sample data files, you are encouraged to use other date ranges and/or symbols than those reported on in this tip.






get scripts

next tip button



About the author
MSSQLTips author Rick Dobson Rick Dobson is a Microsoft Certified Technical Specialist and well accomplished SQL Server and Access author.

View all my tips


Article Last Updated: 2021-09-02

Comments For This Article





download














get free sql tips
agree to terms