Migrate Key-Value Pairs from Python Dictionary to SQL Server via JSON


By:   |   Updated: 2021-06-22   |   Comments (1)   |   Related: More > T-SQL


Problem

This tutorial will demonstrate how to migrate via JSON key-value pairs from a Python dictionary object to a SQL Server table. It will provide an in-depth review of how the key-value pairs are originally stored in Python, transformed to JSON, and exported to a file along with a use case example that reinforces how to import JSON data from a file and display the data within SQL Server.

Solution

JSON (JavaScript Object Notation) is a popular open file format for exchanging data between applications as well as receiving data from a server. Because of the widespread adoption of JSON for exchanging data, it is increasingly important for SQL Server administrators to become familiar with how to retrieve and save JSON data for input to SQL Server tables. Also, SQL Server administrators may participate in projects for receiving content in JSON data files originating from other applications, such as Yahoo Finance. This tip presents a use case example for retrieving key-value paired data from Yahoo Finance with Python in order to export the key-value pairs to a file with a JSON data format for retrieval and saving in a SQL Server table.

Summary of the use case

Python is especially prized for its ability to interface with various application programming interfaces, such as those from Yahoo Finance for financial data and the National Oceanic and Atmospheric Administration for historical weather data. A recent MSSQLTips.com tip includes coverage of the yfinance library for use with Python. The yfinance library for Python can facilitate downloading historical price and volume data as well as fundamental data, such as number of employees, total public shares available for trading (float), and the number of shares shorted in a security. The fundamental data is available for each security via a Python dictionary object.

The dictionary object is for storing ordered key-value pairs in a way that is proprietary to Python. While the format for Python dictionary objects is similar to JSON, the formats for data dictionary objects and JSON are not identical. Furthermore, because JSON is not a native Python format, Python code that links to remote application programming interfaces tend to download data to dictionary objects or other common Python data objects. For example, the yfinance library facilitates downloading fundamental metrics for a financial security to a dictionary object, which stores data in key-value pairs.

This tip shows how to track the number or percentage of short shares for a financial security, such as a company’s stock shares.

  • When a stock trader initiates a short trade, the trader essentially borrows a number of shares of a financial security at a borrow price (p1).
  • A stock trader closes a short trade at some point in the future by buying the same number of shares for the same security at a buy price (p2) and then returns the borrowed shares to the lender, who is usually a stockbroker.
  • So long as p2 is less than p1 plus the cost of maintaining the short trade, then the trader can earn a profit.
  • On the other hand, as p2 rises above p1 plus the cost of maintaining the short trade, the trader incurs a loss, which grows as p2 exceeds p1 plus the cost of the maintaining the short trade.
  • As the current price for a security grows progressively larger, the cost for maintaining a short trade can increase, which adds to the size of the potential loss when the shares are bought for return to the lender.
  • If you are interested in learning more about short trades, Wikipedia offers more details for you to start your learning curve.

The important point about short sales for this tip is that they reflect sentiment among stock traders about a financial security. As the number of short shares or the percent of short shares relative to total number of shares available to traders is large, then the community of stock traders has a low sentiment about the security. As the number of short trades approaches zero, then the sentiment among traders for the future of the security grows less negative. Therefore, by tracking the number or percentage of short shares for a financial security, you can track trader sentiment about the financial security’s future price.

The yfinance library for Python allows you to track short shares for a financial security as reported by Yahoo Finance. The number of short sales is sometimes referred to as a fundamental feature for a financial security. Python can use the yfinance library to track around 100 different fundamental features for a financial security. The features are tracked with the help of a Python dictionary object. The dictionary object tracks fundamental features with key-value pairs. The key name reflects a specific fundamental feature, such as short sales. The key value stores a measurement for the key, such as the number of shorted shares for a financial security. Because a dictionary object can store many different key-value pairs, it can store many fundamental features about a financial security.

Python steps for exporting JSON files with key-value pairs from a dictionary object

This section of the tip displays and describes a Python script for populating two .json files. Each file contains fundamental data on one member of the pair of stocks. The next section in this tip reviews selected output from the script in order to clarify further how the code works. The next-to-last section of this tip presents and describes T-SQL code for importing into SQL Server the two files exported by the script in this section.

The Python script resides in an IDLE window. IDLE downloads automatically when you install Python from Python.org. IDLE is one of several developer environments for writing, saving, and running Python scripts. The script’s filename (get_fundamental_data.py) appears at the top of the screen shot below. SQL Server professionals may find it useful to think of IDLE for Python code in the same way that think of SSMS for T-SQL code. Comment lines in the script segment the code into seven steps. Each step starts with a marker for a comment line (#).

  • The first step includes two library declarations that contain objects with methods used within the script. It also includes a declaration for a Python package.
    • The yfinance library is an external library associated with an alias name (yf). One method in this library can populate a Python dictionary object with fundamental data from Yahoo Finance. Yfinance is called an external library because it is supported by a third party that is different from Python.org.
    • The os library is an internal Python library. This tip uses the os library to manage the current working directory. There is no need to use an alias when referring to methods from an internal Python object.
    • The JSON package appears somewhat similar in functionality to an internal Python library. Within this tip, the JSON package supports the conversion of a Python dictionary object to a string with JSON formatted data. This reference provides especially clear documentation on the Python JSON package.
  • The second step invokes the Ticker method in the yf library; the name for this method is case sensitive. The method returns a ticker object.
    • The argument for the Ticker method is a symbol that is not case sensitive.
    • The return value from the method is not case sensitive either.
    • In the Python script for this tip,
      • the symbol name is typed in upper case letters (e.g., TSLA)
      • the returned ticker object is represented in lower case letters (e.g., tsla)
  • The third step invokes the info method successively for the tsla and fb ticker objects created in step2. The info method returns a Python dictionary object (for example, either tsla.info or fb.info).
    • As indicated in the prior section, a Python dictionary object is for storing an ordered set of key-value pairs.
    • Python code can iterate through the key-value pairs with the help of a for loop based on the items() method for the pairs in the dictionary object. The items() method returns a Python view object; the key-value pairs of a dictionary object are in the view object.
    • The two for loops in step 3 iterate, respectively, through the key-value pairs in the tsla.info dictionary object and the fb.info dictionary object. The purpose of the for loops is to print out the key-value pairs in each dictionary object. This allows the inspection of the key-value pairs for each dictionary object. Notice that the format for the printed key-value pairs is: key name followed by a colon followed by a value.
    • A print statement adds a carriage return and linefeed (\r\n) between the key-value pairs for each dictionary object.
    • All the print statement outputs in this script appear as log items in the IDLE parent window for the Python script. This parent window is also commonly called the Python Shell.
  • The fourth step invokes the dumps function from the JSON package. The dumps function in the JSON package converts a Python dictionary object, such as tsla.info or fb.info, to a JSON string. Within the fourth step,
    • The converted string for the tsla.info dictionary object is named tsla_jsonStr.
    • The converted string for the fb.info dictionary object is named fb_jsonStr.
  • The fifth step attempts to change the Python current working directory in order to write in step six JSON files to another specific directory (namely, C:\python_programs_output) especially named for storing output from Python programs. The fifth step uses a Python try…except statement that has a similar syntax and role to the try…catch statement in T-SQL. If the attempt to change the current working directory in the try clause of the try…except method fails for any reason, the current working directory remains unchanged. The default current working directory within the context of this tip is C:\python_programs. Print statements log all their output to the IDLE parent window for the Python script (namely, the Python shell).
  • The sixth step invokes Python open, write, and close methods for copying the two JSON strings to files in the current working directory.
    • The first invocation of the three Python methods saves the tsla_jsonStr string to the tsla.json file in the current working directory (C:\python_programs_output).
    • The second invocation of the three Python methods saves the fb_jsonStr string to the fb.json file in the current working directory.
  • Finally, step seven attempts to revert the current working directory to its default value (C:\python_programs).
Migrating_json_to_SQL_Server_fig_1

Output from the get_fundamental_data.py script

Your understanding of the Python script described in the preceding section is likely to be enhanced by reviewing selected excerpts from the script’s output.

The following screen shot is an excerpt from the top of the Python shell after running the script.

  • You can see from the first line inside the shell window that the script is run with Python version 3.8.3. Just like any application, Python is updated regularly with new features and capabilities. Therefore, you should run a script with a version that supports the features in the script.
  • The first line of output also indicates the creation date for the current Python version, namely May 13, 2020 at 22:20:19.
  • The fourth line within the shell window displays the script’s filename and path.
  • The fifth line in the window displays the output from the first print statement in the script. It indicates that some following lines will be key-value pairs for the TSLA symbol.
  • The sixth, seventh, and eighth lines of output show the first three key-value pairs for the dictionary object based on the TSLA ticker symbol. For example, the TSLA symbol is in a sector of symbols called Consumer Cyclical. This is because Tesla, Inc. manufactures electric vehicles that need to be replaced over various cycles of time.
  • The ninth line of output shows an excerpt of the string value for the longBusinessSummary key.
  • The last line of output in the screen shot below is for the companyOfficers key in the tsla.info dictionary object. Notice that it just contains a pair of square brackets ([]).
    • The JSON array object type is for an ordered collection of elements, such as the names of company officers.
    • While the companyOfficers key does not list any officer names, the key value is not empty (or unspecified) because it contains the container for an array.
Migrating_json_to_SQL_Server_fig_2

The next screen shot shows the last several key-value pairs for the tsla.info dictionary object, and the first several key-value pairs for the fb.info dictionary object. A couple of blank lines separate the last key-value pairs for the tsla.info dictionary object from the start of output for the fb.info dictionary object.

  • Several key-value pairs towards the end of the tsla.info dictionary object (lastCapGain, impliedSharesOutstanding, category, and fiveYearAverageReturn) are populated with a value of None. The None keyword within this context in Python output is highly similar in meaning to the NULL keyword in output from T-SQL code. Both keywords indicate that a value is not specified (or missing) for a field.
  • The key-value pairs for the fb.info dictionary object have the same key names and number of keys in the same order as for the tsla.info dictionary object. The screen shot below shows the FB ticker symbol is in the Communication Services sector.
  • Because not all ticker symbols have the same number of keys and key names, it may be important to group ticker symbols based on the similarity of their key-value pairs. For example, ticker symbols for company stock securities and exchange traded fund securities do not have identical key names.
Migrating_json_to_SQL_Server_fig_3

The next screen shot shows a view of the first several JSON key-value pairs in the tsla.json file. The screen shot is from the Windows Notepad application.

  • Notice key names and values for tsla.json file match those in the tsla.info dictionary object. However, the format for displaying key names and values are different. The transformation from the dictionary object format to the JSON file format is handled by the dumps function in the JSON package.
  • In the tsla.info dictionary object, the sector key name and value appear as sector : Consumer Cyclical.
  • In the tsla.json file, the sector key name and value appear as "sector" : "Consumer Cyclical"
    • JSON embraces all key names in double quotes, but Python dictionary objects do not.
    • JSON embraces string values in double quotes, but Python dictionary objects do not.
    • On the other hand, number values are not embraced in double quotes in either dictionary objects or JSON files.
Migrating_json_to_SQL_Server_fig_4

The bottom of the tsla.json file within a Windows Notepad application appears in the next screen shot.

  • The values for the impliedSharesOutstanding, category, and fiveYearAverageReturn keys have key values of null.
  • In the tsla.info dictionary object, these three keys have values of None.
  • The dumps function in the JSON package is responsible for these transformations.
Migrating_json_to_SQL_Server_fig_5

Importing and using JSON files with T-SQL

SQL Server expects JSON formatted data to be encoded in Unicode characters. However, the preceding Python script populates the tsla.json and fb.json files in ascii characters with non-ascii characters escaped; the dumps function in the Python JSON package operates this way with its default settings. The general topic of the character set output format from json.dumps() with Python has a variety of special issues that goes well beyond the scope of this tip (see more details about json.dumps() character set output format here, here, here and here).

All the characters in the tsla.json and fb.json files are ascii characters. When your data conform to this standard, then you can use the guidelines demonstrated below for importing and using JSON formatted files from Python into SQL Server. The following T-SQL script shows how to import the tsla.json file into a SQL Server temp table named #tlsa. Because the source data in the file consists of ascii characters, the content in the temp table consists of ascii characters.

The T-SQL script below uses the openrowset() function in the from clause of a pair of select statements to return the ascii characters in tsla.json and fb.json, respectively. If you seek a refresher for the openrowset function, see this prior MSSQLTips.com tip. The into clause in the select statements copies the results set from each select statement into a temp table (namely, #tsla or #fb).

The final step for displaying the full set of JSON key-value pairs depends on two remaining T-SQL statements.

  • The first statement copies the results set from a temp table into a local variable with a nvarchar data type. The local variables named @json_tsla and @json_fb have a Unicode character set because of their data type.
  • The second statement invokes the openjson() function in the from clause of a select statement. The results set from this select statement includes a separate row \ tuple for each key-value pair in the nvarchar local variables.
use JsonDataProcessing
go
 
-- populate tsla temp table from 
-- json formatted file at c:\python_programs\tsla.json
drop table if exists #tsla 
select BulkColumn
into #tsla
from openrowset (BULK 'c:\python_programs_output\tsla.json', SINGLE_CLOB) as j
 
go
-- declare @json_tsla local variable as a nvarchar data type
declare @json_tsla nvarchar(max) = (select * from #tsla)
 
-- display using openjson default schema
select *
from openjson(@json_tsla)
 
-------------------------------------------------------------------------------------------
 
-- populate fb temp table from 
-- json formatted file at c:\python_programs\fb.json
drop table if exists #fb 
select BulkColumn
into #fb
from openrowset (BULK 'c:\python_programs_output\fb.json', SINGLE_CLOB) as j
 
go
-- declare @json_fb local variable as a nvarchar data type
declare @json_fb nvarchar(max) = (select * from #fb)
 
-- display using openjson default schema
select *
from openjson(@json_fb)

The following screen shot shows the first ten rows of key-value pairs from @json_tsla and @json_fb. The rows for Tesla, Inc. appear in the top pane, and the rows for Facebook, Inc. appear in the bottom pane.

  • The output showing in both panes results from the default schema associated with the openjson function. This schema has three columns.
    • The first column is for key names.
    • The second column is for values associated with the keys.
    • The third column is a numeric indicator for the JSON data type for the value in the second column.
      • 1 is for a number data type.
      • 2 is for a string data type.
      • 4 on row 8 is for an array of ordered elements; square brackets denote an array value. The companyOfficers value on row 8 does not indicate any specific company officers. The value is not Null because the square brackets denote the JSON data type for the key’s value.
  • The type column can be especially helpful for discerning the data type for a value. For example, the values for zip and fullTimeEmployees keys appear as a sequence of digits, but the data types are different.
    • The zip key has a string data type value.
    • The fullTimeEmployees key has a number data type.
Migrating_json_to_SQL_Server_fig_6

The next screen shot displays the last ten key-value pairs for the Tesla, Inc. and Facebook, Inc. in the top and bottom panes, respectively.

  • Both panes have the same number of key-value pairs because the number of rows is 123 in both panes.
  • In these two panes, you can see that null values with the NULL keyword which is typical for T-SQL results sets from a select statement. As you can see from the third column in the results sets, a null value is associated with a type value of 0.
  • Aside from string, number, array, and null data types, there are two additional types that do not appear in the results sets for this tip.
    • 3 is for a JSON Boolean data type; this type can have values of either true or false.
    • 5 is for a JSON object data type. An object data type can contain one or more other data types that are separated by commas. The collection of elements within a JSON data type must be enclosed with braces ({}).
Migrating_json_to_SQL_Server_fig_7

You can display a more focused subset of keys across multiple symbols by using the openjson() rowset function and a with clause in a collection of select statements concatenated by union operators. The union operators form a results set with just one row per select statement but with multiple rows across the select statements. The following script illustrates the syntax for achieving this outcome.

  • The script includes two select statements – the first one is for the local variable named @json_tsla, and the second one is for the local variable named @json_fb.
  • The script also adds an identifier in the list for each select statement.
    • The identifier name is ticker.
    • The values of the ticker column are tsla for the first select statement and fb for the second select statement.
  • The union operator appears between the two select statements. Therefore, the results set from the script consists of two rows – one for each ticker column value.
-- declare @json_tsla local variable as a nvarchar data type
declare @json_tsla nvarchar(max) = (select * from #tsla)
-- declare @json_fb local variable as a nvarchar data type
declare @json_fb nvarchar(max) = (select * from #fb)
 
 
-- display a select set of json data 
-- using a custom schema
select 
 'tsla' as ticker
,*
from openjson(@json_tsla)
  with (
    shortName nvarchar(20),
    fullTimeEmployees int,
    sector nvarchar (50),
    industry nvarchar (50),
    floatShares bigint,
    sharesShort bigint,
    shortPercentOfFloat real
  )
 
union
 
-- display a select set of json data 
-- using a custom schema
select 
 'fb' as ticker
,*
from openjson(@json_fb)
  with (
    shortName nvarchar(20),
    fullTimeEmployees int,
    sector nvarchar (50),
    industry nvarchar (50),
    floatShares bigint,
    sharesShort bigint,
    shortPercentOfFloat real
  );

Here is the results set from running the preceding script. The with clause in each select statement returns just a small subset of key-value pairs.

  • As you can see in the screen shot below, there are two rows in the results set – the first one is for the fb ticker and the second one is for the tsla ticker.
  • One key reason for selecting a subset or key-value pairs is to help add multiple context indicators for each row in the results set and to make it easy to contrast the tickers in terms of investor sentiment as indicated by the shortPercentOfFloat key.
    • The row with a ticker column value of fb is for Facebook and the other row is for Tesla.
      • As can see, the shortName column indicates the name of the company represented by the ticker value.
      • The number of employees is very roughly the same (with about 1000 more full time employees for Tesla).
      • Facebook is in the Internet Content & Information industry, but Tesla is in the Auto Manufacturers industry.
    • The shortPercentOfFloat column reveals consumer sentiment towards each ticker.
      • By dividing the sharesShort column value by the floatShares column value, you can manually confirm the value appearing in the shortPercentOfFloat column.
      • The shortPercentOfFloat row value is about five times larger for Tesla than for Facebook. By this measure, Tesla has a more negative sentiment among stock traders than Facebook. The "Summary of the use case" section in this tip describes the rationale for why trader sentiment is worse for Tesla than Facebook.
      • Confirming whether the shortPercentOfFloat and the sharesShort indicate a likely decline in a ticker’s future closing price is beyond the scope of this tip. Nevertheless, some modeling and validation research may be able to uncover a relationship between shortPercentOfFloat and the likelihood of a future price decline. This type of research has the potential to provide useful guidelines about whether it might be worth taking the risk of shorting a company’s stock when the shortPercentOfFloat rises above a criterion value.
Migrating_json_to_SQL_Server_fig_8
Next Steps

This tip’s download file contains the Python and T-SQL scripts presented in this tip. If you re-run the Python script, the values for key-value pairs are likely to change from those reported in the tip because key values change on a regular if not daily basis.

There are really two parts to this tip.

  • The first part collects with a Python script some fundamental data from Yahoo Finance about a pair of stock tickers. You can easily modify the scope of the script
    • by replacing the two ticker symbols used in this tip with another pair of ticker symbols or
    • by including more than two ticker symbols
  • The second part imports into and displays in SQL Server a subset of the key-value pairs exported as an ascii file by the first part. Next steps for this second part can include the tracking of different fundamental measures than those reported on this tip.

The "Summary of the use case" and "Python steps for exporting JSON files with key-value pairs from a dictionary object" sections offer another type of next step. This tip does not necessarily apply to attributes of ticker symbols. If you are working with another type of data that exports key-value pairs for other kinds of values, then you can modify the code to work with attributes for that kind of data. For example, if you had a data source with weather observations for amounts of snow, rain, temperature, and wind speed from different weather stations in different geographic locales, then you could document weather station attributes in a dictionary object. Possible weather station attributes might include state, county, and city jurisdiction names as well as latitude, longitude, and hemisphere for the weather stations.



Last Updated: 2021-06-22


get scripts

next tip button



About the author
MSSQLTips author Rick Dobson Rick Dobson is a Microsoft Certified Technical Specialist and well accomplished SQL Server and Access author.

View all my tips



Comments For This Article




Friday, July 16, 2021 - 10:30:33 AM - JEROME BOYER Back To Top (89018)
A well rounded step by step JSON to SQL tables. I use similar processes to harvest Web orders from a number of web sites. My approach is to use a SQLCLR for the harvest, returning JSON data as a string to a SQL table, thus managing all code from a series of Stored Procedures.
Your tip would enhance the BOL on JSON -- still a bit terse. Your Python/JSON/SQL is very helpful and will clarify some mysteries on how to handle this ubiquitous (new) string format.
Thank you.


download





Recommended Reading

SQL Server Cursor Example

Using MERGE in SQL Server to insert, update and delete at the same time

Rolling up multiple rows into a single row and column for SQL Server data

SQL Server Loop through Table Rows without Cursor

SQL Server DROP TABLE IF EXISTS Examples














get free sql tips
agree to terms