By: Nai Biao Zhou | Comments (2) | Related: > Python
Problem
When writing Python programs to automate daily business transactions, we often need to perform database operations. There are four basic database operations: create, read, update, and delete (CRUD). Combining these four operations, which usually results in many SQL statements, can implement complex business requirements. In practice, we can either run ad-hoc SQL statements directly to perform database operations or execute stored procedures that encapsulate these SQL statements. Thus, the Python language should support the executions of both ad-hoc SQL statements and stored procedures. Many companies use Microsoft SQL Server technologies. Therefore, IT professionals working in these companies want to know how to perform database operations on Microsoft SQL Server using Python.
Solution
We use pyodbc (PYPI, 2020), an open-source Python module, to access SQL Server databases. Inspired by Mitchell’s book: Web Scraping with Python (Mitchell, 2018), we use web scraping techniques to gather data from MSSQLTips. With the collected data, we then demonstrate how a Python program performs database operations. To make the program in this article practical, let us look at a real-world project.
Articles on the site MSSQLTips contain many hyperlinks that link to internal or external web pages. As time goes by, some hyperlinks become inaccessible. This article uses web scraping techniques to find hyperlinks in all articles written by an individual author and verify each hyperlink’s accessibility. Rather than using recursive programming techniques (Zhou, 2020) to assess all hyperlinks simultaneously, we check hyperlinks from one author to another. For example, one day, we look at all hyperlinks in Greg’s articles, and on another day, we investigate all the hyperlinks in Jeremy’s articles. This strategy avoids causing performance issues on web servers.
This article goes through a Python program that provides a solution for this project. The Python program demonstrates how to make a SQL Server database connection and perform database operations. Although the program includes several user-defined Python functions for web scraping, we focus on code that performs database operations. We do not need to understand these web scraping functions to learn database operation functions. The article covers the following topics:
- 1 – Connecting to Microsoft SQL Server in Python
- 2 – Creating Records
- 3 – Reading Records
- 4 – Updating Records
- 5 – Deleting Records
We need to create a virtual environment to run the Python program used in this article. The article Learning Python in Visual Studio 2019 offers a tutorial in creating a virtual environment in Visual Studio 2019 and installing the pyodbc package into the environment. To use the web scraping techniques, we also need to install the beautifulsoup4 package (Crummy, 2020) into this environment. We can use the “requirements.txt” file, shown as follows, to create the virtual environment quickly:
beautifulsoup4==4.9.3 bs4==0.0.1 pip==20.1.1 pyodbc==4.0.30 setuptools==47.1.0 soupsieve==2.0.1
The author tested code in this article with Visual Studio 2019 (Community Edition) and Python 3.7 on Windows 10 Home 10.0 <X64>. The DBMS is Microsoft SQL Server 2019 Developer Edition (64-bit). The article uses SQL Server Management Studio v18.6 to produce an ER diagram. The Python script, shown in the following code snippet, imports several Python modules. For simplicity, the sample code snippets in the following sections do not include these import statements again.
import sys, re, pyodbc from bs4 import BeautifulSoup from datetime import date, datetime from urllib.parse import urljoin, urlparse from urllib.request import urlopen, Request, URLError, HTTPError
1 – Connecting to Microsoft SQL Server in Python
In using Python programs to perform database operations, we need to establish database connections between the programs and the SQL Server databases. The Python programs can send SQL statements to the SQL server databases and receive result sets through the database connections. To prepare for executing the SQL statements in this article, we first create a database with four tables. We then create re-usable functions to establish a database connection and close the connection. Creating a re-usable function is always the best practice so that we can use these functions later. We even can use these functions in other projects.
1.1 Preparing Database Tables for Data Collection
Many authors made contributions to MSSQLTips, and each article contains some hyperlinks to provide further information. When we work on the project, three entities are of interests: Author, Article, and Hyperlink. The relationship between the Author entity and the Article entity is one-to-many, but the relationship between the Article entity and the Hyperlink entity is many-to-many. We create an entity-relationship diagram shown in Figure 1.
Figure 1 The ER Diagram
According to the ER diagram, we create four tables using the following SQL script. We also create two stored procedures to add authors. The stored procedure "dbo.sp_add_author" takes four parameters and inserts these parameter values into the table "[dbo].[Author]." The other stored procedure, "dbo.sp_add_author2," is for illustrative purposes only. The procedure has four input parameters and one output parameter and returns a value to the caller.
USE [Demo] GO CREATE TABLE [dbo].[Author]( [Author] [nvarchar](50) NOT NULL, [AuthorProfileUrl] [nvarchar](250) NOT NULL, [AuthorSinceYear] [int] NOT NULL, [TipContribution] [int] NOT NULL, CONSTRAINT [PK_Author] PRIMARY KEY CLUSTERED([Author] ASC) ) GO CREATE TABLE [dbo].[Article]( [ArticleId] [int] IDENTITY(1,1) NOT NULL, [Title] [nvarchar](500) NOT NULL, [Author] [nvarchar](50) NOT NULL, [ArticleUrl] [nvarchar](1024) NOT NULL, [ArticleLastUpdate] [smalldatetime] NOT NULL, CONSTRAINT [PK_Article] PRIMARY KEY CLUSTERED([ArticleId] ASC), CONSTRAINT [FK_Article_Author] FOREIGN KEY([Author]) REFERENCES [dbo].[Author]([Author]) ) GO CREATE TABLE [dbo].[Hyperlink]( [HyperlinkId] [int] IDENTITY(1,1) NOT NULL, [LinkTitle] [nvarchar](1024) NOT NULL, [HyperlinkUrl] [nvarchar](1024) NOT NULL, [StatusCode] [nchar](10) NOT NULL, [ResponseMessage] [nvarchar](max) NOT NULL, [VisitedDate] [smalldatetime] NOT NULL, CONSTRAINT [PK_Hyperlink] PRIMARY KEY CLUSTERED([HyperlinkId] ASC) ) GO CREATE TABLE [dbo].[HyperlinkIdInArticle]( [ArticleId] [int] NOT NULL, [HyperlinkId] [int] NOT NULL, CONSTRAINT [PK_HyperlinkIdInArticle] PRIMARY KEY CLUSTERED([ArticleId] ASC,[HyperlinkId] ASC), CONSTRAINT [FK_HyperlinkIdInArticle_Article] FOREIGN KEY([ArticleId]) REFERENCES [dbo].[Article]([ArticleId]), CONSTRAINT [FK_HyperlinkIdInArticle_Hyperlink] FOREIGN KEY([HyperlinkId]) REFERENCES [dbo].[Hyperlink]([HyperlinkId]) ) GO CREATE PROCEDURE [dbo].[sp_add_author] @Author nvarchar(50), @AuthorProfileUrl nvarchar(250), @AuthorSinceYear int, @TipContribution int AS BEGIN SET NOCOUNT ON; INSERT INTO [dbo].[Author] ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) VALUES (@Author,@AuthorProfileUrl,@AuthorSinceYear,@TipContribution) END GO CREATE PROCEDURE [dbo].[sp_add_author2] @Author nvarchar(50), @AuthorProfileUrl nvarchar(250), @AuthorSinceYear int, @TipContribution int, @RowCountVar int output AS BEGIN DECLARE @ErrorVar int; SET NOCOUNT ON; INSERT INTO [dbo].[Author] ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) VALUES (@Author,@AuthorProfileUrl,@AuthorSinceYear,@TipContribution) SELECT @ErrorVar = @@ERROR ,@RowCountVar = @@ROWCOUNT; -- Test the error value. IF @ErrorVar <> 0 RETURN -1; ELSE RETURN 0; END GO
1.2 ODBC Driver for SQL Server
Python programs can use an ODBC driver to access data in database management systems (DBMS). Microsoft has distributed several ODBC drivers for SQL Server. We use the driver, {ODBC Driver 17 for SQL Server}, that supports SQL server 2008 through 2019. To verify whether a computer has the driver installed, we can navigate to “Administrative Tools -> ODBC Data Sources (64-bit).” Double-click on the “ODBC Data Sources (64-bit)” item. The “ODBC Data Source Administrator” window appears, and we switch to the “Drivers” tab. We should see the driver in the list, as shown in Figure 2. If the computer does not install the driver, we can download the installer from Microsoft Download Center.
Figure 2 Check the Installed ODBC Driver
1.3 Establishing a Connection to the SQL Server Database
We must connect to a SQL Server instance and then perform database operations. We need to know the driver name, server instance name, database name, user id, and password to make the connection. With this information, we can then construct a connection string. According to Kleehammer, the connection string should have this format (Kleehammer, 2020):
DRIVER={ODBC Driver 17 for SQL Server};SERVER=test;DATABASE=test;UID=user;PWD=pwd
Once we construct an ODBC connection string, we can pass the connection string to the pyodbc.connection() method to establish a connection. The method takes the input connection string and returns a connection object responsible for connecting to the database, transferring the database information, handling rollbacks, and creating new cursor objects. In practice, we should always close the connection as soon as we complete the database operations. Connection leaks may occur if we do not close a database connection. We can use the following two functions to create a connection and close a connection. Since we may use variables to construct a connection string, we prefer the syntax in the “createDBConnection” function.
# Create a SQL Server database connection. def createDBConnection(server, database, user, password): ''' Take inputs server instance name, database name, username and password Return a SQL Server database connection ''' connection = pyodbc.connect(Driver="{ODBC Driver 17 for SQL Server}", Server=server, Database=database, UID=user, PWD=password) return connection # Close the database connection def closeDBConnection(connection): '''Take input connection and close the database connection''' try: connection.close() except pyodbc.ProgrammingError: pass
1.4 Using the fetchval() Method to Select a Single Value
Database programming commonly uses the connection/cursor model (Mitchell, 2018). After establishing a connection to the SQL Server database, we can create a cursor object to execute SQL statements, track the connection state, and travel over result sets. The connection object provides a method, cursor(), to create a cursor object. The cursor object offers a method, execute(), to execute SQL statements. We use the cursor object’s fetchval() method to get the first column of the first row if the SQL statements return a result set. The method returns None if the query has an empty result set.
The following code block demonstrates the syntax of using the fetchval() method. By calling the user-defined Python functions defined in Section 1.3, we create a connection. We then create a cursor object to run the query that checks the number of rows in the “[dbo].[Author]” table. Since we have not added any data into the database yet, the program should return 0. By the end of this step, The Python program will have successfully connected to the SQL Server database and read data from the database table.
# The entry point of the program if __name__ == "__main__": # Specify server name, database, user and password server = '192.168.2.17' database = 'Demo' user = 'user' password = 'password' connection = createDBConnection(server,database,user,password) cursor = connection.cursor() # Execute the query cursor.execute('SELECT COUNT(*) FROM [dbo].[Author]') # Access the result set count = cursor.fetchval() print('The number of authors in the database is {0}'.format(count)) cursor.close() closeDBConnection(connection)
1.5 Using the execute() Method to Execute SQL Statements
In Section 1.4, we used the cursor object to perform the fetch operation from the result set. We can also use the execute() method to execute SQL statements that do not return any result. We created four tables in Section 1.1. During the testing phase, we frequently need to clear data in the testing database. We want to create a Python helper function that deletes all data in the database and resets the identity column values. The following function serves this requirement. Since the execute() method can perform multiple database operations, it is worth noting that the connection object manages database transactions but not the cursor object (Erskine et al., 2020).
# Execute multiple SQL statements def clearDatabase(connection): '''Take input connection and delete all data in the database''' cursor = connection.cursor() # Execute the SQL statements cursor.execute('''DELETE FROM [dbo].[HyperlinkIdInArticle] DELETE FROM [dbo].[Hyperlink] DBCC CHECKIDENT ('dbo.Hyperlink', RESEED, 0) DELETE FROM [dbo].[Article] DBCC CHECKIDENT ('dbo.Article', RESEED, 0) DELETE FROM [dbo].[Author]''') connection.commit() # Close the cursor cursor.close()
2 – Creating Records
We used the cursor object’s execute() method to execute SQL statements. If we pass INSERT statements to the function as the follows, we can add records to a database table:
cursor.execute(''' INSERT INTO [dbo].[Author] ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) VALUES ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006 ,237) ''') # Call commit() method to save changes to the database connection.commit()
We often use the parameterized SQL statements in practice. The parameters protect the program from SQL injection. The syntax looks like the follows:
cursor.execute(''' INSERT INTO [dbo].[Author] ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) VALUES(?,?,?,?)''','Greg Robidoux','https://.../37/greg-robidoux/',2006 ,185) # Call commit() method to save changes to the database connection.commit()
The previous Python code passed one set of parameters to the execute() method. The program used the set of parameters to construct one data row. When we have multiple rows, we can use the executemany() method. The method takes one parameter that must be a sequence of sequences or a generator of sequences (Erskine et al., 2020). The following code shows how we use this method. The cursor attribute “fast_executemany” defaults to False. We can use the SQL Server profiler to compare the executions when assigning different values to the attribute.
params = [ ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006,237), ('Greg Robidoux','https://.../37/greg-robidoux/',2006,185), ] cursor.fast_executemany = True cursor.executemany(''' INSERT INTO [dbo].[Author] ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) VALUES(?,?,?,?)''', params)
We can also use stored procedures to add new records to the database. When calling a stored procedure that takes only input parameters, the syntax should look like the following code block:
params = ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006,237) cursor.execute("{CALL dbo.sp_add_author (?,?,?,?)}", params) connection.commit()
2.1 Calling the Stored Procedure to Insert a Data Row
Using stored procedures has many benefits (Stein et al., 2017). We can call stored procedures in Python programs. To demonstrate this technique, we extract author information from the MSSQLTips Authors page and then save the information into the database by calling a stored procedure. The following user-defined Python function finds the author’s profile page, the year the author started to contribute, and the number of tip contributions. The function needs a screen-scraping library, beautifulsoup4 (Crummy, 2020).
# Search author's profile def findAuthor(author, authorsPageUrl): ''' Take input author name and authors' page URL Return the individual profile page URL, the participation year, and the number of tips ''' req = Request(authorsPageUrl) req.add_header('User-Agent', 'Mozilla/5.0') response = urlopen(req) bsObj = BeautifulSoup(response, 'html.parser') authorLink = bsObj.find('a', href = True, text = author) authorProfileUrl = urljoin(authorsPageUrl,authorLink.get('href')) authorSinceYear = authorLink.parent.parent.strong.get_text() tipContribution = authorLink.parent.parent.strong.find_next('strong').get_text() return authorProfileUrl, authorSinceYear, tipContribution
After obtaining the author’s information from the web page, we use a Python function (shown as follows) to call the stored procedure “dbo.sp_add_author” and add the information to the database table. The stored procedure takes four parameters. When a procedure does not take any parameter, we can use this syntax: cursor.execute(‘{CALL procedure_name}’). The following Python user-defined function calls the stored procedure to create a new row in the database table. Python programs can call stored procedures that perform other database operations.
# Create an author row in the database table def saveAuthor(connection, author, authorProfileUrl, authorSinceYear, tipContribution): ''' Take input database connection and profile data Save author profile into the table [dbo].[Author] ''' cursor = connection.cursor() params = (author, authorProfileUrl, authorSinceYear, tipContribution) cursor.execute("{CALL dbo.sp_add_author (?,?,?,?)}", params) connection.commit() cursor.close()
In practice, a stored procedure may have output parameters and return values. As of this writing, pyodbc does not implement a method that allows us to access the output parameters and return values. Thompson provides a workaround for retrieving these values (Thompson et al., 2018). The solution is to use ad-hoc SQL statements to execute the stored procedure and then select these values. The fetchone() method used in the following block returns the next row in the result set or None when no more data is available (Erskine et al., 2020).
# Call a stored procedure with output parameters and return values def saveAuthor2(connection, author, authorProfileUrl, authorSinceYear, tipContribution): ''' Take input database connection and author profile data Save author profile data into the table [dbo].[Author] ''' cursor = connection.cursor() params = (author, authorProfileUrl, authorSinceYear, tipContribution) cursor.execute('''SET NOCOUNT ON; DECLARE @retValue int, @rowCountVar int; EXEC @retValue = dbo.sp_add_author2 ?,?,?,?, @rowCountVar OUTPUT; SELECT @retValue, @rowCountVar;''', params) row = cursor.fetchone() connection.commit() # For illustrative purposes only print('Return Value: {0}, Output Value: {1}'.format(row[0], row[1])) cursor.close()
2.2 Using the Cursor Object’s executemany () Method to Insert Multiple Rows
When we need to insert multiple data rows into the database, we can use the cursor object’s Executemany() method. Erskine discussed the method behavior in more depth (Erskine et al., 2020). We use this method to insert article data into the database table “dbo.Article.” The author’s profile page includes a list of all his/her articles. We use the following user-defined function to read all article titles and URLs. We pass a lambda function to the findAll() method to select all anchor tags pointing to the articles.
# Find all author's articles on the profile page def findAuthorArticles(author, authorProfileUrl): ''' Takes author name and author's profile page url Return a list of articles contributed by a specific author ''' rtnList = list() req = Request(authorProfileUrl) req.add_header('User-Agent', 'Mozilla/5.0') response = urlopen(req) bsObj = BeautifulSoup(response, "html.parser") tags = bsObj.findAll(lambda tag: tag.name == 'a' and len(tag.attrs) == 1 and 'href' in tag.attrs and tag.attrs['href'].startswith('/sqlservertip/')) for tag in tags: m = re.search('[\d]{1,2}/[\d]{1,2}/[\d]{4}', tag.next_sibling) articleLastUpdate = datetime.strptime(m.group(0), '%m/%d/%Y') title = tag.get_text() articleUrl = urljoin(authorProfileUrl, tag.get('href')) rtnList.append((title, author, articleUrl, articleLastUpdate,)) return rtnList
The Python user-defined function returns a list of tuples containing article titles, URLs, author names, and last update dates. We can then use the executemany () method to insert this list into the database. To perform the INSERT operation, we pass the list to the method. The following script demonstrates how the syntax should look:
# Save multiple articles into the database def saveAuthorArticles(connection, articles): '''Takes input database connection and a list of tuples containing article data''' cursor = connection.cursor() cursor.fast_executemany = True cursor.executemany(''' INSERT INTO [dbo].[Article]([Title],[Author],[ArticleUrl],[ArticleLastUpdate]) VALUES(?,?,?,?)''',articles) connection.commit() cursor.close()
2.3 Using the Cursor Object’s execute () Method to Insert Rows
When inserting a row to a database table, we may also need to implement other business logic. For example, we check if the row is in the database already. We can wrap all business logic into a stored procedure, then use Python to call the stored procedure as we did in Section 2.1. On certain occasions though, we want to embed SQL statements in Python code and use the cursor object’s execute method to execute multiple SQL statements.
To demonstrate executing complex SQL statements, we use web scraping techniques to gather data from MSSQLTips and save the data into the database. Inspired by Mitchell’s code (Mitchell, 2018), we extract all hyperlinks in an article. Using the top-down approach, we divide the task to find all hyperlinks into two sub-tasks: find all internal links and find all external links.
# Web Scraping with Python, Second Edition by Ryan Mitchell (O’Reilly). # Copyright 2018 Ryan Mitchell, 978-1-491-998557-1. # Find all internal hyperlinks in an article def findInternalLinks(bsObj, articleUrl): ''' Takes a BeautifulSoup object and the article URL Return a set of internal hyperlinks in the article ''' internalLinks = set() siteUrl = urlparse(articleUrl).scheme+'://'+urlparse(articleUrl).netloc #Finds all links that begin with a "/" links = bsObj.article.findAll('a', href=re.compile('^(/|'+siteUrl+')')) for link in links: if(link.attrs['href'].startswith('/cdn-cgi/l/email-protection')): pass elif (link.attrs['href'].startswith('/')): internalLinks.add((link.get_text(), urljoin(articleUrl, link.attrs['href']))) else: internalLinks.add((link.get_text(), link.attrs['href'])) return internalLinks # Find all external hyperlinks in an article def findExternalLinks(bsObj, articleUrl): ''' Takes a BeautifulSoup object and the article URL Return a set of external hyperlinks in the article ''' externalLinks = set() siteDomain = urlparse(articleUrl).netloc #Finds all links that start with "http" or "www" that do #not contain the current domain links = bsObj.article.findAll('a', href=re.compile('^(http|https|www)((?!'+siteDomain+').)*$')) for link in links: externalLinks.add((link.get_text(), link.attrs['href'])) return externalLinks # Find all hyperlinks in an article def findAllHyperLinks(articleUrl): ''' Takes an article URL Return a set of hyperlinks in the article ''' req = Request(articleUrl) req.add_header('User-Agent', 'Mozilla/5.0') response = urlopen(req) bsObj = BeautifulSoup(response, 'html.parser') # Decompose the task to smaller tasks internalLinks = findInternalLinks(bsObj, articleUrl) externalLinks = findExternalLinks(bsObj, articleUrl) return externalLinks.union(internalLinks)
When we have these hyperlink data, we want to add them into the database table “dbo.Hyperlink.” Since there is a many-to-many relationship between articles and hyperlinks, we also need to add the relationship into the table “dbo. HyperlinkIdInArticle.” The SQL statements passed to the execute method include SELECT statements and INSERT statements. The following Python user-defined function use pyodbc to interface with the database.
# Save all hyperlink data into the database def saveArticleHyperLinks(connection, articleUrl, hyperlinks): ''' Take input database connection, article URL and a list of hyperlinks. Insert data and their relationships into the database tables ''' cursor = connection.cursor() for link in hyperlinks: cursor.execute(''' DECLARE @ArticleUrl nvarchar(1024) = ? ,@LinkTitle nvarchar(1024) = ? ,@HyperlinkUrl nvarchar(1024) = ? ,@ArticleId int = 0 ,@HyperlinkId int = 0 SELECT @ArticleId = [ArticleId] FROM [dbo].[Article] WHERE [ArticleUrl] = @ArticleUrl SELECT @HyperlinkId = [HyperlinkId] FROM [dbo].[Hyperlink] WHERE [HyperlinkUrl] = @HyperlinkUrl AND [LinkTitle] = @LinkTitle IF NOT EXISTS (SELECT * FROM HyperlinkIdInArticle WHERE ArticleId = @ArticleId AND HyperlinkId = @HyperlinkId) BEGIN INSERT INTO [dbo].[Hyperlink] ([LinkTitle] ,[HyperlinkUrl] ,[StatusCode] ,[ResponseMessage] ,[VisitedDate]) VALUES (@LinkTitle ,@HyperlinkUrl ,'N/A' ,'N/A' ,'1900-01-01') SELECT @HyperlinkId = MAX(HyperlinkId) FROM [dbo].[Hyperlink] INSERT INTO [dbo].[HyperlinkIdInArticle] ([ArticleId] ,[HyperlinkId]) VALUES (@ArticleId ,@HyperlinkId) END ''',articleUrl, link[0], link[1]) connection.commit() cursor.close()
3 – Reading Data
When we need to pull data out of databases, we can use the cursor execute() method. The execute() method prepares, and executes SQL statements, then returns the cursor object itself. When passing SELECT statements to the method, we can retrieve data from databases if the statements return row objects. The row objects are similar to tuples, but we can access column values by name. Section 1.4 used the cursor object's fetchval() method to read a single value from the database. This section introduces three other fetch methods: fetchall(), fetchone(), and fetchmany(size).
3.1 The fetchall() Method
The fetchall() method fetches all remaining rows in a result set. The method returns an empty list when no more data is available (Vishal, 2020). Using this method can free up a cursor object and a database connection before processing the data. However, this method consumes more memory than other methods because we store all row objects in memory. The following Python function calls this method to read all hyperlinks that an author referenced in his/her articles.
# Find all hyperlinks an author referenced def findAllHyperlinkUrlbyAuthor(connection,author): ''' Take input author name, return a list of row objects. ''' cursor = connection.cursor() # Execute the query cursor.execute('''SELECT distinct h.HyperlinkUrl FROM [Demo].[dbo].[Article] a INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId WHERE [Author] = ?''',author) # All rows are stored in memory rows = cursor.fetchall() # Close the cursor cursor.close() return rows
3.2 The fetchone() Method
The fetchone() method returns the next row in a result set or None if there is no record to fetch. Therefore, we can use a loop to iterate over the result set and process all data. The following Python function demonstrates the syntax of reading data from the database. The while loop stops when the fetchone() method reaches the end of the result set. In each iteration, we access a column value via the column index or the column name. The Python function is for illustrative purposes only. If we want to store all rows in memory, we prefer the fetchall() method.
# Find all hyperlinks an author referenced and process each row data def findAllHyperlinkUrlbyAuthor2(connection, author): ''' Take input author name, process data in each row, return a list of row objects. ''' rows = list() cursor = connection.cursor() # Execute the query cursor.execute('''SELECT h.HyperlinkId, h.HyperlinkUrl FROM [Demo].[dbo].[Article] a INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId WHERE [Author] = ?''',author) while True: # Access the query result row = cursor.fetchone() if not row: break # Returns None when all remaining rows have been retrieved. hyperlinkId = row[0] # Access by column index (zero-based) hyperlinkUrl = row.HyperlinkUrl # Access by name rows.append(row) # Process data print('HyperlinkId: {0} URL: {1}'.format(hyperlinkId, hyperlinkUrl)) # Close the cursor cursor.close() return rows
3.3 The fetchmany(size) Method
If fetching data row by row is not efficient and loading all rows into memory is not feasible, the cursor object provides another method fetchmany(size). The method fetches the next set of rows of a result set. The size argument determines the number of rows be retrieved. Therefore, the method works in the same way as the fetchone() method when the size is 1. When there are no remaining rows in the result set, the method returns an empty list. The following Python function fetches five rows at once. The function is for demonstration purposes only. If we want to return all rows to a function caller, we may prefer the fetchall() method.
# Find all hyperlinks an author referenced and process row objects in batches def findAllHyperlinkUrlbyAuthor3(connection, author): ''' Take input author name, process row objects in batches, return a list of objects. ''' rows = list() cursor = connection.cursor() # Execute the query cursor.execute('''SELECT h.HyperlinkId, h.HyperlinkUrl FROM [Demo].[dbo].[Article] a INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId WHERE [Author] = ?''',author) while True: # Access the query result records = cursor.fetchmany(5) # The maximum number of rows in records is 5 if not records: break for row in records: hyperlinkId = row[0] # Access by column index (zero-based) hyperlinkUrl = row.HyperlinkUrl # Access by name rows.append(row) print('HyperlinkId: {0} URL: {1}'.format(hyperlinkId, hyperlinkUrl)) print('Loaded {0} rows from the database to a batch'.format(len(records))) # Close the cursor cursor.close() return rows
3.4 The nextset() Method
When a SQL script contains multiple SELECT statements, the execution of the script returns multiple result sets. We use the nextset() method to move the cursor to the next available result set. If there are no more result sets, the method returns None. We call the fetch methods to retrieve rows from the available result set. The following Python function executes three SELECT statements to get a summary of three tables; therefore, there should be three results sets:
# Get table summary from the database def getSummary(connection): '''Take input connection and retrieve summary information about the three tables''' cursor = connection.cursor() # Execute the SQL statements cursor.execute(''' SELECT [Author],[AuthorSinceYear],[TipContribution] FROM [dbo].[Author] SELECT YEAR([ArticleLastUpdate]) AS [YEAR], COUNT(*) AS SubTotal FROM [dbo].[Article] GROUP BY YEAR([ArticleLastUpdate]) SELECT COUNT(*) AS TotalLinks FROM [dbo].[Hyperlink] ''') print('Fetch rows from first set') rows=cursor.fetchall() for row in rows: print('Author: {0} Since Year: {1} Contributions: {2}'.format(row[0],row[1],row[2])) print('Fetch rows from second set') if cursor.nextset(): # fetch rows from next set, discarding first set rows=cursor.fetchall() for row in rows: print('Year: {0} Contributions Subtotal: {1}'.format(row[0], row[1])) print('Fetch rows from third set') if cursor.nextset(): # fetch a single value from next set count = cursor.fetchval() print('Total links: {0}'.format(count)) # Close the cursor cursor.close()
4 – Updating Records
When passing an UPDATE statement to the execute() method, we can update data in a database. Section 3.2 retrieves all hyperlinks in every article of a particular author. We then use the Python program to make HTTP requests. Web servers release status codes in response to the HTTP requests. We determine the hyperlink accessibilities according to these status codes and other response messages and update the database. The following Python function can evaluate the accessibility of a hyperlink:
# Access an Internet address to check the accessibility def testHyperlinkAccessibility(url): '''Take input URL, and return the status code, response message and the date visited.''' statusCode, responseMessage = 'N/A', 'N/A' try: req = Request(url) req.add_header('User-Agent', 'Mozilla/5.0') response = urlopen(req) except HTTPError as e: statusCode = e.code except URLError as e: responseMessage = str(e.reason) except: responseMessage = sys.exc_info()[1] else: statusCode = response.getcode() finally: visitedDate = date.today() return statusCode, responseMessage, visitedDate
When knowing the accessibility of a hyperlink, we update the accessibility of the hyperlink in the database table accordingly. We first construct the UPDATE statement. We then pass the SQL statement to the execute() method, as shown in the following Python function. We also access the number of updated rows through the cursor “rowcount” attribute.
# Update the accessibility of an Internet address def updateHyperlinkAccessibility(connection,hyperlinkUrl, statusCode, responseMessage, visitedDate): ''' Take input database connection, URL, status code, response message and date visited. Update the row in the database table [dbo].[Hyperlink] ''' cursor = connection.cursor() # Execute the query cursor.execute('''UPDATE [dbo].[Hyperlink] SET [StatusCode] = ? ,[ResponseMessage] = ? ,[VisitedDate] = ? WHERE HyperlinkUrl = ?''',statusCode, responseMessage, visitedDate, hyperlinkUrl) print('Number of rows updated: ', cursor.rowcount) connection.commit() # Close the cursor cursor.close()
5 – Deleting Records
Deleting records work in the same ways as updating records. When passing DELETE statements to the execute() method, we can perform delete operations via Python programs. We use the “rowcount” attribute to find the number of rows deleted. After finding inaccessible hyperlinks in the articles, we can remove the hyperlinks from the articles. To make the data in the database reflect the changes, we need to delete the relationship in the table [dbo].[HyperlinkIdInArticle]. The following Python function implements this requirement.
# Helper function to delete a row in a database table def deleteHyperlinkIdInArticle(connection, articleId, hyperlinkId): '''Take input database connection, articleId and hyperlinkId. Delete the row accordingly''' cursor = connection.cursor() # Execute the query cursor.execute('''DELETE FROM [dbo].[HyperlinkIdInArticle] WHERE [ArticleId] = ? and [HyperlinkId] = ?''',articleId, hyperlinkId) print('Number of rows deleted: ', cursor.rowcount) connection.commit() # Close the cursor cursor.close()
6 – The Complete Source Code
The Python program scans all hyperlinks in articles and finds inaccessible URLs. Instead of traveling over the entire website, we process articles by one author at once. Therefore, when initializing variables, we should give an author name. After gathering data from the MSSQLTips website, the program performs CRUD operations on the Microsoft SQL Server database using the Python language. The program walks through eight steps:
- Initialize variables including the SQL Server database information, author name, and the MSSQLTips Authors' URL;
- Remove all testing data in the database;
- Extract the author's summary information and save it into the database table;
- Find a list of author's articles and store the list into the database table;
- Discover all hyperlinks in articles and save them into the database;
- Read all URLs from the database into an in-memory list;
- Iterate over the URL list, evaluate the accessibility of each URL, and update the accessibility in the database accordingly;
- Print out a summary report;
Click here for the complete code. To run the program, we should change the values of these variables: server, database, user, and password.
Summary
In everyday work practices, we often need to perform database operations. There are four basic database operations: create, read, update, and delete (CRUD). This article used the pyodbc module to perform Microsoft SQL Server database operations. The article also adopted web-scraping techniques to gather data from the MSSQLTips website. We used Microsoft Visual Studio 2019 to edit and test the Python program.
We created a database with four tables. We determined how to use the driver, {ODBC Driver 17 for SQL Server}. With known server instance name, database name, and user confidential information; we connected to the database from the Python program. We scratched the surface of the connection/cursor model. The cursor object provided the execute() method that can run SQL statements.
We explored three ways to create records: calling stored procedures, using the executemany() method, and employing the execute() method. We covered five cursor object methods to read data: fetchval(), fetchone(), fetchall(), fetchmany(), and nextset(). Updating and deleting both work the same way. We passed UPDATE or DELETE statements to the execute() method to perform the update or the delete operation. Finally, we provided the complete source code.
References
Crummy (2020). Beautiful Soup. https://www.crummy.com/software/BeautifulSoup/.
Erskine, K. & Kleehammer, M. (2020). Cursor. https://github.com/mkleehammer/pyodbc/wiki/Cursor.
Kleehammer, M (2020). Connecting to SQL Server from Windows. https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows
Mitchell, R. (2018). Web Scraping with Python, 2nd Edition. O'Reilly Media.
PYPI (2020). DB API Module for ODBC: pyodbc 4.0.30. https://pypi.org/project/pyodbc/.
Stein, S., Coulter, D., Wales, C., Parente, J., Mabee, D., Ray., M., Milener, G., Guyer, C. & Cai, S. (2017). Stored Procedures (Database Engine). https://docs.microsoft.com/en-us/sql/relational-databases/stored-procedures/stored-procedures-database-engine?view=sql-server-ver15#benefits-of-using-stored-procedures.
Thompson, G. & Kleehammer, M. (2018). Calling Stored Procedures. https://github.com/mkleehammer/pyodbc/wiki/Calling-Stored-Procedures.
Vishal (2020). Python cursor’s fetchall, fetchmany(), fetchone() to read records from database table. https://pynative.com/python-cursor-fetchall-fetchmany-fetchone-to-read-rows-from-table/.
Zhou, N. (2020). Recursive Programming Techniques using Python. https://www.mssqltips.com/sqlservertip/6549/recursion-in-python/.
Next Steps
- The author covered most SQL Server database operations used in practice. However, the article did not include everything about database operations. For example, the article did not introduce how we explicitly declare the types and sizes of query parameters (Erskine et al., 2020). The example code in this article demonstrated the basic syntax of using these execute and fetch methods. However, the source code did not handle exceptions. Some functions such as findAllHyperlinkUrlbyAuthor2, findAllHyperlinkUrlbyAuthor3, and saveAuthor2 were for illustrative purposes only. We would not write such functions in practice. Therefore, there is much room for improvement in the sample code.
- Check out these related tips:
- Learning Python in Visual Studio 2019
- Python Programming Tutorial with Top-Down Approach
- Recursive Programming Techniques using Python
- Beginners Guide to Python for SQL Server Professionals
- Web Screen Scraping with Python to Populate SQL Server Tables
- SQL Server RegEx Capabilities Using Python
- Linear Regression with Python in SQL Server 2017
- Exploratory Data Analysis with Python in SQL Server 2017
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips