CRUD Operations in SQL Server using Python


By:   |   Updated: 2021-01-11   |   Comments (2)   |   Related: More > Python


Problem

When writing Python programs to automate daily business transactions, we often need to perform database operations. There are four basic database operations: create, read, update, and delete (CRUD). Combining these four operations, which usually results in many SQL statements, can implement complex business requirements. In practice, we can either run ad-hoc SQL statements directly to perform database operations or execute stored procedures that encapsulate these SQL statements. Thus, the Python language should support the executions of both ad-hoc SQL statements and stored procedures. Many companies use Microsoft SQL Server technologies. Therefore, IT professionals working in these companies want to know how to perform database operations on Microsoft SQL Server using Python.

Solution

We use pyodbc (PYPI, 2020), an open-source Python module, to access SQL Server databases. Inspired by Mitchell’s book: Web Scraping with Python (Mitchell, 2018), we use web scraping techniques to gather data from MSSQLTips. With the collected data, we then demonstrate how a Python program performs database operations. To make the program in this article practical, let us look at a real-world project.

Articles on the site MSSQLTips contain many hyperlinks that link to internal or external web pages. As time goes by, some hyperlinks become inaccessible. This article uses web scraping techniques to find hyperlinks in all articles written by an individual author and verify each hyperlink’s accessibility. Rather than using recursive programming techniques (Zhou, 2020) to assess all hyperlinks simultaneously, we check hyperlinks from one author to another. For example, one day, we look at all hyperlinks in Greg’s articles, and on another day, we investigate all the hyperlinks in Jeremy’s articles. This strategy avoids causing performance issues on web servers.

This article goes through a Python program that provides a solution for this project. The Python program demonstrates how to make a SQL Server database connection and perform database operations. Although the program includes several user-defined Python functions for web scraping, we focus on code that performs database operations. We do not need to understand these web scraping functions to learn database operation functions. The article covers the following topics:

We need to create a virtual environment to run the Python program used in this article. The article Learning Python in Visual Studio 2019 offers a tutorial in creating a virtual environment in Visual Studio 2019 and installing the pyodbc package into the environment. To use the web scraping techniques, we also need to install the beautifulsoup4 package (Crummy, 2020) into this environment. We can use the “requirements.txt” file, shown as follows, to create the virtual environment quickly:

 
beautifulsoup4==4.9.3
bs4==0.0.1
pip==20.1.1
pyodbc==4.0.30
setuptools==47.1.0
soupsieve==2.0.1
 

The author tested code in this article with Visual Studio 2019 (Community Edition) and Python 3.7 on Windows 10 Home 10.0 <X64>. The DBMS is Microsoft SQL Server 2019 Developer Edition (64-bit). The article uses SQL Server Management Studio v18.6 to produce an ER diagram. The Python script, shown in the following code snippet, imports several Python modules. For simplicity, the sample code snippets in the following sections do not include these import statements again.

 
import sys, re, pyodbc
from bs4 import BeautifulSoup
from datetime import date, datetime
from urllib.parse import urljoin, urlparse
from urllib.request import urlopen, Request, URLError, HTTPError
 

1 – Connecting to Microsoft SQL Server in Python

In using Python programs to perform database operations, we need to establish database connections between the programs and the SQL Server databases. The Python programs can send SQL statements to the SQL server databases and receive result sets through the database connections. To prepare for executing the SQL statements in this article, we first create a database with four tables. We then create re-usable functions to establish a database connection and close the connection. Creating a re-usable function is always the best practice so that we can use these functions later. We even can use these functions in other projects.

1.1 Preparing Database Tables for Data Collection

Many authors made contributions to MSSQLTips, and each article contains some hyperlinks to provide further information. When we work on the project, three entities are of interests: Author, Article, and Hyperlink. The relationship between the Author entity and the Article entity is one-to-many, but the relationship between the Article entity and the Hyperlink entity is many-to-many. We create an entity-relationship diagram shown in Figure 1.

diagram

Figure 1 The ER Diagram

According to the ER diagram, we create four tables using the following SQL script. We also create two stored procedures to add authors. The stored procedure "dbo.sp_add_author" takes four parameters and inserts these parameter values into the table "[dbo].[Author]." The other stored procedure, "dbo.sp_add_author2," is for illustrative purposes only. The procedure has four input parameters and one output parameter and returns a value to the caller.

USE [Demo]
GO
 
CREATE TABLE [dbo].[Author](
   [Author] [nvarchar](50) NOT NULL,
   [AuthorProfileUrl] [nvarchar](250) NOT NULL,
   [AuthorSinceYear] [int] NOT NULL,
   [TipContribution] [int] NOT NULL,
 CONSTRAINT [PK_Author] PRIMARY KEY CLUSTERED([Author] ASC)
) 
GO
CREATE TABLE [dbo].[Article](
   [ArticleId] [int] IDENTITY(1,1) NOT NULL,
   [Title] [nvarchar](500) NOT NULL,
   [Author] [nvarchar](50) NOT NULL,
   [ArticleUrl] [nvarchar](1024) NOT NULL,
   [ArticleLastUpdate] [smalldatetime] NOT NULL,
 CONSTRAINT [PK_Article] PRIMARY KEY CLUSTERED([ArticleId] ASC),
 CONSTRAINT [FK_Article_Author] FOREIGN KEY([Author]) REFERENCES [dbo].[Author]([Author])
)
GO
CREATE TABLE [dbo].[Hyperlink](
   [HyperlinkId] [int] IDENTITY(1,1) NOT NULL,
   [LinkTitle] [nvarchar](1024) NOT NULL,
   [HyperlinkUrl] [nvarchar](1024) NOT NULL,
   [StatusCode] [nchar](10) NOT NULL,
   [ResponseMessage] [nvarchar](max) NOT NULL,
   [VisitedDate] [smalldatetime] NOT NULL,
 CONSTRAINT [PK_Hyperlink] PRIMARY KEY CLUSTERED([HyperlinkId] ASC)
)
GO
CREATE TABLE [dbo].[HyperlinkIdInArticle](
   [ArticleId] [int] NOT NULL,
   [HyperlinkId] [int] NOT NULL,
   CONSTRAINT [PK_HyperlinkIdInArticle] PRIMARY KEY CLUSTERED([ArticleId] ASC,[HyperlinkId] ASC),
   CONSTRAINT [FK_HyperlinkIdInArticle_Article] FOREIGN KEY([ArticleId]) 
REFERENCES [dbo].[Article]([ArticleId]),
   CONSTRAINT [FK_HyperlinkIdInArticle_Hyperlink] FOREIGN KEY([HyperlinkId]) 
REFERENCES [dbo].[Hyperlink]([HyperlinkId])
)
GO
CREATE PROCEDURE [dbo].[sp_add_author]
   @Author nvarchar(50), 
   @AuthorProfileUrl nvarchar(250),
   @AuthorSinceYear int,
   @TipContribution int
AS
BEGIN
   SET NOCOUNT ON;
 
   INSERT INTO [dbo].[Author]
           ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution])
       VALUES
           (@Author,@AuthorProfileUrl,@AuthorSinceYear,@TipContribution)
END
GO
CREATE PROCEDURE [dbo].[sp_add_author2]
   @Author nvarchar(50), 
   @AuthorProfileUrl nvarchar(250),
   @AuthorSinceYear int,
   @TipContribution int,
   @RowCountVar int output
AS
BEGIN
   DECLARE @ErrorVar int;  
   SET NOCOUNT ON;
   INSERT INTO [dbo].[Author]
           ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution])
       VALUES
           (@Author,@AuthorProfileUrl,@AuthorSinceYear,@TipContribution)
   SELECT @ErrorVar = @@ERROR  
      ,@RowCountVar = @@ROWCOUNT;  
   -- Test the error value. 
   IF @ErrorVar <> 0   
      RETURN -1;   
   ELSE  
      RETURN 0;  
END
GO

1.2 ODBC Driver for SQL Server

Python programs can use an ODBC driver to access data in database management systems (DBMS). Microsoft has distributed several ODBC drivers for SQL Server. We use the driver, {ODBC Driver 17 for SQL Server}, that supports SQL server 2008 through 2019. To verify whether a computer has the driver installed, we can navigate to “Administrative Tools -> ODBC Data Sources (64-bit).” Double-click on the “ODBC Data Sources (64-bit)” item. The “ODBC Data Source Administrator” window appears, and we switch to the “Drivers” tab. We should see the driver in the list, as shown in Figure 2. If the computer does not install the driver, we can download the installer from Microsoft Download Center.

odbc connection

Figure 2 Check the Installed ODBC Driver

1.3 Establishing a Connection to the SQL Server Database

We must connect to a SQL Server instance and then perform database operations. We need to know the driver name, server instance name, database name, user id, and password to make the connection. With this information, we can then construct a connection string. According to Kleehammer, the connection string should have this format (Kleehammer, 2020):

DRIVER={ODBC Driver 17 for SQL Server};SERVER=test;DATABASE=test;UID=user;PWD=pwd

Once we construct an ODBC connection string, we can pass the connection string to the pyodbc.connection() method to establish a connection. The method takes the input connection string and returns a connection object responsible for connecting to the database, transferring the database information, handling rollbacks, and creating new cursor objects. In practice, we should always close the connection as soon as we complete the database operations. Connection leaks may occur if we do not close a database connection. We can use the following two functions to create a connection and close a connection. Since we may use variables to construct a connection string, we prefer the syntax in the “createDBConnection” function.

# Create a SQL Server database connection. 
def createDBConnection(server, database, user, password): 
    ''' 
    Take inputs server instance name, database name, username and password 
    Return a SQL Server database connection 
    ''' 
    connection = pyodbc.connect(Driver="{ODBC Driver 17 for SQL Server}", 
                            Server=server, 
                            Database=database, 
                            UID=user, 
                            PWD=password) 
    return connection 
  
# Close the database connection 
def closeDBConnection(connection): 
    '''Take input connection and close the database connection''' 
    try: 
        connection.close() 
    except pyodbc.ProgrammingError: 
        pass 

1.4 Using the fetchval() Method to Select a Single Value

Database programming commonly uses the connection/cursor model (Mitchell, 2018). After establishing a connection to the SQL Server database, we can create a cursor object to execute SQL statements, track the connection state, and travel over result sets. The connection object provides a method, cursor(), to create a cursor object. The cursor object offers a method, execute(), to execute SQL statements. We use the cursor object’s fetchval() method to get the first column of the first row if the SQL statements return a result set. The method returns None if the query has an empty result set.

The following code block demonstrates the syntax of using the fetchval() method. By calling the user-defined Python functions defined in Section 1.3, we create a connection. We then create a cursor object to run the query that checks the number of rows in the “[dbo].[Author]” table. Since we have not added any data into the database yet, the program should return 0. By the end of this step, The Python program will have successfully connected to the SQL Server database and read data from the database table.

# The entry point of the program 
if __name__ == "__main__": 
    # Specify server name, database, user and password 
    server = '192.168.2.17' 
    database = 'Demo' 
    user = 'user' 
    password = 'password' 
  
    connection = createDBConnection(server,database,user,password) 
    cursor = connection.cursor() 
    # Execute the query 
    cursor.execute('SELECT COUNT(*) FROM [dbo].[Author]') 
    # Access the result set 
    count = cursor.fetchval() 
    print('The number of authors in the database is {0}'.format(count)) 
    cursor.close() 
    closeDBConnection(connection) 

1.5 Using the execute() Method to Execute SQL Statements

In Section 1.4, we used the cursor object to perform the fetch operation from the result set. We can also use the execute() method to execute SQL statements that do not return any result. We created four tables in Section 1.1. During the testing phase, we frequently need to clear data in the testing database. We want to create a Python helper function that deletes all data in the database and resets the identity column values. The following function serves this requirement. Since the execute() method can perform multiple database operations, it is worth noting that the connection object manages database transactions but not the cursor object (Erskine et al., 2020).

# Execute multiple SQL statements 
def clearDatabase(connection): 
    '''Take input connection and delete all data in the database''' 
    cursor = connection.cursor() 
    # Execute the SQL statements 
    cursor.execute('''DELETE FROM [dbo].[HyperlinkIdInArticle] 
                      DELETE FROM [dbo].[Hyperlink] 
                      DBCC CHECKIDENT ('dbo.Hyperlink', RESEED, 0) 
                      DELETE FROM [dbo].[Article] 
                      DBCC CHECKIDENT ('dbo.Article', RESEED, 0)  
                      DELETE FROM [dbo].[Author]''') 
    connection.commit() 
    # Close the cursor 
    cursor.close() 

2 – Creating Records

We used the cursor object’s execute() method to execute SQL statements. If we pass INSERT statements to the function as the follows, we can add records to a database table:

    cursor.execute(''' 
        INSERT INTO [dbo].[Author] 
               ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) 
         VALUES 
               ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006 ,237) 
        ''') 
    # Call commit() method to save changes to the database 
    connection.commit() 

We often use the parameterized SQL statements in practice. The parameters protect the program from SQL injection. The syntax looks like the follows:

    cursor.execute(''' 
        INSERT INTO [dbo].[Author] 
               ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) 
         VALUES(?,?,?,?)''','Greg Robidoux','https://.../37/greg-robidoux/',2006 ,185) 
    # Call commit() method to save changes to the database 
    connection.commit() 

The previous Python code passed one set of parameters to the execute() method. The program used the set of parameters to construct one data row. When we have multiple rows, we can use the executemany() method. The method takes one parameter that must be a sequence of sequences or a generator of sequences (Erskine et al., 2020). The following code shows how we use this method. The cursor attribute “fast_executemany” defaults to False. We can use the SQL Server profiler to compare the executions when assigning different values to the attribute.

    params = [ 
        ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006,237), 
        ('Greg Robidoux','https://.../37/greg-robidoux/',2006,185), 
        ] 
    cursor.fast_executemany = True 
    cursor.executemany(''' 
        INSERT INTO [dbo].[Author] 
            ([Author],[AuthorProfileUrl],[AuthorSinceYear],[TipContribution]) 
        VALUES(?,?,?,?)''', params) 

We can also use stored procedures to add new records to the database. When calling a stored procedure that takes only input parameters, the syntax should look like the following code block:

    params = ('Jeremy Kadlec','https://.../38/jeremy-kadlec/',2006,237) 
    cursor.execute("{CALL dbo.sp_add_author (?,?,?,?)}", params) 
    connection.commit() 

2.1 Calling the Stored Procedure to Insert a Data Row

Using stored procedures has many benefits (Stein et al., 2017). We can call stored procedures in Python programs. To demonstrate this technique, we extract author information from the MSSQLTips Authors page and then save the information into the database by calling a stored procedure. The following user-defined Python function finds the author’s profile page, the year the author started to contribute, and the number of tip contributions. The function needs a screen-scraping library, beautifulsoup4 (Crummy, 2020).

# Search author's profile 
def findAuthor(author, authorsPageUrl): 
    ''' 
    Take input author name and authors' page URL 
    Return the individual profile page URL, the participation year, and the number of tips 
    ''' 
    req = Request(authorsPageUrl) 
    req.add_header('User-Agent', 'Mozilla/5.0') 
    response = urlopen(req) 
    bsObj = BeautifulSoup(response, 'html.parser') 
    authorLink = bsObj.find('a', href = True, text = author) 
    authorProfileUrl = urljoin(authorsPageUrl,authorLink.get('href')) 
    authorSinceYear = authorLink.parent.parent.strong.get_text() 
    tipContribution = authorLink.parent.parent.strong.find_next('strong').get_text() 
    return authorProfileUrl, authorSinceYear, tipContribution 

After obtaining the author’s information from the web page, we use a Python function (shown as follows) to call the stored procedure “dbo.sp_add_author” and add the information to the database table. The stored procedure takes four parameters. When a procedure does not take any parameter, we can use this syntax: cursor.execute(‘{CALL procedure_name}’). The following Python user-defined function calls the stored procedure to create a new row in the database table. Python programs can call stored procedures that perform other database operations.

# Create an author row in the database table 
def saveAuthor(connection, author, authorProfileUrl, authorSinceYear, tipContribution): 
    ''' 
    Take input database connection and profile data 
    Save author profile into the table [dbo].[Author] 
    ''' 
    cursor = connection.cursor() 
    params = (author, authorProfileUrl, authorSinceYear, tipContribution) 
    cursor.execute("{CALL dbo.sp_add_author (?,?,?,?)}", params) 
    connection.commit() 
    cursor.close() 

In practice, a stored procedure may have output parameters and return values. As of this writing, pyodbc does not implement a method that allows us to access the output parameters and return values. Thompson provides a workaround for retrieving these values (Thompson et al., 2018). The solution is to use ad-hoc SQL statements to execute the stored procedure and then select these values. The fetchone() method used in the following block returns the next row in the result set or None when no more data is available (Erskine et al., 2020).

# Call a stored procedure with output parameters and return values 
def saveAuthor2(connection, author, authorProfileUrl, authorSinceYear, tipContribution): 
    ''' 
    Take input database connection and author profile data 
    Save author profile data into the table [dbo].[Author] 
    ''' 
    cursor = connection.cursor() 
    params = (author, authorProfileUrl, authorSinceYear, tipContribution) 
    cursor.execute('''SET NOCOUNT ON; 
                        DECLARE @retValue int, @rowCountVar int; 
                        EXEC @retValue = dbo.sp_add_author2 ?,?,?,?, @rowCountVar OUTPUT; 
                        SELECT @retValue, @rowCountVar;''', params) 
    row = cursor.fetchone()   
    connection.commit() 
    # For illustrative purposes only 
    print('Return Value: {0}, Output Value: {1}'.format(row[0], row[1])) 
    cursor.close() 

2.2 Using the Cursor Object’s executemany () Method to Insert Multiple Rows

When we need to insert multiple data rows into the database, we can use the cursor object’s Executemany() method. Erskine discussed the method behavior in more depth (Erskine et al., 2020). We use this method to insert article data into the database table “dbo.Article.” The author’s profile page includes a list of all his/her articles. We use the following user-defined function to read all article titles and URLs. We pass a lambda function to the findAll() method to select all anchor tags pointing to the articles.

# Find all author's articles on the profile page 
def findAuthorArticles(author, authorProfileUrl): 
    ''' 
    Takes author name and author's profile page url 
    Return a list of articles contributed by a specific author 
    ''' 
    rtnList = list() 
    req = Request(authorProfileUrl) 
    req.add_header('User-Agent', 'Mozilla/5.0') 
    response = urlopen(req) 
    bsObj = BeautifulSoup(response, "html.parser") 
    tags = bsObj.findAll(lambda tag: tag.name == 'a' 
                         and len(tag.attrs) == 1 
                         and 'href' in tag.attrs 
                         and tag.attrs['href'].startswith('/sqlservertip/')) 
    for tag in tags: 
        m = re.search('[\d]{1,2}/[\d]{1,2}/[\d]{4}', tag.next_sibling) 
        articleLastUpdate = datetime.strptime(m.group(0), '%m/%d/%Y') 
        title = tag.get_text() 
        articleUrl = urljoin(authorProfileUrl, tag.get('href')) 
        rtnList.append((title, author, articleUrl, articleLastUpdate,))    
    return rtnList 

The Python user-defined function returns a list of tuples containing article titles, URLs, author names, and last update dates. We can then use the executemany () method to insert this list into the database. To perform the INSERT operation, we pass the list to the method. The following script demonstrates how the syntax should look:

# Save multiple articles into the database 
def saveAuthorArticles(connection, articles): 
    '''Takes input database connection and a list of tuples containing article data''' 
    cursor = connection.cursor() 
    cursor.fast_executemany = True 
    cursor.executemany(''' 
        INSERT INTO [dbo].[Article]([Title],[Author],[ArticleUrl],[ArticleLastUpdate]) 
        VALUES(?,?,?,?)''',articles) 
    connection.commit() 
    cursor.close() 

2.3 Using the Cursor Object’s execute () Method to Insert Rows

When inserting a row to a database table, we may also need to implement other business logic. For example, we check if the row is in the database already. We can wrap all business logic into a stored procedure, then use Python to call the stored procedure as we did in Section 2.1. On certain occasions though, we want to embed SQL statements in Python code and use the cursor object’s execute method to execute multiple SQL statements.

To demonstrate executing complex SQL statements, we use web scraping techniques to gather data from MSSQLTips and save the data into the database. Inspired by Mitchell’s code (Mitchell, 2018), we extract all hyperlinks in an article. Using the top-down approach, we divide the task to find all hyperlinks into two sub-tasks: find all internal links and find all external links.

# Web Scraping with Python, Second Edition by Ryan Mitchell (O’Reilly). 
# Copyright 2018 Ryan Mitchell, 978-1-491-998557-1. 
# Find all internal hyperlinks in an article 
def findInternalLinks(bsObj, articleUrl): 
    ''' 
    Takes a BeautifulSoup object and the article URL 
    Return a set of internal hyperlinks in the article 
    ''' 
    internalLinks = set() 
    siteUrl = urlparse(articleUrl).scheme+'://'+urlparse(articleUrl).netloc 
    #Finds all links that begin with a "/" 
    links = bsObj.article.findAll('a', href=re.compile('^(/|'+siteUrl+')')) 
    for link in links: 
        if(link.attrs['href'].startswith('/cdn-cgi/l/email-protection')): 
            pass 
        elif (link.attrs['href'].startswith('/')): 
            internalLinks.add((link.get_text(), urljoin(articleUrl, link.attrs['href']))) 
        else: 
            internalLinks.add((link.get_text(), link.attrs['href'])) 
    return internalLinks 
  
# Find all external hyperlinks in an article 
def findExternalLinks(bsObj, articleUrl): 
    ''' 
    Takes a BeautifulSoup object and the article URL 
    Return a set of external hyperlinks in the article 
    ''' 
    externalLinks = set() 
    siteDomain = urlparse(articleUrl).netloc 
    #Finds all links that start with "http" or "www" that do 
    #not contain the current domain 
    links = bsObj.article.findAll('a', href=re.compile('^(http|https|www)((?!'+siteDomain+').)*$')) 
    for link in links: 
        externalLinks.add((link.get_text(), link.attrs['href'])) 
    return externalLinks 
  
# Find all hyperlinks in an article 
def findAllHyperLinks(articleUrl): 
    ''' 
    Takes an article URL 
    Return a set of hyperlinks in the article 
    ''' 
    req = Request(articleUrl) 
    req.add_header('User-Agent', 'Mozilla/5.0') 
    response = urlopen(req) 
    bsObj = BeautifulSoup(response, 'html.parser') 
    # Decompose the task to smaller tasks 
    internalLinks = findInternalLinks(bsObj, articleUrl) 
    externalLinks = findExternalLinks(bsObj, articleUrl) 
    return externalLinks.union(internalLinks) 

When we have these hyperlink data, we want to add them into the database table “dbo.Hyperlink.” Since there is a many-to-many relationship between articles and hyperlinks, we also need to add the relationship into the table “dbo. HyperlinkIdInArticle.” The SQL statements passed to the execute method include SELECT statements and INSERT statements. The following Python user-defined function use pyodbc to interface with the database.

 
# Save all hyperlink data into the database
def saveArticleHyperLinks(connection, articleUrl, hyperlinks):
    '''
   Take input database connection, article URL and a list of hyperlinks.
   Insert data and their relationships into the database tables
   '''
    cursor = connection.cursor()
    for link in hyperlinks:
        cursor.execute('''
                       DECLARE @ArticleUrl nvarchar(1024) = ?
                             ,@LinkTitle nvarchar(1024) = ?
                             ,@HyperlinkUrl nvarchar(1024) = ?
                              ,@ArticleId int = 0
                             ,@HyperlinkId int = 0
                       SELECT @ArticleId = [ArticleId]
                         FROM [dbo].[Article]
                       WHERE [ArticleUrl] = @ArticleUrl
                       SELECT @HyperlinkId = [HyperlinkId]
                         FROM [dbo].[Hyperlink]
                       WHERE [HyperlinkUrl] = @HyperlinkUrl AND [LinkTitle] = @LinkTitle
                       IF NOT EXISTS (SELECT * FROM HyperlinkIdInArticle
                                   WHERE ArticleId = @ArticleId AND HyperlinkId = @HyperlinkId)
                       BEGIN
                           INSERT INTO [dbo].[Hyperlink]
                                      ([LinkTitle]
                                      ,[HyperlinkUrl]
                                      ,[StatusCode]
                                      ,[ResponseMessage]
                                      ,[VisitedDate])
                                VALUES
                                      (@LinkTitle
                                      ,@HyperlinkUrl
                                      ,'N/A'
                                      ,'N/A'
                                      ,'1900-01-01')
                           SELECT @HyperlinkId = MAX(HyperlinkId)
                            FROM [dbo].[Hyperlink]
                           INSERT INTO [dbo].[HyperlinkIdInArticle]
                                      ([ArticleId]
                                      ,[HyperlinkId])
                                VALUES
                                      (@ArticleId
                                      ,@HyperlinkId)   
                        END            
                   ''',articleUrl, link[0], link[1])
        connection.commit()
    cursor.close()

3 – Reading Data

When we need to pull data out of databases, we can use the cursor execute() method. The execute() method prepares, and executes SQL statements, then returns the cursor object itself. When passing SELECT statements to the method, we can retrieve data from databases if the statements return row objects. The row objects are similar to tuples, but we can access column values by name. Section 1.4 used the cursor object's fetchval() method to read a single value from the database. This section introduces three other fetch methods: fetchall(), fetchone(), and fetchmany(size).

3.1 The fetchall() Method

The fetchall() method fetches all remaining rows in a result set. The method returns an empty list when no more data is available (Vishal, 2020). Using this method can free up a cursor object and a database connection before processing the data. However, this method consumes more memory than other methods because we store all row objects in memory. The following Python function calls this method to read all hyperlinks that an author referenced in his/her articles.

# Find all hyperlinks an author referenced 
def findAllHyperlinkUrlbyAuthor(connection,author): 
    ''' 
    Take input author name, return a list of row objects. 
    ''' 
    cursor = connection.cursor() 
    # Execute the query 
    cursor.execute('''SELECT distinct h.HyperlinkUrl 
                      FROM [Demo].[dbo].[Article] a 
                     INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId 
                     INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId 
                WHERE [Author] = ?''',author) 
    # All rows are stored in memory 
    rows = cursor.fetchall() 
  
    # Close the cursor 
    cursor.close() 
    
    return rows 

3.2 The fetchone() Method

The fetchone() method returns the next row in a result set or None if there is no record to fetch. Therefore, we can use a loop to iterate over the result set and process all data. The following Python function demonstrates the syntax of reading data from the database. The while loop stops when the fetchone() method reaches the end of the result set. In each iteration, we access a column value via the column index or the column name. The Python function is for illustrative purposes only. If we want to store all rows in memory, we prefer the fetchall() method.

# Find all hyperlinks an author referenced and process each row data 
def findAllHyperlinkUrlbyAuthor2(connection, author): 
    ''' 
    Take input author name, process data in each row, return a list of row objects. 
    ''' 
    rows = list() 
    cursor = connection.cursor() 
    # Execute the query 
    cursor.execute('''SELECT h.HyperlinkId, h.HyperlinkUrl 
                      FROM [Demo].[dbo].[Article] a 
                     INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId 
                     INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId 
                WHERE [Author] = ?''',author) 
    while True: 
        # Access the query result 
        row = cursor.fetchone() 
        if not row: 
            break                           # Returns None when all remaining rows have been retrieved. 
        hyperlinkId = row[0]                # Access by column index (zero-based) 
        hyperlinkUrl = row.HyperlinkUrl     # Access by name 
        rows.append(row)                    # Process data 
        print('HyperlinkId: {0} URL: {1}'.format(hyperlinkId, hyperlinkUrl)) 
    # Close the cursor 
    cursor.close() 
    return rows 

3.3 The fetchmany(size) Method

If fetching data row by row is not efficient and loading all rows into memory is not feasible, the cursor object provides another method fetchmany(size). The method fetches the next set of rows of a result set. The size argument determines the number of rows be retrieved. Therefore, the method works in the same way as the fetchone() method when the size is 1. When there are no remaining rows in the result set, the method returns an empty list. The following Python function fetches five rows at once. The function is for demonstration purposes only. If we want to return all rows to a function caller, we may prefer the fetchall() method.

 
# Find all hyperlinks an author referenced and process row objects in batches
def findAllHyperlinkUrlbyAuthor3(connection, author):
    '''
   Take input author name, process row objects in batches, return a list of objects.
   '''
    rows = list()
    cursor = connection.cursor()
    # Execute the query
    cursor.execute('''SELECT h.HyperlinkId, h.HyperlinkUrl
                     FROM [Demo].[dbo].[Article] a
                      INNER JOIN [dbo].[HyperlinkIdInArticle] ha ON a.ArticleId = ha.ArticleId
                      INNER JOIN [dbo].[Hyperlink] h ON ha.HyperlinkId = h.HyperlinkId
               WHERE [Author] = ?''',author)
    while True:
        # Access the query result
        records = cursor.fetchmany(5)           # The maximum number of rows in records is 5
        if not records:
            break
        for row in records:                        
            hyperlinkId = row[0]                # Access by column index (zero-based)
            hyperlinkUrl = row.HyperlinkUrl     # Access by name
            rows.append(row)
            print('HyperlinkId: {0} URL: {1}'.format(hyperlinkId, hyperlinkUrl))
        print('Loaded {0} rows from the database to a batch'.format(len(records)))
    # Close the cursor
    cursor.close()
    return rows

3.4 The nextset() Method

When a SQL script contains multiple SELECT statements, the execution of the script returns multiple result sets. We use the nextset() method to move the cursor to the next available result set. If there are no more result sets, the method returns None. We call the fetch methods to retrieve rows from the available result set. The following Python function executes three SELECT statements to get a summary of three tables; therefore, there should be three results sets:

# Get table summary from the database 
def getSummary(connection): 
'''Take input connection and retrieve summary information about the three tables''' 
cursor = connection.cursor() 
# Execute the SQL statements 
cursor.execute(''' 
SELECT [Author],[AuthorSinceYear],[TipContribution] 
FROM [dbo].[Author] 
SELECT YEAR([ArticleLastUpdate]) AS [YEAR], COUNT(*) AS SubTotal 
FROM [dbo].[Article] 
GROUP BY YEAR([ArticleLastUpdate]) 
SELECT COUNT(*) AS TotalLinks 
FROM [dbo].[Hyperlink] 
''') 
print('Fetch rows from first set') 
rows=cursor.fetchall() 
for row in rows: 
print('Author: {0} Since Year: {1} Contributions: {2}'.format(row[0],row[1],row[2])) 
print('Fetch rows from second set') 
if cursor.nextset(): 
# fetch rows from next set, discarding first set 
rows=cursor.fetchall() 
for row in rows: 
print('Year: {0} Contributions Subtotal: {1}'.format(row[0], row[1])) 
print('Fetch rows from third set') 
if cursor.nextset(): 
# fetch a single value from next set 
count = cursor.fetchval() 
print('Total links: {0}'.format(count)) 
# Close the cursor 
cursor.close() 

4 – Updating Records

When passing an UPDATE statement to the execute() method, we can update data in a database. Section 3.2 retrieves all hyperlinks in every article of a particular author. We then use the Python program to make HTTP requests. Web servers release status codes in response to the HTTP requests. We determine the hyperlink accessibilities according to these status codes and other response messages and update the database. The following Python function can evaluate the accessibility of a hyperlink:

# Access an Internet address to check the accessibility 
def testHyperlinkAccessibility(url): 
'''Take input URL, and return the status code, response message and the date visited.''' 
statusCode, responseMessage = 'N/A', 'N/A' 
try: 
req = Request(url) 
req.add_header('User-Agent', 'Mozilla/5.0') 
response = urlopen(req) 
except HTTPError as e: 
statusCode = e.code 
except URLError as e: 
responseMessage = str(e.reason) 
except: 
responseMessage = sys.exc_info()[1] 
else: 
statusCode = response.getcode() 
finally: 
visitedDate = date.today() 

return statusCode, responseMessage, visitedDate 

When knowing the accessibility of a hyperlink, we update the accessibility of the hyperlink in the database table accordingly. We first construct the UPDATE statement. We then pass the SQL statement to the execute() method, as shown in the following Python function. We also access the number of updated rows through the cursor “rowcount” attribute.

# Update the accessibility of an Internet address 
def updateHyperlinkAccessibility(connection,hyperlinkUrl, statusCode, responseMessage, visitedDate): 
''' 
Take input database connection, URL, status code, response message and date visited. 
Update the row in the database table [dbo].[Hyperlink] 
''' 
cursor = connection.cursor() 
# Execute the query 
cursor.execute('''UPDATE [dbo].[Hyperlink] 
SET [StatusCode] = ? 
,[ResponseMessage] = ? 
,[VisitedDate] = ? 
WHERE HyperlinkUrl = ?''',statusCode, responseMessage, visitedDate, hyperlinkUrl) 
print('Number of rows updated: ', cursor.rowcount) 
connection.commit() 
# Close the cursor 
cursor.close() 

5 – Deleting Records

Deleting records work in the same ways as updating records. When passing DELETE statements to the execute() method, we can perform delete operations via Python programs. We use the “rowcount” attribute to find the number of rows deleted. After finding inaccessible hyperlinks in the articles, we can remove the hyperlinks from the articles. To make the data in the database reflect the changes, we need to delete the relationship in the table [dbo].[HyperlinkIdInArticle]. The following Python function implements this requirement.

# Helper function to delete a row in a database table 
def deleteHyperlinkIdInArticle(connection, articleId, hyperlinkId): 
'''Take input database connection, articleId and hyperlinkId. Delete the row accordingly''' 
cursor = connection.cursor() 
# Execute the query 
cursor.execute('''DELETE FROM [dbo].[HyperlinkIdInArticle] 
WHERE [ArticleId] = ? and [HyperlinkId] = ?''',articleId, hyperlinkId) 
print('Number of rows deleted: ', cursor.rowcount) 
connection.commit() 
# Close the cursor 
cursor.close() 

6 – The Complete Source Code

The Python program scans all hyperlinks in articles and finds inaccessible URLs. Instead of traveling over the entire website, we process articles by one author at once. Therefore, when initializing variables, we should give an author name. After gathering data from the MSSQLTips website, the program performs CRUD operations on the Microsoft SQL Server database using the Python language. The program walks through eight steps:

  1. Initialize variables including the SQL Server database information, author name, and the MSSQLTips Authors' URL;
  2. Remove all testing data in the database;
  3. Extract the author's summary information and save it into the database table;
  4. Find a list of author's articles and store the list into the database table;
  5. Discover all hyperlinks in articles and save them into the database;
  6. Read all URLs from the database into an in-memory list;
  7. Iterate over the URL list, evaluate the accessibility of each URL, and update the accessibility in the database accordingly;
  8. Print out a summary report;

Click here for the complete code. To run the program, we should change the values of these variables: server, database, user, and password.

Summary

In everyday work practices, we often need to perform database operations. There are four basic database operations: create, read, update, and delete (CRUD). This article used the pyodbc module to perform Microsoft SQL Server database operations. The article also adopted web-scraping techniques to gather data from the MSSQLTips website. We used Microsoft Visual Studio 2019 to edit and test the Python program.

We created a database with four tables. We determined how to use the driver, {ODBC Driver 17 for SQL Server}. With known server instance name, database name, and user confidential information; we connected to the database from the Python program. We scratched the surface of the connection/cursor model. The cursor object provided the execute() method that can run SQL statements.

We explored three ways to create records: calling stored procedures, using the executemany() method, and employing the execute() method. We covered five cursor object methods to read data: fetchval(), fetchone(), fetchall(), fetchmany(), and nextset(). Updating and deleting both work the same way. We passed UPDATE or DELETE statements to the execute() method to perform the update or the delete operation. Finally, we provided the complete source code.

References

Crummy (2020). Beautiful Soup. https://www.crummy.com/software/BeautifulSoup/.

Erskine, K. & Kleehammer, M. (2020). Cursor. https://github.com/mkleehammer/pyodbc/wiki/Cursor.

Kleehammer, M (2020). Connecting to SQL Server from Windows. https://github.com/mkleehammer/pyodbc/wiki/Connecting-to-SQL-Server-from-Windows

Mitchell, R. (2018). Web Scraping with Python, 2nd Edition. O'Reilly Media.

PYPI (2020). DB API Module for ODBC: pyodbc 4.0.30. https://pypi.org/project/pyodbc/.

Stein, S., Coulter, D., Wales, C., Parente, J., Mabee, D., Ray., M., Milener, G., Guyer, C. & Cai, S. (2017). Stored Procedures (Database Engine). https://docs.microsoft.com/en-us/sql/relational-databases/stored-procedures/stored-procedures-database-engine?view=sql-server-ver15#benefits-of-using-stored-procedures.

Thompson, G. & Kleehammer, M. (2018). Calling Stored Procedures. https://github.com/mkleehammer/pyodbc/wiki/Calling-Stored-Procedures.

Vishal (2020). Python cursor’s fetchall, fetchmany(), fetchone() to read records from database table. https://pynative.com/python-cursor-fetchall-fetchmany-fetchone-to-read-rows-from-table/.

Zhou, N. (2020). Recursive Programming Techniques using Python. https://www.mssqltips.com/sqlservertip/6549/recursion-in-python/.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Nai Biao Zhou Nai Biao Zhou is a Senior Software Developer with 20+ years of experience in software development, specializing in Data Warehousing, Business Intelligence, Data Mining and solution architecture design.

View all my tips


Article Last Updated: 2021-01-11

Comments For This Article




Monday, January 11, 2021 - 4:27:07 PM - Nai Biao Zhou Back To Top (88029)
Thank you for reading my article. I appreciate your question, but I do not have a clear answer. I would say it depends.

The Python web framework, Django, provides an Object-Relational Mapper (ORM). I have used this feature in web development, and it is very convenient in some situations. However, when the web applications need to analyze (or operate) data from legacy databases, I still prefer these CRUD operations.

In addition, when performing data analysis, I have never used an ORM to perform data operations. I would appreciate it if someone can share their experiences in using ORM in Python programming, especially in data analysis projects.

Nai Biao Zhou

Monday, January 11, 2021 - 12:19:37 PM - Jeff Moden Back To Top (88027)
I'm just curious about all of this... isn't this all what ORMs are supposed to do auto-magically for you? Is there actually still a need for C.R.U.D. procedures/


download














get free sql tips
agree to terms