Automatically Transcribe MP3 or MP4 Files and Store in SQL Server

By: Daniel Calbimonte | Updated: 2023-10-12 | Comments | Related: More > Artificial Intelligence

Problem

Often we have audio or video files that have valuable information that we would like to convert to text transcripts. In this article, we look at a way that this can be done by taking audio or video files stored in SQL Server and automatically create a transcript from these files.

Solution

In this article, we will use Python and the OpenAI API to convert MP3 sound files into text and store it in SQL Server. The example can also work for .wav, .mp3, .flac, .ogg, and .mp4 files.

Requirements

SQL Server 2022 installed or any other SQL version.
SSMS installed.
Visual Studio Code installed.
A .mp3 file with English audio. If you do not have an MP3, you can record one.

Install Python Extension in Visual Studio Code

This article will provide steps to work with Visual Studio Code. However, you can use any other IDE you prefer to run the Python code.

In Visual Studio Code, go to Extensions.

Search for Python and install the Microsoft extension if not already installed.

Open a Python File

In Visual Studio Code, go to File > New Text File.

In the Text file, Select a language.

Select Python.

Writing Code to Read an MP3 File

The first example will create code to read the MP3 file and write the text.

The following code will transcribe the contents of the audio.mp3 file stored in the c:\data folder. The audio in this file is "Hello this is a test."

#MSSQLTips.com
#use the openai API
import Openai
#use the api key
openai.api_key = "sk-Z5KNDHB8hmMd801C90WywthJYIfRaNqj"
#specify the path
audio_file= open("C:\\data\\audio.mp3", "rb")
#store the transcription in a variable
transcript = openai.Audio.transcribe("whisper-1", audio_file)
#print the transcription of the audio
print(transcript)

Install openai library from command line

Note: You need to install the openai library using the command line.

Run the pip install openai in the command line where Python is installed. This library uses AI to read and transcribe information from the audio file. Once installed, we can invoke it using the import openai code.

Register and log into OpenAI

To connect to openai, we need to register there. You can find the API keys here. You will need to register on the OPENAI website. They will ask for a login and a password, and you will receive a message on your cell phone to confirm.

Create secret key

Once you have an account and have logged into the site, Create a new secret key if you do not have any:

Copy the secret key created.

Code

The following line of code will open and read the binary file p2. Note: The path c:\data uses double backslashes.

audio_file= open("C:\\data\\audio.mp3", "rb")

"rb" means to read the binary file (read binary). There are other options for the open function:

w = write the file.
a = append the file.
x = exclusive creation mode.
t = text mode.

OpenAI supports the following audio extensions .wav, .mp3, .flac, .ogg, and .mp4.

The next line of code is the one that transcribes the audio into text and stores the text in the transcript variable:

transcript = openai.Audio.transcribe("whisper-1", audio_file)

Whisper-1 is used for a quiet, whispered speech. Other options include:

phone_call = phone calls.
serene-2 = a quiet speech
focus-group-1 = a group forum or discussion.
interview-3 = transcribing interviews.
lecture-1 = academic lectures.
Default = generic one for general purposes.

And finally, we print the transcription.

print(transcript)

This line of code will show the text:

Note: It returns the data in brackets. This is because we are invoking the Openai REST API, and the REST API is returning a JSON response.

How to store the transcript into SQL Server

We have an mp3 file and we want to store the transcription into SQL Server.

The code to generate a table is the following:

CREATE TABLE transcript
(
    id INT IDENTITY(1,1) PRIMARY KEY,
    filename VARCHAR(255),
    transcription VARCHAR(MAX)
);

The table transcript will contain the file name in the filename column and the transcription of the mp3 audio in the transcription column. The code to do that is the following:

#mssqltips.com
import pyodbc
import openai
 
# Set up the connection to the SQL Server database using Windows Authentication
server = '.'
database = 'adventureworks2019'
conn = pyodbc.connect('DRIVER={SQL Server};SERVER=' + server + ';DATABASE=' + database + ';Trusted_Connection=yes;')
 
# use the openai API
openai.api_key = "sk-Z5KNDHB8hmMd801C90WywthJYIfRaNqj"
 
# specify the path
audio_file_path = "C:\\data\\audio.mp3"
 
#open(audio_file_path, "rb")
 
# store the transcription in a variable
transcript = openai.Audio.transcribe("whisper-1", audio_file)
 
# print the transcription of the audio
print(transcript)
 
# Insert the transcription and filename into the transcript table
cursor = conn.cursor()
query = "INSERT INTO transcript (filename, transcription) VALUES (?, ?)"
cursor.execute(query, ("audio.mp3", transcript))
conn.commit()
 
# Close the connection
conn.close()

Note that the audio files supported are the following:

mp3
mp4
mpeg
mpga
m4a
wav
webm

How to Read Sound Data Files Stored in SQL Server using Python and AI

Previously, we learned how to read an MP3 file. In this new example, we will read the file stored in SQL Server.

First, we must install pyodbc, a library used to connect with ODBC in Python.

Run this command where Python is installed in order to install the pyodbc library:

Pip install pyodbc

Also, we need a database with MP3 files. The code below creates a table named AudioFiles. With the ID as the primary key, the filename will contain the name of the file and AudioData will contain the MP3 files.

--MSSQLTips.com
 
CREATE TABLE [dbo].[AudioFiles](
   [ID] [int] IDENTITY(1,1) NOT NULL,
   [FileName] [varchar](100) NULL,
   [AudioData] [varbinary](max) NULL,
PRIMARY KEY CLUSTERED 
(
   [ID] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY] TEXTIMAGE_ON [PRIMARY]
GO

To insert an audio file in the AudioFiles table, you can use the following T-SQL statement:

INSERT INTO [dbo].[AudioFiles] ([FileName], [AudioData])
SELECT 'audio.mp3', BulkColumn
FROM OPENROWSET(BULK N'C:\data\audio.mp3', SINGLE_BLOB) AS FileData

We are inserting the MP3 file from the c:\data path into the table.

Next, in Visual Studio Code, write this code:

#MSSQLTips.com
import pyodbc
from io import BytesIO
import openai
 
# Connect to SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=.;'
                      'Database=Adventureworks2019;'
                      'Trusted_Connection=yes;')
 
# Get the audio data from the AudioFiles table
cursor = conn.cursor()
cursor.execute('SELECT AudioData FROM AudioFiles WHERE FileName = ?', 'test1.m4a')
audio_data = cursor.fetchone()[0]
 
# Transcribe the audio data
openai.api_key = "sk-Z50WywthJYIfRaNqj"
audio_file = BytesIO(audio_data)
audio_file.name = "test1.m4a"
transcript = openai.Audio.transcribe("whisper-1", audio_file)
 
# Print the transcription
print(transcript)
 
# Close the connection
conn.close()

First, we import the libraries: pyodbc to connect to SQL Server using ODBC, BytesIO to handle the files, and openai to transcribe the audio.

import pyodbc
from io import BytesIO
import openai

Next, we need to connect to SQL Server using the SQL Server driver, where the "." means to use the local SQL Server. You can specify the SQL Server name instead. The database is the name of the SQL Server database where the AudioFiles table is stored. You can use any database. Finally, Trusted_connection is used to show the connection to SQL Server is through the Windows Account. Ensure that your current Windows Account has access to the database with the AudioFiles table.

# Connect to SQL Server
conn = pyodbc.connect('Driver={SQL Server};'
                      'Server=.;'
                      'Database=Adventureworks2019;'
                      'Trusted_Connection=yes;')

Also, create a cursor, and store the information from the cursor in the audio_data variable. The test1.m4a is the file with the audio. The cursor will read the data of the column where the filename is equal to test1.m4a.

# Get the audio data from the AudioFiles table
cursor = conn.cursor()
cursor.execute('SELECT AudioData FROM AudioFiles WHERE FileName = ?', 'test1.m4a')
audio_data = cursor.fetchone()[0]

We then use the openai libraries to read the audio_file and transcribe the audio in the transcript variable.

# Transcribe the audio data
openai.api_key = "sk-Z50WywthJYIfRaNqj"
audio_file = BytesIO(audio_data)
audio_file.name = "test1.m4a"
transcript = openai.Audio.transcribe("whisper-1", audio_file)

Finally, we print the transcription and close the connection.

# Print the transcription
print(transcript)
 
# Close the connection
conn.close()

Conclusion

In this article, we learned how to transcribe audio files using AI and then how to transcribe audio files stored in a SQL Server database.

Next Steps

To learn more about Python and handling binary files in SQL Server, refer to these links:

About the author

Daniel Calbimonte is a Microsoft SQL Server MVP, Microsoft Certified Trainer and 6-time Microsoft Certified IT Professional. Daniel started his career in 2001 and has worked with SQL Server 6.0 to 2022. Daniel is a DBA as well as specializes in Business Intelligence (SSIS, SSAS, SSRS) technologies.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips

Article Last Updated: 2023-10-12

Use AI to Transcribe Audio from an MP3 or MP4 File and Store in SQL Server