Problem
PDF forms are widely used for data collection, document processing, and digital workflows due to their versatility and consistency across different platforms and devices. They are essential in various industries, including healthcare, education, finance, government, and business. How do you retrieve data from PDF forms and insert into a SQL Server database table?
Solution
Using Visual Studio, let’s create a new project using a Visual Basic Windows Forms App template.
Click Next, and give a name for your solution. The name I used is PdfFormsReader. Choose a location and click Next. Choose the adequate framework and click Create.
In the Solution Explorer panel, click on the My Project folder. Click Settings, and then Create or the open application settings hyperlink.

Create a new variable called LastVisitedFolder as string.

I renamed the Form1.vb to FromPdfReader.vb, then added a textbox named TxbFolderPath and a button named BtnPdfFolder with text property “Browse”. Next, I added a RichTextBox named RtbPdfFormsContent and another button named BtnPdfFormRead with text property “Read”.

Right-click over your project name and choose Manage Nuget Packages and add the itext7 and System.Data.SqlClient packages:

Now include the following code in the FormPdfReader.vb file:
Imports System.Data.SqlClient
Imports System.IO
Imports iText.Kernel.Pdf
Public Class FormPdfReader
Private Sub BtnPdfFolder_Click(sender As Object, e As EventArgs) Handles BtnPdfFolder.Click
Using folderBrowserDialog1 As New FolderBrowserDialog()
If folderBrowserDialog1.ShowDialog() = DialogResult.OK Then
My.Settings.LastVisitedFolder = folderBrowserDialog1.SelectedPath
My.Settings.Save()
TxbFolderPath.Text = folderBrowserDialog1.SelectedPath
End If
End Using
End Sub
Private Sub BtnPdfFormRead_Click(sender As Object, e As EventArgs) Handles BtnPdfFormRead.Click
RtbPdfFormsContent.Text = ""
Dim folderPath As String = TxbFolderPath.Text
If Directory.Exists(folderPath) Then
Dim files As String() = Directory.GetFiles(folderPath)
For Each file As String In files
If IsNothing(file) Then
Continue For
End If
RtbPdfFormsContent.Text += New String("-"c, 100) & vbCrLf
RtbPdfFormsContent.Text += file & vbCrLf
RtbPdfFormsContent.Text += New String("-"c, 100) & vbCrLf
Dim pdfDoc As New PdfDocument(New PdfReader(file))
Dim root As PdfDictionary = pdfDoc.GetCatalog().GetPdfObject().GetAsDictionary(PdfName.AcroForm)
If root IsNot Nothing Then
Dim fields As PdfArray = root.GetAsArray(PdfName.Fields)
If fields IsNot Nothing Then
For I As Integer = 0 To fields.Size - 1
Dim field As PdfDictionary = fields.GetAsDictionary(i)
Dim nameObject As PdfObject = field.Get(PdfName.T)
Dim fieldname As String
If nameObject IsNot Nothing AndAlso nameObject.IsName() Then
fieldname = DirectCast(nameObject, PdfName).GetValue()
Else
fieldname = nameObject.ToString()
End If
Dim fieldValue As String = ""
If field.Get(PdfName.V) IsNot Nothing Then
fieldValue = field.Get(PdfName.V).ToString()
End If
If fieldValue.Length > 0 Then
Dim connectionString As String = "Server=DAIO;Database=MSSQLTips;Integrated Security=True;Connect Timeout=30;Encrypt=False;TrustServerCertificate=True;"
Using connection As New SqlConnection(connectionString)
Try
connection.Open()
Dim commando As String = $"EXECUTE dbo.uspPdfForms '{Path.GetFileName(file)}','{fieldName}','{fieldValue}';"
Using cmd As New SqlCommand(comando, connection)
Dim unused = cmd.ExecuteNonQuery()
End Using
RtbPdfFormsContent.Text += fieldname & ": " & fieldValue & vbCrLf
Catch ex As Exception
MsgBox($"Falha para: {file} [{fieldName}] --> {fieldValue} " & ex.Message)
End Try
End Using
End If
Next
Else
MsgBox("No form fields found in the PDF.", MsgBoxStyle.Information, "Diretório Vazio")
End If
End If
pdfDoc.Close()
Next
Else
MsgBox("Directory not found.", MsgBoxStyle.Critical, "Erro Encontrado")
End If
Beep()
End Sub
Private Sub FormPdfReader_Load(sender As Object, e As EventArgs) Handles MyBase.Load
TxbFolderPath.Text = My.Settings.LastVisitedFolder
End Sub
End Class
Data Compilation – SQL Server Side
The first step is to create a table to hold the data. This table only takes into consideration the file name, its fields names, and respective values, so it will work with any PDF form.
CREATE TABLE [dbo].[PdfForms](
[ItemId] [int] IDENTITY(1,1) NOT NULL,
[FormName] [nvarchar](50) NULL,
[FieldName] [nvarchar](150) NULL,
[FieldValue] [nvarchar](250) NULL,
[ModifiedDate] [datetime] DEFAULT GETDATE(),
[RecordChanged] [bit] DEFAULT 0,
CONSTRAINT [PK_PdfForms] PRIMARY KEY CLUSTERED
(
[ItemId] ASC
)WITH (PAD_INDEX = OFF, STATISTICS_NORECOMPUTE = OFF, IGNORE_DUP_KEY = OFF, ALLOW_ROW_LOCKS = ON, ALLOW_PAGE_LOCKS = ON, OPTIMIZE_FOR_SEQUENTIAL_KEY = OFF) ON [PRIMARY]
) ON [PRIMARY]
GO
Add the store procedure (below) that will receive and process the data from our Visual Studio solution:
-- =============================================
-- Author: SCP - MSSQLTips
-- Create date: 20240910
-- Description: PDF Forms Compilation
-- =============================================
ALTER PROCEDURE [dbo].[uspPdfForms]
(@FormName nvarchar(50)
,@FieldName nvarchar(150)
,@FieldValue nvarchar(250))
AS
BEGIN
SET NOCOUNT ON;
BEGIN TRY
BEGIN TRANSACTION;
DECLARE @ModifiedDate datetime = GETDATE();
DECLARE @ItemId int =
(SELECT [ItemId]
FROM [dbo].[PdfForms]
WHERE [FormName] = @FormName AND
[FieldName] = @FieldName);
IF @ItemId IS NULL
SET @ItemId = 0;
-- UPDATE
IF @ItemId > 0
BEGIN
UPDATE [dbo].[PdfForms]
SET [FormName] = @FormName
,[FieldName] = @FieldName
,[FieldValue] = @FieldValue
,[ModifiedDate] = @ModifiedDate
,[RecordChanged] = CASE WHEN [FieldValue] <> @FieldValue THEN 1 ELSE 0 END
WHERE [ItemId] = @ItemId;
END
-- INSERT
IF @ItemId = 0
BEGIN
INSERT INTO [dbo].[PdfForms]
([FormName]
,[FieldName]
,[FieldValue]
,[ModifiedDate])
VALUES (@FormName
,@FieldName
,@FieldValue
,@ModifiedDate);
SET @ItemId = SCOPE_IDENTITY();
END
-- DELETE
IF @ItemId < 0
BEGIN
DELETE FROM [dbo].[PdfForms]
WHERE [ItemId] = ABS(@ItemId);
END
COMMIT TRANSACTION;
RETURN @ItemId;
END TRY
BEGIN CATCH
IF @@TRANCOUNT > 0
BEGIN
ROLLBACK TRANSACTION;
END
-- Print error information.
PRINT 'Error: ' + CONVERT(varchar(50), ERROR_NUMBER()) +
', Severity: ' + CONVERT(varchar(5), ERROR_SEVERITY()) +
', State: ' + CONVERT(varchar(5), ERROR_STATE()) +
', Procedure: ' + ISNULL(ERROR_PROCEDURE(), '-') +
', Line: ' + CONVERT(varchar(5), ERROR_LINE());
PRINT ERROR_MESSAGE();
END CATCH;
END
Add the following store procedure to display the result of the form read action:
-- =============================================
-- Author: SCP - MSSQLTips
-- Create date: 20240910
-- Description: PDF Forms Compiled
-- =============================================
ALTER PROCEDURE [dbo].[uspPdfFormsChk]
AS
BEGIN
SET NOCOUNT ON;
DECLARE @Columns nvarchar(MAX)
,@SQL NVARCHAR(MAX);
SELECT @Columns = STRING_AGG(QUOTENAME([FormName]), ', ')
FROM (SELECT DISTINCT [FormName]
FROM [dbo].[PdfForms]) AS FormNames;
SET @SQL = N'
SELECT FieldName, ' + @Columns + N'
FROM (SELECT [FormName]
,[FieldName]
,[FieldValue] + CASE WHEN RecordChanged = 1 THEN '' <---'' ELSE '''' END AS [FieldValue]
FROM [dbo].[PdfForms]) AS SourceTable
PIVOT (MAX(FieldValue)
FOR [FormName] IN (' + @Columns + N')) AS PivotTable;
';
EXEC sp_executesql @SQL;
END
Testing the Solution
For this example, I will use the form below with the fields NAME as a textbox, CHOICE as a combo box, OPTION as a radio button, and COLOR as a list box. I will include only two forms.

Place your filled PDF forms in a folder and execute the Visual Studio, first browsing to this folder and then reading the files inside it. The results will be displayed in the Rich Text box.

When the store procedure uspPdfFormsChk is executed, the result is:

Now I will add another form it the forms directory, and change the option in the List box to Orange for the form 1 and execute the READ button in my application:

Now after executing the store procedure uspPdfFormsChk, the result will be:

Observe that for the ListBox in form 1, the system detected that it was changed since the last run.
This covers the basics of retrieving data from PDF forms using Visual Studio and saving the data to SQL Server.
Next Steps
- You can improve the application in Visual Studio by retrieving the data from the store procedure uspPdfFormsChk to a data grid view.
- In this example, I did not select the most appropriate field name descriptions. Instead of using descriptive names like “Name,” I labeled the fields based on their type, such as “TextBox,” to demonstrate the item type used.
- Here are some related articles: