sqldf in R Example for SQL Server

By:   |   Comments   |   Related: More > R Language


Problem

I have a basic knowledge of R, is it possible to manipulate data inside an R dataframe using regular T-SQL syntax?

Solution

R has a package called sqldf that allows developers to manipulate data inside a dataframe in the same way a SQL developer, queries a SQL table. In this tutorial I will show how to install the package and how to use it to query some values from the sample AdventureWorks2014 database.

Step 1: First, we need to install the sqldf package. This is simply done by the following command:

install.packages("sqldf")
install R package sqldf

Step 2: Use the AdventureWorks2014 database, Sales.SalesOrderHeader and Sales.SalesTerritory tables to create our sample query.

SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID = S.TerritoryID
query results

Step 3: Create an R script in RStudio that connects to SQL, reads the data returned from the above query into an R dataframe structure.

library(RODBC)
library(dplyr)
 
sqlcnt <- odbcDriverConnect('driver={SQL Server Native Client 11.0};uid=xxxxxxxxx;pwd=xxxxxxx;server=xxxxxxxxx;database=AdventureWorks2014;rows_at_time=1024')
 
sqlQry <- "SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T 
ON T.TerritoryID =S.TerritoryID"
# Store data into our adwdf dataframe
adwdf <- sqlQuery(sqlcnt, sqlQry, stringsAsFactors = F,as.is = T)
# To view our data
View(adwdf)
query output

Step 4: Finally, we have our data loaded into a dataframe, we can show how easy it is to work with dataframe in a SQL like manner. Let us start with a simple query that lists all the Canadian orders:

#adwdf is our DataFrame
 
library(sqldf)
sqldf("select  * from adwdf where Name = 'Canada'")
Select from dataframe ADWDF

Using T-SQL code.

SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
WHERE [Name] = 'Canada'
query results

Next, let us aggregate some data using a group by clause

sqldf("select name, sum(SubTotal) from adwdf group by name") 
Select using group by

Using T-SQL code.

SELECT T.[Name], sum([SubTotal])
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
GROUP BY [Name]  
query results

Lastly, let us query our dataframe to look for Total Due greater than one million.

sqldf("select name, OrderDate , sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue from adwdf group by name, OrderDate having sum(TotalDue) >1000000")
Select Using Having

Using T-SQL code.

SELECT T.[Name],OrderDate, sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
GROUP BY [Name],OrderDate
HAVING sum(TotalDue) >1000000
query results

Conclusion

The R sqldf package is versatile and we can use it to create ways to filter data in a dataframe using familiar T-SQL syntax.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article

















get free sql tips
agree to terms