sqldf in R Example for SQL Server


By:   |   Updated: 2021-02-09   |   Comments   |   Related: More > R Language


Problem

I have a basic knowledge of R, is it possible to manipulate data inside an R dataframe using regular T-SQL syntax?

Solution

R has a package called sqldf that allows developers to manipulate data inside a dataframe in the same way a SQL developer, queries a SQL table. In this tutorial I will show how to install the package and how to use it to query some values from the sample AdventureWorks2014 database.

Step 1: First, we need to install the sqldf package. This is simply done by the following command:

install.packages("sqldf")
install R package sqldf

Step 2: Use the AdventureWorks2014 database, Sales.SalesOrderHeader and Sales.SalesTerritory tables to create our sample query.

SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID = S.TerritoryID
query results

Step 3: Create an R script in RStudio that connects to SQL, reads the data returned from the above query into an R dataframe structure.

library(RODBC)
library(dplyr)
 
sqlcnt <- odbcDriverConnect('driver={SQL Server Native Client 11.0};uid=xxxxxxxxx;pwd=xxxxxxx;server=xxxxxxxxx;database=AdventureWorks2014;rows_at_time=1024')
 
sqlQry <- "SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T 
ON T.TerritoryID =S.TerritoryID"
# Store data into our adwdf dataframe
adwdf <- sqlQuery(sqlcnt, sqlQry, stringsAsFactors = F,as.is = T)
# To view our data
View(adwdf)
query output

Step 4: Finally, we have our data loaded into a dataframe, we can show how easy it is to work with dataframe in a SQL like manner. Let us start with a simple query that lists all the Canadian orders:

#adwdf is our DataFrame
 
library(sqldf)
sqldf("select  * from adwdf where Name = 'Canada'")
Select from dataframe ADWDF

Using T-SQL code.

SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], 
S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue]
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
WHERE [Name] = 'Canada'
query results

Next, let us aggregate some data using a group by clause

sqldf("select name, sum(SubTotal) from adwdf group by name") 
Select using group by

Using T-SQL code.

SELECT T.[Name], sum([SubTotal])
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
GROUP BY [Name]  
query results

Lastly, let us query our dataframe to look for Total Due greater than one million.

sqldf("select name, OrderDate , sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue from adwdf group by name, OrderDate having sum(TotalDue) >1000000")
Select Using Having

Using T-SQL code.

SELECT T.[Name],OrderDate, sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue
FROM Sales.SalesOrderHeader AS S
INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID
GROUP BY [Name],OrderDate
HAVING sum(TotalDue) >1000000
query results

Conclusion

The R sqldf package is versatile and we can use it to create ways to filter data in a dataframe using familiar T-SQL syntax.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

View all my tips


Article Last Updated: 2021-02-09

Comments For This Article





download














get free sql tips
agree to terms