sqldf in R Example for SQL Server
By: Matteo Lorini | Updated: 2021-02-09 | Comments | Related: More > R Language
Problem
I have a basic knowledge of R, is it possible to manipulate data inside an R dataframe using regular T-SQL syntax?
Solution
R has a package called sqldf that allows developers to manipulate data inside a dataframe in the same way a SQL developer, queries a SQL table. In this tutorial I will show how to install the package and how to use it to query some values from the sample AdventureWorks2014 database.
Step 1: First, we need to install the sqldf package. This is simply done by the following command:
install.packages("sqldf")

Step 2: Use the AdventureWorks2014 database, Sales.SalesOrderHeader and Sales.SalesTerritory tables to create our sample query.
SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue] FROM Sales.SalesOrderHeader AS S INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID = S.TerritoryID

Step 3: Create an R script in RStudio that connects to SQL, reads the data returned from the above query into an R dataframe structure.
library(RODBC) library(dplyr) sqlcnt <- odbcDriverConnect('driver={SQL Server Native Client 11.0};uid=xxxxxxxxx;pwd=xxxxxxx;server=xxxxxxxxx;database=AdventureWorks2014;rows_at_time=1024') sqlQry <- "SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue] FROM Sales.SalesOrderHeader AS S INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID" # Store data into our adwdf dataframe adwdf <- sqlQuery(sqlcnt, sqlQry, stringsAsFactors = F,as.is = T) # To view our data View(adwdf)

Step 4: Finally, we have our data loaded into a dataframe, we can show how easy it is to work with dataframe in a SQL like manner. Let us start with a simple query that lists all the Canadian orders:
#adwdf is our DataFrame library(sqldf) sqldf("select * from adwdf where Name = 'Canada'")

Using T-SQL code.
SELECT T.[Name], convert(varchar(20),[OrderDate],110) as OrderDate, [SalesOrderID],[CustomerID], S.[TerritoryID],[SubTotal],[TaxAmt],[TotalDue] FROM Sales.SalesOrderHeader AS S INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID WHERE [Name] = 'Canada'

Next, let us aggregate some data using a group by clause
sqldf("select name, sum(SubTotal) from adwdf group by name")

Using T-SQL code.
SELECT T.[Name], sum([SubTotal]) FROM Sales.SalesOrderHeader AS S INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID GROUP BY [Name]

Lastly, let us query our dataframe to look for Total Due greater than one million.
sqldf("select name, OrderDate , sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue from adwdf group by name, OrderDate having sum(TotalDue) >1000000")

Using T-SQL code.
SELECT T.[Name],OrderDate, sum(SubTotal) SubTotal,sum(taxAmt) taxAmt,sum(TotalDue) TotalDue FROM Sales.SalesOrderHeader AS S INNER JOIN [Sales].[SalesTerritory] AS T ON T.TerritoryID =S.TerritoryID GROUP BY [Name],OrderDate HAVING sum(TotalDue) >1000000

Conclusion
The R sqldf package is versatile and we can use it to create ways to filter data in a dataframe using familiar T-SQL syntax.
Next Steps
- The reader will need to install RStudio in order to test this tip.
- R Tutorial
- Check out these tips
About the author

View all my tips
Article Last Updated: 2021-02-09