Create SQL Server Graphics with ggplot2 and R Services


By:   |   Updated: 2020-12-30   |   Comments   |   Related: More > R Language


Problem

In this article we will cover how you can use SQL Server R Services and create charts and graphs to plot data from R queries.

Solution

R has a package called ggplot2 that allows developers to create statistical graphs. In this article, I will show you how to install this package and then use it to plot some values from the AdventureWorks2014 database.

I am using SQL Server 2016 and this SQL Server version does not allow us to plot graphs directly in SQL Server Management Studio. However, we can create an external file that contains our graph. The following steps will show how to work with statistical graphics in RStudio first. This tip will end with an example on how to create graphs using SSMS

Step 1: First, we need to install the statistical ggplot2 package. This is executed by the following command:

install.packages("ggplot2")
install ggplot2 package

Step 2: As an example, we will use AdventureWorks2014 database and the Sales.SalesOrderHeader and Sales.SalesTerritory tables to create graphs.

SELECT t.Name, YEAR(orderdate) [Year], Month(orderdate) [Month], concat(cast(YEAR(orderdate) as char(4)),' - ', DATENAME(month,orderdate)) [Year_Month_Name], 
SUM(s.SubTotal) AS [Sub_Total], 
STR(Sum([TaxAmt])) AS [Total_Taxes], 
STR(Sum([TotalDue])) AS [Total_Sales]
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesTerritory as t ON s.TerritoryID = t.TerritoryID
GROUP BY t.Name,YEAR(orderdate), Month(orderdate), DATENAME(month,orderdate),concat(cast(YEAR(orderdate) as char(4)),' - ', DATENAME(month,orderdate))
ORDER BY t.Name,YEAR(orderdate), Month(orderdate)
Sample Data From AdventureWorks2014 database

Step 3: Next, we will create an R script in RStudio that connects to SQL Server and reads the data returned from the above query and plot a graph.

The R code below connects to our SQL Server database and reads the data into a dataframe called gmt.

library(ggplot2)
library(RODBC)
library(dplyr)
 sqlcnt <- odbcDriverConnect('driver={SQL Server Native Client 11.0};uid=xxxxxxxxx;pwd=xxxxxxx;server=xxxxxxxxx;database=AdventureWorks2014;rows_at_time=1024')
 sqlQry <- "SELECT t.Name, YEAR(orderdate) [Year], Month(orderdate) [Month], concat(cast(YEAR(orderdate) as char(4)),' - ', DATENAME(month,orderdate)) [Year_Month_Name], 
SUM(s.SubTotal) AS [Sub_Total], 
STR(Sum([TaxAmt])) AS [Total_Taxes], 
STR(Sum([TotalDue])) AS [Total_Sales]
FROM Sales.SalesOrderHeader AS s
INNER JOIN Sales.SalesTerritory as t ON s.TerritoryID = t.TerritoryID
GROUP BY t.Name,YEAR(orderdate), Month(orderdate), DATENAME(month,orderdate),concat(cast(YEAR(orderdate) as char(4)),' - ', DATENAME(month,orderdate))
ORDER BY t.Name,YEAR(orderdate), Month(orderdate) "
 gmt <- sqlQuery(sqlcnt,sqlQry, stringsAsFactors = F,as.is = T)

To verify that the data has been loaded correctly, issue the following command:

View(gmt)
Data Frame Contents

Step 4: Finally, we have our data and it is time to use R function ggplot.

For this example, we call ggplot() function with dataframe gmt as an input parameter. Use the aes() function to construct aesthetic mappings by passing the X and Y coordinates.

ggplot(gmt, aes(x=Year_Month_Name, y=Sub_Total)) + geom_point()
RStudio Command

The output graph is as follows:

Output graph

We have just produced our first graph. However, it is too dense because we are trying to plot too much data at once.

Filter the output to year 2012 and name Australia.

ggplot(filter(gmt, Name == "Australia" & Year == "2012"), aes(x=Year_Month_Name, y=Sub_Total)) + geom_point()
graph

The function aes(), aesthetic mappings used with ggplot() is versatile and allows us to re-arrange and build multiple types of graphs.

For example, we can break down our graph by month and summarize it by Total Taxes as follows.

ggplot(data=filter(gmt, Name == "Australia" & Year == "2012")) +
       geom_point(mapping = aes(x=Year_Month_Name, y=Sub_Total, color =Year_Month_Name, size = Total_Taxes ) ) +
       facet_wrap(Year ~ Month)
graph

In the next example, we will display only the Total Sales of the last quarter of 2012.

ggplot(data=filter(gmt, Name == "Australia" & Year == "2012" & (Month == "10" | Month == "11" | Month == "12")) ) +
       geom_point(mapping = aes(y=Year_Month_Name, x=Sub_Total, color =Year_Month_Name, size = Total_Sales ) ) 
total sales graph

Step 5: Let's see how it works using SSMS.

Since it is not possible to visualize an R graph inside SSMS, we must redirect the Graph output to an external file. The following code shows how this can be done.

USE AdventureWorks2014;
GO

DECLARE @rscript NVARCHAR(MAX);
SET @rscript = N'
  # import scales package
  library(ggplot2)
  library(dplyr)
  #
  # Define our output file name and location
  #
  GraphOutFile <- "C:\\DataFiles\\GraphOut.png"
  png(filename=GraphOutFile, width=500, height=500)
  #
  # Load the DataFrame with data from our query
  #
  gmt <- InputDataSet
  #
  # Use ggpot() function to create our graph
  #
  gmt1 <- filter(gmt, NName == "Australia" & Year == "2012" & (Month == "10" | Month == "11" | Month == "12"))
  myGraph <- ggplot(data=gmt1) + 
  geom_point(mapping = aes(y=Year_Month_Name, x=Sub_Total, color = Year_Month_Name, size = Total_Sales ) )
  #
  # Redirect our output to the files system
  #
  print(myGraph)
  dev.off()';

DECLARE @sqlscript NVARCHAR(MAX);
SET @sqlscript = N'
      SELECT t.Name as NName, YEAR(orderdate) [Year], Month(orderdate) [Month], 
      concat(cast(YEAR(orderdate) as char(4)),'' - '', DATENAME(month,orderdate)) [Year_Month_Name], 
      SUM(s.SubTotal) AS [Sub_Total], 
      STR(Sum([TaxAmt])) AS [Total_Taxes], 
      STR(Sum([TotalDue])) AS [Total_Sales]
      FROM Sales.SalesOrderHeader AS s
      INNER JOIN Sales.SalesTerritory as t ON
      s.TerritoryID = t.TerritoryID
      GROUP BY t.Name,YEAR(orderdate), Month(orderdate), DATENAME(month,orderdate),
      concat(cast(YEAR(orderdate) as char(4)),'' - '', DATENAME(month,orderdate))
      ;';
EXEC sp_execute_external_script
  @language = N'R',
  @script = @rscript,
  @input_data_1 = @sqlscript
  WITH RESULT SETS NONE; 
GO 
r code

We can see the output file has been created in our file system.

file system

We can open it using the Windows Paint application to see the visualizations.

r graph

Conclusion

The R function ggplot() is versatile and can be used to create many types of statistical graphs. I recommend you check out this resource to see all the possible parameters of each function.

Next Steps


Last Updated: 2020-12-30


get scripts

next tip button



About the author
MSSQLTips author Matteo Lorini Matteo Lorini is a DBA and has been working in IT since 1993. He specializes in SQL Server and also has knowledge of MySQL.

View all my tips



Comments For This Article





download





Recommended Reading

Import Data from PDF to SQL Server

Using Simple Linear Regression to Make Predictions

sqldf in R Example for SQL Server

Index Numbers Explained with Examples in R

Exploring Four Simple Time Series Forecasting Methods with R Examples














get free sql tips
agree to terms