Installing external R packages to use with SQL Server 2017
SQL Server 2017 supports execution of R scripts from T-SQL as part of In-Database Machine Learning Services. R is an open-source statistical programming language. R has a very wide community of statistical, mathematical and data science professionals who develop custom R packages and contribute the same to R. Packages are libraries with ready to use logic in the form of functions. There are more than 10,000 R packages and new packages keep getting developed every day.
When R is installed in SQL Server 2017, it does not install these thousands of packages and without the availability of these packages, the functionality of R remains very limited. In this tip we will look at how to install and use R packages in SQL Server 2017.
R libraries can be installed from the R console, so that they can be accessed from T-SQL.
SQL Server contains a variety of functions to work with data of different data types, but when it comes to statistical and mathematical computing, languages like R which specializes in statistical computing, machine learning, natural language processing and other such functionalities takes precedence.
Below is a list of some useful R packages that can be used for processing SQL Server data.
- dplyr - Essential shortcuts for subsetting, summarizing, rearranging, and joining together data sets. dplyr is the go to package for fast data manipulation in R. Data read from T-SQL can be manipulated much faster with this package.
- ggplot2 - R's famous package for making beautiful graphics. ggplot2 lets you use the grammar of graphics to build layered, customizable plots for visual statistical analysis. The graphs generated using ggplot2 are not directly available in SSRS or PowerBI too.
- XLConnect, xlsx, readxl - These packages help you read and write Microsoft Excel files from R. You can also just export your spreadsheets from Excel as .csv's. This package can be useful to ingest data from Excel in SQL Server using R.
- XML, jsonlite, httr – Read and create XML, JSON files. You can also use work directly with http connections.
- SparkR - R frontend for Spark. This package can be used to indirectly integrate SQL Server and Spark using R.
- RODBC - ODBC database access for R. This package can be used in R for connecting to any external database using ODBC.
- h2o - Machine Learning, Random forests, GBM, KMeans, PCA, GLM and other deep learning analysis can be performed using this package. H2O is one of the most popular packages and is available for a variety of machine learning languages like R, Python and others. Using this package with R would mean that indirectly it can be used with SQL Server too for advance analytics.
- quantmod - Quantitative Financial Modeling & Trading Framework for R. This package is specifically useful for financial data analysis hosted in SQL Server.
There are thousands of R packages and the above list is just a glimpse of the packages that can be easily used with data hosted in SQL Server. Based on the type of data and the intended analysis, any of the packages in R can be installed and used. These thousands of packages are not installed in R by default, it’s up to the user to install the packages of choice as per the need. In the case where you use a package (i.e. library), that is not installed in the R script and execute the package from T-SQL, it will throw an exception.
In order to understand the kind of error that one would face due to the absence of a package, let’s consider to use a package that is not installed by default in R with SQL Server 2017. One such package is “Readxl”. This package can be used to read data from Excel and is not installed by default.
Libraries or packages can be used in R using the library function. To use a package in the R Script that is executed from T-SQL, the same function has to be used as shown below. Once an R script is executed from T-SQL using the execute_external_script procedure, and if the script refers to a package that is not existent, then it throws an exception as shown below. Here I have used the readxl package in the script, and as the package is not installed, the script throws an exception.
The next step to solve this issue is by installing this package. In order to install, navigate to the directory as shown below. The bin directory contains an application named R, which is the R console. From this console one can install R packages. Consider executing this console with Administrator privileges, as at times it may require Administrator privileges to download packages from the internet and install them.
Installation of the package is a fairly straight-forward process. Type the command in the following format install.packages(“<package name>”) as shown below. This will initiate downloading the package using your internet connection and will install in your R installation. Once the package has been downloaded, extracted and installed, it should look as shown below.
As soon as the installation is complete, the package is ready to use from T-SQL. Open SSMS and execute the R Script that uses this package, and this time it should work as shown below.
In this way, you can install any custom or external packages in R, so that you can use this in your R scripts from T-SQL using SQL Server 2017.
- You can also install packages using an R application from Visual Studio 2017. Try installing packages with VS 2017 and check whether you are able to use the package in T-SQL.
Last Updated: 2017-07-18
About the author
View all my tips