Generate Network Graphs with R in SQL Server 2017
There are hundreds of visualizations available to visualize data using R. One of the niche types of visualizations are network graphs. This category of visualizations uses nodes and edges to represent relationships from source to target. The data required to generate these graphs is hosted generally in columnar or key-value data store that stores data in JSON format. As relational databases have evolved, there is dedicated support for JSON and other data formats in relational databases and are increasingly being used to store graph related data in the form of nodes and edges. SQL Server also has support for storing graph related data.
Generally reporting tools with key visualization like force-directed graphs are required even to explore and visualize the data. And this can become a bottleneck in basic data exploration while managing graph data. In this tip, we will learn how to visually explore network graphs with SQL Server.
R contains powerful freely available libraries for interacting with network graphs. We will be using one of these libraries which will create diverse types of network graphs. SQL Server 2017 ships with an option to install and use the R language with T-SQL. In this tip we will use just T-SQL and R along with a freely available R library for the above-mentioned purpose.
Step 1 - Before starting the below mentioned exercise, ensure that the machine has the required software components i.e. SQL Server 2017, SSMS, and R installed as well as configured. If you need to refer the installation steps, you can follow the installation section of R tutorial.
Force Network Chart Visualization
Step 3 - Once you have installed this package, we are ready to execute our first example. In this example, we using data frames available in the package itself and will be generating a force network chart. You can read about force network graph from here. Open SSMS and open a new T-SQL query window, type or copy the below query and execute the code. Before we execute the code, let’s try to understand this code.
- Sp_execute_external_script is the store procedure indicating which external scripts can be executed in SQL Server.
- @language parameter signifies the script language being used, which is R in this case.
- @script parameter contains the actual R script which will be executed.
- The library function references the networkD3 package for use with rest of the code. Magrittr library is installed when you install the networkD3 library, and the same is being referenced in the code along with networkD3.
- With the data function we are loading two datasets that we will be using to create the network graph. You can look at the data contained in the MisLinks and MisNodes dataset by using the print function.
- Once this graph has been created successfully, we are using the redirection operator %>% to redirect the output of the function to the saveNetwork function from the Magrittr package to save the file at the desired location. We are passing the desired location where we intend to save the graph output in the saveNetwork function.
execute sp_execute_external_script @language = N'R', @script = N' library(networkD3) library(magrittr) # Load Data data(MisLinks) data(MisNodes) # Create Network Chart and save to file forceNetwork(Links = MisLinks, Nodes = MisNodes, Source = "source", Target = "target", Value = "value", NodeID = "name", Group = "group", opacity = 0.8) %>% saveNetwork(file = "C:\\Temp\\Net1.html", FALSE) '
Step 4 - Once the above code executes successfully, the file should get created in the provided location. Open the file in a browser and the visualization would look as shown below.
Step 5 - This visualization is interactive and if you hover your mouse over any node or egde, you would find the visualization respond to it as shown below.
Sankey Chart Visualization
Step 7 - There are four different types of network graphs supported by the network D3 library – force directed network, Sankey chart, Radial network, and Dendograms. Sankey is another very famous network chart used to analyze the flow and distribution from source to target, especially in energy or currency distribution datasets. You can use the below code to create a Sankey diagram. This code has a similar structure to the one explained in point 3, expect the data ingestion part. Here we are reading data from a url using the paste0 function and then converting the json data to list format, before passing it to the function.
execute sp_execute_external_script @language = N'R', @script = N' library(networkD3) library(magrittr) # Read data URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/","master/JSONdata/energy.json") # Convert to list format Energy <- jsonlite::fromJSON("https://raw.githubusercontent.com/apache/incubator-echarts/master/test/data/energy.json") # Create Sankey Chart and save to file sankeyNetwork(Links = Energy$links, Nodes = Energy$nodes, Source = "source", Target = "target", Value = "value", NodeID = "name", units = "TWh", fontSize = 12, nodeWidth = 30) %>% saveNetwork(file = "C:\\Temp\\Net2.html", FALSE) '
Step 8 - Once the code executes successfully and the file is saved, open the file and you will find an interactive Sankey chart with data as shown below.
Radial Network Visualization
Step 9 - Let’s execute one more example that is available in the package documentation itself, which will create a radial network diagram. Execute the below code to create the file with the visualization. This example too has the same code structure, but uses the RadialNetwork function. Data used in these examples is not the primary focus of this tip, as you can read data from anywhere including SQL Server tables. The core engine which lets you visualize and explore the data is the networkD3 package which is the focus of this tip.
execute sp_execute_external_script @language = N'R', @script = N' library(networkD3) library(magrittr) # Read data URL <- paste0("https://cdn.rawgit.com/christophergandrud/networkD3/","master/JSONdata//flare.json") # Convert to list format Flare <- jsonlite::fromJSON(URL, simplifyDataFrame = FALSE) # Filter Data Flare$children = Flare$children[1:3] # Create chart and save to file radialNetwork(List = Flare, fontSize = 24, opacity = 0.9) %>% saveNetwork(file = "C:\\Temp\\Net3.html", FALSE) '
Step 10 - Once you execute the code successfully, open the file in a browser and you will be able to visualize and interact with the chart as shown below.
In this way, you can use the networkD3 packet in R to create interactive network visualizations on your graph data using a few lines of T-SQL code, without any dependency on specialized visualization tools just to explore your data in SQL Server or any other data repositories hosting data related to graphs or networks.
- Consider exploring more options and network visualization functions from the networkD3 package to extract the maximum benefit of the package.
About the author
View all my tips