Scripts to Build a Network of Connected Records Using SQL Server

By:   |   Comments   |   Related: > TSQL


Problem

You need a way to store records that are connected together. Let's say we have five records A, B, C, D, E.  We want to store this data in a SQL Server database to show that A, B and E are connected and C and D are connected.  In this tip we will go over the code to accomplish this task.

Solution

For this solution, we will be storing integer pairs that are connected to each other. The connectivity relation needs to be reflexive, symmetric and transitive meaning:

  1. Reflexive - For any object p it is connected to itself.
  2. Symmetric - If object p is connected to object q then q is connected to p.
  3. Transitive - If object p is connected to object q and object q is connected to object r then object p is connected to object r.

The algorithm has three major purposes:

  1. To initialize the network structure.
  2. To build the network of connected devices using a UNION operation.
  3. To check if a record is connected to another record using the isConnected query operation.
  4. To find all connected nodes based on a node.
  5. To disconnect a node from the network of nodes.

The solution I found for this problem is to create a table (QFT) with an ID and Value columns that represent a general array of integers. For the initialization operation, we use a stored procedure (QFInit) that fills the table with N rows from row (1,1) through (N,N) this represents a network of N unconnected devices, so all records will be connected to themselves.

For the isConnected query, we use a function (dbo.QFIsConnected) that returns a bit value 0/1. If 0, the records are not connected to the other record and if 1 the records are connected. It will check the values for both p and q and if it is the same value then the devices are connected and if not they are not connected.

For the Union operation we use another stored procedure (QFConnect). The procedure finds the values corresponding to cells p and q and updates all the values in the table where the value equals the cell p value to the value of cell q.

Here is the T-SQL script of the stored procedures and function that I have used to solve this problem.

-- First create an empty table that will serve as the storage of the algorithm:
CREATE TABLE QFT (id int , v int)
go

-- This procedure just creates a table of rows for the test
CREATE PROCEDURE QFInit (@N int) 
as
  begin
 declare @i int
 set @i = 1
 while @i <= @N 
 BEGIN
   insert into QFT (id,v) values (@i,@i);
   set @i=@i+1;
 END
  end
go

-- This check to see if two nodes are connected:
CREATE FUNCTION dbo.QFIsConnected (
 @p INT
 ,@q INT
 )
RETURNS BIT
AS
BEGIN
 DECLARE @a INT
 DECLARE @b INT
 DECLARE @r BIT

 SELECT @a = v
 FROM QFT
 WHERE id = @p;

 SELECT @b = v
 FROM QFT
 WHERE id = @q;

 IF (@a = @b)
  SET @r = 1
 ELSE
  SET @r = 0

 RETURN @r
END
GO

-- This procedure connects two nodes together
CREATE PROCEDURE QFConnect (
 @p INT
 ,@q INT
 )
AS
BEGIN
 DECLARE @a INT
 DECLARE @b INT

 SELECT @a = v
 FROM QFT
 WHERE id = @p;

 SELECT @b = v
 FROM QFT
 WHERE id = @q;

 UPDATE QFT
 SET v = @b
 WHERE v = @a
END
GO

-- This procedure disconnects a node from all other nodes
CREATE PROC [dbo].[QFDisconnect] (@node INT)
AS
BEGIN
 DECLARE @maxnode INT
 DECLARE @secondmax INT

 SELECT @maxnode = max(v)
 FROM QFT
 WHERE id = @node

 SELECT @secondmax = max(id)
 FROM QFT
 WHERE id < @maxnode
  AND v = @maxnode

 IF (@node = @maxnode)
 BEGIN
  UPDATE QFT
  SET v = @secondmax
  WHERE v = @node;
 END

 UPDATE QFT
 SET v = @node
 WHERE id = @node
END
GO

-- this function returns a list of all nodes that are connected to a specific node
CREATE FUNCTION dbo.QFListNodesPath (@node INT)
RETURNS TABLE
AS
RETURN (
   SELECT t.id
   FROM QFT t
   WHERE t.v = (
                SELECT v
                FROM QFT s
                WHERE s.id = @node
                )
   )
GO

Example Usage

Using the above code, we will implement a network of ten components with connectivity between nodes as shown below:

1---4---5---8
2---3---6
7---9---10

From the above, we can see that:

  • 1, 4, 5 and 8 are connected
  • 2, 3 and 6 are connected
  • and 7, 9 and 10 are connected

Below we will create a starting point of 10 records and each record will point to itself at first.

-- Step 1 : Initialize the structure with a table with 10 rows
exec QFInit 10;
go

Next we will connect the rows together as follows.

-- Step 2 : build the structure using a series of calls to the QFConnect procedure
-- connect 1, 4, 5 and 8
exec QFConnect 1,4
exec QFConnect 4,5
exec QFConnect 5,8

-- connect 2,3 and 6
exec QFConnect 2,3
exec QFConnect 3,6

-- connect 7, 9 and 10
exec QFConnect 7,9
exec QFConnect 9,10

Now we can see which records are connected to each other after running the above.

-- Step 3 : check table details 
select * from QFT;

This is what is returned:

id    value
===   =====
1     8
2     6
3     6
4     8
5     8
6     6
7     10
8     8
9     10
10    10

We can also use the following to see if specific nodes are connected together.  We just need to pass in two node IDs.

-- Step 4 : check connections using the scalar dbo.QFIsConnected function for several values:
select dbo.QFIsConnected (1,4)  as conn_1_4,
       dbo.QFIsConnected (1,8)  as conn_1_8,
       dbo.QFIsConnected (1,9)  as conn_1_9,
       dbo.QFIsConnected (4,2)  as conn_4_2,
       dbo.QFIsConnected (3,6)  as conn_3_6,
       dbo.QFIsConnected (3,7)  as conn_3_7,
       dbo.QFIsConnected (7,10) as conn_7_10

The result is as follows: 

conn_1_4 conn_1_8 conn_1-9 conn_4_2 conn_3_6 conn_3_7 conn_7_10
-------- -------- -------- -------- -------- -------- ---------
    1        1        0        0        1        0        1

If the value is 1 then the nodes are connected, if 0 they are not connected.

  • 1 is connected to 4
  • 1 is connected to 8
  • 1 is not connected to 9
  • 4 is not connected to 2
  • 3 is connected to 6
  • 3 is not connected to 7
  • 7 is connected to 10

We can also use this function to find all nodes that are connected.  So if we want to find out which nodes are connected to 1 we can run the following:

-- this will list nodes connected to 1
SELECT * FROM dbo.QFListNodesPath(1)

This returns the following:

id
--
1
4
5
8

If we want to disconnect a node we can run the following.  This will disconnect the node from all other nodes.

-- this will disconnect node 1 from all other nodes
exec QFDisconnect 1

After running this we can check the nodes for node 1 again.

-- this will list nodes connected to 1
SELECT * FROM dbo.QFListNodesPath(1)

This returns this result:

id
--
1

If we check for node 4 connections:

-- this will list nodes connected to 4
SELECT * FROM dbo.QFListNodesPath(4)

The above returns these results:

id
--
4
5
8
Next Steps
  • Compile and execute the procedures and functions on your SQL Server database environment if your needs require to store data using this approach.
  • The time complexity of the QF algorithm for N operation is considered to be bad O(N*N) for n union operations. In order to improve it, you can index the id column of the table and use the index in your operations. That will make the time complexity of the operation for N operations O(N*LOG(N)) when you use this code.
  • The code was tested on Microsoft SQL Server 2014 and 2016.


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Eli Leiba Eli Leiba is a senior application DBA, a teacher and a senior database consultant with 19 years of RDBMS experience.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article

















get free sql tips
agree to terms