Delete Files with SQL Server 2016 R Logic in T-SQL Code

By:   |   Comments   |   Related: > TSQL


Problem

How can I delete files older than [X] days in a folder using T-SQL?

This is an old requirement, as old as SQL Server 7 (or even earlier). The typical T-SQL solution is to use xp_cmdshell or an undocumented stored procedure xp_delete_file.

These options work, but there are a number of items to consider:

  1. xp_cmdshell may not be allowed as per company security policy or auditing requirements.
  2. Undocumented stored procedures are never recommended for use in production environment as you never know whether your next patch will break it.
  3. For complex deletion logic such as deleting files older than 3 days, but only when file size is bigger than 1MB and the filename contains the string 'temp' are very difficult to program using the xp_cmdshell or xp_delete_file options.

Another question is how to list files in a folder?

The typical solution is to use xp_cmdshell or write a CLR stored procedure/function, but CLR integration is considered relatively more difficult to develop and maintain than a T-SQL script.

Solution

Starting with SQL Server 2016, we are able to use the R language directly inside SQL Server via sp_execute_external_script.

The assumption here is that you have installed R service (in-database). For detailed installation steps, please check this MSDN document.

Before plugging R code into a SQL script, it is best to test the R script in a dedicated R IDE. I personally prefer RStudio Desktop IDE, and it is free.

Once the R service is installed and configured, we can prepare the demo case.

Let's say I have a folder with a few files in it as shown below.

Folder files

Figure 1 - Sample Directory

Native R Script Demonstration

All the R code below is run in the RStudio IDE.

List Files with R

I will run the following script in RStudio to do the following tasks:

  1. List files in a folder
  2. List files whose names match a condition
  3. List files whose property, such as size, creation time, matches a condition
  4. List files with multiple conditions

All the script functions are detailed in the comments below:

#1. list files in a folder
dir(path="c:/MyTest", full.names =  TRUE) #R is case-sensitive, so True <> TRUE

#2.1. list files with names containing "temp" literal value, pattern parm accepts RegEx
dir(path="c:/MyTest", pattern = "temp") #R is case-sensitive, so True <> TRUE

#2.2. list files with names containing number(s) by using regular expression \d
dir(path="c:/MyTest", pattern = "\\d", full.names = T) # T means TRUE

#3. list files created more than 7 days ago.
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c( 'ctime', 'size')]

#4. list files created within 7 days and size bigger than 1000 bytes
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi[difftime(Sys.time(), fi[,"ctime"], units="days") <= 7 & fi['size'] > 1000, c('ctime', 'size')]

The execution result is as follows in the RStudio IDE, you can check the result against the info in Figure 1.

Results of List Files R Code

Figure 2 - Results of List Files R Code

Delete Files with R

Now we will look at how to delete files. The following script tasks are:

  1. Delete a file
  2. Delete multiple files based on a criteria
  3. Delete a whole directory

Here is the script:

#1. del some specified files
file.remove("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt")
# or you can do
unlink(c("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt"));

#2.1. del files whose names contain "temp"
file.remove(dir(path="c:/MyTest/", pattern = "temp", full.names = T))
# or you can do this way
unlink("c:/Mytest/*temp*.*");

#2.2. del files older than 7 days
# first find files that are older than 7 days
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi <- fi[difftime(Sys.time(), fi[,"mtime"], units="days") > 7, 1:3];
file.remove(rownames(fi));
#or you can do this way to replace the file.remove()
unlink(rownames(fi));

#3. to delete the whole directory c:/Mytest, including its sub-folders if any
unlink("c:/mytest", recursive = T, force = T)

In summary, if we want to find or delete a file based on its properties, i.e. size, creation time or modification time or others, we need to use file.info function to filter out what we need, then we can work on the result.

Using R Code in T-SQL

Now we will embed the above R scripts into T-SQL via sp_exec_external_script. For more examples, please see MSDN document.

List files with R in T-SQL

Let's look at the t-sql code first

--1. list files in a folder
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", full.names =  TRUE));'
, @output_data_1_name = N'file_list'  
with result sets (([AllFiles] varchar(256)))
go

--2. list files with names containing "temp" literal value, pattern parm accepts RegEx
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", pattern = "temp"))'
,  @output_data_1_name = N'file_list'  
with result sets (([FileName_Has_Temp] varchar(256)))
go

--3. list files with names containing number(s)
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file_list <- as.data.frame(dir(path="c:/MyTest", pattern = "\\d", full.names = T))'
,  @output_data_1_name = N'file_list'  
with result sets (([FileName_Has_Num] varchar(256)))
go

--4. list files created more than 7 days ago.
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'fi <- file.info(dir(path="c:/Mytest/", full.names = T));
   fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c("ctime", "size")];
   fi["Name"] <- rownames(fi);
   file_list <- as.data.frame(fi[c("Name", "ctime")]);'
,  @output_data_1_name = N'file_list'  
with result sets (([File_7+Days_Old] varchar(256), Create_Time datetime))
go 

--5. list files created within 7 days and size bigger than 1000 bytes
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'fi <- file.info(dir(path="c:/Mytest2", full.names = T));
            fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") <= 7 & fi["size"] > 1000, c("ctime", "size")];
   fi["name"] <- rownames(fi)
   file_list <- as.data.frame(fi[c("name", "ctime","size")])'
,  @output_data_1_name = N'file_list'  
with result sets (([File < 7Days with Size > 1000] varchar(256), ctime datetime, size int))
go 

If you run the script in a SSMS window that connects to a SQL Server 2016 instance with R service installed, you will get the following results:

]T-SQL Results of running R Code

If we compare the R code embedded in T-SQL with the native R code examples we listed before, we can see there are some differences. In T-SQL, we need to use as.data.frame function to convert the result into a Data Frame structure before the R result can be displayed by T-SQL, otherwise, an error like the following will occur:

Msg 39004, Level 16, State 20, Line 1
A 'R' script error occurred during execution of 'sp_execute_external_script' with HRESULT 0x80004004.
Msg 39019, Level 16, State 1, Line 1
An external script error occurred: 
Error: The output dataset must be of type data frame; consider using data.frame() to convert it.

Delete files with R in T-SQL

Deleting files does not need to return any result set, so we do not need as.data.frame in R code

--1. delete specific files
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file.remove("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt");';

-- or
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'unlink(c("c:/Mytest/result.jpg", "c:/mytest/mytext_1.txt"))'
go

-- 2. delete files with names containing "temp"
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'file.remove(dir(path="c:/MyTest/", pattern = "temp", full.names = T))'

-- or
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'unlink("c:/Mytest/*temp*.*")'
go

-- 3. delete files older than 7 days
-- first find files that are older than 7 days
EXECUTE [sys].[sp_execute_external_script] @language = N'R',
@script = N'
fi <- file.info(dir(path="c:/Mytest", full.names = T));
fi <- fi[difftime(Sys.time(), fi[,"ctime"], units="days") > 7, c("ctime", "size")];
file.remove(rownames(fi)) # or use this: unlink(rownames(fi));' 
go

After running all the deletion statements in SSMS, all files will be gone under C:\MyTest folder.

A tip here is that when I refer to a folder, I use / (forward slash) instead of \ (back slash), because \ is an escape letter in R, so if we refer to c:\Mytest in R code, we should write c:\\Mytest. To save time, we can simply code it as c:/Mytest

Summary

In this tip, we have discussed how to use the R language to do file listing and deletion, this actually is a good replacement for a CLR stored procedure.

All the code in this tip was tested in an environment using SQL Server 2016 Developer edition on a Windows 10 computer.

R language is mainly for data science, but it is powerful enough for DBAs to use it innovatively for daily work. To allow T-SQL to embed R code directly, Microsoft starts to treat T-SQL more like a first class citizen in the MS language portfolio.

Sp_execute_external_script has a parameter @language, currently the only valid value is 'R', but it is reasonable to guess that in the future this stored procedure will support other languages, such as Python, JavaScript, both of which are popular, cross-platform languages, just like R. Coincidently, SQL Server vNext will run across multi-platforms as well.

With PowerShell to be run in Linux/Mac soon, I hope T-SQL can also embed PowerShell code someday, if all this become true, TSQL itself will be both a language and a platform for administration and development tasks, and CLR integration may be minimized due to its complexity and cost for maintenance and development.

Next Steps

R language opens a door for both DBAs and database developers, and we may do lots of things differently yet more efficiently with R, so it is really worthwhile to dig deeper into the R language.

Please check the following for further information:



sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Jeffrey Yao Jeffrey Yao is a senior SQL Server consultant, striving to automate DBA work as much as possible to have more time for family, life and more automation.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article

















get free sql tips
agree to terms