SQL Server vs Oracle: File Watcher Process

By:   |   Updated: 2023-08-10   |   Comments   |   Related: > SQL Server and Oracle Comparison


Problem

When you're a programmer, sooner or later, you'll need to watch for files to arrive at a specific location; either you need to wait for an asynchronous task to complete unzipping/moving/downloading a file, the file needs to be manipulated by several programs at different stages, you need to transfer several files to another location, etc. You can implement this on your own by subscribing to the host operating system notifications using the C# FileSystemWatcher, the Java WatchService, or similar in another language. In the end, you'll find out you don't get notified promptly every time for every single file. You'll need to add a time-based module to make it failsafe. However, the program will have grown too big at this point, and you have invested so much time tuning it. How can this be simplified, leveraging the database's built-in capabilities in either Oracle or SQL Server?

Solution

SQL Server and Oracle provide out-of-the-box mechanisms to watch for files to arrive at a specific location, so you don't need to watch, tune or debug it. You can subscribe to event notifications from the database, being notified when a file arrives, with the benefit of the file metadata available in the database using a tested and reliable process, and only needs manipulating as you see fit. The only caveat is that neither option works in Docker containers; you need a full Windows installation.

Oracle file watcher process

The Oracle functionality comes from the DBMS_SCHEDULER package, specifically the method CREATE_FILE_WATCHER. But first, you must enable JVM as described in Oracle Semantic Search, steps 1-5. Then you must create an OS credential (see below) by replacing "oracle" with your OS username and replacing "password" with the user password:

BEGIN
  DBMS_CREDENTIAL.CREATE_CREDENTIAL('FILEWATCHERCRED', 'oracle', 'password');
END;

Once that is done, you can create the file watcher in the following way:

BEGIN
  DBMS_SCHEDULER.CREATE_FILE_WATCHER(
    FILE_WATCHER_NAME => 'FILEWATCHER',
    DIRECTORY_PATH => 'C:\setup\DocumentStore\',
    FILE_NAME => '*.txt',
    CREDENTIAL_NAME => 'FILEWATCHERCRED',
    MIN_FILE_SIZE => 1,
    STEADY_STATE_DURATION => INTERVAL '60' SECOND,
    ENABLED => FALSE);
END;

Note the following parameters:

  • DIRECTORY_PATH: Location where new files will arrive.
  • FILE_NAME: You can use wildcards to filter for specific filenames.
  • CREDENTIAL_NAME: The credential created earlier.
  • MIN_FILE_SIZE: Only after the file size is greater than this number of bytes will it be processed by the watcher.
  • STEADY_STATE_DURATION: If the file hasn't been modified in this amount of time, it will be processed by the watcher.

The file watcher has been created in a disabled state because we haven't defined its processing method. Now, let's create the destination table:

CREATE TABLE FILEWATCHERTBL (
  NAME VARCHAR2(30),
  PATH VARCHAR2(30),
  DATETIME TIMESTAMP);

And define the procedure that will populate it:

CREATE OR REPLACE PROCEDURE FILEWATCHERPROC (FWR IN SYS.SCHEDULER_FILEWATCHER_RESULT) AS
BEGIN
  INSERT INTO FILEWATCHERTBL(PATH, NAME, DATETIME)
  VALUES (FWR.DIRECTORY_PATH, FWR.ACTUAL_FILE_NAME, FWR.FILE_TIMESTAMP);
  COMMIT;
END;

Note: The type SCHEDULER_FILEWATCHER_RESULT has many more fields that you can use when inserting the file in the database:

  • DESTINATION: The host name or IP address where the file was found because you can watch for files on another computer.
  • DIRECTORY_PATH: Absolute path of the file.
  • ACTUAL_FILE_NAME: Name of the file.
  • FILE_SIZE: Size of the file in bytes.
  • FILE_TIMESTAMP: Timestamp when the file was found after considering the MIN_FILE_SIZE and STEADY_STATE_DURATION.

Now we need to create a program that will consume the procedure:

BEGIN
  DBMS_SCHEDULER.CREATE_PROGRAM(
    PROGRAM_NAME => 'FILEWATCHERPRG',
    PROGRAM_TYPE => 'STORED_PROCEDURE',
    PROGRAM_ACTION => 'FILEWATCHERPROC',
    NUMBER_OF_ARGUMENTS => 1,
    ENABLED => FALSE);
  DBMS_SCHEDULER.DEFINE_METADATA_ARGUMENT(
    PROGRAM_NAME => 'FILEWATCHERPRG',
    METADATA_ATTRIBUTE => 'EVENT_MESSAGE',
    ARGUMENT_POSITION => 1);
  DBMS_SCHEDULER.ENABLE('FILEWATCHERPRG');
END;

And we need to create a scheduled job that is the link between the file watcher and the program:

BEGIN
  DBMS_SCHEDULER.CREATE_JOB(
    JOB_NAME => 'FILEWATCHERJOB',
    PROGRAM_NAME => 'FILEWATCHERPRG',
    QUEUE_SPEC => 'FILEWATCHER',
    AUTO_DROP => FALSE,
    ENABLED => FALSE);
END;

When everything is in place, we need to enable the job and the file watcher:

BEGIN
  DBMS_SCHEDULER.ENABLE('FILEWATCHERJOB, FILEWATCHER');
END;

When a file is placed in the specified directory, its metadata will be inserted in the database:

PATH                     NAME      DATETIME
------------------------ --------- -----------------------------
C:\setup\DocumentStore\  new1.txt  19-JUN-23 05.09.31.000000 AM

There is one additional configuration for the failsafe method. In this case, we set it to check every 60 seconds:

BEGIN
  DBMS_SCHEDULER.SET_ATTRIBUTE('FILE_WATCHER_SCHEDULE', 'REPEAT_INTERVAL', 'FREQ=SECONDLY; INTERVAL=60');
END;

There is also one additional configuration to launch multiple procedures when multiple files are found:

BEGIN
  DBMS_SCHEDULER.SET_ATTRIBUTE('FILEWATCHERJOB', 'PARALLEL_INSTANCES', TRUE);
END;

If there are issues and the file metadata is not getting inserted, you can first search for the failed reason in the database:

SELECT ERROR#, ADDITIONAL_INFO, DESTINATION
  FROM SYS.SCHEDULER$_JOB_RUN_DETAILS
 WHERE LOG_ID=(SELECT MAX(LOG_ID) FROM SYS.SCHEDULER$_JOB_RUN_DETAILS);

You can also search for errors in the trace folder with the following PowerShell command:

Get-ChildItem -Path "C:\app\diag\rdbms\orcl\orcl\trace" -File | Where-Object {$_.Name -like "*_j*.trc"} | Sort-Object LastWriteTime -Descending| Select-Object -First 20 | Select Length, Name, LastWriteTime, CreationTime

If you want to view the existing file watchers, you can use the queries below:

SELECT OBJ#, DIRECTORY_PATH FROM SCHEDULER$_FILE_WATCHER;
SELECT OWNER, FILE_WATCHER_NAME, ENABLED FROM DBA_SCHEDULER_FILE_WATCHERS;

SQL Server file watcher process

The SQL Server functionality requires FILESTREAM to be enabled in the SQL Server Configuration Manager console. If you want to use PowerShell, you can use the code below specifying your instance name (see this link and this link):

$instance = "MSSQLSERVER"
$wmi = Get-WmiObject -Namespace "ROOT\Microsoft\SqlServer\ComputerManagement15" -Class FilestreamSettings | where {$_.InstanceName -eq $instance}
$wmi.EnableFilestream(3, $instance)
Get-Service -Name $instance | Restart-Service
PowerShell EnableFilestream

Note: This command will create a shared folder named "mssqlserver," but the name can be changed.

Note: The ReturnValue must be 0.

Later, when you put files in the shared folder, it will look like this, where "Win-qsulb167kkl" is the computer name:

Shared folder created

Once done, you need to enable it at the instance level using the code below:

EXEC sp_configure filestream_access_level, 2
RECONFIGURE

Now, when you create a database, you need to specify the CONTAINS FILESTREAM option in a filegroup shown in the code below:

CREATE DATABASE Archive 
ON
PRIMARY ( NAME = Arch1, FILENAME = 'C:\setup\archdat1.mdf'),
FILEGROUP FileStreamGroup1 CONTAINS FILESTREAM ( NAME = Arch3, FILENAME = 'C:\setup\filestream1')
LOG ON ( NAME = Archlog1, FILENAME = 'C:\setup\archlog1.ldf');

Later, when you put a file into the shared folder, it will internally look like this:

Filestream path containing the file

Now you need to enable FILETABLE using the code below:

ALTER DATABASE Archive SET FILESTREAM ( NON_TRANSACTED_ACCESS = FULL, DIRECTORY_NAME = N'filetable1' )

You can create the FILETABLE with the code below:

CREATE TABLE DocumentStore AS FileTable WITH ( FileTable_Directory = 'DocumentStore', FileTable_Collate_Filename = database_default );

When you put a file in the shared folder, it will automatically be inserted in this table in the following way:

FileTable content

There are several things you need to be aware of:

  • SQL Server automatically creates the computer share; you don't do it manually.
  • The permissions on the computer share can only be managed by SQL Server; if a user wants to read or write to it, the login and user must be created in the server instance and database and grant permissions on the FileTable.
  • A system administrator can't manage the permissions on the computer share or the internal directories; it is only managed by SQL Server and can't be overridden.
  • The FileTable has computed columns, so you can't use Query Notifications directly; you need to use a trigger to a second table or manually notify the client application.

Conclusion

The Oracle FileWatcher provides more configuration options, can be fine-tuned to your needs, and you can use Query Notifications with it. The SQL Server FileTable involves more steps and is inflexible to filter files but inserts the file metadata immediately when a file arrives; however, you can't use Query Notifications immediately.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Pablo Echeverria Pablo Echeverria is a talented database administrator and C#.Net software developer since 2006. Pablo wrote the book "Hands-on data virtualization with Polybase". He is also talented at tuning long-running queries in Oracle and SQL Server, reducing the execution time to milliseconds and the resource usage up to 10%. He loves learning and connecting new technologies providing expert-level insight as well as being proficient with scripting languages like PowerShell and bash. You can find several Oracle-related tips in his LinkedIn profile.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips


Article Last Updated: 2023-08-10

Comments For This Article

















get free sql tips
agree to terms