SQL Server vs Oracle: Data Loading

By:   |   Updated: 2022-03-23   |   Comments   |   Related: More > Other Database Platforms


Problem

There are multiple tools and methods for loading data into a database, such as tools like SQL Server Management Studio or SQLDeveloper, using T-SQL and PL/SQL, third-party software, etc. Some specific tools are Bulk Copy Program (BCP) for SQL Server and SQL*Loader (SQLLDR) for Oracle. We will take a look at both of these to see how you can load data into SQL Server and Oracle.

Solution

In the next section you will see the parameters available for both BCP and SQLLDR, and we will load data into a database using both.

For data loading I’m going to get some tweets to do a sentiment analysis, so I’m going to use the PowerShell script below:

$APIKey = '<YourKeyHere>' 
$SearchWord = 'sql' 
$OutFile = 'C:\Temp\input.txt' 
$APIKeySecret = [System.Convert]::ToBase64String([System.Text.Encoding]::ASCII.GetBytes($APIKey)) 
$result = Invoke-RestMethod -Method Post -Uri https://api.twitter.com/oauth2/token -ContentType "application/x-www-form-urlencoded;charset=UTF-8" -Headers @{'Authorization'='Basic '+$APIKeySecret} -Body 'grant_type=client_credentials' 
$OAuthToken = $result | select -ExpandProperty access_token 
$result = Invoke-WebRequest -Uri ('https://api.twitter.com/1.1/search/tweets.json?q='+$SearchWord+' exclude:retweets&lang=en&count=100') -Headers @{'Authorization'='Bearer '+$OAuthToken} 
Clear-Content $OutFile 
Add-Content $OutFile (($result | select -ExpandProperty Content | ConvertFrom-Json | select -ExpandProperty statuses).text -replace "`n|`r" | select-string -pattern $SearchWord) 
  • Line 1 sets my Twitter API key, which you can obtain from Twitter.
  • Line 2 specifies the word I’m going to search, this what I will perform a sentiment analysis on later.
  • Line 3 sets my output file.
  • Line 4 converts my Twitter key into base 64.
  • Lines 5 and 6 get from Twitter the OAuthToken to be used when performing searches.
  • Line 7 performs the search, excluding retweets, in English language, requiring 100 tweets to be returned (otherwise it only returns 15).
  • Line 8 clears the output file.
  • Line 9 saves only the text response from Twitter (because it returns XML with all kinds of information) removing the ads.

With this file in place, we can load the records into both databases to test BCP and SQLLDR.

Oracle SQLLDR

The default tool for data loading in Oracle is named SQL*Loader (SQLLDR), which from the official documentation states it has a powerful data parsing engine that puts little limitation on the format of the data in the data file.

You can use SQL*Loader to do the following:

  • Load data across a network if your data files are on a different system than the database.
  • Load data from multiple data files during the same load session.
  • Load data into multiple tables during the same load session.
  • Specify the character set of the data.
  • Selectively load data (you can load records based on the records' values).
  • Manipulate the data before loading it, using SQL functions.
  • Generate unique sequential key values in specified columns.
  • Use the operating system's file system to access the data files.
  • Load data from disk, tape, or named pipe.
  • Generate sophisticated error reports, which greatly aid troubleshooting.
  • Load arbitrarily complex object-relational data.
  • Use secondary data files for loading LOBs and collections.
  • Use conventional, direct path, or external table loads.

If you run it on the command line with no parameters, it shows the usage as below:

PS C:\> sqlldr
SQL*Loader: Release 19.0.0.0.0 - Production on Mon Feb 21 16:58:37 2022 
Version 19.3.0.0.0 
Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved. 
Usage: SQLLDR keyword=value [,keyword=value,...] 
Valid Keywords: 
    userid -- ORACLE username/password 
   control -- control file name 
       log -- log file name 
       bad -- bad file name 
      data -- data file name 
   discard -- discard file name 
discardmax -- number of discards to allow          (Default all) 
      skip -- number of logical records to skip    (Default 0) 
      load -- number of logical records to load    (Default all) 
    errors -- number of errors to allow            (Default 50) 
      rows -- number of rows in conventional path bind array or between direct path data saves 
               (Default: Conventional path 250, Direct path all) 
  bindsize -- size of conventional path bind array in bytes  (Default 1048576) 
    silent -- suppress messages during run (header,feedback,errors,discards,partitions) 
    direct -- use direct path                      (Default FALSE) 
   parfile -- parameter file: name of file that contains parameter specifications 
  parallel -- do parallel load                     (Default FALSE) 
      file -- file to allocate extents from 
skip_unusable_indexes -- disallow/allow unusable indexes or index partitions  (Default FALSE) 
skip_index_maintenance -- do not maintain indexes, mark affected indexes as unusable  (Default FALSE) 
commit_discontinued -- commit loaded rows when load is discontinued  (Default FALSE) 
  readsize -- size of read buffer                  (Default 1048576) 
external_table -- use external table for load; NOT_USED, GENERATE_ONLY, EXECUTE 
columnarrayrows -- number of rows for direct path column array  (Default 5000) 
streamsize -- size of direct path stream buffer in bytes  (Default 256000) 
multithreading -- use multithreading in direct path 
 resumable -- enable or disable resumable for current session  (Default FALSE) 
resumable_name -- text string to help identify resumable statement 
resumable_timeout -- wait time (in seconds) for RESUMABLE  (Default 7200) 
date_cache -- size (in entries) of date conversion cache  (Default 1000) 
no_index_errors -- abort load on any index errors  (Default FALSE) 
partition_memory -- direct path partition memory limit to start spilling (kb)  (Default 0) 
     table -- Table for express mode load 
date_format -- Date format for express mode load 
timestamp_format -- Timestamp format for express mode load 
terminated_by -- terminated by character for express mode load 
enclosed_by -- enclosed by character for express mode load 
optionally_enclosed_by -- optionally enclosed by character for express mode load 
characterset -- characterset for express mode load 
degree_of_parallelism -- degree of parallelism for express mode load and external table load 
      trim -- trim type for express mode load and external table load 
       csv -- csv format data files for express mode load 
    nullif -- table level nullif clause for express mode load 
field_names -- field names setting for first record of data files for express mode load 
dnfs_enable -- option for enabling or disabling Direct NFS (dNFS) for input data files  (Default FALSE) 
dnfs_readbuffers -- the number of Direct NFS (dNFS) read buffers  (Default 4) 
sdf_prefix -- prefix to append to start of every LOB File and Secondary Data File 
      help -- display help messages  (Default FALSE) 
empty_lobs_are_null -- set empty LOBs to null  (Default FALSE) 
  defaults -- direct path default value loading; EVALUATE_ONCE, EVALUATE_EVERY_ROW, IGNORE, IGNORE_UNSUPPORTED_EVALUATE_ONCE, 
 IGNORE_UNSUPPORTED_EVALUATE_EVERY_ROW 
direct_path_lock_wait -- wait for access to table when currently locked  (Default FALSE) 
PLEASE NOTE: Command-line parameters may be specified either by 
position or by keywords.  An example of the former case is 'sqlldr 
scott/tiger foo'; an example of the latter is 'sqlldr control=foo 
userid=scott/tiger'.  One may specify parameters by position before 
but not after parameters specified by keywords.  For example, 
'sqlldr scott/tiger control=foo logfile=log' is allowed, but 
'sqlldr scott/tiger control=foo log' is not, even though the 
position of the parameter 'log' is correct. 

In the script below I prepare my environment and load data. The command is in yellow and the output of each command is displayed below it in grey.

# Create the user, drop if exists 
[email protected]" 
BEGIN 
  FOR i IN (SELECT 1 FROM DBA_USERS WHERE USERNAME='MYDB') LOOP 
    EXECUTE IMMEDIATE 'DROP USER MYDB CASCADE'; 
  END LOOP; 
  EXECUTE IMMEDIATE 'CREATE USER MYDB IDENTIFIED BY MyPwd'; 
  EXECUTE IMMEDIATE 'GRANT CONNECT, CREATE TABLE TO MYDB'; 
  EXECUTE IMMEDIATE 'ALTER USER MYDB QUOTA 1G ON SYSTEM'; 
END; 
/ 
"@ 
$Script | sqlplus -s / as sysdba 
PL/SQL procedure successfully completed.
# Create the table 
"CREATE TABLE MYTBL(TEXT VARCHAR2(4000));" | sqlplus -s MYDB/MyPwd 
Table created.
# Get file size 
Write-Host((Get-Item C:\setup\input.txt).length/1KB)
8.5732421875
# Import data 
sqlldr MYDB/MyPwd TABLE=MYTBL DATA='c:\setup\input.txt' DIRECT=TRUE MULTITHREADING=TRUE PARALLEL=TRUE
SQL*Loader: Release 19.0.0.0.0 - Production on Wed Feb 23 11:32:14 2022 
Version 19.3.0.0.0 
Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved. 
Express Mode Load, Table: MYTBL 
Path used:      Direct 
Load completed - logical record count 72. 
Table MYTBL: 
  72 Rows successfully loaded. 
Check the log file: 
  MYTBL.log 
for more information about the load.
# Show contents of log file
Get-Content MYTBL.log
SQL*Loader: Release 19.0.0.0.0 - Production on Wed Feb 23 11:32:14 2022 
Version 19.3.0.0.0 
Copyright (c) 1982, 2019, Oracle and/or its affiliates.  All rights reserved. 
Express Mode Load, Table: MYTBL 
Data File:      c:\setup\input.txt 
  Bad File:     input.bad 
  Discard File:  none specified 
 (Allow all discards) 
Number to load: ALL 
Number to skip: 0 
Errors allowed: 50 
Continuation:    none specified 
Path used:      Direct - with parallel option. 
Table MYTBL, loaded from every logical record. 
Insert option in effect for this table: APPEND 
   Column Name                  Position   Len  Term Encl Datatype 
------------------------------ ---------- ----- ---- ---- --------------------- 
TEXT                                FIRST     *   ,       CHARACTER 
Generated control file for possible reuse: 
OPTIONS(DIRECT=TRUE, PARALLEL=TRUE) 
LOAD DATA 
INFILE 'c:\setup\input.txt' 
APPEND 
INTO TABLE MYTBL 
FIELDS TERMINATED BY "," 
( 
  TEXT CHAR(4000) 
) 
End of generated control file for possible reuse. 
Table MYTBL: 
  72 Rows successfully loaded. 
  0 Rows not loaded due to data errors. 
  0 Rows not loaded because all WHEN clauses were failed. 
  0 Rows not loaded because all fields were null. 
Bind array size not used in direct path. 
Column array  rows :    5000 
Stream buffer bytes:  256000 
Read   buffer bytes: 1048576 
Total logical records skipped:          0 
Total logical records read:            72 
Total logical records rejected:         0 
Total logical records discarded:        0 
Total stream buffers loaded by SQL*Loader main thread:        1 
Total stream buffers loaded by SQL*Loader load thread:        0 
Run began on Wed Feb 23 11:32:14 2022 
Run ended on Wed Feb 23 11:32:15 2022 
Elapsed time was:     00:00:00.37 
CPU time was:         00:00:00.11
# Review results 
"SELECT COUNT(1) FROM MYTBL;" | sqlplus -s MYDB/MyPwd | Write-Host 
  COUNT(1)
---------- 
        72

Note the following:

  • The first script drops the user and its table (if exists) and creates the user new with permissions to create the table. Then it connects with this new user to create the table.
  • When loading data, if you don’t specify a control file (as in my case), you will use the Express Mode, which has fewer options but is more straightforward.
  • If you don’t specify DIRECT, it will try to load using an EXTERNAL TABLE which requires the user to have the CREATE ANY DIRECTORY permission; it is not recommended that normal users have this privilege. If you grant this privilege and load without MULTITHREADING nor PARALLEL, it takes around 6 seconds to load the data.
  • If you don’t specify MULTITHREADING, it will try loading using a single CPU. If you specify it in DIRECT mode and without PARALLEL, it takes around 4 seconds to load the data.
  • If you don’t specify PARALLEL, it will try loading in a single session. If you specify it alongside with DIRECT mode and MULTITHREADING, it takes 0.37 seconds to load the data as is seen in the output file MYTBL.log.

SQL Server BCP

The default tool for data loading in SQL Server is named Bulk Copy Program (BCP), which from the official documentation states bulk copies data between an instance of Microsoft SQL Server and a data file in a user-specified format. The bcp utility can be used to import large numbers of new rows into SQL Server tables or to export data out of tables into data files. Except when used with the queryout option, the utility requires no knowledge of Transact-SQL. To import data into a table, you must either use a format file created for that table or understand the structure of the table and the types of data that are valid for its columns.

If you run it with the /v flag, it shows the version:

PS C:\> bcp /v
BCP - Bulk Copy Program for Microsoft SQL Server. 
Copyright (C) Microsoft Corporation. All Rights Reserved. 
Version: 15.0.2000.5 

If you run it with a double dash, it shows the usage:

PS C:\> bcp ––
usage: C:\Program Files\Microsoft SQL Server\Client SDK\ODBC\170\Tools\Binn\bcp.exe {dbtable | query} {in | out | queryout | 
format} datafile 
  [-m maxerrors]            [-f formatfile]          [-e errfile] 
  [-F firstrow]             [-L lastrow]             [-b batchsize] 
  [-n native type]          [-c character type]      [-w wide character type] 
  [-N keep non-text native] [-V file format version] [-q quoted identifier] 
  [-C code page specifier]  [-t field terminator]    [-r row terminator] 
  [-i inputfile]            [-o outfile]             [-a packetsize] 
  [-S server name]          [-U username]            [-P password] 
  [-T trusted connection]   [-v version]             [-R regional enable] 
  [-k keep null values]     [-E keep identity values][-G Azure Active Directory Authentication] 
  [-h "load hints"]         [-x generate xml format file] 
  [-d database name]        [-K application intent]  [-l login timeout] 

In the script below I prepare my environment and load data, the output of each command is displayed below it if there is output:

# Create the database, drop if exists 
SQLCMD -Q "DROP DATABASE IF EXISTS MyDb; CREATE DATABASE MyDb;" 
# Create the table 
SQLCMD -Q "CREATE TABLE MyDb.dbo.MyTbl([Text] VARCHAR(MAX));" 
# Get file size 
Write-Host((Get-Item C:\setup\input.txt).length/1KB) 
8.5732421875
# Import data 
bcp MyDb.dbo.MyTbl IN C:\setup\input.txt -T -c
Starting copy... 
72 rows copied. 
Network packet size (bytes): 4096 
Clock Time (ms.) Total     : 47     Average : (1531.91 rows per sec.) 
# Review results 
SQLCMD -Q "SELECT COUNT(1) FROM MyDb.dbo.MyTbl;" 
----------- 
         72 
(1 rows affected) 

Note the following:

  • The first command drops the database if exists and creates a new one.
  • The second command creates the table.
  • The third command imports the data, the flag -T is used to establish a trusted connection and -c is used to import character data.
  • You can see it took 47 ms to import the data and the program calculated in theory it could insert 1531 rows per second.

Conclusion

As you can see, in a default instance, default database, and default load, SQL Server is 10 times faster than Oracle. But of course, there are optimizations both in Oracle and SQL Server especially when doing imports that can reduce the differences in time. It is up to you to experiment with the type of load you will perform, and adjust the databases, tables, and loader program to your scenario, to do a better comparison and decide which one is better for you. As you noticed there are different options when loading data that are not available in both SQL Server and Oracle.

Next Steps





get scripts

next tip button



About the author
MSSQLTips author Pablo Echeverria Pablo Echeverria worked for more than 10 years as a software programmer and analyst. In 2016, I switched jobs to a DBA position, where I have implemented new processes, created better monitoring tools and grown my data scientist skills.

View all my tips


Article Last Updated: 2022-03-23

Comments For This Article

















get free sql tips
agree to terms