Export from Hadoop File System to a SQL Server Database Table


By: Dallas Snider   |   Related Tips: More > Big Data

Problem

I need to export data from the Hadoop Distributed File System (HDFS) to a SQL Server database table. How can I do this?

Solution

Apache's Sqoop allows for importing data from a database such as SQL Server to the HDFS, and for exporting data from the HDFS to a database table. In this tip we will use Apache Sqoop's export functionality to move data stored on the HDFS to a SQL Server database table. As with previous tips in this series, this tip is written using SQL Server 2014 and a Cloudera virtual machine running on a laptop.
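For reference, the import direction (database table to HDFS) uses a very similar syntax. The sketch below is hedged: it reuses the placeholder IP address, port, and login that appear later in this tip, and assumes sqoop is on the PATH.

```shell
# Hedged sketch of the opposite direction (import), using the same
# placeholder connection details as the export command in this tip.
sqoop import \
  --connect 'jdbc:sqlserver://aaa.bbb.ccc.ddd:pppp;databasename=AdventureWorks2014' \
  --username 'sqoopLogin' -P \
  --table 'tblProductData' \
  --target-dir 'mssqlTips' \
  -- --schema dbo
```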

The contents of the file on the HDFS are shown below. There are three columns in this comma-separated value file. The first column is a unique integer identifier. The second column is a string used for a description. The third column is the unit cost. The destination table will have columns matching these three fields and their data types.

[hdfs@localhost:/mssqltips]$ hdfs dfs -cat mssqlTips/linuxDataFile.csv
1,Product A,1.01
2,Product B,2.02
3,Product C,3.03
4,Product D,4.04
5,Product E,5.05
6,Product F,6.06
7,Product G,7.07
8,Product H,8.08
9,Product I,9.09
10,Product J,10.10
11,Product K,11.11
12,Product L,12.12
13,Product M,13.13
14,Product N,14.14
15,Product O,15.15
16,Product P,16.16
17,Product Q,17.17
18,Product R,18.18
19,Product S,19.19
20,Product T,20.20
21,Product U,21.21
22,Product V,22.22
23,Product W,23.23
24,Product X,24.24
25,Product Y,25.25
26,Product Z,26.26
[hdfs@localhost:/mssqltips]$ 

The T-SQL below was used to create the destination table for this tip.

create table dbo.tblProductData
(
  ProductKey int not null PRIMARY KEY,    
  ProductName varchar(50),
  UnitCost money
)

The image below shows the destination table in the AdventureWorks2014 database as displayed in the Object Explorer.

Target table in the AdventureWorks2014 SQL Server database

The Sqoop command we will examine is listed below, and we will break down each argument in the following bullet points. Please note that the command must be entered on one complete line, or with a backslash (the Linux command-line continuation character) at the end of every line except the last.

sqoop export --connect 'jdbc:sqlserver://aaa.bbb.ccc.ddd:pppp;databasename=AdventureWorks2014' \
             --username 'sqoopLogin' -P \
             --export-dir 'mssqlTips' \
             --table 'tblProductData' \
             -- \
             --schema dbo
  • sqoop export - The executable is named sqoop and we are instructing it to export the data from an HDFS file to a database.
  • --connect - With the --connect argument, we are passing in the jdbc connect string for SQL Server. In this case, we use the IP address, port number, and database name.
  • --username - In this example, the user name is a SQL Server login, not a Windows login. Our database is set up to authenticate in mixed mode. We have a server login named sqoopLogin, and we have a database user named sqoopUser, which is a member of the db_datawriter role and has a default schema of dbo.
  • -P - This will prompt the command line user for the password. If Sqoop is rarely executed, this might be a good option. There are multiple other ways to automatically pass the password to the command, but we are trying to keep it simple for this tip.
  • --export-dir - The export directory tells sqoop in which directory the file is stored on the HDFS. This argument is required by Sqoop when using the export option.
  • --table - This argument provides sqoop with the destination table on the SQL Server database. This argument is also required by Sqoop when using the export option.
  • -- - The bare double dash is not a typographical error. It tells Sqoop that the arguments that follow are database specific.
  • --schema - This argument specifies the schema in which our table resides.
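The output below is produced by running ./sqoopExportCommand.sh, so the command above was saved in a small shell script. The tip does not show the script's contents; a minimal sketch, assuming sqoop is on the PATH and the placeholder IP address, port, and credentials are replaced with real values, might look like this:

```shell
#!/bin/bash
# sqoopExportCommand.sh - wraps the export command so it can be re-run easily.
# The IP address (aaa.bbb.ccc.ddd), port (pppp), login, HDFS directory, and
# table name below are the placeholders used in this tip.
sqoop export \
  --connect 'jdbc:sqlserver://aaa.bbb.ccc.ddd:pppp;databasename=AdventureWorks2014' \
  --username 'sqoopLogin' -P \
  --export-dir 'mssqlTips' \
  --table 'tblProductData' \
  -- \
  --schema dbo
```

Keeping the command in a script also makes it easy to version the argument list as the job evolves.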

The output from the sqoop command is shown below.

[hdfs@localhost:/mssqltips]$ ./sqoopExportCommand.sh
16/09/19 12:03:22 INFO sqoop.Sqoop: Running Sqoop version: 1.4.5-cdh5.2.0
Enter password: 
16/09/19 12:03:25 INFO manager.SqlManager: Using default fetchSize of 1000
16/09/19 12:03:25 INFO manager.SQLServerManager: We will use schema dbo
16/09/19 12:03:25 INFO tool.CodeGenTool: Beginning code generation
16/09/19 12:03:26 INFO manager.SqlManager: Executing SQL statement: SELECT t.* FROM [dbo].[tblProductData] AS t WHERE 1=0
16/09/19 12:03:26 INFO orm.CompilationManager: HADOOP_MAPRED_HOME is /usr/lib/hadoop-0.20-mapreduce
Note: /tmp/sqoop-training/compile/3eaf9ad50f5ebf5ef893fba0e41859b1/tblProductData.java uses or overrides a deprecated API.
Note: Recompile with -Xlint:deprecation for details.
16/09/19 12:03:29 INFO orm.CompilationManager: Writing jar file: /tmp/sqoop-training/compile/3eaf9ad50f5ebf5ef893fba0e41859b1/tblProductData.jar
16/09/19 12:03:29 INFO mapreduce.ExportJobBase: Beginning export of tblProductData
16/09/19 12:03:30 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
16/09/19 12:03:32 INFO input.FileInputFormat: Total input paths to process : 1
16/09/19 12:03:32 INFO input.FileInputFormat: Total input paths to process : 1
16/09/19 12:03:33 INFO mapred.JobClient: Running job: job_201609071028_0005
16/09/19 12:03:34 INFO mapred.JobClient:  map 0% reduce 0%
16/09/19 12:03:48 INFO mapred.JobClient:  map 25% reduce 0%
16/09/19 12:03:49 INFO mapred.JobClient:  map 50% reduce 0%
16/09/19 12:04:00 INFO mapred.JobClient:  map 100% reduce 0%
16/09/19 12:04:02 INFO mapred.JobClient: Job complete: job_201609071028_0005
16/09/19 12:04:02 INFO mapred.JobClient: Counters: 24
16/09/19 12:04:02 INFO mapred.JobClient:   File System Counters
16/09/19 12:04:02 INFO mapred.JobClient:     FILE: Number of bytes read=0
16/09/19 12:04:02 INFO mapred.JobClient:     FILE: Number of bytes written=1185776
16/09/19 12:04:02 INFO mapred.JobClient:     FILE: Number of read operations=0
16/09/19 12:04:02 INFO mapred.JobClient:     FILE: Number of large read operations=0
16/09/19 12:04:02 INFO mapred.JobClient:     FILE: Number of write operations=0
16/09/19 12:04:02 INFO mapred.JobClient:     HDFS: Number of bytes read=1778
16/09/19 12:04:02 INFO mapred.JobClient:     HDFS: Number of bytes written=0
16/09/19 12:04:02 INFO mapred.JobClient:     HDFS: Number of read operations=16
16/09/19 12:04:02 INFO mapred.JobClient:     HDFS: Number of large read operations=0
16/09/19 12:04:02 INFO mapred.JobClient:     HDFS: Number of write operations=0
16/09/19 12:04:02 INFO mapred.JobClient:   Job Counters 
16/09/19 12:04:02 INFO mapred.JobClient:     Launched map tasks=4
16/09/19 12:04:02 INFO mapred.JobClient:     Data-local map tasks=4
16/09/19 12:04:02 INFO mapred.JobClient:     Total time spent by all maps in occupied slots (ms)=47822
16/09/19 12:04:02 INFO mapred.JobClient:     Total time spent by all reduces in occupied slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient:     Total time spent by all maps waiting after reserving slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient:     Total time spent by all reduces waiting after reserving slots (ms)=0
16/09/19 12:04:02 INFO mapred.JobClient:   Map-Reduce Framework
16/09/19 12:04:02 INFO mapred.JobClient:     Map input records=26
16/09/19 12:04:02 INFO mapred.JobClient:     Map output records=26
16/09/19 12:04:02 INFO mapred.JobClient:     Input split bytes=576
16/09/19 12:04:02 INFO mapred.JobClient:     Spilled Records=0
16/09/19 12:04:02 INFO mapred.JobClient:     CPU time spent (ms)=4580
16/09/19 12:04:02 INFO mapred.JobClient:     Physical memory (bytes) snapshot=452489216
16/09/19 12:04:02 INFO mapred.JobClient:     Virtual memory (bytes) snapshot=2948849664
16/09/19 12:04:02 INFO mapred.JobClient:     Total committed heap usage (bytes)=127401984
16/09/19 12:04:02 INFO mapreduce.ExportJobBase: Transferred 1.7363 KB in 31.5282 seconds (56.394 bytes/sec)
16/09/19 12:04:02 INFO mapreduce.ExportJobBase: Exported 26 records.
[hdfs@localhost:/mssqltips]$ 

Notice the last line of output above shows that 26 records were exported. This corresponds to the 26 records in the file on the HDFS. When we query the count of rows in our database table, we see that 26 rows are present as displayed in the image below. Also in the image below, we see the first 17 records returned by the query.
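The row-count check and the sample-data query can be run with T-SQL along these lines (a sketch; the table name comes from this tip, and TOP 17 simply matches the 17 rows visible in the image):

```sql
-- Verify that all 26 rows arrived from the HDFS file.
SELECT COUNT(*) AS RowCnt
FROM dbo.tblProductData;

-- View the first 17 rows, as shown in the image.
SELECT TOP 17 ProductKey, ProductName, UnitCost
FROM dbo.tblProductData
ORDER BY ProductKey;
```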

SQL Server table row count and sample data
Next Steps

Please experiment with different data types. Also, please check out these other tips and tutorials on big data and Sqoop on MSSQLTips.com.









About the author
Dr. Dallas Snider is an Assistant Professor in the Computer Science Department at the University of West Florida and has 18+ years of SQL experience.


Monday, December 05, 2016 - 6:00:41 PM - bass_player

PolyBase in SQL Server 2016 can be used to pull data from HDFS/Hadoop into SQL Server. This can be accomplished with a simple SELECT * INTO (SQL Server table) FROM (external table).
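Assuming an external table (here hypothetically named dbo.extProductData) has already been defined over the HDFS file with PolyBase, the commenter's approach might be sketched as follows. The external data source and file format setup is not shown, and the table names are illustrative only:

```sql
-- Assumes dbo.extProductData is a PolyBase external table defined over
-- the HDFS file; SELECT ... INTO creates and loads the new local table.
SELECT *
INTO dbo.tblProductDataFromPolyBase
FROM dbo.extProductData;
```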


Thursday, October 13, 2016 - 11:05:54 AM - Marc Jellinek

What other ways are there to pull data from HDFS into a SQL Server table?

Could Polybase play a role (now that it ships with all versions of SQL Server 2016)?

How could this be done using SSIS?  Or Azure Data Factory?

