Do you need to make SQL Server data available for a SAS Analytics package? Do you also need to accept some data back from a SAS package and also process the returned values? How can a SQL Server professional with limited or no SAS experience accomplish these goals?
SAS is a popular tool in such areas as healthcare and retail analytics. Therefore, if data you manage as a SQL Server professional has analytical applications in these or other areas, you may very well find yourself preparing data for SAS analytics professionals or even running SQL Server data through a SAS analytics package without support from SAS professionals. Since SQL Server and SAS are from different software vendors, newcomers to the task may find interoperability challenging.
This tip is the first in a two-part series on presenting a simple set of techniques to handle interoperability between SQL Server and SAS with minimum hassle. In this first installment, you get a hands-on feel for the topic starting with coverage of the reasons and steps for creating a staging table in SQL Server for transfer to SAS. This first installment takes you from a SQL Server staging table for SAS through to its transfer to a SAS data set. Tip support materials include a script for creating a table and populating it with values for export from SQL Server to SAS as well as a .csv file with data values that are ready for transfer to SAS. This installment in the series closes with some ways of enriching the solution and resources for learning more about SAS.
Some Background Thoughts about the Solution
Interoperability between SQL Server and SAS is complicated especially by incompatible data types. Natively, SAS supports just two data types:
Characters up to a 32,767 character maximum
Double precision floating point numbers
Because SQL Server data typing is much richer than SAS, translating SQL Server values for use with SAS can be challenging. Even the topic of representing and processing missing data values is substantially different between SQL Server and SAS. SQL Server represents missing data values with NULL values, but SAS has a minimum of two ways of representing missing values: a period (.) for numeric missing values and a blank space (' ') for character missing values. In some cases, these incompatibility issues may not matter, at other times, they will matter. In any event, when SQL Server professionals need to have their data interface with SAS packages, you should learn as much as possible about data typing and missing value representation and processing issues between SAS and SQL Server as your time permits.
The solution illustrated here takes advantage of SQL Server and SAS wizards to ready data and enable its transfer between SQL Server and SAS. Because the approach takes advantage of .csv files to transfer data between SQL Server and SAS, data typing issues are not explicitly addressed in any ways besides the way SQL Server and SAS wizards handle the creation and reading of .csv files.
Create and Populate a Staging Table for SAS
SAS analysis packs typically expect data from a single table. SQL Server professionals will frequently need to populate this table from multiple relational tables in a SQL Server database. You can think of this part of the solution as a two-step process. First, create a staging table with columns corresponding to the SAS pack variables. Second, populate the columns with values.
The following script illustrates the type of code that you might use to generate a SQL Server table on which to base a SAS data set. Name the columns according to the variable names for the SAS pack for which you are preparing data. Use data types that have default translations from SQL Server as close as possible to those required by the SAS pack.
The script includes a primary key. The use of a primary key is strictly optional because SAS does not recognize SQL Server primary keys when data is transferred via .csv files. Nevertheless, you may find a primary key in the staging table useful for ordering your data for visual inspection before exporting them to SAS. SAS packs routinely use a numeric indicator to track rows in a SAS data set. Therefore, you may want a staging table column with an exact numeric data type, such as int or bigint, for this role. If you choose a bigint data type, be aware that very large bigint values can exceed the upper limit of a SAS numeric data type. Your original source data may use a uniqueidentifier data type to identify rows. SAS can map SQL Server uniqueidentifier values to character variables. If the SAS package you are using needs a numeric value for identifying rows, SQL Server uniqueidentifier values will not be appropriate.
After you create a table for the data to be exported to SAS, you need to populate the table. One approach to this task is to use an INSERT statement that derives source data from a SELECT statement or a SQL Server object, such as a view. The SELECT statement can de-normalize data from multiple SQL Server tables to your staging table for SAS. The de-normalization allows the staging table to have one row per entity, such as a person or a visit, although that entities may be in a one-to-many relationship with other columns in the source SQL Server database.
Using Wizards to Migrate a SQL Server table to a SAS Data Set
When you are able to use default settings for the SQL Server Export wizard and the SAS Import File wizard, you can export a table from SQL Server as a flat file in a comma separated values format (.csv) such that SAS can read the file and construct a SAS data set that has the same data values as the SQL Server staging table. This process can even succeed when the source data includes both character and numeric values -- some of which may be missing.
Start the process by launching the SQL Server Export wizard and specify the database containing your staging table for SAS as the Data Source. Designate a Flat File Destination on the Choose a Destination screen and browse to the path where you want your .csv file. Select CSV files as the file type and designate a name for the file, such as ForTable1_from_SS. Also select the check box for column names in the first row. Then, specify that you want to copy data from one or more tables and views before selecting the name and schema for the staging table. After this selection, you can use the remaining default Export wizard settings.
The following two screen shots illustrate some sample data for a staging table before and after export from SQL Server. The script available as a download with this article for creating and populating a staging table includes the sample data. Missing values are represented in the Results pane by NULL values; this view is before processing by the Export wizard. Notepad displays the exported data in a .csv file named ForTable1_from_SS. NULL values in the staging table appear as two commas with no value between them. Uniqueidentifier values are delimited by braces.
After creating a file of data values in .csv format, copy the file to a computer on which you are running SAS. Then, start SAS and make a menu choice of File, Import Data. This opens the first of the three screens that you see below. By default, the SAS Import Wizard is ready to accept a file in .csv format. Click Next to advance to the second screen below that allows you to browse to your .csv file. For this demonstration, the ForTable1_from_SS.csv file that was created with the SQL Server Export Wizard is in the MSSQLTips folder of the C: drive. Click Next to advance to the third screen below that allows you to indicate the name for your SAS data set and where it will be stored. By default, SAS creates the new data set in a temporary library named WORK. To facilitate easy re-use of the data set after the current session ends, you can instead choose to save the data set in a permanent library. The third screen below designates MSSQLTIP as the permanent library to store a data set named Table1_from_SS. After specifying what to name your data set and the SAS library in which to store it, you can click Finish.
Verifying the Data after Transfer to SAS
In response to clicking the Finish button, SAS creates the data set. If you completed the steps successfully, you can inspect the data set design and values with the following lines of SAS codes.
PROC CONTENTS DATA=MSSQLTIP.Table1_from_SS;
PROC PRINT DATA=MSSQLTIP.Table1_from_SS; RUN;
The following excerpt from the output of the above script shows the design and contents of the SAS data set. The top part of the screen excerpt displays metadata for the data set from the PROC CONTENTS. For example, you can see that ID_val column, which contained int data type values, is a numeric SAS variable. In contrast, the GUID_val column is a character variable of length 38; this LEN specification includes 36 characters for the numbers, letters, and dashes in the GUID representations and two additional characters for the opening and closing brace delimiters. The bottom portion of the display reveals the data contents from the PROC PRINT. Most rows in the data set have either a character or numeric missing value. The Comment column, indicates which, if any, column is missing. Those column values with a missing number appear with a period (.) as the column value. Columns with a missing character value just appear blank.
The second installment in this series will demonstrate the operation of a simple SAS pack with the transferred data from this tip. You'll also learn how to use the results computed in the SAS pack back in SQL Server. In both this installment and the next one, the goal is to rely heavily on wizards. This makes the approach easily adaptable by SQL Server professionals with limited SAS experience.
If your interoperability requirements between SQL Server and SAS require features that the wizards do not support, there are multiple options that can be pursued. For example, you can use a SSIS package to ready the SQL Server data for import by SAS as well as to transform returned data from SAS to appropriate SQL Server data type values. In addition, you can save the script created by the SAS File, Import wizard and then edit the saved script to perform additional transformations that a SAS pack may require. When you run the edited script, you will be invoking a customized SAS data step. If you are just getting started with transferring data from SQL Server to SAS, you can benefit by learning as much as possible about how SAS stores and uses data. I recommend you focus on ways of reading data with a custom SAS data step. Also, learn about the different ways that SAS handles missing values and dates from the ways SQL Server represents and processes these kinds of data. Here are some URLs that you can use to start learning more on your own while you waiting for the second installment in this series as well as other tips on SQL Server/SAS interoperability.
I used to do a lot of exporting and importing and then decided to assign the SAL database as a libray. I setup a few macro variables, assign the library and then use a datastep to import the data. You can bring in a fairly large chunk of data quickly and then use SAS to manipulate the data as needed:
*get yourtable data; data newsasdata; set mydblib.yourtable (where = (intMethodID = &method and intMatrixID = &matrix and txtStudyTracker = "&study" and intRunNumber ge &run and intRunNumber le &endrun and bitRejected = 0)); run;
Thursday, August 30, 2012 - 7:20:11 AM - Charles Chambers
The libname feature requires SAS Access package license, which may be an issue for smaller shops or low budget projects. For low-budget projects or maybe even for cases where the need to move data back and forth is highly infrequent (irrespective of the budget available), database administrators can also use .csv files for exchanging data between SQL Server and SAS. Using .csv files is free.
SAS changed their licensing plans to include the number of server CPU cores as part of the equation and, you will pay a premium to buy SAS the first year and every year after that to keep it running.
Copy the generated code from the SAS log, paste it into a full featured editor, and block quote / cut the SAS log numbers.
Now that SAS has generated the code for you, you can dynamically import the data from SQL Server into SAS using a filename pipe. (You can also remove or comment out the last line in the PowerShell script).
filename sqlsvr pipe "powershell -file C:\PowerShell\scripts\Query-SQLServer.ps1 -noprofile -windowstyle hidden" lrecl=32767;
* PRODUCT: SAS
* VERSION: 9.3
* CREATOR: External File Interface
* DATE: 17APR13
* DESC: Generated SAS Datastep Code
* TEMPLATE SOURCE: (None Specified.)
data WORK.TEST ;
%let _EFIERR_ = 0; /* set the ERROR detection macro variable */
infile SQLSVR delimiter = ',' MISSOVER DSD lrecl=32767 firstobs=2 ;
informat object_id best32. ;
informat name $46. ;
informat column_id best32. ;
informat system_type_id best32. ;
informat user_type_id best32. ;
informat max_length best32. ;
informat precision best32. ;
informat scale best32. ;
informat collation_name $30. ;
informat is_nullable $5. ;
informat is_ansi_padded $5. ;
informat is_rowguidcol $5. ;
informat is_identity $5. ;
informat is_computed $5. ;
informat is_filestream $5. ;
informat is_replicated $5. ;
informat is_non_sql_subscribed $5. ;
informat is_merge_published $5. ;
informat is_dts_replicated $5. ;
informat is_xml_document $5. ;
informat xml_collection_id best32. ;
informat default_object_id best32. ;
informat rule_object_id best32. ;
informat is_sparse $5. ;
informat is_column_set $5. ;
format object_id best12. ;
format name $46. ;
format column_id best12. ;
format system_type_id best12. ;
format user_type_id best12. ;
format max_length best12. ;
format precision best12. ;
format scale best12. ;
format collation_name $30. ;
format is_nullable $5. ;
format is_ansi_padded $5. ;
format is_rowguidcol $5. ;
format is_identity $5. ;
format is_computed $5. ;
format is_filestream $5. ;
format is_replicated $5. ;
format is_non_sql_subscribed $5. ;
format is_merge_published $5. ;
format is_dts_replicated $5. ;
format is_xml_document $5. ;
format xml_collection_id best12. ;
format default_object_id best12. ;
format rule_object_id best12. ;
format is_sparse $5. ;
format is_column_set $5. ;
if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */
Finally, if your SQL Server data has some embedded CRLF's, you'll need a select statement something like:
,replace(replace(finalsts_text,char(10),''),char(13),'') as [finalsts_text]
otherwise the embedded CRLF's will throw off SAS's reading of the (virtual) CSV file.