By: Daniel Farina | Comments (2) | Related: More > Integration Services Development
Problem
Big companies usually don't have a homogeneous environment and as a SSIS developer you need to consolidate data from different platforms into a data warehouse. For the average SQL Server Professional many of those environments may be unknown. That is the case with Mainframes. In this tip I will cover the basic aspects that you need to know to interact with your company's mainframe team.
Solution
One of the most used ways for information interchange is text files. Mainframe encoded files may seem to be encrypted, in fact some time ago a coworker of mine was truly convinced that due to the fact that when he opened a file with a text editor like Notepad he saw rubbish characters that seemed to be meaningless. After he showed me those files I explained him that those files weren't encrypted and contained plain text indeed, but with a codepage other than ASCII.
Flat File Metadata
Metadata refers to information about data, in other words, it describes what we are going to import into our database and how we should handle the flat file. If you ever used the BCP utility to import flat files into SQL Server you know that there is a special file called the format file that holds information about columns disposition, data types and field terminators. Mainframes also use a kind of format file containing record metadata called a COBOL Copybook.
A COBOL Copybook contains what is called a Record Layout, which is a description about the fields of a data file. The following is an example of a simple COBOL Copybook.
**---------------------------------------------- **-- CUSTOMERS DATA **---------------------------------------------- 01 CUSTOMER-RECORD. 05 CUSTOMERNAME PIC X(50). 05 CUSTOMERADDRESS PIC X(50). 05 CUSTOMERCATEGORY PIC 9 COMP-3. 05 CUSTOMERBALANCE PIC 9(8)V99 COMP-3.
As you can see, after the field name is a PIC clause describing the field data type, the most common are X to represent any character and 9 to define the field as numeric.
There is a special type of COBOL field called FILLER used to reserve space in a COBOL record, commonly for future expansion or to fill a gap created by a redefined field. A file can have any number of FILLER fields. Something important to say about fillers is that it can contain any kind of data, even binary data. This is something you need to be aware of and not assume that filler fields are filled by spaces.
The EBCDIC Character Encoding
Mainframes don't use the ASCII character set; instead they use EBCDIC (Extended Binary Coded Decimal Interchange Code) encoding. It was created with the intention to make it easier for the input of data with punched cards. As a consequence, character codes have gaps. For example the character codes "hij" in EBCDIC are encoded as hex numbers 0x88, 0x89, 0x91 (136, 137, 145 in decimal).
Numeric Representation in Mainframes
As you know, all computers and electronic systems work with bits; since the word bit comes by joining the words binary digit, it's wise to think that mainframes uses the binary system for numeric representation. But think about it, a byte doesn't mean anything on its own unless we assign a meaning to it. For example the byte 0x66 can represent the number 109, if we convert the byte's value from hexadecimal to decimal, or the number 66 if we take into consideration the decimal numbers composing the byte.
Mainframes use the last method called Binary Coded Decimal, or BCD for numerical representation. As its name says it consists on encoding a decimal digit into binary format. This makes easier for a human being to read the numbers and to write them into a punched card, remember that mainframes at the beginning used punched cards and not a keyboard as an input.
The Zoned Decimal Format
Zoned numbers were the preferred method for display and data input. Its name comes from the punched card era where you need to input a number. An operator has to punch into the punched card the digit position and a column called the zone. The numbers 0 to 9 are coded as 0xF1 to 0xF9, the EBCDIC codes for numbers. Taking this into consideration we could say that the Zoned Decimal Format is just like a text representation of numbers, but that will be true if we are dealing with unsigned numbers.
Signed zoned numbers store the sign position into the high order nibble of the rightmost byte as a hexadecimal C for the positive numbers and a D for the negative ones. So, according to the EBCDIC character map, the signed position of the field will display as a character value of "A-I" for positive numbers and "J-R" for negative numbers. Also the number zero will be represented as a "{" and "}", the character codes of 0xC0 and 0xD0 respectively.
The problem with signed zoned numbers is that you can get different results on the output text. If the signed zoned numbers were converted to ASCII prior to file transmission then the sign position for positive numbers will be in the ASCII range of 0x30 to 0x39, that are the ASCII codes for numbers 0 to 9 and the sign position for negative numbers will be in the ASCII range of 0x70 to 0x79, representing the characters "p" to "y".
But if the zoned numbers are converted while transferred via FTP in ASCII mode, they are converted by its character representation resulting in what is a modified zoned number where the sign position for the positive numbers has the value 0x7B (ASCII character "{") for zero and 0x41 to 0x49 (ASCII characters "A" to "I") for numbers 1 to 9, and the sign position for the negative numbers are 0x7D (ASCII character "}") for zero and 0x4A to x052 (ASCII characters "J" to "R") for negative numbers 1 to 9. The next table will show us the possible values we will get for the sign digit.
Digit |
EBCDIC Hex |
EBCDIC Display |
Strict Zoned ASCII Hex |
Strict Zoned ASCII Display |
Modified Zoned ASCII Hex |
Modified Zoned ASCII Display |
---|---|---|---|---|---|---|
+0 |
0xC0 |
{ |
0x30 |
0 |
0x7B |
{ |
+1 |
0xC1 |
A |
0x31 |
1 |
0x41 |
A |
+2 |
0xC2 |
B |
0x32 |
2 |
0x42 |
B |
+3 |
0xC3 |
C |
0x33 |
3 |
0x43 |
C |
+4 |
0xC4 |
D |
0x34 |
4 |
0x44 |
D |
+5 |
0xC5 |
E |
0x35 |
5 |
0x45 |
E |
+6 |
0xC6 |
F |
0x36 |
6 |
0x46 |
F |
+7 |
0xC7 |
G |
0x37 |
7 |
0x47 |
G |
+8 |
0xC8 |
H |
0x38 |
8 |
0x48 |
H |
+9 |
0xC9 |
I |
0x39 |
9 |
0x49 |
I |
-0 |
0xD0 |
} |
0x70 |
p |
0x7D |
} |
-1 |
0xD1 |
J |
0x71 |
q |
0x4A |
J |
-2 |
0xD2 |
K |
0x72 |
r |
0x4B |
K |
-3 |
0xD3 |
L |
0x73 |
s |
0x4C |
L |
-4 |
0xD4 |
M |
0x74 |
t |
0x4D |
M |
-5 |
0xD5 |
N |
0x75 |
u |
0x4E |
N |
-6 |
0xD6 |
O |
0x76 |
v |
0x4F |
O |
-7 |
0xD7 |
P |
0x77 |
w |
0x50 |
P |
-8 |
0xD8 |
Q |
0x78 |
x |
0x51 |
Q |
-9 |
0xD9 |
R |
0x79 |
y |
0x52 |
R |
The next image shows how the number 1234 is encoded as an unsigned zoned, a signed positive zoned and a signed negative number.
The Packed Decimal Format
This type of number, usually called Computational 3 or COMP-3 in COBOL, are used on Mainframes to save storage space and are preferred when the number is subject to arithmetic operations, like an account balance. Packed numbers holds two decimal digits per byte in contrary to the Zoned number representation that holds one digit per byte. For example, the number 48 can be represented into a byte as the hex number 0x48. Also, like zoned numbers, packed numbers can have implied decimal digits.
But just like with the zoned numbers, the less significant nibble of the first byte is used to hold the sign by using a hexadecimal C for positive numbers, a D for the negatives and an F for the unsigned values. The next image shows a representation of the number 1234 as an unsigned and signed packed number and the number -1234, of course as a signed packed.
Something to note is that when a file contains packed numbers it must be treated as a binary file, even on the mainframe environment.
This allowed programmers to economize space by using half of the size for numerical representation. Considering that back in the day the cost for storage and memory was much higher than nowadays, it was a wise decision to implement packed numbers at the expenses of increasing code complexity.
File Transfer between Mainframe and Open Systems
Usually files are transferred amongst these two platforms via FTP (File Transfer Protocol). This protocol has different ways to send a file between computers:
FTP Mode |
Description |
---|---|
ASCII or Text |
Data is converted from the host's character representation to ASCII before transmission. |
Binary |
File is transferred byte for byte. |
EBCDIC |
Used for plain text between Mainframes. |
Local |
Allows two computers with identical setups to send data in a proprietary format without the need to convert it to ASCII. |
From the previous table we can see that there are some things to consider before transmitting a file. If the file being transferred contains packed fields then there is no other way to transfer it, but binary.
Jobs in Mainframe Systems
Mainframe jobs can be configured to store job output into a file just like SQL Server Agent. This file is usually called the sysout because of the instruction used in JCL language to specify the output file for job's messages.
Next Steps
- Read my previous tip about Importing Mainframe Data with SQL Server Integration Services.
- For more information about conversion between data types check out this tip: SQL Server Integration Services Data Type Conversion Testing.
- Browse Integration Services Data Flow Transformations Tips Category for more tips about Data Flow Transformations.
- If you want to enrich your SSIS packages and make them look more professional, take a look at Integration Services Best Practices Tips Category.
- Also check out SQL Server Business Intelligence Tips and Tricks.
About the author
This author pledges the content of this article is based on professional experience and not AI generated.
View all my tips