Introduction to Mainframe Data Storage for SSIS Developers

By:   |   Comments (2)   |   Related: More > Integration Services Development


Problem

Big companies usually don't have a homogeneous environment and as a SSIS developer you need to consolidate data from different platforms into a data warehouse. For the average SQL Server Professional many of those environments may be unknown. That is the case with Mainframes. In this tip I will cover the basic aspects that you need to know to interact with your company's mainframe team.

Solution

One of the most used ways for information interchange is text files. Mainframe encoded files may seem to be encrypted, in fact some time ago a coworker of mine was truly convinced that due to the fact that when he opened a file with a text editor like Notepad he saw rubbish characters that seemed to be meaningless. After he showed me those files I explained him that those files weren't encrypted and contained plain text indeed, but with a codepage other than ASCII.

Flat File Metadata

Metadata refers to information about data, in other words, it describes what we are going to import into our database and how we should handle the flat file. If you ever used the BCP utility to import flat files into SQL Server you know that there is a special file called the format file that holds information about columns disposition, data types and field terminators. Mainframes also use a kind of format file containing record metadata called a COBOL Copybook.

A COBOL Copybook contains what is called a Record Layout, which is a description about the fields of a data file. The following is an example of a simple COBOL Copybook.

      **----------------------------------------------
      **--  CUSTOMERS DATA
      **----------------------------------------------
       01  CUSTOMER-RECORD.
           05 CUSTOMERNAME                      PIC X(50).
           05 CUSTOMERADDRESS                   PIC X(50).
           05 CUSTOMERCATEGORY                  PIC 9 COMP-3.
           05 CUSTOMERBALANCE                   PIC 9(8)V99 COMP-3.

As you can see, after the field name is a PIC clause describing the field data type, the most common are X to represent any character and 9 to define the field as numeric.

There is a special type of COBOL field called FILLER used to reserve space in a COBOL record, commonly for future expansion or to fill a gap created by a redefined field. A file can have any number of FILLER fields. Something important to say about fillers is that it can contain any kind of data, even binary data. This is something you need to be aware of and not assume that filler fields are filled by spaces.

The EBCDIC Character Encoding

Mainframes don't use the ASCII character set; instead they use EBCDIC (Extended Binary Coded Decimal Interchange Code) encoding. It was created with the intention to make it easier for the input of data with punched cards. As a consequence, character codes have gaps. For example the character codes "hij" in EBCDIC are encoded as hex numbers 0x88, 0x89, 0x91 (136, 137, 145 in decimal).

Numeric Representation in Mainframes

As you know, all computers and electronic systems work with bits; since the word bit comes by joining the words binary digit, it's wise to think that mainframes uses the binary system for numeric representation. But think about it, a byte doesn't mean anything on its own unless we assign a meaning to it. For example the byte 0x66 can represent the number 109, if we convert the byte's value from hexadecimal to decimal, or the number 66 if we take into consideration the decimal numbers composing the byte.

Mainframes use the last method called Binary Coded Decimal, or BCD for numerical representation. As its name says it consists on encoding a decimal digit into binary format. This makes easier for a human being to read the numbers and to write them into a punched card, remember that mainframes at the beginning used punched cards and not a keyboard as an input.

The Zoned Decimal Format

Zoned numbers were the preferred method for display and data input. Its name comes from the punched card era where you need to input a number.  An operator has to punch into the punched card the digit position and a column called the zone. The numbers 0 to 9 are coded as 0xF1 to 0xF9, the EBCDIC codes for numbers. Taking this into consideration we could say that the Zoned Decimal Format is just like a text representation of numbers, but that will be true if we are dealing with unsigned numbers.

Signed zoned numbers store the sign position into the high order nibble of the rightmost byte as a hexadecimal C for the positive numbers and a D for the negative ones. So, according to the EBCDIC character map, the signed position of the field will display as a character value of "A-I" for positive numbers and "J-R" for negative numbers. Also the number zero will be represented as a "{" and "}", the character codes of 0xC0 and 0xD0 respectively.

The problem with signed zoned numbers is that you can get different results on the output text. If the signed zoned numbers were converted to ASCII prior to file transmission then the sign position for positive numbers will be in the ASCII range of 0x30 to 0x39, that are the ASCII codes for numbers 0 to 9 and the sign position for negative numbers will be in the ASCII range of 0x70 to 0x79, representing the characters "p" to "y".

But if the zoned numbers are converted while transferred via FTP in ASCII mode, they are converted by its character representation resulting in what is a modified zoned number where the sign position for the positive numbers has the value 0x7B (ASCII character "{") for zero and 0x41 to 0x49 (ASCII characters "A" to "I") for numbers 1 to 9, and the sign position for the negative numbers are 0x7D (ASCII character "}") for zero and 0x4A to x052 (ASCII characters "J" to "R") for negative numbers 1 to 9. The next table will show us the possible values we will get for the sign digit.

Digit

EBCDIC Hex

EBCDIC Display

Strict Zoned ASCII Hex

Strict Zoned ASCII Display

Modified Zoned ASCII Hex

Modified Zoned ASCII Display

+0

0xC0

{

0x30

0

0x7B

{

+1

0xC1

A

0x31

1

0x41

A

+2

0xC2

B

0x32

2

0x42

B

+3

0xC3

C

0x33

3

0x43

C

+4

0xC4

D

0x34

4

0x44

D

+5

0xC5

E

0x35

5

0x45

E

+6

0xC6

F

0x36

6

0x46

F

+7

0xC7

G

0x37

7

0x47

G

+8

0xC8

H

0x38

8

0x48

H

+9

0xC9

I

0x39

9

0x49

I

-0

0xD0

}

0x70

p

0x7D

}

-1

0xD1

J

0x71

q

0x4A

J

-2

0xD2

K

0x72

r

0x4B

K

-3

0xD3

L

0x73

s

0x4C

L

-4

0xD4

M

0x74

t

0x4D

M

-5

0xD5

N

0x75

u

0x4E

N

-6

0xD6

O

0x76

v

0x4F

O

-7

0xD7

P

0x77

w

0x50

P

-8

0xD8

Q

0x78

x

0x51

Q

-9

0xD9

R

0x79

y

0x52

R

The next image shows how the number 1234 is encoded as an unsigned zoned, a signed positive zoned and a signed negative number.

Zoned Number Representation on EBCDIC and ASCII Format.

The Packed Decimal Format

This type of number, usually called Computational 3 or COMP-3 in COBOL, are used on Mainframes to save storage space and are preferred when the number is subject to arithmetic operations, like an account balance. Packed numbers holds two decimal digits per byte in contrary to the Zoned number representation that holds one digit per byte. For example, the number 48 can be represented into a byte as the hex number 0x48. Also, like zoned numbers, packed numbers can have implied decimal digits.

But just like with the zoned numbers, the less significant nibble of the first byte is used to hold the sign by using a hexadecimal C for positive numbers, a D for the negatives and an F for the unsigned values. The next image shows a representation of the number 1234 as an unsigned and signed packed number and the number -1234, of course as a signed packed.

Packed Number Representation.

Something to note is that when a file contains packed numbers it must be treated as a binary file, even on the mainframe environment.

This allowed programmers to economize space by using half of the size for numerical representation. Considering that back in the day the cost for storage and memory was much higher than nowadays, it was a wise decision to implement packed numbers at the expenses of increasing code complexity.

File Transfer between Mainframe and Open Systems

Usually files are transferred amongst these two platforms via FTP (File Transfer Protocol). This protocol has different ways to send a file between computers:

FTP Mode

Description

ASCII or Text

Data is converted from the host's character representation to ASCII before transmission.

Binary

File is transferred byte for byte.

EBCDIC

Used for plain text between Mainframes.

Local

Allows two computers with identical setups to send data in a proprietary format without the need to convert it to ASCII.

From the previous table we can see that there are some things to consider before transmitting a file. If the file being transferred contains packed fields then there is no other way to transfer it, but binary.

Jobs in Mainframe Systems

Mainframe jobs can be configured to store job output into a file just like SQL Server Agent. This file is usually called the sysout because of the instruction used in JCL language to specify the output file for job's messages.

Next Steps


sql server categories

sql server webinars

subscribe to mssqltips

sql server tutorials

sql server white papers

next tip



About the author
MSSQLTips author Daniel Farina Daniel Farina was born in Buenos Aires, Argentina. Self-educated, since childhood he showed a passion for learning.

This author pledges the content of this article is based on professional experience and not AI generated.

View all my tips



Comments For This Article




Tuesday, May 5, 2020 - 6:10:27 PM - George Alvarez Back To Top (85579)

I wonder if you've got any thoughts about using BCP to export data into EBCDIC so that I can send data to the mainframe. Currently I have a view where I've done all my decimal packing, then I have a java program that queries the view and creates a file and then we ftp it.

I'd love to be able to ditch the java and just use BCP to do the file creation job.  I'd be interested in your thoughts and/or experience.

Thanks

George


Monday, January 26, 2015 - 3:09:26 PM - AZJim Back To Top (36060)

Daniel ... great post.  As a former mainframe DBA, you were spot on.  One thing that I might suggest for a SQL Server (or Unix) DBA interacting with a mainframe DBA is to request the mainframe DBA to "unload" (the mainframe term for an export) to a "dataset" (a mainframe term for a file) in ASCII format.  He or she might not know that it can be done. but every utility (both native and third party) have this as a option.  This could be a way for the two disciplines to meet halfway.  Mainframers with current knowledge should also be more aware of this since globalization requires non-Western data to quite often be stored in double-byte format.  They really should be aware of the format requirements of the recipient.















get free sql tips
agree to terms