Migrating a legacy database (such as IMS, VSAM files, or IDS/II) to SQL is broken down into two phases:
- The first phase is the creation of the new database (Oracle, DB2…), which offers the advantages of relational systems, and the migration of the data itself.
- The second phase is the transformation of the application programs so that they use the new database, without changing their functionality.
During a migration project, the new database must be designed so that all the legacy data can be migrated to it. At Rever, we usually start from the model of the legacy database and transform it to obtain the new one. Thanks to our own DB-Main transformation toolbox, we make sure that the new database can store the legacy data.
In this post, we will explain how a relational database (DB2, Oracle…) can be designed from a legacy one. The legacy database can be IMS, VSAM files, or IDS/II, and the programming language used to access the data is COBOL.
There are three kinds of structures that need special attention:
- Type translation
- COBOL arrays (“occurs”) transformation
- Redefines-clause transformation
The difficulty of the design lies in finding the right balance between a well-designed database (according to relational standards) and the efficiency of the legacy application that needs to access it.
Type translation
COBOL has only two basic data types (alphanumeric – PIC X – and numeric – PIC 9), but relational DBMSs (DataBase Management Systems) have many more (char, varchar, numeric, date, time, blob, boolean…). The easiest way to translate the data types is to map “PIC X” to alphanumeric and “PIC 9” to numeric. This solution has several drawbacks:
- “PIC X” may contain binary data: some DBMSs do not allow binary data to be stored in a “char” column because they only accept printable characters (according to the database character set). Another issue with binary data is that it is converted differently from character data: if the legacy and the new database do not use the same character set (EBCDIC to ASCII or UTF-8, for example), the data must be converted during the migration.
- It does not take advantage of the expressiveness of the new database: for example, if a column is a date, the relational DBMS can verify that the data is a valid date and offers functions to manipulate and sort dates.
- It has performance issues: for example, if a “PIC X(1000)” is translated as a “char(1000)”, then 1000 characters are reserved in each row, even though such big columns are often only partially used. If it is translated as a “varchar(1000)”, only the space actually needed is used.
For all these reasons, it’s important to correctly select the data type of the columns of the new database.
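As a rough illustration of this selection, the mapping can be sketched as a small function. This is not the DB-Main toolbox; it is a minimal sketch with an invented helper name (`pic_to_sql`), and the `is_date` / `is_binary` flags stand in for knowledge that, in practice, must be recovered by analysing the application programs, since the PIC clause alone does not reveal it:

```python
import re

def pic_to_sql(pic, is_date=False, is_binary=False):
    """Map a COBOL PIC clause to a candidate SQL type (illustrative only)."""
    pic = pic.upper().replace(" ", "")
    if is_date:
        return "DATE"                      # let the DBMS validate dates
    m = re.fullmatch(r"X\((\d+)\)", pic)   # alphanumeric: PIC X(n)
    if m:
        n = int(m.group(1))
        if is_binary:
            return "BLOB"                  # binary data: avoid charset issues
        # big, often partially used fields: prefer varchar over char
        return f"VARCHAR({n})" if n > 10 else f"CHAR({n})"
    m = re.fullmatch(r"9\((\d+)\)(V9\((\d+)\))?", pic)  # numeric: PIC 9(n)[V9(m)]
    if m:
        digits = int(m.group(1))
        decimals = int(m.group(3)) if m.group(3) else 0
        return f"NUMERIC({digits + decimals},{decimals})"
    return "VARCHAR(255)"                  # fallback for unhandled pictures

print(pic_to_sql("X(1000)"))    # VARCHAR(1000)
print(pic_to_sql("9(7)V9(2)"))  # NUMERIC(9,2)
```

A real translator handles many more picture clauses (signed fields, COMP usages, edited pictures), but the principle is the same: the target type is chosen per column, not per COBOL type.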
COBOL array (“occurs”) transformation
In a legacy database and in COBOL, records can contain arrays (“occurs”) of data. Relational databases do not accept such structures, so they must be transformed. There are three ways to translate them, each with pros and cons:
- As a (sub-)table: the array is transformed into a table that is connected to the main table by a foreign key.
- As a list of columns: the array is transformed into a list of columns (one column for each element of the array).
- As a big (concatenated) column: the array is transformed into a single big column that contains all the array elements concatenated.
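The first option is the cleanest from a relational point of view. As a sketch, assume a hypothetical legacy record `CUSTOMER` with a field `PHONE PIC X(12) OCCURS 3` (the names are invented for the example); the “occurs” becomes a sub-table linked by a foreign key, with an extra column keeping the position of each element in the original array:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# The OCCURS 3 array becomes a sub-table; SEQ preserves the element position.
cur.executescript("""
CREATE TABLE CUSTOMER (
    CUS_ID   INTEGER PRIMARY KEY,
    CUS_NAME VARCHAR(30)
);
CREATE TABLE CUSTOMER_PHONE (
    CUS_ID INTEGER REFERENCES CUSTOMER(CUS_ID),
    SEQ    INTEGER CHECK (SEQ BETWEEN 1 AND 3),  -- OCCURS 3
    PHONE  VARCHAR(12),
    PRIMARY KEY (CUS_ID, SEQ)
);
""")

cur.execute("INSERT INTO CUSTOMER VALUES (1, 'DUPONT')")
cur.executemany("INSERT INTO CUSTOMER_PHONE VALUES (?, ?, ?)",
                [(1, 1, '02-123-4567'), (1, 2, '04-765-4321')])
rows = cur.execute("""
    SELECT P.SEQ, P.PHONE FROM CUSTOMER C
    JOIN CUSTOMER_PHONE P ON P.CUS_ID = C.CUS_ID
    ORDER BY P.SEQ
""").fetchall()
print(rows)  # [(1, '02-123-4567'), (2, '04-765-4321')]
```

The price of this design is that the legacy program’s single record read becomes a join, which is why the two other options (list of columns, concatenated column) are sometimes preferred for performance or when the array is always fully populated.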
Redefines-clause transformation
In COBOL, it is possible to define “redefines” (different data description entries that describe the same computer storage area). For example, in the following code, ORDER and DETAIL share the same storage area. The definition to be used depends on the value of another variable: in this example, the ORDER definition is used if DET-NUM is equal to zero, otherwise the DETAIL definition is used.
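A minimal COBOL layout along these lines (the names ORDER, DETAIL and DET-NUM come from the text above; the subordinate fields are invented for the example) could be:

```cobol
01  ORD-RECORD.
    05  DET-NUM        PIC 9(4).
    05  ORDER.
        10  ORD-DATE   PIC X(8).
        10  ORD-CUST   PIC X(10).
    05  DETAIL REDEFINES ORDER.
        10  DET-PROD   PIC X(10).
        10  DET-QTY    PIC 9(8).
```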
Relational databases don’t offer such a functionality, so each redefinition must be mapped to a different structure.
There are two ways to transform redefines:
- Each redefines is transformed into a table: in our example, ORDER and DETAIL become separate tables.
- Each field of the redefines is transformed into a (nullable) column: every field of the different redefinitions becomes a (nullable) column, and checks/triggers must be added to verify that only the columns of one redefinition contain values.
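The second option can be sketched as follows, reusing the ORDER/DETAIL example (the column names and the discriminator rule are assumptions for illustration, with DET_NUM playing the role of DET-NUM). A CHECK constraint rejects rows that mix fields from the two redefinitions:

```python
import sqlite3

con = sqlite3.connect(":memory:")
cur = con.cursor()

# All fields of both redefinitions become nullable columns of one table;
# the CHECK enforces that only one redefinition is filled per row,
# depending on the discriminator DET_NUM.
cur.executescript("""
CREATE TABLE ORD_RECORD (
    ORD_ID   INTEGER PRIMARY KEY,
    DET_NUM  NUMERIC(4) NOT NULL,
    -- fields of the ORDER redefinition (used when DET_NUM = 0)
    ORD_DATE CHAR(8),
    ORD_CUST VARCHAR(10),
    -- fields of the DETAIL redefinition (used when DET_NUM <> 0)
    DET_PROD VARCHAR(10),
    DET_QTY  NUMERIC(8),
    CHECK (
        (DET_NUM = 0  AND DET_PROD IS NULL AND DET_QTY  IS NULL)
     OR (DET_NUM <> 0 AND ORD_DATE IS NULL AND ORD_CUST IS NULL)
    )
);
""")

cur.execute("INSERT INTO ORD_RECORD VALUES (1, 0, '20240101', 'DUPONT', NULL, NULL)")
try:
    # violates the CHECK: an ORDER row must not carry DETAIL fields
    cur.execute("INSERT INTO ORD_RECORD VALUES (2, 0, NULL, NULL, 'WIDGET', 5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The separate-tables option avoids all these nullable columns but splits what was a single legacy record over two tables, so the choice again depends on how the application accesses the data.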
As you can see, the design of the new database is a very important phase because it determines its performance and readability.
To design it correctly, a very good understanding of the legacy database is needed. Gaining this understanding can be time-consuming and is achieved through reverse engineering.