Third article in a four-part series.
Data migration consists of reading the data from the legacy database and writing it to the new database, while guaranteeing that all the data has been copied. This phase breaks down into three steps:
- data extraction from the legacy database, producing one file per IMS segment
- data transformation, i.e. reformatting the extracted data for the new database
- data loading into the new database
1. Data extraction
Each database is read sequentially to produce one file per segment, containing the raw data without any transformation. The reverse-engineering phase allowed us to model the IMS database. Rever has developed a DB-MAIN plugin that generates, from this model, the components needed for data extraction: one reading program (COBOL) per DBD, plus the compilation and execution JCLs. The program reads the full database sequentially and produces one file per segment.
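To make the extraction step concrete, here is a minimal Python sketch of the same logic the generated COBOL program performs: read the database sequentially and split the raw records into one file per segment, with no transformation. The record layout (an 8-byte segment name followed by fixed-length data) and the segment names are assumptions for illustration, not the actual IMS format.

```python
# Hypothetical sketch of per-segment extraction. The real tool generates
# one COBOL program per DBD; the logic here is only an illustration.
# Assumed layout: each record = 8-byte segment name + fixed-length data.

SEGMENT_RECORD_LENGTHS = {   # assumed fixed lengths per segment
    "CUSTOMER": 120,
    "ORDER":    80,
}

def extract_segments(db_dump_path):
    """Read the database sequentially and write one raw file per
    segment, without any transformation."""
    outputs = {}
    with open(db_dump_path, "rb") as db:
        while True:
            header = db.read(8)
            if not header:
                break
            name = header.decode("ascii").strip()
            data = db.read(SEGMENT_RECORD_LENGTHS[name])
            if name not in outputs:
                outputs[name] = open(f"{name}.dat", "wb")
            outputs[name].write(data)
    for f in outputs.values():
        f.close()
```

The key property to preserve is a single sequential pass over the database: each record is read once and routed to the file for its segment.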
2. Data transformation
Thanks to the model of the IMS database, the model of the new database, and the mapping between the two, it is possible to write programs that read the files produced by the extraction and transform them into files the target database's loader can read. Here again, Rever has developed a DB-MAIN plugin that generates the transformation programs (one per segment) from both models and the mapping between them. Each program reads the data from a segment, transforms it to comply with the relational model, and produces one file per table, in the format expected by the target DBMS loader.
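A minimal sketch of what one such generated program does, in Python for readability: read the fixed-width records of a segment file and emit a delimited file per target table. The segment layout (a 6-byte id and a 30-byte name) and the column names are invented for the example; the real layouts come from the IMS model and the mapping.

```python
import csv

# Hypothetical sketch of one generated transformation program.
# Assumed CUSTOMER segment layout (illustrative): 6-byte id + 30-byte name.

def transform_customer(segment_file, table_file):
    """Turn raw CUSTOMER segment records into a delimited file that
    the target DBMS loader can read (one output file per table)."""
    RECORD_LEN = 36
    with open(segment_file, "rb") as src, \
         open(table_file, "w", newline="") as dst:
        writer = csv.writer(dst)
        writer.writerow(["cust_id", "cust_name"])  # header row (assumed)
        while rec := src.read(RECORD_LEN):
            cust_id = rec[0:6].decode("ascii").strip()
            name = rec[6:36].decode("ascii").strip()
            writer.writerow([cust_id, name])
```

In practice this is also where codepage conversion (EBCDIC to ASCII), packed-decimal decoding, and denormalization of hierarchical links into foreign-key columns would happen.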
3. Data loading
In this phase, we use the data loader provided with the target DBMS. Rever has developed a DB-MAIN plugin that generates the data-loading scripts from the target model.
Validating the data after migration is a step that is often overlooked, yet very important. Without this validation, if an error is detected while testing the application against the new database, you will have to determine whether the error is due to the migration of the database or comes from the application itself. When the data migration has been validated, there is instead a good chance that the error does come from the application.
Three types of validation are available:
- Counters: verify that the number of records of each type is the same before and after migration.
- Checksum: a checksum can be computed over all the data in a column. You simply compare the checksums computed for the same column before and after migration; if they match, there is a good chance the column was migrated correctly.
- XML comparison: the two previous methods verify that all the data in a table or column has been correctly migrated, but they do not check that the links between records have been migrated as well (for example, that a customer's orders are still linked to that customer). This validation extracts the data (in whole or in part) from each database into XML documents with the same structure. Comparing the two documents verifies that both databases contain the same thing.
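The first two techniques can be sketched in a few lines of Python. This is an illustration against in-memory rows; in practice the counts and checksums would be computed by queries against each DBMS. The order-independent XOR-of-hashes checksum is one possible choice of checksum, not necessarily the one the tooling uses.

```python
import hashlib

# Hypothetical sketch of the counter and checksum validations.

def record_count(rows):
    """Counters: the number of records must match before and after."""
    return len(rows)

def column_checksum(rows, column):
    """Checksum: an order-independent digest of one column's values,
    comparable between the legacy and the new database."""
    digest = 0
    for row in rows:
        h = hashlib.sha256(str(row[column]).encode("utf-8")).hexdigest()
        digest ^= int(h, 16)  # XOR makes the result order-independent
    return digest
```

Because the checksum is order-independent, it can be compared even when the two databases return rows in different orders.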
The first validation is the easiest to implement, but also the least precise.
The third, on the other hand, is harder to implement and expensive to execute, but offers the strongest assurance that the migration went well.
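To show why the XML comparison catches what counters and checksums miss, here is a sketch: each database is exported to an XML document of the same structure (customers with their nested orders), and the documents are compared. The element names and the input dictionary shape are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sketch of the XML-comparison validation.
# customers: {cust_id: {"name": ..., "orders": [order_id, ...]}}

def export_customers(customers):
    """Export one database's content as a canonical XML document:
    ids are sorted so equal content yields byte-identical XML."""
    root = ET.Element("customers")
    for cust_id in sorted(customers):
        cust = ET.SubElement(root, "customer", id=cust_id)
        ET.SubElement(cust, "name").text = customers[cust_id]["name"]
        for order_id in sorted(customers[cust_id]["orders"]):
            ET.SubElement(cust, "order", id=order_id)
    return ET.tostring(root)

def same_content(db_a, db_b):
    """True when both databases hold the same customers, the same
    orders, AND the same customer-to-order links."""
    return export_customers(db_a) == export_customers(db_b)
```

Note that if an order were attached to the wrong customer, the record counts and flat column checksums would still match, but the two XML documents would differ.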
Rever has developed plugins that automate the implementation of all three validation techniques.
In the fourth part of this series of articles, we will see how the programs can be adapted so that they can access the new database.
- IMS = Information Management System
- COBOL = Common Business-Oriented Language
- DBD = Database Description
- DBMS = Database Management System
- JCL = Job Control Language
- XML = Extensible Markup Language
- Checksum = a value used to verify the integrity of a file or data transfer
Want to find out more? DB-MAIN is a data-modeling and data-architecture tool. Historically linked to the database engineering lab of UNamur, Rever continues to improve the software to help developers and analysts with most data-engineering processes.