Perhaps the trickiest part of data migration is the source migration. The largest problem is the large number of different programming languages, and different methods for accessing databases in all of those languages.
The source migration piece of DataDrone2 is very modular. Users indicate the type of source from the list of available modules. The module then opens the files associated with the specific type selected. The location of the customer application code is located and DataDrone2 finds applicable pieces of code that need to be changed via a two pass analysis and makes those changes. DataDrone2 then converts both the language specific code and the SQL then writing the new source out to a new file prefixed with “DD2_” for easy identification.
Source code migration is accomplished through a process that searches the source code looking for pieces that interact with the database. After it identifies those pieces, it then goes through a process of parsing the piece, breaking it down into a syntax tree. The individual elements of the syntax tree are then translated into their corresponding elements for the system that the code is being translated into. Then the syntax tree is put back together into a statement, and the new statement replaces the original one in the current source code.
For instance, to connect to a database in Informix ESQL/C, the following syntax is used: EXEC SQL CONNECT TO 'migrate_from' WITH CONCURRENT TRANSACTION;
DataDrone2 is able to identify this piece of source to convert because the vast majority of ESQL/C statements begins with 'EXEC SQL' and ends with a ';'. After identifying an applicable piece of source code to convert, we need to parse the statement into a syntax tree. The 'EXEC SQL' and ';' are placed in branches that are tagged as the starting and ending portions of the statement. The 'connect to' is placed in a branch that is tagged as the statement. The contents of 'migrate_from' are placed in a branch that is identified as the database name. 'WITH CONCURRENT TRANSACTION', is stored in a branch which identifies it as the type of lock-block this connection should have.
Now that we have a syntax tree for the ESQL/C statement, we can take the individual pieces and reassemble them into the equivalent Embedded SQL statements. In this example, the only portions of the example that have to be modified are the name of the database what we are connecting to, and the equivalent statement for the type of lock-block identified in the ESQL/C statement. The resulting statement looks like this:
EXEC SQL CONNECT TO 'migrate_to' IN SHARE MODE;
The above is a very simple example and really only demonstrates how DataDrone2 converts code that is specific to ESQL/C. The module for converting SQL queries between databases is shared by each of the language specific modules. Its responsibility is to convert any code that is tagged as an SQL query from one database to the other. This has been built into a separate module because the code that converts SQL queries is language independent.
An example of an SQL query that would require migration is:
DECODE(SUBSTR(cust_type, 0, 1), 'A', 'a type', 'B', 'b type', 'C', 'c type', 'default')
The important part of this statement is the DECODE. DB2 does not have a DECODE, but Informix does. The DB2 equivalent is the CASE statement. Functionally, the parser is going to break down the query into a syntax tree in like manner as with the ESQL/C statement above. The 'SELECT' will be placed on a branch that will be tagged identifying this as a select query. The database name 'customers' will be stored on a branch tagged as the table name. The two fields will get their own branches, identified by both the fact that they are fields to be returned by the select query and the order that they are presented in. The DECODE field will have additional branches for each of the parameters passed to it. SUBSTR is different though. As a statement that does not need to be translated, it will not have separate for its parameters.
Once the entire query has been parsed into a syntax tree the module can go to work putting it back together as a functional DB2 query. The only changes to this example is converting the DECODE into a CASE, and that conversion is performed once we have the syntax tree. Simply, each of the parameters passed to the DECODE are placed in their appropriate locations in the CASE.
CASE SUBSTR(cust_type, 0, 1)
WHEN 'A' THEN 'a type'
WHEN 'B' THEN 'b type'
WHEN 'C' THEN 'c type'
In conclusion, the source code migration basically happens in three steps. First, we parse identified source code looking for applicable portions of the code to migrate. Then, we generate a syntax tree for the applicable code. If that code contains an SQL query, or is an SQL query, then the SQL query is parsed into an SQL syntax tree. Finally, we take the syntax trees, convert any of the statements in them to their migrated equivalents, and then put them back together again.