Reverse Engineering

Reverse Engineering involves reading application source code, decomposing it, and feeding the result into a process from which we can forward-generate a new version of the application in a different language.  It is the key enabler for our application modernization services.  The Codiscent Reverse Engineering Studio lets you work interactively to configure, test and refine parsing templates, whose syntax is very similar to that of their forward-generation counterparts.  The studio then processes source code to produce data that the Generative Engineering Studio can use as a specification dataset.

In Reverse Engineering we tokenize source code, identify the linguistic patterns in it, and generate a tree whose structure represents the elements of the source code and their relationships to one another.  From that tree, we identify and extract the meaningful branches to produce a tabular representation that we can export or share with the generation facility as part of a solution model.  The Reverse Engineering Studio lets you control and manage the template library, the parsing process, and the presentation structure of the output data.  The studio is designed to make it easy to iterate on the templates and improve how well they handle the ways various operations and functions are represented in the source code.  As experience with a code base increases, the number of template changes decreases, to the point that it’s possible to produce accurate trees for the vast majority of the application code automatically.
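The tokenize, parse, and extract steps described above can be illustrated with a minimal sketch. It does not use Codiscent's template syntax or the studio's actual internals; the toy source language (simple assignments ending in a semicolon), the token patterns, and the names `tokenize`, `parse`, `to_rows` and `Node` are all assumptions made for illustration.

```python
import re
from dataclasses import dataclass, field

# Hypothetical token patterns for a toy source language (illustrative only).
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("IDENT", r"[A-Za-z_]\w*"),
    ("OP", r"[=+\-*/]"),
    ("END", r";"),
    ("SKIP", r"\s+"),
]
TOKEN_RE = re.compile("|".join(f"(?P<{name}>{pattern})" for name, pattern in TOKEN_SPEC))


@dataclass
class Node:
    kind: str
    text: str = ""
    children: list = field(default_factory=list)


def tokenize(source):
    """Split source text into (kind, text) pairs, dropping whitespace."""
    tokens, pos = [], 0
    while pos < len(source):
        match = TOKEN_RE.match(source, pos)
        if match is None:
            raise ValueError(f"unexpected character {source[pos]!r} at offset {pos}")
        if match.lastgroup != "SKIP":
            tokens.append((match.lastgroup, match.group()))
        pos = match.end()
    return tokens


def parse(tokens):
    """Recognise the pattern IDENT '=' ... ';' and build one subtree per statement."""
    root, i = Node("program"), 0
    while i < len(tokens):
        if i + 1 >= len(tokens) or tokens[i][0] != "IDENT" or tokens[i + 1] != ("OP", "="):
            raise ValueError(f"expected an assignment at token {i}")
        statement = Node("assign", tokens[i][1])
        i += 2
        while i < len(tokens) and tokens[i][0] != "END":
            kind, text = tokens[i]
            statement.children.append(Node(kind.lower(), text))
            i += 1
        if i == len(tokens):
            raise ValueError("statement is missing its terminating ';'")
        i += 1  # consume the ';'
        root.children.append(statement)
    return root


def to_rows(tree):
    """Flatten the tree's branches into table rows that could be exported."""
    rows = []
    for number, statement in enumerate(tree.children, start=1):
        for position, child in enumerate(statement.children, start=1):
            rows.append({
                "statement": number,
                "target": statement.text,
                "position": position,
                "kind": child.kind,
                "text": child.text,
            })
    return rows
```

Calling `to_rows(parse(tokenize("total = price + 2;")))` yields one row per element on the right-hand side of the assignment, each tagged with the statement it belongs to.  A real parsing template would recognise far richer patterns, but the flow from tokens to tree to table is the same.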

Two important points about Codiscent’s Reverse Engineering paradigm:

  •  It’s not necessary to understand the full functionality of a source program to reverse-engineer it; it is only necessary to know how linguistic elements from the source should be represented in the target code in order to transform them.
  •  The process is designed to create stub functions in the target code for any linguistic element for which an analogous counterpart has not yet been created.  This ensures that an operator will never be stuck with an interim target program that cannot be compiled and, therefore, cannot be debugged.
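The stub-function idea in the second point can be sketched as follows.  This is an illustration under stated assumptions, not Codiscent's generator: the `TEMPLATES` table, the `(op, dst, src)` statement shape, the `stub_` naming scheme and the choice of Python as the target language are all hypothetical.

```python
# Hypothetical mapping from source-language operations to target-code templates.
TEMPLATES = {
    "MOVE": "{dst} = {src}",
    "ADD": "{dst} = {dst} + {src}",
}


def translate(statements):
    """Translate (op, dst, src) triples; emit a stub for any op with no template yet."""
    body, stubs = [], {}
    for op, dst, src in statements:
        template = TEMPLATES.get(op)
        if template is not None:
            body.append(template.format(dst=dst, src=src))
            continue
        # No counterpart exists yet: define a stub so the output still compiles.
        name = f"stub_{op.lower()}"
        stubs[name] = (
            f"def {name}(*args):\n"
            f"    raise NotImplementedError('no template yet for {op}')\n"
        )
        body.append(f"{name}({dst!r}, {src!r})")
    return "\n".join(list(stubs.values()) + body) + "\n"


generated = translate([
    ("MOVE", "total", "0"),
    ("ADD", "total", "5"),
    ("INSPECT", "total", "x"),  # no template defined for this operation
])
# The interim target program is syntactically valid even with the gap.
compile(generated, "<generated>", "exec")
```

Because the unknown `INSPECT` operation becomes a call to a defined stub, the interim program compiles and runs up to the point where the missing piece is needed, which is exactly where an operator would want the debugger to stop.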

Once a specification dataset has been created from the parsed source code, generating replacement code follows the same path as any other set of generated components.
