I thought I’d write a little about my current development project: a port to the .NET platform of the SSI classic “Curse of the Azure Bonds” (which I refer to as coab, though I guess there really should be a t in there). The project started in February of this year, when I decided to work on Coab instead of Pool of Radiance (the first in the SSI AD&D series).
I used the ever trusty IDA Pro from Hex-Rays to open the game up, but like a lot of games from this period, the executable was compressed/encrypted. I got bored of trying to decode it manually, so I used the very helpful debug.exe to decode the game for me. After single stepping through the decompression code a few times, I wanted to dump the RAM image of the game. I could not work out the syntax for the write-to-file command, so I dumped memory to the screen instead. Dump to screen would only do 64K sections, so I needed to change the DI register between dumps. This ended up being done by altering the code about to be executed so that it changed the DI register as required. I wrote all the required commands into a text file so I could redirect it into the debugger, and redirected the output to a file. This gave me an ASCII file of the memory dump. I wrote a C++ program to parse that file and write a bin file, which I then loaded into IDA at the same offset that debug.exe had loaded the original program at.
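To give an idea of what the ASCII-to-bin step involves, here is a minimal Python sketch (the original was a C++ program, which is not shown here). It assumes the usual debug.exe dump layout of `SSSS:OOOO`, then sixteen hex bytes with a dash after the eighth, then the ASCII column, and it assumes every dump line is a full 16 bytes. The names `parse_dump` and `to_binary` are made up for this illustration.

```python
import re

# One full line of a debug.exe "d" dump, for example:
# 1000:0000  4D 5A 90 00 03 00 00 00-04 00 00 00 FF FF 00 00   MZ..............
# Assumption: every data line holds exactly 16 bytes.
DUMP_LINE = re.compile(
    r"^([0-9A-Fa-f]{4}):([0-9A-Fa-f]{4})\s+"
    r"((?:[0-9A-Fa-f]{2}[ -]){15}[0-9A-Fa-f]{2})"
)


def parse_dump(lines):
    """Map linear address -> byte value for every dump line found.

    Anything that is not a dump line (echoed commands, prompts,
    blank lines) is skipped.
    """
    image = {}
    for line in lines:
        match = DUMP_LINE.match(line)
        if match is None:
            continue
        segment = int(match.group(1), 16)
        offset = int(match.group(2), 16)
        linear = segment * 16 + offset  # real-mode address
        for i, text in enumerate(re.split(r"[ -]", match.group(3))):
            image[linear + i] = int(text, 16)
    return image


def to_binary(image):
    """Flatten the address map into bytes, zero-filling any gaps."""
    if not image:
        return b""
    start, end = min(image), max(image)
    return bytes(image.get(addr, 0) for addr in range(start, end + 1))
```

Feeding it the redirected debugger output line by line and writing the result of `to_binary` to disk gives a flat image. Note that gaps are silently zero-filled here, so a size check on the result is worth doing.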
One of the things I’d noticed in all this was that the way debug.exe loaded the game (16-bit DOS) was different to how IDA loads it. IDA loaded it verbatim, while debug.exe removed one byte and altered the next.
Anyway, with the uncompressed image now in IDA, I spent a week rebuilding the segment table that the game used. I then dug out the IDA scripts I had written back in 2000 while working on Pool of Radiance, and used them to load the overlay file and correctly rewrite the jump tables etc. The first problem I found was that, due to back-up issues, the scripts I had were not the last version I had developed when working on PoR, so I spent a week rewriting them. I also fine-tuned how they did their work. Once that was done I started to decode the functions, but soon noticed the library functions were all unmarked because the FLIRT engine had not been run. So I loaded the original decoded bin file again, found out how to force the FLIRT engine to run, and saved off the changes. Somewhere in there I either applied the changes to the work in progress or just started again with the FLIRT version; either way, I started making progress on decoding the game.
This carried on for about a month in total, until I noticed that even though I’d dumped 192K of the decoded game memory, I only had ~70K of it in IDA. I double-checked the dump procedure and the ASCII-to-bin program. In the end the problem lay in the ASCII-to-bin program: I’d put a limit in there for testing (~4000 lines) that stopped the parsing before the end of the file was reached. So I re-parsed the ASCII file, then wrote an IDA script to import the missing data at the correct point and extend the segments to cover it. The underlying problem was that half of the data segment had been missing. With this resolved, more of the data look-ups in the code made sense.
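In hindsight, a small coverage check on the parsed dump would have caught the truncation straight away. The following Python sketch is only an illustration of that idea, not something that was part of the original tooling. It assumes each dump line covers 16 bytes and that you know the start address and total length you meant to dump; `find_gaps` and its parameters are hypothetical names.

```python
def find_gaps(line_addresses, start, length, line_size=16):
    """Return (gap_start, gap_end) ranges not covered by any dump line.

    line_addresses: linear start address of each parsed dump line.
    start, length:  the region that was supposed to be dumped.
    """
    end = start + length
    cursor = start          # first address not yet known to be covered
    gaps = []
    for addr in sorted(set(line_addresses)):
        if addr >= end:
            break
        if addr + line_size <= cursor:
            continue        # line lies entirely before the region of interest
        if addr > cursor:
            gaps.append((cursor, addr))
        cursor = max(cursor, addr + line_size)
    if cursor < end:
        gaps.append((cursor, end))
    return gaps
```

For a 192K dump of which only the first 4000 lines were parsed, this reports a single gap running from byte 64000 of the region to its end, while a complete dump gives an empty list.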
Another month passed, and I now had a pretty good understanding of the data structures used by the game. The biggest problem was the verbosity of the assembly code. So around May I started writing a C# program to translate the assembly code into C#. The first thing I did was create files from the segments. There were about 50 memory segments in the game due to the overlay memory management system. Then I got function blocks parsed and replaced with correct C# equivalents. I then worked on parsing the parameters and local variables. All the rest of the assembly was written out as comments.
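The first pass of that translation can be pictured roughly like the Python sketch below (the real translator was written in C# and is not reproduced here). It assumes an IDA-style listing in which functions are delimited by `proc`/`endp` and stack variables are declared as `arg_N = word ptr ...` or `var_N = word ptr ...`. The type mapping, the `static void` signature, and the function name `translate` are all assumptions made for the illustration, and parameters are emitted in listing order rather than in true calling order.

```python
import re

PROC_START = re.compile(r"^(\w+)\s+proc\s+(?:near|far)\b")
PROC_END = re.compile(r"^\w+\s+endp\b")
STACK_VAR = re.compile(r"^(arg_\w+|var_\w+)\s*=\s*(byte|word|dword)\s+ptr\b")

# Assumed mapping from assembler operand sizes to C# types.
CS_TYPES = {"byte": "byte", "word": "short", "dword": "int"}


def translate(listing):
    """Turn each proc/endp block into a C# method stub.

    Parameters and locals become declarations; every other
    instruction is carried over as a comment.
    """
    out = []
    name = None
    args, local_vars, body = [], [], []
    for raw in listing.splitlines():
        line = raw.strip()
        if not line:
            continue
        if name is None:
            start = PROC_START.match(line)
            if start:
                name = start.group(1)
                args, local_vars, body = [], [], []
            continue
        if PROC_END.match(line):
            params = ", ".join(f"{ctype} {ident}" for ident, ctype in args)
            out.append(f"static void {name}({params})")
            out.append("{")
            out.extend(f"    {ctype} {ident};" for ident, ctype in local_vars)
            out.extend(f"    // {asm}" for asm in body)
            out.append("}")
            name = None
            continue
        var = STACK_VAR.match(line)
        if var:
            entry = (var.group(1), CS_TYPES[var.group(2)])
            (args if var.group(1).startswith("arg_") else local_vars).append(entry)
        else:
            body.append(line)
    return "\n".join(out)
```

The point of keeping the untranslated instructions as comments is that the output stays a complete record of the function, so each block can then be rewritten by hand (or by later regex passes) without going back to the disassembly.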
At this time there were ~110K lines of assembly to translate. I started from scratch a few times due to errors in the translator, but I ended up getting good at writing Visual Studio regexes to match the asm. In reality, the fact that the original game was un-optimised Pascal was quite nice: structure accesses and assignments are done the same way each and every time. The one thing that is very annoying about it being Pascal based is the base 1 arrays, because the global data segment would have a single address used both as a byte and as a word array, and you need to sort the two usages out. This is one place where IDA is (or I should say was, as I’m using an older version) not too great.
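One plausible way this overlap comes about (my reading, not something confirmed from the compiler itself) is that the code addresses a base 1 array through its "element zero" address, which is one element before the real start of the array and so lands on whatever variable happens to sit just in front of it. The Python sketch below works through the arithmetic with made-up addresses; `folded_base` and `element_address` are hypothetical helper names.

```python
def folded_base(first_element_addr, elem_size, low_bound=1):
    """Address the code would reference if the lower bound is folded in.

    For array[1..N] this is one element *before* the real data.
    """
    return first_element_addr - low_bound * elem_size


def element_address(base, index, elem_size):
    """Address of element `index` when indexing from the folded base."""
    return base + index * elem_size


# Hypothetical layout: a byte variable at 0x4A2E, and a word
# array[1..10] whose first element sits at 0x4A30.
BYTE_VAR = 0x4A2E
ARRAY_FIRST = 0x4A30

base = folded_base(ARRAY_FIRST, elem_size=2)
# base == BYTE_VAR: the same address now shows up as both a byte
# variable and the start of a word array in the disassembly.
first = element_address(base, 1, elem_size=2)   # Pascal index 1
zero_based = 1 - 1                              # matching C# index
```

Under that assumption, sorting out the two usages means deciding per reference whether the address is the byte variable or the array, and shifting every array index down by one when moving to C#'s base 0 arrays.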
It is now August and there are only 24,485 lines of assembly left in the code. There are ~2.5K errors to deal with, mainly stemming from parameter mismatches and C#’s overly picky maths rules. But it’s a work in progress, and I’m really enjoying working on it.