Puyo Puyo Tsu/Reverse Engineering Process

From Puyo Nexus Wiki
Jump to: navigation, search

Reverse engineering a program means figuring out various things:

  • where and how data gets stored: for instance, where is stored the current state of each player's board or how is it represented;
  • what computations are done with those data: what algorithm computes the next pair a player will get? how is the score calculated?

To do so, one needs to read the program code, which will reveal the algorithms, but memory observation is also of importance, mainly with regards to RAM memory where variables, parameters and various states are stored while the system is running.

Use of debugging tools

Debuggers come in handy:

  • one can run the program and see the actual code that gets executed at the same time;
  • one can stop the program execution when some condition is met: a particular instruction has been reached, specific address in memory have been read from or written to. Complicated conditional expressions can be evaluated to trigger breakpoints upon writing or reading specific values at specific addresses;
  • one can slowly run the program by single-stepping through the code, pausing after each instruction. This enables the understanding of every bit of an algorithm by watching it unfold. Any required amount of time can be spent on manually figuring out how things went.

Original Author's Notes

Here are the original reverser's notes on the process, quoted from the forums.

From http://puyonexus.com/forum/viewtopic.php?p=41606#p41606

So I did exactly that: I've set watchpoints at various memory addresses to figure out which program functions were writing/reading there (by noting the value of the PC register when execution stopped). Then went to IDA Pro to analyze that part of the code, then single-stepped my way through the rest of the function.

At times, I used the tracing capability of the emulator: it writes a text file whose lines will each describe the successive instructions that were actually executed, along with their location. That helped figuring out where functions were called from. Indeed, old programming tricks used by game developers at the time make it painful to analyze ROMs statically, and automated tools have a hard time figuring everything out as well. So you have to tell IDA to interpret parts of the ROM as code or data.

From http://puyonexus.com/forum/viewtopic.php?p=41615#p41615 :

[Replying to] Wizzkidwas: it's been 2 weeks since I've first looked at how I would actually proceed. I've spent a week doing nothing, trying out the emulator, debugger... then a full week seriously looking at that stuff on my spare time (about 5 hours a day). I got lucky and literally stumbled upon the board buffers in memory, while looking through areas that were touched during a battle. Once I got that, making my way to the RNG was a matter of hours.

In the meantime, I had another process going on: I traced the CPU execution from powerup to the beginning of a battle, then identified all code paths that had been executed only once. I hoped to exclude most of the sound and video related code this way, and found about 30 interesting subroutines. Turns out one of those did initialize the RNG and I would have spotted the stuff this way.

But this is nothing exceptionally difficult. I inferred the game mechanics before looking at the code, so I had a few hypothesis at how it could work. For instance, I guessed some function would be called every time you place a pair to pick a new one from a random pool. Also, looking at the game, I knew something was going on with different pools for different color-subsets but some overwriting was going on to make the first pairs the same.

Now the hard work is mainly identifying every object (variable, buffer) in memory (RAM), and what role does it play (which setting it is, score, counter, buffer...). Once that is done, you're pretty much through, as you will easily get a list of every routine that touches them, and what their algorithms are.

The only painful stuff is something I'll talk about in a future post: how the RNG is pseudo-randomly iterated between battles, based on CPU interrupts (VBLANK, VINT, HINT, player inputs...) and various events I have yet to figure out.