joelpx edited this page Mar 5, 2016 · 14 revisions

Add support for a new architecture

Work in progress...

Architecture-specific files live in the folder reverse/lib/arch/<NEW_ARCH>. Four files are required to add a new architecture:

  • output.py: implements the abstract class OutputAbs from reverse.lib.output.
  • utils.py: defines functions that detect jump/return/call/compare instructions, and how instruction symbols are printed (for example, add on x86 is printed as "+=").
  • process_ast.py: here you can define functions that modify the AST after decompilation.
  • __init__.py: contains the list of all functions defined in process_ast.py.
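As a rough illustration of what utils.py provides, here is a minimal sketch. Only the function names come from the import line used in output.py; the signatures, the mnemonic-based lookup, the Inst stand-in and every table entry except add -> "+=" are assumptions made for this example, and a real implementation would probably work on capstone instruction objects.

```python
from collections import namedtuple

# Hypothetical stand-in for a capstone instruction: only the mnemonic is used.
Inst = namedtuple("Inst", ["mnemonic"])

# Illustrative x86-like tables (assumed, not taken from the project).
CALLS = {"call"}
RETS = {"ret"}
UNCOND_JUMPS = {"jmp"}
COND_JUMPS = {"je", "jne"}
INST_SYMBOLS = {"add": "+=", "sub": "-="}   # "+=" for add is from the text
COND_SYMBOLS = {"je": "==", "jne": "!="}


def is_call(i):
    return i.mnemonic in CALLS


def is_ret(i):
    return i.mnemonic in RETS


def is_uncond_jump(i):
    return i.mnemonic in UNCOND_JUMPS


def is_jump(i):
    # A jump is either conditional or unconditional.
    return i.mnemonic in UNCOND_JUMPS or i.mnemonic in COND_JUMPS


def inst_symbol(i):
    # Returns None when the instruction has no operator-like symbol.
    return INST_SYMBOLS.get(i.mnemonic)


def cond_symbol(i):
    # Returns None when the instruction is not a conditional jump.
    return COND_SYMBOLS.get(i.mnemonic)
```

With these tables, inst_symbol(Inst("add")) gives "+=" and is_jump(Inst("jne")) is true while is_uncond_jump(Inst("jne")) is false.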

output.py

Two functions from reverse.lib.output are useful here: _imm and _add. The first prints an immediate value, the second prints a string.

For RISC architectures, you can determine the operand size by testing self.gctx.dis.mode & CS_MODE_32: the result is non-zero when the disassembler runs in 32-bit mode.
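The mode test can be sketched as follows. The constants are redefined locally with the values capstone uses, so that the snippet runs without capstone installed. The helper name operand_size and the mapping to 4 or 8 bytes are assumptions for illustration.

```python
# Same bit values as the capstone constants of the same name.
CS_MODE_32 = 1 << 2
CS_MODE_64 = 1 << 3
CS_MODE_BIG_ENDIAN = 1 << 31


def operand_size(mode):
    """Return the operand size in bytes for a disassembler mode.

    Assumption: any mode without the CS_MODE_32 bit is treated as 64-bit.
    """
    return 4 if mode & CS_MODE_32 else 8
```

The mode is a bit field, so other flags such as the endianness do not disturb the test: operand_size(CS_MODE_32 | CS_MODE_BIG_ENDIAN) is still 4.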

from capstone import CS_MODE_32
from capstone.<ARCH> import ...
from reverse.lib.output import OutputAbs
from reverse.lib.arch.<NEW_ARCH>.utils import (inst_symbol, is_call, is_jump, is_ret, is_uncond_jump, cond_symbol)

class Output(OutputAbs):

_operand

  • i: the capstone instruction.
  • num_op: index of the operand to print in i.operands.
  • hexa: whether an immediate operand must be printed in hexadecimal.
  • show_deref: used for memory accesses; indicates whether *() should be printed around the operand. For example, the x86 lea instruction sets show_deref to False.
  • force_dont_print_data: if False and the operand is a pointer (immediate) to a string, the string is printed next to it. It is set to True for calls and jumps, so that a string is never printed there.

This function is called on every operand of every instruction.

def _operand(self, i, num_op, hexa=False, show_deref=True, force_dont_print_data=False):
    def inv(n):
        return n == CS_OP_INVALID

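    op = i.operands[num_op]

    # Assumption: op_size is the operand size in bytes, derived from the
    # disassembler mode (4 in 32-bit mode, otherwise 8).
    op_size = 4 if self.gctx.dis.mode & CS_MODE_32 else 8

    if op.type == CS_OP_IMM:
        self._imm(op.value.imm, op_size, hexa, force_dont_print_data=force_dont_print_data)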

    elif op.type == CS_OP_REG:
        self._add(i.reg_name(op.value.reg))

    # MIPS_OP_MEM is the MIPS constant; use the one of your architecture.
    elif op.type == MIPS_OP_MEM:
        mm = op.mem
        printed = False

        # Does the access contain a register with a known value?
        # Example: on x86 we can compute any access [eip + DISP].
        # We should call `self.deref_if_offset` for any known address.

        # This code is more or less generic, you just need to adapt it to the
        # architecture (a memory access can have a base, segment, index, disp,
        # shift (for arm), ...).

        if show_deref:
            self._add("*(")
        if not inv(mm.base):
            self._add("%s" % i.reg_name(mm.base))
            printed = True

        if mm.disp != 0:
            section = self._binary.get_section(mm.disp)
            is_label = self.is_label(mm.disp)

            if is_label or section is not None:
                if printed:
                    self._add(" + ")
                self._imm(mm.disp, 0, True, section=section, print_data=False,
                          force_dont_print_data=force_dont_print_data)
            else:
                if printed:
                    if mm.disp < 0:
                        self._add(" - %d" % (-mm.disp))
                    else:
                        self._add(" + %d" % mm.disp)
                else:
                    self._add("%d" % mm.disp)

        if show_deref:
            self._add(")")
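To make the sign handling of the displacement easier to follow, here is a simplified, standalone sketch of the memory-operand branch. It returns a string instead of calling self._add, takes the base register name directly (None when there is no base), and ignores the label/section case. The function name format_mem is hypothetical.

```python
def format_mem(base, disp, show_deref=True):
    """Build the text of a memory operand from a base name and a displacement."""
    parts = []
    printed = False

    if show_deref:
        parts.append("*(")

    if base is not None:
        parts.append(base)
        printed = True

    if disp != 0:
        if printed:
            # A base was printed: show the displacement as an offset,
            # keeping the sign in the operator rather than in the number.
            if disp < 0:
                parts.append(" - %d" % (-disp))
            else:
                parts.append(" + %d" % disp)
        else:
            # No base: the displacement is the whole address.
            parts.append("%d" % disp)

    if show_deref:
        parts.append(")")

    return "".join(parts)
```

For example, format_mem("sp", -8) produces "*(sp - 8)", while format_mem("a0", 4, show_deref=False) produces "a0 + 4", which is the form wanted for an instruction like the x86 lea.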




def _if_cond(self, cond, fused_inst):

def _sub_asm_inst(self, i, tab=0, prefix=""):