
implements the superset disassembler #1630


Open
wants to merge 31 commits into
base: master


KennethAdamMiller
Contributor

This PR implements the superset disassembler as described in the following paper. The top post serves as working documentation. We will update it as the work and discussions proceed.

Requirements Specification

Functional Requirements

  • 1.1 The superset disassembler shall be a selectable and configurable option.
  • 1.2 The superset disassembler shall expose a user interface that enables configuration of multiple options.
  • 1.3 The superset disassembler shall expose a library interface.

The first requirement will enable seamless integration into the platform: I would like to be able to run bap /bin/ls --disassembler=superset. It shall be packaged as a separate configuration feature, so that we can run ./configure --enable-superset-disassembler. The second requirement will let me choose my preferences, e.g., bap /bin/ls --disassembler=superset --superset-disassembler-features=loops-with-breaks. Finally, the disassembler should expose a stable API that can be used to build ad hoc and fine-grained solutions.
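As a rough illustration of what the plugin might do with the value of --superset-disassembler-features, here is a minimal, stdlib-only sketch. It assumes the option takes a comma-separated list, which nothing above specifies. The feature type and the function names are hypothetical, and only loops-with-breaks is a name taken from this post.

```ocaml
(* Hypothetical feature type: only "loops-with-breaks" is mentioned in this
   thread; everything else is kept as an opaque name. *)
type feature =
  | Loops_with_breaks
  | Other of string

let feature_of_string = function
  | "loops-with-breaks" -> Loops_with_breaks
  | name -> Other name

(* Assumption: the option value is a comma-separated list. Empty entries
   and surrounding whitespace are ignored. *)
let parse_features (s : string) : feature list =
  String.split_on_char ',' s
  |> List.map String.trim
  |> List.filter (fun name -> name <> "")
  |> List.map feature_of_string
```

A real plugin would presumably register the option through BAP's own configuration machinery and pass the raw string to something like parse_features.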

So, if you agree with those goals, let's always keep them in mind. Right now, as far as I understand the code, only the third requirement (1.3) is partially fulfilled.

Administrative Requirements

Next are the non-functional requirements, the administrative issues, so to speak. As an artifact, this code doesn't belong in bap_disasm. It should neither be internal to the bap.std library nor be part of the Bap.Std interface. (The same is true for the recursive descent disassembler, which we will move into a separate library in BAP 2.0.) Therefore it should be packed into two (optionally three) components:

  • The reusable library, which exposes the programmatic interface to the disassembler. The library shall depend on the Bap.Std interface and, if necessary, others.
  • A plugin, which exposes some of the library interface to the command line interface, with sane defaults.
  • A frontend, which will provide utility functions. We may pack them into the plugin instead; we will see later whether a separate frontend is needed. The main concern is dependencies.

Therefore, we have the following tasks:

  • 2.1 move the library code out of lib/bap_disasm.
  • 2.2 implement the plugin which will set up and load the superset disassembler
  • 2.3 (optional) implement the frontend
  • 2.4 ensure that all compilation units are properly namespaced

Concerning requirement 2.1, it is not strictly necessary to keep the code in the bap repository. You can keep it in your own repository, move your repository to the BinaryAnalysisPlatform organization, or pick a place in the bap repository, e.g., lib/bap_superset_disassembler.

No matter which choice you make, you have to give proper names to all your compilation units, i.e., files. OCaml has a flat namespace for compilation units, so if you have a file named features.ml, you will not be able to link any other plugin or library that has a file with the same name in its code base. Therefore, you need to prefix all your files, e.g., start all library files with the bap_superset_disassembler_ prefix and all plugin files with the super_disassembler_ prefix.
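To make the naming rule easy to check mechanically, something along these lines could be run over the list of source files. This is only a sketch using the standard library; the helper names are hypothetical and the prefix is the library prefix suggested above.

```ocaml
(* Prefix suggested above for library compilation units. *)
let library_prefix = "bap_superset_disassembler_"

(* [has_prefix ~prefix name] is true iff [name] starts with [prefix]. *)
let has_prefix ~prefix name =
  let n = String.length prefix in
  String.length name >= n && String.sub name 0 n = prefix

(* Returns the files whose base name lacks the prefix, i.e. the units
   that could clash with same-named units in other libraries or plugins. *)
let unprefixed ~prefix files =
  List.filter
    (fun file -> not (has_prefix ~prefix (Filename.basename file)))
    files
```

For example, given features.ml and lib/bap_superset_disassembler_features.ml, only features.ml would be reported.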

Coding Standards Requirements

These are self-explanatory:

  • 3.1 no dead code
  • 3.2 no commented out code
  • 3.3 all modules shall have interface files (exception: modules that define only types or module types)
  • 3.4 no debugging output in the library code
  • 3.5 no todos in the released code
  • 3.6 no exceptions beyond failed invariants or preconditions
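For 3.6, one possible reading is sketched below: failures that can legitimately happen at run time are returned as values, while an exception is reserved for a violated precondition, i.e. a bug in the caller. The error type and the window function are hypothetical and are not taken from the PR.

```ocaml
(* Hypothetical error type for failures that are expected at run time. *)
type error = Out_of_bounds of int

(* [window ~size code off] returns the [size] bytes of [code] starting at
   [off]. A non-positive [size] is a violated precondition and raises
   [Invalid_argument]. An offset whose window does not fit is an expected
   situation and is reported as [Error]. *)
let window ~size (code : string) (off : int) : (string, error) result =
  if size <= 0 then invalid_arg "window: size must be positive";
  if off < 0 || off + size > String.length code then Error (Out_of_bounds off)
  else Ok (String.sub code off size)
```

The idea is that callers can pattern match on the result for ordinary failures and never need a try block in normal control flow.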

Quality Requirements

  • 4.1 provide unit tests that ensure the invariants of crucial components
  • 4.2 provide a set of functional tests

The number of tests is to be decided, though I would like close to 100% coverage of the core components.
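To give an idea of what an invariant-style unit test could look like, here is a self-contained sketch built around a toy decoder. The decoder, the candidates function, and the invariant itself are illustrative assumptions and not the PR's code. A real suite would presumably use the project's test framework instead of bare functions.

```ocaml
(* Toy stand-in for a real decoder: the instruction length is derived from
   the byte value (1, 2, or 3). Purely illustrative. *)
let toy_length (code : string) (off : int) : int =
  1 + Char.code code.[off] mod 3

(* Candidate (offset, length) pairs: one decoding attempt per byte offset,
   keeping only the attempts that fit inside [code]. *)
let candidates (code : string) : (int * int) list =
  let n = String.length code in
  List.init n (fun off -> (off, toy_length code off))
  |> List.filter (fun (off, len) -> off + len <= n)

(* The invariant under test: every candidate lies entirely within [code]. *)
let within_bounds (code : string) : bool =
  List.for_all
    (fun (off, len) -> off >= 0 && len > 0 && off + len <= String.length code)
    (candidates code)
```

A unit test would then assert within_bounds on a handful of inputs, including the empty string and inputs whose last bytes decode to lengths that run past the end.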

Documentation Requirements

  • 5.1 provide an overall description of the algorithm
  • 5.2 provide the detailed description of the disassembler architecture and implementation
  • 5.3 document the public interface

The overall description should include a brief overview of the algorithm, its purposes, and its tradeoffs. It shall reference the paper. Any differences between the paper and the implementation should be highlighted. This documentation will end up in the plugin man page, so a user should be able to understand without further ado why they need this plugin, how to enable it, and how to configure it.

The detailed documentation is needed so that we can support the code and fix bugs in it. It can be spread across the GitHub discussions, comments in the internal mli files, and the ml files. It shall document the purposes and invariants (if any) of all modules and of some crucial functions.

Finally, all public (accessible via the public mli file) functions shall be thoroughly documented, so that a user can apply them without having to refer to the implementation.
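As an illustration of the level of documentation meant here, the following sketch shows a small documented signature together with a trivial implementation. The module and function names are hypothetical and do not reflect the PR's actual interface. In the real code the signature would live in an mli file rather than in a module type.

```ocaml
(* Hypothetical interface, written as a module type so the sketch is
   self-contained. *)
module type Superset_sketch = sig
  (** A set of candidate instruction offsets. *)
  type t

  (** [of_offsets offs] builds a set from [offs]. Duplicates are merged. *)
  val of_offsets : int list -> t

  (** [mem t off] is [true] iff [off] is a candidate in [t]. *)
  val mem : t -> int -> bool

  (** [offsets t] lists the candidates of [t] in increasing order. *)
  val offsets : t -> int list
end

(* Trivial implementation backed by the standard library's sets. *)
module Superset_sketch_impl : Superset_sketch = struct
  module S = Set.Make (Int)
  type t = S.t
  let of_offsets = S.of_list
  let mem t off = S.mem off t
  let offsets = S.elements
end
```

Each comment states what the function returns and any ordering or uniqueness guarantee, which is what lets a user apply it without reading the implementation.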

@KennethAdamMiller
Contributor Author

KennethAdamMiller commented May 16, 2025

Some remaining todo items:

  • clean up all heuristics except interpretation depth one, grammar, callsites, img entry, and fixpoint convergence
  • document interpretation depth
  • get rid of abstract_ssa, liveness
  • consolidate grammar over differences in heuristics
