My life mission is to reduce the global cost of software maintenance by a factor of 100. This motivates my coaching business directly, as I teach engineers how to recognize decisions that are almost impossible to undo.
Research-wise, I believe the highest-impact point is not in building tools that automate software maintenance tasks, but in making these tools easier to build. I’ve explained my position in my EAG Boston 2017 talk “The Catastrophic Risk of Software Maintenance” and in my interview with Future of Coding.
Software Language Engineering
The PL community can produce magic that automatically super-optimizes your code, ports it to a different system, or even writes parts of it for you. Few of these techniques have made it into commercial tools, and the ones that have are usually multimillion-dollar endeavors built for a single company. A major reason for this is heterogeneity. There are hundreds of languages, dialects, variations, and language hybrids in use (the “500-Language Problem”). There are hundreds of tools we may want to write for each (which I call the “500-Problem Problem”). However, with the way we build tools today, little work can be shared among them.
Hence, my primary interest is in making program analysis, synthesis, and transformation tools easier to build and more general. This research falls under the umbrella of Software Language Engineering, and draws heavily on generic-programming research as found in the ICFP community. The three pillars of this research are (1) automatically deriving tools from a language’s semantics (e.g., Mandate), (2) sharing work between similar tools for different languages (e.g., Cubix), and (3) sharing work between different tools for the same language.
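As a rough illustration of pillar (2), the sketch below writes one transformation (renaming a variable) once and applies it to programs from two toy languages that share a common fragment. It is only a minimal sketch of the general idea: every node type and function name here (`Var`, `Assign`, `PyPrint`, `JsConsoleLog`, `transform`, `rename`) is hypothetical, and it is not meant to reflect how Cubix itself is implemented.

```python
from dataclasses import dataclass, fields, is_dataclass, replace
from typing import Any


# Shared fragment: both toy languages reuse these node types (hypothetical).
@dataclass(frozen=True)
class Var:
    name: str


@dataclass(frozen=True)
class Assign:
    target: Var
    value: Any


# Language-specific nodes, invented purely for illustration.
@dataclass(frozen=True)
class PyPrint:  # statement in a toy "Python-like" language
    arg: Any


@dataclass(frozen=True)
class JsConsoleLog:  # statement in a toy "JavaScript-like" language
    arg: Any


def transform(node, rewrite):
    """Generic bottom-up traversal: rebuild the children, then rewrite the node."""
    if isinstance(node, (list, tuple)):
        return type(node)(transform(child, rewrite) for child in node)
    if is_dataclass(node) and not isinstance(node, type):
        rebuilt = {f.name: transform(getattr(node, f.name), rewrite)
                   for f in fields(node)}
        node = replace(node, **rebuilt)
    return rewrite(node)


def rename(old, new):
    """A tool written once, against the shared fragment only."""
    def rewrite(node):
        if isinstance(node, Var) and node.name == old:
            return Var(new)
        return node
    return rewrite


# The same tool runs unchanged on programs from both toy languages.
py_prog = [Assign(Var("x"), 1), PyPrint(Var("x"))]
js_prog = [Assign(Var("x"), 2), JsConsoleLog(Var("x"))]
py_renamed = transform(py_prog, rename("x", "y"))
js_renamed = transform(js_prog, rename("x", "y"))
```

The traversal knows nothing about either language; only the shared `Var` node matters to the tool, which is what lets the work be reused.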
I am particularly interested in better ways of building whole-program restructuring tools such as the Boeing Migration Tool. This is both because I believe whole-program transformations will be critical in solving software maintenance challenges, and because source-to-source transformations are fundamentally harder than either pure analysis or synthesis, due to their need to preserve information between source and target.
An important related problem is the ability to operate on multiple representations of a program, while preserving information. This is the field of bidirectional transformations (“BX”).
- Cubix (OOPSLA 2018): A program-transformation framework that achieves “One Tool, Many Languages”
- Mandate (in submission): A “control-flow graph generator generator.” The input is an operational semantics of a language; the output is a CFG-generator for that language.
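To give a feel for what a “generator generator” means, here is a deliberately crude Python sketch: a function that takes a description of a language and returns a CFG generator for it. Note the large assumption: the description here is a hand-written table mapping node kinds to one of four hypothetical rule names (`atom`, `seq`, `branch`, `loop`), whereas Mandate takes an operational semantics as input. Nothing below should be read as Mandate's actual algorithm.

```python
def make_cfg_generator(spec):
    """Return a CFG generator for the toy language described by `spec`.

    `spec` maps each AST node kind to a hypothetical rule name.
    AST nodes are tuples of the form (kind, *args); statements and
    conditions are identified by their (assumed unique) label strings.
    """
    def gen(node):
        # Returns (entry label, list of exit labels, set of edges).
        kind, *args = node
        rule = spec[kind]
        if rule == "atom":
            label = args[0]
            return label, [label], set()
        if rule == "seq":  # assumes at least one child
            parts = [gen(child) for child in args]
            edges = set().union(*(p[2] for p in parts))
            for (_, exits, _), (entry, _, _) in zip(parts, parts[1:]):
                edges |= {(e, entry) for e in exits}
            return parts[0][0], parts[-1][1], edges
        if rule == "branch":
            cond, then_node, else_node = args
            t_entry, t_exits, t_edges = gen(then_node)
            e_entry, e_exits, e_edges = gen(else_node)
            edges = t_edges | e_edges | {(cond, t_entry), (cond, e_entry)}
            return cond, t_exits + e_exits, edges
        if rule == "loop":
            cond, body = args
            b_entry, b_exits, b_edges = gen(body)
            edges = b_edges | {(cond, b_entry)} | {(x, cond) for x in b_exits}
            return cond, [cond], edges
        raise ValueError(f"unknown rule {rule!r} for node kind {kind!r}")

    return gen


# A toy language description, and the CFG generator derived from it.
toy_spec = {"assign": "atom", "block": "seq", "if": "branch", "while": "loop"}
toy_cfg = make_cfg_generator(toy_spec)

program = ("block",
           ("assign", "i = 0"),
           ("while", "i < 3", ("assign", "i += 1")),
           ("assign", "done = 1"))
entry, exits, edges = toy_cfg(program)
```

Supporting a second language with different node names would only require a new table, not a new generator.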
Causal Inference for Program Analysis
Causal inference is a relatively new branch of statistics that answers questions such as “if we perform action A, will event B occur?” It offers a special promise for program analysis: in order to relate two parts of a program, conventional program analysis must be able to read and understand everything in between. Causal inference techniques, by contrast, are well suited to answering “action-at-a-distance” questions because they treat deterministic processes as black or gray boxes, rather than taking the white-box approach of conventional program analysis.
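The black-box framing can be sketched in a few lines of Python. The example below asks “if we force action A (enable a cache), how often does event B (a timeout) occur?” by intervening on one input and only observing outputs, without ever inspecting the program's body. The program `black_box`, its parameters, and the helper `estimate_effect` are all hypothetical, and this simple randomized-intervention estimate is just one basic technique, not a description of any particular research result.

```python
import random


def black_box(use_cache: bool, request_size: int) -> bool:
    """Hypothetical program under study; returns True if it 'times out'.

    The analysis below never reads this body, only its outputs.
    """
    cost = request_size * (1 if use_cache else 3)
    return cost > 150


def estimate_effect(program, trials=1000, seed=0):
    """Estimate P(timeout | do(use_cache=True)) and P(timeout | do(use_cache=False)).

    Each trial draws the other input at random, then runs the program
    under both interventions on that same input.
    """
    rng = random.Random(seed)
    timeouts = {True: 0, False: 0}
    for _ in range(trials):
        size = rng.randint(1, 100)     # other inputs vary freely
        for action in (True, False):   # intervene: force the flag on, then off
            timeouts[action] += program(action, size)
    return timeouts[True] / trials, timeouts[False] / trials


p_with_cache, p_without_cache = estimate_effect(black_box)
```

Comparing the two estimated frequencies relates the flag to the timeout without modeling anything in between, which is the appeal for “action-at-a-distance” questions.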
I am interested in these questions, along with more general questions at the intersection of causal inference and programming languages.
- Causal inference in probabilistic programming, with Zenna Tavares and Xin Zhang (ongoing)
See also the skunkworks page.
Binary Modification and Reverse-Engineering
Since 2010, I have led development of Project Ironfist, a mod for the classic PC game Heroes of Might and Magic II. To our knowledge, Ironfist is the first and only game mod that works by undoing the linking process and combining fragments of the original program with new code. This is a powerful technique that provides most of the flexibility of doing a complete rewrite of the program, but with much less up-front effort. Many of the technical details are described in the Ironfist Wiki.
Using this expertise, I developed a binary modification engine which plays a core role in one company’s products; the details are still confidential, and covered by multiple pending patents.
To support Ironfist development, I have built four tools for the IDA reverse-engineering tool and its accompanying decompiler, Hex-Rays. They are:
- Nopper, a plugin for disabling chunks of code
- REProgram, a plugin for load-time patching of binaries, capable of replacing a small region of a binary with an arbitrarily large amount of code.
- Referee, which improves Hex-Rays’ support for structure references. A Python port also exists, built by Joe Leong.
- to_masm, which converts a binary into a form where it may be reassembled, but with the ability to toggle whether each part should be replaced with new code.
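For readers unfamiliar with binary patching, the simplest form of “disabling a chunk of code” is overwriting it with no-op instructions. The Python sketch below does this on a raw byte string for x86, where `0x90` is the single-byte NOP opcode. It is a generic illustration only: the function `nop_out` is hypothetical, it does not use the IDA API, and it is not taken from Nopper or any of the tools listed above.

```python
NOP = 0x90  # single-byte x86 NOP opcode


def nop_out(image: bytes, start: int, length: int) -> bytes:
    """Return a copy of `image` with `length` bytes at `start` replaced by NOPs.

    The patched region keeps its size, so surrounding offsets stay valid.
    """
    if start < 0 or length < 0 or start + length > len(image):
        raise ValueError("patch range falls outside the image")
    patched = bytearray(image)
    patched[start:start + length] = bytes([NOP]) * length
    return bytes(patched)


# push ebp; call rel32; pop ebp; ret
code = bytes.fromhex("55 e8 10 00 00 00 5d c3")
# Disable the 5-byte call instruction that starts at offset 1.
patched = nop_out(code, 1, 5)
```

Because the replacement is the same size as the original, this approach cannot add new behavior; inserting arbitrarily large code requires a different strategy, such as redirecting execution to a new region.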
Finally, I have taught a short reverse-engineering course during MIT’s Independent Activities Period every year since 2016.