Incremental Parametric Syntax for Multi-Language Transformation

Abstract

We present a new system for building source-to-source transformations that can run on multiple programming languages, based on a new way of representing programs called incremental parametric syntax. We construct incremental parametric syntaxes for C, Java, JavaScript, Lua, and Python, and demonstrate two multi-language program transformations that can run on all of them. Our evaluation shows that (1) once a transformation is written, relatively little work is required to configure it for a new language; (2) according to participants in our human study, transformations built this way output readable code that preserves the structure of the original; and (3) despite dealing with many languages, our transformations can still handle language corner cases and pass 90% of compiler test suites.

Incremental parametric syntax is based on the datatypes à la carte approach for constructing modular syntax, but extends it with the notion of a sort injection, which allows intermixing language-specific and generic components in a type-safe and modular fashion. Rather than translating every language into one common representation, this yields a family of representations, each specific to one language but sharing common components. Our system can construct the incremental parametric syntax for a language semi-automatically from a third-party library for that language, with the user writing code only for the portions they wish to translate into generic components. The resulting incremental parametric syntax is isomorphic to the original representation, allowing transformations to be fully information-preserving. The user can begin by translating only a small fragment of a language into generic components, enough to support a few transformations, and incrementally add more. Our experience shows that constructing an incremental parametric syntax for a new language is easy, typically taking less than a day of work, and that multi-language transformations built against incremental parametric syntaxes can be configured for many languages with only a small amount of work per language.
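
The following minimal Python sketch illustrates what a sort injection buys. It is not the thesis's system: every constructor and sort name (Ident, Assign, AssignIsCExp, CExpIsRhs, and so on) is hypothetical, and sorts are checked at runtime when a node is built, whereas the datatypes à la carte approach enforces them in a static type system. A small C-like and a small Python-like representation share the generic Ident and Assign components; each adds its own constructors plus sort injections stating where a generic Assign may appear (as an expression in C, as a statement in Python) and which language-specific expressions may serve as its right-hand side.

```python
from dataclasses import dataclass
from functools import partial

# Hypothetical, illustrative encoding of an incremental parametric syntax.
# A signature maps each constructor name to (argument sorts, result sort);
# "str" and "int" denote leaf values. Sorts are checked at runtime in mk(),
# whereas a datatypes-a-la-carte encoding would check them statically.
Signature = dict[str, tuple[tuple[str, ...], str]]

# Generic components shared by every language.
GENERIC: Signature = {
    "Ident": (("str",), "IdentL"),
    "Assign": (("IdentL", "RhsL"), "AssignL"),
}

def sort_injection(name: str, src: str, dst: str) -> Signature:
    """A node that lets a term of sort `src` appear where sort `dst` is expected."""
    return {name: ((src,), dst)}

# C-like language: a generic Assign is an expression; any C expression is an RHS.
C_LANG: Signature = {
    **GENERIC,
    "CConst": (("int",), "CExpL"),
    "CVar": (("IdentL",), "CExpL"),
    "CAdd": (("CExpL", "CExpL"), "CExpL"),
    **sort_injection("AssignIsCExp", "AssignL", "CExpL"),
    **sort_injection("CExpIsRhs", "CExpL", "RhsL"),
}

# Python-like language: a generic Assign is a statement, not an expression.
PY_LANG: Signature = {
    **GENERIC,
    "PyInt": (("int",), "PyExpL"),
    "PyName": (("IdentL",), "PyExpL"),
    "PyBinOp": (("str", "PyExpL", "PyExpL"), "PyExpL"),
    **sort_injection("AssignIsPyStmt", "AssignL", "PyStmtL"),
    **sort_injection("PyExpIsRhs", "PyExpL", "RhsL"),
}

@dataclass(frozen=True)
class Term:
    con: str     # constructor name
    sort: str    # result sort of this node
    args: tuple  # child terms or leaf values

def mk(lang: Signature, con: str, *args) -> Term:
    """Build a node of `lang`, rejecting unknown constructors and ill-sorted children."""
    if con not in lang:
        raise TypeError(f"{con} is not a constructor of this language")
    arg_sorts, result_sort = lang[con]
    if len(args) != len(arg_sorts):
        raise TypeError(f"{con} expects {len(arg_sorts)} arguments")
    for arg, expected in zip(args, arg_sorts):
        actual = arg.sort if isinstance(arg, Term) else type(arg).__name__
        if actual != expected:
            raise TypeError(f"{con}: expected sort {expected}, got {actual}")
    return Term(con, result_sort, tuple(args))

def rename(t, old: str, new: str):
    """Generic transformation: rewrites only generic Ident nodes and rebuilds
    every other node, generic or language-specific, unchanged."""
    if not isinstance(t, Term):
        return t  # leaf value
    if t.con == "Ident" and t.args == (old,):
        return Term("Ident", "IdentL", (new,))
    return Term(t.con, t.sort, tuple(rename(a, old, new) for a in t.args))

def assigned_names(t) -> list[str]:
    """Generic analysis: targets of generic Assign nodes, in pre-order."""
    if not isinstance(t, Term):
        return []
    found = [t.args[0].args[0]] if t.con == "Assign" else []
    for a in t.args:
        found += assigned_names(a)
    return found

def c_assign_add(target: str, var: str, n: int) -> Term:
    """C term for `target = var + n` (an expression)."""
    c = partial(mk, C_LANG)
    return c("AssignIsCExp", c("Assign", c("Ident", target),
             c("CExpIsRhs", c("CAdd", c("CVar", c("Ident", var)), c("CConst", n)))))

def py_assign_add(target: str, var: str, n: int) -> Term:
    """Python term for `target = var + n` (a statement)."""
    p = partial(mk, PY_LANG)
    return p("AssignIsPyStmt", p("Assign", p("Ident", target),
             p("PyExpIsRhs", p("PyBinOp", "+", p("PyName", p("Ident", var)), p("PyInt", n)))))
```

Because `rename` and `assigned_names` mention only generic constructors, the same code runs unchanged on both representations, while language-specific nodes such as `CAdd` and `PyBinOp` pass through intact. Using a bare C expression as the right-hand side of an `Assign` without the `CExpIsRhs` injection, or using the C-only `AssignIsCExp` in the Python-like representation, raises `TypeError`.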

Publication
Master’s Thesis