Andy Melnikov (nponeccop) wrote,
Andy Melnikov

HNC hacking guide - part 2

# The Optimizer

The optimizer consists of:

* Optimizer Facade
* Transformation Lifter
* Inbound Edges Counter
* Inliner

## The Facade

The facade exports single `optimize :: Definition -> Definition` function
transforming untyped terms.

Note that the optimizer doesn't support full terms yet - only terms consisting of a single top-level definition.

## The Lifter

The lifter (see `HN.Optimizer.WithGraph`) lifts graph transformations to term transformations.
It transforms a term to a graph,
applies the passed transformation and transforms the result back to term form.

So the major components of the Lifter are Graph Compiler (see `HN.Optimizer.GraphCompiler`)
and Graph Decompiler (see `HN.Optimizer.GraphDecompiler`).

Graph compilation maps hierarchical identifier space to flat space of graph node labels.
Graph decompilation performs the reverse mapping.
However, the compiler only stores a simple bidirectional flat map (see `HN.Optimizer.LabelFor`).
The scopes are reconstructed by The Decompiler using HOOPL built-in Dominator analysis.
So, the decompiler automatically performs a Let Floating transformation - all lets are moved
as deeply as possible. Also, as the Decompiler (and HOOPL in general) only visits the nodes
reachable from entry nodes, we get some dead code elimination for free.

As our IR is not a sequence of simple assembly-level instructions, it is not clear how to
transform our terms into graphs.

The Hoopl paper suggests: "In a higher-level intermediate form, a node might represent a simple

As we don't support statement sequences yet, our HOOPL blocks are degenerate: they consist of
a single statement. Logically, it means a single closed-closed HOOPL node. However,
HOOPL seems not to support such degenerate blocks. It might be possible to construct it by hand
using low-level operations, but no high-level helper functions exist for the C C case.

So our blocks consist of 2 nodes (see `HN.Optimizer.Node`): a dummy `Entry :: Label -> Node C O`
and a real `Exit :: DefinitionNode -> Node O C`.

data DefinitionNode
= LetNode [Label] (Expression Label)
| ArgNode
| LibNode


As you can see, there are 3 subtypes of nodes, and by extension blocks.

LetNode represents "in" expressions of definitions. An expressions can refer to formal arguments,
to other definitions (either local or outer) and to library definitions written in C++.

As described above, scope structure is made flat, so it doesn't matter whether a definition is
local or outer: references to both of them are represented by Label, and bodies by LetNode.

The [Label] part represent the formal arguments of a definition. So all its labels must
refer to 'ArgNode' blocks (blocks with real node containing an `ArgNode`). Note that we cannot
infer this list from the `Expression` member: arguments can be used by expression in any order,
and some arguments can be even unused.

We cannot trivially strip unused formal arguments as we strip unused definitions: we must also remove
corresponding actual arguments from all call sites, and we must check that the function is never
passed as an argument to another function. That requires writing a separate optimization pass.

# The Counter

To inline a definition, we first count references to the definition. As we chose the references
to be edges in the HOOPL control graph, we must find out how many edges enter each block.

This information can be obtained polymorphically for any HOOPL graph, as its purely graph-theoretic
and doesn't depend on internal structure of nodes. So the corresponding Hoopl pass

(see `HN.Optimizer.Inbound.passF`) can be made polymorphic in node type similarly to

domPass :: (NonLocal n, Monad m) => FwdPass m n Doms
This pass can then be included in HOOPL collection of useful universal passes (see `Compiler.Hoopl.Passes.*`)

The pass doesn't perform any rewrites.

# The Inliner

The Inliner uses the facts obtained from the Counter to inline definitions by performing
rewrites. It also updates the facts as rewrites are performed. It is an open
question whether a single deep rewriting by inliner is enough, or we should perform
Counter and Inliner passes until a fixed point is reached.

The problem is that inlining is a forward rewriting pass, but counting is a backward analysis.

The HOOPL paper states:

"In general, dataflow analysis establishes properties of paths:

* An assertion about all paths *to* a program point is established by a *forward* analysis.
* An assertion about all paths *from* a program point is established by a *backward* analysis."

Another view is that:

* During a forward pass, information propagates from the entry point(s) downward the control graph
* During a backward pass, information propagates upward the control graph to the entry point(s)

At two sides of an edge we have a definition and one of its call sites. So:

* Forward passes collect information at call sites, propagates it to definitions and
aggregate it there (using the lattice join)
* Backward passes collect information at definitions, propagate it to call sites and
aggregate it there

The Counter pass collects call count at call sites and aggregates partial call counts from
each call site to get a total count, so it's a Forward pass.

The Inliner pass collects definition bodies at definitions and propagates them to call sites,
so it's a Backward pass. The aggregation step is trivial, as all bodies are the same.

In fact, two pieces of data must be passed to call sites: definition bodies and a flag
whether they should be inlined.

I use a trick to combine them:

type ListFact = WithTopAndBot DefinitionNode

listLattice = addPoints' "ListFact" undefined

These lines create a lattice containing:
* all values of DefinitionNode type
* special Top value as the unique top element
* special Bottom value as the unique bottom element

`WithTopAndBot` defines the type of lattice values, and `addPoints'` implements the join
operation for the lattice.

Here is some laws from lattice theory:

* join x y = join y x
* join x Bot = x
* join x Top = Top

So most joins can be implemented automatically by `addPoints'`. We only must
provide an implementation of join if both x and y are "middle" elements, that is,
if they are both DefinitionNode. The middle elements are called PElem in HOOPL parlance.

In our case such join can never happen, so we pass undefined to addPoints'.

As an initial fact base we take facts collected by Counter and modify them:

intitalFactBase = mapMap modify counterResult where
modify 1 = Bot
modify _ = Top

So, if a definition identified by label can be inlined, we have Bot in the initial fact base,
and if a defintion cannot be inlined, we have Top. So our initial fact base only contains
Top or Bottom for each definition.

Then our transfer function unconditionally returns definition body, that is, a middle
lattice element.

This fact gets immediately (even before it reaches call site)
aggregated with the default fact, so:

if a definition can be inlined, we have join Bot definitionBody = definitionBody
if a definition cannot be inlined, we have join Top definitionBody = Top

So at this step the Bot is eliminated: only definitionBody or Top can be propagated to call
sites and joined there. A fascinating fact is that at the call sites only one type of joins
can happen - join Top Top, implemented by `addPoints'`. `definitionBody` is never joined
with anything because we found that there is only one edge.

So after aggregation step the following arrives to call sites:

* Top if a definition cannot be inlined
* definitionBody if a definition can be inlined

So the rewrite function has all information to perform a rewrite: both the condition and the
definition body to inline.
Tags: fp, hn0, programming

  • Post a new comment


    default userpic

    Your reply will be screened

    Your IP address will be recorded 

    When you submit the form an invisible reCAPTCHA check will be performed.
    You must follow the Privacy Policy and Google Terms of use.