Expand description

In-memory representation of compiled machine code, with labels and fixups to refer to those labels. Handles constant-pool island insertion and also veneer insertion for out-of-range jumps.

This code exists to solve three problems:

  • Branch targets for forward branches are not known until later, when we emit code in a single pass through the instruction structs.

  • On many architectures, address references or offsets have limited range. For example, on AArch64, conditional branches can only target code +/- 1MB from the branch itself.

  • The lowering of control flow from the CFG-with-edges produced by BlockLoweringOrder, combined with many empty edge blocks when the register allocator does not need to insert any spills/reloads/moves in edge blocks, results in many suboptimal branch patterns. The lowering also pays no attention to block order, and so two-target conditional forms (cond-br followed by uncond-br) can often by avoided because one of the targets is the fallthrough. There are several cases here where we can simplify to use fewer branches.

This “buffer” implements a single-pass code emission strategy (with a later “fixup” pass, but only through recorded fixups, not all instructions). The basic idea is:

  • Emit branches as they are, including two-target (cond/uncond) compound forms, but with zero offsets and optimistically assuming the target will be in range. Record the “fixup” for later. Targets are denoted instead by symbolic “labels” that are then bound to certain offsets in the buffer as we emit code. (Nominally, there is a label at the start of every basic block.)

  • As we do this, track the offset in the buffer at which the first label reference “goes out of range”. We call this the “deadline”. If we reach the deadline and we still have not bound the label to which an unresolved branch refers, we have a problem!

  • To solve this problem, we emit “islands” full of “veneers”. An island is simply a chunk of code inserted in the middle of the code actually produced by the emitter (e.g., vcode iterating over instruction structs). The emitter has some awareness of this: it either asks for an island between blocks, so it is not accidentally executed, or else it emits a branch around the island when all other options fail (see Inst::EmitIsland meta-instruction).

  • A “veneer” is an instruction (or sequence of instructions) in an “island” that implements a longer-range reference to a label. The idea is that, for example, a branch with a limited range can branch to a “veneer” instead, which is simply a branch in a form that can use a longer-range reference. On AArch64, for example, conditionals have a +/- 1 MB range, but a conditional can branch to an unconditional branch which has a +/- 128 MB range. Hence, a conditional branch’s label reference can be fixed up with a “veneer” to achieve a longer range.

  • To implement all of this, we require the backend to provide a LabelUse type that implements a trait. This is nominally an enum that records one of several kinds of references to an offset in code – basically, a relocation type – and will usually correspond to different instruction formats. The LabelUse implementation specifies the maximum range, how to patch in the actual label location when known, and how to generate a veneer to extend the range.

That satisfies label references, but we still may have suboptimal branch patterns. To clean up the branches, we do a simple “peephole”-style optimization on the fly. To do so, the emitter (e.g., Inst::emit()) informs the buffer of branches in the code and, in the case of conditionals, the code that would have been emitted to invert this branch’s condition. We track the “latest branches”: these are branches that are contiguous up to the current offset. (If any code is emitted after a branch, that branch or run of contiguous branches is no longer “latest”.) The latest branches are those that we can edit by simply truncating the buffer and doing something else instead.

To optimize branches, we implement several simple rules, and try to apply them to the “latest branches” when possible:

  • A branch with a label target, when that label is bound to the ending offset of the branch (the fallthrough location), can be removed altogether, because the branch would have no effect).

  • An unconditional branch that starts at a label location, and branches to another label, results in a “label alias”: all references to the label bound to this branch instruction are instead resolved to the target of the branch instruction. This effectively removes empty blocks that just unconditionally branch to the next block. We call this “branch threading”.

  • A conditional followed by an unconditional, when the conditional branches to the unconditional’s fallthrough, results in (i) the truncation of the unconditional, (ii) the inversion of the condition’s condition, and (iii) replacement of the conditional’s target (using the original target of the unconditional). This is a fancy way of saying “we can flip a two-target conditional branch’s taken/not-taken targets if it works better with our fallthrough”. To make this work, the emitter actually gives the buffer both forms of every conditional branch: the true form is emitted into the buffer, and the “inverted” machine-code bytes are provided as part of the branch-fixup metadata.

  • An unconditional B preceded by another unconditional P, when B’s label(s) have been redirected to target(B), can be removed entirely. This is an extension of the branch-threading optimization, and is valid because if we know there will be no fallthrough into this branch instruction (the prior instruction is an unconditional jump), and if we know we have successfully redirected all labels, then this branch instruction is unreachable. Note that this works because the redirection happens before the label is ever resolved (fixups happen at island emission time, at which point latest-branches are cleared, or at the end of emission), so we are sure to catch and redirect all possible paths to this instruction.

Branch-optimization Correctness

The branch-optimization mechanism depends on a few data structures with invariants, which are always held outside the scope of top-level public methods:

  • The latest-branches list. Each entry describes a span of the buffer (start/end offsets), the label target, the corresponding fixup-list entry index, and the bytes (must be the same length) for the inverted form, if conditional. The list of labels that are bound to the start-offset of this branch is complete (if any label has a resolved offset equal to start and is not an alias, it must appear in this list) and precise (no label in this list can be bound to another offset). No label in this list should be an alias. No two branch ranges can overlap, and branches are in ascending-offset order.

  • The labels-at-tail list. This contains all MachLabels that have been bound to (whose resolved offsets are equal to) the tail offset of the buffer. No label in this list should be an alias.

  • The label_offsets array, containing the bound offset of a label or UNKNOWN. No label can be bound at an offset greater than the current buffer tail.

  • The label_aliases array, containing another label to which a label is bound or UNKNOWN. A label’s resolved offset is the resolved offset of the label it is aliased to, if this is set.

We argue below, at each method, how the invariants in these data structures are maintained (grep for “Post-invariant”).

Given these invariants, we argue why each optimization preserves execution semantics below (grep for “Preserves execution semantics”).

Structs

A buffer of output to be produced, fixed up, and then emitted to a CodeSink in bulk.

A MachBuffer once emission is completed: holds generated code and records, without fixups. This allows the type to be independent of the backend.

A label refers to some offset in a MachBuffer. It may not be resolved at the point at which it is used by emitted code; the buffer records “fixups” for references to the label, and will come back and patch the code appropriately when the label’s location is eventually known.

A source-location mapping resulting from a compilation.

Record of stack map metadata: stack offsets containing references.

Enums

A stack map extent, when creating a stack map.