slothy.core.config
SLOTHY configuration
Module Contents
Classes
Config: Configuration for Slothy.
API
- exception slothy.core.config.InvalidConfig
Bases: Exception
Exception raised when an invalid SLOTHY configuration is detected.
- class slothy.core.config.Config(Arch, Target)
Bases: slothy.helper.NestedPrint, slothy.helper.LockAttributes
Configuration for Slothy.
This configuration object is used both for one-shot optimizations using SlothyBase and for stateful multi-pass optimizations using Slothy.
- property arch
The module defining the underlying architecture used by Slothy.
TODO: Add details on what exactly is assumed about this module.
- property target
The module defining the target microarchitecture used by Slothy.
TODO: Add details on what exactly is assumed about this module.
- property outputs
List of architectural or symbolic registers that should be considered outputs of the input snippet.
- property reserved_regs
Set of architectural registers _not_ available for register renaming. May be unset (None) to pick the default reserved registers for the target architecture.
In the lingo of inline assembly, this can be seen as the complement of the clobber list.
Note
Reserved registers are, by default, considered “locked”: They will not be _introduced_ during renaming, but existing uses will not be touched. If you want to remove existing uses of reserved registers through renaming, you should disable reserved_regs_are_locked.
Warning
When this is set, it _overwrites_ the default reserved registers for the target architecture. If you still want the default reserved registers to remain reserved, you have to explicitly list them!
- property reserved_regs_are_locked
Indicates whether reserved registers should be locked by default.
Reserved registers are not introduced during renaming. However, where they are already used by the input assembly, their use will not be eliminated or altered – that is, reserved registers are ‘locked’ by default.
Disable this configuration option to allow (in fact, force) renaming of existing uses of reserved registers. This can be useful when trying to eliminate uses of particular registers from some piece of assembly.
- property selftest
Indicates whether SLOTHY performs an empirical equivalence-test on the optimization results.
When this is set, and if the target architecture and host platform support it, this will run an empirical equivalence checker trying to confirm that the input and output of SLOTHY are likely functionally equivalent.
The primary purpose of this checker is to detect issues that would currently be overlooked by the selfcheck:
The selfcheck is currently blind to address offset fixup. If something goes wrong there, the input and output will not be functionally equivalent, but we would only notice once we actually compile and run the code. The selftest will likely catch such issues.
When using software pipelining, the selfcheck reduces to a straightline check for a bounded unrolling of the loop. An unbounded selfcheck is currently not implemented. With the selftest, you still need to fix a loop bound, but at least you can equivalence-check the loop form (including the compare+branch instructions at the loop boundary) rather than the unrolled code.
Important
To run this, you need llvm-nm, llvm-readobj, llvm-mc in your PATH. Those are part of a standard LLVM setup.
Note
This is so far implemented as a repeated randomized test – nothing clever.
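The randomized test can be pictured with the following minimal sketch. It is not SLOTHY's code: the input and output snippets are modelled as hypothetical Python functions on a register dictionary (`original`, `reordered`, `clobbering` and `randomized_equiv` are illustrative names), whereas the real selftest works on the assembled code. The `iterations` parameter plays the role of selftest_iterations.

```python
import random
from typing import Callable, Dict, Iterable

Regs = Dict[str, int]
MASK = (1 << 64) - 1

def randomized_equiv(f: Callable[[Regs], Regs], g: Callable[[Regs], Regs],
                     inputs: Iterable[str], outputs: Iterable[str],
                     iterations: int = 10, seed: int = 0) -> bool:
    """Run f and g on the same random inputs; report any mismatch on the outputs."""
    rng = random.Random(seed)
    inputs, outputs = list(inputs), list(outputs)
    for _ in range(iterations):  # plays the role of selftest_iterations
        state = {r: rng.getrandbits(64) for r in inputs}
        a, b = f(dict(state)), g(dict(state))
        if any(a[r] != b[r] for r in outputs):
            return False  # definitely not equivalent
    return True  # only *likely* equivalent: random testing proves nothing

def original(r: Regs) -> Regs:
    r["x2"] = (r["x0"] + r["x1"]) & MASK
    r["x3"] = r["x0"] ^ r["x1"]
    return r

def reordered(r: Regs) -> Regs:  # same instructions, opposite order
    r["x3"] = r["x0"] ^ r["x1"]
    r["x2"] = (r["x0"] + r["x1"]) & MASK
    return r

def clobbering(r: Regs) -> Regs:  # overwrites input x0 before its last use
    r["x2"] = (r["x0"] + r["x1"]) & MASK
    r["x0"] = 0
    r["x3"] = r["x0"] ^ r["x1"]
    return r

print(randomized_equiv(original, reordered, ["x0", "x1"], ["x2", "x3"]))   # True
print(randomized_equiv(original, clobbering, ["x0", "x1"], ["x2", "x3"]))  # False
```

A single mismatch proves non-equivalence, while agreement on all runs only makes equivalence likely.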
- property selftest_iterations
If selftest is set, the number of random selftests to conduct.
- property selftest_address_registers
Dictionary of (reg, sz) items indicating which registers are assumed to be pointers to memory, and if so, of what size.
- property selftest_default_memory_size
Default buffer size to use for registers which are automatically inferred to be used as pointers and for which no memory size has been configured via selftest_address_registers.
- property selfcheck
Indicates whether SLOTHY performs a self-check on the optimization result.
The selfcheck confirms that the scheduling permutation found by SLOTHY yields an isomorphism between the data flow graphs of the original and optimized code.
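The isomorphism property can be illustrated with a small sketch, assuming a toy instruction model of `(destination, sources)` pairs. The helper names are hypothetical, and opcodes as well as input/output bindings are deliberately ignored, so this is much weaker than the real selfcheck. An edge links the last writer of a register to each instruction reading it, and the check asks whether the scheduling permutation maps the edges of the original code exactly onto those of the optimized code.

```python
from typing import List, Sequence, Set, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)
Edge = Tuple[int, int, int]          # (producer, consumer, source operand index)

def dfg_edges(prog: Sequence[Instr]) -> Set[Edge]:
    """Data flow edges: each source operand is linked to its last prior writer."""
    last_writer = {}
    edges = set()
    for i, (dst, srcs) in enumerate(prog):
        for k, src in enumerate(srcs):
            if src in last_writer:  # otherwise src is an input of the snippet
                edges.add((last_writer[src], i, k))
        last_writer[dst] = i
    return edges

def is_dfg_isomorphism(orig: Sequence[Instr], new: Sequence[Instr],
                       perm: List[int]) -> bool:
    """perm[i] is the position of original instruction i in the new code."""
    mapped = {(perm[a], perm[b], k) for (a, b, k) in dfg_edges(orig)}
    return mapped == dfg_edges(new)

orig = [("a", ("x",)), ("b", ("y",)), ("c", ("a", "b"))]
good = [("r1", ("y",)), ("r2", ("x",)), ("r3", ("r2", "r1"))]  # swapped + renamed
bad = [("r1", ("y",)), ("r2", ("x",)), ("r3", ("r1", "r2"))]   # operands mixed up
print(is_dfg_isomorphism(orig, good, [1, 0, 2]))  # True
print(is_dfg_isomorphism(orig, bad, [1, 0, 2]))   # False
```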
Warning
Do not unset this option unless you know what you are doing. It is vital in catching bugs in the model generation early.
Warning
The selfcheck is not a formal verification of SLOTHY’s output! There are at least two classes of bugs uncaught by the selfcheck:
User configuration issues: The selfcheck validates SLOTHY’s optimization in the context of the provided configuration. Validation of the configuration is the user’s responsibility. Two common pitfalls include missing reserved registers (allowing SLOTHY to clobber more registers than intended), or missing output registers (allowing SLOTHY to overwrite an output register in subsequent instructions).
This is the most common source of issues for code passing the selfcheck but remaining functionally incorrect.
Bugs in address offset fixup: SLOTHY’s modelling of post-load/store address increments is deliberately inaccurate to allow for reordering of such instructions leveraging commutativity relations such as
LDR X,[A],#imm; STR Y,[A] === STR Y,[A, #imm]; LDR X,[A],#imm
Hint
See also section “Address offset rewrites” in the SLOTHY paper
Bugs in SLOTHY’s address fixup logic would not be caught by the selfcheck. If your code doesn’t work and you are sure you have configured SLOTHY correctly, you may therefore want to double-check that address offsets have been adjusted correctly by SLOTHY.
- property selfcheck_failure_logfile
The filename for the log of a failing selfcheck.
This is printed in the terminal as well, but is difficult to analyze there due to its sheer size.
- property unsafe_address_offset_fixup
Whether address offset fixup is enabled.
Address offset fixup is a feature which leverages commutativity relations such as
ldr X, [A], #immA; str Y, [A, #immB] == str Y, [A, #(immB+immA)]; ldr X, [A], #immA
to achieve greater instruction scheduling flexibility in SLOTHY.
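The commutativity relation above can be checked by simulating both orders on a toy register and memory model. The sketch below is illustrative only: `ldr_post`, `str_off` and `run` are hypothetical helpers, and the relation holds here because the load (from A) and the store (to A+24) do not alias.

```python
from typing import Dict, Tuple

Regs = Dict[str, int]
Mem = Dict[int, int]

def ldr_post(mem: Mem, regs: Regs, dst: str, base: str, imm: int) -> None:
    """ldr dst, [base], #imm : load from [base], then base += imm."""
    regs[dst] = mem.get(regs[base], 0)
    regs[base] += imm

def str_off(mem: Mem, regs: Regs, src: str, base: str, off: int) -> None:
    """str src, [base, #off] : store to base + off; base is unchanged."""
    mem[regs[base] + off] = regs[src]

def run(program, regs: Regs, mem: Mem) -> Tuple[Regs, Mem]:
    regs, mem = dict(regs), dict(mem)  # work on copies
    for op, *args in program:
        op(mem, regs, *args)
    return regs, mem

imm_a, imm_b = 16, 8
before = [(ldr_post, "X", "A", imm_a), (str_off, "Y", "A", imm_b)]
after = [(str_off, "Y", "A", imm_b + imm_a), (ldr_post, "X", "A", imm_a)]
regs0, mem0 = {"A": 0x1000, "X": 0, "Y": 42}, {0x1000: 7}
print(run(before, regs0, mem0) == run(after, regs0, mem0))  # True
```

Any other instruction reading A between the two accesses would observe A either before or after the increment depending on the order, which is why address registers must only be used by loads and stores when this feature is enabled.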
Important
When you enable this feature, you MUST ensure that registers which are used for addresses are not used in any instructions other than loads and stores. OTHERWISE, THE USE OF THIS FEATURE IS UNSOUND (you may see ldr/str instructions with increment reordered with instructions depending on the address register).
By default, this is enabled for backwards compatibility.
Note
For historical reasons, this feature cannot be disabled for the Armv8.1-M architecture model. A refactoring of that model is needed to make address offset fixup configurable.
Note
The user-imposed safety constraint is not a necessity – in principle, SLOTHY could detect when it is safe to reorder ldr/str instructions with increment. It just hasn’t been implemented yet.
- property allow_useless_instructions
Indicates whether SLOTHY should tolerate unused instructions rather than abort upon encountering them.
SLOTHY requires explicit knowledge of the intended output registers of its input assembly. Unless this option is set, if an instruction is encountered which writes to a register which (a) is not an output register and (b) is not used by any later instruction, then SLOTHY will flag this instruction and abort.
The reason for this behaviour is that such unused instructions are usually a sign of a buggy configuration, which would likely lead to intended output registers being clobbered by later instructions.
Warning
Don’t enable this option unless you know what you are doing! Enabling this option makes it much easier to overlook configuration issues in SLOTHY and can lead to hard-to-debug optimization failures.
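The check for unused instructions amounts to a backwards liveness analysis. The following sketch uses a toy `(destination, sources)` instruction model and a hypothetical `find_useless` helper; it is not SLOTHY's implementation, but it shows why a forgotten output register surfaces as an "unused" instruction.

```python
from typing import Iterable, List, Sequence, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)

def find_useless(prog: Sequence[Instr], outputs: Iterable[str]) -> List[int]:
    """Indices of instructions whose result is neither an output nor read later."""
    live = set(outputs)  # registers whose current value is still needed
    useless = []
    for i in range(len(prog) - 1, -1, -1):  # backwards liveness walk
        dst, srcs = prog[i]
        if dst not in live:
            useless.append(i)
        live.discard(dst)  # the earlier value of dst is overwritten here...
        live.update(srcs)  # ...unless this instruction itself reads it
    return sorted(useless)

prog = [("t", ("a", "b")), ("u", ("a",)), ("out", ("t",))]
print(find_useless(prog, ["out"]))  # [1]
print(find_useless(prog, []))       # [1, 2]: forgetting the output gets flagged
```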
- property variable_size
Model number of stalls as a parameter in the constraint model.
If this is set, one-shot SLOTHY optimization will make the number of stalls flexible in the model and, by default, task the underlying constraint solver to minimize it.
If this is not set, one-shot SLOTHY optimizations will search for solutions with a fixed number of stalls, and an external binary search is used to find the minimum number of stalls.
For small-to-medium-sized assembly input, this option should be set, and will lead to faster optimization. For large assembly input, the user should experiment and consider unsetting it to reduce model complexity.
- property keep_tags
Indicates whether tags in the input source should be kept or removed.
Tags include pre/core/post or ordering annotations that usually become meaningless post-optimization. However, for preprocessing runs that do not reorder code, it makes sense to keep them.
- property inherit_macro_comments
Indicates whether comments at macro invocations should be inherited to instructions in the macro body.
- property ignore_tags
Indicates whether tags in the input source should be ignored.
- property register_aliases
Dictionary mapping symbolic register names to architectural register names. When using Slothy, this can be indirectly populated by placing .req expressions in the input assembly. When using SlothyBase directly, this needs to be filled in by hand.
This is always joined with a list of default aliases (such as lr mapping to r14) specified in the target architecture.
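How `.req` directives might be turned into aliases can be sketched as follows. The regular expression covers only the basic GNU assembler form `alias .req register`, `collect_aliases` is a hypothetical helper, and letting user aliases override the defaults is an assumption.

```python
import re
from typing import Dict, Iterable

# Basic GNU as syntax: "<alias> .req <register>"
REQ = re.compile(r"^\s*(\w+)\s+\.req\s+(\w+)\s*$")

def collect_aliases(lines: Iterable[str], defaults: Dict[str, str]) -> Dict[str, str]:
    aliases = dict(defaults)  # e.g. the architecture's default aliases
    for line in lines:
        m = REQ.match(line)
        if m:
            aliases[m.group(1)] = m.group(2)  # assumption: user aliases win
    return aliases

src = ["ctr .req r4", "  add ctr, ctr, #1"]
print(collect_aliases(src, {"lr": "r14"}))  # {'lr': 'r14', 'ctr': 'r4'}
```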
- add_aliases(new_aliases)
Add further register aliases to the configuration
- property rename_inputs
A dictionary mapping input register names (symbolic or architectural) to their renaming configuration.
There are three supported renaming configurations per input: “static”, “any”, or a fixed architectural register. The configuration “any” means that the input may be freely renamed, and that the renaming is chosen at model solving time. This is the most flexible, but also the most demanding option. The configuration “static” means that the renaming is chosen at model construction time, as follows: If the input already has an architectural name, it keeps it. Otherwise, if it is a symbolic input, it is statically assigned an architectural name at model construction time. Finally, if the input is explicitly assigned an architectural register name, this name is enforced. Note that this even applies to inputs which already have an architectural name; that is, you can use this option to change the architectural allocation of inputs.
The special keys “symbolic”, “arch” apply to all symbolic and architectural inputs, respectively. The key “other” applies to all inputs for which no other key matches.
The default value is `{ "symbolic": "any", "arch": "static" }`; that is, architectural inputs are not renamed, while symbolic inputs are dynamically renamed.
Examples:
Generally, unless you are prepared to modify surrounding code, you should have “arch” : “static”, which will not rename inputs which already have architectural register names.
`Config.rename_inputs = { "other" : "any" }`: This would rename _all_ inputs, regardless of whether they are symbolic or not. Thus, you would likely need to modify surrounding code.
`Config.rename_inputs = { "in" : "r0", "arch" : "static", "symbolic" : "any" }`: This would rename the symbolic input GPR 'in' to 'r0', keep all other inputs which already have an architectural name, and dynamically assign suitable registers for the remaining symbolic inputs.
In case of a successful optimization, the assignment of input registers to architectural registers is given by the dictionary Result.input_renaming.
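The lookup order for rename_inputs described above (explicit name first, then "symbolic"/"arch", then "other") can be sketched like this. `is_arch` is a hypothetical stand-in for the architecture model's notion of an architectural register (here just r0 to r15), and raising an error for unmatched inputs is an assumption.

```python
import re
from typing import Dict

def is_arch(reg: str) -> bool:
    # Hypothetical: treat r0..r15 as the architectural register names.
    return re.fullmatch(r"r(1[0-5]|[0-9])", reg) is not None

def rename_choice(reg: str, config: Dict[str, str]) -> str:
    """Return "static", "any" or a fixed architectural register for one input."""
    if reg in config:                      # explicit entry for this input
        return config[reg]
    key = "arch" if is_arch(reg) else "symbolic"
    if key in config:                      # category-wide entry
        return config[key]
    if "other" in config:                  # fallback for everything else
        return config["other"]
    raise ValueError(f"no renaming configuration for input {reg!r}")

cfg = {"in": "r0", "arch": "static", "symbolic": "any"}
print([rename_choice(r, cfg) for r in ["in", "r5", "tmp"]])  # ['r0', 'static', 'any']
```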
- property rename_outputs
A dictionary mapping output register names (symbolic or architectural) to their renaming configuration.
Analogous to Config.rename_inputs.
The default value is `{ "symbolic": "any", "arch": "static" }`; that is, architectural outputs are not renamed, while symbolic outputs are dynamically renamed.
In case of a successful optimization, the assignment of output registers to architectural registers is given by the dictionary Result.output_renaming.
- property inputs_are_outputs
If set, any input in the assembly to be optimized (that is, every register that is used as an input before it has been written to) is treated as an output. _Moreover_, such simultaneous input-outputs are forced to reside in the same architectural register at the beginning and end of the snippet.
This should usually be set when optimizing loops.
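Determining which registers are inputs, i.e. read before being written, can be sketched with the same toy `(destination, sources)` model; `snippet_inputs` is a hypothetical helper. For the loop body below, both `ptr` and `acc` are inputs, so with inputs_are_outputs set they must occupy the same architectural registers at the start and end of the body, as the next iteration reads them again.

```python
from typing import List, Sequence, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)

def snippet_inputs(prog: Sequence[Instr]) -> List[str]:
    """Registers read before they are written, in order of first use."""
    written, inputs = set(), []
    for dst, srcs in prog:
        for src in srcs:  # sources are read before dst is written
            if src not in written and src not in inputs:
                inputs.append(src)
        written.add(dst)
    return inputs

loop_body = [("t", ("ptr",)), ("acc", ("acc", "t")), ("ptr", ("ptr",))]
print(snippet_inputs(loop_body))  # ['ptr', 'acc']
```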
- property locked_registers
List of architectural registers that should not be renamed when they are used as output registers. Reserved registers are treated as locked if the option reserved_regs_are_locked is set.
- property sw_pipelining
Subconfiguration for software pipelining. Enabled/Disabled via the sub-field sw_pipelining.enabled. See Config.SoftwarePipelining for more information.
- property constraints
Subconfiguration for constraints to be considered by SLOTHY, e.g. whether latencies or functional units are modelled. See Config.Constraints for more information.
- property hints
Subconfiguration for hints to be considered by SLOTHY. See Config.Hints for more information.
- property max_solutions
The maximum number of solutions found by the underlying constraint solver before it stops the search.
- property with_preprocessor
Indicates whether the C preprocessor is run prior to optimization.
- property with_llvm_mca
Indicates whether LLVM MCA should be run prior and after optimization to obtain approximate performance data based on LLVM’s scheduling models.
If this is set, Config.compiler_binary needs to be set, and llvm-mca needs to be in your PATH.
- property llvm_mca_full
Indicates whether all available statistics from LLVM MCA should be printed.
- property llvm_mca_issue_width_overwrite
Overwrite LLVM MCA’s in-built issue width with the one SLOTHY uses
- property with_llvm_mca_before
Indicates whether LLVM MCA should be run prior to optimization to obtain approximate performance data based on LLVM’s scheduling models.
If this is set, Config.compiler_binary needs to be set, and llvm-mca needs to be in your PATH.
- property with_llvm_mca_after
Indicates whether LLVM MCA should be run after optimization to obtain approximate performance data based on LLVM’s scheduling models.
If this is set, Config.compiler_binary needs to be set, and llvm-mca needs to be in your PATH.
- property compiler_binary
The compiler binary to be used.
This is only relevant if with_preprocessor or with_llvm_mca_before or with_llvm_mca_after are set.
- property compiler_include_paths
Include paths to add to compiler invocations.
This is only relevant if with_preprocessor or with_llvm_mca_before or with_llvm_mca_after are set.
- property timeout
The timeout in seconds after which each invocation of the underlying constraint solver stops its search. A positive integer.
- property retry_timeout
The timeout in seconds after which the underlying constraint solver stops its search, in case of secondary optimization passes for other objectives than performance optimization (e.g., minimization of iteration overlapping).
- property do_address_fixup
Indicates whether post-optimization address fixup should be conducted.
SLOTHY’s modelling of post-load/store address increments is deliberately inaccurate to allow for reordering of such instructions leveraging commutativity relations such as:
` LDR X,[A],#imm; STR Y,[A] === STR Y,[A, #imm]; LDR X,[A],#imm `
When such reordering happens, a “post-optimization address fixup” of immediate load/store offsets is necessary. See also section “Address offset rewrites” in the SLOTHY paper.
Disabling this option will skip post-optimization address fixup and put the burden of post-optimization address fixup on the user. Disabling this option does NOT tighten the constraint model to forbid reorderings such as the above.
Warning
Don’t disable this option unless you know what you are doing! Disabling this will likely lead to optimized code that is functionally incorrect and needs manual address offset fixup.
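The fixup arithmetic can be sketched for a sequence of accesses through a single base register: a non-incrementing access keeps its effective address if its immediate is adjusted by the difference between the increments executed before it in the original and in the reordered code. This is a hypothetical helper, not SLOTHY's fixup code, and it assumes the post-increment accesses keep their relative order.

```python
from typing import Dict, List, Sequence, Tuple

Op = Tuple[str, int]  # ("post", imm): post-increment access; ("off", imm): [base, #imm]

def fixup_offsets(ops: Sequence[Op], perm: Sequence[int]) -> List[Op]:
    """Reorder ops (all using one base register) and patch 'off' immediates.

    perm[i] is the new position of ops[i]. Assumes the 'post' ops keep their
    relative order, since their addresses cannot be fixed via an immediate.
    """
    def increments_before(order: Sequence[int]) -> Dict[int, int]:
        total, before = 0, {}
        for idx in order:
            before[idx] = total  # base offset seen by this access
            kind, imm = ops[idx]
            if kind == "post":
                total += imm
        return before

    old_order = list(range(len(ops)))
    new_order = sorted(old_order, key=lambda i: perm[i])
    old, new = increments_before(old_order), increments_before(new_order)
    fixed = []
    for idx in new_order:
        kind, imm = ops[idx]
        if kind == "off":
            imm += old[idx] - new[idx]  # keep the effective address unchanged
        fixed.append((kind, imm))
    return fixed

# LDR X,[A],#16; STR Y,[A]  ->  STR Y,[A,#16]; LDR X,[A],#16
print(fixup_offsets([("post", 16), ("off", 0)], [1, 0]))  # [('off', 16), ('post', 16)]
```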
- property ignore_objective
Indicates whether the secondary objective (such as minimization of iteration overlapping) should be ignored.
- property objective_precision
The proximity to the estimated optimum solution at which the solver will stop its search.
For example, a value of 0.05 means that the solver will stop when the current solution is within 5% of the current estimate for the optimal solution.
- property objective_lower_bound
A lower bound for the objective at which to stop the search.
- property has_objective
Indicates whether a different objective than minimization of stalls has been registered.
- property absorb_spills
- property split_heuristic
Trade-off between runtime and optimality: Split each code block to be optimized into a fixed number of subchunks and optimize them one by one, rather than attempting a single large optimization.
If enabled, the numeric option split_heuristic_factor determines the number of subchunks each block of code is split into.
- property split_heuristic_factor
If split_heuristic is enabled, the number of subchunks to split each code block into prior to passing it to the core of Slothy.
The value of this option is irrelevant if split_heuristic is False.
- property split_heuristic_abort_cycle_at_high
During the split heuristic, a threshold for the number of stalls in the current optimization window above which the current pass of the split heuristic should stop.
- property split_heuristic_abort_cycle_at_low
During the split heuristic, a threshold for the number of stalls in the current optimization window below which the current pass of the split heuristic should stop.
- property split_heuristic_stepsize
If the split heuristic is used, the increment for the sliding window. By default, this is half the window size, i.e. 1/(2 × split factor). For example, a split factor of 5 means that the window size is 0.2 of the overall code size, and the default step size of 0.1 means that the sliding windows will be [0,0.2], [0.1,0.3], …
- property split_heuristic_optimize_seam
If the split heuristic is used, the number of instructions above and beyond the current sliding window that should be fixed but taken into account during optimization.
- property split_heuristic_chunks
If split heuristic is used, explicitly lists the optimization windows to be used. If unset, a sliding or adaptive optimization window will be used.
- property split_heuristic_bottom_to_top
If the split heuristic is used, move the sliding window from bottom to top rather than from top to bottom.
- property split_heuristic_region
Restrict the split heuristic to a sub-region of the code.
For example, if this is set to [0.25,0.75], only the middle half of the input will be optimized through the split heuristic.
This option can be combined with other options such as the split factor. For example, if the split region is set to [0.25, 0.75] and the split factor is 5, then optimization windows of size 0.1 will be considered within [0.25, 0.75].
Note that even if this option is used, the specification of inputs and outputs is still with respect to the entire code; SLOTHY will automatically derive the outputs of the subregion configured here.
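The sliding windows described for split_heuristic_stepsize and split_heuristic_region can be reproduced with the following sketch. `split_windows` is a hypothetical helper; exact fractions avoid rounding artefacts, and deriving the default step from the window width inside a region is an assumption.

```python
from fractions import Fraction
from typing import List, Optional, Tuple

Window = Tuple[Fraction, Fraction]

def split_windows(factor: int, stepsize: Optional[Fraction] = None,
                  region: Window = (Fraction(0), Fraction(1))) -> List[Window]:
    start, end = region
    width = (end - start) / factor  # window size, relative to the whole code
    step = stepsize if stepsize is not None else width / 2
    windows = []
    lo = start
    while lo + width <= end:
        windows.append((lo, lo + width))
        lo += step
    return windows

w = split_windows(5)
print([(float(a), float(b)) for a, b in w[:3]])  # [(0.0, 0.2), (0.1, 0.3), (0.2, 0.4)]
```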
- property split_heuristic_preprocess_naive_interleaving
Prior to applying the split heuristic, interleave instructions according to lowest depth, without applying register renaming.
This can be useful if the code to be optimized consists of independent computations operating on different architectural state (e.g. scalar vs. SIMD); in this case, the naive preprocessing will ‘zip’ the different computations prior to applying the core optimization.
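The depth-based naive interleaving can be sketched as a stable sort by DFG depth, using the toy `(destination, sources)` instruction model and hypothetical helper names. Only true (read-after-write) dependencies are considered; this is safe in the example because the scalar and SIMD chains use disjoint registers, but without renaming it is not safe in general.

```python
from typing import List, Sequence, Tuple

Instr = Tuple[str, Tuple[str, ...]]  # (destination register, source registers)

def depths(prog: Sequence[Instr]) -> List[int]:
    """DFG depth: 0 if only inputs are read, else 1 + max depth of producers."""
    last_writer, d = {}, []
    for dst, srcs in prog:
        producers = [last_writer[s] for s in srcs if s in last_writer]
        d.append(1 + max(d[p] for p in producers) if producers else 0)
        last_writer[dst] = len(d) - 1
    return d

def naive_interleave(prog: Sequence[Instr]) -> List[Instr]:
    d = depths(prog)
    order = sorted(range(len(prog)), key=lambda i: d[i])  # stable: ties keep source order
    return [prog[i] for i in order]

scalar = [("x1", ("x0",)), ("x2", ("x1",)), ("x3", ("x2",))]
simd = [("v1", ("v0",)), ("v2", ("v1",)), ("v3", ("v2",))]
print([dst for dst, _ in naive_interleave(scalar + simd)])
# ['x1', 'v1', 'x2', 'v2', 'x3', 'v3']
```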
- property split_heuristic_preprocess_naive_interleaving_by_latency
If the split heuristic with naive preprocessing is used, this option causes the naive interleaving to be by latency-depth rather than by plain depth.
- property split_heuristic_estimate_performance
After applying the split heuristic, run SLOTHY again on the entire code to estimate the performance and display unused issue slots in the output.
- property split_heuristic_repeat
If split_heuristic is enabled, the number of times the splitting heuristic should be repeated.
Note: This is an experimental option the practical value of which has not yet been thoroughly studied. Try if you like, but beware the bump in runtime for the optimization.
The value of this option is irrelevant if split_heuristic is False.
- property split_heuristic_preprocess_naive_interleaving_strategy
Strategy for naive interleaving preprocessing step
Supported values are:
“depth”: Always pick the instruction with the lowest depth in the DFG first.
“alternate”: Try to evenly alternate between instructions tagged with “interleaving_class=0/1”.
- copy()
Make a deep copy of the configuration
- class SoftwarePipelining
Bases: slothy.helper.NestedPrint, slothy.helper.LockAttributes
Subconfiguration for software pipelining
- property enabled
Determines whether software pipelining should be enabled.
- property unroll
The number of times the loop body should be unrolled.
- property pre_before_post
If both early and late instructions are allowed, force late instructions of iteration N to come _before_ early instructions of iteration N+2.
- property allow_pre
Allow ‘early’ instructions, that is, instructions that are pulled forward from iteration N+1 to iteration N. A typical example would be an early load.
- property allow_post
Allow ‘late’ instructions, that is, instructions that are deferred from iteration N to iteration N+1. A typical example would be a late store.
- property unknown_iteration_count
Indicates that the number of iterations is not statically known to be larger than the number of exceptional iterations hoisted out by SLOTHY (at most 2).
Set this to True if the loop can have any number of iterations.
- property minimize_overlapping
Set the objective to minimize the amount of iteration overlapping
- property optimize_preamble
Perform a separate optimization pass for the loop preamble.
- property optimize_postamble
Perform a separate optimization pass for the loop postamble.
- property max_overlapping
The maximum number of early or late instructions. None means that any number of early/late instructions is allowed.
- property min_overlapping
The minimum number of early or late instructions. None means that any number of early/late instructions is allowed.
- property halving_heuristic
Performance improvement heuristic: Rather than running a general software pipelining optimization, proceed in two steps: First, optimize loop body _without_ software pipelining. Then, split it as [A;B] and optimize [B;A]. The final result is then A; optimized([B;A]); B, with A being the preamble, B the postamble, and optimized([B;A]) the loop kernel.
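The restructuring behind the halving heuristic is a purely syntactic identity: for n ≥ 1 iterations, A; [B;A] repeated n-1 times; B executes the same instruction stream as [A;B] repeated n times. The sketch below checks this for a toy loop body; the helper names are hypothetical, and the actual optimization of the kernel [B;A] is omitted.

```python
from typing import List, Sequence, Tuple

def halving_split(body: Sequence[str], cut: int) -> Tuple[List[str], List[str], List[str]]:
    """Split body [A;B] at `cut` into preamble A, kernel [B;A], postamble B."""
    a, b = list(body[:cut]), list(body[cut:])
    return a, b + a, b

def pipelined(pre: List[str], kernel: List[str], post: List[str], n: int) -> List[str]:
    """Instruction stream for n >= 1 iterations: the kernel runs n - 1 times."""
    return pre + kernel * (n - 1) + post

body = ["i0", "i1", "i2", "i3"]
pre, kernel, post = halving_split(body, 2)
print(kernel)  # ['i2', 'i3', 'i0', 'i1']
```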
- property halving_heuristic_periodic
Variant of the halving heuristic: Consider loop boundary when optimizing [B;A] in the second step of the halving heuristic. This is computationally more expensive but avoids bottlenecks at the loop boundary that could otherwise ensue.
This is only meaningful if the halving heuristic is enabled.
- property halving_heuristic_split_only
Cut-down version of halving-heuristic which only splits the loop [A;B] into A; [B;A]; B but does not perform optimizations.
- property max_pre
The maximum relative position (between 0 and 1) of an instruction that should be considered as a potential early instruction. For example, a value of 0.5 means that only instructions in the first half of the original loop body are considered as potential early instructions.
- class Constraints
Bases: slothy.helper.NestedPrint, slothy.helper.LockAttributes
Subconfiguration for performance constraints
- property stalls_allowed
The number of stalls allowed. Internally, this is the number of NOP instructions that SLOTHY introduces before attempting to find a stall-free version of the code (or, more precisely: a version matching all constraints, which may be weaker than stall-free).
This is only meaningful for direct invocations to SlothyBase. You should not set this field when interfacing with Slothy.
- property stalls_maximum_attempt
The maximum number of stalls to attempt before aborting the optimization and reporting it as infeasible.
Note that unless stack spills are enabled (see allow_spills), a symbolic assembly snippet may be impossible to even concretize with architectural register names, regardless of the number of stalls one allows.
- property stalls_minimum_attempt
The minimum number of stalls to attempt.
This may be useful if it’s known for external reasons that searching for optimizations with fewer stalls is infeasible.
- property stalls_first_attempt
The first number of stalls to attempt.
This may be useful if it’s known for external reasons that searching for optimizations with fewer stalls is infeasible.
- property stalls_precision
The precision of the binary search for the minimum number of stalls
SLOTHY will stop searching once it has narrowed down the minimum number of stalls to an interval of the length provided by this variable. In particular, a value of 1 means that the true minimum is searched for.
- property stalls_timeout_below_precision
If this variable is set to a non-None value, SLOTHY does not abort optimization once binary search is operating on an interval smaller than the stall precision, but instead sets a different (typically smaller) timeout.
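The external binary search bounded by stalls_minimum_attempt, stalls_maximum_attempt and stalls_precision can be sketched as follows, assuming a hypothetical oracle `feasible(n)` that reports whether a solution with n stalls exists, and assuming feasibility is monotone in n. This illustrates the idea only; it is not SLOTHY's search and ignores stalls_first_attempt and the timeout options.

```python
from typing import Callable, Optional

def min_stalls(feasible: Callable[[int], bool], lo: int, hi: int,
               precision: int = 1) -> Optional[int]:
    """Binary search for the minimum number of stalls in [lo, hi]."""
    if not feasible(hi):
        return None  # infeasible even with the maximum number of stalls
    # Invariant: feasible(hi) holds, and no value below lo needs to be tried.
    while hi - lo >= precision:
        mid = (lo + hi) // 2
        if feasible(mid):
            hi = mid
        else:
            lo = mid + 1
    return hi  # the true minimum lies in [lo, hi]

def needs_seven(n: int) -> bool:
    return n >= 7

print(min_stalls(needs_seven, 0, 32))               # 7
print(min_stalls(needs_seven, 0, 32, precision=4))  # 8
print(min_stalls(needs_seven, 0, 5))                # None
```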
- property model_latencies
Determines whether instruction latencies should be modelled.
When set, SLOTHY will enforce that instructions are placed in accordance with the latency of the instructions that they depend on.
- property model_functional_units
Determines whether functional units should be modelled.
When set, SLOTHY will enforce that instructions are placed in accordance with the presence and throughput of functional units that they depend on.
- property functional_only
Limit Slothy to register renaming
- property allow_reordering
Allow Slothy to reorder instructions
Disabling this may be useful to e.g. reassign register names in code that has already been scheduled properly.
- property allow_renaming
Allow Slothy to rename registers
Disabling this may be useful in conjunction with !allow_reordering in order to find the number of model violations in a piece of code.
- property allow_spills
Allow Slothy to introduce stack spills
When this option is enabled, Slothy will consider the introduction of stack spills to reduce register pressure.
This option should only be enabled if it is known that the input assembly suffers from high register pressure. For example, this can be the case for symbolic input assembly.
- property spill_type
The type of spills to generate
This is usually spilling to the stack, but other options may exist. For example, on Armv7-M microcontrollers it can be useful to spill from the GPR file to the FPR file.
It is expected that this option is set as a dictionary, for example, with the key determining whether the spills are supposed to be to the stack or to the FPR file, and the value defining a starting index for the FPRs in the latter case.
The exact influence of this option is architecture dependent. You should consult the Spill class in the target architecture model to understand the options.
- property minimize_spills
Minimize number of stack spills
When this option is enabled, Slothy will pass minimization of stack spills as the optimization objective to the solver.
- property max_displacement
The maximum relative displacement of an instruction.
Examples:
If set to 1, instructions can be reordered freely.
If set to 0, no reordering will happen.
If set to 0.5, an instruction will not move by more than N/2 places between original and re-scheduled source code, where N is the number of instructions.
This is an experimental feature for the purpose of speeding up otherwise intractable optimization tasks.
Warning
This only takes effect in straightline optimization (no software pipelining).
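Read literally, the max_displacement constraint bounds how far each instruction may move. A sketch of the check, assuming a permutation `perm` with `perm[i]` the new position of instruction i and N the number of instructions; `within_displacement` is a hypothetical helper, and SLOTHY's exact rounding is not reproduced.

```python
from typing import Sequence

def within_displacement(perm: Sequence[int], max_displacement: float) -> bool:
    """True if no instruction moves more than max_displacement * N places."""
    n = len(perm)
    return all(abs(new - old) <= max_displacement * n
               for old, new in enumerate(perm))

print(within_displacement([1, 0, 3, 2], 0.5))  # True: every move is 1 <= 2
print(within_displacement([3, 2, 1, 0], 0.5))  # False: instruction 0 moves 3 > 2
```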
- class Hints
Bases: slothy.helper.NestedPrint, slothy.helper.LockAttributes
Subconfiguration for solver hints
- property all_core
When SW pipelining is used, hint that all instructions should be ‘core’ instructions (not early/late).
- property order_hint_orig_order
Hint at using the initial program order for the program order variables.
- property rename_hint_orig_rename
Hint at using the initial register renaming for the register renaming variables.
- property ext_bsearch_remember_successes
When using an external binary search, hint the previous successful optimization.
See also Config.variable_size.
- _check_rename_config(lst)