slothy.core.core
Module Contents
Classes
The results of a one-shot SLOTHY optimization run |
|
Stateless core of SLOTHY. |
API
- exception slothy.core.core.SlothyException
Bases:
Exception
Generic exception thrown by SLOTHY
Initialization
Initialize self. See help(type(self)) for accurate signature.
- exception slothy.core.core.SlothySelfCheckException
Bases:
slothy.core.core.SlothyException
Exception thrown by SLOTHY during tht selfcheck
Initialization
Initialize self. See help(type(self)) for accurate signature.
- class slothy.core.core.Result(config)
Bases:
slothy.helper.LockAttributes
The results of a one-shot SLOTHY optimization run
Initialization
- property orig_code
Optimization input: Source code
- _gen_orig_code_visualized_perf()
- _gen_orig_code_visualized_perm()
- _gen_orig_code_visualized()
- property cycles
The number of cycles that SLOTHY thinks the code will take.
If software pipelining is enabled, this is the expected average cycle count per iteration.
- property cycles_bound
A lower bound for the number of cycles obtained during optimization.
This may be lower than the estimated cycle count of the result itself if optimization terminated prematurely, e.g. because of a timeout.
- property ipc_bound
An upper bound on the instruction/cycle (IPC) count obtained during optimization.
This may be lower than the IPC value of the result itself if optimization terminated prematurely, e.g. because of a timeout.
- property optimization_wall_time
Returns the amount of wall clock time in seconds the optimization has taken
- property optimization_user_time
Returns the amount of CPU time in seconds the optimization has taken
- property ipc
The instruction/cycle (IPC) count that SLOTHY thinks the code will have.
- property orig_code_visualized
Optimization input: Source code, including visualization of reordering
- property orig_inputs
The list of input registers in the _original_ source code.
- property orig_outputs
The list of output registers in the _original_ source code.
- property codesize
The number of instructions in the (original and optimized) source code.
- property codesize_with_bubbles
Performance-measure for the optimized source code.
This is the number of issue slots used by the optimized code. Equivalently, after division by the target’s issue width, it is SLOTHY’s expectation of the performance of the code in cycles.
It is also the codomain of the xxx_with_bubbles dictionaries.
- property pre_core_post_dict
Dictionary indicating interleaving of iterations.
This dictionary consists of items (i, (pre, core, post)), where i is the original program order position of an instruction, and pre, core, post indicate whether that instruction is an early, core or late instruction in the optimized source code.
An early instruction is one which is pulled into the previous iteration. A late instruction is one which is deferred until the next iteration. A core instruction is one which is left in its original iteration.
This property is only meaningful when software pipelining is enabled.
See also is_pre, is_core, is_post.
- is_pre(i, original_program_order=True)
Indicates if the instruction in original program order position i (starting at 0) was marked ‘early’ and thereby pulled into the previous iteration.
If original_program_order is False, the index instead refers to the _new_ program order position with in the kernel of the optimized loop.
This only makes sense when software pipelining was enabled.
- is_core(i, original_program_order=True)
Indicates if the instruction in original program order position i (starting at 0) was neither marked ‘early’ nor ‘late’, so stayed in its original iteration.
If original_program_order is False, the index instead refers to the _new_ program order position with in the kernel of the optimized loop.
This only makes sense when software pipelining was enabled.
- is_post(i, original_program_order=True)
Indicates if the instruction in original program order position i (starting at 0) was marked ‘late’ and thereby pulled into the next iteration.
This only makes sense when software pipelining was enabled.
- num_pre()
In a software pipelining result, the number of ‘early’ instructions.
- num_core()
In a software pipelining result, the number of ‘late’ instructions.
- num_post()
In a software pipelining result, the number of ‘core’ instructions (neither early nor late).
- num_prepost()
In a software pipelining result, the number of early or late instructions. This can be seen as a measure for the amount of interleaving across iterations that has happened.
- num_exceptional_iterations()
The number of loop iterations jointly covered by the loop preamble and postamble. In other words, the amount by which the iteration count for the optimized loop kernel is lower than the original iteration count.
- property reordering_with_bubbles
The reordering map linking original and optimized source code.
The output ordering includes ‘bubbles’ reflecting where SLOTHY thinks that the target microarchitecture would stall.
This is a map from [0,…,codesize) to [0,…,codesize_with_bubbles).
- cycle_position_with_bubbles()
Maps the original program order position of an instruction to the cycle number in which SLOTHY thinks (according to its microarchitecture model) the instruction would execute.
- reordering_with_bubbles_inv()
The inverse reordering permutation linking optimized and original source code
- get_reordering_with_bubbles(copies)
The reordering permutation linking original and optimized source code after unrolling copies times. The output ordering includes ‘bubbles’ reflecting where SLOTHY thinks that the target microarchitecture would stall.
- get_periodic_reordering_with_bubbles(copies)
- get_periodic_reordering_with_bubbles_inv(copies)
The inverse permutation of get_periodic_reordering_with_bubbles()
- get_periodic_reordering(copies)
- get_periodic_reordering_inv(copies)
- get_reordering(copies, no_gaps)
- get_fully_unrolled_loop(iterations)
- get_unrolled_kernel(iterations)
- reordering()
The reordering permutation linking original and optimized source code
- periodic_reordering_with_bubbles()
- periodic_reordering_with_bubbles_inv()
The inverse dictionary to periodic_reordering_with_bubbles
- periodic_reordering()
- periodic_reordering_inv()
The inverse permutation to periodic_reordering
- reordering_inv()
The inverse reordering permutation linking optimized and original source code
- property fixlen
- property code_raw
Optimized code, without annotations
- _get_code(visualize_reordering)
- property code
The optimized source code
- _get_full_code(log)
- selfcheck(log)
Checks that the original and optimized source code have isomorphic DFGs. More specifically, that the reordering permutation stored in Result object yields an isomorphism between DFGs.
When software pipelining is used, this is a bounded check for a fixed number of iterations.
- selftest(log)
Run empirical self test, if it exists for the target architecture
- selfcheck_with_fixup(log)
Do selfcheck, and consider preamble/postamble fixup in case of SW pipelining
In the presence of cross iteration dependencies, the preamble and postamble may be functionally incorrect and need fixup.
- _selfcheck_core(log)
- property inputs
The list of input registers in the _optimized_ source code. This is a list of architectural registers, and relates to the inputs of the original source code via Result.input_renaming. For the list of original input register names, use Result.orig_inputs.
- property outputs
The list of output registers in the _optimized_ source code. This is a list of architectural registers, and relates to the outputs of the original source code via Result.output_renaming. For the list of original output register names, use Result.orig_outputs.
- property input_renamings
Dictionary mapping original input names to architectural register names used in the optimized source code. See also Config.rename_inputs.
- property output_renamings
Dictionary mapping original output names to architectural register names used in the optimized source code. See also Config.rename_outputs.
- property stalls
The number of stalls in the optimization result.
More precisely: The number of cycles c such that optimization succeeded with up to c * issue_width unused issue slots.
- _build_stalls_idxs()
- property stall_positions
The positions of instructions in the optimized assembly where SLOTHY expects a stall or unused issue slot.
- property kernel
When using software pipelining, the loop kernel of the optimized loop.
- property kernel_input_output
When using software pipelining, the dependencies between successive loop iterations.
This is useful if you want to further optimize the preamble (and perhaps some code preceeding it), because the kernel dependencies are the output of the preamble.
- property preamble
When using software pipelining, the preamble of the optimized loop.
- property postamble
When using software pipelining, the postamble of the optimized loop.
- property config
The configuration that was used for the optimization.
- property success
Whether the optimization was successful
- __bool__()
- property valid
Indicates whether the result object is valid.
- _require_sw_pipelining()
- static _fixup_reordered_pair(t0, t1, logger)
- static _fixup_reset(nodes)
- static _fixup_finish(nodes, logger)
- _offset_fixup_sw(log)
- _offset_fixup_straightline(log)
- offset_fixup(log)
Fixup address offsets after optimization
- fixup_preamble_postamble(log)
Potentially fix up the preamble and postamble
When software pipelining is used in the context of a loop with cross-iteration dependencies, the core optimization step might lead to functionally incorrect preamble and postamble. This function checks if this is the case and fixes preamble and postamble, if necessary.
- class slothy.core.core.SlothyBase(Arch: any, Target: any, *, logger: any = None, config: any = None)
Bases:
slothy.helper.LockAttributes
Stateless core of SLOTHY.
This class is the technical heart of the package: It implements the conversion of a software optimization problem into a constraint solving problem which can then be passed to an external constraint solver. We use Google OR-Tools.
SlothyBase is agnostic of the target architecture and microarchitecture, which are specified at construction time.
- Parameters:
Arch (any) – A model of the underlying architecture.
Target (any) – A model of the underlying microarchitecture.
logger (any) – The logger to be used. If omitted, a child of the root logger will be used.
config (any) – The configuration to use. If omitted, the default configuration will be used.
Initialization
- property arch
The underlying architecture used by SLOTHY, as a read-only reference to the corresponding field in the configuration.
- property target
The underlying microarchitecture used by SLOTHY, as a read-only reference to the corresponding field in the configuration.
- property result
The result object of the last optimization.
- property success
Indicates whether the last optimiation succeeded.
- _reset()
- _set_timeout(timeout)
- optimize(source, prefix_len=0, suffix_len=0, log_model=None, retry=False)
- _load_source(source, prefix_len=0, suffix_len=0)
- _mark_loop_siblings()
When using SW pipelining, we internally use two loop iterations. Add references between corresponding instructions in both iterations.
- _init_model_internals()
- _usage_check()
- _reg_is_architectural(reg, ty)
- _restrict_input_output_renaming()
- _set_avail_renaming_registers()
- _dump_avail_renaming_registers()
- _add_register_usage(t, reg, reg_ty, var, start_var, dur_var, end_var, spill_var)
- _backup_original_code()
- class CpSatSolutionCb(logger, objective_description, max_solutions=32, is_good_enough=None, printer=None, variables=None)
Bases:
ortools.sat.python.cp_model.CpSolverSolutionCallback
A solution callback class represents objects that are alive during CP-SAT operation and equipped with a callback that is triggered every time CP-SAT finds a new solution.
This callback counts the solutions found so far, and aborts the search when the solution is sufficiently close to the optimum.
Initialization
- on_solution_callback()
Triggered when OR-Tools finds a solution to the current constraint problem
- solution_count()
The number of solutions found so far
- fixup_preamble_postamble()
Potentially fix up the preamble and postamble
When software pipelining is used in the context of a loop with cross-iteration dependencies, the core optimization step might lead to functionally incorrect preamble and postamble. This function checks if this is the case and fixes preamble and postamble, if necessary.
- _extract_result()
- _extract_positions(get_value)
- _extract_input_output_renaming()
- _extract_register_renamings(get_value)
- _extract_kernel_input_output()
- _extract_spills()
- _extract_code()
- _add_path_constraint(consumer, producer, cb)
Add model constraint cb() relating to the pair of producer-consumer instructions Outside of loop mode, this ignores producer and consumer, and just adds cb(). In loop mode, however, the condition has to be omitted in two cases:
The producer belongs to the early part of the first iteration, but the consumer doesn’t.
In this case, the early part of the first iteration is actually the early part of the third iteration.
The consumer belongs to the late part of the second iteration, but the producer doesn’t.
- _add_path_constraint_from(consumer, producer, cb_lst)
- _get_nodes_by_program_order(low=False, high=False, allnodes=False, inputs=False, outputs=False)
- _get_nodes_by_depth(**kwargs)
- _get_nodes(by_depth=False, **kwargs)
- _add_variables_scheduling()
Create variables for start, end and duration of every instruction, and assign the duration intervals to the units that run the instructions
- _add_variables_functional_units()
- _add_variables_dependencies()
- _add_variables_register_renaming()
Add boolean variables indicating if an instruction uses a certain output register
- _add_variables_loop_rolling()
- _is_low(t)
- _is_high(t)
- _is_input(t)
- _is_output(t)
- _iter_dependencies(with_virt=True, with_duals=True)
- _iter_dependencies_with_lifetime()
- _iter_cross_iteration_dependencies()
- _has_cross_iteration_dependencies()
- _add_constraints_lifetime_bounds_single(t)
- _add_constraints_lifetime_bounds()
- _force_allocation_variant(alloc_dict, combinations, combination_vars)
- _forbid_renaming_collision_single(var_dic_a, var_dic_b, condition=None)
- _forbid_renaming_collision_many(idx_pairs, var_dic_a, var_dic_b)
- _force_renaming_collision(var_dic_a, var_dic_b)
- _force_allocation_restriction_single(valid_allocs, var_dict)
- _force_allocation_restriction_many(restriction_lst, var_dict_lst)
- _add_constraints_register_renaming()
- _add_constraints_loop_optimization()
- _add_constraints_n_issue()
- _add_constraints_locked_ordering()
- _add_constraints_scheduling()
- _add_constraints_dependency_order()
- _add_constraints_latencies()
- _add_constraints_register_usage()
- _add_constraints_misc()
- get_inst_pairs(cond_fst=None, cond_snd=None, cond=None)
Yields all instruction pairs satisfying the provided predicate.
- _add_constraints_functional_units()
- _add_constraints_loop_periodic()
- restrict_early_late_instructions(filter_func)
Forces all instructions not passing the filter_func to be core, that is, neither early nor late instructions.
This is only meaningful if software pipelining is enabled.
- force_early(filter_func, early=True)
Forces all instructions passing the filter_func to be early, that is, neither early nor late instructions.
This is only meaningful if software pipelining is enabled.
- restrict_slots_for_instruction(t, slots)
This forces the given instruction to be assigned precisely one of the slots in the provided slot list.
This is only useful for microarchitecture models with multi-issuing (otherwise, there’s only slot 0 anyway).
For microarchitectures where functional units have a throughput of 1 instruction per cycle, this option can be an alternative way to model the occupancy of functional units: Rather than assigning length-1 intervals for the occupance of functional units and demanding that they’re disjoint – the default operation of Slothy, which works for multi-cycle instructions – one can identify the slot number with the functional unit that an instruction runs on, and then the non-overlapping constraint is subsumed by the mutual distinctiveness of the program order positions. Note, though, that this approach ‘overwrites’ the actual microarchitectural meaning of the slot number – if there are uarch constraints regarding the slot number (e.g. “instruction XYZ can only be issued on slot 0”) then it should not be used.
- restrict_slots_for_instructions(ts, slots)
Restrict issue slots for a list of instructions
- filter_instructions_by_property(filter_func)
Returns all nodes for instructions passing the filter
- filter_instructions_by_class(cls_lst)
Returns all nodes for instructions belonging the provided list of instruction classes.
- restrict_slots_for_instructions_by_class(cls_lst: list, slots: list)
Restrict issue slots for all instructions belonging to the provided list of instruction classes.
- Parameters:
cls_lst (list) – A list of instruction classes
slots (list) – A list of issue slots represented as integers.
- restrict_slots_for_instructions_by_property(filter_func: any, slots: list)
Restrict issue slots for all instructions passing the given filter function.
- Parameters:
filter_func (any) – A predicate on instructions
slots (list) – A list of issue slots represented as integers.
- _stalls_to_stats(stalls)
- _print_stalls(stalls)
- _print_stalls_and_spills(bound, variables)
- _add_objective(force_objective=False)
- _describe_solver()
- _init_external_model_and_solver()
- _NewIntVar(minval, maxval, name='')
- _NewIntervalVar(base, dur, end, name='')
- _NewOptionalIntervalVar(base, dur, end, cond, name='')
- _NewBoolVar(name='')
- _NewConstant(val)
- _Add(c)
- _AddNoOverlap(lst)
- _AddExactlyOne(lst)
- _AddMultiplicationEquality(lst, res)
- _AddImplication(a, b)
- _AddAtLeastOne(lst)
- _AddAbsEq(dst, expr)
- _AddAllDifferent(lst)
- _AddHint(var, val)
- _AddMaxEquality(varlist, var)
- _export_model()
- _solve()
- retry(fix_stalls=None)
- _dump_model_statistics()