Releases: spcl/dace
DaCe 0.14
What's Changed
This release brings forth a major change to how SDFGs are simplified in DaCe, using the Simplify pass pipeline. This both improves the performance of DaCe's transformations and introduces new types of simplification, such as dead dataflow elimination.
Please let us know if there are any regressions with this new release.
Features
- Breaking change: The experimental `dace.constant` type hint has now achieved stable status and was renamed to `dace.compiletime`
- Major change: Only modified configuration entries are now stored in `~/.dace.conf`. The SDFG build folders still include the full configuration file. Old `.dace.conf` files are detected and migrated automatically.
- Detailed, multi-platform performance counters are now available via native LIKWID instrumentation (by @lukastruemper in #1063). To use, set `.instrument` to `dace.InstrumentationType.LIKWID_Counters`.
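For example, a minimal sketch that instruments every map with LIKWID counters (reading the result via `sdfg.get_latest_report()` is an assumption based on DaCe's usual instrumentation-report workflow):
```python
import dace
import numpy as np

N = dace.symbol('N')

@dace.program
def scale(A: dace.float64[N]):
    for i in dace.map[0:N]:
        A[i] *= 2.0

sdfg = scale.to_sdfg()
# Attach LIKWID counters to every map in the SDFG
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, dace.nodes.MapEntry):
        node.instrument = dace.InstrumentationType.LIKWID_Counters

A = np.random.rand(1024)
sdfg(A=A, N=1024)
print(sdfg.get_latest_report())  # assumed report retrieval; reports are also stored in the build folder
```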
- GPU memory pools are now supported through CUDA's `mallocAsync` API. To enable, set `desc.pool = True` on any GPU data descriptor.
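For example, a sketch that enables pooled allocation for every GPU array of a program (the program itself is illustrative):
```python
import dace

N = dace.symbol('N')

@dace.program
def doubler(a: dace.float64[N] @ dace.StorageType.GPU_Global):
    a[:] = a * 2.0

sdfg = doubler.to_sdfg()
# Allocate all GPU-global arrays through CUDA's memory pool (mallocAsync)
for name, desc in sdfg.arrays.items():
    if desc.storage == dace.StorageType.GPU_Global:
        desc.pool = True
```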
- Map schedule and array storage types can now be annotated directly in Python code (by @orausch in #1088). For example:
```python
import dace
from dace.dtypes import StorageType, ScheduleType

N = dace.symbol('N')

@dace
def add_on_gpu(a: dace.float64[N] @ StorageType.GPU_Global,
               b: dace.float64[N] @ StorageType.GPU_Global):
    # This map will become a GPU kernel
    for i in dace.map[0:N] @ ScheduleType.GPU_Device:
        b[i] = a[i] + 1.0
```
- Customizing the GPU block dimension and OpenMP threading properties per map is now supported (see the sketch below)
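A minimal sketch of per-map tuning; the property names `gpu_block_size` and `omp_num_threads` are assumptions based on the `Map` properties and may differ between versions:
```python
import dace
from dace import nodes

N = dace.symbol('N')

@dace.program
def work(a: dace.float64[N]):
    for i in dace.map[0:N]:
        a[i] += 1.0

sdfg = work.to_sdfg()
for node, _ in sdfg.all_nodes_recursive():
    if isinstance(node, nodes.MapEntry):
        # Assumed per-map properties: GPU kernel launch block size and OpenMP thread count
        node.map.gpu_block_size = [64, 1, 1]
        node.map.omp_num_threads = 8
```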
- Optional arrays (i.e., arrays that can be None) can now be annotated in the code. The simplification pipeline also infers non-optional arrays from their use and can optimize code by eliminating branches. For example:
```python
from typing import Optional
import dace

@dace
def optional(maybe: Optional[dace.float64[20]], always: dace.float64[20]):
    always += 1         # "always" is always used, so it will not be optional
    if maybe is None:   # This condition will stay in the code
        return 1
    if always is None:  # This condition will be eliminated in simplify
        return 2
    return 3
```
Minor changes
- Miscellaneous fixes to transformations and passes
- Fixes for string literal (`"string"`) use in the Python frontend
- `einsum` is now a library node
- If CMake is already installed, it is now detected and will not be installed through `pip`
- Add kernel detection flag by @TizianoDeMatteis in #1061
- Better support for `__array_interface__` objects by @gronerl in #1071
- Replacements look up base classes by @tbennun in #1080
Full Changelog: v0.13.3...v0.14
DaCe 0.13.3
What's Changed
- Better integration with Visual Studio Code: calling `sdfg.view()` inside a VSCode console or debug session will open the file directly in the editor!
- Code generator for the Snitch RISC-V architecture (by @Noah95 and @AM-Ivanov)
- Minor hotfixes to Python frontend, transformations, and code generation (with @orausch)
Full Changelog: v0.13.2...v0.13.3
DaCe 0.13.2
What's Changed
- New API for SDFG manipulation: Passes and Pipelines. More about that in the next major release!
- Various fixes to frontend, type inference, and code generation.
- Support for more numpy and Python functions:
arange
,round
, etc. - Better callback support:
- Support callbacks with keyword arguments
- Support literal lists, tuples, sets, and dictionaries in callbacks
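A minimal sketch of a keyword-argument callback into the Python interpreter (the `report` function is illustrative):
```python
import dace
import numpy as np

def report(*, value):  # plain Python function, invoked as a callback from the DaCe program
    print('sum is', value)

@dace.program
def summed(a: dace.float64[10]):
    s = np.sum(a)
    report(value=s)  # keyword-argument callback

summed(np.ones(10))
```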
- New transformations: move loop into map, on-the-fly-recomputation map fusion
- Performance improvements to frontend
- Better Docker container compatibility via fixes for handling configuration files when no home directory is available
- Add interface to check whether code is currently in a DaCe parsing context (#998):
```python
import dace

def potentially_parsed_by_dace():
    if not dace.in_program():
        print('Called by Python interpreter!')
    else:
        print('Compiled with DaCe!')
```
- Support compressed (gzipped) SDFGs. They load normally; save with:
```python
sdfg.save('myprogram.sdfgz', compress=True)  # or just run gzip on your old SDFGs
```
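Loading works through the usual file API; a sketch (assuming `dace.SDFG.from_file` detects the compressed format, as implied by "loads normally"):
```python
import dace

sdfg = dace.SDFG.from_file('myprogram.sdfgz')  # compressed files load like regular .sdfg files
```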
- SDFV: Add web serving capability by @orausch in #1013. Use it to interactively debug SDFGs on remote nodes with `sdfg.view(8080)` (or any other port)
Full Changelog: v0.13.1...v0.13.2
DaCe 0.13.1
What's Changed
- Python frontend: Bug fixes for closures and callbacks in nested scopes
- Bug fixes for several transformations (`StateFusion`, `RedundantSecondArray`)
- Fixes for issues with FORTRAN ordering of numpy arrays
- Python object duplicate reference checks in SDFG validation
Full Changelog: v0.13...v0.13.1
DaCe 0.13
New Features
Cutout:
Cutout allows developers to take large DaCe programs and reliably cut out subgraphs to create a runnable sub-program. This sub-program can then be used to check for correctness, benchmark, and transform a part of a program without having to run the full application.
* Example usage from Python:
```python
import dace
from dace.sdfg.analysis import cutout

def my_method(sdfg: dace.SDFG, state: dace.SDFGState):
    nodes = [n for n in state if isinstance(n, dace.nodes.LibraryNode)]  # Cut out every library node
    cut_sdfg: dace.SDFG = cutout.cutout_state(state, *nodes)
    # The cut SDFG now includes each library node and all the arrays necessary to call it
```
Also available in the SDFG editor.
Data Instrumentation:
Just like node instrumentation for performance analysis, data instrumentation allows users to set access nodes to be saved to an instrumented data report, and loaded later for exact reproducible runs.
* Data instrumentation natively works with CPU and GPU global memory, so there is no need to copy data back
* Combined with Cutout, this is a powerful interface to perform local optimizations in large applications with ease!
* Example use:
```python
import dace
from dace import nodes
import numpy as np

@dace.program
def tester(A: dace.float64[20, 20]):
    tmp = A + 1
    return tmp + 5

sdfg = tester.to_sdfg()
for node, _ in sdfg.all_nodes_recursive():  # Instrument every access node
    if isinstance(node, nodes.AccessNode):
        node.instrument = dace.DataInstrumentationType.Save

A = np.random.rand(20, 20)
result = sdfg(A)
# Get instrumented data from report
dreport = sdfg.get_instrumented_data()
assert np.allclose(dreport['A'], A)
assert np.allclose(dreport['tmp'], A + 1)
assert np.allclose(dreport['__return'], A + 6)
```
Logical Groups:
SDFG elements can now be grouped by any criteria, and they will be colored during visualization by default (by @phschaad).
Changes and Bug Fixes
- Samples and tutorials have now been updated to reflect the latest API
- Constants (added with `sdfg.add_constant`) can now be used as access nodes in SDFGs. The constants are hard-coded into the generated program, so the compiler can fully optimize around their values (see the sketch after this list).
- View nodes can now use the `views` connector to disambiguate which access node is being viewed
- Python frontend: the `else` clause is now handled in for and while loops
- Scalars have been removed from the `__dace_init` generated function signature (by @orausch)
- Multiple clock signals in the RTL codegen (by @carljohnsen)
- Various fixes to frontends, transformations, and code generators
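A minimal sketch of registering a constant that can then appear as an access node (the array contents are illustrative):
```python
import dace
import numpy as np

sdfg = dace.SDFG('uses_constants')
sdfg.add_constant('weights', np.array([0.25, 0.5, 0.25]))
# 'weights' can now be read through an access node; its values are
# emitted directly into the generated code.
```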
Full Changelog available at v0.12...v0.13
DaCe 0.12
API Changes
Important: Pattern-matching transformation API has been significantly simplified. Transformations using the old API must be ported! Summary of changes:
- Transformations now extend either the `SingleStateTransformation` or the `MultiStateTransformation` class instead of using decorators
- Patterns must be registered as class variables of type `PatternNode`
- Nodes in matched patterns can then be accessed in `can_be_applied` and `apply` directly using `self.nodename`
- The name `strict` is now replaced with `permissive` (False by default). Permissive mode allows transformations to match in more cases, but may be dangerous to apply (e.g., create race conditions)
- `can_be_applied` is now a method of the transformation
- The `apply` method accepts a graph and the SDFG
Example of using the new API:
```python
import dace
from dace import nodes
from dace.sdfg import utils as sdutil
from dace.transformation import transformation as xf


class ExampleTransformation(xf.SingleStateTransformation):
    # Define pattern nodes
    map_entry = xf.PatternNode(nodes.MapEntry)
    access = xf.PatternNode(nodes.AccessNode)

    # Define matching subgraphs
    @classmethod
    def expressions(cls):
        # MapEntry -> Access
        return [sdutil.node_path_graph(cls.map_entry, cls.access)]

    def can_be_applied(self, graph: dace.SDFGState, expr_index: int, sdfg: dace.SDFG, permissive: bool = False) -> bool:
        # Returns True if the transformation can be applied on a subgraph
        if permissive:  # In permissive mode, we will always apply this transformation
            return True
        return self.map_entry.schedule == dace.ScheduleType.CPU_Multicore

    def apply(self, graph: dace.SDFGState, sdfg: dace.SDFG):
        # Apply the transformation using the SDFG API
        pass
```
Simplifying SDFGs is renamed from `sdfg.apply_strict_transformations()` to `sdfg.simplify()`.
AccessNodes no longer have an `AccessType` field.
Other changes
- More nested SDFG inlining opportunities by default with the multi-state inline transformation
- Performance optimizations of the DaCe framework (parsing, transformations, code generation) for large graphs
- Support for Xilinx Vitis 2021.2
- Minor fixes to transformations and deserialization
Full Changelog: v0.11.4...v0.12
DaCe 0.11.4
What's Changed
- If a Python call cannot be parsed into a data-centric program, DaCe will automatically generate a callback into Python. Supports CPU arrays and GPU arrays (via CuPy) without copying!
- Python 3.10 support
- CuPy arrays are supported when calling `@dace.program`s in JIT mode
- Fix various issues in Python frontend and code generation
Full Changelog: v0.11.3...v0.11.4
DaCe 0.11.3
DaCe 0.11.2
DaCe 0.11.1
What's Changed
- More flexible Python frontend: you can now call functions and object methods, and use fields and globals in `@dace` programs! Some examples (see the sketch below):
  - There is no need to annotate called functions
  - `@dataclass` and general object field support
  - Loop unrolling: implicit and explicit (with the `dace.unroll` generator)
  - Constant folding and explicit constant arguments (with `dace.constant` as a type hint)
  - Debuggability: all functions (e.g., `dace.map`, `dace.tasklet`) work in pure Python as well
  - and many more features
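A minimal sketch of what the more flexible frontend enables (the `Scaler` class and program are illustrative, not from the release notes):
```python
import dace
import numpy as np
from dataclasses import dataclass

@dataclass
class Scaler:
    factor: float                   # object field used inside the program

    def apply(self, x):             # called method, no annotation required
        return x * self.factor

s = Scaler(factor=2.0)

@dace.program
def scale_and_shift(a: dace.float64[10]):
    for i in dace.unroll(range(2)):  # explicit loop unrolling
        a += s.factor
    return s.apply(a)

print(scale_and_shift(np.ones(10)))
```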
- NumPy semantics are followed more closely, e.g., subscripts create array views
- Direct CuPy and `torch.tensor` integration in `@dace` program arguments
- Auto-optimization (preview): use `@dace.program(auto_optimize=True, device=dace.DeviceType.CPU)` to automatically run some transformations, such as turning loops into parallel maps
- ARM SVE code generation support by @sscholbe (#705)
- Support for MLIR tasklets by @Berke-Ates in #747
- Source Mapping by @benibenj in #756
- Support for HBM on Xilinx FPGAs by @jnice-81 (#762)
Miscellaneous:
- Various performance optimizations to calling `@dace` programs
- Various bug fixes to transformations, code generator, and frontends
Full Changelog: v0.10.8...v0.11.1