Releases · spcl/dace

16 Nov 03:31

tbennun

v1.0.0

b5f91e1

v1.0.0 Latest


We are happy to announce DaCe version 1.0!

It is a major release milestone, and we went over many of the known issues over the years to ensure that this is the most stable version we can release without making fundamental changes to the framework. The Stateful DataFlow multiGraph (SDFG) intermediate representation used in this version is faithful to the original paper, which was published in 2019.

On a fundamental level, this release is no different from a minor version release (this version could have been DaCe 0.17), so there are no breaking changes from v0.x.

We would like to thank everyone who contributed to DaCe over the years and helped reach this milestone! It would not have been possible without you.

Release Notes

In addition to many issues and bugfixes courtesy of @acalotoiu, @tim0s, @htorst, @tbennun, @phschaad, @BenWeber42, @philip-paul-mueller, @luigifusco, @ThrudPrimrose, @FlorianDeconinck, @pratyai, @edopao, @kotsaloscv, and @iBug, several new features for quality of life and future development were added.

New features introduced into the SDFG IR and builder API:

Add GUIDs to SDFG elements and SDFG diff support (by @phschaad)
Added can_be_applied_to() to Transformation API (by @philip-paul-mueller)
SDFG.auto_optimize, SDFG.regenerate_code, and SDFG.as_schedule_tree are now easily accessible as API methods and fields

New Python frontend features

You can now specify the storage location of expressions inline using the @ operator or type hints. Examples:
- a = np.ones(M) @ dace.StorageType.CPU_ThreadLocal
- b: dace.float64[M, N] @ dace.StorageType.GPU_Global = np.zeros(...)

New transformations

WCRToAugAssign transformation (by @alexnick83)

New code generation features

clang-format can now be configured to be called on generated code (by @ThrudPrimrose)

Experimental features

Control flow (loop, conditional, named) regions (by @phschaad and @luca-patrignani). Stay tuned for more updates in the next development releases!

Other changes and bugfix highlights

Support for SymPy 1.13 (by @BenWeber42)
Rename misleading topological_sort to bfs_nodes by @BenWeber42 in #1590
Add multidimensional maps to GPU docs by @tbennun in #1608
Improve SDFG work-depth analysis and add SDFG simulated operational intensity analysis by @phschaad in #1607
Scalar return values are now disallowed by @philip-paul-mueller in #1609
Fixed RedundantArray's handling of "reshaping" Memlets by @philip-paul-mueller in #1603
Loop Region Code Generation by @phschaad in #1597
Bump certifi from 2023.7.22 to 2024.7.4 by @dependabot in #1614
Fix incorrect input/output of nested dace programs by @phschaad in #1615
Return correct state in nest_sdfg_subgraph by @tbennun in #1627
Made TransientReuse Less Verbose by @philip-paul-mueller in #1622
Improving the Usage of #pragma unroll by @philip-paul-mueller in #1621
Added PatternNode to dace.transformation imports. by @philip-paul-mueller in #1618
Implement user regions and function call regions by @luca-patrignani in #1623
Add UUIDs to SDFG elements by @phschaad in #1631
framecode: Fix missing BasicCFBlock argument by @iBug in #1630
Specified behaviour of Subset.covers() for different dimensionality by @philip-paul-mueller in #1637
More robust loop detection by @tbennun in #1646
Fix missed exploration of edges in constant propagation by @luigifusco in #1635
Fix infinite loop with control flow blocks by @tbennun in #1634
Print out exception on parsing fail early by @FlorianDeconinck in #1651
Reworked Optional Serializing by @philip-paul-mueller in #1647
Modified SetProperty by @philip-paul-mueller in #1653
Made CompiledSDFG in the main namespace available. by @philip-paul-mueller in #1567
SDFG Diff Tool by @phschaad in #1632
Made the SDFGState.add_mapped_tasklet() more convenient by @philip-paul-mueller in #1655
Maps With Zero Parameters by @philip-paul-mueller in #1649
Bug in constant propagation with multiple constants by @tbennun in #1658
Fixed PruneConnectors by @philip-paul-mueller in #1660
Fix array indirection to memlet subset promotion by @BenWeber42 in #1406
Renamed graph.bfs_edges to edge_bfs by @BenWeber42 in #1604
Inter-state edge assignment race test by @tbennun in #1672
Fix race conditions in Constant Propagation and Reference-To-View by @tbennun in #1679
Improve memlet label and string initialization by @tbennun in #1680
Control Flow Raising by @phschaad in #1657
Updated InlineMultistateSDFG by @philip-paul-mueller in #1689
Extend TrivialTaskletElimination for map scope by @edopao in #1650
Fix to Read and Write Sets by @philip-paul-mueller in #1678
Make is_empty() and propagate_subset() not unnecessarily rely on the src and dst by @pratyai in #1699
fix(codegen/prettycode): Use base_indentation as intended by @iBug in #1697
Warn on potential data races by @phschaad in #1712
Python frontend stability and inline storage specification by @tbennun in #1711
infer_symbols_from_datadescriptor : modification to infer offset by @kotsaloscv in #1525
Add CFG to generate_scope in tutorials by @ThrudPrimrose in #1706
Better CopyToMap by @philip-paul-mueller in #1675
More NumPy operation implementations by @tbennun in #1498
Fix jupyter's version of SDFV by @phschaad in #1714
Fix broken codegen tutorial by @romanc in #1720
CI: Update checkout and setup-python actions by @romanc in #1718
Bump version and update dependencies by @tbennun in #1722
Various Cutout Fixes by @phschaad in #1662
Various stability improvements and convenience APIs by @tbennun in #1724
Rename FORTRAN frontend tests by @pratyai in #1729
Add back clang-format support by @ThrudPrimrose in #1732
Fix problem with struct reads on interstate edges by @phschaad in #1512
Quality of life: Improved error messages by @romanc in #1731
Cherry-picked a handful of intrinsic related commits out of multi_sdfg branch. by @pratyai in #1728
Used valid FORTRAN test program for a couple frontend tests + Made floatlit2string() convert the FORTRAN real literal strings into python floats. by @pratyai in #1733
Fix pure reduce expansion for squeezed output memlets. by @pratyai in #1709
Make the import of typing.Literal portable between python versions 3.7 and 3.12 by @pratyai in #1700
Fix type inference and code generation for typeclasses and numpy types by @tbennun in #1725
SDFG API additions for version 1.0 by @tbennun in #1740
Replace another FORTRAN test program with gfortran -Wall certified test program. by @pratyai in #1736
Unskip unit tests and provide reasons for skipped tests by @tbennun in #1742
Fix OpenMP dynamic loop bounds that use persistent memory by @tbennun in #1746
Fixes for SDFGState._read_and_write_sets() by @philip-paul-mueller in #1747
Fix temporary transient counter during Python parsing of nested calls by @tbennun in #1745
Fix pystr_to_symbolic not correctly interpreting constants as boolean values in boolean comparisons by @phschaad in #1756
Fixed dace::math::pi and dace::math::nan on GPU by @philip-paul-mueller in #1759
Make scalar to symbol promotion robust to node order in state by @tbennun in #1766

Full Changelog: v0.16.1...v1.0.0

Contributors

romanc, pratyai, and 16 other contributors


24 Oct 14:41

tbennun

v1.0.0rc1

073b613

v1.0.0rc1 Pre-release


We are happy to announce the first release candidate of DaCe version 1.0!

This version uses the SDFG intermediate representation as published in the original Stateful Dataflow Multigraphs paper, which has been stable for quite some time.

On a fundamental level, this release is no different from a minor version release (this version could have been DaCe 0.17). However, with this release we would like to emphasize stability rather than new features.

If you are using DaCe and have a critical or blocking issue that makes it unstable, please create an issue and refer to it in the release discussion, so that we can add it to our release plan. Thank you for using DaCe!

Release Notes

New features:

Add GUIDs to SDFG elements and SDFG diff support (by @phschaad)
Added can_be_applied_to() to Transformation API (by @philip-paul-mueller)
Support SymPy 1.13 (by @BenWeber42)
New WCRToAugAssign transformation (by @alexnick83)
(Experimental) Control flow (loop, conditional, named) regions (by @phschaad and @luca-patrignani). Stay tuned for more updates in the next development releases!

Bugfixes:

Inter-state edge assignment race condition test in validation (by @tbennun)
Improve memlet label and string initialization (by @tbennun, @philip-paul-mueller)
Minor updates to documentation and internal APIs (by @tbennun, @phschaad, @philip-paul-mueller, @BenWeber42)
Minor fixes to the following transformations and passes: RedundantArray, TransientReuse, DetectLoop, ConstantPropagation, PruneConnectors (by @philip-paul-mueller, @tbennun, @luigifusco)
Minor frontend improvements (by @FlorianDeconinck, @BenWeber42)
Minor improvements to the code generator (by @iBug, @philip-paul-mueller)

See Full Changelog: v0.16.1...v1.0.0rc1

New Contributors

@iBug made their first contribution in #1630
@luigifusco made their first contribution in #1635

Contributors

FlorianDeconinck, iBug, and 7 other contributors


20 Jun 18:02

BenWeber42

v0.16.1

93b557f

v0.16.1

What's Changed

The main purpose of this release is to require NumPy < 2 for DaCe, since NumPy 2.0.0 contains breaking changes which aren't compatible with DaCe currently.

Recently, NumPy 2.0.0 has been released: https://numpy.org/news/#numpy-200-released

The release comes with documented breaking changes. Unfortunately, DaCe is currently not compatible with these changes. This also affects the recent 0.16 release of DaCe. Hence, we adjust our dependency requirements to use NumPy < 2 as a temporary work-around in this PR:

Fix numpy version to < 2.0 by @phschaad in #1601

Long term, we are tracking adding support for NumPy 2 in DaCe in this issue: #1602

Fix constant propagation failing due to invalid topological sort by @phschaad in #1589

This changeset has also landed in DaCe's development branch earlier. It fixes an issue where the ConstantPropagation pass can fail for certain graph structures.
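To illustrate why a breadth-first order cannot stand in for a topological sort, here is a minimal sketch in plain Python. The graph and the helper names (`bfs_order`, `topological_order`, `respects_edges`) are hypothetical and are not part of DaCe; the sketch only shows the general pitfall, not the specific graph structures that triggered the bug.

```python
from collections import deque

# Hypothetical DAG with edges a->b, a->c, b->d, d->c.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["c"]}


def bfs_order(graph, start):
    """Visit nodes level by level, starting from `start`."""
    seen = {start}
    order = []
    queue = deque([start])
    while queue:
        node = queue.popleft()
        order.append(node)
        for succ in graph[node]:
            if succ not in seen:
                seen.add(succ)
                queue.append(succ)
    return order


def topological_order(graph):
    """Kahn's algorithm: emit a node only after all its predecessors."""
    indegree = {n: 0 for n in graph}
    for succs in graph.values():
        for s in succs:
            indegree[s] += 1
    ready = deque(n for n in graph if indegree[n] == 0)
    order = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for s in graph[node]:
            indegree[s] -= 1
            if indegree[s] == 0:
                ready.append(s)
    return order


def respects_edges(order, graph):
    """True if every edge u->v has u before v in `order`."""
    pos = {n: i for i, n in enumerate(order)}
    return all(pos[u] < pos[v] for u in graph for v in graph[u])


bfs = bfs_order(graph, "a")      # ['a', 'b', 'c', 'd']
topo = topological_order(graph)  # ['a', 'b', 'd', 'c']
print(respects_edges(bfs, graph), respects_edges(topo, graph))  # False True
```

In the BFS order, `c` is visited before its predecessor `d`, so a pass that assumes all predecessors were already processed would see incomplete information at `c`.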

Full Changelog: v0.16...v0.16.1

Contributors

phschaad


13 Jun 20:26

BenWeber42

v0.16

d6f481a

v0.16

What's Changed

CI/CD pipeline for NOAA & NASA weather and climate model by @FlorianDeconinck & @BenWeber42 in #1460, #1478 & #1575

Our collaborators NOAA & NASA have successfully used DaCe as an optimization framework and back-end for some of the components of their climate and weather model. Particularly, the FV3 dycore and GFS physics parametrization have been ported to a combination of GT4Py Python DSL and DaCe. DaCe is used within their stack as a stencil backend and as a full-program optimizer integrating stencils and glue-code together.

With this CI/CD pipeline, we run various checks for those components on every change to DaCe. This is an important step for DaCe to ensure stability for real-world applications that utilize DaCe. We are very grateful for this contribution and the collaboration with NOAA & NASA.

Changed default of serialize_all_fields to False by @BenWeber42 in #1564

This feature was already implemented in the previous 0.15.1 release in #1452, but not enabled by default. In this release, we are changing the default so that only fields with non-default values are serialized. This generally leads to a reduction in file size for SDFGs.

Since each DaCe version stores the default values of each field, it is still possible to recover these missing values. Default values should rarely change across different DaCe versions. Nevertheless, we want to caution users & developers when using SDFG files with different DaCe versions.
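The idea of serializing only non-default fields can be sketched in plain Python as follows. This is a toy model under stated assumptions: `ToyNode`, `to_json`, and `from_json` are hypothetical names, and the code does not reflect DaCe's actual property or serialization machinery.

```python
import json
from dataclasses import dataclass, fields


@dataclass
class ToyNode:
    # Hypothetical fields with defaults, standing in for SDFG element properties.
    label: str = ""
    schedule: str = "Default"
    unroll: bool = False


def to_json(node, serialize_all_fields=False):
    """Write only fields that differ from their default, unless asked otherwise."""
    data = {}
    for f in fields(node):
        value = getattr(node, f.name)
        if serialize_all_fields or value != f.default:
            data[f.name] = value
    return json.dumps(data, sort_keys=True)


def from_json(text):
    """Missing fields are filled in from the defaults known to the reader."""
    return ToyNode(**json.loads(text))


node = ToyNode(label="x")
compact = to_json(node)                         # '{"label": "x"}'
full = to_json(node, serialize_all_fields=True)
assert from_json(compact) == node               # defaults are recovered on load
print(len(compact), "<", len(full))
```

The round trip only works if the reader's defaults match the writer's, which is why mixing files across versions with different defaults calls for caution.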

Analysis passes for access range analysis by @tbennun in #1484

Adds two analysis passes to help with analyzing data access sets: access ranges and Reference sources. To enable constructing sets of memlets, this PR also reintroduces data descriptor names to memlet hashes.

Reference-to-View pass and comprehensive reference test suite by @tbennun in #1485

Implements a reference-to-view pass (converting references to views if they are only set to one particular subset). Also improves the simplify pipeline in the presence of Reference data descriptors and adds multiple tests that use references.

Ndarray strides by @alexnick83 in #1506

The PR adds support for custom strides to dace.ndarray. Furthermore, the stride unit is number of elements, in contrast to NumPy/CuPy, where it is number of bytes. Custom strides are not supported for numpy.ndarray and cupy.ndarray.
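The difference in stride units can be sketched without DaCe or NumPy. The helper names below (`element_strides`, `to_byte_strides`) are hypothetical and only illustrate the conversion between the two conventions for a row-major array.

```python
def element_strides(shape):
    """Row-major (C-contiguous) strides, counted in number of elements."""
    strides = []
    step = 1
    for extent in reversed(shape):
        strides.append(step)
        step *= extent
    return tuple(reversed(strides))


def to_byte_strides(strides, itemsize):
    """Convert element strides to byte strides (the NumPy/CuPy convention)."""
    return tuple(s * itemsize for s in strides)


shape = (4, 6)
elem = element_strides(shape)    # (6, 1): the unit dace.ndarray expects
byte = to_byte_strides(elem, 8)  # (48, 8): what NumPy reports for float64
print(elem, byte)
```

When porting stride values taken from an existing NumPy array, dividing each byte stride by the item size gives the element stride.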

Structure Support to NestedSDFGs and Python Frontend by @alexnick83 in #1366

Adds basic support for nested data (Structures) to the Python frontend. It also resolves issues with the use of Structures in nested SDFG scopes (mostly code generation).

Generalize StructArrays to ContainerArrays and refactor View class structure by @tbennun in #1504

This PR enables the use of an array data descriptor that contains a nested data descriptor (e.g., ContainerArray of Arrays). Its contents can then be viewed normally with View or StructureView.
With this, concepts such as jagged arrays are natively supported in DaCe (see test for example).
Also adds support for using ctypes pointers and arrays as arguments to SDFGs.

This PR also refactors the notion of views to a View interface, and provides views to arrays, structures, and container arrays. It also adds a syntactic-sugar/helper API to define a view of an existing data descriptor.

Add support for distributed compilation in DaceProgram by @kotsaloscv in #1551 & #1555

Adds configurable support for distributed compilation (MPI) to the Python front-end (via mpi4py). Distributed compilation can be enabled with the distributed_compilation parameter in the dace.program decorator.

Fixes and other improvements:

Remove unused deps by @jack-mcivor in #1459
Small fix for debuginfo that can be None by @kotsaloscv in #1469
Make dynamic map range docs more explicit by @tbennun in #1474
Added nan to the DaCe math namespace by @philip-paul-mueller in #1437
Fix for floordiv on GPU target by @edopao in #1471
Add merge_group to CI for merge queues by @tbennun in #1482
Fix SymPy dependency (again) by @tbennun in #1483
Fix for CUDA codegen by @edopao in #1442
Complete coverage for reference-to-view pass by @tbennun in #1488
CMakeLists.txt Improvements for CUDA by @kylosus in #1337
Faster Call for CompiledSDFG by @philip-paul-mueller in #1467
Evaluate dtype_to_typeclass at use time by @tbennun in #1494
Fix redefinition of interstate edge type in code generator by @tbennun in #1490
CuPy fixes and special cases for HIP by @tbennun in #1492
CI Update by @tim0s in #1502
FPGA CI Update by @tim0s in #1508
Bump jinja2 from 3.1.2 to 3.1.3 by @dependabot in #1503
Jupyter fix by @phschaad in #1489
Modernize HIP CMake commands, fix corner cases by @tbennun in #1518
Remove the long-deprecated symbol.get/set methods by @tbennun in #1523
Support output indirection in numpy frontend by @tbennun in #1509
Fix for const references by @alexnick83 in #1522
DeadDataFlowElimination will add type hint when removing a connector by @luca-patrignani in #1499
Fixed an issue in the Memlet duplication verification. by @philip-paul-mueller in #1526
Refactor SDFG List to CFG List by @phschaad in #1511
Dependency Edge Hotfix by @Berke-Ates in #1513
Remove Property.from_string and Property.to_string by @luca-patrignani in #1529
Fixed the {in,out}_edges() function of the DiGraph class. by @philip-paul-mueller in #1527
Fixes for structures nested in (nested) struct-arrays by @alexnick83 in #1534
Updated and fixed the MapExpansion transformation. by @philip-paul-mueller in #1532
Updated and fixed the MapDimShuffle transformation. by @philip-paul-mueller in #1531
Use State Fissioning to Generalize Transformations by @lukastruemper in #1462
Fixed edge consolidation by @philip-paul-mueller in #1546
Fix Profiler + Minor improvements by @JanKleine in #1548
Add dtype for numpy.uintp which is compatible with C uintptr_t by @kotsaloscv in #1544
Fix bug in map_fusion transformation by @edopao in #1553
Updated the add_state_{after, before}() function. by @philip-paul-mueller in #1556
Bump idna from 3.4 to 3.7 by @dependabot in #1557
Fix infinite loops in memlet path when a scope cycle is added by @tbennun in #1559
Adds support for ArrayView to the Python Frontend by @alexnick83 in #1565
It is now possible to suppress output in view() by @philip-paul-mueller in #1566
Bump jinja2 from 3.1.3 to 3.1.4 by @dependabot in #1569
Correction in the docstring of the SDFG class's init method by @alexnick83 in #1571
Fix Subscript literal evaluation for List by @FlorianDeconinck in #1570
SDFG.save() now performs tilde expansion. by @philip-paul-mueller in #1578
Control Flow Block Constraints by @phschaad in #1476
Updated SDFV and Corresponding HTML Template by @phschaad in #1580
Changed Xilinx C++11 flag to C++14 by @BenWeber42 in #1585
Made dace::math::pow forward to std::pow more generic by @Berke-Ates @philip-paul-mueller @phschaad @BenWeber42 in #1580

New Contributors

@jack-mcivor made their first contribution in #1459
@kylosus made their first contribution in #1337
@luca-patrignani made their first contribution in #1499

Full Changelog: v0.15.1...v0.16

Contributors

tim0s, FlorianDeconinck, and 14 other contributors


07 Dec 18:20

BenWeber42

v0.15.1

7056675

v0.15.1

What's Changed

Highlights

Option for utilizing GPU global memory by @alexnick83 in #1405
Add tensor storage format abstraction by @JanKleine in #1392
Hierarchical Control Flow / Control Flow Regions by @phschaad in #1404
GPU code generation: User-specified block/thread/warp location by @tbennun in #1358
Implement loop-based Fortran intrinsics by @mcopik in #1394
Change strides move assignment outside if by @Sajohn-CH in #1402
Numpy fill accepts also variables by @philip-paul-mueller in #1420
Implement writeset underapproximation by @matteonu in #1425
Loop Regions by @phschaad in #1407
Compress the SDFG generated when failing/invalid for larger codebase by @FlorianDeconinck in #1456
Do not serialize non-default fields by default by @tbennun in #1452

Fixes and other improvements:

replace |& which is not widely supported by @tim0s in #1399
RTL codegen "line" error by @carljohnsen in #1403
Bump urllib3 from 2.0.6 to 2.0.7 by @dependabot in #1400
Bugfixes and extended testing for Fortran SUM by @mcopik in #1390
Remove erroneous file creation in test by @JanKleine in #1411
Fix for VS Code debug console: view opens sdfg in VS Code and not in browser by @kotsaloscv in #1419
Bump werkzeug from 2.3.5 to 3.0.1 by @dependabot in #1409
AugAssignToWCR: Support for more cases and increased test coverage by @lukastruemper in #1359
Implement Subsetlist and covers_precise by @matteonu in #1412
OTFMapFusion: Bugfix for tasklets with None connectors by @lukastruemper in #1415
Better mangling of the state struct in the code generator by @philip-paul-mueller in #1413
Trivial map elimination init by @Sajohn-CH in #1353
Fixed Improper Method Call: Replaced mktemp by @fazledyn-or in #1428
Symbol specialization in auto_optimizer() never took effect. by @philip-paul-mueller in #1410
Issue a warning when to_sdfg() ignores the auto_optimize flag (Issue #1380). by @philip-paul-mueller in #1395
Fix schedule tree conversion for use of arrays in conditions by @tbennun in #1440
Fixes for TaskletFusion, AugAssignToWCR and MapExpansion by @lukastruemper in #1432
AugAssignToWCR: Minor fix for node not found error by @lukastruemper in #1447
OTFMapFusion: Minor bug fixes by @lukastruemper in #1448
Fix three issues related to deepcopying elements by @tbennun in #1446
Fix CUDA high-dimensional test by @tbennun in #1441
SDFG.arg_names was not a member but a class variable. by @philip-paul-mueller in #1457
PruneConnectors: Fission into separate states before pruning by @lukastruemper in #1451
In-out connector's global source when connector becomes out-only at outer SDFG scopes. by @alexnick83 in #1463
Fix two regressions in v0.15 by @tbennun in #1465
Fix codegen with data access on inter-state edge by @edopao in #1434

New Contributors

@kotsaloscv made their first contribution in #1419
@matteonu made their first contribution in #1412
@philip-paul-mueller made their first contribution in #1413
@fazledyn-or made their first contribution in #1428

Full Changelog: v0.15...v0.15.1rc1

Contributors

mcopik, tim0s, and 14 other contributors


16 Oct 17:32

tbennun

v0.15

0755385

v0.15

What's Changed

Work-Depth / Average Parallelism Analysis by @hodelcl in #1363 and #1327

A new analysis engine allows SDFGs to be statically analyzed for work and depth / average parallelism. The analysis allows specifying a series of assumptions about symbolic program parameters that can help simplify and improve the analysis results. The following example shows how to use the analysis:

from dace.sdfg.work_depth_analysis import work_depth

# A dictionary mapping each SDFG element to a tuple (work, depth)
work_depth_map = {}
# Assumptions about symbolic parameters
assumptions = ['N>5', 'M<200', 'K>N']
work_depth.analyze_sdfg(mysdfg, work_depth_map, work_depth.get_tasklet_work_depth, assumptions)

# A dictionary mapping each SDFG element to its average parallelism
average_parallelism_map = {}
work_depth.analyze_sdfg(mysdfg, average_parallelism_map, work_depth.get_tasklet_avg_par, assumptions)
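For readers unfamiliar with the metrics, the following toy model shows how work, depth, and average parallelism compose under the textbook work-depth model. It uses fixed integers instead of symbolic expressions, and the names (`Cost`, `sequential_loop`, `parallel_map`) are hypothetical; it is not the analysis code from DaCe.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Cost:
    work: int   # total number of operations
    depth: int  # length of the longest dependency chain

    @property
    def average_parallelism(self):
        return self.work / self.depth


def sequential_loop(trips, body):
    """Iterations run one after another: both work and depth scale."""
    return Cost(trips * body.work, trips * body.depth)


def parallel_map(trips, body):
    """Iterations are independent: work scales, depth does not."""
    return Cost(trips * body.work, body.depth)


tasklet = Cost(work=1, depth=1)
N = 8
inner = parallel_map(N, tasklet)     # Cost(work=8, depth=1)
program = sequential_loop(N, inner)  # Cost(work=64, depth=8)
print(program, program.average_parallelism)
```

Here the sequential outer loop limits the average parallelism to `N`, even though the total work is `N * N`.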

Symbol parameter reduction in generated code (#1338, #1344)

To improve our integration with external codes, we limit the symbolic parameters generated by DaCe to only the used symbols. Take the following code for example:

@dace
def addone(a: dace.float64[N]):
  for i in dace.map[0:10]:
    a[i] += 1

Since the internal code does not actually need N to process the array, it will not appear in the generated code. Before this release the signature of the generated code would be:

DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a, int N);

After this release it is:

DACE_EXPORTED void __program_addone(addone_t *__state, double * __restrict__ a);

Note that this is a major, breaking change: users who manually interact with the generated .so files must adapt their calls to the new signatures.

Externally-allocated memory (workspace) support (#1294)

A new allocation lifetime, dace.AllocationLifetime.External, has been introduced into DaCe. Now you can use your DaCe code with external memory allocators (such as PyTorch) and ask DaCe for: (a) how much transient memory it will need; and (b) to use a specific pre-allocated pointer. Example:

@dace
def some_workspace(a: dace.float64[N]):
  workspace = dace.ndarray([N], dace.float64, lifetime=dace.AllocationLifetime.External)
  workspace[:] = a
  workspace += 1
  a[:] = workspace

csdfg = some_workspace.to_sdfg().compile()

sizes = csdfg.get_workspace_sizes()  # Returns {dace.StorageType.CPU_Heap: N*8}
wsp = ...  # Allocate externally
csdfg.set_workspace(dace.StorageType.CPU_Heap, wsp)

The same interface is available in the generated code:

size_t __dace_get_external_memory_size_CPU_Heap(programname_t *__state, int N);
void __dace_set_external_memory_CPU_Heap(programname_t *__state, char *ptr, int N);
// or GPU_Global...

Schedule Trees (EXPERIMENTAL, #1145)

An experimental feature that allows you to analyze your SDFGs in a schedule-oriented format. It takes in SDFGs (even after applying transformations) and outputs a tree of elements that can be printed out in a Python-like syntax. For example:

@dace.program
def matmul(A: dace.float32[10, 10], B: dace.float32[10, 10], C: dace.float32[10, 10]):
  for i in range(10):
    for j in dace.map[0:10]:
      atile = dace.define_local([10], dace.float32)
      atile[:] = A[i]
      for k in range(10):
        with dace.tasklet:
          # ...
sdfg = matmul.to_sdfg()

from dace.sdfg.analysis.schedule_tree.sdfg_to_tree import as_schedule_tree
stree = as_schedule_tree(sdfg)
print(stree.as_string())

will print:

for i = 0; (i < 10); i = i + 1:
  map j in [0:10]:
    atile = copy A[i, 0:10]
    for k = 0; (k < 10); k = (k + 1):
      C[i, j] = tasklet(atile[k], B(10) [k, j], C[i, j])

There are some new transformation classes and passes in dace.sdfg.analysis.schedule_tree.passes, for example, to remove empty control flow scopes:

class RemoveEmptyScopes(tn.ScheduleNodeTransformer):
  def visit_scope(self, node: tn.ScheduleTreeScope):
    if len(node.children) == 0:
      return None
    return self.generic_visit(node)

We hope you find new ways to analyze and optimize DaCe programs with this feature!

Other Major Changes

Support for tensor linear algebra (transpose, dot products) by @alexnick83 in #1309
(Experimental) support for nested data containers and structures by @alexnick83 in #1324
(Experimental) basic support for mpi4py syntax by @alexnick83 and @Com1t in #1070 and #1288
(Experimental) Added support for a subset of F77 and F90 language features by @acalotoiu and @mcopik in #1275, #1293, #1349 and #1367

Minor Changes

Support for Python 3.12 by @alexnick83 in #1386
Support attributes in symbolic expressions by @tbennun in #1369
GPU User Experience Improvements by @tbennun in #1283
State Fusion Extension with happens before dependency edge by @acalotoiu in #1268
Add CPU_Persistent map schedule (OpenMP parallel regions) by @tbennun in #1330

Fixes and Smaller Changes:

Fix transient bug in test with array_equal of empty arrays by @tbennun in #1374
Fixes GPUTransform bug when data are already in GPU memory by @alexnick83 in #1291
Fixed erroneous parsing of data slices when the data are defined inside a nested scope by @alexnick83 in #1287
Disable OpenMP sections by default by @tbennun in #1282
Make SDFG.name a proper property by @phschaad in #1289
Refactor and fix performance regression with GPU runtime checks by @tbennun in #1292
Fixed RW dependency violation when accessing data attributes by @alexnick83 in #1296
Externally-managed memory lifetime by @tbennun in #1294
External interaction fixes by @tbennun in #1301
Improvements to RefineNestedAccess by @alexnick83 and @Sajohn-CH in #1310
Fixed erroneous parsing of while-loop conditions by @alexnick83 in #1313
Improvements to MapFusion when the Map bodies contain NestedSDFGs by @alexnick83 in #1312
Fixed erroneous code generation of indirected accesses by @alexnick83 in #1302
RefineNestedAccess take indices into account when checking for missing free symbols by @Sajohn-CH in #1317
Fixed SubgraphFusion erroneously removing/merging intermediate data nodes by @alexnick83 in #1307
Fixed SDFG DFS traversal missing InterstateEdges by @alexnick83 in #1320
Frontend now uses the AST nodes' context to infer read/write accesses by @alexnick83 in #1297
Added capability for non-strict shape validation by @alexnick83 in #1321
Fixes for persistent schedule and GPUPersistentFusion transformation by @tbennun in #1322
Relax test for inter-state edges in default schedules by @tbennun in #1326
Improvements to inference of an SDFGState's read and write sets by @Sajohn-CH in #1325 and #1329
Fixed ArrayElimination pass trying to eliminate data that were already removed in #1314
Bump certifi from 2023.5.7 to 2023.7.22 by @dependabot in #1332
Fix some underlying issues with tensor core sample by @computablee in #1336
Updated hlslib to support Xilinx Vitis >=2022.2 by @carljohnsen in #1340
Docs: mention FPGA backend tested with Intel Quartus PRO by @TizianoDeMatteis in #1335
Improved validation of NestedSDFG connectors by @alexnick83 in #1333
Remove unused global data descriptor shapes from arguments by @tbennun in #1338
Fixed Scalar data validation in NestedSDFGs by @alexnick83 in #1341
Fix for None set properties by @tbennun in #1345
Add Object to defined types in code generation and some documentation by @tbennun in #1343
Fix symbolic parsing for ternary operators by @tbennun in #1346
Fortran fix memlet indices by @Sajohn-CH in #1342
Have memory type as argument for fpga auto interleave by @TizianoDeMatteis in #1352
Eliminate extraneous branch-end gotos in code generation by @tbennun in #1355
TaskletFusion: Fix additional edges in case of none-connectors by @lukastruemper in #1360
Fix dynamic memlet propagation condition by @tbennun in #1364
Configurable GPU thread/block index types, minor fixes to integer code generation and GPU runtimes by @tbennun in #1357

New Contributors

@computablee made their first contribution in #1290
@Com1t made their first contribution in #1288
@mcopik made their first contribution in #1349

Full Changelog: v0.14.4...v0.15

Contributors

mcopik, carljohnsen, and 11 other contributors


12 Jun 07:24

tbennun

v0.14.4

b7a8b5f

DaCe 0.14.4

Minor release; adds support for Python 3.11.


08 Jun 18:52

phschaad

v0.14.3

37b58bb

DaCe 0.14.3

What's Changed

Scope Schedules

The schedule type of a scope (e.g., a Map) is now also determined by the surrounding storage. If the surrounding storage is ambiguous, DaCe fails with a descriptive exception. This means that code such as the example below:

@dace.program
def add(a: dace.float32[10, 10] @ dace.StorageType.GPU_Global, 
        b: dace.float32[10, 10] @ dace.StorageType.GPU_Global):
    return a + b @ b

will now automatically run the + and @ operators on the GPU.

(#1262 by @tbennun)

DaCe Profiler

Easier interface for profiling applications: dace.profile and dace.instrument can now be used within Python with a simple API:

with dace.profile(repetitions=100) as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print all execution times of the last called program (other_program)
print(profiler.times[-1])

Where instrumentation is applied can be controlled with filters in the form of strings and wildcards, or with a function:

with dace.instrument(dace.InstrumentationType.GPU_Events, 
                     filter='*add??') as profiler:
    some_program(...)
    # ...
    other_program(...)

# Print instrumentation report for last call
print(profiler.reports[-1])

With dace.builtin_hooks.instrument_data, the same technique can be applied to instrument data containers.

(#1197 by @tbennun)
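To give an intuition for a wildcard filter such as '*add??', the sketch below uses Python's standard `fnmatch` module. This assumes shell-style wildcard semantics (`*` matches any run of characters, `?` exactly one); the candidate names are hypothetical, and which element names DaCe matches the filter against is not specified here.

```python
from fnmatch import fnmatchcase


def select(names, pattern):
    """Return the names that match a shell-style wildcard pattern."""
    return [name for name in names if fnmatchcase(name, pattern)]


# Hypothetical element names.
names = ["vector_add01", "add", "matadd_1", "multiply"]

# '*add??' requires "add" followed by exactly two more characters.
matched = select(names, "*add??")
print(matched)  # ['vector_add01', 'matadd_1']
```

Note that the bare name "add" does not match, because the two `?` placeholders each require one character.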

Improved Data Instrumentation

Data container instrumentation can now also be used conditionally, so that data container contents are saved and restored only if certain conditions are met. In addition, data instrumentation now saves the SDFG's symbol values at the time of dumping data, allowing an entire SDFG's state / context to be restored from data reports.

(#1202, #1208 by @phschaad)

Restricted SSA for Scalars and Symbols

Two new passes (ScalarFission and StrictSymbolSSA) allow fissioning of scalar data containers (or arrays of size 1) and symbols into separate containers and symbols respectively, based on the scope or reach of writes to them. This is a form of restricted SSA, which performs SSA wherever possible without introducing Phi-nodes. This change is made possible by a set of new analysis passes that provide the scope or reach of each write to scalars or symbols.

(#1198, #1214 by @phschaad)
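The core idea of fissioning a reused scalar can be sketched on straight-line code, where no Phi-nodes are ever needed. This is a toy illustration under stated assumptions: the statement format and the function `fission_scalars` are hypothetical, and the sketch does not handle branches or loops, which is where the actual passes rely on the reach analysis.

```python
def fission_scalars(statements):
    """Rename every write to a fresh version; reads use the latest version.

    `statements` is a list of (target, operands) pairs in program order.
    """
    version = {}
    renamed = []

    def current(name):
        # Names that were never written (program inputs) stay unchanged.
        return f"{name}_{version[name]}" if name in version else name

    for target, operands in statements:
        new_operands = [current(op) for op in operands]  # reads happen first
        version[target] = version.get(target, -1) + 1    # then the write
        renamed.append((current(target), new_operands))
    return renamed


# The scalar `t` is reused for two unrelated values.
program = [("t", ["a"]), ("b", ["t"]), ("t", ["c"]), ("d", ["t"])]
result = fission_scalars(program)
for target, operands in result:
    print(target, "<-", operands)
```

After renaming, `t_0` and `t_1` are independent containers, so the two halves of the program no longer share a false dependency through `t`.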

Extending Cutout Capabilities

SDFG Cutouts can now be taken from more than one state.

Additionally, taking cutouts that only access a subset of a data container (e.g., A[2:5] from a data container A of size N) results in the cutout receiving an "Alibi Node" to represent only that subset of the data (A_cutout[0:3] -> A[2:5], where A_cutout is of size 4). This allows cutouts to be significantly smaller and have a smaller memory footprint, simplifying debugging and localized optimization.

Finally, cutouts now contain an exact description of their input and output configuration. The input configuration is anything that may influence a cutout's behavior and may contain data before the cutout is executed in the context of the original SDFG. Similarly, the output configuration is anything that a cutout writes to, that may be read externally or may influence the behavior of the remaining SDFG. This allows isolating all side effects of changes to a particular cutout, allowing transformations to be tested and verified in isolation and simplifying debugging.

(#1201 by @phschaad)
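
To make the notions of input and output configuration more concrete, here is a conservative plain-Python approximation over a toy program model. It is not how DaCe computes them; the `(reads, writes)` region encoding, the function `cutout_configuration`, and the `external` set are assumptions made for this sketch.

```python
from typing import List, Set, Tuple

Region = Tuple[Set[str], Set[str]]  # (names read, names written)


def cutout_configuration(regions: List[Region], cutout: Set[int],
                         external: Set[str]) -> Tuple[Set[str], Set[str]]:
    """Approximate the input and output configuration of a cutout.

    regions:  the program as an ordered list of (reads, writes)
    cutout:   indices of the regions that form the cutout
    external: names visible outside the whole program
    """
    written_inside: Set[str] = set()
    inputs: Set[str] = set()
    for i in sorted(cutout):
        reads, writes = regions[i]
        # Anything read before being written inside the cutout is an input.
        inputs |= reads - written_inside
        written_inside |= writes

    # Conservatively treat every read outside the cutout as a possible
    # consumer of the cutout's writes.
    read_outside: Set[str] = set()
    for i, (reads, _) in enumerate(regions):
        if i not in cutout:
            read_outside |= reads

    outputs = {n for n in written_inside
               if n in external or n in read_outside}
    return inputs, outputs


regions = [
    ({"A"}, {"tmp"}),
    ({"tmp", "B"}, {"C"}),
    ({"C"}, {"tmp2"}),
    ({"tmp2", "C"}, {"D"}),
]
inputs, outputs = cutout_configuration(regions, {1, 2}, {"A", "B", "D"})
```

With both sets known, a transformation applied to the cutout can be checked in isolation: feed the same input configuration to the original and the transformed cutout and compare everything in the output configuration.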

Bug Fixes, Compatibility Improvements, and Other Changes

SymPy 1.12 Compatibility by @alexnick83 in #1256
GPU Grid-Strided Tiling by @C-TC in #1249
Fix MapInterchange for Maps with dynamic inputs by @alexnick83 in #1244
Assortment of fixes for dynamic Maps on GPU (dynamic thread blocks) by @alexnick83 in #1246
Tuning Compatibility Fixes by @lukastruemper in #1234
Inline preprocessor command by @tbennun in #1242
unsqueeze_memlet fixes by @alexnick83 in #1203
Fix-intermediate-nodes by @alexnick83 in #1212
Fix for LoopToMap when applied on multi-nested loops by @alexnick83 in #1207
Fix-nested-sdfg-deepcopy by @alexnick83 in #1221
Fix integer division in Python frontend by @tbennun in #1196
Fix augmented assignment on scalar in condition by @tbennun in #1225
Fix internal subscript access if already existed by @tbennun in #1228
Fix atomic operation detection for exactly-overlapping ranges by @tbennun in #1230
Fix-gpu-transform-copy-out by @alexnick83 in #1231
Fix-interstate-free-symbols by @alexnick83 in #1238
Fix nested access with nested symbol dependency by @alexnick83 in #1239
Fix import in the transformations tutorial by @lamyiowce in #1210
LoopToMap detects shared transients by @alexnick83 in #1200
Faster CI and reachability checks for codecov.io by @tbennun in #1213
Map-fission-single-data-multi-connectors by @alexnick83 in #1216
Add library path to HIP CMake by @tbennun in #1219
BatchedMatMul: MKL gemm_batch support by @lukastruemper in #1181

Full Changelog: v0.14.2...v0.14.3

Please let us know if there are any regressions with this new release.

Contributors

tbennun, phschaad, and 4 other contributors

22 Feb 15:58

phschaad

v0.14.2

da95a40

DaCe 0.14.2

What's Changed

GPU instrumentation support with LIKWID by @lukastruemper
New GPU expansion for the Reduce Library Node by @hodelcl
CSRMM and CSRMV Library Nodes by @alexnick83, @lukastruemper, and @C-TC
New transformations (Temporal Vectorization, HBM Transform) and other FPGA improvements by @carljohnsen, @jnice-81, @sarahtr, and @TizianoDeMatteis
AMD GPU-related fixes and rocBLAS GEMM by @tbennun

Full Changelog: v0.14.1...v0.14.2

Contributors

carljohnsen, TizianoDeMatteis, and 7 other contributors

14 Oct 17:08

tbennun

v0.14.1

1580f0f

DaCe 0.14.1

This release of DaCe offers mostly stability fixes for the Python frontend, transformations, and callbacks.

Full Changelog: v0.14...v0.14.1

