Commit 026722f

Update CHANGELOG for 0.4.0
1 parent e1d966b commit 026722f

1 file changed, +72 -0 lines changed

CHANGELOG.md

Lines changed: 72 additions & 0 deletions
@@ -6,6 +6,78 @@ The format is based on [Keep a Changelog](http://keepachangelog.com/en/1.0.0/)
and this project adheres to [Semantic
Versioning](http://semver.org/spec/v2.0.0.html).

## [0.4.0] - 2023-07-13

We recommend using the following SYCL versions with this release:

- DPC++: 61e51015 or newer
- hipSYCL: 24980221 or newer

See our [platform support guide](docs/platform-support.md) for a complete list of all officially supported configurations.

### Added

- Introduce new experimental `host_object` and `side_effect` APIs to express non-buffer dependencies between host tasks (#68, 7a5326a)
- Add new `CELERITY_GRAPH_PRINT_MAX_VERTS` config option (#80, d3dd722)
- Name threads for better debugging (#98, 25d769d, #131, ff5fbec)
- Add support for passing device selectors to the `distr_queue` constructor (#113, 556b6f2)
- Add new `CELERITY_DRY_RUN_NODES` environment variable to simulate the scheduling of an application on a large number of nodes (without execution or data transfers) (#125, 299ebbf)
- Add ability to name buffers for debugging (#132, 1076522)
- Introduce experimental `fence` API for accessing buffer and host-object data from the main thread (#151, 6b803f8)
- Introduce backend system for vendor-specific code paths (#162, 750f32a)
- Add `CELERITY_USE_MIMALLOC` CMake configuration option to use the mimalloc allocator (enabled by default) (#170, 234e3d2)
- Support 0-dimensional buffers, accessors and kernels (#163, 0685d94)
- Introduce new diagnostics utility for detecting erroneous reference captures into kernel functions, as well as unused accessors (#173, ff7ed02)
- Introduce `CELERITY_ACCESSOR_BOUNDARY_CHECK` CMake option to detect out-of-bounds buffer accesses inside device kernels (enabled by default for debug builds) (#178, 2c738c8)
- Print a more helpful error message when buffer allocations exceed available device memory (#179, 79f97c2)

### Changed

- Update spdlog to 1.9.2 (#80, a178828)
- Overhaul logging mechanism (#80, 1b19bfc)
- Improve graph dependency tracking performance (#100, c9dab18)
- Improve task lookup performance (#112, 5139256)
- Introduce epochs as a mechanism for in-graph synchronization (#86, 61dd07e)
- Miscellaneous performance improvements (#115, 9a099d2, #137, b0254fd, #138, 02258c0, #145, f0b53ce)
- Improve scheduler performance by reducing lock contention (#111, 4547b5f)
- Improve graph generation and printing performance (#133, 8122798)
- Use [libenvpp](https://github.com/ph3at/libenvpp) to validate all `CELERITY_*` environment variables (#158, b2ced9b)
- Use native ("USM") pointers instead of SYCL buffers for backing buffer allocations (#162, 44497b3)
- Implement `range` and `id` types instead of aliasing SYCL types (#163, 0685d94)
- Disallow in-source builds (#176, 0a96d15)
- Lift restrictions on reductions for DPC++ (#175, efff21b)
- Remove multi-pass mechanism to allow reference capture of buffers and host-objects into command group functions, in alignment with the SYCL 2020 API (#173, 0a743c7)
- Drastically improve performance of buffer data location tracking (#184, adff79e)
- Switch to distributed scheduling model (#186, 0970bff)

### Deprecated

- Passing `sycl::device` to `distr_queue` constructor (use a device selector instead) (#113, 556b6f2)
- Capturing buffers and host objects by value into command group functions (capture by reference instead) (#173, 0a743c7)
- `allow_by_ref` is no longer required to capture references into command group functions (#173, 0a743c7)

### Removed

- Removed support for ComputeCpp (discontinued) (#167, 68367dd)
- Removed deprecated `host_memory_layout` (use `buffer_allocation_window` instead) (#187, f5e6510)
- Removed deprecated kernel dimension template parameter on `one_to_one`, `fixed` and `all` range mappers (#187, 40a12a4)
- Kernels can no longer receive `sycl::item` (use `celerity::item` instead); this was already broken in 0.3.2 (#163, 67ccacc)

### Fixed

- Improve performance for buffer transfers on IBM Spectrum MPI (#114, c60527f)
- Increase size limit on individual buffer transfer operations from 2 GiB to 128 GiB (#153, 972682f)
- Fix race between creating collective groups and submitting host tasks (#152, 0a4fca5)
- Align read-accessor `operator[]` with SYCL 2020 spec by returning const-reference instead of value (#156, 5011ded)

### Internal

- Add microbenchmark suite (#100, c2853ca, #107, 51f5bc5)
- Update Catch2 to v3.3 (#102, 9a6f19d, #129, 0d1e36a, #162, 5aa33d6)
- Add `all_tests` unit test executable (#104, c12b052)
- Add custom CSV and Markdown reporters (#109, ba3af8b)
- Introduce automatic clang-tidy checks for CI (#128, ca94bee)

## [0.3.2] - 2022-02-17

### Added
