Releases: computablee/DotMP
DotMP Pre-Release v2.0.0-pre1.2
This is the first pre-release of v2.0.0.
GPU Programming
This update incorporates support for GPGPU programming via an external package. Arrays can be created on the GPU, and GPU-based parallel-for loops can be run. There is limited support right now for GPU programming, and much work has yet to be completed.
Right now, the only loops supported are basic linear and collapsed for loops:
double[] a = new double[50_000];
double[] x = new double[50_000];
double[] y = new double[50_000];
float[] res = new float[50_000];
{
    using var a_gpu = new DotMP.GPU.Buffer<double>(a, DotMP.GPU.Buffer.Behavior.To);
    using var x_gpu = new DotMP.GPU.Buffer<double>(x, DotMP.GPU.Buffer.Behavior.To);
    using var y_gpu = new DotMP.GPU.Buffer<double>(y, DotMP.GPU.Buffer.Behavior.To);
    using var res_gpu = new DotMP.GPU.Buffer<float>(res, DotMP.GPU.Buffer.Behavior.From);
    DotMP.GPU.Parallel.ParallelFor(0, a.Length, a_gpu, x_gpu, y_gpu, res_gpu,
        (i, a_kernel, x_kernel, y_kernel, res_kernel) =>
    {
        res_kernel[i] = (float)(a_kernel[i] * x_kernel[i] + y_kernel[i]);
    });
}
// when the buffers are disposed at the end of the block, data is copied back to the CPU and the GPU memory is freed
For collapsed for loops, ParallelForCollapse is provided with an API analogous to the CPU API (i.e., passing in tuples of bounds).
I plan on updating the wiki over the next couple of weeks with details and tutorials.
CPU API Changes
Previously, in order to use CPU functions like Critical, Ordered, and Single, an ID parameter had to be passed in to inform the runtime which region it was seeing. This requirement has been removed: the runtime can now distinguish these functions when they are called in different locations (i.e., it infers the ID parameter). The old overloads have been deprecated, but are still available. To use the new overloads, simply remove the ID parameter.
Old:
DotMP.Parallel.Single(42, () =>
{
    // single region with ID 42
});
New:
DotMP.Parallel.Single(() =>
{
    // an ID is no longer required
});
What's Changed
- Implement baseline GPU functionality by @computablee in #95
- Update release/2.0 branch with GPU changes by @computablee in #125
- Add GPU functionality by @computablee in #126
- Add support for .NET Framework 4.7.1 and .NET Standard 2.1 by @computablee in #127
- nuget: bump the xunit group in /DotMP-Tests with 4 updates by @dependabot in #136
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #135
- actions: bump vsoch/pull-request-action from 1.0.24 to 1.1.0 by @dependabot in #134
- actions: bump actions/setup-dotnet from 3 to 4 by @dependabot in #133
- Add license to nupkg files by @computablee in #138
- Fix code quality tests by @computablee in #137
- nuget: bump the xunit group in /DotMP-Tests with 4 updates by @dependabot in #139
- nuget: bump the xunit group in /DotMP-Tests with 3 updates by @dependabot in #141
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #140
- Dependabot cannot recursively search directories by @computablee in #142
- nuget: bump the bench group in /benchmarks/GPUHeatTransfer with 1 update by @dependabot in #143
- nuget: bump the bench group in /benchmarks/ILGPUOverhead with 1 update by @dependabot in #144
- nuget: bump the bench group in /benchmarks/GPUOverhead with 1 update by @dependabot in #145
- nuget: bump the bench group in /benchmarks/Misc with 2 updates by @dependabot in #147
- nuget: bump the bench group in /benchmarks/HeatTransfer with 2 updates by @dependabot in #148
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #146
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #151
- nuget: bump the xunit group in /DotMP-Tests with 3 updates by @dependabot in #152
- nuget: bump the bench group in /benchmarks/GPUHeatTransfer with 1 update by @dependabot in #153
- nuget: bump the bench group in /benchmarks/GPUOverhead with 1 update by @dependabot in #154
- nuget: bump the bench group in /benchmarks/HeatTransfer with 2 updates by @dependabot in #155
- nuget: bump the bench group in /benchmarks/Misc with 2 updates by @dependabot in #156
- nuget: bump the bench group in /benchmarks/ILGPUOverhead with 1 update by @dependabot in #157
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #158
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #159
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #160
- actions: bump codecov/codecov-action from 3 to 4 by @dependabot in #161
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #165
- actions: bump vsoch/pull-request-action from 1.1.0 to 1.1.1 by @dependabot in #166
- nuget: bump the coverlet group in /DotMP-Tests with 2 updates by @dependabot in #167
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #168
- nuget: bump the xunit group in /DotMP-Tests with 2 updates by @dependabot in #169
- nuget: bump Microsoft.NET.Test.Sdk from 17.8.0 to 17.9.0 in /DotMP-Tests by @dependabot in #170
Full Changelog: v1.6.1...v2.0.0-pre1.2
DotMP Release v1.6.1
This is a minor release. This release adds support for .NET Framework 4.7.1 and .NET Standard 2.1.
Full Changelog: v1.6.0...v1.6.1
DotMP Release v1.6.0
DotMP v1.6.0 doesn't incorporate a ton of new features (though there are some!). Rather, it aims to provide a wide range of much-needed performance improvements across the board.
Licensing Changes
The project license has been changed from MIT to LGPL 2.1.
Changes to the Scheduler API
The schedule parameter in Parallel.For and its derivatives now takes an implementation of the IScheduler interface instead of a DotMP.Schedule enum. The changes are fully source-compatible with previous versions, but break API and ABI compatibility. Besides the performance benefits, the code is simpler, more modular, more maintainable, more readable, has less duplication, and is extensible by the user.
The IScheduler interface is public-facing and permits users to implement their own custom schedulers. Details are outlined in the new wiki!
We also introduce a new work-stealing scheduler. The work-stealing scheduler can be accessed via DotMP.Schedule.WorkStealing, and has been seamlessly integrated into the rest of DotMP. Details are also outlined in the new wiki.
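As a rough illustration of selecting the new scheduler, the sketch below passes DotMP.Schedule.WorkStealing through the schedule parameter of ParallelFor. It assumes the DotMP NuGet package is referenced and that schedule can be supplied as a named argument after the loop body; the class and method names (WorkStealingExample, Squares) are made up for this example.

```csharp
public static class WorkStealingExample
{
    // Fills an array with i*i, distributing iterations with the work-stealing scheduler.
    public static long[] Squares(int n)
    {
        long[] result = new long[n];

        // Each iteration writes only to its own slot, so no locking is needed.
        DotMP.Parallel.ParallelFor(0, n, i =>
        {
            result[i] = (long)i * i;
        }, schedule: DotMP.Schedule.WorkStealing);

        return result;
    }
}
```

Because the change is described as source-compatible, swapping in another schedule such as DotMP.Schedule.Static should only require changing the named argument.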
Performance Improvements
There have been minor performance improvements across the board with parallel-for loops. However, collapsed for loops see substantial performance improvements, over 3x in some of my benchmarks. Static scheduling sees a performance bump as well from better avoidance of false sharing issues.
GetThreadNum has also been optimized, though this is already such a lightweight function that the difference is hardly noticeable.
Tasking Improvements
Previously, it was possible to spawn tasks from within other tasks, but it was not possible for a task to wait on its child tasks to complete. Now, there is a new implementation of taskwait which permits you to specify which tasks to wait on, and this version does not act as a barrier if called from within a task. If the default taskwait without arguments is called from within a task, a deadlock is detected and an exception is thrown.
Bug Fixes
Prior to this release, if an exception was thrown from inside a parallel region, it could not be caught from outside the region. Now, exceptions thrown inside parallel regions are properly caught and re-thrown in serial space, so a try/catch block placed around a parallel region can catch exceptions that happen in parallel space.
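The general pattern behind this fix can be sketched in plain .NET, without DotMP. This is not DotMP's actual implementation, only an illustration of catching an exception on a worker thread and re-throwing it on the calling thread; ExceptionForwarding and RunRegion are hypothetical names, and keeping only the first exception is a simplification chosen for the example.

```csharp
using System;
using System.Runtime.ExceptionServices;
using System.Threading;

public static class ExceptionForwarding
{
    // Runs `body` on `numThreads` threads. The first exception thrown by any
    // worker is captured and re-thrown on the calling thread after all workers join.
    public static void RunRegion(int numThreads, Action<int> body)
    {
        Exception captured = null;
        var threads = new Thread[numThreads];

        for (int t = 0; t < numThreads; t++)
        {
            int id = t; // copy so each closure sees its own thread id
            threads[t] = new Thread(() =>
            {
                try
                {
                    body(id);
                }
                catch (Exception ex)
                {
                    // Keep only the first exception; later ones are dropped.
                    Interlocked.CompareExchange(ref captured, ex, null);
                }
            });
            threads[t].Start();
        }

        foreach (Thread thread in threads)
        {
            thread.Join();
        }

        if (captured != null)
        {
            // Re-throw in "serial space" while preserving the original stack trace.
            ExceptionDispatchInfo.Capture(captured).Throw();
        }
    }
}
```

With a helper like this, a caller can wrap RunRegion in an ordinary try/catch and handle an InvalidOperationException thrown by one of the workers, which mirrors the behavior described above.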
Atomics
We have now implemented atomic subtraction for unsigned integer types. Two new methods were added to the Atomic static class: uint Sub(ref uint, uint), and ulong Sub(ref ulong, ulong).
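For readers curious how an unsigned atomic subtraction can be built, here is a minimal compare-and-swap sketch in plain .NET. It is not taken from DotMP's source; it assumes .NET 5 or later (where Interlocked.CompareExchange has a uint overload), and the UnsignedAtomics class name and the choice to return the new value are assumptions made for the example.

```csharp
using System.Threading;

public static class UnsignedAtomics
{
    // Atomically computes target -= amount and returns the new value.
    // Wraps around on underflow, like ordinary unchecked uint arithmetic.
    public static uint Sub(ref uint target, uint amount)
    {
        uint observed, updated;
        do
        {
            observed = Volatile.Read(ref target);
            updated = unchecked(observed - amount);
        }
        // Retry if another thread changed `target` between the read and the swap.
        while (Interlocked.CompareExchange(ref target, updated, observed) != observed);

        return updated;
    }
}
```

The retry loop only repeats under contention, so the uncontended cost is one read and one compare-and-swap.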
Reorganizing
There has been some internal reorganizing. DotMP exceptions have been moved to the DotMP.Exceptions namespace, and the actual scheduler implementations have been moved to the DotMP.Schedulers namespace.
What's Changed
- Full rewrite of core scheduler, support custom schedulers by @computablee in #101
- Implement proper exception handling from threads by @computablee in #103
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #102
- Major refactoring by @computablee in #104
- nuget: bump the bench group in /benchmarks/HeatTransfer with 2 updates by @dependabot in #106
- nuget: bump the xunit group in /DotMP-Tests with 3 updates by @dependabot in #105
- Significant performance improvements, new scheduler by @computablee in #107
- Better benchmark, more lenient testing for GitHub Actions by @computablee in #109
- Improve GetThreadNum performance by @computablee in #111
- nuget: bump Microsoft.NET.Test.Sdk from 17.7.2 to 17.8.0 in /DotMP-Tests by @dependabot in #112
- nuget: bump the xunit group in /DotMP-Tests with 1 update by @dependabot in #113
- Implement optimized index calculations for collapse(4) and higher by @computablee in #114
- nuget: bump the xunit group in /DotMP-Tests with 5 updates by @dependabot in #116
- Implement atomic subtraction for unsigned integers by @computablee in #118
- Add error checking to ForCollapse by @computablee in #120
- Create symbols for nuget by @computablee in #121
- Optimize #118 by @computablee in #123
- Allow taskwait from within tasks by @computablee in #122
- Add checks for overflow in schedulers by @computablee in #124
Full Changelog: v1.5.0...v1.6.0
DotMP Pre-Release v1.6.0-pre2
This is the second pre-release of v1.6.0. Starting with this pre-release, we will no longer be providing binaries on GitHub. We recommend using the NuGet package manager. This saves me some time and energy.
Performance Improvements
Index calculations have been thoroughly optimized across the board. v1.6.0-pre1 optimized index calculations for 2D and 3D loops, and pre2 optimizes index calculations for 4D and higher by a significant margin.
GetThreadNum has also been optimized, though this is already such a lightweight function that the difference is hardly noticeable.
Atomics
We have now implemented atomic subtraction for unsigned integer types. Two new methods were added to the Atomic static class: uint Sub(ref uint, uint), and ulong Sub(ref ulong, ulong). These are not heavily optimized yet and could probably be improved before the full v1.6.0 release.
Reorganizing
There has been some internal reorganizing. DotMP exceptions have been moved to the DotMP.Exceptions namespace, and the actual scheduler implementations have been moved to the DotMP.Schedulers namespace.
Bug fixes
There are a few bug fixes and performance improvements throughout DotMP.
What's Changed
- Improve GetThreadNum performance by @computablee in #111
- nuget: bump Microsoft.NET.Test.Sdk from 17.7.2 to 17.8.0 in /DotMP-Tests by @dependabot in #112
- nuget: bump the xunit group in /DotMP-Tests with 1 update by @dependabot in #113
- Implement optimized index calculations for collapse(4) and higher by @computablee in #114
- nuget: bump the xunit group in /DotMP-Tests with 5 updates by @dependabot in #116
- Implement atomic subtraction for unsigned integers by @computablee in #118
- Add error checking to ForCollapse by @computablee in #120
- Create symbols for nuget by @computablee in #121
Full Changelog: v1.6.0-pre1...v1.6.0-pre2
DotMP Pre-Release v1.6.0-pre1
DotMP v1.6.0 isn't planned to incorporate a ton of new features (though there are some!). Rather, it aims to provide a wide range of much-needed performance improvements across the board.
This is the first pre-release of v1.6.0.
Licensing Changes
The project license has been changed from MIT to LGPL 2.1.
Changes to the Scheduler API
The schedule parameter in Parallel.For and its derivatives now takes an implementation of the IScheduler interface instead of a DotMP.Schedule enum. The changes are fully source-compatible with previous versions, but break API and ABI compatibility. Besides the performance benefits, the code is simpler, more modular, more maintainable, more readable, has less duplication, and is extensible by the user.
The IScheduler interface is public-facing and permits users to implement their own custom schedulers. Details are outlined in the new wiki!
We also introduce a new work-stealing scheduler. The work-stealing scheduler can be accessed via DotMP.Schedule.WorkStealing, and has been seamlessly integrated into the rest of DotMP. Details are also outlined in the new wiki.
Performance Improvements
There have been minor performance improvements across the board with parallel-for loops. However, collapsed for loops in 2D and 3D see substantial performance improvements, over 3x in some of my benchmarks. Static scheduling sees a performance bump as well from better avoidance of false sharing issues.
Bug Fixes
Prior to this release, if an exception was thrown from inside a parallel region, it could not be caught from outside the region. Now, exceptions thrown inside parallel regions are properly caught and re-thrown in serial space, so a try/catch block placed around a parallel region can catch exceptions that happen in parallel space.
Planned Improvements
Before DotMP v1.6.0 is fully released, some more improvements are planned:
- Fix issue where tasks can't call taskwait without a deadlock, which limits opportunities for recursive parallelism.
- Optimize 4+ dimensional collapsed for loops.
- Do in-depth performance tuning across the entire tasking subsystem.
What's Changed
- Full rewrite of core scheduler, support custom schedulers by @computablee in #101
- Implement proper exception handling from threads by @computablee in #103
- nuget: bump the fluent group in /DotMP-Tests with 1 update by @dependabot in #102
- Major refactoring by @computablee in #104
- nuget: bump the bench group in /benchmarks/HeatTransfer with 2 updates by @dependabot in #106
- nuget: bump the xunit group in /DotMP-Tests with 3 updates by @dependabot in #105
- Significant performance improvements, new scheduler by @computablee in #107
- Better benchmark, more lenient testing for GitHub Actions by @computablee in #109
Full Changelog: v1.5.0...v1.6.0-pre1
DotMP Release v1.5.0
This is major release v1.5.0. There are lots of changes here:
Collapsed worksharing-for loops
Several new functions have been added to the Parallel class, including ForCollapse, ForReductionCollapse, ParallelForCollapse, and ParallelForReductionCollapse. Each of these has the ability to run n-dimensional collapsed for loops. Collapsed for loops work if you have a situation similar to the following:
for (int i = 0; i < M; i++)
{
    for (int j = 0; j < N; j++)
    {
        doWork(i, j);
    }
}
Previously, the only official way to parallelize this loop would have been to parallelize only the outermost loop, leaving the innermost one in serial:
DotMP.Parallel.ParallelFor(0, M, i =>
{
    for (int j = 0; j < N; j++)
    {
        doWork(i, j);
    }
});
For many use cases this is a sufficient amount of parallelism, but if the outermost loop has few iterations, it may be beneficial to parallelize across both loops, effectively multiplying the number of iterations that the scheduler has to work with. This means that larger chunk sizes can be used while maintaining efficient load balancing. The new way to do this is as follows:
DotMP.Parallel.ParallelForCollapse((0, M), (0, N), (i, j) =>
{
    doWork(i, j);
});
This allows the DotMP loop schedulers to have M*N iterations to work with, increasing flexibility while scheduling.
There are overloads allowing this tuple syntax for 2D, 3D, and 4D collapsed for loops. For 5D or higher, you instead pass an array of tuples representing each dimension's start and end indices, and the action takes an array of integers as indices.
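To make the idea of collapsing concrete, the sketch below shows one straightforward way a flat iteration number can be mapped back to the (i, j) indices of a 2D loop nest. It is an illustration only and says nothing about how DotMP computes indices internally; CollapseIndexing and Unflatten are hypothetical names.

```csharp
public static class CollapseIndexing
{
    // Maps a flat iteration number k, in [0, outerLength * innerLength),
    // back to the (i, j) pair of the original nested loops.
    public static (int i, int j) Unflatten(int k, (int start, int end) outer, (int start, int end) inner)
    {
        int innerLength = inner.end - inner.start;
        int i = outer.start + k / innerLength; // which outer iteration
        int j = inner.start + k % innerLength; // offset within that outer iteration
        return (i, j);
    }
}
```

A scheduler that hands out ranges of k in [0, M*N) can then recover i and j for every iteration, which is why a collapsed loop exposes M*N units of work rather than M.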
New locking API
The locking API has been updated to be more object-oriented. Previously, the following format was used to create, lock, and unlock a lock:
DotMP.Lock l = new DotMP.Lock();
DotMP.Lock.Set(l);
DotMP.Lock.Unset(l);
The new API instead calls methods on the lock object:
DotMP.Lock l = new DotMP.Lock();
l.Set();
l.Unset();
Other changes
There are many other changes. A basic summary is here:
- The project has a new logo, thanks to @exrol.
- Lots of documentation issues have been fixed, including a lack of NuGet documentation.
- There have been mountains of bug fixes throughout the project.
- More rigorous error checking throughout the project.
- More rigorous testing on our end.
- Some internal optimizations, refactoring, tidying up, and removing dead code.
Full Changelog
- Chore: Remove dead code by @blouflashdb in #53
- feat: add logo by @exrol in #54
- Update bug-fixes branch by @computablee in #55
- Fix documentation issues by @computablee in #56
- Fix DotMP.Parallel.Single by @blouflashdb in #57
- Merge bug fixes into main by @computablee in #60
- Updated Lock API (Issue #43) by @HarryHeres in #61
- Update README.md with fixed information by @computablee in #62
- Fixed potential rare exception in taskwait by @computablee in #63
- Added full test suite for atomics by @computablee in #65
- Add information for all-contributors by @computablee in #66
- [testing] PR to get all-contributors working by @computablee in #67
- [tributors] contributors/update-2023-10-05 by @github-actions in #68
- Fix linting issues with contributors.yml by @computablee in #69
- actions: bump vsoch/pull-request-action from 1.0.21 to 1.0.24 by @dependabot in #70
- actions: bump actions/checkout from 3 to 4 by @dependabot in #71
- nuget: bump Microsoft.NET.Test.Sdk from 17.3.2 to 17.7.2 in /DotMP-Tests by @dependabot in #72
- nuget: bump xunit.analyzers from 1.0.0 to 1.3.0 in /DotMP-Tests by @dependabot in #73
- nuget: bump xunit.runner.visualstudio from 2.4.5 to 2.5.1 in /DotMP-Tests by @dependabot in #74
- nuget: bump FluentAssertions from 6.7.0 to 6.12.0 in /DotMP-Tests by @dependabot in #75
- nuget: bump xunit.core from 2.4.2 to 2.5.1 in /DotMP-Tests by @dependabot in #76
- nuget: bump FluentAssertions.Analyzers from 0.17.2 to 0.25.0 in /DotMP-Tests by @dependabot in #79
- Refactoring, tidying up codebase by @computablee in #81
- Add more tests for reductions and bug fix by @computablee in #82
- Add test for Parallel.MasterTaskloop by @computablee in #83
- Dependabot groups and a PR template by @Skenvy in #84
- [tributors] contributors/update-2023-10-09 by @github-actions in #85
- Features by @computablee in #86
- Bug fixes, better test coverage by @computablee in #88
- Better testing, exception handling, documentation, remove dead code by @computablee in #89
- Fix issues with README by @computablee in #90
- Remove dead code by @computablee in #91
- nuget: bump the xunit group in /DotMP-Tests with 5 updates by @dependabot in #93
- Worksharing-For Optimizations by @computablee in #94
- nuget: bump the xunit group in /DotMP-Tests with 3 updates by @dependabot in #97
- Validate DotMP input parameters by @computablee in #98
New Contributors
- @blouflashdb made their first contribution in #53
- @exrol made their first contribution in #54
- @HarryHeres made their first contribution in #61
- @Skenvy made their first contribution in #84
Full Changelog: v1.4.1...v1.5.0
DotMP Release v1.4.1
This is a minor release. This update adds support for .NET 7.0. There were no necessary changes to the codebase (hooray!) but the build pipeline is changed slightly. This does mean that future collaborators will need to have both the .NET 6.0 SDK and the .NET 7.0 SDK.
DotMP Release v1.4.0
This is a major release.
Changelog:
- The Locking and Lock classes have been merged. Now there is just Lock.
- Shared<T> now implements IDisposable and may be used within a using block.
- SharedEnumerable<T> now exists.
- Created factory classes for Shared<T> and SharedEnumerable<T>.
- Added a K-nearest-neighbors example.
- DotMP.Parallel.Schedule is now DotMP.Schedule.
- Added a tasking system, including the DotMP.Parallel.Task, DotMP.Parallel.Taskloop, DotMP.Parallel.Taskwait methods as part of the public-facing API.
- DotMP.Parallel.Section has been removed, and the API for DotMP.Parallel.Sections has been changed.
- Better documentation.
- Better code organization.
- Better testing.
DotMP Release v1.3.0
This release comes just hours after v1.2.1, but comes with a big change: the project has been renamed from OpenMP.NET to DotMP. The previous releases have been renamed to DotMP, but the provided .dll/.pdb files and source code still reflect the old branding. The reason for the change is to abide by OpenMP's trademark usage guidelines, since I hope to progress to publishing on NuGet soon.
There is one new feature: the DotMP.Parallel.Schedule.Runtime schedule. The Runtime schedule allows the schedule to be set via an environment variable (e.g., dynamic,32 for Dynamic scheduling with a chunk size of 32).
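The value format in the example above (a schedule name, optionally followed by a comma and a chunk size) could be parsed along the following lines. This is a hypothetical helper and not DotMP's parser: RuntimeScheduleParser and Parse are invented names, and the sketch takes the string directly rather than assuming the name of the environment variable, which is not stated here.

```csharp
public static class RuntimeScheduleParser
{
    // Splits a value such as "dynamic,32" into a lower-cased schedule name
    // and an optional chunk size (null when absent or not a valid number).
    public static (string kind, uint? chunkSize) Parse(string value)
    {
        string[] parts = value.Split(',');
        string kind = parts[0].Trim().ToLowerInvariant();

        uint? chunkSize = null;
        if (parts.Length > 1 && uint.TryParse(parts[1].Trim(), out uint parsed))
        {
            chunkSize = parsed;
        }

        return (kind, chunkSize);
    }
}
```

Returning a nullable chunk size lets the caller fall back to a scheduler-specific default when only the schedule name is given.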
DotMP Release v1.2.1
This release doesn't change much with the actual .dll and .pdb files, but does add significant documentation across the entire codebase. Doxygen can now generate documentation from the source code, and I've added a Makefile for building the source yourself.