Release 0.9.0 · ddemidov/amgcl

Use NUMA-friendly internal data structures.
This shows measurable speed-up on NUMA systems.
Allow asynchronous amg setup.
The amgcl::amg constructor starts the setup process in a new thread. As
soon as the constructor returns, the instance is ready to be used as a
preconditioner. Initially it is just a single-level smoother, but as
new (coarser) levels are constructed, they are put to use.
With GPGPU backends, this should allow the host CPU doing the setup and
the compute device doing the solve to work concurrently. In some
cases a 2x speedup of the overall solution has been achieved.
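
The idea can be sketched in a few lines of standard C++. This is only an illustration of the pattern, not amgcl's implementation: the class `async_hierarchy`, its methods, and the `build_next` callback are hypothetical names, and a "level" is reduced to a plain integer.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical illustration; none of these names come from amgcl.
class async_hierarchy {
public:
    // Level 0 exists as soon as the constructor returns. `build_next` is
    // called `extra_levels` times on a background thread to produce the
    // coarser levels from the previous one.
    async_hierarchy(int extra_levels, std::function<int(int)> build_next)
        : levels_(1, 0)
    {
        worker_ = std::thread([this, extra_levels, build_next] {
            for (int i = 0; i < extra_levels; ++i) {
                int prev;
                {
                    std::lock_guard<std::mutex> lock(mutex_);
                    prev = levels_.back();
                }
                int next = build_next(prev); // expensive part, done without the lock
                std::lock_guard<std::mutex> lock(mutex_);
                levels_.push_back(next);     // publish the new level
            }
        });
    }

    ~async_hierarchy() { wait(); }

    // Number of levels a solver could use right now.
    std::size_t levels_ready() const {
        std::lock_guard<std::mutex> lock(mutex_);
        return levels_.size();
    }

    // Block until the whole hierarchy has been built.
    void wait() { if (worker_.joinable()) worker_.join(); }

private:
    mutable std::mutex mutex_;
    std::vector<int> levels_;
    std::thread worker_;
};
```

A solver built on such a class would query `levels_ready()` at each preconditioner application and cycle over whatever levels exist at that moment, so early iterations use only the smoother and later ones use the full hierarchy.
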
Allow limiting the number of amg levels, which makes it possible to use
relaxation for coarse solves.
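
A rough sketch of why a level cap and a relaxation-based coarse solve go together, assuming a fixed coarsening ratio. The function `plan_levels`, the struct `hierarchy_plan`, and all parameter names are hypothetical and are not amgcl's parameter set.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical names for illustration only.
struct hierarchy_plan {
    std::vector<std::size_t> sizes; // unknowns per level, finest first
    bool direct_coarse;             // coarsest level small enough for a direct solve?
};

// Assumes coarsening_factor >= 2 and max_levels >= 1.
hierarchy_plan plan_levels(std::size_t n, std::size_t coarsening_factor,
                           std::size_t coarse_enough, std::size_t max_levels)
{
    hierarchy_plan plan;
    plan.sizes.push_back(n);

    // Keep coarsening until the level is small enough or the cap is hit.
    while (plan.sizes.back() > coarse_enough && plan.sizes.size() < max_levels) {
        plan.sizes.push_back(plan.sizes.back() / coarsening_factor);
    }

    // If the cap stopped us early, the coarsest level is still large, so a
    // few relaxation sweeps are a cheaper choice than a direct solve.
    plan.direct_coarse = plan.sizes.back() <= coarse_enough;
    return plan;
}
```

With a million unknowns, a ratio of 10, and a direct-solve threshold of 1000, a cap of two levels leaves a coarsest level of 100000 unknowns, which is where relaxation becomes the practical coarse solver.
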
Rewrite lgmres and fgmres in terms of Givens rotations, which should
work better with complex problems, see #34.
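
For reference, a Givens rotation that stays valid for complex values can be sketched as below. In GMRES-type methods such rotations are used to zero the subdiagonal entries of the Hessenberg matrix one at a time. This is a minimal standalone sketch, not the code used in amgcl; `givens_rotation`, `make_givens`, and `apply_givens` are hypothetical names.

```cpp
#include <cassert>
#include <cmath>
#include <complex>

// Hypothetical helper, not part of amgcl. Represents the unitary matrix
//   G = [      c    s ]
//       [ -conj(s)  c ]
// with real c, chosen so that G * [a; b] = [r; 0].
template <typename T>
struct givens_rotation {
    T c;               // real cosine
    std::complex<T> s; // complex sine
    std::complex<T> r; // first component after rotation
};

template <typename T>
givens_rotation<T> make_givens(std::complex<T> a, std::complex<T> b) {
    T na = std::abs(a), nb = std::abs(b);

    if (nb == T(0)) return {T(1), std::complex<T>(T(0)), a}; // nothing to eliminate
    if (na == T(0)) return {T(0), std::complex<T>(T(1)), b}; // swap the components

    T norm = std::hypot(na, nb);       // sqrt(|a|^2 + |b|^2) without overflow
    std::complex<T> phase = a / na;    // unit complex number with the phase of a
    return {na / norm, phase * std::conj(b) / norm, phase * norm};
}

// Apply the rotation to a pair (x, y) in place.
template <typename T>
void apply_givens(const givens_rotation<T> &g,
                  std::complex<T> &x, std::complex<T> &y)
{
    std::complex<T> t = g.c * x + g.s * y;
    y = -std::conj(g.s) * x + g.c * y;
    x = t;
}
```

Applying the rotation returned by `make_givens(a, b)` to the pair `(a, b)` leaves `r` in the first slot and zero (up to rounding) in the second, while preserving the Euclidean norm of the pair.
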
Use a new, more efficient sparse matrix format in the VexCL backend and
allow non-scalar values to be used with the backend.
Modernize cmake scripts.
Provide the amgcl::amgcl imported target, so that users may just write
```
find_package(amgcl)
add_executable(myprogram myprogram.cpp)
target_link_libraries(myprogram amgcl::amgcl)
```
to build a program using amgcl. The imported target brings in the
necessary compile and link options automatically.
Replace Boost.Python with pybind11 and improve the Python interface.
Unify example code for different backends.
Minor improvements and bug fixes.
