Minor release v1.1
Changelog for 1.1:
- Increase default problem size to almost 4 GB to compensate for OpenMP overhead
- Always enable streaming stores for the Intel toolchain
- Explicitly set static scheduling for OpenMP for loops
- Add a Go version in util
- Add single file versions (C and Fortran) for teaching
- Improve LIKWID instrumentation
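The static-scheduling change can be illustrated with a minimal sketch. It assumes a STREAM-style triad kernel, and the functions init_array and triad are hypothetical names, not taken from the benchmark's source. When an OpenMP loop has no schedule clause, the schedule it uses is implementation-defined. Writing schedule(static) explicitly makes the distribution predictable: each thread receives at most one contiguous block of iterations, and the blocks are approximately equal in size.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: fill an array with a constant value.
 * Uses the same static schedule as the compute kernel, so each thread
 * writes (first-touches) the block it will later work on. */
void init_array(double *restrict x, double value, size_t n)
{
    assert(x != NULL);
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        x[i] = value;
    }
}

/* Hypothetical STREAM-style triad: a[i] = b[i] + scalar * c[i].
 * schedule(static) without a chunk size splits the iterations into
 * contiguous chunks of roughly equal size, at most one per thread. */
void triad(double *restrict a, const double *restrict b,
           const double *restrict c, double scalar, size_t n)
{
    assert(a != NULL && b != NULL && c != NULL);
#pragma omp parallel for schedule(static)
    for (size_t i = 0; i < n; i++) {
        a[i] = b[i] + scalar * c[i];
    }
}
```

With OpenMP enabled (for example, -fopenmp with GCC), both loops are divided among the threads. Without it, the compiler ignores the pragmas and the code runs serially with identical results. Using the same static schedule in the initialization and compute loops typically keeps each thread working on the memory it touched first, which matters on ccNUMA systems.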