Merge pull request #200 from intel/develop
Develop
chuckyount authored Feb 15, 2019
2 parents e4043ba + 9525adf commit 5c91f46
Showing 27 changed files with 666 additions and 398 deletions.
11 changes: 7 additions & 4 deletions README.md
@@ -1,6 +1,6 @@
# YASK--Yet Another Stencil Kernel

* New YASK users may want to start with the [YASK tutorial](https://www.ixpug.org/components/com_solutionlibrary/assets/documents/1538169451-IXPUG_Fall_Conf_2018_paper_2%20-%20Rev3%20-%20Charles%20Yount.pdf).
* New YASK users may want to start with the [YASK tutorial](docs/YASK-tutorial.pdf).
* Existing YASK users may want to jump to the [backward-compatibility notices](#backward-compatibility-notices).

## Overview
@@ -25,12 +25,12 @@ YASK contains a domain-specific compiler to convert scalar stencil code to SIMD-
for multi-socket and multi-node operation or
Intel(R) Parallel Studio XE Composer Edition for C++ Linux
for single-socket only
(2016 or later, 2018 update 2 or later recommended).
(2018 or later; 2019 or later recommended and required when using g++ 8 or later).
Building a YASK kernel with the Gnu compiler is possible, but only useful
for functional testing. The performance
of the kernel built from the Gnu compiler has been observed to be up to 7x lower
than the same kernel built using the Intel compiler.
* Gnu C++ compiler, g++ (4.9.0 or later; 6.1.0 or later recommended).
* Gnu C++ compiler, g++ (4.9.0 or later; 8.2.0 or later recommended).
* Linux libraries `librt` and `libnuma`.
* Perl (5.010 or later).
* Awk.
@@ -45,7 +45,7 @@ YASK contains a domain-specific compiler to convert scalar stencil code to SIMD-
Reading the generated code is only necessary for debug or curiosity.
* SWIG (3.0.12 or later),
http://www.swig.org, for creating the Python interface.
* Python 2 (2.7.5 or later) or 3 (3.6.1 or later, recommended),
* Python 2 (2.7.5 or later) or 3 (3.6.1 or later),
https://www.python.org/downloads, for creating and using the Python interface.
* Doxygen (1.8.11 or later),
http://doxygen.org, for creating updated API documentation.
@@ -58,6 +58,9 @@ YASK contains a domain-specific compiler to convert scalar stencil code to SIMD-
for functional testing if you don't have native support for any given instruction set.

### Backward-compatibility notices:
* Version 2.18.00 added the ability to specify the global-domain size, and it will calculate the local-domain sizes from it.
There is no longer a default local-domain size.
Output changed terms "overall-problem" to "global-domain" and "rank-domain" to "local-domain".
* Version 2.17.00 determined the host architecture in `make` and `bin/yask.sh` and number of MPI ranks in `bin/yask.sh`.
This changed the old behavior of `make` defaulting to `snb` architecture and `bin/yask.sh` requiring `-arch` and `-ranks`.
Those options are still available to override the host-based default.
Binary file added docs/YASK-tutorial.pdf
Binary file not shown.
106 changes: 75 additions & 31 deletions include/yk_solution_api.hpp
@@ -134,40 +134,92 @@ namespace yask {
virtual std::vector<std::string>
get_misc_dim_names() const =0;

/// Set the size of the solution domain for this rank.
/// Set the local-domain size in the specified dimension, i.e., the size of the part of the domain that is in this rank.
/**
The domain defines the number of elements that will be evaluated with the stencil(s).
If MPI is not enabled, this is the entire problem domain.
If MPI is enabled, this is the domain for the current rank only,
and the problem domain consists of the sum of all rank domains
in each dimension (weak-scaling).
The domain size in each rank does not have to be the same, but
all domains in the same column must have the same width,
all domains in the same row must have the same height,
If MPI is not enabled, this is equivalent to the global-domain size.
If MPI is enabled, this is the domain size for the current rank only,
and the global-domain size is the sum of all local-domain sizes
in each dimension.
The local-domain size in each rank does not have to be the same, but
all local-domains in the same column of ranks must have the same width,
all local-domains in the same row must have the same height,
and so forth, for each domain dimension.
The domain size does *not* include the halo area or any padding.
For best performance, set the rank domain
The local-domain size does *not* include the halo area or any padding.
For best performance, set the local-domain
size to a multiple of the number of elements in a vector-cluster in
each dimension whenever possible.
each dimension.
You should set either the local-domain size or the global-domain size
in each dimension. The unspecified (zero) sizes will be calculated based on the
specified ones when prepare_solution() is called.
Setting the local-domain size to a non-zero value will clear the
global-domain size in that dimension until prepare_solution() is called.
See the "Detailed Description" for \ref yk_grid for more information on grid sizes.
There is no domain-size setting allowed in the
solution-step dimension (usually "t").
solution-step dimension (e.g., "t").
*/
virtual void
set_rank_domain_size(const std::string& dim
/**< [in] Name of dimension to set. Must be one of
the names from get_domain_dim_names(). */,
idx_t size /**< [in] Elements in the domain in this `dim`. */ ) =0;

/// Get the domain size for this rank.
/// Get the local-domain size in the specified dimension, i.e., the size in this rank.
/**
See documentation for set_rank_domain_size().
If you have called set_overall_domain_size() in a given dimension,
get_rank_domain_size() will return zero in that dimension until
prepare_solution() is called. After prepare_solution() is called,
the computed size will be returned.
@returns Current setting of rank domain size in specified dimension.
*/
virtual idx_t
get_rank_domain_size(const std::string& dim
/**< [in] Name of dimension to get. Must be one of
the names from get_domain_dim_names(). */) const =0;

/// Set the global-domain size in the specified dimension, i.e., the total size across all MPI ranks.
/**
You should set either the local-domain size or the global-domain size
in each dimension. The unspecified (zero) sizes will be calculated based on the
specified ones when prepare_solution() is called.
Setting the global-domain size to a non-zero value will clear the
local-domain size in that dimension until prepare_solution() is called.
See documentation for set_rank_domain_size().
See the "Detailed Description" for \ref yk_grid for more information on grid sizes.
There is no domain-size setting allowed in the
solution-step dimension (e.g., "t").
*/
virtual void
set_overall_domain_size(const std::string& dim
/**< [in] Name of dimension to set. Must be one of
the names from get_domain_dim_names(). */,
idx_t size /**< [in] Elements in the domain in this `dim`. */ ) =0;

/// Get the global-domain size in the specified dimension, i.e., the total size across all MPI ranks.
/**
The global-domain indices in the specified dimension will range from
zero (0) to get_overall_domain_size() - 1, inclusive.
Call get_first_rank_domain_index() and get_last_rank_domain_index()
to find the subset of this domain in each rank.
If you have called set_rank_domain_size() in a given dimension,
get_overall_domain_size() will return zero in that dimension until
prepare_solution() is called. After prepare_solution() is called,
the computed size will be returned.
@returns Sum of all ranks' domain sizes in the given dimension.
*/
virtual idx_t
get_overall_domain_size(const std::string& dim
/**< [in] Name of dimension to get. Must be one of
the names from get_domain_dim_names(). */ ) const =0;
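The set/get contract described in these comments can be modeled without YASK itself. The following is a minimal sketch, not YASK's implementation: `DomainSizes`, `set_local()`, `set_global()`, and `resolve()` are hypothetical names, with `resolve()` standing in for the work done by `prepare_solution()`. It assumes every rank in a dimension has the same local-domain size and that the global size divides evenly by the rank count, neither of which the documentation above requires.

```cpp
#include <stdexcept>

// Hypothetical model of one domain dimension; not part of the YASK API.
struct DomainSizes {
    long local = 0;   // local-domain size; 0 means "unspecified"
    long global = 0;  // global-domain size; 0 means "unspecified"

    // Setting one size to a non-zero value clears the other,
    // mirroring the documented behavior of the two setters.
    void set_local(long n)  { local = n;  if (n != 0) global = 0; }
    void set_global(long n) { global = n; if (n != 0) local = 0; }

    // Stand-in for prepare_solution(): fill in the unspecified size.
    // Assumption: equal local sizes across 'num_ranks' ranks in this dimension.
    void resolve(long num_ranks) {
        if (num_ranks <= 0)
            throw std::invalid_argument("rank count must be positive");
        if (local != 0) {
            global = local * num_ranks;
        } else if (global != 0) {
            if (global % num_ranks != 0)
                throw std::invalid_argument("global size not divisible by rank count");
            local = global / num_ranks;
        } else {
            throw std::invalid_argument("neither local nor global size was set");
        }
    }
};
```

In this model, reading the cleared size between a setter call and `resolve()` yields zero, which is the behavior the comments describe for `get_rank_domain_size()` and `get_overall_domain_size()` before `prepare_solution()` runs.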

/// Set the block size in the given dimension.
/**
This sets the approximate number of elements that are evaluated in
@@ -208,8 +260,16 @@ namespace yask {

/// Set the number of MPI ranks in the given dimension.
/**
The *product* of the number of ranks across all dimensions must
equal yk_env::get_num_ranks().
If set_num_ranks() is set to a non-zero value in all
dimensions, then
the *product* of the number of ranks across all dimensions must
equal the value returned by yk_env::get_num_ranks().
If the number of ranks is zero in one or more
dimensions, those values will be set by a heuristic when
prepare_solution() is called.
An exception will be thrown if no legal values are possible
given the specified (non-zero) values.
The current MPI rank will be assigned a unique location
within the overall problem domain based on its MPI rank index.
Or, you can set it explicitly via set_rank_index().
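The rule for zero-valued rank counts can be illustrated with a small stand-alone function. This is only a sketch: `fill_rank_layout()` is a hypothetical helper, and its placeholder heuristic (give the whole remaining factor to the first unspecified dimension) is an assumption made for illustration. The heuristic YASK actually applies in `prepare_solution()` is not shown in this diff.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Hypothetical helper; not part of the YASK API.
// 'ranks' holds the requested rank count per dimension; 0 means "choose for me".
// Returns a layout whose product equals 'total_ranks', or throws if none exists.
std::vector<int> fill_rank_layout(std::vector<int> ranks, int total_ranks) {
    long product = 1;     // product of the explicitly specified counts
    int first_free = -1;  // index of the first unspecified dimension
    for (std::size_t i = 0; i < ranks.size(); i++) {
        if (ranks[i] < 0)
            throw std::runtime_error("negative rank count");
        if (ranks[i] == 0) {
            if (first_free < 0)
                first_free = int(i);
        } else {
            product *= ranks[i];
        }
    }
    if (total_ranks <= 0 || total_ranks % product != 0)
        throw std::runtime_error("no legal rank layout");
    long remaining = total_ranks / product;
    if (first_free < 0) {
        // Everything was specified: the product must match exactly.
        if (remaining != 1)
            throw std::runtime_error("product of ranks does not match total");
        return ranks;
    }
    // Placeholder heuristic: the first free dimension takes the remaining
    // factor and every other free dimension gets 1.
    for (std::size_t i = 0; i < ranks.size(); i++)
        if (ranks[i] == 0)
            ranks[i] = (int(i) == first_free) ? int(remaining) : 1;
    return ranks;
}
```

A real heuristic would more likely balance the factors across dimensions; the point here is only the legality check on the specified values.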
@@ -356,22 +416,6 @@ namespace yask {
/**< [in] Name of dimension to get. Must be one of
the names from get_domain_dim_names(). */ ) const =0;

/// Get the overall problem size in the specified dimension.
/**
The overall domain indices in the specified dimension will range from
zero (0) to get_overall_domain_size() - 1, inclusive.
Call get_first_rank_domain_index() and get_last_rank_domain_index()
to find the subset of this domain in each rank.
@note This function should be called only *after* calling prepare_solution()
because prepare_solution() obtains the sub-domain sizes from other ranks.
@returns Sum of all ranks' domain sizes in the given dimension.
*/
virtual idx_t
get_overall_domain_size(const std::string& dim
/**< [in] Name of dimension to get. Must be one of
the names from get_domain_dim_names(). */ ) const =0;

/// Run the stencil solution for the specified steps.
/**
The stencil(s) in the solution are applied to the grid data, setting the
3 changes: 3 additions & 0 deletions src/common/common.mk
@@ -70,6 +70,9 @@ PERL := perl
MKDIR := mkdir -p -v
BASH := bash

# Options to avoid warnings when compiling SWIG-generated code.
SWIG_CXXFLAGS := -Wno-class-memaccess -Wno-stringop-overflow -Wno-stringop-truncation

# Find include path needed for python interface.
# NB: constructing string inside print() to work for python 2 or 3.
PYINC := $(addprefix -I,$(shell $(PYTHON) -c 'import distutils.sysconfig; print(distutils.sysconfig.get_python_inc() + " " + distutils.sysconfig.get_python_inc(plat_specific=1))'))
2 changes: 1 addition & 1 deletion src/common/common_utils.cpp
@@ -46,7 +46,7 @@ namespace yask {
// for numbers above 9 (at least up to 99).

// Format: "major.minor.patch".
const string version = "2.17.00";
const string version = "2.18.00";

string yask_get_version_string() {
return version;
2 changes: 1 addition & 1 deletion src/common/tuple.cpp
@@ -239,7 +239,7 @@ namespace yask {
// For some reason, copying *this and erasing
// the element in newt._q causes an exception.
Tuple newt;
for (int i = 0; i < size(); i++) {
for (int i = 0; i < getNumDims(); i++) {
if (i != posn)
newt.addDimBack(getDimName(i), getVal(i));
}
11 changes: 7 additions & 4 deletions src/common/tuple.hpp
@@ -162,15 +162,15 @@ namespace yask {

public:
Tuple() {}
~Tuple() {}
~Tuple() {} // NOT a virtual class!

// first-inner (first dim is unit stride) accessors.
bool isFirstInner() const { return _firstInner; }
void setFirstInner(bool fi) { _firstInner = fi; }

// Query number of dims.
int size() const {
return int(_q.size());
size_t size() const {
return _q.size();
}
int getNumDims() const {
return int(_q.size());
@@ -328,7 +328,7 @@ namespace yask {
// extra values are ignored. If there are fewer values in 'vals'
// than 'this', only the number of values supplied will be updated.
void setVals(int numVals, const T vals[]) {
int end = int(std::min(numVals, size()));
int end = std::min(numVals, int(_q.size()));
for (int i = 0; i < end; i++)
setVal(i, vals[i]);
}
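With `size()` now returning `size_t`, the callers in this commit either switch to `getNumDims()` or cast `_q.size()` to `int` before comparing. A short sketch of why that matters follows; `mixed_less()` and `clamp_count()` are hypothetical names used only for illustration. Two facts it relies on: a comparison between `int` and `size_t` converts the `int` to unsigned first, and `std::min` deduces a single type for both arguments, so mixing `int` and `size_t` does not compile without a cast.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Spells out what 'a < b' does when 'a' is int and 'b' is size_t:
// the int is converted to size_t first, so a negative value becomes huge.
bool mixed_less(int a, std::size_t b) {
    return static_cast<std::size_t>(a) < b;
}

// Hypothetical stand-in for the clamping in setVals(): convert the
// container size to int once, then compare int with int.
int clamp_count(int numVals, const std::vector<int>& q) {
    return std::min(numVals, int(q.size()));
}
```

With the explicit cast, a negative `numVals` stays negative and the copy loop simply does not run.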
@@ -553,6 +553,9 @@ namespace yask {
Tuple negElements() const {
return mapElements([&](T in){ return -in; });
}
Tuple absElements() const {
return mapElements([&](T in){ return abs(in); });
}
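The element-wise helpers above all follow one pattern: pass a lambda to `mapElements()` and collect the results. A stand-alone sketch of that pattern is below, assuming `long` values and a plain `std::vector` in place of `Tuple`; `map_elements()`, `neg_elements()`, and `abs_elements()` are hypothetical stand-ins rather than the YASK functions.

```cpp
#include <cstdlib>
#include <functional>
#include <vector>

// Hypothetical stand-in for Tuple::mapElements(): apply 'fn' to each value.
std::vector<long> map_elements(const std::vector<long>& vals,
                               const std::function<long(long)>& fn) {
    std::vector<long> out;
    out.reserve(vals.size());
    for (long v : vals)
        out.push_back(fn(v));
    return out;
}

// Counterpart of negElements().
std::vector<long> neg_elements(const std::vector<long>& vals) {
    return map_elements(vals, [](long in) { return -in; });
}

// Counterpart of absElements(). std::abs from <cstdlib> has a 'long'
// overload, so the value is not narrowed to int.
std::vector<long> abs_elements(const std::vector<long>& vals) {
    return map_elements(vals, [](long in) { return std::abs(in); });
}
```

The sketch qualifies the call as `std::abs` to make the overload choice explicit; the diff itself calls `abs(in)` unqualified.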

// make string like "4x3x2" or "4, 3, 2".
std::string makeValStr(std::string separator=", ",
8 changes: 4 additions & 4 deletions src/compiler/Makefile
@@ -74,17 +74,17 @@ YC_LFLAGS := -lrt -Wl,-rpath=$(LIB_OUT_DIR) -L$(LIB_OUT_DIR) -l$(YC_BASE)

$(YC_OBJ_DIR)/%.o: $(YC_STENCIL_DIR)/%.cpp $(YC_INC_GLOB) $(YC_STENCIL_INC_GLOB)
$(MKDIR) $(YC_OBJ_DIR)
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -DUSE_INTERNAL_DSL -O0 -c -o $@ $<
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -x c++ -DUSE_INTERNAL_DSL -O0 -c -o $@ $<
@ls -l $@

$(YC_OBJ_DIR)/%.o: $(YC_LIB_SRC_DIR)/%.cpp $(YC_INC_GLOB)
$(MKDIR) $(YC_OBJ_DIR)
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -fPIC -c -o $@ $<
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -x c++ -fPIC -c -o $@ $<
@ls -l $@

$(YC_OBJ_DIR)/%.o: $(COMM_DIR)/%.cpp $(YC_INC_GLOB)
$(MKDIR) $(YC_OBJ_DIR)
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -fPIC -c -o $@ $<
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -x c++ -fPIC -c -o $@ $<
@ls -l $@

######## Primary targets.
@@ -127,7 +127,7 @@ $(YC_SWIG_OUT_DIR)/yask_compiler_api_wrap.cpp: $(YC_SWIG_DIR)/yask*.i $(INC_DIR)
# https://github.com/swig/swig/issues/773
$(YC_OBJ_DIR)/yask_compiler_api_wrap.o: $(YC_SWIG_OUT_DIR)/yask_compiler_api_wrap.cpp
$(MKDIR) $(YC_OBJ_DIR)
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -DNDEBUG $(PYINC) -fPIC -c -o $@ $<
$(CXX_PREFIX) $(YC_CXX) $(YC_CXXFLAGS) -x c++ $(SWIG_CXXFLAGS) -DNDEBUG $(PYINC) -fPIC -c -o $@ $<
@ls -l $@

$(YC_PY_LIB): $(YC_OBJS) $(YC_OBJ_DIR)/yask_compiler_api_wrap.o
11 changes: 4 additions & 7 deletions src/compiler/lib/Grid.cpp
@@ -200,13 +200,10 @@ namespace yask {

// Can fold if ALL fold dims >1 are used in this grid.

#if 1
// NB: this will always be true if there is no vectorization.
// We do this because the compiler expects stencils to be vectorizable.
_isFoldable = _numFoldableDims == dims._foldGT1.size();
#else
_isFoldable = (_numFoldableDims > 0 ) && (_numFoldableDims == dims._foldGT1.size());
#endif
// NB: this will always be true if there is no vectorization, i.e.,
// both are zero. We do this because the compiler expects stencils
// to be vectorizable.
_isFoldable = _numFoldableDims == int(dims._foldGT1.size());
}

// Determine whether halo sizes are equal.
13 changes: 3 additions & 10 deletions src/kernel/Makefile
@@ -63,7 +63,6 @@ else ifeq ($(stencil),cube)
else ifneq ($(findstring iso3dfd,$(stencil)),)
MACROS += MAX_EXCH_DIST=1
radius := 8
def_rank_args := -d 1024
def_pad_args := -ep 1
ifeq ($(arch),knl)
fold_4byte := x=2,y=8
@@ -92,7 +91,6 @@ else ifneq ($(findstring iso3dfd,$(stencil)),)
else ifneq ($(findstring awp,$(stencil)),)
def_block_args := -b 32
YC_FLAGS += -min-es 1
def_rank_args := -d 1024 -dz 128
def_pad_args := -ep 1
ifeq ($(arch),knl)
fold_4byte := x=4,y=4
@@ -117,16 +115,13 @@ else ifneq ($(findstring awp,$(stencil)),)
endif

else ifneq ($(findstring ssg,$(stencil)),)
def_rank_args := -d 512
ifneq ($(filter $(arch),skx skl clx),)
def_rank_args := -d 640 -dx 320
fold_4byte := x=4,y=4
def_block_args := -bx 96 -by 16 -bz 80
def_block_threads := 2
endif

else ifneq ($(findstring fsg,$(stencil)),)
def_rank_args := -d 256
ifeq ($(arch),knl)
omp_region_schedule := guided
def_block_args := -b 16
@@ -143,7 +138,6 @@ else ifneq ($(findstring fsg,$(stencil)),)
else ifeq ($(stencil),tti)
MACROS += MAX_EXCH_DIST=3
radius := 2
def_rank_args := -d 512
ifneq ($(filter $(arch),skx skl clx),)
fold_4byte := x=4,y=4
def_block_args := -bx 80 -by 16 -bz 40
@@ -231,7 +225,6 @@ omp_block_schedule ?= static,1
omp_misc_schedule ?= guided
def_thread_divisor ?= 1
def_block_threads ?= 2
def_rank_args ?= -d 128
def_block_args ?= -b 64
cluster ?= x=1
pfd_l1 ?= 0
@@ -433,7 +426,7 @@ MACROS += ALLOW_NEW_GRIDS=$(allow_new_grid_types)
# Default cmd-line args.
DEF_ARGS += -thread_divisor $(def_thread_divisor)
DEF_ARGS += -block_threads $(def_block_threads)
DEF_ARGS += $(def_rank_args) $(def_block_args) $(def_pad_args) $(more_def_args)
DEF_ARGS += $(def_block_args) $(def_pad_args) $(more_def_args)
YK_CXXFLAGS += -DDEF_ARGS='"$(DEF_ARGS) $(EXTRA_DEF_ARGS)"'

# arch.
@@ -710,7 +703,7 @@ $(YK_SWIG_OUT_DIR)/yask_kernel_api_wrap.cpp: $(YK_SWIG_DIR)/yask*.i $(INC_DIR)/*

$(YK_SWIG_OUT_DIR)/yask_kernel_api_wrap.o: $(YK_SWIG_OUT_DIR)/yask_kernel_api_wrap.cpp
$(MKDIR) $(dir $@)
$(CXX_PREFIX) $(YK_CXX) $(YK_CXXFLAGS) $(PYINC) -fPIC -c -o $@ $<
$(CXX_PREFIX) $(YK_CXX) $(YK_CXXFLAGS) -x c++ $(SWIG_CXXFLAGS) $(PYINC) -fPIC -c -o $@ $<
@ls -l $@

$(YK_PY_LIB): $(YK_OBJS) $(YK_EXT_OBJS) $(YK_SWIG_OUT_DIR)/yask_kernel_api_wrap.o
@@ -1016,4 +1009,4 @@ help:
echo "Example builds with test runs:"; \
echo " $(MAKE) -j all # Normal full API and stencil tests"; \
echo " $(MAKE) -j all YK_CXXOPT=-O2 YK_CXX=g++ mpi=0 ranks=1 # g++ w/o MPI"; \
echo " $(MAKE) -j all YK_CXXOPT=-O1 ranks=3 check=1 # Run 3 ranks w/checking"
echo " $(MAKE) -j all YK_CXXOPT=-O1 ranks=4 check=1 # Run 4 ranks w/checking"