In Homework #3 we will write come OpenMP code to numerically evaluate integrals using various integration algorithms.
Repository layout: note that there are some slight changes from Homework #2.
-
homework3/
:Python wrapper of your C library. The functions defined here allow you to use your C code from within Python. The test suite calls these functions which, in turn, call your C functions. DO NOT MODIFY. Any modifications will be ignored / overwritten when the homework is submitted.
Please read the docstrings in this file to see how to call the wrapper and what kind of information is expected to be returned. Compare with the signature of the underlying C function.
-
include/
:The library headers containing (1) the C function prototypes and (2) the C function documentation. Document your C functions here.
-
lib/
:The compiled C code will be placed here as a shared object library named
libhomework3.so
. -
report/
:Place your
report.pdf
here. See the "Report" section below. -
src/
:C library source files. This is where you will provide the bodies of the corresponding functions requested below.
Do not change the way in which these functions are called. Doing so will break the Python wrappers located in
homework3/wrappers.py
. You may, however, write as many helper functions as you need.Aside from writing your own tests and performing computations for your report, everything you need to write for this homework will be put in the files here.
-
ctests/
:Directory in which to place any optional C code used to debug and test your library as well as practice compiling and linking C code. See the file
ctests/example.c
for more information and on how to compile. -
Makefile
: See "Compiling and Testing" below. -
test_homework3.py
: A sample test suite for your own testing purposes.
The makefile for this homework assignment has been provided for you. It will
compile the source code located in src/
and create a shared object library
lib/libhomework3.so
. You can also use this Makefile to compile all of the
scripts you write in the directory, ctests
.
Run,
$ make lib
to create lib/libhomework3.so
. This library must exist in order for the Python
wrappers to work. As a shortcut, running
$ make test
will perform
$ make lib
$ python test_homework3.py
Run,
$ make example.out
to compile the script ctests/example.c
into an executable called
example.out
. Copy the syntax of the corresponding make command if you would
like to write more C scripts.
In this assignment we will implement several numerical integration techniques in serial and in parallel. Given a function, f, and some x-data points, form the arrays
x = [x_0, x_1, ..., x_{N-1}]
fvals = [f(x_0), f(x_1), ..., f(x_{N-1})]
Using these data one can approximate the integral of f from x_0
to x_{N-1}
using the following formulae
Scipy has built-in functions
scipy.linalg.trapz
and
scipy.linalg.simps
which, given an x-data set and function evaluations, will use these rules to
estimate the integral. For example,
>>> from numpy import linspace, sin, pi
>>> from scipy.integrate import trapz, simps
>>> x = linspace(-1,pi,8) # eight, equally-spaced points between -1 and pi
>>> fvals = sin(x)
>>> integral_trapz = trapz(fvals, x)
>>> integral_simps = simps(fvals, x, even='first') # see simps documentation
In this assignment we will write serial and parallel versions of these algorithms. Additionally, we will perform some parameter tuning on our parallel implementation of Simpson's rule. In particular, you are to provide the definitons of the below functions.
Implement the functions declared in include/integrate.h
and defined in
src/integrate.c
. In each of these functions, fvals
is a length N
array of
function evaluations of type double
on the elements of x
, a length N
array
of domain points of type double
. These are fed in from either Python via Numpy
arrays or your test scripts in the directory ctests
.
-
double trapz_serial(double* fvals, double* x, int N)
Approximates the integral of f using the trapezoidal rule.
-
double trapz_parallel(double* fvals, double* x, int N, int num_threads)
Approximates the integral of f using the trapezoidal rule in parallel. The number of threads to be used should be an argument to the function.
-
double simps_serial(double* fvals, double* x, int N)
Approximates the integral of f using Simpson's rule. Note that there are two cases to consider: the "nice" case when
N
is odd and the "not nice" case whenN
is even. Once you understand why there is an issue in theN
is even case resolve the issue by taking a single trapezoidal rule approximation at the end of the data array to capture the "missing" part of the integral.(Hint: read the
scipy.linalg.simps
documentation. In particular, read about theeven
keyword.) -
double simps_parallel(double* fvals, double* x, int N, int num_threads)
Approximates the integral of f using the trapezoidal rule in parallel. The number of threads to be used should be an argument to the function.
-
double simps_parallel_chunked(double* fvals, double* x, int N, int num_threads, int chunk_size)
Approximates the integral of f using the trapezoidal rule in parallel. The number of threads to be used should be an argument to the function. In this implemenation you should use a
#pragma omp for
construct so you can pass in a chunk size to the loop thread scheduler. (Or write one manually. See Lecture 11 and 12.)You do not need to use
#pragma omp for
in your implementation ofsimps_parallel
but you may if you want to.
Your implementations will be run through the following test suite. Tests will
use an x
where distance between subsequent points, x[i+1]-x[i]
, is not
necessarily uniform. In particular, tests will be run on domains that look
something like:
x = numpy.append(linspace(-1,0,8,endpoint=False), linspace(0,3,127))
See
Issue #7
for details on why we use domains of this form. In short, each Simpson's
subinterval needs the same dx
. However, different Simpson's rule subintervals
are allowed to have different dx.
trapz
Tests:
- (1pt) Check the definition of
trapz_serial
using data arrays of length two. - (1pt) Test if
trapz_serial
matches (up to close to machine precision) the output ofscipy.integrate.trapz
for variousN
. - (1pt) Check the definition of
trapz_parallel
using data arrays of length two. - (1pt) Test if
trapz_parallel
matches (up to close to machine precision) the output ofscipy.integrate.trapz
for variousN
.
simps
Tests:
- (1pt) Check the definition of
simps_serial
using data arrays of length three. - (1pt) Test if
simps_serial
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
odd. (This is the "nice" case.) - (1pt) Test if
simps_serial
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
even. (This is the "not nice" case where an extra trapezoidal rule approximation needs to be done at the end of the data. See thescipy.linalg.simps
documentation.) - (1pt) Check the definition of
simps_parallel
using data arrays of length three. - (1pt) Test if
simps_parallel
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
odd. - (1pt) Test if
simps_parallel
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
even.
Chunked simps
Tests:
- (1pt) Check the definition of
simps_parallel_chunked
using data arrays of length three. - (2pt) Test if
simps_parallel_chunked
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
odd and various values ofchunk_size
. - (2pt) Test if
simps_parallel_chunked
matches (up to close to machine precision) the output ofscipy.integrate.simps
for variousN
even and various values ofchunk_size
.
Total: (15 / 15pts)
Note that in this homework assignment we will manually verify that the
parallel
versions of your algorithms indeed implement some sort of parallel
structure. You will recieve zero points on the parallel
tests if you only
implement the serial version of the code.
Produce a PDF document named report.pdf
in the directory report
of the
repository. Please write your name and UW NetID (e.g. "cswiercz", not your
student ID number) at the top of the document. The report should contain
responses to the following questions:
-
(12/20) Create a plot of parallel loop chunk sizes vs. corresponding runtime using chunked parallel Simpson's method. In particular, create a
semilogx
plot of timings ofsimps_parallel_chunked
with chunk size on horizontal axis and runtime on the vertical axis, usinghomework3.time_simps_parallel_chunked
with the following data:N = 2**20 x = numpy.linspace(-1,3,N) y = sin(exp(x)) chunk_sizes = [2**0, 2**2, 2**4, 2**6, 2**8, ..., 2**20]
- (2pt) In black, timings with
num_threads = 1
. - (2pt) In green, timings with
num_threads = 2
. - (2pt) In blue, timings with
num_threads = 4
. - (2pt) In red, timings
num_threads = 8
. - (2pt) Plot uses a logarithmic scale on the x-axis.
- (1pt) Both x- and y-axes are labeled appropriately.
- (1pt) Plot has a meaningful title.
Note that you may want to start with a smaller value of
N
while you're experimenting with getting the plot just right. Also, try generating several versions of the plot in case there is some system noise causing erradic timing behavior. - (2pt) In black, timings with
-
(8/20) Let's analyze this plot. In particular, provide comments on the following questions.
- (2pt) What can be attributed to the
nthreads = 8
behavior when thechunk_size
is small? - (2pt) What happens when the
chunk_size
is equal to the problem size (i.e. whenchunk_size = 2**20 = N
) and why does this affect the timings in the way that the plot suggests? - (2pt) Why is
chunk_size
relevant to this implementation of Simpson's rule as opposed to the simple example we did in class during Lecture 12? (5 May, 2016?) - (2pt) What range of chunk values appear to be optimal for this particular
value of
N
in parallel Simpson? Can you conjecture an ideal chunk value as a function ofN
. If so, why? If not, why not? Note that this "optimal" chunk size may not be the same for non-Simpson's rule problems.
- (2pt) What can be attributed to the
Provide documentation for the function prototypes listed in all of the files in
include
following the formatting described in the
Grading document.
We will time your implementation of the function simps_parallel
via the
function time_simps_parallel
with num_threads = 4
for some value N
between
1e7
and 1e18
.
N = # approximately 2**7 and approximately 2**18
x = numpy.linspace(-1,3,N)
y = sin(exp(x))