Skip to content

Latest commit

 

History

History
799 lines (561 loc) · 28.2 KB

README.adoc

File metadata and controls

799 lines (561 loc) · 28.2 KB

PARSEC Benchmark

PARSEC http://parsec.cs.princeton.edu/ 3.0-beta-20150206 ported to Ubuntu 22.04 and SPLASH2 ported to Buildroot 2017.08 cross compilation (ARM, MIPS, etc.). This repo intends to support all build types and benchmarks.

As of November 2023, the official Princeton website was down. The last working archive was from September 22, 2023. It is unclear if they retired it or if it is a bug. If they retired it, OMG. For this reason, we re-uploaded their original test data to: https://github.com/cirosantilli/parsec-benchmark/releases/tag/3.0 The scripts in this repository automatically download and use those blobs.

./configure

Before doing anything else, you must get the parecmgmt command with:

. env.sh

Build all:

parsecmgmt -a build -p all

-a means "action", i.e. which action to take for the selected packages.

Build just the ferret benchmark:

parsecmgmt -a build -p ferret

Build all SPLASH2 benchmarks:

parsecmgmt -a build -p splash2x

Build just one SPLASH2 benchmark:

parsecmgmt -a build -p splash2x.barnes

List all benchmarks:

parsecmgmt -a info

Run one splash2 benchmark with the test Input size:

parsecmgmt -a run -p splash2x.barnes -i test

Non-splash 2:

parsecmgmt -a run -p netdedup

For some reason, the splash2 version (without the X) does not have any test data besides -i test, making it basically useless. So just use the X version instead. TODO why? Can we just remove it then? When running splash2, it says:

NOTE: SPLASH-2 only supports "test" input sets.

so likely not a bug.

Run all packages with the default test input size:

parsecmgmt -a run -p all

Not every benchmark has every input size, e.g. splash2.barnes only has test input inside of core and input-sim

TODO runs all sizes, or just one default size:

parsecmgmt -a run -p splash2x

Run one splash2 benchmark with one Input size, listed in by increasing size:

parsecmgmt -a run -p splash2x.barnes -i test
parsecmgmt -a run -p splash2x.barnes -i simdev
parsecmgmt -a run -p splash2x.barnes -i simsmall
parsecmgmt -a run -p splash2x.barnes -i simmedium
parsecmgmt -a run -p splash2x.barnes -i simlarge
parsecmgmt -a run -p splash2x.barnes -i native
  • test means: just check the code might be working, but don’t stress. Inputs come with the smallest possible distribution file parsec-3.0-core.tar.gz (112 MB zipped, also contains sources), and are tiny sanity checks as the name suggests. We have however removed them from this repo, since they are still blobs, and blobs are evil.

  • sim* are different sizes for running in simulators such as gem5 for example. Simulators are slower than real hardware, so the tests have to be limited in size.

    Inputs are present in the separate parsec-3.0-input-sim.tar.gz file (468 MB zipped), which we download by default on ./configure.

  • native means suitable for benchmarking real hardware. It is therefore the largest input. We do not download the native inputs by default on ./configure because it takes several minutes. To download native inputs, run:

    ./get-inputs -n

    which also downloads parsec-3.0-input-native.tar.gz (2.3 GB zipped, 3.1 GiB unzipped, apparent size: 5.5 GiB), indicating that there are some massively sparse files present. It appears that most .tar are highly sparse for some reason.

The original README explains how input sizes were originally dosaged:

All inputs except 'test' and 'simdev' can be used for performance analysis. As a rough guideline, on a Pentium 4 processor with 3.0 GHz you can expect approximately the following execution times:

  • test: almost instantaneous

  • simdev: almost instantaneous

  • simsmall: 1s

  • simsmall: 3 - 5s

  • simlarge: 12 - 20s

  • native: 10 - 30min

One of the most valuable things parsec offers is that it instruments the region of interest of all benchmarks with:

__parsec_roi_begin

That can then be overridden for different targets to check time, cache state, etc. on the ROI:

  • on simulators you could use magic instruction TODO link to the GEM5 one.

  • on real systems you could use syscalls, instructions or other system interfaces to get the data

Some rebuilds after source changes with -a build are a bit broken. E.g. a direct:

parsecmgmt -a build -p ferret

doesn’t do anything if you have modified sources. Also, trying to clean first still didn’t work:

parsecmgmt -a clean -p ferret
parsecmgmt -a build -p ferret

What worked was a more brutal removal of inst and obj:

pkgs/apps/ferret/inst
pkgs/apps/ferret/obj

You could also do:

git clean -xdf pkgs/apps/ferret
./get-inputs

but then you would need to re-rrun ./get-inputs again because the git clen -xdf removes the unpacked inputs that were placed under pkgs/apps/ferret/inputs/.

Most/all packages appears to be organized in the same structure, take pkgs/apps/ferret for example:

  • inputs: inputs unpacked by ./get-inputs from the larger tars for the different test sizes. These are often still tarred however, e.g. pkgs/apps/ferret/inputs/input_test.tar

  • inst: installation, notably contains executables and libraries, e.g.:

    • pkgs/apps/ferret/inst/amd64-linux.gcc/bin/ferret

    • pkgs/apps/ferret/inst/amd64-linux.gcc/bin/ferret

  • obj: .o object files, e.g. pkgs/apps/ferret/obj/amd64-linux.gcc/parsec/obj/cass_add_index.o

  • parsec: parsec build and run configuration in Bash format, e.g.: pkgs/apps/ferret/obj/amd64-linux.gcc/parsec/native.runconf contains:

    #!/bin/bash
    run_exec="bin/ferret"
    run_args="corel lsh queries 50 20 ${NTHREADS} output.txt"
  • run:

    • runtime outputs, e.g. pkgs/apps/ferret/run/benchmark.out contains a copy of what went to stdout during the last -a run

    • an unpacked version of the input, pkgs/apps/ferret/inputs/input_test.tar gets unpacked directly there creating folders queries corel

  • src: the source!

  • version: a version string, e.g. 2.0

Fails with:

[PARSEC] Running 'time /home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264 --quiet --qp 20 --partitions b8x8,i4x4 --ref 5 --direct auto --b-pyramid --weightb --mixed-refs --no-fast-pskip --me umh --subme 7 --analyse b8x8,i4x4 --threads 1 -o eledream.264 eledream_32x18_1.y4m':                                                                          [PARSEC] [---------- Beginning of output ----------]
PARSEC Benchmark Suite Version 3.0-beta-20150206
yuv4mpeg: 32x18@25/1fps, 0:0
*** Error in `/home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264': double free or corruption (!prev): 0x0000000001a88e50 ***
/home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/bin/parsecmgmt: line 1222: 20944 Aborted                 (core dumped) /home/ciro/bak/git/linux-kernel-module-cheat/parsec-benchmark/parsec-benchmark/pkgs/apps/x264/inst/amd64-linux.gcc/bin/x264 --quiet --qp 20 --partitions b8x8,i4x4 --ref 5 --direct auto --b-pyramid --weightb --mixed-refs --no-fast-pskip --me umh --subme 7 --analyse b8x8,i4x4 --threads 1 -o eledream.264 eledream_32x18_1.y4m

Mentioned on the following unresolved Parsec threads:

The problem does not happen on Ubuntu 17.10’s x264 0.148.2795 after removing b-pyramid which is not a valid argument anymore it seems., so the easiest fix for this problem is to just take the latest x264 (as a submodule, please!!) and apply parsec roi patches to it (git grep parsec under x264/src).

If you have already built for the host previously, you must first in this repo:

  • git clean -xdf, otherwise the x86 built files will interfere with buildroot

  • run Buildroot on a new shell. Otherwise . env.sh adds the ./bin/ of this repo to your PATH, and parsecmgmt is used from this source, instead of from the copy that Buildroot made

Only SPLASH2 was ported currently, not the other benchmarks.

PARSEC’s build was designed for multiple archs, this can be seen at bin/parsecmgmt, but not for cross compilation. Some of the changes we’ve had to make:

  • use CC everywhere instead of hardcoded gcc

  • use HOST_CC for .c utilities used during compilation

  • remove absolute paths, e.g. -I /usr/include

The following variables are required for cross compilation, with example values:

export GNU_HOST_NAME='x86_64-pc-linux-gnu'
export HOSTCC='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/bin/ccache /usr/bin/gcc'
export M4='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/usr/bin/m4'
export MAKE='/usr/bin/make -j6'
export OSTYPE=linux
export TARGET_CROSS='/home/ciro/bak/git/linux-kernel-module-cheat/buildroot/output.arm~/host/bin/arm-buildroot-linux-uclibcgnueabi-'
export HOSTTYPE='"arm"'

Then just do a normal build.

We have made a brief attempt to get the other benchmarks working. We have already adapted and merged parts of the patches static-patch.diff and xcompile-patch.diff present at: https://github.com/arm-university/arm-gem5-rsk/tree/aa3b51b175a0f3b6e75c9c856092ae0c8f2a7cdc/parsec_patches

But it was not enough for successful integration as documented below.

The main point to note is that the non-SPLASH benchmarks all use Automake.

Some of the benchmarks fail to build with:

atomic/atomic.h:38:4: error: #error Architecture not supported by atomic.h

The ARM gem5 RSK patches do seem to fix that for aarch64, but not for arm, we should port them to arm too.

Some benchmarks don’t rely on that however, and they do work, e.g. bodytrack.

Some builds work, but not all.

parsec.raytrace depends on cmake, which fails with:

---------------------------------------------
CMake 2.6-1, Copyright (c) 2007 Kitware, Inc., Insight Consortium
---------------------------------------------
Error when bootstrapping CMake:
Cannot find appropriate C compiler on this system.
Please specify one using environment variable CC.
See cmake_bootstrap.log for compilers attempted.

which is weird since I am exporting CC.

It is the only package that depends on cmake and mesa as can be found with:

git grep 'deps.*cmake'

cmake we could use host / Buildroot built one, but Mesa, really? For a CPU benchmark? I’m tempted to just get rid of this benchmark.

Furthermore, http://gem5.org/PARSEC_benchmarks says that raytrace relies on SSE intrinsics, so maybe it is not trivially portable anyways.

If we disable raytrace, cmake and mesa by editing config/packages/parsec.raytrace.pkgconf, parsec.cmake.pkgconf and parsec.mesa.pkgconf to contain:

pkg_aliases=""

the next failure is dedup, which depends on ssl, which fails with:

Operating system: x86_64-whatever-linux2
Configuring for linux-x86_64
Usage: Configure.pl [no-<cipher> ...] [enable-<cipher> ...] [experimental-<cipher> ...] [-Dxxx] [-lxxx] [-Lxxx] [-fxxx] [-Kxxx] [no-hw-xxx|no-hw] [[no-]threads] [[no-]shared] [[no-]zlib|zlib-dynamic] [enable-mon
tasm] [no-asm] [no-dso] [no-krb5] [386] [--prefix=DIR] [--openssldir=OPENSSLDIR] [--with-xxx[=vvv]] [--test-sanity] os/compiler[:flags]

dedup and netdedup are the only packages that use ssl. ssl is actually OpenSSL, which Buildroot has.

The next failure is vips due to glib:

checking for growing stack pointer... configure: error: in `/path/to/linux-kernel-module-cheat/out/aarch64/buildroot/build/parsec-benchmark-custom/pkgs/libs/glib/obj/aarch64-linux.gcc':
configure: error: cannot run test program while cross compiling

which is weird, I thought those Automake problems were avoided by --build and --host, which we added in a previous patch.

glib is and libxml are only used by vips. Buildroot has only parts of glib it seems, e.g. glibmm, but it does have libxml2.

The next failure is uptcpip on which all netapps depend:

ar rcs libuptcp.a ../freebsd.kern/*.o ../freebsd.netinet/*.o *.o ../host.support/uptcp_statis.o         ../host.support/host_serv.o         ../host.support/if_host.o
ar: ../host.support/uptcp_statis.o: No such file or directory

I hack in a pwd on the configure, and the CWD is pkgs/apps/x264/obj/aarch64-linux.gcc, so sure, there is no ./config.sub there…​

And the errors are over! :-)

While it is possible to run all tests on host with parsecmgmt, this has the following disadvantages:

  • parsecmgmt Bash scripts are themselves too slow for gem5

  • parsecmgmt -a run -p all does not stop on errors, and it becomes hard to find failures

For those reasons, we have created the test.sh script, which runs the raw executables directly, and stops on failures.

That script can be run either on host, or on guest, but you must make sure that all test inputs have been previously unpacked with:

parsecmgmt -a run -p all

test size is required since the input names for some benchmarks are different depending on the test sizes.

https://parsec.cs.princeton.edu/overview.htm gives an overview of some of them, but it is too short to be understood. TODO: go over all of them with sample input/output analysis! One day.

PARSEC 3.0 provides three server/client mode network benchmarks which leverage a user-level TCP/IP stack library for communication.

Everything under netapps is a networked version of something under app, e.g.

  • pkgs/kernels/dedup/

  • pkgs/netapps/netdedup

Was apparently a separate benchmark that got merged in.

This is suggested e.g. at https://parsec.cs.princeton.edu/overview.htm which compares SPLASH2 as a separate benchmark to parsec, linking to the now dead http://www-flash.stanford.edu/apps/SPLASH/

This is also presumably why splash went in under ext.

SPLASH-2 benchmark suite includes applications and kernels mostly in the area of high performance computing (HPC). It has been widely used to evaluate multiprocessors and their designs for the past 15 years.

Content similarity search server

This presentation by original authors appears to describe the software: https://www.cs.princeton.edu/cass/papers/Ferret_slides.pdf And here’s the paper: https://www.cs.princeton.edu/cass/papers/Ferret.pdf so we understand that it is some research software from Princeton.

Unzipping the inputs there are a bunch of images, so we understand that it must be some kind of image similarity, i.e. a computer vision task.

Given the incrediable advances in computer vision in the 2010’s, these algorithms have likey become completely obsolete compared to deep learning techniques.

Running with:

parsecmgmt -a run -p ferret -i simsmall

we see the program output as:

(7,1)
(16,2)
(16,3)
(16,4)
(16,5)
(16,6)
(16,7)
(16,8)
(16,9)
(16,10)
(16,11)
(16,12)
(16,13)
(16,14)
(16,15)
(16,16)

TODO understand.One would guess that it shows which image looks the most like each other image? But then that would mean that the algorithm sucks, since almost everything looks like 16. And 16,16 looks like itself which would have to be excluded.

If we unpack the input directory, we can see that there are 16 images some of them grouped by type:

acorn.jpg
air-fighter.jpg
airplane-2.jpg
airplane-takeoff-3.jpg
alcatraz-island-prison.jpg
american-flag-3.jpg
apartment.jpg
apollo-2.jpg
apollo-earth.jpg
apple-11.jpg
apple-14.jpg
apple-16.jpg
apple-7.jpg
aquarium-fish-25.jpg
arches-9.jpg
arches.jpg

so presumably authors would expect the airplaines and apples to be more similar to one another.

[PARSEC] parsec.freqmine [1] (data mining)
[PARSEC] Mine a transaction database for frequent itemsets
[PARSEC]   Package Group: apps
[PARSEC]   Contributor:   Intel Corp.
[PARSEC]   Aliases:       all parsec apps openmp

Frequent Itemsets Mining (FIM) is the basis of Association Rule Mining (ARM). Association Rule Mining is the process of analyzing a set of transactions to extract association rules. ARM is a very common used and well-studied data mining problem. The mining is applicable to any sequential and time series data via discretization. Example domains are protein sequences, market data, web logs, text, music, stock market, etc.

To mine ARMs is converted to mine the frequent itemsets Lk, which contains the frequent itemsets of length k. Many FIMI (FIM Implementation) algorithms have been proposed in the literature, including FP-growth and Apriori based approaches. Researches showed that the FP-growth can get much faster than some old algorithms like the Apriori based approaches except in some cases the FP-tree can be too large to be stored in memory when the database size is so large or the database is too sparse.

Googling "Frequent Itemsets Mining" leads e.g. to

Based on the items of your shopping basket, suggest other items people often buy together.

For example, if a dataset contains 100 transactions and the item set {milk, bread} appears in 20 of those transactions, the support count for {milk, bread} is 20.

Running:

parsecmgmt -a run -p freqmine -i test

produces output:

transaction number is 3
32
192
736
2100
4676
8246
11568
12916
11450
8009
4368
1820
560
120
16
1
the data preparation cost 0.003300 seconds, the FPgrowth cost 0.002152 seconds

A manual run can be done with:

cd pkgs/apps/freqmine
./inst/amd64-linux.gcc/bin/freqmine inputs/T10I4D100K_3.dat 1

where the parameters are:

  • inputs/T10I4D100K_3.dat: input data

  • minimum support

both described below.

run_args="T10I4D100K_3.dat 1"

pkgs/apps/freqmine/inputs/input_test.tar contains T10I4D100K_3.dat which contains the following plaintext file:

25 52 164 240 274 328 368 448 538 561 630 687 730 775 825 834
39 120 124 205 401 581 704 814 825 834
35 249 674 712 733 759 854 950

So we see that it contains 3 transactions, and the _3 in the filename means the number of transactions, and it also gets output by the program:

transaction number is 3

The README describes the input output incomprehensibly as:

For the input, a date-set file containing the test transactions is provided.

There is another parameter that indicates "minimum-support". When it is a integer, it means the minimum counts; when it is a floating point number between 0 and 1, it means the percentage to the total transaction number.

The program output all (different length) frequent itemsets with fixed minimum support.

Let’s hack the "test" input to something actually minimal:

1 2 3
1 2 4
2 3

Now the output for parameter 1 is:

4
5
2

and for parameter 2 is:

3
2

I think what it means is, take input parameter 1. 1 means the minimal support we are couning. The output:

4
5
2

means actually means:

How many sets are there with a given size and support at least 1:

set_size    number_of_sets
1        -> 4
2        -> 5
3        -> 2

For example, for set_size 1 there are 4 possible sets (4 pick 1, as we have 4 distinct numbers):

  • {1}: appears in 1 2 3 and 1 2 4, so support is 2, and therefore at least 1

  • {2}: appears in 1 2 3, 1 2 4 and 2 3, so support is 3, and therefore at least 1

  • {3}: appears in 1 2 3, 1 2 4 and 2 3, so support is 3, and therefore at least 1

  • {4}: appears in 1 2 4, so support is 1, and therefore at least 1

so we have 4 sets with support at least one, so the output for that line is 4.

For set_size 2, there are 6 possible sets (4 pick 2):

  • {1, 2}: appears in 1 2 3, 1 2 4, so support is 2

  • {1, 3}: appears in 1 2 3, so support is 1

  • {1, 4}: appears in 1 2 4, so support is 1

  • {2, 3}: appears in 1 2 3 and 2 3, so support is 2

  • {2, 4}: appears in 1 2 4, so support is 1

  • {3, 4}: does not appear in any line, so support is 0

Therefore, we had 5 sets with support at least 1: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, so the output for the line is 5.

For set_size 3, there are 4 possible sets (4 pick 3):

  • {1, 2, 3}: appears in 1 2 3, so support is 1

  • {1, 2, 4}: appears in 1 2 4, so support is 1

  • {1, 3, 4}: does not appear in any line, su support is 0

  • {2, 3, 4}: does not appear in any line, su support is 0

Therefore, we had 2 sets with support at least 1: {1, 2}, {1, 3}, {1, 4}, {2, 3}, {2, 4}, so the output for the line is 2.

If we take the input parameter 2 instead, we can reuse the above full calculations to retrieve the values:

  • set_size 1: 3 sets have support at least 2: {1}, {2} and {3}

  • set_size 2: 2 sets have support at least 2: {1, 2} and {2, 3}

Presumably therefore, there is some way to calculate these outputs without having to do the full explicit set enumeration, so you can get counts for larger support sizes but not necessarily be able to get those for the smaller ones.

Well, if this doesn’t do raytracing, I would be very surprised!

pkgs/apps/raytrace/inputs/input_test.tar contains:

octahedron.obj

####
#
# Object octahedron.obj
#
# Vertices: 6
# Faces: 8
#
####
#
# Octahedron
# Synthetic model for PARSEC benchmark suite
# Created by Christian Bienia
#
####

v 1.0 0.0 0.0
v 0.0 1.0 0.0
v 0.0 0.0 1.0
v -1.0 0.0 0.0
v 0.0 -1.0 0.0
v 0.0 0.0 -1.0
# 6 vertices, 0 vertices normals

f 1 2 3
f 4 2 3
f 1 5 3
f 1 2 6
f 4 5 6
f 1 5 6
f 4 2 6
f 4 5 3
# 8 faces, 0 coords texture

# End of File

so clearly a representation of a 3D object, https://en.wikipedia.org/wiki/Wavefront_.obj_file describes the format.

And input_simdev.tar contains a much larger bunny.obj, which is a classic 3D model used by computer graphics researchers: https://en.wikipedia.org/wiki/Stanford_bunny

src/README documents that output would be in video format, and is turned off, boring!!!

The input for raytrace is a data file describing a scene that is composed of a single, complex object. The program automatically rotates the camera around the object to simulate movement. The output is a video stream that is displayed in a video. For the benchmark version output has been disabled.

These appear to be all external libraries, and don’t have tests specifically linked to them.

They are then used from other tests, e.g. pkg/libs/mesa is used from pkgs/apps/raytrace:

pkgs/apps/raytrace/parsec/gcc-pthreads.bldconf:18:build_deps="cmake mesa"

We also note that one lib can depend on another lib, e.g. glib depends on zlib:

pkgs/libs/glib/parsec/gcc.bldconf:17:build_deps="zlib"

so they were essentially building their own distro. They should have used Buildroot poor newbs!

Two deps in particular are special things used widely across many benchmarks:

git grep 'build_deps="[^"]'

Hooks are instrumentation hooks that get performance metrics out. They have several flavors for different environment, e.g. native vs magic simulator instructions.

The addition of hook points on several meaningful workloads is basically one of PARSEC’s most important features.

Presumably it is something to do with being able to use different forms of parallelism transparently?

This repo was started from version 3.0-beta-20150206:

$ md5sum parsec-3.0.tar.gz
328a6b83dacd29f61be2f25dc0b5a053  parsec-3.0.tar.gz

We later learnt about parsec-3.0-core.tar.gz, which is in theory cleaner than the full tar, but even that still contains some tars, so it won’t make much of a difference.

Why this fork: how can a project exist without Git those days? I need a way to track my patches sanely. And the thing didn’t build on latest Ubuntu of course :-)

We try to keep this as close to mainline functionality as possible to be able to compare results, except that it should build and run.

We can’t track all the huge input blobs on GitHub or it will blow up the 1Gb max size, so let’s try to track everything that is not humongous, and then let users download the missing blobs from Princeton directly.

Let’s also remove the random output files that the researches forgot inside the messy tarball as we find them.

All that matters is that this should compile fine: runtime will then fail due to missing input data.

I feel like libs contains ancient versions of a bunch of well known third party libraries, so we are just re-porting them to newest Ubuntu, which has already been done upstream…​ and many of the problems are documentation generation related…​ at some point I want to just use Debian packages or git submodules or Buildroot packages.

TODO: after build some ./configure and config.h.in files are modified. But removing them makes build fail. E.g.:

  • pkgs/apps/bodytrack/src/config.h.in

  • pkgs/apps/bodytrack/src/configure

Parse is just at another level of software engineering quality.

Princeton stopped actively supporting PARSEC directly, they don’t usually reply on the mailing list. So a few forks / patches / issue trackers have popped up in addition to ours: