This repository contains a setup for evaluating the programming infrastructure for the HammerBlade project. HammerBlade is a custom architecture that is programmable using two domain-specific languages: PyTorch and GraphIt. PyTorch is a popular prototyping language for ML computations, and GraphIt is an MIT-developed programming language for developing and tuning graph computations. Our PyTorch tool flow maps down onto the TVM compiler infrastructure.
To help deploy this toolchain, we have provided in this repository a Docker container setup that includes PyTorch, GraphIt, and tools for estimating the energy consumption of programs implemented using this infrastructure.
First, clone this repository and `cd` into it:

```sh
$ git clone https://github.com/bespoke-silicon-group/hb_starlite.git
$ cd hb_starlite
```
(You can also use the internal GitLab instance instead of the public GitHub repository if you prefer. They contain the same code.) Then, use our launch script to pull the container and start it up:
```sh
$ ./docker/bash.sh samps/hb_starlite
```
Or, if you don't have access to the private GitLab and Docker registry, you can use a publicly hosted version of the repository and Docker image instead:
When you're inside the container, the default directory is `/workspace`, which is a mount of the `hb_starlite` directory on the host. So you'll see everything in this repository, including this README file, when you type `ls`.
One additional note: please type `python3`, not just `python`, to use Python. And if you need to install packages, use `pip3` instead of just `pip`. All our tools are installed for Python 3.x, and `python` on this system is Python 2.x.
Inside the container, you can use PyTorch, TVM, and GraphIt.
For machine learning and dense linear algebra development, use PyTorch. Consider starting with the "60-minute blitz" PyTorch tutorial, which shows you how to build and train an image classifier.
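If you want a self-contained taste of the workflow before starting the tutorial, the sketch below (illustrative only, not part of this repository) fits y = 2x with a single linear layer, using the same training-loop shape the blitz walks you through:

```python
import torch

# Minimal training loop: fit y = 2x with one linear layer and SGD.
torch.manual_seed(0)
model = torch.nn.Linear(1, 1)
opt = torch.optim.SGD(model.parameters(), lr=0.1)

x = torch.tensor([[1.0], [2.0], [3.0]])
y = 2 * x

for _ in range(200):
    opt.zero_grad()
    loss = torch.nn.functional.mse_loss(model(x), y)
    loss.backward()
    opt.step()

# After training, the layer's weight should be close to 2.
print(model.weight.item())
```

The same zero-grad / loss / backward / step pattern scales up to the image classifier in the tutorial.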
When following these tutorials, remember to invoke Python by typing `python3`, not just `python`.
For dealing with sparse tensor data, please use the `torch.sparse` module. It's marked as experimental, but it's good for most common uses of sparse matrix data.
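For instance, a sparse COO tensor can be built from index/value arrays and multiplied against a dense tensor (a small illustrative sketch, not code from this repository):

```python
import torch

# Nonzero entries at (0, 1) and (2, 0) of a 3x3 matrix.
indices = torch.tensor([[0, 2],   # row indices
                        [1, 0]])  # column indices
values = torch.tensor([3.0, 4.0])
s = torch.sparse_coo_tensor(indices, values, (3, 3))

# Sparse-dense matrix multiply; the result is a dense tensor.
d = torch.ones(3, 1)
print(torch.sparse.mm(s, d))
```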
To develop graph processing kernels, use GraphIt. Begin by following the Getting Started guide, which walks you through the implementation of the PageRank-Delta algorithm. (You can skip the initial setup instructions; the compiler is already installed in the container for you.) We recommend watching our GraphIt tutorial screencast for an introduction to the language.
Once you've gotten the basics down, check out the language manual for more details. The included example applications are also useful as reference material.
We recommend that you do not use GraphIt's scheduling language. Sticking with the default schedule should be fine for this programmability evaluation, so please just focus on expressing the algorithm. We also recommend, when writing GraphIt programs, that you never hard-code parameters or filenames; always accept them as command-line arguments. Keeping these flexible will make it easier to run on multiple inputs.
To build applications that use both tensor-oriented compute and graph processing, use Python. PyTorch (and TVM) use Python natively as their interface, and GraphIt has Python bindings.
To use GraphIt from Python, you can imitate our example project, which shows how to interact with a single-source shortest path (SSSP) kernel from a Python program. Specifically, follow these steps:
- Change your GraphIt program by renaming your `main` function to something descriptive, and mark it using the `export` keyword.
- Replace any globals that come from `argv` or are read from files so that they instead come from arguments to this function. For example, our SSSP program defines a function like this:

  ```
  export func do_sssp(input_edges : edgeset{Edge}(Vertex,Vertex,int), source_vertex : int) -> output : vector{Vertex}(int)
      edges = input_edges;
      vertices = edges.getVertices();
      ...
  ```

  whereas the "standalone" version gets `edges` from a file (by calling `load`) and `source_vertex` comes from `argv`. However, `edges` and `vertices` remain as global `const` declarations.
- In your Python program, add `import graphit`. Then, use `graphit.compile_and_load` to import your GraphIt code as a module. In our example, we call it `sssp_module`:

  ```
  sssp_module = graphit.compile_and_load("sssp.gt")
  ```

  The argument to `compile_and_load` is the filename of your GraphIt source code.
- Call `<module>.<function>(...)` to invoke your GraphIt function. In our example, for instance, we call `sssp_module.do_sssp(edges, start_vertex)`. To supply `edgeset` and `vector{Vertex}` arguments to GraphIt functions, use `scipy.sparse.csr_matrix` and NumPy array values, respectively. You can construct a `csr_matrix` manually or load one from an `.npz` file, for example.
- If you need to, you can convert the output from a GraphIt function into a PyTorch tensor. Just use `torch.tensor(vals)`.
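For the edgeset side of this interface, a `scipy.sparse.csr_matrix` can be built directly from an edge list. The graph below is a made-up example; the commented-out `graphit` calls mirror our SSSP module but only run inside the container:

```python
import scipy.sparse as sp

# A made-up weighted edge list: (src, dst, weight) triples for a 4-vertex graph.
edges = [(0, 1, 5), (1, 2, 3), (0, 3, 2)]
rows, cols, weights = zip(*edges)
adj = sp.csr_matrix((weights, (rows, cols)), shape=(4, 4))

# The CSR matrix can then be passed as the edgeset argument, e.g.:
#   sssp_module = graphit.compile_and_load("sssp.gt")
#   dists = sssp_module.do_sssp(adj, 0)   # per-vertex distances
#   tensor = torch.tensor(dists)          # hand off to PyTorch if needed
print(adj.nnz)
```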
We have also provided some utilities for interacting with GraphIt in `graphit_util.py`. You might want to use these functions:

- The function `load_cached` there works like `graphit.compile_and_load`, but it will skip compilation if the GraphIt source file has not changed. This can make development and testing faster: compiling takes a few seconds, even for short programs.
- A `read_adjacency_tsv` function parses "adjacency TSV" files, of the sort made popular by the MIT GraphChallenge datasets.

To see these utilities in action, see our more complete `example.py`.
You can run this example on a non-trivial input graph:
```sh
$ curl -LO 'https://graphchallenge.s3.amazonaws.com/snap/ca-GrQc/ca-GrQc_adj.tsv'
$ python3 example.py ./ca-GrQc_adj.tsv 3
```
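The adjacency-TSV format itself is simple: one edge per line, tab-separated. The sketch below shows the general idea; it assumes 1-indexed vertex IDs as in the GraphChallenge files, and the repository's actual `read_adjacency_tsv` may differ in details:

```python
import scipy.sparse as sp

def parse_adjacency_tsv(lines):
    """Parse adjacency-TSV lines of the form "src<TAB>dst<TAB>weight".
    Illustrative sketch only: assumes 1-indexed vertex IDs, as in the
    GraphChallenge datasets."""
    rows, cols, vals = [], [], []
    for line in lines:
        src, dst, weight = line.split()
        rows.append(int(src) - 1)   # convert to 0-indexed
        cols.append(int(dst) - 1)
        vals.append(int(weight))
    n = 1 + max(max(rows), max(cols))
    return sp.csr_matrix((vals, (rows, cols)), shape=(n, n))

# A tiny 3-vertex cycle as adjacency-TSV lines.
adj = parse_adjacency_tsv(["1\t2\t1", "2\t3\t1", "3\t1\t1"])
print(adj.shape)
```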