This implementation aims to reproduce the main results reported in our paper "Data-driven context-sensitivity points-to analysis", submitted to OOPSLA 2017. Specifically, it reproduces Table 3 and the learned formulas in Appendix B.
- 64-bit Ubuntu operating system
- Java 7 (or higher)
- Python 2.x
Running the Doop framework requires a Datalog engine that computes new facts from the initial facts and inference rules provided by the framework. Please execute the following command in your terminal to make sure your system has one installed.
$ bloxbatch -help
If you need to install a Datalog engine, please visit this page. The page provides a deb package of an academic version of the LogicBlox v3 engine.
Running the Doop framework requires the environment variable $DOOP_HOME. Please make sure this variable is set to the directory of "Data-Driven-Doop". Executing the following commands will do the job.
Data-Driven-Doop$ echo "export DOOP_HOME=$(pwd)" >> ~/.profile
Data-Driven-Doop$ source ~/.profile
Verifying the installation is easy. You can check it by running the following command:
Data-Driven-Pointsto-Analysis$ ./eval.py s2objH+Data eclipse
You will see the results as follows:
eclipse
==================================================================
Running selective-2-object-sensitive+heap+Data points-to analysis
...
Pointer analysis START
analysis time 24.34s
Pointer analysis FINISH
...
polymorphic virtual call sites 1,066
reachable methods 7,971
reachable casts that may fail 596
call graph edges 134,827
==================================================================
The results show that:
- The analyzed program is eclipse
- The analysis used is selective-2-object-sensitive+heap+Data
- The main analysis took 24.34 seconds
- The last four lines report the results for each client (polymorphic virtual call sites, reachable methods, may-fail casts, and call-graph edges)
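To compare runs of eval.py, it can be handy to pull these client metrics out of the textual report. The following is a minimal sketch, assuming the report format shown above; the field names in the returned dictionary are our own invention, not part of the artifact.

```python
import re

def parse_report(report: str) -> dict:
    """Extract the analysis time and the four client metrics from an
    eval.py-style report. Field names below are hypothetical."""
    patterns = {
        "analysis_time_s": r"analysis time ([\d.]+)s",
        "poly_vcalls": r"polymorphic virtual call sites ([\d,]+)",
        "reachable_methods": r"reachable methods ([\d,]+)",
        "may_fail_casts": r"reachable casts that may fail ([\d,]+)",
        "call_graph_edges": r"call graph edges ([\d,]+)",
    }
    out = {}
    for name, pat in patterns.items():
        m = re.search(pat, report)
        if m:
            val = m.group(1).replace(",", "")  # drop thousands separators
            out[name] = float(val) if "." in val else int(val)
    return out

sample = """analysis time 24.34s
polymorphic virtual call sites 1,066
reachable methods 7,971
reachable casts that may fail 596
call graph edges 134,827"""
print(parse_report(sample))
```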
In the implementation, you can find:
- Data-Driven-Doop: This folder contains a Doop framework, which was modified to support our data-driven technique. The modifications include 1) new Datalog rules to allow flexible context-depth handling and 2) additional analyses that are parameterized to context-depth-selection heuristics.
- features: This folder will store feature extraction results during the learning phase.
- eval.py: A Python script for reproducing Table 3 using learned heuristics. (Section 2.1)
- learn.py: A Python script for reproducing Table 1 and Appendix B. (Section 2.2)
We implemented our approach on top of a variant of the Doop framework introduced along with the paper "Introspective analysis: context-sensitivity, across the board" by Smaragdakis et al. You can download the original copy of the introspective version of the Doop framework from here.
Here is a list of the important files we changed:
This file contains the basic analysis rules shared by all analysis instances. In the original rules, analyzing a method invocation statement results in one calling context. The following example is for static invocations:
MergeStaticMacro(?callerCtx, ?invocation, ?calleeCtx),
CallGraphEdge(?callerCtx, ?invocation, ?calleeCtx, ?tomethod) <-
StaticMethodInvocationSkolemOpt(?callerCtx, ?invocation, ?tomethod).
In contrast, our implementation takes an invocation fact and assigns a context depth from zero to two according to our context-sensitivity heuristics Select2 and Select1:
LMergeStaticMacro2(?callerCtx, ?invocation, ?calleeCtx),
CallGraphEdge(?callerCtx, ?invocation, ?calleeCtx, ?tomethod) <-
Select2(?tomethod),
StaticMethodInvocationSkolemOpt(?callerCtx, ?invocation, ?tomethod).
LMergeStaticMacro1(?callerCtx, ?invocation, ?calleeCtx),
CallGraphEdge(?callerCtx, ?invocation, ?calleeCtx, ?tomethod) <-
!Select2(?tomethod),
Select1(?tomethod),
StaticMethodInvocationSkolemOpt(?callerCtx, ?invocation, ?tomethod).
LMergeStaticMacro0(?callerCtx, ?invocation, ?calleeCtx),
CallGraphEdge(?callerCtx, ?invocation, ?calleeCtx, ?tomethod) <-
!Select2(?tomethod),
!Select1(?tomethod),
StaticMethodInvocationSkolemOpt(?callerCtx, ?invocation, ?tomethod).
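The three rules above are mutually exclusive: Select2 takes priority, then Select1, and otherwise the call gets no context. As a minimal sketch (not part of the artifact), the same depth-selection logic can be written in Python, with the Select predicates modeled as sets of method names; the method names below are hypothetical.

```python
def context_depth(tomethod: str, select2: set, select1: set) -> int:
    """Mirror of the three mutually exclusive Datalog rules: Select2
    wins, then Select1, otherwise the call site is analyzed
    context-insensitively (depth 0)."""
    if tomethod in select2:      # LMergeStaticMacro2 rule
        return 2
    if tomethod in select1:      # LMergeStaticMacro1 rule (!Select2, Select1)
        return 1
    return 0                     # LMergeStaticMacro0 rule (!Select2, !Select1)

# Hypothetical method names for illustration only.
select2 = {"Map.get"}
select1 = {"List.add", "Map.get"}
for m in ["Map.get", "List.add", "Object.toString"]:
    print(m, context_depth(m, select2, select1))
```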
We did the same thing for virtual and special method invocations. You can easily find them by searching for the predicates Select1 or Select2 in the file.
This file contains the definitions of the Select1 and Select2 predicates. The learning script learn.py changes these definitions during the learning phase.
These folders contain modified versions of three context-sensitive analyses: object, selective-hybrid-object, and type sensitivity. Each analysis has a macros.logic file, which defines the MergeMacro rules for producing the calleeCtx fact. As in context-sensitive.logic, we expanded the original rule into three in order to handle multiple context depths in one place. For example, 012-object-sensitive+heap/macros.logic has three macros for virtual method invocations as follows:
//2-object-sensitive
#define LMergeMacro2(callerCtx, invocation, hctx, heap, calleeCtx) \
Context(calleeCtx), \
ContextFromRealContext[RealHContextFromHContext[hctx], heap] = calleeCtx
//1-object-sensitive
#define LMergeMacro1(callerCtx, invocation, hctx, heap, calleeCtx) \
Context(calleeCtx), \
ContextFromRealContext[InitialHeapValue[], heap] = calleeCtx
//context-insensitive
#define LMergeMacro0(callerCtx, invocation, hctx, heap, calleeCtx) \
Context(calleeCtx), \
ContextFromRealContext[InitialHeapValue[], InitialHeapValue[]] = calleeCtx
We did the same thing for static method invocations.
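To illustrate what the three macros compute, here is a rough Python sketch (not part of the artifact) that models a calling context in a 2-object-sensitive analysis as a pair of allocation sites. We assume the last element of the heap context plays the role of RealHContextFromHContext[hctx], and INITIAL stands in for InitialHeapValue[]; both names here are our own.

```python
INITIAL = "<<initial-heap-value>>"  # stand-in for InitialHeapValue[]

def callee_context(hctx: tuple, heap: str, depth: int) -> tuple:
    """Sketch of the three LMergeMacro variants: a callee context is a
    pair of allocation sites, built from the receiver's heap context
    (hctx) and its allocation site (heap)."""
    if depth == 2:   # LMergeMacro2: keep one element of the heap context
        return (hctx[-1], heap)
    if depth == 1:   # LMergeMacro1: only the receiver's allocation site
        return (INITIAL, heap)
    return (INITIAL, INITIAL)  # LMergeMacro0: context-insensitive

# Hypothetical allocation-site labels:
print(callee_context(("A1",), "H7", 2))  # ('A1', 'H7')
print(callee_context(("A1",), "H7", 1))
print(callee_context(("A1",), "H7", 0))
```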
This file contains the definitions of the atomic features. The learning script learn.py changes its contents if necessary.
This script is for reproducing Table 3 in the paper. Usage is ./eval.py ANALYSIS BENCHMARK. It returns a points-to analysis report for a given target analysis (ANALYSIS) and benchmark (BENCHMARK).
ANALYSIS can be a category name or an abbreviation of a specific analysis. The script supports four categories --- data, intro, normal, and insens --- and each category (except insens) includes multiple context-sensitive analyses as follows:
- data
  - s2objH+Data: Data-driven selective hybrid 2-object-sensitive analysis with a context-sensitive heap
  - 2objH+Data: Data-driven 2-object-sensitive analysis with a context-sensitive heap
  - 2-typeH+Data: Data-driven 2-type-sensitive analysis with a context-sensitive heap
- intro
  - introA-s2objH, introB-s2objH: Introspective versions of selective hybrid 2-object-sensitive analysis with a context-sensitive heap
  - introA-2objH, introB-2objH: Introspective versions of 2-object-sensitive analysis with a context-sensitive heap
  - introA-typeH, introB-typeH: Introspective versions of 2-type-sensitive analysis with a context-sensitive heap
- normal
  - s2objH: Unmodified selective hybrid 2-object-sensitive analysis with a context-sensitive heap
  - 2objH: Unmodified 2-object-sensitive analysis with a context-sensitive heap
  - 2typeH: Unmodified 2-type-sensitive analysis with a context-sensitive heap
- insens: context-insensitive analysis
For benchmarks, the script supports six large benchmarks from the DaCapo suite: eclipse, chart, bloat, xalan, jython, and hsqldb. You can also use the all keyword to run all of these benchmarks at once.
This script is for learning the Boolean formulas shown in Appendix B of the paper. Usage is simply ./learn.py ANALYSIS. The argument ANALYSIS can be one of three context sensitivities: sobj, obj, or type. Executing this script mainly does two things: 1) atomic feature extraction and 2) learning Boolean formulas. Please note that this procedure fully runs our learning algorithm over the entire training set, so it takes about 2 days in total.
The functions learn and refine implement Algorithm 2 in the paper. The learn function takes the current formulas and calls the refine function to learn Boolean formulas (line 600).
The refine function first builds an initial formula and then iteratively refines it. Each refinement normally requires a precision check (need_further_refinement), but ours skips this check whenever possible to accelerate the learning procedure. For example, if the chosen atomic feature can prove more queries than the clause to conjoin, we can safely conclude that the refined formula's precision is unchanged without calling the points-to analysis.
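One way to model this shortcut (a sketch, not the artifact's code) is to compare the sets of queries proven by the candidate atomic feature and by the current clause: if the feature already proves a superset, conjoining it cannot change precision, so the expensive points-to analysis run can be skipped. The function name and the query identifiers below are hypothetical.

```python
def needs_precision_check(proven_by_feature: set, proven_by_clause: set) -> bool:
    """Return True only when the refined formula's precision cannot be
    inferred without rerunning the points-to analysis: if the feature
    proves a superset of the clause's queries, the check is skipped."""
    return not proven_by_feature >= proven_by_clause

# Hypothetical query identifiers:
clause_proves = {"q1", "q2"}
print(needs_precision_check({"q1", "q2", "q3"}, clause_proves))  # False: skip the run
print(needs_precision_check({"q1"}, clause_proves))              # True: must rerun
```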
The learned heuristic can be found in the select.logic file located in Data-Driven-Doop/logic. Analyzing a benchmark with the learned heuristic can be done by executing the following command:
Data-Driven-Doop$ ./run -jre1.6 ANALYSIS jars/dacapo/BENCHMARK.jar
You should replace ANALYSIS with the full name of the analysis you used in the learning phase:
- sobj: s-012-object-sensitive+heap
- obj: 012-object-sensitive+heap
- type: 012-type-sensitive+heap
For the available benchmarks, you can check the Data-Driven-Doop/jars/dacapo directory.
We've archived a ready-to-run version of our implementation in the ACM Digital Library (Link). On that page, you can find the download link "Auxiliary Archive" under the "Source Materials" section. Please note that the VirtualBox image is about 7.2 GB, and importing it to your machine may require an additional 50 GB of disk space.