Skip to content

GSoC 2020 Project Ideas

Corina Pasareanu edited this page Mar 4, 2020 · 57 revisions

Project Ideas

Please note that this list is not exclusive. If you have other ideas and topics related to JPF, please let us know on the JPF Google group. A possible proposal template can be found at the bottom of our GSoC page: JPF Google Summer of Code 2020.

JPF Infrastructure

JPF Application Domains

Automatic Program Repair

Symbolic Execution

Fuzzing

Smart Contract

Concolic Execution

Environment and Test Case Generation

Symbolic Data-race Detection

Projects Descriptions

Support Java 11 for jpf-core

Description: jpf-core is essentially a JVM that currently fully supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. The JPF source itself has already been made compatible with Java 11. Now, JPF should support new features of Java 11 bytecode. The key feature of Java 11 that is currently not supported are bootstrap methods that are generated at load time. They are used for things as common as string concatenation ("Hello, " + name). As of now, a few specialized cases are supported, but there are still many programs (and unit tests) that fail with Java 11. It is therefore very important for us that we support the general case of this feature. There are also some internal APIs from Java 11 that no longer exist in Java 12 and later, so time permitting, we would also like to update code depending on these. This is a high-priority project, as support for Java 8 is limited to the near future.

Difficulty: Hard
Required skills: Knowledge of Java bytecode
Preferred skills: Knowledge of bootstrap methods in Java bytecode

Support Java 11 for SPF

Description: Symbolic PathFinder is essentially a (symbolic) JVM that currently supports only Java 8. The goal of this project is to make it up-to-date with new features of Java 11. This is a high-priority project, as support for Java 8 is limited to the near future.

Support for gradle for SPF

Description: The goal of this project is to (1) implement gradle support for Symbolic Pathfinder, (2) to update the extension template, including gradle support and updated documentation.

Improving String Analysis in SPF

Description: Symbolic PathFinder incorporates String constraint solvers (ABC,Z3) to enable analysis of programs that process Strings. The project will involve careful testing and improving the infrastructure for String solving.

Method Summaries, extended

Description: A thesis project implemented Summaries of methods when executing in JPF. It includes a representation of the summaries that captures all side effects of the given procedure; and implements modifications of the program state.

The actual summary of a method will be computed during its first execution, and then reused within traversal of other state paths. Another possible feature is the support for externally defined summaries that would be useful for library methods.

Experiments have shown that without summarizing the effects of constructors, most methods cannot be summarized. This is caused by the construction of new objects or throwing an exception (which also create a new object). Summarizing the effect of constructors would therefore be a huge enhancement to this technique. Other enhancements may also be possible.

Difficulty: Medium
Required skills: Knowledge of Java bytecode
Preferred skills: Handling of weak references, garbage collection

Model Checking Distributed Java Applications

Description: jpf-nas is an extension of JPF that provides support for model checking distributed multithreaded Java applications. It relies on the multiprocess support included in the jpf-core which provides basic functionality to verify the bytecode of distributed applications. jpf-nas supports interprocess communication via TCP sockets by modeling the Java networking package java.net. This tool can handle simple multi-client server applications. Some examples can be found in the jpf-nas distribution (at jpf-nas/src/examples/). The goal of this project is to extend the functionality of jpf-nas in various ways, such as extending the communication model supported by the tool towards an existing open source Java library/framework, called QuickServer, increasing the performance of the tool by improving the mechanism used to manage the state of communication objects, extending the tool with the cache-based approach used in net-iocache, etc.

Difficulty: Medium
Required skills: Knowledge of Java networking
Preferred skills: Knowledge of jpf-nas or net-iocache

Automatic program repair using annotations

Description: Automated program repair (APR) techniques often generate overfitting patches due to the reliance on test cases for patch generation and validation. In this project, we propose to overcome the overffiting issue in APR by leveraging developer-provided partial annotations to aid semantic reasoning. Developer annotations can come in different forms, e.g., JPF annotation. The advantage of developer annotations is two-fold. First, in addition to test cases, it helps augment the specifications of the program under analysis and thus provides more complete specifications. These annotations, despite being simple, can help significantly in semantic reasoning, e.g., null pointer analysis. Second, these annotations are not required to be complex so that to reduce the burden of manual effort by developers. For example, to reason about null pointer exception errors, developers are only required to add a few Nullable or Non-Nullable annotations to class fields or method parameters, etc. We will use JPF and SPF for symbolically reasoning about the semantics of programs under analysis and generating repairs. We will also use JPF-Annotation as a way for developers to provide annotations.

Refactoring SPF constraint library

Description: SPF constraints need to be refactored to allow different kinds of constraints to be combined during the construction of a path condition. An example of how it should be after the refactoring is the Abstract Syntax Tree constructed by GREEN.

Hash-consing for SPF

Description: Hash-consing is a technique that reuses previously constructed expressions to avoid duplication during construction of larger expressions. It is a technique that has been extensively used for creating maximally-shared graphs (see Calysto by Babic et al.), reusing structurally equivalent expressions in KLEE (by Cadar et al.). Variants of this idea are also applied in other binary symbolic executors like FuzzBALL and built in to new binary analysis frameworks (see Jung et al). Symbolic PathFinder currently does not support hash-consing or sharing of subexpressions causing memory usage to blow-up in pathological examples. This project would be about implementing hash-consing or a variant of this idea in SPF. This would have a clear benefit in reducing SPF's memory usage.

Difficulty: Medium
Required skills: Knowledge of symbolic execution and what it is useful for
Preferred skills: Prior experience of developing or maintaining a symbolic execution tool
Expected outcomes: Clear reduction in memory usage of SPF when we run benchmarks. The idea of hash-consing is being adapted from the veritesting paper by Avgerinos et al.

Visualizing ChoiceGenerator tree for SPF

Description: A symbolic executor explores feasible choices through a program. It can often be difficult to understand how a symbolic executor got to a particular program location in a given symbolic state. These difficulties arise from not being able to easily see all the previous choices the symbolic executor made to get to a certain point. To address this limitation, a symbolic executor can be asked to visually report its tree of exploration choices. An example of this is FuzzBALL's "-decision-tree-use-file" that reports FuzzBALL's tree of explored choices into a file that can be visualized. This project would be add a similar feature in SPF. Given a point of symbolic exploration, it would allow SPF to report its tree of explored ChoiceGenerator objects into a file that can be observed visually.

Difficulty: Easy
Required skills: Java programming
Preferred skills: Some knowledge of symbolic execution
Expected outcomes: Visualization of the choice generator tree when exploring a few benchmarks; implementation of a few ideas to improve the usability of such trees when generated from a symbolic execution run with complex software

Combinatorial testing of configuration options for SPF

Description: SPF has a large number of diverse configuration options. Enabling some features require combinations of options (such as incremental solving) whereas other options are backed by a broken implementation. For example, during the recently concluded SV-COMP 2020 competition, the Java Ranger authors (which also includes me) turned off symbolic string support, while the SPF team chose to leave this option on. For this project, we would examine all of SPF's options and construct test cases to combinatorially cover all of them. The outcome of this project would be a regression test suite that combinatorially covers all of SPF's options and provides clear documentation on which options don't work. If there is still time available during the summer, we would also attempt to fix SPF's broken symbolic string solving. There have been many recent advances in string solving and it would be valuable to have support for powerful string solvers such as z3str3. This project can also be combined with the improving string analysis project above.

Difficulty: Medium
Required skills: Java programming
Preferred skills: Some knowledge of SPF's configuration options
Expected outcomes: A clear list of SPF options that dont play well with each other; a list of test cases that tested the correct interactions between SPF options

Beneficial path-merging for SPF

Description: Path-merging has recently been implemented as an extension to the Symbolic PathFinder tool. However, path-merging is not always beneficial because it can contribute to making the contents of the stack and/or the heap symbolic. Later branching on these symbolic contents can cause further branching. This project is about developing a heuristic similar to the one proposed by Kuznetsov et al. for symbolic execution of Java bytecode.

Difficulty: Hard
Required skills: Knowledge of path-merging in symbolic execution; examples include dynamic state merging and/or veritesting
Preferred skills: Knowledge of SPF's internals when it comes to its execution of conditional branches; knowledge of Java Ranger would be further beneficial
Expected outcomes: a heuristic that improves SPF's performance when it uses path-merging but avoids path-merging when it doesn't seem beneficial; the targeted heuristic is of the kind proposed in the state merging paper above by Kuznetsov et al.

Whitebox Fuzzer and Grammar Learner

Description: Build a whitebox fuzzing tool on top of Symbolic PathFinder, that can learn the input grammar of a piece of code in an iterative fashion. The idea would be to first run the code on symbolic input of a fixed length and then learn a possible grammar for this length, at that point extend the length and generalise the grammar. The main research goal behind this project is to see if one can do whitebox fuzzing without a pre-determined seed file (which is the way most whitebox fuzzers work at the moment).

Fuzzing and Symbolic Execution

Description: Develop a fuzzer for Java that can be integrated with SPF (or another Java based symbolic execution engine). The idea would be that when fuzzing gets stuck and makes no progress that the symbolic analysis can create a new seed file to allow analysis to progress.

Comparison between concolic and classical symbolic execution

Description: Comparison between concolic execution, e.g. DEEPSEA and JDart, and classical symbolic execution, e.g. SPF.

Smart Contract Analysis

Description: Develop a mechanism to allow the analysis of Ethereum Virtual Machine (EVM) bytecode by replacing the JVM bytecodes with EVM bytecodes within JPF. The second part of the project would be to extend the bytecodes further to allow symbolic execution as well.

JDart for Dynamic Taint Analysis

Description: JDart is a dynamic symbolic execution engine Java based on Java PathFinder (JPF). The tool executes Java programs with concrete and symbolic values at the same time and records symbolic constraints describing all the decisions along a particular path of the execution. These path constraints are then used to find new paths in the program. Concrete data values for exercising these paths are generated using a constraint solver.

Currently JDart has two mechanisms for marking data values in a program for symbolic analysis. First, JDart analyzes methods during starting symbolic analysis. All parameters of an analyzed method are treated symbolically. Second, special annotations can be used to mark class members that should be analyzed symbolically. This mode of operation is geared towards generating test cases as well as the symbolic summaries of methods.

The goal of this project is to add a new mode of operation to JDart, in which symbolic variables can be introduced dynamically during execution – and also be made purely concrete again. This can be done by introducing symbolic variables for (or removing symbolic annotations from) the return values of specific methods (e.g., Random.nextInt()) – using listeners provided by JPF. This new mode of operation would have two benefits:

  1. Existing programs can be analyzed without modifications. And analysis is not confined to individual methods. This will, e.g., enable JDart to analyze the Java programs in the SVCOMP benchmarks (Injection Flaws).
  2. It is one step towards enabling using JDart for dynamic taint analysis, e.g., for analyzing potential for injection attacks (Minepump). Web-applications tend to contain code of form:
u = Request.getValueOf(“user”); //Taint u (symbolically) 
query = “SELECT user FROM table WHERE uid=”+u; //query becomes tainted
query = santitize(query); //Un-taint query (symbolically.) 
db.select(query);

If the value of u is not properly analyzed and sanitized, an attacker can exploit this code to gain access to arbitrary information in the applications database. Making the return value of Request.getValueOf(“user”) symbolic, is the first step towards tracing taint of this value.

Please note: For a full dynamic taint analysis, JDart would also have to be extended to analyze strings and arrays symbolically --- this is out of the scope of this project.

Test Case Generation/Model-based Testing with Modbat for JPF

Description: JPF requires test cases as a starting point to explore a system. It is therefore suitable to use test case generation to create test cases automatically. Modbat is an open-source tool for test case generation. For testing concurrent software, an obvious choice would be to combine Modbat (to generate tests) with JPF (to execute tests and find concurrency problems). This has been done once as a proof of concept but is not supported in the current version of Modbat. The main reason for this is that Modbat's reporting has to read and parse bytecode, which requires access to some native code that JPF does not support. The goal is to find all problems where Modbat requires native access, and to use jpf-nhandler to resolve as many of these cases as possible. Remaining cases can be handled with custom model/peer classes.

Difficulty: Easy
Required skills: Knowledge of Java Pathfinder
Preferred skills: Knowledge of test generation

Symbolic data-race detection for Habanero Java

Description: Habanero Java is a Java implementation of the Habanero Extreme-scale programming model for multithreaded applications. The model is based on X10 and supports fork/join semantics as well as futures, isolation, and phasers. The advantage of structured parallelism such as Habanero is that the language itself provides concurrency guarantees such as deadlock freedom and determinacy if and only the program is free of data-race. A data-race occurs when two or more threads of execution access the same memory location and at least one of those accesses is a write. An additional advantage of structured parallelism is that run-times and analysis can be optimized based on the language structure itself. Recent work adds to JPF the ability to model check Habanero Java programs using a verification specific runtime and an algorithm that constructs and analyzes a computation graph representing the happens-before relation of the program execution (1, 2. The analysis is predictive because it infers from the single observed schedule the presence or absence of data-race in other non-observed schedules and only needs to enumerate schedules around isolation. Enumeration schedules around isolation though is still expensive and leads to state explosion in JPF. The work in this project is to mitigate this state explosion in enumerating schedules around isolation by building a symbolic computation graph from the program execution that adds constraints on the graph edges indicating under what condition the edge is active, and then using an SMT solver to find a set of edges on which a data-race exists. A first step in the project is to add a dynamic partial order reduction to JPF that is able to inform the symbolic computation graph about dependencies.

Difficulty: Medium
Required skills: Java programming and knowledge of model checking
Expected outcomes: SMT solution to data-race that avoids state exploration

Clone this wiki locally