Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

State refactor pt1 #118

Open
wants to merge 4 commits into
base: mainline
Choose a base branch
from
Open

State refactor pt1 #118

wants to merge 4 commits into from

Conversation

victornicolet
Copy link
Contributor

This splits the dataflow.AnalyzerState into different State structs that embed each other:

  • the config.State contains the config and logger.
  • the loadprogram.State additionally contains the information about a built SSA program.
  • the ptr.State additionally contains the information about the pointer analysis result.
  • the dataflow.State additionally contains information about dataflows.
    Other analyzers (taint, backtrace, dependencies, syntactic, ...) can build only the state they need, and the type tells what analysis results are available.

BuildMode: ssa.BuilderMode(0),
LoadTests: false,
ApplyRewrites: true,
Platform: "",
PackageConfig: nil,
}
pkgs, _, err := LoadProgram(loadOptions, files)
pkgs, _, err := loadprogram.Do(files, loadOptions)
if err != nil {
t.Fatalf("error loading packages: %s", err)
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The test should make sure that len(pkgs) > 0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: Should this belong in the loadprogram package?

)

// A State is the base state for the analyses in Argot. Analyses that do not require whole-program analysis
// should be built with the go tools analysis framework.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: a link to https://pkg.go.dev/golang.org/x/tools/go/analysis would be useful here

// A State is the base state for the analyses in Argot. Analyses that do not require whole-program analysis
// should be built with the go tools analysis framework.
type State struct {
*config.State
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to mutate config.State? If not, maybe this shouldn't be a pointer

Comment on lines +57 to +59
numAlarms atomic.Int32
errors map[string][]error
errorMutex sync.Mutex
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this could be a single struct for clarity

log := config.NewLogGroup(cfg)
state, err := dataflow.NewInitializedAnalyzerState(program, lp.Pkgs, log, cfg)
lp.Config.SlicingProblems = []config.SlicingSpec{{BacktracePoints: lp.Config.TaintTrackingProblems[0].Sinks}}
state, err := dataflow.NewDefault(lp.Config, lp.Prog, lp.Pkgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think dataflow.NewDefaultState would be a more specific name

package config

// ConfigLogger groups a config and a logger. All "state" structs should implement this.
type ConfigLogger interface {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think Configurer would be a more fitting name in case we want to add a method in the future

options loadprogram.Options) (*loadprogram.State, error) {
if name != "" {
// If it's a named target, need to change to project root's directory to properly load the target
err := os.Chdir(c.Config.Root())
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should try to avoid these kinds of side-effects as much as possible. There's a way to load a program from a directory without cd-ing to it: https://github.com/awslabs/ar-go-tools/blob/mainline/internal/analysistest/analysistest.go#L59

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point.
Since this was a change from the previous PR and I'm not sure how well it will work with larger programs with vendored dependencies we can make that change in a future PR?


// BuildWholeProgramTarget loads the target specified by the list of files provided. Return an analyzer state that has
// been initialized with the program if successful. The Target of that state will be set to the provided name.
func BuildWholeProgramTarget(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's confusing that there are two ways to construct a loadprogram.State: this function and loadprogram.NewState. I propose deleting all the BuildXTarget functions and making the x.NewState functions specify whatever dependencies they need as parameters.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Having tried to not have the BuildXTarget initially, I found myself just copy-pasting code everywhere that basically implements the BuildXTarget.
The difference between the NewState functions for every state and the BuildXTarget is:

  • with BuildXTarget you give files, load options and config and you get X. BuildXTarget does all the work.
  • with NewState you have to build the program yourself, or the previous state yourself. NewState only builds the additional information on top of its arguments, so you have to chain loadprogram.Do and NewState for every state you want. This gives more control, but in most places (tests, executables) you just want BuildXTarget. NewState will be more useful when/if we don't specify a tool and argot just detects what analyses it needs to run and what states it needs to build.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's fair. How about renaming each BuildXTarget function to <package>.NewDefaultState?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is still a difference that the NewDefaultState accepts an already built program as input (because it's really here only for tests) whereas the BuildXTarget takes in patterns as input, and then builds the program.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ended up removing the BuildXTarget functions an instead introducing a result monad to chain the state building operations. Still have some cleanup to do, but this simplifies the interface.

log := config.NewLogGroup(cfg)
state, err := dataflow.NewInitializedAnalyzerState(program, lp.Pkgs, log, cfg)
lp.Config.SlicingProblems = []config.SlicingSpec{{BacktracePoints: lp.Config.TaintTrackingProblems[0].Sinks}}
state, err := dataflow.NewDefault(lp.Config, lp.Prog, lp.Pkgs)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here's how I'd like this API to work using a dependency-injection style:

cfgState, err := config.NewState(...)
progState, err := analysistest.NewProgramState(cfgState, testfsys, ...)
ptrState, err := ptr.NewState(progState)
dfState, err := dataflow.NewState(ptrState)
res, err := backtrace.Analyze(dfState)

We may even be able to merge the implementation of analysistest.LoadTest into loadprogram.NewState, but that would mean that loadprogram.NewState would need a parameter that implements the filesystem interface which may be a bit much...

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think without a result monad the chaining of those operations where each returns an error becomes very verbose.
Also, one option would be to make the config.State contain all the arguments for the next NewState, i.e. loadprogram.NewState.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants