State refactor pt1 #118

victornicolet · 2024-11-22T18:01:38Z

This splits the dataflow.AnalyzerState into different State structs that embed each other:

the config.State contains the config and logger.
the loadprogram.State additionally contains the information about a built SSA program.
the ptr.State additionally contains the information about the pointer analysis result.
the dataflow.State additionally contains information about dataflows.
Other analyzers (taint, backtrace, dependencies, syntactic, ...) can build only the state they need, and the type tells what analysis results are available.
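For readers unfamiliar with the layering, here is a minimal sketch of how embedded states could compose. The type and field names (ConfigState, ProgramState, Packages, and so on) are simplified stand-ins invented for illustration, not the actual Argot definitions.

```go
package main

import "fmt"

// ConfigState stands in for config.State: a config and a logger.
// All fields here are hypothetical placeholders.
type ConfigState struct {
	ConfigPath string
	LogPrefix  string
}

// ProgramState stands in for loadprogram.State: it adds program information.
type ProgramState struct {
	*ConfigState
	Packages []string // placeholder for the built SSA program
}

// PointerState stands in for ptr.State: it adds pointer analysis results.
type PointerState struct {
	*ProgramState
	PointsTo map[string][]string
}

// DataflowState stands in for dataflow.State: it adds dataflow information.
type DataflowState struct {
	*PointerState
	Summaries map[string]string
}

// countPackages only needs a loaded program, and its signature says so:
// callers cannot pass a bare ConfigState by mistake.
func countPackages(s *ProgramState) int {
	return len(s.Packages)
}

func main() {
	cfg := &ConfigState{ConfigPath: "config.yaml", LogPrefix: "[argot]"}
	prog := &ProgramState{ConfigState: cfg, Packages: []string{"main", "lib"}}
	pt := &PointerState{ProgramState: prog, PointsTo: map[string][]string{}}
	df := &DataflowState{PointerState: pt, Summaries: map[string]string{}}

	// Fields of every embedded state are promoted to the outermost state.
	fmt.Println(df.LogPrefix, df.ConfigPath, countPackages(df.ProgramState))
}
```

An analysis that needs only the loaded program can take a *ProgramState, while one that needs dataflow results takes a *DataflowState, so the parameter type documents which results must already exist.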

samarth-aws · 2024-11-22T18:31:38Z

analysis/load_program_test.go

 		BuildMode:     ssa.BuilderMode(0),
 		LoadTests:     false,
 		ApplyRewrites: true,
 		Platform:      "",
 		PackageConfig: nil,
 	}
-	pkgs, _, err := LoadProgram(loadOptions, files)
+	pkgs, _, err := loadprogram.Do(files, loadOptions)
 	if err != nil {
 		t.Fatalf("error loading packages: %s", err)
 	}


The test should make sure that len(pkgs) > 0

samarth-aws · 2024-11-22T18:32:54Z

analysis/load_program_test.go

nit: Should this belong in the loadprogram package?

samarth-aws · 2024-11-22T18:34:30Z

analysis/loadprogram/state.go

+)
+
+// A State is the base state for the analyses in Argot. Analyses that do not require whole-program analysis
+// should be built with the go tools analysis framework.


nit: a link to https://pkg.go.dev/golang.org/x/tools/go/analysis would be useful here

samarth-aws · 2024-11-22T18:35:31Z

analysis/loadprogram/state.go

+// A State is the base state for the analyses in Argot. Analyses that do not require whole-program analysis
+// should be built with the go tools analysis framework.
+type State struct {
+	*config.State


Is it possible to mutate config.State? If not, maybe this shouldn't be a pointer
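To make the trade-off concrete, the sketch below contrasts embedding by pointer with embedding by value. The types are hypothetical stand-ins, not Argot's. Note that one of the test diffs quoted in this review assigns to lp.Config.SlicingProblems, which suggests the config is mutated through a state at least in tests.

```go
package main

import "fmt"

// Config and ConfigState are simplified, hypothetical stand-ins.
type Config struct{ Verbose bool }

type ConfigState struct{ Config Config }

// ByPointer shares one ConfigState with whoever else holds the pointer.
type ByPointer struct{ *ConfigState }

// ByValue owns a private copy taken at construction time.
type ByValue struct{ ConfigState }

func main() {
	base := &ConfigState{}
	p := ByPointer{ConfigState: base}
	v := ByValue{ConfigState: *base}

	// Mutate the original after both derived states were built.
	base.Config.Verbose = true

	fmt.Println("pointer embedding sees the change:", p.Config.Verbose)
	fmt.Println("value embedding sees the change:", v.Config.Verbose)
}
```

With pointer embedding, every derived state observes later changes to the config. With value embedding, each state keeps a snapshot, which is safer if the config is meant to be immutable but costs a copy per layer.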

samarth-aws · 2024-11-22T18:36:22Z

analysis/loadprogram/state.go

+	numAlarms  atomic.Int32
+	errors     map[string][]error
+	errorMutex sync.Mutex


nit: this could be a single struct for clarity

samarth-aws · 2024-11-22T18:40:56Z

analysis/backtrace/backtrace_taint_test.go

-	log := config.NewLogGroup(cfg)
-	state, err := dataflow.NewInitializedAnalyzerState(program, lp.Pkgs, log, cfg)
+	lp.Config.SlicingProblems = []config.SlicingSpec{{BacktracePoints: lp.Config.TaintTrackingProblems[0].Sinks}}
+	state, err := dataflow.NewDefault(lp.Config, lp.Prog, lp.Pkgs)


I think dataflow.NewDefaultState would be a more specific name

samarth-aws · 2024-11-22T18:50:44Z

analysis/config/state.go

+package config
+
+// ConfigLogger groups a config and a logger. All "state" structs should implement this.
+type ConfigLogger interface {


I think Configurer would be a more fitting name in case we want to add a method in the future

samarth-aws · 2024-11-22T19:02:29Z

analysis/builders.go

+	options loadprogram.Options) (*loadprogram.State, error) {
+	if name != "" {
+		// If it's a named target, need to change to project root's directory to properly load the target
+		err := os.Chdir(c.Config.Root())


We should try to avoid these kinds of side-effects as much as possible. There's a way to load a program from a directory without cd-ing to it: https://github.com/awslabs/ar-go-tools/blob/mainline/internal/analysistest/analysistest.go#L59

Good point.
Since this change came from the previous PR, and I'm not sure how well the alternative will work with larger programs that have vendored dependencies, can we make that change in a future PR?

samarth-aws · 2024-11-22T19:12:20Z

analysis/builders.go

+
+// BuildWholeProgramTarget loads the target specified by the list of files provided. Return an analyzer state that has
+// been initialized with the program if successful. The Target of that state will be set to the provided name.
+func BuildWholeProgramTarget(


It's confusing that there are two ways to construct a loadprogram.State: this function and loadprogram.NewState. I propose deleting all the BuildXTarget functions and making the x.NewState functions specify whatever dependencies they need as parameters.

I initially tried to do without the BuildXTarget functions, and found myself copy-pasting code everywhere that basically reimplements them.
The difference between the NewState functions for every state and the BuildXTarget is:

with BuildXTarget you give files, load options and config and you get X. BuildXTarget does all the work.

with NewState you have to build the program yourself, or the previous state yourself. NewState only builds the additional information on top of its arguments, so you have to chain loadprogram.Do and NewState for every state you want. This gives more control, but in most places (tests, executables) you just want BuildXTarget. NewState will be more useful when/if we don't specify a tool and argot just detects what analyses it needs to run and what states it needs to build.

That's fair. How about renaming each BuildXTarget function to <package>.NewDefaultState?

There is still a difference: NewDefaultState accepts an already built program as input (it's really only there for tests), whereas BuildXTarget takes patterns as input and then builds the program.

Ended up removing the BuildXTarget functions and instead introducing a result monad to chain the state building operations. Still have some cleanup to do, but this simplifies the interface.
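For context, a result monad in Go could look roughly like the sketch below. This is not the implementation from the PR: Result, Ok, Fail, Bind, and the placeholder state types and builder functions are all hypothetical names chosen for illustration.

```go
package main

import (
	"errors"
	"fmt"
)

// Result holds either a value or an error (hypothetical type).
type Result[T any] struct {
	value T
	err   error
}

// Ok wraps a successfully computed value.
func Ok[T any](v T) Result[T] { return Result[T]{value: v} }

// Fail wraps an error.
func Fail[T any](err error) Result[T] { return Result[T]{err: err} }

// Bind runs f on the value of r, or propagates r's error without calling f.
// It is a free function because Go methods cannot declare type parameters.
func Bind[T, U any](r Result[T], f func(T) (U, error)) Result[U] {
	if r.err != nil {
		return Fail[U](r.err)
	}
	u, err := f(r.value)
	if err != nil {
		return Fail[U](err)
	}
	return Ok(u)
}

// Unwrap returns to the usual (value, error) convention.
func (r Result[T]) Unwrap() (T, error) { return r.value, r.err }

// Placeholder states and builders standing in for the real ones.
type configState struct{ root string }
type programState struct {
	cfg  configState
	pkgs []string
}
type pointerState struct{ prog programState }

func loadProgram(c configState) (programState, error) {
	if c.root == "" {
		return programState{}, errors.New("empty project root")
	}
	return programState{cfg: c, pkgs: []string{"main"}}, nil
}

func runPointerAnalysis(p programState) (pointerState, error) {
	return pointerState{prog: p}, nil
}

func main() {
	// Each step runs only if the previous one succeeded.
	ps, err := Bind(Bind(Ok(configState{root: "."}), loadProgram), runPointerAnalysis).Unwrap()
	fmt.Println(len(ps.prog.pkgs), err)

	// The first failure short-circuits the rest of the chain.
	_, err = Bind(Bind(Ok(configState{}), loadProgram), runPointerAnalysis).Unwrap()
	fmt.Println(err)
}
```

The error check is written once, inside Bind, instead of after every state-building call, which is the verbosity reduction the comment above refers to.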

samarth-aws · 2024-11-22T19:24:03Z

analysis/backtrace/backtrace_taint_test.go

-	log := config.NewLogGroup(cfg)
-	state, err := dataflow.NewInitializedAnalyzerState(program, lp.Pkgs, log, cfg)
+	lp.Config.SlicingProblems = []config.SlicingSpec{{BacktracePoints: lp.Config.TaintTrackingProblems[0].Sinks}}
+	state, err := dataflow.NewDefault(lp.Config, lp.Prog, lp.Pkgs)


Here's how I'd like this API to work using a dependency-injection style:

cfgState, err := config.NewState(...)
progState, err := analysistest.NewProgramState(cfgState, testfsys, ...)
ptrState, err := ptr.NewState(progState)
dfState, err := dataflow.NewState(ptrState)
res, err := backtrace.Analyze(dfState)

We may even be able to merge the implementation of analysistest.LoadTest into loadprogram.NewState, but that would mean that loadprogram.NewState would need a parameter that implements the filesystem interface which may be a bit much...

I think without a result monad the chaining of those operations where each returns an error becomes very verbose.
Also, one option would be to make the config.State contain all the arguments for the next NewState, i.e. loadprogram.NewState.

victornicolet added 3 commits November 22, 2024 12:12

Refactor internal state Part 1.

48c2a3f

Refactor internal state Part 2.

7e489d2

Separate pointer and whole program state, cleanup dead code.

2988011

victornicolet requested a review from samarth-aws November 22, 2024 18:05

samarth-aws reviewed Nov 22, 2024

victornicolet force-pushed the state-refactor-pt1 branch from 0fca836 to 0940fbd Compare November 22, 2024 22:40

Using a result monad, removing the builders.

06ad8f8

victornicolet force-pushed the state-refactor-pt1 branch from 0940fbd to 06ad8f8 Compare November 22, 2024 22:41
