Feature: Added creation of Directed Acyclic Graphs (DAGs) to existing DAG Driver #433

Open
wants to merge 1 commit into
base: main
from


NotAnAddictz
Contributor

Summary

Previously, in PR #383, we added sequential invocation of each function. This commit builds on that feature: every function acts as an entry point to its own DAG, whose structure is generated from the width and depth distributions in data/traces/example/dag_structure.xlsx, and each DAG is invoked according to the invocation frequency of its entry function.
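To illustrate how a width or depth can be drawn from such a distribution, here is a minimal Go sketch of inverse-transform sampling over an empirical CDF. The cdfPoint type, the sampleFromCDF helper and the example probabilities are hypothetical; the actual column layout of the structure file and the loader's own generateCDF helper may differ.

```go
package main

import (
	"fmt"
	"math/rand"
	"sort"
)

// cdfPoint is one step of an empirical CDF: P(X <= Value) = Prob.
// This (value, cumulative probability) layout is an assumption for
// illustration, not necessarily the layout of dag_structure.csv.
type cdfPoint struct {
	Value int
	Prob  float64
}

// sampleFromCDF performs inverse-transform sampling: it returns the smallest
// Value whose cumulative probability is >= u. cdf must be non-empty, sorted
// by Prob and end at 1.0; u is expected in [0, 1).
func sampleFromCDF(cdf []cdfPoint, u float64) int {
	i := sort.Search(len(cdf), func(j int) bool { return cdf[j].Prob >= u })
	if i == len(cdf) {
		i = len(cdf) - 1 // guard against a last entry slightly below 1.0
	}
	return cdf[i].Value
}

func main() {
	// Hypothetical distributions: widths 1/2/3 with probability 0.6/0.3/0.1,
	// depths 2/3/4 with probability 0.5/0.3/0.2.
	widthCDF := []cdfPoint{{1, 0.6}, {2, 0.9}, {3, 1.0}}
	depthCDF := []cdfPoint{{2, 0.5}, {3, 0.8}, {4, 1.0}}
	rng := rand.New(rand.NewSource(42))
	for i := 0; i < 3; i++ {
		w := sampleFromCDF(widthCDF, rng.Float64())
		d := sampleFromCDF(depthCDF, rng.Float64())
		fmt.Printf("DAG %d: width=%d depth=%d\n", i+1, w, d)
	}
}
```

Keeping the CDF sorted by cumulative probability lets sort.Search locate the sampled bucket in O(log n), so sampling stays cheap even for long-tailed distributions.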

Implementation Notes ⚒️

  • Added various helper functions to create and manage the DAG structure.
  • Added functionality to download, if required, the sampled_150 folder, which contains a folder for each group of functions.
  • Tweaked per-function specification generation to cover the highest possible invocation frequencies per minute.
  • Wrapped functions in Nodes to facilitate DAG generation.
  • Added the parameter entriesWritten to functionsDriver to ensure all invocations are written to the output file.
  • Added a retry limit of 1 for each function in the DAG.
  • No DAG structure contains duplicate functions; each is populated with functions chosen at random from the function list.

External Dependencies 🍀

  • N/A

Breaking API Changes ⚠️

  • N/A


@cvetkovic
Contributor

@leokondrashov: Is this still relevant or we close this PR?

@wanghanchengchn
Contributor

@leokondrashov: Is this still relevant or we close this PR?

I apologize for the late reply. I will review this pull request!

@wanghanchengchn
Contributor

Dear @cvetkovic,

The current version looks good to me. However, due to my limited experience in reviewing pull requests, I would greatly appreciate it if you could provide us with some feedback when you have a moment. Thank you very much!

cmd/loader.go (outdated, resolved)
cmd/loader.go (outdated, resolved)
data/traces/example/dag_structure.xlsx (outdated, resolved)
pkg/config/parser.go (outdated, resolved)
pkg/driver/trace_driver.go (outdated, resolved)
cvetkovic previously approved these changes Oct 4, 2024
Contributor

@cvetkovic cvetkovic left a comment

LGTM. @leokondrashov, if it looks good to you, you can proceed with merging.

@cvetkovic
Contributor

@wanghanchengchn Just fix the errors the linter reports.

@wanghanchengchn
Contributor

Thank you! Dear @NotAnAddictz, could you please address the failed checks? Additionally, this branch is out-of-date with the base branch. Kindly rebase on the main branch and verify that all checks are passing. Thank you!

Contributor

@leokondrashov leokondrashov left a comment

Looks good to me code-wise, but I'd like the feature to be documented in the repo, not only in the PR description.

I think the linter problem is not caused by you, but it's easy to fix by changing Fatalf to Fatal.

cmd/config.json (outdated, resolved)
docs/configuration.md (outdated, resolved)
docs/configuration.md (outdated, resolved)
go.mod (outdated, resolved)
data/traces/example/dag_structure.csv (outdated, resolved)
pkg/generator/dag_generation.go (resolved)
docs/loader.md (resolved)
pkg/generator/dag_generation.go (outdated, resolved)
var width, depth int
DAGDistribution := generateCDF(fmt.Sprintf("%s/dag_structure.csv", config.TracePath))
totalLinkedList := []*list.List{}
for _, function := range functions {
Contributor

So you generate a DAG for each function in the trace, right? But then, if a DAG has more than one function, some functions would be invoked from several different DAGs. That doesn't seem appropriate.

Contributor Author

Noted. After discussing with Hancheng, I will change the workflow to generate only a single DAG based on the given parameters, and add a config-file parameter to select the entry function.

Contributor

Why only one? IIRC, the DAGs are quite small on average, so even with 2k functions deployed, a single DAG would use only a handful of them. Can you keep generating DAGs until another one no longer fits? That seems more appropriate.

Contributor Author

Thanks for the suggestion. It now generates DAGs continuously. The first DAG's entry function is at index 0, and each subsequent DAG's entry function is the next unused function (e.g., if DAG 1 uses f(0) -> f(1) -> f(2), DAG 2 has entry function f(3)). Each DAG's invocation frequency and IAT follow its entry function. DAG shapes vary according to dag_structure.csv unless the user specifies otherwise with EnableDAGDataset.
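A minimal Go sketch of this packing scheme, assuming each DAG consumes a contiguous run of unused functions and generation stops at the first DAG that no longer fits. dagSpan, packDAGs and the nextSize hook (which in practice would derive a size from a sampled width and depth) are hypothetical names.

```go
package main

import "fmt"

// dagSpan records the contiguous run of functions used by one DAG.
type dagSpan struct {
	Entry int // index of the entry function, whose frequency and IAT drive the DAG
	Size  int // number of functions in the DAG
}

// packDAGs assigns consecutive, unused functions to DAGs and stops at the
// first DAG that no longer fits. nextSize is a hypothetical hook returning the
// size of the next DAG (for example, derived from a sampled width and depth).
func packDAGs(numFunctions int, nextSize func() int) []dagSpan {
	var dags []dagSpan
	next := 0 // index of the first unused function
	for {
		size := nextSize()
		if size < 1 || next+size > numFunctions {
			return dags
		}
		dags = append(dags, dagSpan{Entry: next, Size: size})
		next += size
	}
}

func main() {
	sizes := []int{3, 2, 4}
	i := 0
	nextSize := func() int { s := sizes[i%len(sizes)]; i++; return s }
	for n, d := range packDAGs(8, nextSize) {
		fmt.Printf("DAG %d: entry f(%d), uses f(%d)..f(%d)\n", n+1, d.Entry, d.Entry, d.Entry+d.Size-1)
	}
	// DAG 1: entry f(0), uses f(0)..f(2)
	// DAG 2: entry f(3), uses f(3)..f(4)
}
```

With 8 functions, the third DAG (size 4) would need f(5)..f(8), so generation stops after two DAGs; a variant could instead retry with smaller sampled sizes before giving up.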

pkg/generator/dag_generation.go (outdated, resolved)
Contributor

@leokondrashov leokondrashov left a comment

The code looks good to me. I have a couple of suggestions for tests: failure to generate a DAG, correct depth and width of bigger DAGs with multiple branches, reading and generating the sizes from the trace file, and creation of several DAGs.

@NotAnAddictz
Contributor Author

Thanks for the suggestions! I have added tests for generating from the trace and for generating multiple bigger DAGs (width = 10, depth = 5).
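A test for the width and depth of bigger DAGs needs a way to measure them independently of the generator. The Go sketch below assumes the DAG is available as adjacency lists keyed by integer node IDs (a hypothetical representation; measureDAG is not part of the PR). It computes depth as the number of levels on the longest path and width as the largest number of nodes on one level, using Kahn's topological ordering so that a node reachable by paths of different lengths lands on its deepest level.

```go
package main

import "fmt"

// measureDAG returns the width (largest number of nodes on one level) and
// depth (number of levels on the longest path) of the acyclic graph reachable
// from entry. A node's level is its longest distance from entry plus one.
func measureDAG(children map[int][]int, entry int) (width, depth int) {
	// Pass 1: count in-degrees of every node reachable from entry.
	indeg := map[int]int{entry: 0}
	visited := map[int]bool{entry: true}
	stack := []int{entry}
	for len(stack) > 0 {
		n := stack[len(stack)-1]
		stack = stack[:len(stack)-1]
		for _, c := range children[n] {
			indeg[c]++
			if !visited[c] {
				visited[c] = true
				stack = append(stack, c)
			}
		}
	}
	// Pass 2: Kahn's algorithm; a child's level is final once all of its
	// parents have been processed.
	level := map[int]int{entry: 1}
	queue := []int{entry}
	for len(queue) > 0 {
		n := queue[0]
		queue = queue[1:]
		for _, c := range children[n] {
			if level[n]+1 > level[c] {
				level[c] = level[n] + 1
			}
			indeg[c]--
			if indeg[c] == 0 {
				queue = append(queue, c)
			}
		}
	}
	// Pass 3: count nodes per level.
	perLevel := map[int]int{}
	for _, l := range level {
		perLevel[l]++
		if l > depth {
			depth = l
		}
	}
	for _, count := range perLevel {
		if count > width {
			width = count
		}
	}
	return width, depth
}

func main() {
	// Example with width 10 and depth 5: entry node 0 fans out into
	// 10 branches of 4 nodes each.
	children := map[int][]int{}
	next := 1
	for b := 0; b < 10; b++ {
		prev := 0
		for lvl := 2; lvl <= 5; lvl++ {
			children[prev] = append(children[prev], next)
			prev = next
			next++
		}
	}
	w, d := measureDAG(children, 0)
	fmt.Println(w, d) // 10 5
}
```

Measuring the generated graph rather than re-reading the sampled parameters means a test built this way would also catch a generator that records the right width and depth but links nodes incorrectly.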

Contributor

@leokondrashov leokondrashov left a comment

Please fix a couple of minor things.

Comment on lines 656 to 664
if d.Configuration.LoaderConfiguration.AsyncMode {
sleepFor := time.Duration(d.Configuration.LoaderConfiguration.AsyncWaitToCollectMin) * time.Minute

log.Infof("Sleeping for %v...", sleepFor)
time.Sleep(sleepFor)

d.writeAsyncRecordsToLog(globalMetricsCollector)
}

Contributor

Duplicated lines; see above.

node = node.Next()
}
atomic.AddInt64(metadata.FunctionsInvoked, numberOfFunctionsInvoked)
if success {
atomic.AddInt64(metadata.SuccessCount, 1)
Contributor

This now counts successful branch executions, not successful DAGs or functions. I don't think that is the correct behaviour.

Contributor Author

@NotAnAddictz NotAnAddictz Nov 28, 2024

Noted. I have changed it to count successfully invoked functions.
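The counting question can be illustrated with a small Go sketch that counts per function rather than per branch, and also applies the retry limit of 1 from the implementation notes. invokeFn, dagMetrics, runDAG and the stop-on-failure policy are assumptions for illustration; the loader's real invoker and metadata fields may behave differently.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// maxRetries mirrors the "retry limit of 1" from the implementation notes.
const maxRetries = 1

// invokeFn stands in for the loader's real invocation call (hypothetical).
// It must be safe for concurrent use because branches run in parallel.
type invokeFn func(function string) bool

// dagMetrics counts per function, not per branch or per DAG.
type dagMetrics struct {
	FunctionsInvoked int64 // functions invoked; a retried function counts once
	SuccessCount     int64 // functions that succeeded, possibly after the retry
	FailureCount     int64 // functions that still failed after the retry
}

// record updates the counters atomically and returns ok unchanged.
func (m *dagMetrics) record(ok bool) bool {
	atomic.AddInt64(&m.FunctionsInvoked, 1)
	if ok {
		atomic.AddInt64(&m.SuccessCount, 1)
	} else {
		atomic.AddInt64(&m.FailureCount, 1)
	}
	return ok
}

// invokeWithRetry calls fn once and retries at most maxRetries times.
func invokeWithRetry(fn invokeFn, name string) bool {
	for attempt := 0; attempt <= maxRetries; attempt++ {
		if fn(name) {
			return true
		}
	}
	return false
}

// runDAG invokes the entry function once, then all branches concurrently.
// A branch stops at its first function that fails after the retry, so the
// functions after it in that branch are never invoked.
func runDAG(entry string, branches [][]string, fn invokeFn, m *dagMetrics) {
	if !m.record(invokeWithRetry(fn, entry)) {
		return
	}
	var wg sync.WaitGroup
	for _, branch := range branches {
		wg.Add(1)
		go func(branch []string) {
			defer wg.Done()
			for _, name := range branch {
				if !m.record(invokeWithRetry(fn, name)) {
					return
				}
			}
		}(branch)
	}
	wg.Wait()
}

func main() {
	// f0 is the entry; f4 always fails, so f5 after it is never invoked.
	failing := map[string]bool{"f4": true}
	fn := func(name string) bool { return !failing[name] }
	var m dagMetrics
	runDAG("f0", [][]string{{"f1", "f2"}, {"f3", "f4", "f5"}}, fn, &m)
	fmt.Println(m.FunctionsInvoked, m.SuccessCount, m.FailureCount) // 5 4 1
}
```

Invoking the entry function once before fanning out keeps it from being counted once per branch; a test along these lines can then assert on both functionsInvoked and successCount.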

RecordOutputChannel: invocationRecordOutputChannel,
AnnounceDoneWG: announceDone,
}

announceDone.Add(1)
testDriver.invokeFunction(metadata)
if !(successCount == 1 && failureCount == 0) {
announceDone.Wait()
if !(functionsInvoked == 3 && failureCount == 0) {
Contributor

Consider adding a successCount check as well. I think it might be handled incorrectly (see the previous comment).

@@ -135,7 +136,7 @@ func TestInvokeFunctionFromDriver(t *testing.T) {

testDriver := createTestDriver()
var failureCountByMinute = make([]int64, testDriver.Configuration.TraceDuration)

var functionsInvoked int64
Contributor

Please also add it to the test conditions.

@@ -33,7 +33,10 @@
| MetricScrapingPeriodSeconds | int | > 0 | 15 | Period of Prometheus metrics scraping |
| GRPCConnectionTimeoutSeconds | int | > 0 | 60 | Timeout for establishing a gRPC connection |
| GRPCFunctionTimeoutSeconds | int | > 0 | 90 | Maximum time given to function to execute[^5] |
| DAGMode | bool | true/false | false | Sequential invocation of all functions one after another |
| DAGMode | bool | true/false | false | Generates DAG workflows iteratively with functions in TracePath [^7]. The frequency and IAT of each DAG follow its entry function, while the duration and memory of each function follow their respective values in TracePath. |
| EnableDAGDataset | bool | true/false | true | Generate width and depth from data/traces/example/dag_structure.csv[^8] |
Contributor

Please update this to say that generation reads the .csv file from the trace path, not from this specific file. This one is only a sample, not the data that should be used in real experiments.

Contributor

@leokondrashov leokondrashov left a comment

LGTM now. Thank you, YiShen, for patiently following all my suggestions.

@cvetkovic We need to plan this merge around the RPS mode merge. Can you help with that?

@NotAnAddictz Unfortunately, this might require another rebase of the trace driver, but it should be small because RPS mode mostly changes functionsDriver, while your changes are mostly in the invoker.

4 participants