Investigate performance of Score Runs #270

Open
MrSerth opened this issue Jan 4, 2023 · 8 comments
Labels
deployment (Everything related to our production environment) · enhancement (New feature or request) · help wanted (Extra attention is needed)

Comments

@MrSerth
Member

MrSerth commented Jan 4, 2023

Some score runs triggered in CodeOcean are notoriously slow and take quite some time to finish. This has led some teachers to spend a lot of time combining various test cases just to reduce the scoring time. Typical optimizations are to squash all test cases into one file, to skip the linter (in Python), or to not use a testing framework at all. Especially putting all tests into one file is somewhat understandable, as test files are executed sequentially in the current workflow.

Obviously, we want to optimize our tool rather than putting more effort on teachers to optimize their tests. Therefore, before taking any specific steps, we should get a better understanding of the score runs and collect some actual profiling data. The data could help us answer the following questions:

How much time is needed ...

  • in CodeOcean
    • to save an exercise before the actual scoring request is issued
    • to collect files
    • to request a runner
    • to send files to Poseidon
  • in Poseidon
    • to copy files to an allocation in Nomad
    • to prepare an execution
  • in the allocation
    • to compile the code (if applicable, e.g., for Java)
    • to execute the actual test cases
  • in CodeOcean
    • to parse the scoring result using the RegEx
    • to store the output of the test run (and other post-processing tasks)
    • to forward the response

Based on these numbers, we could identify the main pain points and tackle them individually.
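As a minimal illustration of how such per-phase timings could be captured (in Go, matching Poseidon; the phase names and sleeps below are placeholders, not actual CodeOcean or Poseidon functions):

```go
package main

import (
	"fmt"
	"time"
)

// measure runs fn and reports how long it took; the phases in main are
// placeholders standing in for the real steps listed above.
func measure(name string, fn func()) time.Duration {
	start := time.Now()
	fn()
	elapsed := time.Since(start)
	fmt.Printf("%-20s %v\n", name, elapsed)
	return elapsed
}

func main() {
	measure("request runner", func() { time.Sleep(10 * time.Millisecond) })
	measure("copy files", func() { time.Sleep(5 * time.Millisecond) })
	measure("execute tests", func() { time.Sleep(50 * time.Millisecond) })
}
```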

@MrSerth added the enhancement, help wanted, and deployment labels on Jan 4, 2023
@mpass99
Collaborator

mpass99 commented Jan 30, 2023

I see two methods of collecting the required data:

  • Add log statements including the duration of the individual parts; manually collect the data of interest.
  • Add an Influx client to CodeOcean (or use the Prometheus endpoint?); send the duration data for the individual parts to our Influx server; create appropriate panels.

I prefer the second option, as it allows easy continuous evaluation, although it requires more effort now. What do you think?
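A minimal sketch of the second option, assuming the official influxdb-client-go v2 library; the URL, token, org, bucket, measurement, and tag names are placeholders, not values from our setup:

```go
package main

import (
	"context"
	"time"

	influxdb2 "github.com/influxdata/influxdb-client-go/v2"
)

func main() {
	// Placeholder connection details; the real values would come from configuration.
	client := influxdb2.NewClient("http://localhost:8086", "my-token")
	defer client.Close()
	writeAPI := client.WriteAPIBlocking("my-org", "my-bucket")

	// Report the duration of one part of a score run as a single data point.
	duration := 230 * time.Millisecond // e.g., measured around "request a runner"
	point := influxdb2.NewPoint(
		"score_run_phase",
		map[string]string{"phase": "request_runner"},
		map[string]interface{}{"duration_ms": duration.Milliseconds()},
		time.Now(),
	)
	_ = writeAPI.WritePoint(context.Background(), point)
}
```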

@mpass99
Collaborator

mpass99 commented Jan 30, 2023

* in CodeOcean
  
  * to request a runner
  * to send files to Poseidon

* in Poseidon
  
  * to copy files to an allocation in Nomad
  * to prepare an execution

We already collect this data with our Poseidon InfluxDB middleware. Do you want to collect additional data, such as the network delay or the Poseidon overhead of the Nomad Copy Files process?
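For context, the general shape of such a duration-recording middleware could look roughly like this (a sketch only, not the actual Poseidon implementation; the handler and reporting function are made up, and the report callback would write an InfluxDB point as in the client example above):

```go
package main

import (
	"log"
	"net/http"
	"time"
)

// influxTimer wraps a handler, measures each request, and hands the duration
// to report, which would write an InfluxDB point in a real setup.
func influxTimer(report func(path string, d time.Duration), next http.Handler) http.Handler {
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		start := time.Now()
		next.ServeHTTP(w, r)
		report(r.URL.Path, time.Since(start))
	})
}

func main() {
	hello := http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		w.Write([]byte("ok"))
	})
	// Log instead of writing to InfluxDB to keep the sketch self-contained.
	handler := influxTimer(func(path string, d time.Duration) {
		log.Printf("%s took %v", path, d)
	}, hello)
	log.Fatal(http.ListenAndServe(":8080", handler))
}
```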

@MrSerth
Member Author

MrSerth commented Feb 1, 2023

I like the idea 👍, let's have a look at Sentry's Distributed Tracing to capture the data and proceed with our evaluation.

@MrSerth
Member Author

MrSerth commented Feb 9, 2023

The Distributed Tracing on Sentry works so far, and traces are associated across CodeOcean and Poseidon. However, we don't have instrumentation for the WebSocket part in CodeOcean yet and are therefore missing the corresponding span in Poseidon. Hence:

  • Add a custom span around the WebSocket connection in CodeOcean
  • Ensure the TraceID is sent along when opening the WebSocket from CodeOcean to Poseidon (to associate the corresponding request; see the sketch after this list)
  • Add support for JavaScript Tracing (and keep trace ID between "create submission" and "code execution")
  • Add and configure Sentry Relay
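A rough sketch (in Go, using the sentry-go SDK; the route, transaction name, and span operation are made up for illustration) of continuing a caller's trace from the incoming sentry-trace header and adding a custom child span around the WebSocket-backed execution:

```go
package main

import (
	"log"
	"net/http"
	"time"

	"github.com/getsentry/sentry-go"
)

func main() {
	if err := sentry.Init(sentry.ClientOptions{
		Dsn:              "", // placeholder; an empty DSN disables sending
		EnableTracing:    true,
		TracesSampleRate: 1.0,
	}); err != nil {
		log.Fatal(err)
	}
	defer sentry.Flush(2 * time.Second)

	http.HandleFunc("/execute", func(w http.ResponseWriter, r *http.Request) {
		// Continue the trace started by the caller (e.g., CodeOcean) using the
		// incoming sentry-trace header, so both sides show up in one trace.
		transaction := sentry.StartTransaction(r.Context(), "websocket.execute",
			sentry.ContinueFromRequest(r))
		defer transaction.Finish()

		// A custom child span around the actual execution.
		span := transaction.StartChild("websocket.execute.run")
		time.Sleep(10 * time.Millisecond) // placeholder for the real work
		span.Finish()
	})
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```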

@MrSerth
Member Author

MrSerth commented Feb 10, 2023

I created PR openHPI/codeocean#1536 tackling the first two aspects of my list above.

@mpass99
Collaborator

mpass99 commented Feb 15, 2023

Additionally, we want to gain insights into the span nomad.execute.exec, which describes the Nomad execute request. To do so, we produce a special marker string in the bash command, scan the output for this string, and convert it into a Sentry span.
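A rough illustration of that idea (the marker name, command wrapping, and parsing below are invented for this sketch and are not Poseidon's actual format):

```go
package main

import (
	"fmt"
	"strings"
	"time"
)

// The wrapped command echoes a marker with a nanosecond timestamp before the
// actual test command, so the output tells us when execution really started.
const marker = "EXECUTION_START"

func wrapCommand(userCmd string) string {
	return fmt.Sprintf(`echo "%s $(date +%%s%%N)"; %s`, marker, userCmd)
}

// extractStartTime scans the command output for the marker line and parses
// the timestamp back into a time.Time; this would become the start time of a
// Sentry span describing the execution inside the allocation.
func extractStartTime(output string) (time.Time, bool) {
	for _, line := range strings.Split(output, "\n") {
		if strings.HasPrefix(line, marker+" ") {
			rest := strings.TrimSpace(strings.TrimPrefix(line, marker+" "))
			var ns int64
			if _, err := fmt.Sscanf(rest, "%d", &ns); err == nil {
				return time.Unix(0, ns), true
			}
		}
	}
	return time.Time{}, false
}

func main() {
	fmt.Println(wrapCommand("make test"))
	output := fmt.Sprintf("%s %d\nok\n", marker, time.Now().UnixNano())
	if start, ok := extractStartTime(output); ok {
		fmt.Println("execution started at", start)
	}
}
```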

@MrSerth
Member Author

MrSerth commented May 12, 2023

I've added support for the JavaScript-based Sentry library for CodeOcean; those changes have been integrated. The final step is to finish the Sentry Relay, as tracked by #370.

@mpass99
Collaborator

mpass99 commented Sep 11, 2023

We have performed our first analysis, which leads to the next steps below.

[Image: Example Performance Breakdown]

[Image: Sentry Performance Measurement, from the 24th to the 28th of August]

Next steps

Until the next analysis, we want to complete the following issues:

Additionally, we want to complete these issues:
