Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

make upload error reporting more consistent #1520

Closed
Tracked by #1450
matt-codecov opened this issue Apr 5, 2024 · 7 comments
Closed
Tracked by #1450

make upload error reporting more consistent #1520

matt-codecov opened this issue Apr 5, 2024 · 7 comments
Assignees
Labels
chore platform-nicetohave Simple changes to improve code quality, error visibility, etc in the platform team

Comments

@matt-codecov
Copy link

catch more exceptions during upload processing

upload_processor.py's process_individual_report() is supposed to call build_report_from_raw_content(), get a ProcessingResult back, and pass it into update_upload_with_processing_result(). if the ProcessingResult contains an error, an UploadError record is created in the db with an error code.

build_report_from_raw_content() handles three specific error cases: FileNotInStorageError when the raw upload is missing, ReportExpiredException, and ReportEmptyError. if one of those occurs, it will return a ProcessingResult with an appropriate error filled in. if any other exception occurs, it is not caught. it bubbles up through process_individual_report() until it is caught in this catchall, which sets the upload state to error (good) but does not create an UploadError database record (bad). without an UploadError in the database, gazebo has to display "unknown error" for this upload when a better message would be something like "processing failed".

so, we should add an except Exception: catchall inside of build_report_from_raw_content() which will return a ProcessingResult with an error code like "failure_during_processing". then teach api and gazebo about that value so that gazebo can display "processing failed" for it

shared enum for upload error code

gazebo and codecov-api agree on a list of possible upload error codes:

worker uses the same values, but it just hardcodes magic strings:

the python enum should probably live in shared and worker should use it too

@matt-codecov matt-codecov added chore platform-nicetohave Simple changes to improve code quality, error visibility, etc in the platform team labels Apr 5, 2024
@Adal3n3
Copy link

Adal3n3 commented Jul 11, 2024

@rohan-at-sentry @thomasrockhu-codecov How can we move this issue forward? Without this work we can't detect the unknown error.

@rohan-at-sentry
Copy link

Let's aim to tackle this early next quarter. Added to 1st sprint of Q3 2024.

@Adal3n3
Copy link

Adal3n3 commented Aug 15, 2024

@rohan-at-sentry @aj-codecov @trent-codecov We are now in Q3. Can we move this issue forward?

@matt-codecov
Copy link
Author

i'll deploy codecov/codecov-api#820 and codecov/worker#703 next week

with them deployed, we can search for the following log strings:

  • Unknown error when fetching raw report from storage
  • Unknown error when processing raw upload

they will have information about exceptions we aren't currently catching, and where appropriate we can assign them new error codes. unfortunately creating new error codes is a pain. to make sure it is safe, you must merge in this order:

  • gazebo (create the new value)
  • shared (create the new value)
  • codecov-api (updating shared should be enough)
  • worker (update shared + catch the new exception)

my focus is just on making sure descriptive error codes are available to gazebo, i'm not touching how gazebo counts/displays them. it appears that happens in two places:

  • here, where we count errors of each type
  • here, where we choose what text to display for each upload depending on the error code

i don't know how the product wants to handle/display new error codes so i'm leaving that stuff alone

@matt-codecov
Copy link
Author

@matt-codecov
Copy link
Author

matt-codecov commented Nov 20, 2024

i am getting some PRs up to resolve the highest-firing unknown errors codecov/worker#908

i have also starting adding a new PROCESSING_TIMEOUT code. when a timeout occurs, worker will attempt to set it as the upload error code before exiting. we might get forcibly terminated before we can save everything, but it might handle some of these cases at least

the first two PRs to create the new code:

next, i'll have to update API to use the new version of shared

  • placeholder

finally, i'll have to update worker to record the new error code

  • placeholder

we're in a code freeze so i will have to follow up on these later

@matt-codecov
Copy link
Author

(open on VPN) https://l.codecov.dev/UHKdI1 i wrote a doc about how upload errors are set/presented and the process for creating new ones

we are in a code freeze right now. i will follow up with my current PRs later. closing this issue since the process is documented now

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
chore platform-nicetohave Simple changes to improve code quality, error visibility, etc in the platform team
Projects
None yet
Development

No branches or pull requests

3 participants