New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

Sign up for GitHub

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Jump to bottom

A quick script to check if a job took longer than x time to run and update python in precommit #930

Open

EmanElsaban wants to merge 3 commits into master from u/emanelsabban/quick-script-to-check-jobs-running

Contributor

EmanElsaban commented Nov 16, 2023

This script outputs the jobs that takes longer than x time to run. We can specify the time using --time. It's an easy way to get the jobs than to look at the Tron UI

EmanElsaban requested review from nemacysts, KaspariK and bobokun

November 16, 2023 21:03

EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch 4 times, most recently from 87e47d9 to 3f7fe09 Compare

November 16, 2023 21:54

nemacysts reviewed

View reviewed changes

.pre-commit-config.yaml Outdated

    
                      language_version: python3.6

                      args: [--target-version, py36]

                      language_version: python3.8

                      args: [--target-version, py38]

Member

nemacysts Nov 17, 2023

I think we might want this to still use [--target-version, py36] so that it doesn't try to format anything in an incompatible way

Contributor Author

EmanElsaban Nov 21, 2023 •

edited

Loading

its failed the test below cause:

ERROR: InvocationError for command '/home/runner/work/Tron/Tron/.tox/py36/bin/pre-commit run --all-files' (exited with code 1)

but when I try to run this locally I don't have py36 on this branch, its .tox/py38/...

Contributor Author

EmanElsaban Nov 21, 2023

then when I run make test after make clean:

(py38) emanelsabban@dev208-uswest1adevc:~/pg/Tron$ make test
tox -e py36
py36 create: /nail/home/emanelsabban/pg/Tron/.tox/py36
ERROR: InterpreterNotFound: python3.6
________________________________________________________________________________________________________ summary _________________________________________________________________________________________________________
ERROR:  py36: InterpreterNotFound: python3.6
make: *** [Makefile:64: test] Error 1

Member

nemacysts Nov 27, 2023

oh, that's unfortunate :/

I'm just a tad worried about breaking the py36 build and/or somehow allowing py36+ code to sneak in and break at runtime (on py36)

tron/bin/check_job_exceeding_time.py Outdated

		log = logging.getLogger("check_exceeding_time")


		def parse_cli():

Member

nemacysts Nov 17, 2023

the yelpy naming convention for this is usually parse_args :)

probably a good idea to have new code by typed as well, so we probably want this to be something like:

Suggested change

      
            def parse_cli():
          
            def parse_args() -> argparse.Namespace:

Member

nemacysts Nov 17, 2023

and depending on how easy it is to trace the tron code to get the type of some of these things, it might be worth using reveal_type (see https://mypy.readthedocs.io/en/stable/cheat_sheet_py3.html#when-you-re-puzzled-or-when-things-are-complicated) to have mypy tell you what it thinks the types are

tron/bin/check_job_exceeding_time.py Outdated

		check_if_time_exceeded(job_runs, time_limit, result)


		def main():

Member

nemacysts Nov 17, 2023

Suggested change

      
            def main():
          
            def main() -> Optional[int]:

would match what sys.exit expects :)

tron/bin/check_job_exceeding_time.py Outdated

+                  if args.job is None:
+                      jobs = client.jobs(include_job_runs=True)
+                      for job in jobs:
+                          check_job_time(job=job, time_limit=args.time_limit, result=result)

Member

nemacysts Nov 17, 2023

non-blocking suggestion: could we make this a pure function by having it return a list of affected runs?

(and then do something like results.extend(check_job_time(...))?

tron/bin/check_job_exceeding_time.py Outdated

Member

nemacysts Nov 17, 2023

super-nit: should this be something like get_jobs_exceeding_runtime? usually check_* functions are intended to be used as monitoring checks and will return non-zero if things are failing checks (versus the current behavior here, which is just printing)

tron/bin/check_job_exceeding_time.py Outdated



		def check_if_time_exceeded(job_runs, job_expected_runtime, result):
		states_to_check = {"queued", "scheduled", "cancelled", "skipped"}

Member

nemacysts Nov 17, 2023

nit: we repeat this - might be worth extracting this to a named constant

tron/bin/check_job_exceeding_time.py Outdated

Comment on lines 48 to 51

+                      and job_run.get(
+                          "state",
+                          "unknown",
+                      )

Member

nemacysts Nov 17, 2023

do we need to do this check in both places? looks like this check is also being done by the function that calls this

Contributor Author

EmanElsaban Nov 21, 2023

I don't think so, Idk why I initially thought the state might change so I added it in both places

tron/bin/check_job_exceeding_time.py Outdated

Comment on lines 55 to 56

		if duration_seconds and duration_seconds > job_expected_runtime:
		return True

Member

nemacysts Nov 17, 2023

nit:

Suggested change

      
                    if duration_seconds and duration_seconds > job_expected_runtime:
          
                        return True
          
                    return duration_seconds and duration_seconds > job_expected_runtime:

tron/bin/check_job_exceeding_time.py Outdated

Comment on lines 61 to 65

+                  job_runs = sorted(
+                      job.get("runs", []),
+                      key=lambda k: (k["end_time"] is None, k["end_time"], k["run_time"]),
+                      reverse=True,
+                  )

Member

nemacysts Nov 17, 2023

just curious: are we sorting so that the output is ordered later on? should we sort at printing time so that we only need to sort once if so?

tron/bin/check_job_exceeding_time.py Outdated

+                      job = client.job_runs(job_url)
+                      check_job_time(job=job, client=client, url_index=url_index, result=result)
+                  if result is None:

Member

nemacysts Nov 17, 2023

should this be if not result? we default result to [] - which is not None

EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch 2 times, most recently from 6ede8b6 to 0275846 Compare

November 21, 2023 22:18

EmanElsaban requested a review from nemacysts

November 21, 2023 22:41

EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch from 0275846 to 00287f2 Compare

November 21, 2023 22:51


          A quick script to check if a job took longer than x time to run and u…

9dcc4cb

…pdate python in precommit

EmanElsaban force-pushed the u/emanelsabban/quick-script-to-check-jobs-running branch from 00287f2 to 9dcc4cb Compare

November 21, 2023 22:57

nemacysts previously approved these changes

View reviewed changes

Member

nemacysts left a comment

i've got some minor comments re: the types - but if i'm blocking you feel free to ignore them as they're not particularly serious and we can always come back and improve them :)

tron/bin/get_jobs_exceeding_runtime.py

		return args


		def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:

Member

nemacysts Nov 27, 2023

Suggested change

      
            def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:
          
            def check_if_time_exceeded(job_runs: List[Dict[str, Any]], job_expected_runtime: int) -> List[str]:

i'm sort of guessing about the job_runs typing here - that said, if we're just using this script for the current project it's probably fine to be a little lax with the types and just do the easy ones (e.g.,

Suggested change

      
            def check_if_time_exceeded(job_runs, job_expected_runtime) -> list:
          
            def check_if_time_exceeded(job_runs: List, job_expected_runtime: int) -> List[str]:

tron/bin/get_jobs_exceeding_runtime.py

		return result


		def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool:

Member

nemacysts Nov 27, 2023

Suggested change

      
            def is_job_run_exceeding_expected_runtime(job_run, job_expected_runtime) -> bool:
          
            def is_job_run_exceeding_expected_runtime(job_run: Dict[str, Any], job_expected_runtime: int) -> bool:

tron/bin/get_jobs_exceeding_runtime.py

		return False


		def check_job_time(job, time_limit) -> list:

Member

nemacysts Nov 27, 2023

Suggested change

      
            def check_job_time(job, time_limit) -> list:
          
            def check_job_time(job: Dict[str, Any], time_limit: int) -> List[str]:

bobokun previously approved these changes

View reviewed changes

EmanElsaban added 2 commits

February 8, 2024 14:16


          Merge branch 'master' of github.com:Yelp/Tron into u/emanelsabban/qui…

4781c30

…ck-script-to-check-jobs-running


          remove language_version

3e4b319

Member

nemacysts commented May 10, 2024

@EmanElsaban is this still something we're interested in merging?

EmanElsaban dismissed stale reviews from bobokun and nemacysts via

3e4b319

November 25, 2024 16:55

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet