Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Refactor paasta logs filtering for performance #3988

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

yaroliakh
Copy link
Contributor

This does two things.

  1. reorders paasta_*_passes_filter evaluation logic to be more resource efficient and fast:
  • running cProfile on a paasta logs that retrieves logs from a 5 minutes time range for a verbose stream:
Thu Nov 21 09:17:28 2024    profile_res1

         314520328 function calls (314346707 primitive calls) in 190.683 seconds

   Ordered by: internal time
   List reduced from 9965 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  6298604   28.013    0.000   43.563    0.000 isotime.py:73(parse_time)
  6298667   19.141    0.000   19.141    0.000 decoder.py:343(raw_decode)
    70692   15.875    0.000   15.875    0.000 {method 'read' of '_ssl._SSLSocket' objects}
  6298584    9.933    0.000  129.460    0.000 logs.py:311(paasta_app_output_passes_filter)
  6298604    9.566    0.000   15.773    0.000 isodates.py:119(parse_date)
 25208656    9.492    0.000    9.492    0.000 {method 'match' of 're.Pattern' objects}
  6298584    9.182    0.000   11.240    0.000 logs.py:254(check_timestamp_in_range)
  6298667    7.409    0.000   31.491    0.000 decoder.py:332(decode)
  6298604    7.006    0.000   68.744    0.000 isodatetime.py:16(parse_datetime)
   490464    5.683    0.000    5.683    0.000 {method 'decompress' of 'zlib.Decompress' objects}
 12597228    5.148    0.000    5.148    0.000 {method 'groupdict' of 're.Match' objects}
  6298585    4.937    0.000   53.965    0.000 readers.py:84(read_for_datetime)
  6298667    4.932    0.000   38.038    0.000 __init__.py:299(loads)
        1    4.158    4.158  189.258  189.258 logs.py:1235(print_logs_by_time)
   490784    3.342    0.000    3.342    0.000 {built-in method zlib.crc32}
  6298688    3.074    0.000   29.595    0.000 gzip.py:388(readline)
  6300436    2.423    0.000   38.955    0.000 {method 'readline' of '_io.BufferedReader' objects}
 25219401    2.291    0.000    2.291    0.000 {method 'replace' of 'str' objects}
  6298604    2.247    0.000    2.247    0.000 {method 'quantize' of 'decimal.Decimal' objects}
  6298692    1.878    0.000    2.685    0.000 _compression.py:12(_check_not_closed)
  • running the same command on update paasta logs.py:
Fri Nov 22 07:44:21 2024    profile_res2

         154490954 function calls (154307204 primitive calls) in 95.599 seconds

   Ordered by: internal time
   List reduced from 9954 to 20 due to restriction <20>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
  6298667   17.367    0.000   17.367    0.000 decoder.py:343(raw_decode)
    59552   16.302    0.000   16.302    0.000 {method 'read' of '_ssl._SSLSocket' objects}
  6298667    6.448    0.000   28.303    0.000 decoder.py:332(decode)
   490464    5.470    0.000    5.470    0.000 {method 'decompress' of 'zlib.Decompress' objects}
  6298585    4.394    0.000   50.692    0.000 readers.py:84(read_for_datetime)
  6298667    4.143    0.000   33.867    0.000 __init__.py:299(loads)
  6298584    4.009    0.000   38.426    0.000 logs.py:311(paasta_app_output_passes_filter)
        1    3.597    3.597   94.242   94.242 logs.py:1192(print_logs_by_time)
   490784    3.304    0.000    3.304    0.000 {built-in method zlib.crc32}
 12611560    2.791    0.000    2.791    0.000 {method 'match' of 're.Pattern' objects}
  6298688    2.626    0.000   26.326    0.000 gzip.py:388(readline)
  6300550    2.184    0.000   37.083    0.000 {method 'readline' of '_io.BufferedReader' objects}
  6298692    1.574    0.000    2.236    0.000 _compression.py:12(_check_not_closed)
   490590    1.484    0.000    6.897    0.000 response.py:535(read)
  6298585    1.397    0.000   52.089    0.000 readers.py:118(get_log_reader)
  6343872    1.247    0.000    1.247    0.000 {method 'decode' of 'bytes' objects}
 12597350    1.102    0.000    1.102    0.000 {method 'end' of 're.Match' objects}
11530811/11527085    1.081    0.000    1.132    0.000 {built-in method builtins.len}
   490676    0.906    0.000   18.212    0.000 gzip.py:454(read)
  6300779    0.893    0.000    0.893    0.000 {method 'strip' of 'str' objects}

As can be seen, new version is twice as fast than older one, mainly because of eliminating frequent and costly isotime, isodatecalls.

  1. Updating isodate to the latest as it should include a fix for https://github.com/gweis/isodate/issues/53. Also it would fix current paasta logs crashes from the same bug, e.g. at
    dt = isodate.parse_datetime(timestamp)
    where it's not properly handled.

@yaroliakh yaroliakh requested a review from a team as a code owner November 22, 2024 17:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant