3.6.7.10

@PalNilsson PalNilsson released this 18 Sep 09:02
f903d9e
  • Improved reporting of CPU consumption time
    • I/O-bound payloads were seen to report an incorrect CPU consumption time. The pilot now makes sure that no zero values are reported
    • Reported by R. Walker
  • Migration to the psutil module has started
    • Until now, the pilot has relied on executing the ps command for process information, but this is heavy on the system when many ps commands are executed in a short time
    • A. De Silva has made the psutil module available via ALRB, and P. Love has set it up in the wrapper with ‘lsetup psutil’
      Currently, psutil is not required: the pilot falls back to other sources of process information if psutil fails to import. This will change soon
    • The pilot currently uses psutil only to check whether a given process is running, with a fallback to /proc/{pid}
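The optional-import pattern described above can be sketched roughly as follows. This is a minimal illustration, not the pilot's actual code: the helper name `is_process_running` is hypothetical, and the fallback assumes a Linux-style /proc filesystem.

```python
import os

try:
    import psutil  # optional dependency; may be missing on some nodes
except ImportError:
    psutil = None


def is_process_running(pid: int) -> bool:
    """Return True if a process with the given PID exists (hypothetical helper)."""
    if psutil is not None:
        # psutil.pid_exists() checks the PID without spawning a ps subprocess
        return psutil.pid_exists(pid)
    # Fallback (assumes Linux): every live process has a /proc/<pid> directory
    return os.path.isdir(f"/proc/{pid}")
```

For example, `is_process_running(os.getpid())` is true for the calling process on either code path.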
  • Added protection against a failed write of the info dictionary to disk before a server update
    • Curl normally reads this dictionary from disk; if the write fails, it should now be given the dictionary explicitly (converted to a string)
    • Previously, the pilot would fail to inform the server, so the job would end up as a lost heartbeat
    • The pilot might still fail before reaching this point, since it relies on the disk having free space
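One way to picture this protection is a write attempt that falls back to the in-memory string when the disk write fails. The sketch below is an assumption-laden illustration, not the pilot's implementation: the function name `prepare_payload`, the file name `info.json`, and the use of JSON as the string format are all hypothetical.

```python
import json
import os
from typing import Tuple


def prepare_payload(info: dict, directory: str) -> Tuple[str, str]:
    """Return ("file", path) if the dictionary was written, else ("inline", text).

    Hypothetical helper; the string format (JSON) is an assumption.
    """
    text = json.dumps(info)
    path = os.path.join(directory, "info.json")
    try:
        with open(path, "w", encoding="utf-8") as handle:
            handle.write(text)
    except OSError:
        # Writing failed (e.g. full disk or missing directory):
        # hand the caller the string so the server can still be updated
        return "inline", text
    return "file", path
```

The caller can then choose between pointing curl at the file and passing the string directly, so a failed write no longer prevents the server update.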
  • Moved the import of Google Cloud Logging to the beginning of the real-time logging module to prevent an unexplained problem seen in Rubin jobs
    • Previously, the module was imported only when it was needed, but for some reason this would occasionally cause Python to lock up
    • Requested by Z. Yang (Rubin)
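The change from a deferred import to a module-level import can be illustrated as below. This is only a sketch of the general pattern, assuming the google-cloud-logging package; the alias `cloud_logging` and the helper `cloud_logging_available` are hypothetical and not taken from the pilot code.

```python
# Import once, when the module is first loaded, rather than inside the
# function that first needs it.
try:
    import google.cloud.logging as cloud_logging
except ImportError:
    cloud_logging = None  # package not installed on this node


def cloud_logging_available() -> bool:
    """Report whether the optional package was imported (hypothetical helper)."""
    return cloud_logging is not None
```

With this layout the import cost is paid at startup, and later code only has to check `cloud_logging_available()` before using the module.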