3.7.3.84

@PalNilsson PalNilsson released this 30 Apr 11:49
4b75cba
  • Replaced all usages of curl with Python's native urllib
    • Affects trace reports, PanDA server interactions (getJob, updateJob, getProxy, queuedata and transform downloads), and the prmon dictionary upload to https://pilot.atlas-ml.org/
    • This change is intended both to reduce the number of non-responsive pilots (especially following trace service uploads with curl) and to resolve EL9 complications (Tokyo)
  • Corrected CPU model reporting in the cpuconsumptionunit string on ARM
    • Previously, UNKNOWN was given as the CPU model name
    • Note: cache size is not reported in this case as it is not available in user space
    • Requested by I. Glushkov
  • Added GPU info from prmon to job metrics
    • E.g. “GPU_name=NVIDIA_A100-SXM4-40G nGPU=1”
    • To be used by monitoring
    • Requested by T. Korchuganova, T. Maeno
  • Now aborting the stage-in loop if the graceful stop bit has been set
    • Previously, the stage-in thread would keep running until finished, which unnecessarily delayed the termination of the pilot
    • The graceful stop bit gets set e.g. when an unexpected exception is thrown
  • Added intersect value from PSS+SWAP fit to job metrics
  • Now performing CVMFS checks at the beginning of the pilot
  • ATLAS
    • Active monitoring of remote file open verification script
      • Now able to abort (e.g.) if lsetup is taking too long
  • Support for event service with AthenaMT
  • Housekeeping
    • Additional pilot modules were processed with pylint etc.
  • Bug fixes
    • Rubin
      • Removed memory monitoring files from looping job algorithms (problem reported by W. Guan)
    • Corrected exception handling when socket.gethostbyaddr() is used
    • Corrected the replica sorting algorithm, which did not sort replicas according to read_lan
    • ATLAS
      • Correction for the case where the looping check had not run (and thus had not set any internal timings) before job suspension occurred
        • Led to problems with some BOINC jobs
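
For readers unfamiliar with the curl-to-urllib change listed above, here is a minimal sketch of a form-encoded POST with a timeout using only the standard library. It is not the pilot's actual code: the helper names `build_post_request` and `post_with_timeout`, the 20-second default timeout and the example URL are all assumptions made for illustration.

```python
import urllib.parse
import urllib.request


def build_post_request(url, fields):
    """Build a form-encoded POST request (hypothetical helper, not the pilot's API)."""
    body = urllib.parse.urlencode(fields).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


def post_with_timeout(url, fields, timeout=20):
    """Send the request; return the decoded body, or None on failure or timeout."""
    request = build_post_request(url, fields)
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:
        # URLError, HTTPError and socket timeouts are all subclasses of OSError
        print(f"request to {url} failed: {exc}")
        return None


# Example (hypothetical URL and field name):
# reply = post_with_timeout("https://example.invalid/update", {"state": "running"})
```

Passing an explicit `timeout` to `urlopen` bounds how long a single call can block, which is one way an in-process client can avoid hanging on an unresponsive service.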

Code contributions from P. Nilsson, O. Freyermuth, J. Esseiva
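
The stage-in abort described in the notes above can be illustrated with a small sketch. It assumes the graceful stop bit behaves like a `threading.Event` that is checked between transfers; the function `stage_in`, the `transfer` callback and the file names are hypothetical and do not reflect the pilot's real interfaces.

```python
import threading


def stage_in(files, graceful_stop, transfer):
    """Transfer files one by one, aborting as soon as graceful_stop is set."""
    done = []
    for name in files:
        if graceful_stop.is_set():
            # Stop before starting another transfer
            break
        transfer(name)
        done.append(name)
    return done


stop = threading.Event()


def fake_transfer(name):
    # Stand-in for a real copy; simulates a failure elsewhere in the
    # pilot that sets the stop bit while the second file is handled
    if name == "file2":
        stop.set()


staged = stage_in(["file1", "file2", "file3"], stop, fake_transfer)
print(staged)
```

In this sketch the loop finishes the transfer in progress and then exits, so the third file is never requested.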