3.8.1.66 #140

Merged 135 commits on Sep 9, 2024
Commits
ed223f1
New version
Jul 5, 2024
3cdbbc8
Updated and corrected logserver handling from pilot arguments
Jul 5, 2024
3e50cbf
Refactored collect_zombies() and moved recursion
Jul 8, 2024
f04f4b4
Pylint updates. Improved error handling
Jul 8, 2024
007a48c
Pylint updates.
Jul 8, 2024
2d83130
Pylint updates.
Jul 8, 2024
d1a42a7
Patch for unset resource type
Jul 8, 2024
537d383
Multi-job PUSH updates
Jul 10, 2024
c78f3c0
Version update after merge with special patch release
Jul 10, 2024
1bc1b25
Patches for complete state bug
Jul 11, 2024
d48e98e
Pylint updates
Jul 12, 2024
7e31347
Pylint updates
Jul 12, 2024
0b3dd12
Added minramcount
Jul 15, 2024
723e1ad
Added memkillgrace
Jul 16, 2024
215a351
Preliminary support for resource types dictionary
Jul 16, 2024
89edec0
Added function is_command_available. Added /usr/sbin path to ifconfig…
Jul 16, 2024
9b5713e
Added function is_command_available. Added /usr/sbin path to ifconfig…
Jul 16, 2024
51fd57a
Updated log message
Jul 16, 2024
1a57e88
Refactoring
Jul 16, 2024
76ac587
Preliminary support for OIDC token in new urllib request function
Jul 16, 2024
7192cf8
Updated comment
Jul 16, 2024
5228d03
Further refactoring
Jul 16, 2024
fb3a75c
Further refactoring
Jul 16, 2024
a78b248
Further refactoring
Jul 16, 2024
9717415
Corrected bad log message (pylint error)
Jul 16, 2024
7f359b3
Corrected bug
Jul 16, 2024
447a1c1
Removed unused functions
Jul 17, 2024
770cb27
Various errors and pylint updates
Jul 17, 2024
2cc0a76
Removed unused function that had a call to a non-existing function
Jul 17, 2024
179742a
Imports now in alphabetic order
Jul 17, 2024
163a23c
Pylint updates
Jul 17, 2024
d4012c8
Pylint updates
Jul 17, 2024
e26720f
Pylint updates
Jul 17, 2024
d35b6d9
Pylint updates
Jul 17, 2024
2f33163
Pylint updates
Jul 17, 2024
e90292e
Pylint updates
Jul 17, 2024
97f82b4
Pylint updates
Jul 18, 2024
60b8b7b
Pylint updates, removed traces errors
Jul 18, 2024
9108b5a
Pylint updates
Jul 18, 2024
21aed63
Pyright updates
Jul 18, 2024
87efcee
Cleanup
Jul 18, 2024
ef34c9b
Sending panda=True to request2() for getJob
Jul 18, 2024
bf437c4
Removed token from debug message
Jul 18, 2024
8b362d8
Update
Jul 18, 2024
ef69e33
Update
Jul 18, 2024
329b7b3
Pylint and type hints updates
Jul 19, 2024
772dbcb
Fixed NULL handling
Jul 19, 2024
258a09e
Pylint updates
Jul 19, 2024
8ba6fab
Pylint updates
Jul 19, 2024
f1fb41e
Pylint updates
Jul 19, 2024
2e013cd
Version update
Jul 19, 2024
b9cd43b
Pylint updates
Jul 19, 2024
b29b4b0
Update
Jul 22, 2024
fd95e71
Initial support for OIDC token downloads
Jul 23, 2024
2fe3bc8
Initial support for OIDC token downloads
Jul 23, 2024
7fd4ddb
Downloading OIDC token
PalNilsson Jul 23, 2024
845b52f
Pylint updates
Jul 24, 2024
19df23f
Token testing. Hiding token from header log message.
Jul 24, 2024
44068cc
Added the token key
Jul 24, 2024
24fbd1b
Added the token key
Jul 24, 2024
c2c1c8e
Added the token key
Jul 24, 2024
c1f545b
Added the token key
Jul 24, 2024
b5cb217
Added the client name
Jul 24, 2024
6085956
Added the client name
Jul 24, 2024
32665df
Updated comment
Jul 24, 2024
86f2517
Updated Request usage
Jul 24, 2024
c779a44
Updated headers
Jul 24, 2024
ff43a88
Converting bytes to string
Jul 24, 2024
1c0ba69
Updated token key
Jul 24, 2024
c9aea27
Debugging refreshed token
Jul 24, 2024
8d7378a
Debugging refreshed token
Jul 24, 2024
734cd07
Corrected server command
Jul 24, 2024
3bb112c
Now writing correct token to disk
Jul 24, 2024
37db775
Now writing correct token to disk
Jul 24, 2024
b09f10d
Now hiding token key as well. Some cleanup done as well
Jul 24, 2024
b7de6a8
Cleanup
Jul 24, 2024
62e6cd8
Using the final token refresh frequency of one hour
Jul 24, 2024
763016b
Cleanup
Jul 24, 2024
d9a434e
Cleanup
Jul 24, 2024
f528474
Now locating panda token key
Jul 25, 2024
80dbb0e
Now locating panda token key
Jul 29, 2024
ca9ccd4
Updated log message
Jul 29, 2024
5586ea4
Updated log message
Jul 29, 2024
01b005c
Unsetting OIDC_REFRESHED_AUTH_TOKEN in user environment
Jul 30, 2024
6bf9ba9
Added is_kubernetes_resource()
Jul 31, 2024
444318a
Added PREEMPTION error code, used instead of SIGTERM on Kubernetes re…
Jul 31, 2024
b8056fb
Pylint updates
Jul 31, 2024
ba96083
Pylint updates
Jul 31, 2024
f9198d2
Pylint updates
Jul 31, 2024
eb41a98
Pylint updates
Jul 31, 2024
ab9ebdc
Pylint updates
Jul 31, 2024
5413a89
Updated version
Jul 31, 2024
31133c8
Fixed problem with process. Some cleanup
Aug 1, 2024
8c56ca7
Cleaned up proxy checks
Aug 1, 2024
f1f4f69
Ignoring arcproxy lib failure. Some refactoring.
Aug 2, 2024
23e0edf
Removed useless variables
Aug 13, 2024
ba38d1f
Updated log message
Aug 13, 2024
d27609c
Removed useless variable
Aug 13, 2024
cecc342
Using EL9 container for remote file open
Aug 20, 2024
4077191
Test pre-commit
Aug 20, 2024
9a72b60
Commented out test code
Aug 20, 2024
1fd72b2
Added rename()
Aug 21, 2024
b99dd9d
Corrected port type in several places. Updating OIDC token + renaming…
Aug 21, 2024
838d413
Commented out dev code for proxy testing
Aug 22, 2024
4c164b2
Improved get_valid_base_urls()
Aug 22, 2024
105aaf3
Now reading base URLs from file
Aug 22, 2024
a78036a
Updated version number
Aug 22, 2024
4c9f83e
Updated version number
Aug 22, 2024
aa46361
Can now receive altStageOut from job definition
Aug 23, 2024
5404fc1
Added preliminary support for altStageOut
Aug 23, 2024
ff3518a
Added timer on gdb command
Aug 27, 2024
c155c91
Skipping useless file open
Aug 27, 2024
46b7b10
Corrected base url checks
Aug 27, 2024
4901f16
Added HOME to search location for tokens
PalNilsson Aug 27, 2024
fb3e28d
Skipping setting RUCIO_ACCOUNT for payload
PalNilsson Aug 27, 2024
55c4392
Merge remote-tracking branch 'upstream/next' into next
Aug 28, 2024
117b872
Merged with Wen's PR
Aug 28, 2024
21b7a0b
Merge remote-tracking branch 'origin/next' into next
Aug 28, 2024
cf781b6
Updated version
Aug 29, 2024
412114a
Updated execute_command_with_timeout()
Aug 29, 2024
f48c5f3
Updated execute_command_with_timeout()
Aug 29, 2024
23a8dfa
Unset of RUCIO_ACCOUNT
Aug 30, 2024
5907743
Support for site wide real-time logging activation
Aug 30, 2024
b1b0366
Merge remote-tracking branch 'origin/next' into next
Aug 30, 2024
6488127
Updated version
Aug 30, 2024
37c6919
Updated comment
PalNilsson Sep 3, 2024
e93f060
Setting job in debug mode if necessary
Sep 3, 2024
280791f
Merge remote-tracking branch 'origin/next' into next
Sep 3, 2024
ab2a03d
Added debug info
Sep 4, 2024
770fade
Added timeout to urlopen(), ten seconds
Sep 5, 2024
7006660
Increased timeout from 10 to 30s
Sep 5, 2024
fd5fa01
Some cleanup
Sep 6, 2024
265f8ef
Added option to disable updateWorkerPilotStatus calls
Sep 6, 2024
7efa6ea
Corrected loggingfile
Sep 6, 2024
9f88586
Corrected prmon setup
Sep 9, 2024
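Two of the commits above add a timeout to urlopen() and then raise it from 10 to 30 seconds. The following is a minimal sketch of that pattern, not the pilot's actual request code: `fetch()`, `DEFAULT_TIMEOUT` and the injectable `opener` argument are hypothetical names introduced here for illustration.

```python
import urllib.request
from typing import Callable, Optional

# assumption: 30 s mirrors the "Increased timeout from 10 to 30s" commit
DEFAULT_TIMEOUT = 30


def fetch(url: str, timeout: float = DEFAULT_TIMEOUT,
          opener: Callable = urllib.request.urlopen) -> Optional[str]:
    """Return the decoded response body, or None if the request fails or times out."""
    request = urllib.request.Request(url)
    try:
        # without timeout=..., urlopen() falls back to the global socket default,
        # which is normally "no timeout" and can block indefinitely
        with opener(request, timeout=timeout) as response:
            return response.read().decode("utf-8")
    except OSError as exc:
        # URLError, socket.timeout and TimeoutError are all OSError subclasses
        print(f"request to {url} failed: {exc}")
        return None
```

The `opener` parameter exists only so the function can be exercised without network access; a caller would normally leave it at its default.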
2 changes: 1 addition & 1 deletion PILOTVERSION
@@ -1 +1 @@
-3.7.9.1
+3.8.1.66
3 changes: 1 addition & 2 deletions doc/components/info/index.rst
@@ -7,7 +7,7 @@
 http://www.apache.org/licenses/LICENSE-2.0

 Authors:
-- Paul Nilsson, [email protected], 2018
+- Paul Nilsson, [email protected], 2018-24

 info components
 ===============
@@ -23,6 +23,5 @@ info components
 infoservice
 jobdata
 jobinfo
-jobinfoservice
 queuedata
 storagedata
19 changes: 0 additions & 19 deletions doc/components/info/jobinfoservice.rst

This file was deleted.

3 changes: 1 addition & 2 deletions doc/components/resource/index.rst
@@ -7,7 +7,7 @@
 http://www.apache.org/licenses/LICENSE-2.0

 Authors:
-- Paul Nilsson, [email protected], 2018-2019
+- Paul Nilsson, [email protected], 2018-24

 resource components
 ===================
@@ -19,5 +19,4 @@ resource components
 bnl
 generic
 nersc
-summit
 titan
19 changes: 0 additions & 19 deletions doc/components/resource/summit.rst

This file was deleted.

54 changes: 41 additions & 13 deletions pilot.py
@@ -17,9 +17,9 @@
 # under the License.
 #
 # Authors:
-# - Mario Lassnig, [email protected], 2016-2017
+# - Mario Lassnig, [email protected], 2016-17
 # - Daniel Drizhuk, [email protected], 2017
-# - Paul Nilsson, [email protected], 2017-2024
+# - Paul Nilsson, [email protected], 2017-24

 """This is the entry point for the PanDA Pilot, executed with 'python3 pilot.py <args>'."""

@@ -39,29 +39,30 @@
 from pilot.common.exception import PilotException
 from pilot.info import infosys
 from pilot.util.auxiliary import (
+convert_signal_to_exit_code,
 pilot_version_banner,
 shell_exit_code,
-convert_signal_to_exit_code
 )
 from pilot.util.config import config
 from pilot.util.constants import (
 get_pilot_version,
-SUCCESS,
-FAILURE,
 ERRNO_NOJOBS,
-PILOT_START_TIME,
+FAILURE,
 PILOT_END_TIME,
-SERVER_UPDATE_NOT_DONE,
 PILOT_MULTIJOB_START_TIME,
+PILOT_START_TIME,
+SERVER_UPDATE_NOT_DONE,
+SUCCESS,
 )
 from pilot.util.cvmfs import (
 cvmfs_diagnostics,
+get_last_update,
 is_cvmfs_available,
-get_last_update
 )
 from pilot.util.filehandling import (
 get_pilot_work_dir,
 mkdirs,
+store_base_urls
 )
 from pilot.util.harvester import (
 is_harvester_mode,
@@ -72,6 +73,7 @@
 get_panda_server,
 https_setup,
 send_update,
+update_local_oidc_token_info
 )
 from pilot.util.loggingsupport import establish_logging
 from pilot.util.networking import dump_ipv6_info
@@ -116,8 +118,11 @@ def main() -> int:
 https_setup(args, get_pilot_version())
 args.amq = None

+# update the OIDC token if necessary
+update_local_oidc_token_info(args.url, args.port)
+
 # let the server know that the worker has started
-if args.update_server:
+if args.update_server and args.workerpilotstatusupdate:
 send_worker_status(
 "started", args.queue, args.url, args.port, logger, "IPv6"
 ) # note: assuming IPv6, fallback in place
@@ -160,6 +165,9 @@ def main() -> int:
 )
 logger.debug(f'PILOT_RUCIO_SITENAME={os.environ.get("PILOT_RUCIO_SITENAME")}')

+#os.environ['RUCIO_ACCOUNT'] = 'atlpilo1'
+#logger.warning(f"enforcing RUCIO_ACCOUNT={os.environ.get('RUCIO_ACCOUNT')}")
+
 # store the site name as set with a pilot option
 environ[
 "PILOT_SITENAME"
@@ -171,6 +179,8 @@ def main() -> int:
 f"pilot.workflow.{args.workflow}", globals(), locals(), [args.workflow], 0
 )

+# check if real-time logging is requested for this queue
+#rtloggingtype
 # update the pilot heartbeat file
 update_pilot_heartbeat(time.time())

@@ -182,7 +192,7 @@ def main() -> int:
 exitcode = None

 # let the server know that the worker has finished
-if args.update_server:
+if args.update_server and args.workerpilotstatusupdate:
 send_worker_status(
 "finished",
 args.queue,
@@ -357,15 +367,20 @@ def get_args() -> Any:
 required=False, # From v 2.2.1 the site name is internally set
 help="OBSOLETE: site name (e.g., AGLT2_TEST)",
 )
-
-# graciously stop pilot process after hard limit
 arg_parser.add_argument(
 "-j",
 "--joblabel",
 dest="job_label",
 default="ptest",
 help="Job prod/source label (default: ptest)",
 )
+arg_parser.add_argument(
+"-g",
+"--baseurls",
+dest="baseurls",
+default="",
+help="Comma separated list of base URLs for validation of trf download",
+)

 # pilot version tag; PR or RC
 arg_parser.add_argument(
@@ -385,6 +400,15 @@ def get_args() -> Any:
 help="Disable server updates",
 )

+arg_parser.add_argument(
+"-k",
+"--noworkerpilotstatusupdate",
+dest="workerpilotstatusupdate",
+action="store_false",
+default=True,
+help="Disable updates to updateWorkerPilotStatus",
+)
+
 arg_parser.add_argument(
 "-t",
 "--noproxyverification",
@@ -842,7 +866,7 @@ def send_worker_status(
 port: str,
 logger: Any,
 internet_protocol_version: str,
-) -> None:
+):
 """
 Send worker info to the server to let it know that the worker has started.

@@ -956,6 +980,10 @@ def list_zombies():
 # set environment variables (to be replaced with singleton implementation)
 set_environment_variables()

+# store base URLs in a file if set
+if args.baseurls:
+store_base_urls(args.baseurls)
+
 # execute main function
 trace = main()

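A minimal sketch of how the new `-k` switch gates the `send_worker_status()` calls in the pilot.py diff above. The `-z`/`--noserverupdate` spelling for the pre-existing "Disable server updates" option is an assumption (the diff shows only its help text), and `build_parser()` and `should_send_worker_status()` are hypothetical helpers introduced here, not part of the PR.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Build a parser containing only the two 'disable' switches."""
    parser = argparse.ArgumentParser()
    # assumption: flag spelling for the existing "Disable server updates" option
    parser.add_argument("-z", "--noserverupdate", dest="update_server",
                        action="store_false", default=True)
    # as added in this PR: store_false flips the default True to False when -k is given
    parser.add_argument("-k", "--noworkerpilotstatusupdate",
                        dest="workerpilotstatusupdate",
                        action="store_false", default=True,
                        help="Disable updates to updateWorkerPilotStatus")
    return parser


def should_send_worker_status(args: argparse.Namespace) -> bool:
    """Mirror the condition used around send_worker_status() in main()."""
    return args.update_server and args.workerpilotstatusupdate
```

With no switches both attributes default to True, so worker status updates are sent; passing either switch suppresses them, while `-k` alone leaves other server updates untouched.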
14 changes: 11 additions & 3 deletions pilot/api/data.py
@@ -80,8 +80,15 @@ class StagingClient:
 # list of allowed schemas to be used for transfers from REMOTE sites
 remoteinput_allowed_schemas = ['root', 'gsiftp', 'dcap', 'srm', 'storm', 'https']

-def __init__(self, infosys_instance: Any = None, acopytools: dict = None, logger: Any = None,
-default_copytools: str = 'rucio', trace_report: dict = None, ipv: str = 'IPv6', workdir: str = ""):
+def __init__(self,
+infosys_instance: Any = None,
+acopytools: dict = None,
+logger: Any = None,
+default_copytools: str = 'rucio',
+trace_report: dict = None,
+ipv: str = 'IPv6',
+workdir: str = "",
+altstageout: str = None):
 """
 Set default/init values.

@@ -106,6 +113,7 @@ def __init__(self, infosys_instance: Any = None, acopytools: dict = None, logger
 self.infosys = infosys_instance or infosys
 self.ipv = ipv
 self.workdir = workdir
+self.altstageout = altstageout

 if isinstance(acopytools, str):
 acopytools = {'default': [acopytools]} if acopytools else {}
@@ -221,7 +229,7 @@ def print_replicas(self, replicas: list, label: str = 'unsorted'):
 """
 number = 1
 maxnumber = 10
-self.logger.info(f'{label} list of replicas: (max {maxnumber})')
+self.logger.debug(f'{label} list of replicas: (max {maxnumber})')
 for pfn, xdat in replicas:
 self.logger.debug(f"{number}. "
 f"lfn={pfn}, "
6 changes: 6 additions & 0 deletions pilot/common/errorcodes.py
@@ -179,6 +182,9 @@ class ErrorCodes:
 LOGCREATIONTIMEOUT = 1376
 CVMFSISNOTALIVE = 1377
 LSETUPTIMEDOUT = 1378
+PREEMPTION = 1379
+ARCPROXYFAILURE = 1380
+ARCPROXYLIBFAILURE = 1381

 _error_messages = {
 GENERALERROR: "General pilot error, consult batch log",
@@ -320,6 +323,9 @@ class ErrorCodes:
 LOGCREATIONTIMEOUT: "Log file creation timed out",
 CVMFSISNOTALIVE: "CVMFS is not responding",
 LSETUPTIMEDOUT: "Lsetup command timed out during remote file open",
+PREEMPTION: "Job was preempted",
+ARCPROXYFAILURE: "General arcproxy failure",
+ARCPROXYLIBFAILURE: "Arcproxy failure while loading shared libraries",
 }

 put_error_codes = [1135, 1136, 1137, 1141, 1152, 1181]
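The errorcodes.py diff above pairs each new numeric code with a message in `_error_messages`. The sketch below shows that pairing in isolation, using only the three codes and messages added in this PR. The class name `ErrorCodesSketch` and the `describe()` lookup method are hypothetical and stand in for whatever accessor the real `ErrorCodes` class provides.

```python
class ErrorCodesSketch:
    """Cut-down stand-in for ErrorCodes, limited to the codes added in this PR."""

    PREEMPTION = 1379
    ARCPROXYFAILURE = 1380
    ARCPROXYLIBFAILURE = 1381

    # class attributes defined above are visible by bare name inside the class body
    _error_messages = {
        PREEMPTION: "Job was preempted",
        ARCPROXYFAILURE: "General arcproxy failure",
        ARCPROXYLIBFAILURE: "Arcproxy failure while loading shared libraries",
    }

    @classmethod
    def describe(cls, code: int) -> str:
        """Hypothetical lookup: return the message for a code, or a fallback text."""
        return cls._error_messages.get(code, f"Unknown error code: {code}")
```

Keeping the constant and its message in the same class means a new code such as `PREEMPTION` needs exactly two additions, which is what the two hunks above show.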