
Utilities

Tadashi Maeno edited this page Sep 6, 2017 · 39 revisions

File grouping for file transfers

JobSpec provides a couple of functions which allow plugins to easily group input or output files and to keep track of status for each group. This plugin shows how those functions are used.

Methods

def get_input_file_attributes(self, skip_ready=False)

This method returns a dictionary of input file attributes. The key of the dictionary is the logical file name (LFN) of the input file and the value is a dictionary of file attributes (fsize, guid, checksum, scope, dataset, and endpoint). If skip_ready is set to True, files that are already in ready state are ignored. File status is described in the next section.

def set_groups_to_files(self, id_map)

This method sets group information on files. id_map is a dictionary of {identifier_string: {'lfns': [LFN, ...], 'groupStatus': status_string}}. Identifier_string is the identifier of the file group that contains the files listed in lfns, and can be an arbitrary string. Status_string can also be an arbitrary string, but a group is ignored in file->group lookups once its status_string is set to 'failed'.
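The id_map structure described above can be built with a few lines of plain Python. The following is a minimal sketch: the helper build_id_map, the 'transfer' prefix, and the 'pending' status are hypothetical names chosen for illustration, not part of Harvester.

```python
def build_id_map(lfns, chunk_size, prefix='transfer', status='pending'):
    """Hypothetical helper: split LFNs into groups of at most chunk_size files.

    Returns a dictionary in the shape that set_groups_to_files expects:
    {identifier_string: {'lfns': [LFN, ...], 'groupStatus': status_string}}
    """
    id_map = {}
    for start in range(0, len(lfns), chunk_size):
        # The identifier can be any string; here it is prefix plus chunk index.
        identifier = '{0}-{1}'.format(prefix, start // chunk_size)
        id_map[identifier] = {
            'lfns': lfns[start:start + chunk_size],
            'groupStatus': status,
        }
    return id_map


id_map = build_id_map(['a.root', 'b.root', 'c.root'], chunk_size=2)
# Inside a plugin, the map would then be handed to the job:
#     jobspec.set_groups_to_files(id_map)
```

In a real plugin the identifier would more likely be something meaningful to the transfer backend, such as a transfer request ID, so that the group status can be looked up later.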

def update_group_status_in_files(self, identifier_string, status_string)

This method updates status of the group. Status_string is explained in the set_groups_to_files method.

def get_groups_of_input_files(self, skip_ready=False)

This method returns the list of identifier strings for the groups of input files. If skip_ready is set to True, the method returns group identifiers only for input files that are not yet in ready state, which could be useful in the check_status method of preparator plugins.
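A status check built on these methods might look like the sketch below. It is an illustration under stated assumptions, not Harvester code: StubJobSpec is a stand-in that only mimics the two documented method names, query_transfer is a hypothetical callback that reports the state of a transfer, and the True/False/None return convention and the 'done' status string are choices made for this sketch.

```python
class StubJobSpec(object):
    """Stand-in exposing only two documented method names; not Harvester code."""

    def __init__(self, groups):
        # groups: {identifier_string: status_string}
        self.groups = dict(groups)

    def get_groups_of_input_files(self, skip_ready=False):
        # The stub ignores skip_ready; it has no file-level state to filter on.
        return list(self.groups)

    def update_group_status_in_files(self, identifier_string, status_string):
        self.groups[identifier_string] = status_string


def check_groups(jobspec, query_transfer):
    """Poll every group; return True (all done), False (failed), or None (running)."""
    for group_id in jobspec.get_groups_of_input_files(skip_ready=True):
        state = query_transfer(group_id)  # hypothetical backend lookup
        if state == 'failed':
            # 'failed' groups are ignored in later file->group lookups.
            jobspec.update_group_status_in_files(group_id, 'failed')
            return False
        if state != 'done':
            return None
        jobspec.update_group_status_in_files(group_id, 'done')
    return True
```

Passing skip_ready=True keeps the loop from polling groups whose files have already been transferred.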


Protection against double input file transfers

If multiple jobs are fetched and they use the same input files, the preparator triggers stage-in only for the first job and keeps the others on hold until the input files are successfully transferred. First, the file status is set to to_prepare for the first job and to preparing for the other jobs. Once the check_status method of the preparator plugin returns True for a job, the file status is changed to ready. When the file status of one of the other jobs changes from preparing, the file inherits the grouping information of the first job, which is explained in the section above.
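The status transitions described above can be illustrated with a toy model. None of the names below are Harvester APIs; the functions assign_initial_status and mark_ready are hypothetical and only mirror the to_prepare, preparing, and ready states named in the text.

```python
def assign_initial_status(jobs):
    """Toy model. jobs is a list of (job_id, [lfn, ...]) in fetch order.

    The first job that needs a file gets 'to_prepare' for it; later jobs
    that need the same file get 'preparing' and wait for the first one.
    Returns {(job_id, lfn): status}.
    """
    seen = set()
    status = {}
    for job_id, lfns in jobs:
        for lfn in lfns:
            if lfn in seen:
                status[(job_id, lfn)] = 'preparing'
            else:
                status[(job_id, lfn)] = 'to_prepare'
                seen.add(lfn)
    return status


def mark_ready(status, lfn):
    """Toy model: once the transfer of lfn succeeds, every job sees it as ready."""
    for key in status:
        if key[1] == lfn:
            status[key] = 'ready'


status = assign_initial_status([(1, ['a', 'b']), (2, ['b', 'c'])])
# Job 1 stages in 'a' and 'b'; job 2 waits for 'b' and stages in only 'c'.
mark_ready(status, 'b')
```

The model leaves out the inheritance of grouping information, which in Harvester happens when a file leaves the preparing state.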


Logging

The following parameters are available to optimize logging in etc/panda/panda_common.cfg.

| Name | Description |
| --- | --- |
| log_level | Logging level. See python doc. Can be CRITICAL, ERROR, WARNING, INFO, DEBUG (default), or NOTSET. |
| rotating_policy | Policy for log rotation. Can be time, size, or none. time: rotation at certain timed intervals, size: rotation at a predetermined size, none: no rotation (default). |
| rotating_backup_count | How many old log files should be saved. Effective unless rotating_policy=none. 1 by default. |
| rotating_max_size | Rotation happens when the file size (in MB) is about to be exceeded. Effective only when rotating_policy=size. 1024 by default. |
| rotating_interval | Rotation interval in hours. Effective only when rotating_policy=time. 24 by default. |
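
These parameters correspond closely to the handlers in Python's standard logging.handlers module. The sketch below shows one way such settings could be turned into a handler; it is not Harvester's implementation, which may differ. The section name logging_example and the helper make_handler are placeholders invented for this example, so check the shipped panda_common.cfg for the real section to edit.

```python
import configparser
import logging
import logging.handlers

# Example fragment; the section name is a placeholder, not the real one.
EXAMPLE_CFG = """
[logging_example]
log_level = INFO
rotating_policy = size
rotating_backup_count = 3
rotating_max_size = 512
"""


def make_handler(path, cfg_text, section='logging_example'):
    """Hypothetical helper: build a file handler from the documented parameters."""
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    opts = parser[section]
    policy = opts.get('rotating_policy', fallback='none')
    backups = opts.getint('rotating_backup_count', fallback=1)
    if policy == 'size':
        # rotating_max_size is given in MB
        max_mb = opts.getint('rotating_max_size', fallback=1024)
        handler = logging.handlers.RotatingFileHandler(
            path, maxBytes=max_mb * 1024 * 1024, backupCount=backups, delay=True)
    elif policy == 'time':
        # rotating_interval is given in hours
        hours = opts.getint('rotating_interval', fallback=24)
        handler = logging.handlers.TimedRotatingFileHandler(
            path, when='h', interval=hours, backupCount=backups, delay=True)
    else:
        handler = logging.FileHandler(path, delay=True)
    handler.setLevel(getattr(logging, opts.get('log_level', fallback='DEBUG')))
    return handler
```

With delay=True the log file is not opened until the first record is emitted, so constructing the handler has no side effects on disk.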

Profiling

Harvester supports statistic, deterministic, or thread-aware profiling. Statistic profiling is done with Python's standard profilers, while deterministic or thread-aware profiling is done with the pprofile package, which needs to be installed using pip:

$ pip install pprofile

Harvester is launched with a profiler if the --profiler_output option is given to master.py. The option specifies the filename where the results of the profiler are dumped. If profiling is in the deterministic or thread-aware mode and the filename starts with "cachegrind.out", the results are written in the callgrind profile format, which allows the file to be browsed with kcachegrind. If profiling is in the statistic mode, the dump file can be analyzed using Python's standard pstats package. Profiling is in the statistic mode by default, and the mode can be changed with the --profile_mode option: "d" for the deterministic mode and "t" for the thread-aware mode. Detailed explanations of the profiling modes can be found on the pprofile page. Note that the dump file is produced only when harvester is properly terminated with the USR2 or TERM signal, i.e.,

$ kill -USR2 `cat $PWD/tmp.pid`

or

$ kill `cat $PWD/tmp.pid`  
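
A dump file written in the statistic mode could be inspected with pstats roughly as follows. This is a minimal sketch: summarize_dump and make_sample_dump are hypothetical helpers, and the sample dump is produced with cProfile only so that the example runs without Harvester. With a real run, pass the filename given to --profiler_output instead.

```python
import cProfile
import os
import pstats
import tempfile


def summarize_dump(dump_path, top=10):
    """Print the top entries of a pstats-compatible dump, sorted by cumulative time."""
    stats = pstats.Stats(dump_path)
    stats.strip_dirs().sort_stats('cumulative').print_stats(top)
    return stats


def make_sample_dump():
    """Stand-in for a Harvester dump: profile a trivial workload and save it."""
    profiler = cProfile.Profile()
    profiler.enable()
    sum(i * i for i in range(10000))
    profiler.disable()
    fd, path = tempfile.mkstemp(suffix='.prof')
    os.close(fd)
    profiler.dump_stats(path)
    return path


sample_path = make_sample_dump()
sample_stats = summarize_dump(sample_path, top=5)
os.remove(sample_path)
```

Sorting by 'cumulative' highlights the call paths that account for most of the runtime; 'tottime' is the usual alternative when looking for expensive individual functions.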