Production Server Info
This page records information about the production Harvester servers at CERN.
The TWiki page about the central Harvester servers is here.
- CERN Openstack VM managed by CSOps
- OS: CC7
Production nodes use MySQL (MariaDB) as the database backend and uWSGI to run Python (click the links for details).
Requirements of UPS.
In harvester.cfg, the key resource_types.json should be contained in [cacher] data:
[cacher]
data =
    ...
    resource_types.json||panda_server:get_resource_types
    ...
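The requirement above can be checked mechanically. The following is a minimal sketch, not part of Harvester itself: it assumes each entry under [cacher] data sits on its own indented line and that the text before the first `|` is the key. The helper names `cacher_keys` and `has_resource_types` and the embedded sample are hypothetical; a real check would read the actual harvester.cfg.

```python
import configparser

# Hypothetical sample mirroring the fragment above; a real check would
# read the text of the actual harvester.cfg instead.
SAMPLE_CFG = """
[cacher]
data =
    resource_types.json||panda_server:get_resource_types
"""


def cacher_keys(cfg_text):
    """Return the keys listed in [cacher] data (text before the first '|')."""
    # Interpolation is disabled so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser.read_string(cfg_text)
    raw = parser.get("cacher", "data", fallback="")
    keys = []
    for line in raw.splitlines():
        line = line.strip()
        if line:
            keys.append(line.split("|", 1)[0])
    return keys


def has_resource_types(cfg_text):
    """True if resource_types.json is one of the cached data keys."""
    return "resource_types.json" in cacher_keys(cfg_text)


print(has_resource_types(SAMPLE_CFG))
```

The same function returns False both when the key is absent and when the [cacher] section is missing altogether, so it can serve as a simple pre-flight check.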
In the Harvester queue configuration file, the following lines are required in the queue object:
"runMode": "slave",
"mapType": "NoJob",
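As an illustration, the two settings above can be verified with a few lines of Python. This is only a sketch: the queue name `EXAMPLE_QUEUE` and the helper `missing_ups_settings` are hypothetical, and the sample assumes the queue configuration is a JSON object keyed by queue name.

```python
import json

# The two settings listed above.
REQUIRED = {"runMode": "slave", "mapType": "NoJob"}


def missing_ups_settings(queue_obj):
    """Return the required settings that are absent or have another value."""
    return {key: value for key, value in REQUIRED.items()
            if queue_obj.get(key) != value}


# Hypothetical queue configuration; EXAMPLE_QUEUE is not a real queue name.
SAMPLE = json.loads("""
{
    "EXAMPLE_QUEUE": {
        "runMode": "slave",
        "mapType": "NoJob"
    }
}
""")

for name, queue_obj in SAMPLE.items():
    problems = missing_ups_settings(queue_obj)
    print(name, "OK" if not problems else problems)
```

An empty result means the queue object carries both settings; otherwise the returned dictionary lists what still needs to be set.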
On AGIS, the PQ (PanDA queue) must have at least the following SchedConfig parameters:
Capability: ucore
catchall: ...,Pull,...
pilot manager: Harvester
Also see here for more info.
FIFO is currently used on production nodes for the monitor agent cycle and for the cache of condor_q.
See here for the setup and configuration of the global FIFO backend of a Harvester node.
On Harvester nodes sharing the same DB, we use the MySQL FIFO so that the FIFO is shared across the Harvester nodes as well. The Redis FIFO would also work, but CERN already provides a MySQL DB on demand service.
On Harvester nodes with a local DB (or a remote DB used by a single node only), we use the SQLite FIFO on a ramdisk for better performance.
First finish the Global Setup mentioned above.
Then see here for the setup and configuration of the monitor FIFO.
The cache of condor_q is enabled on production Harvester nodes to reduce the number of condor_q queries and the load on schedd nodes. The cache is implemented with the Harvester FIFO.
To enable the cache of condor_q, one needs to do the Global Setup of FIFO first.
Make sure HTCondor system is running well, of course.
Then, set up monitor plugin cache.
Done. HTCondor monitor plugin will work with cache in Harvester FIFO.
Currently, HTCondor 8.8.4 is installed on CERN production harvesters and schedd nodes.
The HTCondor Python binding 8.9.0 (or newer) is installed on CERN production harvesters with pip:
# pip install --upgrade htcondor==8.9.0
Note that the Python binding from the condor-all yum package does NOT work properly in Harvester, so the htcondor package from pip is necessary.
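To confirm which binding a given Python environment picks up, one can query the installed distribution metadata. The sketch below assumes Python 3.8 or newer (for `importlib.metadata`) and only looks at the distribution named htcondor as installed by pip; the helper names `version_tuple` and `binding_is_recent_enough` are hypothetical.

```python
import re
from importlib import metadata

# Minimum binding version mentioned above.
MINIMUM = (8, 9, 0)


def version_tuple(text):
    """Turn a version string such as '8.9.0' into a tuple of three integers."""
    parts = []
    for piece in text.split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    while len(parts) < 3:
        parts.append(0)  # pad short versions such as '8.9'
    return tuple(parts)


def binding_is_recent_enough(installed, minimum=MINIMUM):
    """True if the installed version is at least the minimum."""
    return version_tuple(installed) >= minimum


try:
    installed = metadata.version("htcondor")
except metadata.PackageNotFoundError:
    print("htcondor is not installed via pip in this Python environment")
else:
    print(installed, binding_is_recent_enough(installed))
```

This should be run with the same Python interpreter (or virtual environment) that runs Harvester, since the result depends on the environment.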
NGINX (openresty) runs on the production nodes as an HTTP gateway with token authentication in front of the Harvester apache messenger.
- Install openresty-1.13.6.2-1 or above with yum. The yum repo can be found here.
- Get the latest release (v1.0.1) of nginx-jwt from GitHub and untar it in a proper directory (more info):
  wget -P /opt https://github.com/auth0/nginx-jwt/releases/download/v1.0.1/nginx-jwt.tar.gz
  cd /opt/
  mkdir nginx-jwt
  tar -xf nginx-jwt.tar.gz -C nginx-jwt
- Make a secret file for the JWT token signature (it must be the same file that is configured as secretFile in the frontend section of harvester.cfg):
  ls -l /data/atlpan/harvester_jwt.secret
- Put the nginx configuration file in place and make the necessary modifications. The nginx configuration template can be found here.
  mv /usr/local/openresty/nginx/conf/nginx.conf{,.rpmsave}
  vim /usr/local/openresty/nginx/conf/nginx.conf
- Put the script nginx.service in place. An example script can be found here; modify the variables and paths in the script to fit your environment.
  ls -l /opt/nginx.service
  chmod a+x /opt/nginx.service
  /opt/nginx.service start
One can start, stop, or reload the nginx service via the following commands respectively:
/opt/nginx.service start
/opt/nginx.service stop
/opt/nginx.service reload
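The step about the JWT secret file does not say how the file is created. One possible way is sketched below, under stated assumptions: the secret is taken to be a random hex string, which may not be the format the frontend and nginx-jwt expect, and the helper `write_secret_file` is hypothetical. The demo writes to a temporary directory rather than to /data/atlpan/harvester_jwt.secret.

```python
import os
import secrets
import tempfile


def write_secret_file(path, n_bytes=32):
    """Create a new file holding a random hex secret, readable by its owner only.

    Assumption: a hex string is an acceptable secret format. The call fails
    if the file already exists, so an existing secret is never overwritten.
    """
    token = secrets.token_hex(n_bytes)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write(token)
    return token


# Demo in a temporary directory; the production path is not touched.
demo_dir = tempfile.mkdtemp()
demo_path = os.path.join(demo_dir, "harvester_jwt.secret")
demo_token = write_secret_file(demo_path)
print(oct(os.stat(demo_path).st_mode & 0o777))
```

Whatever method is used, the file ownership has to allow both Harvester and the nginx gateway to read the secret, since both sides need the same value.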
CERN CSOps already has a puppet module to build an instance as a central production Harvester server.
A Harvester instance from CSOps comes with almost all the installation steps already done. After getting the instance, one can skip the nginx installation steps above and only needs to run this script to initialize:
# /cephfs/atlpan/harvester/scripts/nginx-init.sh
If successful, the instance will run the nginx service bound to port 25443.
After that, one can ask CSOps to open the port to outside CERN.
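A quick way to confirm that something is listening on the port is a plain TCP connection attempt. The sketch below uses only the Python standard library; the helper `port_is_open` is hypothetical, and `localhost` is an assumption that should be replaced by the node's hostname when testing from another machine.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port can be established."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and name resolution failures.
        return False


# 25443 is the port mentioned above; "localhost" is an assumption.
print(port_is_open("localhost", 25443))
```

This only shows that the port accepts TCP connections. It does not validate the TLS setup or the token authentication.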