diff --git a/search/search_index.json b/search/search_index.json index c362938ea..c52b97f31 100644 --- a/search/search_index.json +++ b/search/search_index.json @@ -1 +1 @@ -{"config":{"lang":["en"],"min_search_length":3,"prebuild_index":false,"separator":"[\\s\\-]+"},"docs":[{"location":"","text":"HTCondor-CE \u00b6 The HTCondor-CE software is a Compute Entrypoint (CE) based on HTCondor for sites that are part of a larger computing grid (e.g. European Grid Infrastructure , Open Science Grid ). As such, HTCondor-CE serves as a \"door\" for incoming resource allocation requests (RARs) \u2014 it handles authorization and delegation of these requests to a grid site's local batch system. Supported batch systems include Grid Engine , HTCondor , LSF , PBS Pro / Torque , and Slurm . For an introduction to HTCondor-CE, watch our recorded webinar from the EGI Community Webinar Programme: What is a Compute Entrypoint? \u00b6 A Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. These resource allocation requests are submitted as pilot jobs that create an environment for end-user jobs to match and ultimately run within the pilot job. CEs are made up of a thin layer of software that you install on a machine that already has the ability to submit and manage jobs in your local batch system. What is HTCondor-CE? \u00b6 HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint. It is configured to use the HTCondor Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. 
Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting ~16k concurrent RARs Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with RARs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit RARs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs Getting HTCondor-CE \u00b6 Learn how to get and install HTCondor-CE through our documentation . Contact Us \u00b6 HTCondor-CE is developed and maintained by the Center for High Throughput Computing . If you have questions or issues regarding HTCondor-CE, please see the HTCondor support page for how to contact us.","title":"Overview"},{"location":"#htcondor-ce","text":"The HTCondor-CE software is a Compute Entrypoint (CE) based on HTCondor for sites that are part of a larger computing grid (e.g. European Grid Infrastructure , Open Science Grid ). As such, HTCondor-CE serves as a \"door\" for incoming resource allocation requests (RARs) \u2014 it handles authorization and delegation of these requests to a grid site's local batch system. Supported batch systems include Grid Engine , HTCondor , LSF , PBS Pro / Torque , and Slurm . For an introduction to HTCondor-CE, watch our recorded webinar from the EGI Community Webinar Programme:","title":"HTCondor-CE"},{"location":"#what-is-a-compute-entrypoint","text":"A Compute Entrypoint (CE) is the door for remote organizations to submit requests to temporarily allocate local compute resources. These resource allocation requests are submitted as pilot jobs that create an environment for end-user jobs to match and ultimately run within the pilot job. 
CEs are made up of a thin layer of software that you install on a machine that already has the ability to submit and manage jobs in your local batch system.","title":"What is a Compute Entrypoint?"},{"location":"#what-is-htcondor-ce","text":"HTCondor-CE is a special configuration of the HTCondor software designed as a Compute Entrypoint. It is configured to use the HTCondor Job Router daemon to delegate resource allocation requests by transforming and submitting them to the site\u2019s batch system. Benefits of running the HTCondor-CE: Scalability: HTCondor-CE is capable of supporting ~16k concurrent RARs Debugging tools: HTCondor-CE offers many tools to help troubleshoot issues with RARs Routing as configuration: HTCondor-CE\u2019s mechanism to transform and submit RARs is customized via configuration variables, which means that customizations will persist across upgrades and will not involve modification of software internals to route jobs","title":"What is HTCondor-CE?"},{"location":"#getting-htcondor-ce","text":"Learn how to get and install HTCondor-CE through our documentation .","title":"Getting HTCondor-CE"},{"location":"#contact-us","text":"HTCondor-CE is developed and maintained by the Center for High Throughput Computing . If you have questions or issues regarding HTCondor-CE, please see the HTCondor support page for how to contact us.","title":"Contact Us"},{"location":"architecture/","text":"How Jobs Run \u00b6 Once an incoming pilot job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the Job Router creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). After submission, HTCondor-CE monitors the batch system job and communicates its status to the original pilot job, which in turn notifies the original submitter (e.g., job factory) of any updates. 
When the batch system job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter. On HTCondor batch systems \u00b6 For a site with an HTCondor batch system , the Job Router uses HTCondor protocols to place a transformed copy of the pilot job directly into the batch system\u2019s scheduler, meaning that the routed job is also the batch system job. Thus, there are three representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, file transfer is handled natively between the two sets of daemons by the underlying HTCondor software. If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when modifying any configuration. On other batch systems \u00b6 For non-HTCondor batch systems, the Job Router transforms the pilot job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. Thus, there are four representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID and the routed job\u2019s ID Non-HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files, so its \"spool\" directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes. 
Hosted CE over SSH \u00b6 The Hosted CE is designed to be an HTCondor-CE as a Service offered by a central grid operations team. Hosted CEs submit jobs to remote clusters over SSH, providing a simple starting point for opportunistic resource owners that want to start contributing to a computing grid with minimal effort. If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads. How the CE is Customized \u00b6 Aside from the [basic configuration] required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which Virtual Organizations (VOs) are allowed to run at your site: HTCondor-CE leverages HTCondor's built-in ability to authenticate incoming jobs based on their OAuth token credentials. How to filter and transform the pilot jobs to be run on your batch system: Filtering and transforming pilot jobs (e.g., setting site-specific attributes or resource limits) requires configuration of your site\u2019s job routes. For examples of common job routes, consult the job router configuration pages. How Security Works \u00b6 In the grid, security depends on a PKI infrastructure involving Certificate Authorities (CAs) that sign and issue certificates. When clients and hosts wish to communicate with each other, the identity of each party is confirmed by cross-checking their certificates with the signing CA and establishing trust. In its default configuration, HTCondor-CE supports token-based authentication and authorization of the remote submitter's credentials.","title":"Architecture"},{"location":"architecture/#how-jobs-run","text":"Once an incoming pilot job is authorized, it is placed into HTCondor-CE\u2019s scheduler where the Job Router creates a transformed copy (called the routed job ) and submits the copy to the batch system (called the batch system job ). 
After submission, HTCondor-CE monitors the batch system job and communicates its status to the original pilot job, which in turn notifies the original submitter (e.g., job factory) of any updates. When the batch system job completes, files are transferred along the same chain: from the batch system to the CE, then from the CE to the original submitter.","title":"How Jobs Run"},{"location":"architecture/#on-htcondor-batch-systems","text":"For a site with an HTCondor batch system , the Job Router uses HTCondor protocols to place a transformed copy of the pilot job directly into the batch system\u2019s scheduler, meaning that the routed job is also the batch system job. Thus, there are three representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID HTCondor batch system: the routed job\u2019s ID In an HTCondor-CE/HTCondor setup, file transfer is handled natively between the two sets of daemons by the underlying HTCondor software. If you are running HTCondor as your batch system, you will have two HTCondor configurations side-by-side (one residing in /etc/condor/ and the other in /etc/condor-ce ) and will need to make sure to differentiate the two when modifying any configuration.","title":"On HTCondor batch systems"},{"location":"architecture/#on-other-batch-systems","text":"For non-HTCondor batch systems, the Job Router transforms the pilot job into a routed job on the CE and the routed job submits a job into the batch system via a process called the BLAHP. 
Thus, there are four representations of your job, each with its own ID (see diagram below): Submitter: the HTCondor job ID in the original queue HTCondor-CE: the incoming pilot job\u2019s ID and the routed job\u2019s ID Non-HTCondor batch system: the batch system\u2019s job ID Although the following figure specifies the PBS case, it applies to all non-HTCondor batch systems: With non-HTCondor batch systems, HTCondor-CE cannot use internal HTCondor protocols to transfer files so its \"spool\" directory must be exported to a shared file system that is mounted on the batch system\u2019s worker nodes.","title":"On other batch systems"},{"location":"architecture/#hosted-ce-over-ssh","text":"The Hosted CE is designed to be an HTCondor-CE as a Service offered by a central grid operations team. Hosted CEs submit jobs to remote clusters over SSH, providing a simple starting point for opportunistic resource owners that want to start contributing to a computing grid with minimal effort. If your site intends to run over 10,000 concurrent pilot jobs, you will need to host your own HTCondor-CE because the Hosted CE has not yet been optimized for such loads.","title":"Hosted CE over SSH"},{"location":"architecture/#how-the-ce-is-customized","text":"Aside from the [basic configuration] required in the CE installation, there are two main ways to customize your CE (if you decide any customization is required at all): Deciding which Virtual Organizations (VOs) are allowed to run at your site: HTCondor-CE leverages HTCondor's built-in ability to authenticate incoming jobs based on their OAuth token credentials. How to filter and transform the pilot jobs to be run on your batch system: Filtering and transforming pilot jobs (i.e., setting site-specific attributes or resource limits), requires configuration of your site\u2019s job routes. 
For examples of common job routes, consult the job router configuration pages.","title":"How the CE is Customized"},{"location":"architecture/#how-security-works","text":"In the grid, security depends on a PKI infrastructure involving Certificate Authorities (CAs) that sign and issue certificates. When clients and hosts wish to communicate with each other, the identity of each party is confirmed by cross-checking their certificates with the signing CA and establishing trust. In its default configuration, HTCondor-CE supports token-based authentication and authorization of the remote submitter's credentials.","title":"How Security Works"},{"location":"v5/operation/","text":"Operating an HTCondor-CE \u00b6 To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled, then perform the validation steps below. Managing HTCondor-CE services \u00b6 In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are: Software Service name Fetch CRL fetch-crl-boot and fetch-crl-cron Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable Validating HTCondor-CE \u00b6 To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . 
Verify that all the necessary daemons are running with condor_ce_status -any . Verify the CE's network configuration using condor_ce_host_network_check . Verify that jobs can complete successfully using condor_ce_trace . Draining an HTCondor-CE \u00b6 To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again. Checking User Authentication \u00b6 There are two primary authentication methods for submitting jobs to an HTCondor-CE: GSI (currently being phased out) and SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . If GSI authentication was used, you'll see a set of lines like this: 10/15/21 17:52:32 (cid:14) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:41045> 10/15/21 17:52:32 (cid:14) (D_AUDIT) AuthMethod=GSI, AuthId=/DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=People/CN=testuser, CondorId=testuser@users.htcondor.org 10/15/21 17:52:32 (cid:14) (D_AUDIT) Submitting new job 1.0 If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. Lines from different client requests may be interleaved. Getting Help \u00b6 If any of the above validation steps fail, consult the troubleshooting guide . 
If that still doesn't resolve your issue, please contact us for assistance.","title":"Operation"},{"location":"v5/operation/#operating-an-htcondor-ce","text":"To verify that you have a working installation of HTCondor-CE, ensure that all the relevant services are started and enabled, then perform the validation steps below.","title":"Operating an HTCondor-CE"},{"location":"v5/operation/#managing-htcondor-ce-services","text":"In addition to the HTCondor-CE job gateway service itself, there are a number of supporting services in your installation. The specific services are: Software Service name Fetch CRL fetch-crl-boot and fetch-crl-cron Your batch system condor or pbs_server or \u2026 HTCondor-CE condor-ce (Optional) APEL uploader condor-ce-apel and condor-ce-apel.timer Start and enable the services in the order listed and stop them in reverse order. As a reminder, here are common service commands (all run as root ): To... On EL7, run the command... Start a service systemctl start Stop a service systemctl stop Enable a service to start on boot systemctl enable Disable a service from starting on boot systemctl disable ","title":"Managing HTCondor-CE services"},{"location":"v5/operation/#validating-htcondor-ce","text":"To validate an HTCondor-CE, perform the following steps: Verify that local job submissions complete successfully from the CE host. For example, if you have a Slurm cluster, run sbatch from the CE and verify that it runs and completes with scontrol and sacct . Verify that all the necessary daemons are running with condor_ce_status -any . Verify the CE's network configuration using condor_ce_host_network_check . 
Verify that jobs can complete successfully using condor_ce_trace .","title":"Validating HTCondor-CE"},{"location":"v5/operation/#draining-an-htcondor-ce","text":"To drain an HTCondor-CE of jobs, perform the following steps: Set CONDORCE_MAX_JOBS = 0 in /etc/condor-ce/config.d Run condor_ce_reconfig to apply the configuration change Use condor_ce_rm as needed to stop and remove any jobs that should stop running Once draining is completed, don't forget to restore the value of CONDORCE_MAX_JOBS to its previous value before trying to operate the HTCondor-CE again.","title":"Draining an HTCondor-CE"},{"location":"v5/operation/#checking-user-authentication","text":"There are two primary authentication methods for submitting jobs to an HTCondor-CE: GSI (currently being phased out) and SciTokens. To see which authentication method and identity were used to submit a particular job (or modify existing jobs), you can look in /var/log/condor-ce/AuditLog . If GSI authentication was used, you'll see a set of lines like this: 10/15/21 17:52:32 (cid:14) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:41045> 10/15/21 17:52:32 (cid:14) (D_AUDIT) AuthMethod=GSI, AuthId=/DC=org/DC=opensciencegrid/C=US/O=OSG Software/OU=People/CN=testuser, CondorId=testuser@users.htcondor.org 10/15/21 17:52:32 (cid:14) (D_AUDIT) Submitting new job 1.0 If SciTokens authentication was used, you'll see a set of lines like this: 10/15/21 17:54:08 (cid:130) (D_AUDIT) Command=QMGMT_WRITE_CMD, peer=<172.17.0.2:37869> 10/15/21 17:54:08 (cid:130) (D_AUDIT) AuthMethod=SCITOKENS, AuthId=https://demo.scitokens.org,htcondor-ce-dev, CondorId=testuser@users.htcondor.org 10/15/21 17:54:08 (cid:130) (D_AUDIT) Submitting new job 2.0 Lines pertaining to the same client request will have the same cid value. 
Lines from different client requests may be interleaved.","title":"Checking User Authentication"},{"location":"v5/operation/#getting-help","text":"If any of the above validation steps fail, consult the troubleshooting guide . If that still doesn't resolve your issue, please contact us for assistance.","title":"Getting Help"},{"location":"v5/reference/","text":"Reference \u00b6 Configuration \u00b6 The following directories contain the configuration for HTCondor-CE. The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (i.e., 99-local.conf will override values in 01-ce-auth.conf ) For a detailed order of the way configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config Users \u00b6 The following users are needed by HTCondor-CE at all sites: User Comment condor The HTCondor-CE will be run as root, but perform most of its operations as the condor user. Certificates \u00b6 File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /etc/grid-security/hostkey.pem Networking \u00b6 Service Name Protocol Port Number Inbound Outbound Comment HTCondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connections to all internal site servers, such as the batch system head-node; only ephemeral outgoing ports are necessary.","title":"Reference"},{"location":"v5/reference/#reference","text":"","title":"Reference"},{"location":"v5/reference/#configuration","text":"The following directories contain the configuration for HTCondor-CE. 
The directories are parsed in the order presented and thus configuration within the final directory will override configuration specified in the previous directories. Location Comment /usr/share/condor-ce/config.d/ Configuration defaults (overwritten on package updates) /etc/condor-ce/config.d/ Files in this directory are parsed in alphanumeric order (i.e., 99-local.conf will override values in 01-ce-auth.conf ) For a detailed order of the way configuration files are parsed, run the following command: user@host $ condor_ce_config_val -config","title":"Configuration"},{"location":"v5/reference/#users","text":"The following users are needed by HTCondor-CE at all sites: User Comment condor The HTCondor-CE will be run as root, but perform most of its operations as the condor user.","title":"Users"},{"location":"v5/reference/#certificates","text":"File User that owns certificate Path to certificate Host certificate root /etc/grid-security/hostcert.pem Host key root /etc/grid-security/hostkey.pem","title":"Certificates"},{"location":"v5/reference/#networking","text":"Service Name Protocol Port Number Inbound Outbound Comment HTCondor-CE tcp 9619 X HTCondor-CE shared port Allow inbound and outbound network connections to all internal site servers, such as the batch system head-node; only ephemeral outgoing ports are necessary.","title":"Networking"},{"location":"v5/releases/","text":"Releases \u00b6 HTCondor-CE 5 is distributed via RPM and is available from the following Yum repositories: HTCondor stable and current channels Open Science Grid Known Issues \u00b6 Known bugs affecting HTCondor-CEs can be found in Jira. In particular, the following bugs are of note: C-style comments, e.g. /* comment */ , in JOB_ROUTER_ENTRIES will prevent the JobRouter from routing jobs ( HTCONDOR-864 ). For the time being, remove any comments if you are still using the deprecated syntax . 
Updating to HTCondor-CE 5 \u00b6 Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or new defaults that have replaced your customizations, respectively. To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -o -name '*.rpmsave' HTCondor-CE 5 is a major release that adds many features and overhauls the default configuration. As such, upgrades from older versions of HTCondor-CE may require manual intervention: Support for ClassAd transforms added to the JobRouter \u00b6 Transforms will override JOB_ROUTER_ENTRIES routes with the same name Even if you do not plan on immediately using the new syntax, it's important to note that route transforms will override JOB_ROUTER_ENTRIES routes with the same name. In other words, the route transform names returned by condor_ce_config_val -dump -v JOB_ROUTER_ROUTE_ should only appear in your list of used routes returned by condor_ce_config_val JOB_ROUTER_ROUTE_NAMES if you intend to use the new transform syntax. HTCondor-CE now includes default ClassAd transforms equivalent to its JOB_ROUTER_DEFAULTS , allowing administrators to write job routes using the transform syntax. The old syntax continues to be the default in HTCondor-CE 5. Writing routes in the new syntax provides many benefits, including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case statements. Additionally, it is now easier to include transforms that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. 
To use the new transform syntax: Disable use of JOB_ROUTER_ENTRIES by setting the following in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False Set JOB_ROUTER_ROUTE_<name> to a job route in the new transform syntax, where <name> is the name of the route that you'd like to be reflected in logs and tool output. Add the above to the list of routes in JOB_ROUTER_ROUTE_NAMES New condor_mapfile format and locations \u00b6 HTCondor-CE 5 splits its unified mapfile used for authentication into multiple files across multiple directories. Additionally, any regular expressions in the second field must be enclosed by / . To update your mappings to the new format and location, perform the following actions: Upon upgrade, your existing mapfile will be moved to /etc/condor-ce/condor_mapfile.rpmsave . Remove any of the following lines provided by default in the HTCondor-CE packaging: GSI (.*) GSS_ASSIST_GRIDMAP SSL \"[-.A-Za-z0-9/= ]*/CN=([-.A-Za-z0-9/= ]+)\" \\1@unmapped.htcondor.org CLAIMTOBE .* anonymous@claimtobe FS \"^(root|condor)$\" \\1@daemon.htcondor.org FS \"(.*)\" \\1 Copy the remaining contents of /etc/condor-ce/condor_mapfile.rpmsave to a file ending in *.conf in /etc/condor-ce/mapfiles.d/ . Note that files in this folder are parsed in lexicographic order. Update the second field of any existing mappings by enclosing any regular expressions in / , escaping any slashes with a backslash (e.g. \\/ ). 
Consider converting any GSI mappings into Perl Compatible Regular Expressions (PCRE) since the authenticated name of incoming proxies may contain additional VOMS FQANs in addition to the Distinguished Name (DN): <DN>,<FQAN>,<FQAN>,... For example, to accept a given DN with any VOMS attributes, the mapping should look like the following: GSI /^\\/DC=org\\/DC=cilogon\\/C=US\\/O=University of Wisconsin-Madison\\/CN=Brian Lin A226624,.*/ blin Alternatively, to accept any DN from the OSG VO: GSI /.*,\\/osg\\/Role=Pilot\\/Capability=.*/ osg Also consider converting SCITOKENS mappings to PCRE since the authenticated name of incoming tokens will contain the token issuer ( iss ) and any token subject ( sub ) fields: <issuer>,<subject> For example, to accept a token issued by the OSG VO with any subject, write the following mapping: SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg Specify certificate locations for token authentication \u00b6 HTCondor-CE 5 adds improved support for accepting pilot jobs submitted with bearer tokens (e.g., SciTokens or WLCG tokens). As part of the bearer token authentication, HTCondor-CE uses its host certificate to perform an SSL handshake with the client to establish trust with its token issuer. Consult the authentication documentation to configure certificate locations for token authentication. No longer set $HOME by default \u00b6 Older versions of HTCondor-CE set $HOME in the routed job to the user's $HOME directory on the HTCondor-CE. To re-enable this behavior, set USE_CE_HOME_DIR = True in /etc/condor-ce/config.d/ . HTCondor-CE 5 Version History \u00b6 This section contains release notes for each version of HTCondor-CE 5. Full HTCondor-CE version history can be found on GitHub . 
5.1.6 \u00b6 This release includes the following changes: HTCondor-CE now uses the C++ Collector plugin for payload job traceability Fix HTCondor-CE mapfiles to be compliant with PCRE2 and HTCondor 9.10.0+ Add support for multiple APEL accounting scaling factors Suppress spurious log message about a missing negotiator Fix crash in HTCondor-CE View 5.1.5 \u00b6 This release includes the following changes: Rename AuthToken attributes in the routed job to better support accounting Prevent GSI environment from pointing the job to the wrong certificates Fix issue where HTCondor-CE would need port 9618 open to start up 5.1.4 \u00b6 This release includes the following changes: Fix whole node job glidein CPUs and GPUs expressions that caused held jobs Fix bug where default CERequirements were being ignored Pass whole node request from GlideinWMS to the batch system Since CentOS 8 has reached end of life, we build and test on Rocky Linux 8 5.1.3 \u00b6 This release includes the following changes: The HTCondor-CE central collector requires SSL credentials from client CEs Fix BDII crash if an HTCondor Access Point is not available Fix formatting of APEL records that contain huge values HTCondor-CE client mapfiles are not installed on the central collector 5.1.2 \u00b6 This release includes the following changes: Fixed the default memory and CPU requests when using job router transforms Apply default MaxJobs and MaxJobsIdle when using job router transforms Improved SciTokens support in submission tools Fixed --debug flag in condor_ce_run Update configuration verification script to handle job router transforms Corrected ownership of the HTCondor PER_JOBS_HISTORY_DIR Fix bug passing maximum wall time requests to the local batch system 5.1.1 \u00b6 This release includes the following changes: Improve restart time of HTCondor-CE View ( HTCONDOR-420 ) Fix bug that caused HTCondor-CE to ignore incoming BatchRuntime requests (#480) Fixed error that occurred during RPM installation of 
non-HTCondor batch systems regarding missing file batch_gahp ( HTCONDOR-504 ) 5.1.0 \u00b6 This release includes the following new features: Add support for ClassAd transforms to the JobRouter ( HTCONDOR-243 ) Add mapped user and X.509 attribute to local HTCondor pool AccountingGroup mappings to Job Routers configured to use the ClassAd transform syntax ( HTCONDOR-187 ) Split condor_mapfile into files that use regular expressions in /etc/condor-ce/mapfiles.d/*.conf ( HTCONDOR-244 ) Accept BatchRuntime attributes from incoming jobs to set their maximum walltime ( HTCONDOR-80 ) Update the HTCondor-CE registry to Python 3 ( HTCONDOR-307 ) Enable SSL authentication by default for READ / WRITE authorization levels ( HTCONDOR-366 ) APEL reporting scripts now use history files in the local HTCondor PER_JOB_HISTORY_DIR to collect job data. ( HTCONDOR_293 ) Use the GlobalJobID attribute as the APEL record lrmsID ( #426 ) Downgrade errors in the configuration verification startup script to support routes written in the transform syntax ( #465 ) Allow required directories to be owned by non- condor groups ( #451 ) This release also includes the following bug-fixes: Fix an issue with an overly aggressive default SYSTEM_PERIODIC_REMOVE ( HTCONDOR-350 ) Fix incorrect path to Python 3 Collector plugin ( HTCONDOR-400 ) Fix faulty validation of JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ENTRIES in the startup script ( HTCONDOR-406 ) Fix various Python 3 incompatibilities ( #460 ) 5.0.0 \u00b6 This release includes the following new features: Python 3 and Enterprise Linux 8 support ( HTCONDOR_13 ) HTCondor-CE no longer sets $HOME in routed jobs by default ( HTCONDOR-176 ) Whole node jobs (local HTCondor batch systems only) now make use of GPUs ( HTCONDOR-103 ) HTCondor-CE Central Collectors now prefer GSI over SSL authentication ( HTCONDOR-237 ) HTCondor-CE registry now validates the value of submitted client codes ( HTCONDOR-241 ) Automatically remove CE jobs that exceed their 
maxWalltime (if defined) or the configuration value of ROUTED_JOB_MAX_TIME (default: 4320 min/72 hrs) This release also includes the following bug-fixes: Fix a circular configuration definition in the HTCondor-CE View that resulted in 100% CPU usage by the condor_gangliad daemon ( HTCONDOR-161 ) Getting Help \u00b6 If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Releases"},{"location":"v5/releases/#releases","text":"HTCondor-CE 5 is distributed via RPM and is available from the following Yum repositories: HTCondor stable and current channels Open Science Grid","title":"Releases"},{"location":"v5/releases/#known-issues","text":"Known bugs affecting HTCondor-CEs can be found in Jira . In particular, the following bugs are of note: C-style comments, e.g. /* comment */ , in JOB_ROUTER_ENTRIES will prevent the JobRouter from routing jobs ( HTCONDOR-864 ). For the time being, remove any comments if you are still using the deprecated syntax .","title":"Known Issues"},{"location":"v5/releases/#updating-to-htcondor-ce-5","text":"Finding relevant configuration changes When updating HTCondor-CE RPMs, .rpmnew and .rpmsave files may be created containing new defaults that you should merge or your customizations that have been replaced by new defaults, respectively. To find these files for HTCondor-CE, run the following command: root@host # find /etc/condor-ce/ -name '*.rpmnew' -o -name '*.rpmsave' HTCondor-CE 5 is a major release that adds many features and overhauls the default configuration.
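Note that find ANDs multiple -name tests by default, so the two patterns must be joined with -o to match either suffix. A minimal sketch of the difference, run in a throwaway scratch directory (with made-up file names) rather than the live /etc/condor-ce/:

```shell
# Scratch directory with one .rpmnew and one .rpmsave file.
tmp=$(mktemp -d)
touch "$tmp/10-defaults.conf.rpmnew" "$tmp/condor_mapfile.rpmsave"

# ANDed -name tests match nothing; OR'd tests (-o) find both files.
anded=$(find "$tmp" -name '*.rpmnew' -name '*.rpmsave' | wc -l)
ored=$(find "$tmp" -name '*.rpmnew' -o -name '*.rpmsave' | wc -l)
echo "ANDed matches: $anded, OR'd matches: $ored"

rm -rf "$tmp"
```

The file names here are only illustrative; any files ending in the two suffixes behave the same way.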
As such, upgrades from older versions of HTCondor-CE may require manual intervention:","title":"Updating to HTCondor-CE 5"},{"location":"v5/releases/#support-for-classad-transforms-added-to-the-jobrouter","text":"Transforms will override JOB_ROUTER_ENTRIES routes with the same name Even if you do not plan on immediately using the new syntax, it's important to note that route transforms will override JOB_ROUTER_ENTRIES routes with the same name. In other words, the route transform names returned by condor_ce_config_val -dump -v JOB_ROUTER_ROUTE_ should only appear in your list of used routes returned by condor_ce_config_val JOB_ROUTER_ROUTE_NAMES if you intend to use the new transform syntax. HTCondor-CE now includes default ClassAd transforms equivalent to its JOB_ROUTER_DEFAULTS , allowing administrators to write job routes using the transform syntax. The old syntax continues to be the default in HTCondor-CE 5. Writing routes in the new syntax provides many benefits including: Statements being evaluated in the order they are written Use of variables that are not included in the resultant job ad Use of simple case statements. Additionally, it is now easier to include transforms that should be evaluated before or after your routes by including transforms in the lists of JOB_ROUTER_PRE_ROUTE_TRANSFORM_NAMES and JOB_ROUTER_POST_ROUTE_TRANSFORM_NAMES , respectively. To use the new transform syntax: Disable use of JOB_ROUTER_ENTRIES by setting the following in /etc/condor-ce/config.d/ : JOB_ROUTER_USE_DEPRECATED_ROUTER_ENTRIES = False Set JOB_ROUTER_ROUTE_<name> to a job route in the new transform syntax, where <name> is the name of the route that you'd like to be reflected in logs and tool output.
Add the above to the list of routes in JOB_ROUTER_ROUTE_NAMES","title":"Support for ClassAd transforms added to the JobRouter"},{"location":"v5/releases/#new-condor_mapfile-format-and-locations","text":"HTCondor-CE 5 splits its unified authentication mapfile into multiple files across multiple directories. Additionally, any regular expressions in the second field must be enclosed by / . To update your mappings to the new format and location, perform the following actions: Upon upgrade, your existing mapfile will be moved to /etc/condor-ce/condor_mapfile.rpmsave . Remove any of the following lines provided by default in the HTCondor-CE packaging: GSI (.*) GSS_ASSIST_GRIDMAP SSL \"[-.A-Za-z0-9/= ]*/CN=([-.A-Za-z0-9/= ]+)\" \\1@unmapped.htcondor.org CLAIMTOBE .* anonymous@claimtobe FS \"^(root|condor)$\" \\1@daemon.htcondor.org FS \"(.*)\" \\1 Copy the remaining contents of /etc/condor-ce/condor_mapfile.rpmsave to a file ending in *.conf in /etc/condor-ce/mapfiles.d/ . Note that files in this folder are parsed in lexicographic order. Update the second field of any existing mappings by enclosing any regular expressions in / , escaping any slashes with a backslash (e.g. \\/ ).
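For illustration, converting one legacy mapping into the new format could look like the following sketch; the DN, the VO name examplevo, and the file name 50-local.conf are all hypothetical:

```
# Legacy condor_mapfile entry (regex in double quotes):
#   GSI "^/DC=org/DC=example/O=Example Lab/CN=pilot$" examplevo
# Equivalent entry in /etc/condor-ce/mapfiles.d/50-local.conf:
# regex enclosed in /.../ with each literal slash escaped as \/
GSI /^\/DC=org\/DC=example\/O=Example Lab\/CN=pilot$/ examplevo
```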
Consider converting any GSI mappings into Perl Compatible Regular Expressions (PCRE) since the authenticated name of incoming proxies may contain additional VOMS FQANs in addition to the Distinguished Name (DN): <DN>,<FQAN>,<FQAN>,...,<FQAN> For example, to accept a given DN with any VOMS attributes, the mapping should look like the following: GSI /^\\/DC=org\\/DC=cilogon\\/C=US\\/O=University of Wisconsin-Madison\\/CN=Brian Lin A226624,.*/ blin Alternatively, to accept any DN from the OSG VO: GSI /.*,\\/osg\\/Role=Pilot\\/Capability=.*/ osg Also consider converting SCITOKENS mappings to PCRE since the authenticated name of incoming tokens will contain the token issuer ( iss ) and any token subject ( sub ) fields: <iss>,<sub> For example, to accept a token issued by the OSG VO with any subject, write the following mapping: SCITOKENS /^https:\\/\\/scitokens.org\\/osg-connect,.*/ osg","title":"New condor_mapfile format and locations"},{"location":"v5/releases/#specify-certificate-locations-for-token-authentication","text":"HTCondor-CE 5 adds improved support for accepting pilot jobs submitted with bearer tokens (e.g., SciTokens or WLCG tokens). As part of the bearer token authentication, HTCondor-CE uses its host certificate to perform an SSL handshake with the client to establish trust with its token issuer. Consult the authentication documentation to configure certificate locations for token authentication.","title":"Specify certificate locations for token authentication"},{"location":"v5/releases/#no-longer-set-home-by-default","text":"Older versions of HTCondor-CE set $HOME in the routed job to the user's $HOME directory on the HTCondor-CE. To re-enable this behavior, set USE_CE_HOME_DIR = True in /etc/condor-ce/config.d/ .","title":"No longer set $HOME by default"},{"location":"v5/releases/#htcondor-ce-5-version-history","text":"This section contains release notes for each version of HTCondor-CE 5.
Full HTCondor-CE version history can be found on GitHub .","title":"HTCondor-CE 5 Version History"},{"location":"v5/releases/#516","text":"This release includes the following changes: HTCondor-CE now uses the C++ Collector plugin for payload job traceability Fix HTCondor-CE mapfiles to be compliant with PCRE2 and HTCondor 9.10.0+ Add support for multiple APEL accounting scaling factors Suppress spurious log message about a missing negotiator Fix crash in HTCondor-CE View","title":"5.1.6"},{"location":"v5/releases/#515","text":"This release includes the following changes: Rename AuthToken attributes in the routed job to better support accounting Prevent GSI environment from pointing the job to the wrong certificates Fix issue where HTCondor-CE would need port 9618 open to start up","title":"5.1.5"},{"location":"v5/releases/#514","text":"This release includes the following changes: Fix whole node job glidein CPUs and GPUs expressions that caused held jobs Fix bug where default CERequirements were being ignored Pass whole node request from GlideinWMS to the batch system Since CentOS 8 has reached end of life, we build and test on Rocky Linux 8","title":"5.1.4"},{"location":"v5/releases/#513","text":"This release includes the following changes: The HTCondor-CE central collector requires SSL credentials from client CEs Fix BDII crash if an HTCondor Access Point is not available Fix formatting of APEL records that contain huge values HTCondor-CE client mapfiles are not installed on the central collector","title":"5.1.3"},{"location":"v5/releases/#512","text":"This release includes the following changes: Fixed the default memory and CPU requests when using job router transforms Apply default MaxJobs and MaxJobsIdle when using job router transforms Improved SciTokens support in submission tools Fixed --debug flag in condor_ce_run Update configuration verification script to handle job router transforms Corrected ownership of the HTCondor PER_JOBS_HISTORY_DIR Fix bug passing 
maximum wall time requests to the local batch system","title":"5.1.2"},{"location":"v5/releases/#511","text":"This release includes the following changes: Improve restart time of HTCondor-CE View ( HTCONDOR-420 ) Fix bug that caused HTCondor-CE to ignore incoming BatchRuntime requests (#480) Fixed error that occurred during RPM installation of non-HTCondor batch systems regarding missing file batch_gahp ( HTCONDOR-504 )","title":"5.1.1"},{"location":"v5/releases/#510","text":"This release includes the following new features: Add support for ClassAd transforms to the JobRouter ( HTCONDOR-243 ) Add mapped user and X.509 attribute to local HTCondor pool AccountingGroup mappings to Job Routers configured to use the ClassAd transform syntax ( HTCONDOR-187 ) Split condor_mapfile into files that use regular expressions in /etc/condor-ce/mapfiles.d/*.conf ( HTCONDOR-244 ) Accept BatchRuntime attributes from incoming jobs to set their maximum walltime ( HTCONDOR-80 ) Update the HTCondor-CE registry to Python 3 ( HTCONDOR-307 ) Enable SSL authentication by default for READ / WRITE authorization levels ( HTCONDOR-366 ) APEL reporting scripts now use history files in the local HTCondor PER_JOB_HISTORY_DIR to collect job data. 
( HTCONDOR-293 ) Use the GlobalJobID attribute as the APEL record lrmsID ( #426 ) Downgrade errors in the configuration verification startup script to support routes written in the transform syntax ( #465 ) Allow required directories to be owned by non- condor groups ( #451 ) This release also includes the following bug-fixes: Fix an issue with an overly aggressive default SYSTEM_PERIODIC_REMOVE ( HTCONDOR-350 ) Fix incorrect path to Python 3 Collector plugin ( HTCONDOR-400 ) Fix faulty validation of JOB_ROUTER_ROUTE_NAMES and JOB_ROUTER_ENTRIES in the startup script ( HTCONDOR-406 ) Fix various Python 3 incompatibilities ( #460 )","title":"5.1.0"},{"location":"v5/releases/#500","text":"This release includes the following new features: Python 3 and Enterprise Linux 8 support ( HTCONDOR-13 ) HTCondor-CE no longer sets $HOME in routed jobs by default ( HTCONDOR-176 ) Whole node jobs (local HTCondor batch systems only) now make use of GPUs ( HTCONDOR-103 ) HTCondor-CE Central Collectors now prefer GSI over SSL authentication ( HTCONDOR-237 ) HTCondor-CE registry now validates the value of submitted client codes ( HTCONDOR-241 ) Automatically remove CE jobs that exceed their maxWalltime (if defined) or the configuration value of ROUTED_JOB_MAX_TIME (default: 4320 min/72 hrs) This release also includes the following bug-fixes: Fix a circular configuration definition in the HTCondor-CE View that resulted in 100% CPU usage by the condor_gangliad daemon ( HTCONDOR-161 )","title":"5.0.0"},{"location":"v5/releases/#getting-help","text":"If you have any questions about the release process or run into issues with an upgrade, please contact us for assistance.","title":"Getting Help"},{"location":"v5/remote-job-submission/","text":"Submitting Jobs Remotely to an HTCondor-CE \u00b6 This document outlines how to submit jobs to an HTCondor-CE from a remote client using two different methods: With dedicated tools for quickly verifying end-to-end job submission, and From an existing
HTCondor submit host, useful for developing pilot submission infrastructure If you are the administrator of an HTCondor-CE, consider verifying your HTCondor-CE using the administrator-focused documentation . Before Starting \u00b6 Before attempting to submit jobs to an HTCondor-CE as documented below, ensure the following: The HTCondor-CE administrator has independently verified their HTCondor-CE The HTCondor-CE administrator has added your credential information (e.g. SciToken or grid proxy) to the HTCondor-CE authentication configuration Your credentials are valid and unexpired Submission with Debugging Tools \u00b6 The HTCondor-CE client contains debugging tools designed to quickly test an HTCondor-CE. To use these tools, install the RPM package from the relevant Yum repository : root@host # yum install htcondor-ce-client Verify end-to-end submission \u00b6 The HTCondor-CE client package includes a debugging tool called condor_ce_trace that performs tests of end-to-end job submission. To submit a diagnostic job with condor_ce_trace , run the following command: user@host $ condor_ce_trace --debug <ce-hostname> Replacing <ce-hostname> with the hostname of the CE you wish to test. On success, you will see Job status: Completed and the job's environment on the worker node where it ran. If you do not see the expected output, refer to the troubleshooting guide . CONDOR_CE_TRACE_ATTEMPTS For a busy site cluster, it may take longer than the default 5 minutes to test end-to-end submission. To extend the length of time that condor_ce_trace waits for the job to complete, prepend the command with _condor_CONDOR_CE_TRACE_ATTEMPTS=