WMAgent: Implement all Oracle functionalities needed for the WMAgent initialization #11720

Closed
todor-ivanov opened this issue Sep 14, 2023 · 8 comments
@todor-ivanov
Contributor

todor-ivanov commented Sep 14, 2023

Impact of the new feature
WMAgent

Fixed by: dmwm/CMSKubernetes#1451

Is your feature request related to a problem? Please describe.
This issue is a followup on #11627 and is part of #11314
While working on the WMAgent initialization process, we focused mostly on separating the MariaDB database from the agent image. Once this was done, we started adding new functionality, such as:

  • Exporting the whole database schema once the agent has fully initialized
  • Checking the database schema upon agent restart and preventing the agent from running with a schema different from the one with which it was originally initialized
  • Preserving additional initialization information in a metadata table called wma_init
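To illustrate the schema check from the second bullet, here is a minimal sketch of one possible approach: fingerprint the schema dump taken at initialization and compare it with a fresh dump at restart. This is not the code used in the manage scripts; the function names `schema_fingerprint` and `schema_unchanged` are hypothetical, and the whitespace normalization is an assumption about what should count as "the same schema".

```python
import hashlib


def schema_fingerprint(schema_sql: str) -> str:
    """Return a SHA-256 digest of a schema dump (hypothetical helper).

    Blank lines are dropped and runs of whitespace are collapsed, so that
    purely cosmetic differences between two dumps do not change the digest.
    """
    normalized = "\n".join(
        " ".join(line.split())
        for line in schema_sql.splitlines()
        if line.strip()
    )
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def schema_unchanged(initial_sql: str, current_sql: str) -> bool:
    """True when both dumps describe the same schema after normalization."""
    return schema_fingerprint(initial_sql) == schema_fingerprint(current_sql)
```

With such a check, the agent could refuse to start whenever `schema_unchanged` returns False for the stored dump and the current one.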

The tests above are based on the new service model and are implemented mostly for MariaDB, while the Oracle implementation lags behind. This issue lists all functions that need an Oracle counterpart and tracks their development and testing.

Here follows the list:

  • At manage script:

    • init_wmagent
    • clean_oracle
    • clean_all (to just reference clean_oracle)
  • At manage-common.sh:

    • _status_of_oracle (only test is needed)
    • _sql_write_agentid
    • _sql_db_isclean
    • _sql_dbid_valid
    • _sql_dumpSchema
    • _exec_oracle

Describe the solution you'd like
The list of the above functions to be implemented

Describe alternatives you've considered
None. It is a must-do.

Additional context
Depends on:

Part of the following meta issue:

@todor-ivanov
Contributor Author

Just to log here where I stand with the progress on this issue:

Following our Oracle contact's advice from last week, I am currently testing how to fetch all user objects from the database and how to create a comparison view of the initial and current states:

select * from user_objects;
select * from user_tables;
select * from user_tab_cols;
select * from user_indexes;
select * from user_constraints;

Again following her advice, if we need to generate and compare the definitions of the objects:

SELECT DBMS_METADATA.get_ddl(object_Type, object_name, owner) FROM ALL_OBJECTS WHERE OWNER = 'OWNER_NAME';

or

SELECT DBMS_METADATA.GET_DDL('TABLE', TABLE_NAME) FROM USER_TABLES;
SELECT DBMS_METADATA.GET_DDL('INDEX', INDEX_NAME) FROM USER_INDEXES WHERE INDEX_TYPE ='NORMAL';
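Once the DDL of each object has been fetched with queries like the ones above, the comparison of the initial and current states could look roughly like the sketch below. It assumes each state has already been loaded into a dictionary mapping object name to DDL text; the function name `diff_schema` and that data layout are hypothetical, not part of the existing scripts.

```python
def diff_schema(initial: dict, current: dict) -> dict:
    """Compare two {object_name: ddl_text} snapshots of a schema.

    Returns the names of objects that were added, removed, or whose DDL
    text differs between the initial and the current snapshot.
    """
    initial_names = set(initial)
    current_names = set(current)
    return {
        "added": sorted(current_names - initial_names),
        "removed": sorted(initial_names - current_names),
        "changed": sorted(
            name
            for name in initial_names & current_names
            if initial[name] != current[name]
        ),
    }
```

An agent restart could then be allowed only when all three lists come back empty.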

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 5, 2024

And here is where we stand with the current issue:

  • I managed to develop and test all the Oracle functionality needed in the manage script
  • I still have to develop the few extra functions in manage-common.sh, which are used for status checks and a few other things.

FYI @amaltaro @khurtado - please read all 3 bullets of the long log printout below.

Here is the proof of concept:

  • Trying to run it on a non-empty Oracle database:
cmst1@vocms0290:wmagent $ ./wmagent-docker-run.sh -t 2.3.0 && docker logs -f wmagent 
Checking if there is no other wmagent container running and creating a link to the 2.3.0 in the host mount area.
Starting the wmagent:2.3.0 docker container with the following parameters:  -t 2.3.0
68f82687d3a889ad4b6284d2826a275c7cf1d499bc3a60ae68b226be8916a7e5
Start initialization

=======================================================
Starting WMAgent with the following initialisation data:
-------------------------------------------------------
 - WMAgent Version            : 2.3.0
 - WMAgent User               : cmst1
 - WMAgent Root path          : /data
 - WMAgent Host               : vocms0290.cern.ch
 - WMAgent TeamName           : testbed-vocms0290
 - WMAgent Number             : 0
 - WMAgent Relational DB type : oracle
 - Python  Version            : Python 3.8.16
 - Python  Module path        : /usr/local/lib/python3.8/site-packages
=======================================================

-------------------------------------------------------
Start: Performing basic_checks

Done: Performing basic_checks
-------------------------------------------------------

check_wmasecrets: Checking for changes in the WMAgent.secrets file
check_wmasecrets: No change found.
-------------------------------------------------------
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 271b91f725becb73bbc506845bc8c1c350c74694a5f60720172df7f3dc360c85
dockerInitId: /data/srv/wmagent/2.3.0/config/.initActive
/data/srv/wmagent/2.3.0/config/.initAdmin
/data/srv/wmagent/2.3.0/config/.initAgent
/data/srv/wmagent/2.3.0/config/.initConfig
/data/srv/wmagent/2.3.0/config/.initCouchDB
/data/srv/wmagent/2.3.0/config/.initResourceControl
/data/srv/wmagent/2.3.0/config/.initRucio
/data/srv/wmagent/2.3.0/config/.initSqlDB
/data/srv/wmagent/2.3.0/config/.initUpload
/data/srv/wmagent/2.3.0/config/.initUsing
WARNING: dockerInit vs buildId mismatch
-------------------------------------------------------
Start: Performing Docker image to Host initialisation steps
deploy_to_host: Copy the proper manage file
'/usr/local/bin/manage' -> '/data/srv/wmagent/2.3.0/config/manage'
deploy_to_host: Initialise && Validate && Load WMAgent.secrets
deploy_to_host: checking /data/admin/wmagent/WMAgent.secrets
deploy_to_host: Initialise Rucio config
Done: Performing Docker image to Host initialisation steps
-------------------------------------------------------
-------------------------------------------------------
Start: Performing local Docker image initialisation steps
deploy_to_host: Checking Certificates and Proxy
_renew_proxy: Checking Certificate lifetime:
_renew_proxy: Certificate end date: Nov  1 17:41:15 2024 GMT
_renew_proxy: Checking myproxy lifetime:
_renew_proxy: myproxy end date: Mar 12 16:54:01 2024 GMT
MyProxy v6.2 Aug 2019 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to 2001:1458:d00:e::100:32:7512 
Attempting to connect to 2001:1458:201:a4::100:3d3:7512 
Successfully connected to myproxy.cern.ch:7512 
using trusted certificates directory /etc/grid-security/certificates
Using Host cert file (/data/certs/servicecert.pem), key file (/data/certs/servicekey.pem)
server name: /DC=ch/DC=cern/OU=computers/CN=px502.cern.ch
checking that server name is acceptable...
server name matches "myproxy.cern.ch"
authenticated server name is acceptable
A credential has been received for user amaltaro in /data/certs/mynewproxy.pem.
Your identity: ***
Contacting  ***] "cms" Done
Creating proxy  Done

.....................................................
Warning: your certificate and proxy will expire Tue Mar 12 17:47:22 2024
which is within the requested lifetime of the proxy
_renew_proxy: ERROR: Failed to renew expired myproxy
Done: Performing local Docker image initialisation steps
-------------------------------------------------------
_check_oracle: Checking whether the Oracle server is reachable ...
_status_of_oracle:

SQL*Plus: Release 21.0.0.0.0 - Production on Tue Mar 5 17:47:23 2024
Version 21.5.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 05 2024 16:54:03 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

SQL> SQL> 
	 1
----------
	 1

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0
_status_of_oracle: Oracle connection is OK!
_check_oracle: Checking whether the Oracle database is clean and not used by other agents ...
_sql_db_isclean: Checking if the current SQL Database is clean and empty.
Not implemented
_check_couch: Checking whether the CouchDB database is reachable...
_status_of_couch:
{"couchdb":"Welcome","version":"3.2.2","git_sha":"d5b746b7c","uuid":"18f53118737ed74893055db0ffa972e2","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"]}
_status_of_couch: CouchDB connection is OK!
-------------------------------------------------------
Start: Performing activate_agent
activate_agent: triggered.
'/usr/local/etc/WMAgentConfig.py' -> '/data/srv/wmagent/2.3.0/config/config-template.py'
Done: Performing activate_agent
-------------------------------------------------------
-------------------------------------------------------
Start: Performing init_agent
init_agent: triggered.
Initialising Agent...
init_wmagent: Using ORACLE user schema: ACCT@ALIAS 
DEBUG:root:Log file ready
DEBUG:root:Using SQLAlchemy v.1.4.52
INFO:root:Instantiating base WM DBInterface
DEBUG:root:Problem creating database table 

CREATE TABLE wmbs_fileset (
                 id          INTEGER      NOT NULL,
                 name        VARCHAR(1250) NOT NULL,
                 open        CHAR(1)      CHECK (open IN ('0', '1' )) NOT NULL,
                 last_update INTEGER      NOT NULL
                 ) 

(cx_Oracle.DatabaseError) ORA-00955: name is already used by an existing object
[SQL: CREATE TABLE wmbs_fileset (
                 id          INTEGER      NOT NULL,
                 name        VARCHAR(1250) NOT NULL,
                 open        CHAR(1)      CHECK (open IN ('0', '1' )) NOT NULL,
                 last_update INTEGER      NOT NULL
                 ) ]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
checking default database connection
default database connection tested
Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
cx_Oracle.DatabaseError: ORA-00955: name is already used by an existing object

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCreator.py", line 55, in execute
    self.dbi.processData(self.create[i],
  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCore.py", line 148, in processData
    r = self.executebinds(i, connection=connection,
  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCore.py", line 62, in executebinds
    resultProxy = connection.execute(s)
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1370, in execute
    return self._exec_driver_sql(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1674, in _exec_driver_sql
    ret = self._execute_context(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(
  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)
sqlalchemy.exc.DatabaseError: (cx_Oracle.DatabaseError) ORA-00955: name is already used by an existing object
[SQL: CREATE TABLE wmbs_fileset (
                 id          INTEGER      NOT NULL,
                 name        VARCHAR(1250) NOT NULL,
                 open        CHAR(1)      CHECK (open IN ('0', '1' )) NOT NULL,
                 last_update INTEGER      NOT NULL
                 ) ]
(Background on this error at: https://sqlalche.me/e/14/4xp6)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/bin/wmcore-db-init", line 158, in <module>
    create(cfgObject)
  File "/usr/local/bin/wmcore-db-init", line 133, in create
    wmInit.setSchema(modules, params = params)
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMInit.py", line 167, in setSchema
    createworked = create.execute(conn=myThread.transaction.conn,
  File "/usr/local/lib/python3.8/site-packages/WMCore/WMBS/CreateWMBSBase.py", line 536, in execute
    DBCreator.execute(self, conn, transaction)
  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCreator.py", line 62, in execute
    raise WMException(msg,'WMCORE-2')
WMCore.WMException.WMException: <@========== WMException Start ==========@>
Exception Class: WMException
Message: Problem creating database table 

CREATE TABLE wmbs_fileset (
                 id          INTEGER      NOT NULL,
                 name        VARCHAR(1250) NOT NULL,
                 open        CHAR(1)      CHECK (open IN ('0', '1' )) NOT NULL,
                 last_update INTEGER      NOT NULL
                 ) 

(cx_Oracle.DatabaseError) ORA-00955: name is already used by an existing object
[SQL: CREATE TABLE wmbs_fileset (
                 id          INTEGER      NOT NULL,
                 name        VARCHAR(1250) NOT NULL,
                 open        CHAR(1)      CHECK (open IN ('0', '1' )) NOT NULL,
                 last_update INTEGER      NOT NULL
                 ) ]
(Background on this error at: https://sqlalche.me/e/14/4xp6)
	ClassName : None
	ModuleName : WMCore.Database.DBCreator
	MethodName : execute
	ClassInstance : None
	FileName : /usr/local/lib/python3.8/site-packages/WMCore/Database/DBCreator.py
	LineNumber : 62
	ErrorNr : WMCORE-2

Traceback: 
  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCreator.py", line 55, in execute
    self.dbi.processData(self.create[i],

  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCore.py", line 148, in processData
    r = self.executebinds(i, connection=connection,

  File "/usr/local/lib/python3.8/site-packages/WMCore/Database/DBCore.py", line 62, in executebinds
    resultProxy = connection.execute(s)

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1370, in execute
    return self._exec_driver_sql(

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1674, in _exec_driver_sql
    ret = self._execute_context(

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1953, in _execute_context
    self._handle_dbapi_exception(

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 2134, in _handle_dbapi_exception
    util.raise_(

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/util/compat.py", line 211, in raise_
    raise exception

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/base.py", line 1910, in _execute_context
    self.dialect.do_execute(

  File "/usr/local/lib/python3.8/site-packages/sqlalchemy/engine/default.py", line 736, in do_execute
    cursor.execute(statement, parameters)

<@---------- WMException End ----------@>
Installing FWJRDump into wmagent_jobdump/fwjrs
Installing FWJRDump app into database: http://127.0.0.1:5984/wmagent_jobdump%2Ffwjrs
Installing JobDump into wmagent_jobdump/jobs
Installing JobDump app into database: http://127.0.0.1:5984/wmagent_jobdump%2Fjobs
Installing WMStatsAgent into wmagent_summary
Installing WMStatsAgent app into database: http://127.0.0.1:5984/wmagent_summary
Installing SummaryStats into stat_summary
Installing SummaryStats app into database: http://127.0.0.1:5984/stat_summary
Setting up cron jobs for the job dump.
Installing WorkQueue into workqueue
Installing WorkQueue app into database: http://127.0.0.1:5984/workqueue
Installing WorkQueue into workqueue_inbox
Installing WorkQueue app into database: http://127.0.0.1:5984/workqueue_inbox
ERROR: Failed to initialise WMAgent databases!
ERROR: init_agent
Start sleeping now ...zzz...
  • Then cleaning the database from the agent:
cmst1@vocms0290:wmagent $ docker exec -it wmagent bash  

(WMAgent-2.3.0) [cmst1@vocms0290:current]$ manage clean-oracle
clean_oracle: Dropping Oracle DB...
execute_command_agent: Executing: clean-oracle  ...
Are you sure you want to wipe out ACCT oracle database (yes/no): yes
Alright, dropping and purging everything

SQL*Plus: Release 21.0.0.0.0 - Production on Tue Mar 5 17:48:04 2024
Version 21.5.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 05 2024 17:48:01 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

SQL> 
Recyclebin purged.

SQL> 
no rows selected

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0
Done!

(WMAgent-2.3.0) [cmst1@vocms0290:current]$ 

  • Restarting the agent:
cmst1@vocms0290:wmagent $ ./wmagent-docker-run.sh -t 2.3.0 && docker logs -f wmagent 
Checking if there is no other wmagent container running and creating a link to the 2.3.0 in the host mount area.
Starting the wmagent:2.3.0 docker container with the following parameters:  -t 2.3.0
a85d235ace65cab492cabb5bc4af87b6487bee09ffd361175a751d613f5c0627
Start initialization

=======================================================
Starting WMAgent with the following initialisation data:
-------------------------------------------------------
 - WMAgent Version            : 2.3.0
 - WMAgent User               : cmst1
 - WMAgent Root path          : /data
 - WMAgent Host               : vocms0290.cern.ch
 - WMAgent TeamName           : testbed-vocms0290
 - WMAgent Number             : 0
 - WMAgent Relational DB type : oracle
 - Python  Version            : Python 3.8.16
 - Python  Module path        : /usr/local/lib/python3.8/site-packages
=======================================================

-------------------------------------------------------
Start: Performing basic_checks

Done: Performing basic_checks
-------------------------------------------------------

check_wmasecrets: Checking for changes in the WMAgent.secrets file
check_wmasecrets: No change found.
-------------------------------------------------------
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 271b91f725becb73bbc506845bc8c1c350c74694a5f60720172df7f3dc360c85
dockerInitId: /data/srv/wmagent/2.3.0/config/.initAgent
/data/srv/wmagent/2.3.0/config/.initConfig
/data/srv/wmagent/2.3.0/config/.initCouchDB
/data/srv/wmagent/2.3.0/config/.initResourceControl
/data/srv/wmagent/2.3.0/config/.initSqlDB
/data/srv/wmagent/2.3.0/config/.initUpload
/data/srv/wmagent/2.3.0/config/.initUsing
271b91f725becb73bbc506845bc8c1c350c74694a5f60720172df7f3dc360c85
WARNING: dockerInit vs buildId mismatch
-------------------------------------------------------
Start: Performing Docker image to Host initialisation steps
deploy_to_host: Copy the proper manage file
'/usr/local/bin/manage' -> '/data/srv/wmagent/2.3.0/config/manage'
deploy_to_host: Initialise && Validate && Load WMAgent.secrets
deploy_to_host: Initialise Rucio config
Done: Performing Docker image to Host initialisation steps
-------------------------------------------------------
-------------------------------------------------------
Start: Performing local Docker image initialisation steps
deploy_to_host: Checking Certificates and Proxy
_renew_proxy: Checking Certificate lifetime:
_renew_proxy: Certificate end date: Nov  1 17:41:15 2024 GMT
_renew_proxy: Checking myproxy lifetime:
_renew_proxy: myproxy end date: Mar 12 17:47:22 2024 GMT
MyProxy v6.2 Aug 2019 PAM SASL KRB5 LDAP VOMS OCSP
Attempting to connect to 2001:1458:d00:e::100:32:7512 
Attempting to connect to 2001:1458:201:a4::100:3d3:7512 
Successfully connected to myproxy.cern.ch:7512 
using trusted certificates directory /etc/grid-security/certificates
Using Host cert file (/data/certs/servicecert.pem), key file (/data/certs/servicekey.pem)
server name: /DC=ch/DC=cern/OU=computers/CN=px502.cern.ch
checking that server name is acceptable...
server name matches "myproxy.cern.ch"
authenticated server name is acceptable
A credential has been received for user amaltaro in /data/certs/mynewproxy.pem.
Your identity: /**
Contacting  ***] "cms" Done
Creating proxy  Done

Your proxy is valid until Tue Mar 12 17:48:44 2024
_renew_proxy: OK
Done: Performing local Docker image initialisation steps
-------------------------------------------------------
_check_oracle: Checking whether the Oracle server is reachable ...
_status_of_oracle:

SQL*Plus: Release 21.0.0.0.0 - Production on Tue Mar 5 17:48:44 2024
Version 21.5.0.0.0

Copyright (c) 1982, 2021, Oracle.  All rights reserved.

Last Successful login time: Tue Mar 05 2024 17:48:05 +00:00

Connected to:
Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0

SQL> SQL> 
	 1
----------
	 1

SQL> Disconnected from Oracle Database 19c Enterprise Edition Release 19.0.0.0.0 - Production
Version 19.21.0.0.0
_status_of_oracle: Oracle connection is OK!
_check_oracle: Checking whether the Oracle database is clean and not used by other agents ...
_sql_db_isclean: Checking if the current SQL Database is clean and empty.
Not implemented
_check_couch: Checking whether the CouchDB database is reachable...
_status_of_couch:
{"couchdb":"Welcome","version":"3.2.2","git_sha":"d5b746b7c","uuid":"18f53118737ed74893055db0ffa972e2","features":["access-ready","partitioned","pluggable-storage-engines","reshard","scheduler"]}
_status_of_couch: CouchDB connection is OK!
-------------------------------------------------------
Start: Performing activate_agent
Done: Performing activate_agent
-------------------------------------------------------
-------------------------------------------------------
Start: Performing init_agent
init_agent: triggered.
Initialising Agent...
init_wmagent: Using ORACLE user schema: ACCT@ALIAS 
....................................................................DEBUG:root:Log file ready
DEBUG:root:Using SQLAlchemy v.1.4.52
INFO:root:Instantiating base WM DBInterface
DEBUG:root:Tables for WMCore.WMBS created
DEBUG:root:Tables for WMCore.Agent.Database created
DEBUG:root:Tables for WMComponent.DBS3Buffer created
DEBUG:root:Tables for WMCore.BossAir created
DEBUG:root:Tables for WMCore.ResourceControl created
checking default database connection
default database connection tested
Installing FWJRDump into wmagent_jobdump/fwjrs
Installing FWJRDump app into database: http://127.0.0.1:5984/wmagent_jobdump%2Ffwjrs
Installing JobDump into wmagent_jobdump/jobs
Installing JobDump app into database: http://127.0.0.1:5984/wmagent_jobdump%2Fjobs
Installing WMStatsAgent into wmagent_summary
Installing WMStatsAgent app into database: http://127.0.0.1:5984/wmagent_summary
Installing SummaryStats into stat_summary
Installing SummaryStats app into database: http://127.0.0.1:5984/stat_summary
Setting up cron jobs for the job dump.
Installing WorkQueue into workqueue
Installing WorkQueue app into database: http://127.0.0.1:5984/workqueue
Installing WorkQueue into workqueue_inbox
Installing WorkQueue app into database: http://127.0.0.1:5984/workqueue_inbox
_sql_write_agentid: Preserving the current WMA_BUILD_ID and HostName at database: wmagent.
Not implemented
_sql_dumpSchema: Dumping the current SQL schema of database: wmagent to /data/srv/wmagent/2.3.0/config/.wmaSchemaFile.sql
_sql_dumpSchema: NOT implemented
Done: Performing init_agent
-------------------------------------------------------
-------------------------------------------------------
Start: Performing agent_tweakconfig
agent_tweakconfig: triggered.
agent_tweakconfig: Making agent configuration changes needed for Docker
agent_tweakconfig: Making other agent configuration changes
Done: Performing agent_tweakconfig
-------------------------------------------------------
-------------------------------------------------------
Start: Performing agent_resource_control
agent_resource_control: triggered.
agent_resource_control: Populating resource-control
agent_resource_control: Adding only T1 and T2 sites to resource-control...
execute_command_agent: Executing: wmagent-resource-control --add-T1s --plugin=SimpleCondorPlugin --pending-slots=50 --running-slots=50 --down ...
Retrieved 7 maps from https://cms-cric.cern.ch/
Adding T1_DE_KIT to the resource control db...
Adding T1_ES_PIC to the resource control db...
Adding T1_FR_CCIN2P3 to the resource control db...
Adding T1_IT_CNAF to the resource control db...
Adding T1_RU_JINR to the resource control db...
Adding T1_UK_RAL to the resource control db...
Adding T1_US_FNAL to the resource control db...
Retrieved 16 PNNs from https://cms-cric.cern.ch/
execute_command_agent: Executing: wmagent-resource-control --add-T2s --plugin=SimpleCondorPlugin --pending-slots=50 --running-slots=50 --down ...
Retrieved 49 maps from https://cms-cric.cern.ch/
Adding T2_AT_Vienna to the resource control db...
Adding T2_BE_IIHE to the resource control db...
Adding T2_BE_UCL to the resource control db...
Adding T2_BR_SPRACE to the resource control db...
Adding T2_BR_UERJ to the resource control db...
Adding T2_CH_CERN to the resource control db...
Adding T2_CH_CERN_HLT to the resource control db...
Adding T2_CH_CERN_P5 to the resource control db...
Adding T2_CH_CSCS to the resource control db...
Adding T2_CN_Beijing to the resource control db...
Adding T2_DE_DESY to the resource control db...
Adding T2_DE_RWTH to the resource control db...
Adding T2_EE_Estonia to the resource control db...
Adding T2_ES_CIEMAT to the resource control db...
Adding T2_ES_IFCA to the resource control db...
Adding T2_FI_HIP to the resource control db...
Adding T2_FR_GRIF to the resource control db...
Adding T2_FR_IPHC to the resource control db...
Adding T2_GR_Ioannina to the resource control db...
Adding T2_HU_Budapest to the resource control db...
Adding T2_IN_TIFR to the resource control db...
Adding T2_IT_Bari to the resource control db...
Adding T2_IT_Legnaro to the resource control db...
Adding T2_IT_Pisa to the resource control db...
Adding T2_IT_Rome to the resource control db...
Adding T2_KR_KISTI to the resource control db...
Adding T2_PK_NCP to the resource control db...
Adding T2_PL_Cyfronet to the resource control db...
Adding T2_PL_Swierk to the resource control db...
Adding T2_PT_NCG_Lisbon to the resource control db...
Adding T2_RU_IHEP to the resource control db...
Adding T2_RU_INR to the resource control db...
Adding T2_RU_ITEP to the resource control db...
Adding T2_RU_JINR to the resource control db...
Adding T2_TR_METU to the resource control db...
Adding T2_TW_NCHC to the resource control db...
Adding T2_UA_KIPT to the resource control db...
Adding T2_UK_London_Brunel to the resource control db...
Adding T2_UK_London_IC to the resource control db...
Adding T2_UK_SGrid_Bristol to the resource control db...
Adding T2_UK_SGrid_RALPP to the resource control db...
Adding T2_US_Caltech to the resource control db...
Adding T2_US_Florida to the resource control db...
Adding T2_US_MIT to the resource control db...
Adding T2_US_Nebraska to the resource control db...
Adding T2_US_Purdue to the resource control db...
Adding T2_US_UCSD to the resource control db...
Adding T2_US_Vanderbilt to the resource control db...
Adding T2_US_Wisconsin to the resource control db...
Retrieved 51 PNNs from https://cms-cric.cern.ch/
Done: Performing agent_resource_control
-------------------------------------------------------
-------------------------------------------------------
Start: Performing agent_upload_config
agent_upload_config: triggered.
agent_upload_config: Tweaking central agent configuration befre uploading
agent_upload_config: Testbed agent, setting MaxRetries to 0...
*** Upload WMAgentConfig to AuxDB ***
execute_command_agent: Executing: wmagent-upload-config {"MaxRetries":0} ...
Pushing the following agent configuration:
{'AgentDrainMode': False,
 'CondorJobsFraction': 0.75,
 'CondorOverflowFraction': 0.2,
 'DiskUseThreshold': 85,
 'IgnoreDisks': ['/mnt/ramdisk'],
 'MaxRetries': 0,
 'NoRetryExitCodes': [70,
                      73,
                      8001,
                      8006,
                      8009,
                      8023,
                      8026,
                      8501,
                      50660,
                      50661,
                      50664,
                      71102,
                      71104,
                      71105],
 'SpeedDrainConfig': {'CondorPriority': {'Enabled': False, 'Threshold': 500},
                      'EnableAllSites': {'Enabled': False, 'Threshold': 200},
                      'NoJobRetries': {'Enabled': False, 'Threshold': 200}},
 'SpeedDrainMode': False,
 'UserDrainMode': False}
Done: Performing agent_upload_config
-------------------------------------------------------
-------------------------------------------------------
Start: Performing checks for successful Docker initialisation steps...
WMA_BUILD_ID: 271b91f725becb73bbc506845bc8c1c350c74694a5f60720172df7f3dc360c85
dockerInitId: 271b91f725becb73bbc506845bc8c1c350c74694a5f60720172df7f3dc360c85
OK

Docker container has been initialised! However you still need to:
  1) Double check agent configuration: less /data/[dockerMount]/srv/wmagent/current/config/config.py
  2) Start the agent by either of the methods bellow:
     a) From inside the already running container
          * Access the running WMAgent container:
            docker exec -it wmagent bash
          * Use the regular manage script inside the container:
            manage start-agent

     b) From the host - by restarting the whole container
          * Kill the currently running container:
            docker kill wmagent
          * Start a fresh instance of wmagent:
            ./wmagent-docker-run.sh -t <WMA_TAG> & 
Have a nice day!

Start sleeping now ...zzz...
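For reference, the clean-oracle step shown in the second bullet above (dropping and purging everything owned by the account) could be approximated by generating one DROP statement per object and a final recycle bin purge. The sketch below is only an illustration under assumptions: the helper name `build_drop_statements` is hypothetical, it handles only tables, sequences and views, and it is not the code executed by `manage clean-oracle`.

```python
def build_drop_statements(objects):
    """Build Oracle DROP statements for (object_type, object_name) pairs.

    `objects` is expected to hold rows such as those returned by
    SELECT object_type, object_name FROM user_objects.
    Tables are dropped first, together with their constraints; indexes and
    other object types are skipped in this sketch.
    """
    tables, others = [], []
    for object_type, object_name in objects:
        if object_type == "TABLE":
            tables.append(f'DROP TABLE "{object_name}" CASCADE CONSTRAINTS PURGE')
        elif object_type in ("SEQUENCE", "VIEW"):
            others.append(f'DROP {object_type} "{object_name}"')
    # Purge whatever may still sit in the recycle bin at the end.
    return tables + others + ["PURGE RECYCLEBIN"]
```

For a schema holding only the WMBS_FILESET table, this would yield a single DROP TABLE statement followed by PURGE RECYCLEBIN.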

@amaltaro
Contributor

amaltaro commented Mar 5, 2024

Thanks Todor. I understand that some of the oracle checks are still to be implemented, such that we don't have to hit all those exceptions when creating tables in an initialized database, right?

BTW, I had to edit your previous comment because you dumped too much information in GH, like DN and db connection string, which IMO does not need to be shared here.

@todor-ivanov
Contributor Author

Thanks @amaltaro

I had to edit your previous comment

Thanks.

I understand that some of the oracle checks are still to be implemented,

That is correct. But I am about to move those related to the schema validation features into a separate issue, and we will work on them later. This way the new functionality we are adding will not become a blocker for the T0 team or for us delivering a fully functional WMAgent container.

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 7, 2024

It is all done and tested now. Waiting for the review of: dmwm/CMSKubernetes#1451

The only bit left behind is the proper implementation of the schema validation for Oracle. Since I'd like to have this whole mechanism improved for both MariaDB and Oracle agents, I created a separate issue for it, #11925, but have not set its priority; I'll let others do that. I have disabled those checks for the Oracle agents in the code, so that the work on this feature does not become a blocker for T0 agents or for us delivering the containerized model before the end of Q1.

FYI: @amaltaro @vkuznet @khurtado @klannon .

@todor-ivanov
Contributor Author

As said above - all is done and well tested. I am resolving this issue now.
FYI @amaltaro @vkuznet @khurtado @klannon

@amaltaro
Contributor

@todor-ivanov I left my review on the changes you provided in the CMSKubernetes repository. Please also update the description of this issue accordingly, e.g.: this issue has been closed but no changes have been provided to _sql_dumpSchema.

Given the impact of such changes, I would like to ask that we wait for final review approval before merging the changes in. In addition, such changes are likely better to have 2 reviewers.

@todor-ivanov
Contributor Author

todor-ivanov commented Mar 12, 2024

hi @amaltaro

In addition, such changes are likely better to have 2 reviewers.

There were indeed two reviewers: Valentin and Kenyi.

this issue has been closed but no changes have been provided to _sql_dumpSchema.

That part is tracked in #11925.

It has also been listed in the meta issue.
