Rcons_hierachy_support
{{:Design Warning}}
1. What problems were we seeing in the scaling environment?
While setting up and administering the scaling cluster, we saw several rcons-related problems that forced us to restart the conserver from time to time.
- The conserver starts responding slowly after it has been running for a while (perhaps several days; I am not sure of the exact interval). When it is slow, opening a node console can take more than 5 or even 10 seconds, which causes rnetboot and getmacs to time out; occasionally rcons cannot open the node consoles at all. We have to restart the conserver to fix the problem.
- A conserver restart takes a very long time, about 5 minutes with 1K nodes, to finish initialization. During that initialization, rcons gets a "Connection refused" error.
- Even though xCAT sets the "consoleondemand" attribute, the conserver still forks many child daemons to handle the consoles defined in /etc/conserver.cf. Each conserver child daemon can handle only about 15 consoles, so the child daemons are likely to cause performance problems on the management node in a scaling environment. Here is an example:
c906mgrs2:~ # cat /etc/conserver.cf | grep "console c906" | wc -l
1204
c906mgrs2:~ # ps -ef | grep "conserver -o -O1 -d" | wc -l
78
c906mgrs2:~ #
We can see that with 1204 consoles defined in /etc/conserver.cf, the conserver needs to fork 78-1=77 child daemons.
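The relationship between console count and child daemons can be checked with a quick estimate. This is only a sketch: the helper name `estimate_children` is hypothetical, and the per-child capacities of 15 and 16 are assumptions chosen around the "about 15 consoles" figure above, not values read from the conserver itself.

```python
def estimate_children(consoles: int, per_child: int) -> int:
    """Return the minimum number of child daemons needed to hold `consoles`.

    Assumes every child daemon is filled up to `per_child` consoles
    (an illustrative simplification, not conserver's actual scheduling).
    """
    if per_child <= 0:
        raise ValueError("per_child must be positive")
    # Integer ceiling division avoids floating-point rounding.
    return -(-consoles // per_child)


if __name__ == "__main__":
    # 1204 is the console count reported by `wc -l` in the example above.
    for per_child in (15, 16):
        print(per_child, estimate_children(1204, per_child))
```

With 1204 consoles this gives 81 children at 15 consoles each and 76 at 16 each, so the 77 child daemons observed on the management node fall within that range.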
2. How are the problems caused?
The current rcons/makeconservercf implementation does not support hierarchy very well; this should be the root cause of the conserver overloading the management node.
- The current makeconservercf implementation adds the console definitions for all nodes to /etc/conserver.cf on the management node, and also sends each node's console definition to that node's conserver-host. This means every node in the cluster is in /etc/conserver.cf on the management node.
- The conserver-host can be specified on the rcons command line, but rcons uses the xCAT management node as the default conserver host when none is specified, so rcons always connects to the conserver on the management node by default.
3. What changes am I planning to make?
- Add a new flag -c|--conserver to makeconservercf so that a node is added only to the /etc/conserver.cf on that node's conserver host. The default behaviour of makeconservercf does not change if -c is not specified. The -c flag cannot be used with the -l flag.
- Change rcons to read the node's conserver attribute and run "console -M <conserver> <nodename>" to connect to the conserver on the node's conserver host. The -M value is chosen in the following priority order: user-specified parameter, noderes.conserver (reviewer question: should this be nodehm.conserver?), $XCATHOST, localhost.
- The number of consoles each conserver daemon can handle is configurable. The documentation will be updated to include instructions for changing it.
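The first two changes above can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the actual makeconservercf or rcons code: the function names are hypothetical, nodes without a conserver attribute are assumed to stay on a default host, and $XCATHOST is assumed to possibly carry a "host:port" value from which only the host is used.

```python
import os
from collections import defaultdict
from typing import Dict, List, Mapping, Optional


def group_by_conserver(node_conserver: Mapping[str, Optional[str]],
                       default_host: str) -> Dict[str, List[str]]:
    """Map each conserver host to the nodes whose consoles it should serve.

    Sketch of the proposed -c behaviour: a node appears only in the
    configuration of its own conserver host.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    for node in sorted(node_conserver):
        # Assumption: nodes with no conserver attribute stay on default_host.
        host = node_conserver[node] or default_host
        groups[host].append(node)
    return dict(groups)


def resolve_conserver(cli_value: Optional[str],
                      node_attr: Optional[str],
                      env: Mapping[str, str] = os.environ) -> str:
    """Pick the -M value: user parameter, node attribute, $XCATHOST, localhost."""
    # Assumption: XCATHOST may look like "host:port"; keep only the host part.
    xcathost = env.get("XCATHOST", "").split(":", 1)[0]
    for candidate in (cli_value, node_attr, xcathost):
        if candidate:
            return candidate
    return "localhost"


def console_command(node: str, conserver: str) -> List[str]:
    """Build the argument list for: console -M <conserver> <nodename>."""
    return ["console", "-M", conserver, node]
```

For example, `console_command("node01", resolve_conserver(None, "sn1"))` yields the arguments for `console -M sn1 node01`, so the console session goes to the service node rather than the management node.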
4. Future considerations
For now, the conserver-host must be a service node, because makeconservercf needs the xcatclient->xcatd communication to send the nodes' console definitions to the conserver-host. This makes sense: the conserver-host provides a service to the nodes, so it should be a "service" node. But it would be easier for users if we could eliminate this requirement, so that users could simply install the conserver on designated servers and have those servers act as conserver-hosts; that would make the setup more flexible and simpler. I do not think we need to do this in xCAT 2.4, however, and I am not even sure whether we need to do it in the future.
5. Alternative design
rcons is NOT a command that goes through xcatd, so the design above is not complete hierarchy support from the xCAT perspective. An alternative design is to change rcons into a program that goes through xcatd, with the rcons request sent to the conserver-host. I expect this would cause some issues, because rcons differs from the other commands: it needs to read output from the conserver-host continuously. I do not see any performance advantage in using the xcatclient->xcatd communication instead of the console->conserver communication, and it would require structural changes to the rcons/makeconservercf logic. So I do not think we can or need to go with this alternative design.