
Some PLDM questions about WarmReboot #300

Open
ChicagoDuan opened this issue Oct 20, 2023 · 4 comments

@ChicagoDuan
Member

We've noticed that the current PLDM code only supports WarmReboot[1], which, as we understand it, means that when you reboot, the motherboard's power remains on (it doesn't go through the chassis power-off process).

Why does PLDM have this constraint? Is it because of the hardware design constraints of the Rainier (P10) platform that led to the choice of only supporting WarmReboot?

Additionally, in the code of phosphor-state-manager[2], we see that WarmReboot can be mapped to ColdReboot, and it mentions that "Some systems do not support a warm reboot." Is it due to motherboard design constraints that some of the systems mentioned here do not support warm reboot?

Which systems do not support WarmReboot, which ones do, and how can they be distinguished?

[1] https://github.com/ibm-openbmc/pldm/blob/1050/oem/ibm/libpldmresponder/oem_ibm_handler.cpp#L1534
[2] https://github.com/ibm-openbmc/phosphor-state-manager/blob/1050/host_state_manager.cpp#L100

@geissonator
Contributor

> Why does PLDM have this constraint? Is it because of the hardware design constraints of the Rainier (P10) platform that led to the choice of only supporting WarmReboot?

I think the only use case we had was for the host to do warm reboots. Warm reboots are faster, so they should be the default. Do you have a need for a cold reboot? We could look into a mechanism over PLDM if needed; there is no reason we couldn't do it.

> Additionally, in the code of phosphor-state-manager[2], we see that WarmReboot can be mapped to ColdReboot, and it mentions that "Some systems do not support a warm reboot." Is it due to motherboard design constraints that some of the systems mentioned here do not support warm reboot?

Only our Witherspoon system could not support this. We found the issue when we introduced warm reboots (for P10) and did not think the effort of debugging it on P9 was worth it.
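To make the fallback concrete, here is a minimal Python sketch of mapping a WarmReboot request to a ColdReboot on systems that cannot do a warm reboot. The `Transition` enum, the `Platform` capability flag, and `resolve_transition` are hypothetical names for illustration only; this is not the phosphor-state-manager code linked in [2] (which is C++ and whose actual logic may differ). The Witherspoon value comes from the comment above; the Rainier value is assumed from the P10 remark.

```python
from dataclasses import dataclass
from enum import Enum


class Transition(Enum):
    """Hypothetical subset of host transitions, named after the terms in this thread."""
    OFF = "Off"
    ON = "On"
    WARM_REBOOT = "WarmReboot"
    COLD_REBOOT = "ColdReboot"


@dataclass(frozen=True)
class Platform:
    """Hypothetical per-system capability flag (not a real OpenBMC setting)."""
    name: str
    warm_reboot_supported: bool


def resolve_transition(requested: Transition, platform: Platform) -> Transition:
    """Map a requested transition onto one the platform can actually perform."""
    if requested is Transition.WARM_REBOOT and not platform.warm_reboot_supported:
        # Fall back to a full power cycle in which the chassis is powered off and on.
        return Transition.COLD_REBOOT
    return requested


if __name__ == "__main__":
    rainier = Platform("rainier", warm_reboot_supported=True)        # assumed (P10)
    witherspoon = Platform("witherspoon", warm_reboot_supported=False)
    print(resolve_transition(Transition.WARM_REBOOT, rainier).value)      # WarmReboot
    print(resolve_transition(Transition.WARM_REBOOT, witherspoon).value)  # ColdReboot
```

Keeping the decision in one small function means a system that only wants cold reboots just sets its flag to `False`; every other transition passes through unchanged.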

@jaypadath
Contributor

As Andrew mentioned, since WarmReboot needs to be faster, the current PLDM design fetches all the PLDM PDRs from the host once the host comes back after the reboot. So if a chassis power-off happens, this design won't work, and the current PLDM mechanism would need to change.
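A rough Python sketch of what "fetch all the PLDM PDRs from the host" involves: walking the host's PDR repository with GetPDR requests, starting from record handle 0 (the first record) and following each response's next record handle until it is 0 (the last record), as GetPDR is described in DMTF DSP0248. `FakeHostPdrRepo` is a hypothetical in-memory stand-in for the host, with no MCTP transport and no libpldm, and the sketch ignores multi-part transfers for records larger than one response.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GetPdrResponse:
    """Subset of a GetPDR response: the record plus the handle of the next one."""
    record_data: bytes
    next_record_handle: int


class FakeHostPdrRepo:
    """Hypothetical in-memory host repository (assumes at least one record)."""

    def __init__(self, records: list[bytes]):
        # Assign record handles 1..n in order; handle 0 requests the first record.
        self._records = {i + 1: rec for i, rec in enumerate(records)}

    def get_pdr(self, record_handle: int) -> GetPdrResponse:
        handle = 1 if record_handle == 0 else record_handle
        data = self._records[handle]
        # A next handle of 0 marks the last record in the repository.
        next_handle = handle + 1 if handle + 1 in self._records else 0
        return GetPdrResponse(data, next_handle)


def fetch_all_pdrs(repo: FakeHostPdrRepo) -> list[bytes]:
    """Walk the host's PDR repository from the first record to the last."""
    pdrs = []
    handle = 0  # 0 asks for the first PDR
    while True:
        resp = repo.get_pdr(handle)
        pdrs.append(resp.record_data)
        if resp.next_record_handle == 0:
            break
        handle = resp.next_record_handle
    return pdrs


if __name__ == "__main__":
    repo = FakeHostPdrRepo([b"pdr-a", b"pdr-b", b"pdr-c"])
    print(fetch_all_pdrs(repo))  # [b'pdr-a', b'pdr-b', b'pdr-c']
```

In the design described above, this walk runs once the host comes back after a warm reboot.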

@mzipse

mzipse commented Nov 28, 2023

@ChicagoDuan , can we close this?

@ChicagoDuan
Member Author

> As Andrew mentioned, since WarmReboot needs to be faster, the current PLDM design fetches all the PLDM PDRs from the host once the host comes back after the reboot. So if a chassis power-off happens, this design won't work, and the current PLDM mechanism would need to change.

Hi @jaypadath, we only use ColdReboot. We have disabled WarmReboot in phosphor-state-manager:
https://github.com/openbmc/phosphor-state-manager/blob/0886545d42088fbe6241309577a4f11451325cc0/host_state_manager.cpp#L102

What changes would we need to make to the PLDM mechanism you mentioned? Thanks
