Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add 4 CPUs servers support / Invalid Octal Number #91

Open
vanvonlj opened this issue May 26, 2024 · 10 comments
Open

Add 4 CPUs servers support / Invalid Octal Number #91

vanvonlj opened this issue May 26, 2024 · 10 comments
Assignees

Comments

@vanvonlj
Copy link

When running the container, an invalid octal number error is thrown during execution of the shell script.

It appears that all the values are in the echo, so I'm unsure what's causing this.

username: root
iDRAC/IPMI password:
Server model: DELL PowerEdge R930
iDRAC/IPMI host:
Fan speed objective: 5%
CPU temperature threshold: 28°C
Check interval: 300s
------- Temperatures -------
Date & time Inlet CPU 1 CPU 2 Exhaust Active fan speed profile Third-party PCIe card Dell default cooling response Comment
26-05-2024 03:09:23 22°C 0°C 29°C 28°C Dell default dynamic fan control profile Enabled -
./Dell_iDRAC_fan_controller.sh: line 159: printf: 09: invalid octal number

@tigerblue77 tigerblue77 self-assigned this Nov 10, 2024
@tigerblue77
Copy link
Owner

tigerblue77 commented Nov 10, 2024

Hello @vanvonlj, sorry for the delay of my reply. Are you able to identify the line of my script printing these error messages please ? (I'll try to use it on my new Dell T630 and see if it print the same errors)

@vanvonlj
Copy link
Author

The error is in 159, but I’m not sure which value is an invalid octal number. Looks like it’s the CPU1 reading (displays as 0 Celsius above)

@ctark
Copy link

ctark commented Dec 1, 2024

Hey,

I just had the same error and after some debugging I noticed that my CPU1 temperature was reading 0*C because the data indexes were off.

I have an R930, iDRAC 8, v2.86.86.86

Output of CPU_DATA for me is:

CPU_DATA= 09 38 06 39 07 37 08 37
CPU1 debug= 09
CPU2 debug= 38 

Here is the output of my ipmitool:

ipmitool ... sdr type temperature | grep degrees | grep "3\."
Temp             | 09h | ok  |  3.4 | 38 degrees C
Temp             | 06h | ok  |  3.1 | 39 degrees C
Temp             | 07h | ok  |  3.2 | 38 degrees C
Temp             | 08h | ok  |  3.3 | 37 degrees C 

Perhaps we can filter only the 5th column using cut?

ipmitool ... sdr type temperature | grep degrees | grep "3\." | cut -d'|' -f5 | grep -Po '\d{2}'
39
39
38
38 

@tigerblue77
Copy link
Owner

tigerblue77 commented Dec 1, 2024

Hello @ctark, thanks for your detailed message. #56 should have fixed this issue but I forgot to close it. Can you please test the latest container version and tell me ?

@ctark
Copy link

ctark commented Dec 2, 2024

Hey @tigerblue77,
I still receive the error when using :latest tag.
Unsure if it's because the R930 is quad processor, so the output is slightly different from a dual proc R530/R730?

@ctark
Copy link

ctark commented Dec 2, 2024

So after doing some testing it looks like you should be fine to just use the below. Tested on an R530 and R930 (1 and 4 processors respectively). It should solve your trouble of indexing, and because you are matching the 3.x Sensors, you know they are CPU temp's.

local CPU_DATA=$(echo "$DATA" | grep "3\." | cut -d'|' -f5 | grep -Po '\d{2}')
CPU1_TEMPERATURE=$(echo $CPU_DATA | awk "{print $1}")
CPU2_TEMPERATURE=$(echo $CPU_DATA | awk "{print $2}")

If you had plans to add 4 processor support, you could even make it into a for loop:

local CPU_DATA=$(echo "$DATA" | grep "3\." | cut -d'|' -f5 | grep -Po '\d{2}')
declare -i CPU_NUM=$(echo "$CPU_DATA" | wc -l)  
for ((i=1;i<=CPU_NUM;i++)); do
    declare "CPU"$i"_TEMPERATURE"=$(echo $CPU_DATA | awk "{print \$$i;}")
done

Or instead of CPU_NUM, set the loop between 1 and 4, then you can eval an empty string and set the IS_CPU2_TEMPERATURE_SENSOR_PRESENT to false?

EDIT:
As to the above point, loop through 4 possible temperature readings, compare the value, and set sensor_present depending on value.
Perhaps put this as a separate function that is ran once on script startup, as you already are calling retrieve_temperatures() ?

local CPU_DATA=$(echo "$DATA" | grep "3\." | cut -d'|' -f5 | grep -Po '\d{2}')
declare -gi CPU_NUM=$(echo "$CPU_DATA" | wc -l)  
for ((i=1;i<=4;i++)); do
    local VALUE=$(echo $CPU_DATA | awk "{print \$$i;}")
        if [ -z "$VALUE" ]; then
            echo "No CPU $i temperature sensor detected."
            declare -g "IS_CPU"$i"_TEMPERATURE_SENSOR_PRESENT"=false
        else
            declare "CPU"$i"_TEMPERATURE"=$VALUE
            declare -g "IS_CPU"$i"_TEMPERATURE_SENSOR_PRESENT"=true
        fi
done

@tigerblue77 tigerblue77 changed the title Invalid Octal Number Add 4 CPUs servers support / Invalid Octal Number Dec 29, 2024
@tigerblue77
Copy link
Owner

tigerblue77 commented Dec 29, 2024

Note for this issue : when 4 CPU, use the same IDs as Gen 14 servers :

readonly CPU1_TEMPERATURE_INDEX=2
readonly CPU2_TEMPERATURE_INDEX=4
readonly CPU3_TEMPERATURE_INDEX=6
readonly CPU4_TEMPERATURE_INDEX=8

3 CPU might be possible (tell me if I'm wrong)

@vanvonlj
Copy link
Author

vanvonlj commented Dec 30, 2024

Agreed, I updated the Gen Check Regex to include Gen 13 - All 4 CPU Temperature Readings are available.

# If server model is Gen 14 (*40) or newer
if [[ $SERVER_MODEL =~ .*[RT][[:space:]]?[0-9][3-9]0.* ]]; then

@tigerblue77
Copy link
Owner

tigerblue77 commented Dec 30, 2024

Good morning @vanvonlj,
AFAIK, this easy fix is not the way to go as, if I understood correctly what @7Adrian and @ctark told me yesterday :

  • Gen 13 IDs when 1-2 CPUs are pre-Gen 14 IDs
  • Gen 13 IDs when 3-4 CPUs are Gen 14 plus IDs

Also, when you say "I updated", which code are you about ? Do you have a fork or pull request to mention ? Even if it's some draft code, feel free to propose it 🙂

@ctark
Copy link

ctark commented Dec 30, 2024

Agreed, I updated the Gen Check Regex to include Gen 13 - All 4 CPU Temperature Readings are available.

# If server model is Gen 14 (*40) or newer
if [[ $SERVER_MODEL =~ .*[RT][[:space:]]?[0-9][3-9]0.* ]]; then

Good idea but only issue with this is the R930 is still a 13th gen server, just using a different index value. And below the check is a variable, to state if it’s 14 gen or newer as they removed some functions such as 3rd party pcie slot cooling.

7Adrian@628d40d
There is an idea of how to add an edge case for the 830 and 930 CPU’s and can be expanded if/when we know earlier generations do the same indexing.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants