[Mellanox] [202311] Add CMIS Host Management Files to 'show techsupport' Dumps #5

Open
wants to merge 71 commits into base: master


tshalvi (Owner) commented Aug 28, 2024

What I did

For Mellanox platforms, I added the following CMIS host management-related files to the 'show techsupport' dumps (if they exist): sai.profile, pmon_daemon_control.json, media_settings.json, optics_si_settings.json, and autoneg.status.

How I did it

I copied the relevant files from the SKU/platform folder and ran the 'show interface autoneg status' command to store the auto-negotiation status for all ports.
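The copy step above amounts to "copy each named file into the dump if it exists". Below is a minimal Python sketch of that logic, not the PR's actual implementation. The function name `collect_cmis_host_mgmt_files`, the SKU-before-platform lookup order, and the directory layout are assumptions made for illustration. The `autoneg.status` output is produced by a command rather than copied, so it is not covered here.

```python
import os
import shutil

# File names taken from the PR description; autoneg.status is generated by
# running 'show interface autoneg status', so it is not copied here.
CMIS_HOST_MGMT_FILES = [
    "sai.profile",
    "pmon_daemon_control.json",
    "media_settings.json",
    "optics_si_settings.json",
]


def collect_cmis_host_mgmt_files(search_dirs, dest_dir):
    """Copy each known file, if present, from the first directory that has it.

    search_dirs is assumed to be ordered SKU folder first, then platform
    folder (a hypothetical priority). Returns the names that were copied.
    """
    os.makedirs(dest_dir, exist_ok=True)
    copied = []
    for name in CMIS_HOST_MGMT_FILES:
        for src_dir in search_dirs:
            src = os.path.join(src_dir, name)
            if os.path.isfile(src):
                shutil.copy2(src, os.path.join(dest_dir, name))
                copied.append(name)
                break  # first match wins; skip the remaining directories
    return copied
```

Missing files are skipped silently, which matches the "(if they exist)" wording in the description.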

How to verify it

Run 'show techsupport' and verify that autoneg.status is located in the 'dumps' directory and that the other files are present in the cmis-host-mgmt path within the generated dump.

Previous command output (if the output of a command-line utility has changed)

New command output (if the output of a command-line utility has changed)

rajkumar38 and others added 30 commits December 4, 2023 19:54
* [sflow][db_migrator] Egress Sflow support
Why I did it
Fix issue sonic-net/sonic-buildimage#15047: after deleting a VLAN member and the VLAN, the counters for the VLAN / VLAN member are still seen.

How I did it
Delete related counter entry in state_db when deleting vlan and vlan members.

How to verify it
All UTs passed
Manually tested

Signed-off-by: Yaqiang Zhu <[email protected]>
**HLD:** sonic-net/SONiC#1501

#### What I did
* Implemented CLI for Generic Hash feature

#### How I did it
* Integrated Generic Hash interface into `config` and `show` CLI root

#### How to verify it
* Run Generic Hash CLI UTs

#### Previous command output (if the output of a command-line utility has changed)
```
root@sonic:/home/admin# show switch-hash global
ECMP HASH          LAG HASH
-----------------  -----------------
DST_MAC            DST_MAC
SRC_MAC            SRC_MAC
ETHERTYPE          ETHERTYPE
IP_PROTOCOL        IP_PROTOCOL
DST_IP             DST_IP
SRC_IP             SRC_IP
L4_DST_PORT        L4_DST_PORT
L4_SRC_PORT        L4_SRC_PORT
INNER_DST_MAC      INNER_DST_MAC
INNER_SRC_MAC      INNER_SRC_MAC
INNER_ETHERTYPE    INNER_ETHERTYPE
INNER_IP_PROTOCOL  INNER_IP_PROTOCOL
INNER_DST_IP       INNER_DST_IP
INNER_SRC_IP       INNER_SRC_IP
INNER_L4_DST_PORT  INNER_L4_DST_PORT
INNER_L4_SRC_PORT  INNER_L4_SRC_PORT
```

#### New command output (if the output of a command-line utility has changed)
```
root@sonic:/home/admin# show switch-hash global
+--------+-------------------------------------+
| Hash   | Configuration                       |
+========+=====================================+
| ECMP   | +-------------------+-------------+ |
|        | | Hash Field        | Algorithm   | |
|        | |-------------------+-------------| |
|        | | DST_MAC           | CRC         | |
|        | | SRC_MAC           |             | |
|        | | ETHERTYPE         |             | |
|        | | IP_PROTOCOL       |             | |
|        | | DST_IP            |             | |
|        | | SRC_IP            |             | |
|        | | L4_DST_PORT       |             | |
|        | | L4_SRC_PORT       |             | |
|        | | INNER_DST_MAC     |             | |
|        | | INNER_SRC_MAC     |             | |
|        | | INNER_ETHERTYPE   |             | |
|        | | INNER_IP_PROTOCOL |             | |
|        | | INNER_DST_IP      |             | |
|        | | INNER_SRC_IP      |             | |
|        | | INNER_L4_DST_PORT |             | |
|        | | INNER_L4_SRC_PORT |             | |
|        | +-------------------+-------------+ |
+--------+-------------------------------------+
| LAG    | +-------------------+-------------+ |
|        | | Hash Field        | Algorithm   | |
|        | |-------------------+-------------| |
|        | | DST_MAC           | CRC         | |
|        | | SRC_MAC           |             | |
|        | | ETHERTYPE         |             | |
|        | | IP_PROTOCOL       |             | |
|        | | DST_IP            |             | |
|        | | SRC_IP            |             | |
|        | | L4_DST_PORT       |             | |
|        | | L4_SRC_PORT       |             | |
|        | | INNER_DST_MAC     |             | |
|        | | INNER_SRC_MAC     |             | |
|        | | INNER_ETHERTYPE   |             | |
|        | | INNER_IP_PROTOCOL |             | |
|        | | INNER_DST_IP      |             | |
|        | | INNER_SRC_IP      |             | |
|        | | INNER_L4_DST_PORT |             | |
|        | | INNER_L4_SRC_PORT |             | |
|        | +-------------------+-------------+ |
+--------+-------------------------------------+
```
Depends on PR sonic-net/sonic-buildimage#17458

What I did
Add CLIs to enable/disable containercfgd to optimize warm/fast boot path

How I did it
Add CLIs to enable/disable containercfgd

How to verify it
unit test
manual test
sonic-net#3008) (sonic-net#3073)

* Support reading/writing module EEPROM data by page and offset (sonic-net#3008)
* Support reading/writing module EEPROM data by page and offset
* Revert "[config/show] Add command to control pending FIB suppression (sonic-net#2495)"

This reverts commit 9126e7f.

* Revert "Revert "Revert frr route check (sonic-net#2761)" (sonic-net#2762)"

This reverts commit b4f4e63.
What I did
Need to support golden config in db migrator.

How I did it
If a golden config JSON file exists, read from it instead of minigraph.
db_migrator then uses the golden config data to generate the new configuration.
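The precedence rule above (golden config first, minigraph as fallback) can be sketched as follows. This is an illustrative sketch only: `load_migration_source` is a hypothetical name, the golden config path is passed in rather than hard-coded, and `load_minigraph` is a caller-supplied stand-in for the real minigraph parser.

```python
import json
import os


def load_migration_source(golden_config_path, load_minigraph):
    """Return the config dict that db_migrator should treat as its source.

    If the golden config JSON file exists it takes precedence; otherwise
    fall back to minigraph. load_minigraph is a hypothetical callable
    standing in for the real minigraph parser.
    """
    if os.path.isfile(golden_config_path):
        with open(golden_config_path) as f:
            return json.load(f)
    return load_minigraph()
```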

How to verify it
Run unit test.
What I did
db_migrator failed to initialize SonicDBConfig; this PR fixes the issue.

How I did it
If SonicDBConfig is already initialized, do not invoke initialize() again.
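The guard described above is a check-before-initialize pattern. The sketch below uses a hypothetical stub, `FakeSonicDBConfig`, in place of swsscommon's `SonicDBConfig`, so that it runs without SONiC. The method names `isInit()` and `initialize()` mirror the ones referenced in this description. The assumption that a second `initialize()` call raises is modeled explicitly in the stub.

```python
class FakeSonicDBConfig:
    """Hypothetical stand-in for swsscommon's SonicDBConfig.

    Assumed behaviour: initializing twice raises an error.
    """
    _initialized = False

    @classmethod
    def isInit(cls):
        return cls._initialized

    @classmethod
    def initialize(cls):
        if cls._initialized:
            raise RuntimeError("SonicDBConfig already initialized")
        cls._initialized = True


def ensure_db_config(db_config=FakeSonicDBConfig):
    # Only call initialize() when nothing else in the process has done so.
    if not db_config.isInit():
        db_config.initialize()
```

With the guard in place, calling `ensure_db_config()` repeatedly is safe, while calling `initialize()` directly a second time would still fail.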

How to verify it
Ran unit tests and verified on a DUT.
…tus (sonic-net#3069)

For each BGP status, if the `admin_status` field is not present, then
whether the BGP session is admin up or admin down depends on the default
BGP status (in the `default_bgp_status` field coming from
`init_cfg.json`), which is specified during image build. If the default
BGP status is up, then `admin_status` will be created only when the BGP
session is brought down; similarly, if the default BGP status is down,
then `admin_status` will be created when the BGP session is brought up.

Because of that, modify the script to use the default BGP status as the
initial value.
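The rule above reduces to "use `admin_status` if present, otherwise the build-time default". Below is a minimal sketch under these assumptions: each BGP entry is modeled as a plain dict of its fields, `default_bgp_status` is the `up`/`down` string from `init_cfg.json`, and the function name is hypothetical.

```python
def bgp_session_is_admin_up(entry_fields, default_bgp_status):
    """Return True if a BGP session should be treated as admin up.

    entry_fields: dict of one BGP entry's fields (hypothetical shape).
    default_bgp_status: 'up' or 'down', as set at image build time.
    An explicit admin_status wins; otherwise the default applies.
    """
    status = entry_fields.get("admin_status", default_bgp_status)
    return status == "up"
```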

Signed-off-by: Saikrishna Arcot <[email protected]>
Fix sonic-net/sonic-buildimage#17322
Remove the route migration operation from db_migrator. The route migration takes a long time, as reported in the issue above. It is no longer necessary, because the hardcoded assert on new fields in fpmsyncd was removed in sonic-net/sonic-swss#2981.
…no external neighbors are configured on chassis LC (sonic-net#3099)

Allow `show ip bgp summary` to display without error when no external neighbors are configured on a chassis LC
…atforms (sonic-net#3115)

Disabling the key validation feature in the grub file, as it's not yet supported on Cisco platforms

What I did
Check whether the platform the image is being installed on is a Cisco platform.
If it is, return success so that signature verification is skipped, because this feature is not yet supported on Cisco platforms.
How I did it
Modified sonic-installer grub.py code
Add the core files to the tarball while they are being processed. This ensures that
only one core file at a time consumes flash space inside the tarpath and the
tarball.
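The append-then-delete ordering described above can be sketched with Python's `tarfile` module. This illustrates the ordering only and does not reproduce the script's actual code. The assumptions are that each path in `core_paths` is a staged copy inside the tarpath that is safe to delete, and that the tarball is uncompressed, since `tarfile` append mode only supports plain tar. The `core/` archive prefix is hypothetical.

```python
import os
import tarfile


def append_cores_one_at_a_time(core_paths, tarball_path):
    """Append each staged core file to the tarball, then delete the staged copy.

    Deleting each copy before handling the next one means at most one staged
    core occupies flash next to the tarball at any moment.
    """
    for path in core_paths:
        # Mode "a" appends to an uncompressed tar, creating it if needed.
        with tarfile.open(tarball_path, "a") as tar:
            tar.add(path, arcname=os.path.join("core", os.path.basename(path)))
        os.remove(path)  # free the staged copy before the next core is processed
```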
…client.eth0.pid does not exist" (sonic-net#3149)

* Fix load_mgmt_config not exiting when dhclient.eth0.pid does not exist

Signed-off-by: Mai Bui <[email protected]>

* add UT

Signed-off-by: Mai Bui <[email protected]>

---------

Signed-off-by: Mai Bui <[email protected]>
…decimal (sonic-net#3153) (sonic-net#3160)

* Fix the sfputil treats page number as decimal instead of hexadecimal (sonic-net#22)

Signed-off-by: Kebo Liu <[email protected]>
Co-authored-by: Kebo Liu <[email protected]>
…SIC (sonic-net#3158)

PR sonic-net#3099 fixed the case where a chassis Linecard has no BGP neighbors. However, if the Linecard has neighbors on one ASIC but not on another, `show bgp summary` displayed no neighbors. This PR fixes that.

How I did it
Add check in bgp_util to create empty peer list only once
Add UT to cover this case
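A sketch of the "create the empty peer list only once" idea follows. The bug pattern being avoided is resetting the shared peer container inside the per-ASIC loop, so that an ASIC without neighbors erases peers collected from earlier ASICs. The summary shape (a dict with an optional `peers` dict) and the function name are hypothetical and are not the actual `bgp_util` data structures.

```python
def merge_bgp_summaries(per_asic_summaries):
    """Merge per-ASIC BGP summaries into a single peer dict.

    Each element is a dict that may carry a 'peers' dict (hypothetical
    shape). The empty container is created once, before the loop, so an
    ASIC with no neighbors contributes nothing instead of resetting it.
    """
    merged = {"peers": {}}  # created exactly once
    for summary in per_asic_summaries:
        merged["peers"].update(summary.get("peers", {}))
    return merged
```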
…KUs if the buffer configuration is empty (sonic-net#3114)

### What I did

Do not touch the buffer model on generic SKUs if the buffer configuration is empty.

#### How I did it

Set the buffer model to traditional on generic SKUs in Mellanox db migrator only if the buffer configuration is not default and not empty.
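The decision described above can be sketched as a small pure function. The buffer tables are modeled as plain dicts, and comparing against a rendered default configuration is an assumed simplification of how the migrator detects a default configuration. The function name is hypothetical.

```python
def choose_buffer_model(current_model, buffer_config, default_buffer_config):
    """Return the buffer model a generic SKU should have after migration.

    buffer_config / default_buffer_config: dicts of buffer tables (pools,
    profiles, PGs, queues). Only a non-empty, non-default configuration
    forces 'traditional'; otherwise the existing model (for example
    'dynamic' from init_cfg.json) is left untouched.
    """
    if buffer_config and buffer_config != default_buffer_config:
        return "traditional"
    return current_model
```

An empty configuration, such as the one left by an ONIE install, now keeps the dynamic model instead of forcing a `config reload` later.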

#### How to verify it

Manually and mock test.

### Details ####
Buffer configuration contains two parts:
1. the buffer model in `DEVICE_METADATA|localhost` which is from `init_cfg.json` and can be updated by Mellanox buffer migrator
2. the buffer pools, profiles, PGs, and queues, which are rendered from the buffer templates in `config qos reload`

There was logic in the Mellanox buffer migrator to update the buffer model: if the buffer configuration is not default, the buffer model is set to traditional. However, if a device is installed from ONIE, the buffer configuration is also empty. As a result, the traditional buffer manager starts after the device is installed from ONIE, and the buffer manager must be restarted to switch to the dynamic model. This can be done only by `config reload`.
It didn't matter since it was required to execute `config qos reload` to complete buffer configuration which required `config save` and `config reload` in any case due to issue sonic-net/sonic-buildimage#9088.
Now that the issue has been fixed and `config reload` isn't required anymore to complete `config qos reload`, we should avoid setting the buffer model to traditional in such case, otherwise `config reload` is still required to switch the buffer model.

Verified the following scenarios:
1. non-default configuration generic SKU upgrade from 202305: warm/cold boot: expected: traditional model
2. default configuration generic SKU upgrade from 201911/202305: warm/cold boot: expected: dynamic model
3. install from ONIE: expected: dynamic model
4. MSFT SKU upgrade from 201911 by cold boot/ from 202012 by warm boot: expected: traditional model
…le (sonic-net#3177)

* Retrieve firmware version fields from TRANSCEIVER_FIRMWARE_INFO table

Signed-off-by: Mihir Patel <[email protected]>

* Fixed test failures

* Removed update_firmware_info_to_state_db function

* Revert "Removed update_firmware_info_to_state_db function"

This reverts commit 68f52a2.

---------

Signed-off-by: Mihir Patel <[email protected]>
…ateTask thread (sonic-net#3187)

* CLI to skip polling for periodic information for a port in DomInfoUpdateTask thread

Signed-off-by: Mihir Patel <[email protected]>

* Fixed unit-test failure

* Modified dom_status to dom_polling

* Modified comment for failing the command

---------

Signed-off-by: Mihir Patel <[email protected]>
…ATE_DB is empty (sonic-net#3199)

* Add skip_action_validation option to acl-loader
…ic-net#3148) (sonic-net#3224)

* [show] Update show run all to cover all asic config in masic

* per comment

Co-authored-by: jingwenxie <[email protected]>
The port2alias CLI broke on multi-asic platforms after sonic-net/sonic-buildimage#10960, which removed the initialization of the global DB config from portconfig.py (library side) and expects the application to do it. The application side (port2alias) was not updated accordingly.

How I did it
Add load_db_config call to port2alias for initialization
#### What I did
Add alerting for YANG validation during load_minigraph with override. This alerts early if the golden config is invalid, which would break the GCU feature.
#### How I did it
Add alerting when `is_yang_config_validation_enabled` is not set during load_minigraph with override
#### How to verify it
Unit test
stepanblyschak and others added 29 commits May 14, 2024 04:01
…sonic-net#3240)

* [fast/warm-reboot] Retain TRANSCEIVER_INFO/STATUS tables on reboot

Signed-off-by: Stepan Blyschak <[email protected]>

* Remove TRANSCEIVER_STATUS

---------

Signed-off-by: Stepan Blyschak <[email protected]>
sonic-net#3272)

- What I did
Add support for a new platform x86_64-nvidia_sn5400-r0

- How to verify it
Manual and unit test
…l reboot (sonic-net#3292)

* [chassis][midplane] Add notification to Supervisor when an LC is gracefully rebooted

* Address review comment by adding a log message when failing to create an entry in CHASSIS_STATE_DB

Signed-off-by: mlok <[email protected]>
…-net#3236)

### What I did

Update sonic-utilities to support new SKU Mellanox-SN5600-O128

1. Add the SKU to the generic configuration updater
2. Simplify the logic of the buffer migrator to support the new SKU

### How to verify it

Manual and unit tests
Migrate AAA table in db_migrator

#### Why I did it
    per-command AAA needs to be enabled in the warm-upgrade case

#### How I did it
    Add db_migrator code to migrate AAA table

#### How to verify it
    Passed all test cases.
    Added a new test case.

#### Which release branch to backport (provide reason below if selected)
    N/A

#### Description for the changelog
    Migrate AAA table in db_migrator

#### A picture of a cute animal (not mandatory but encouraged)
…#3296)

Migrate AAA table per-command authorization in db_migrator

#### Why I did it
    per-command AAA needs to be enabled in the warm-upgrade case

#### How I did it
    Add code to migrate per-command authorization

#### How to verify it
    Passed all test cases.
    Added a new test case.

#### Which release branch to backport (provide reason below if selected)
    N/A

#### Description for the changelog
    Migrate AAA table per-command authorization in db_migrator

#### A picture of a cute animal (not mandatory but encouraged)
…onic-net#3305)

- What I did
Added code to remove leftover symlinks and directories created by featured. When a unit is masked, featured creates a symlink to /dev/null and leaves an auto-restart configuration under the corresponding service.d/ directory.
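Below is a hedged Python sketch of the cleanup described above. It assumes the mask symlink is named `<unit>.service` and the drop-in directory `<unit>.service.d` inside a given systemd directory. The exact paths featured uses, and the function name `remove_featured_leftovers`, are assumptions for illustration. The sketch removes the symlink only when it points at /dev/null, so a real unit file is never touched.

```python
import os
import shutil


def remove_featured_leftovers(systemd_dir, unit_name):
    """Remove a /dev/null mask symlink and the drop-in directory for a unit.

    Paths follow an assumed '<unit>.service' / '<unit>.service.d' layout
    under systemd_dir; this is illustrative, not featured's actual code.
    """
    unit_path = os.path.join(systemd_dir, unit_name + ".service")
    # Only delete the symlink created by masking, never a real unit file.
    if os.path.islink(unit_path) and os.readlink(unit_path) == "/dev/null":
        os.remove(unit_path)
    dropin_dir = os.path.join(systemd_dir, unit_name + ".service.d")
    if os.path.isdir(dropin_dir):
        shutil.rmtree(dropin_dir)
```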

- How I did it
Added necessary changes and UT to cover it.

- How to verify it
Uninstall an extension and verify no leftovers from featured.

Signed-off-by: Stepan Blyschak <[email protected]>
…en urllib3 and requests packages (sonic-net#3328) (sonic-net#3337)

* [build] Fix base OS compilation issue caused by incompatibility between urllib3 and requests packages

* [pipeline] Pin requests package to v2.31.0
* Backup STATE_DB PORT_TABLE during warm-reboot

Signed-off-by: Mihir Patel <[email protected]>

* Backing up selected fields from STATE_DB PORT_TABLE|Ethernet* and deleting unwanted fields during warm-reboot

---------

Signed-off-by: Mihir Patel <[email protected]>
- What I did
Change the target path for the SDK Sniffer from "/var/log/mellanox/sniffer/" to "/var/log/sdk_dbg"

- How I did it
Change the default for SDK_SNIFFER_TARGET_PATH

- How to verify it
Run the SDK sniffer and make sure the sniffer output file is kept in the new location
…V256 (sonic-net#3312)

- What I did
Update sonic-utilities to support new SKU Mellanox-SN5600-V256
Add the SKU to the generic configuration updater

- How I did it

- How to verify it
Manual and unit tests
**What I did?**
1. Fix a bug in the console CLI introduced by "[consutil] replace shell=True" (sonic-net#2725), after which `*` was no longer treated as a wildcard.
```
admin@sonic:~$ show line
ls: cannot access '/dev/C0-*': No such file or directory
```
2. Enhance UT to avoid regression mentioned in 1.
3. Fix incorrect statement in UT.
4. Fix critical Flake8 error.
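Without `shell=True`, a `*` in an argument list reaches `ls` literally, which produces the "cannot access '/dev/C0-*'" error shown above. One shell-free way to expand the pattern is Python's `glob` module. The sketch below illustrates that approach and is not necessarily the fix applied in this PR. The `/dev/C0-` prefix comes from the error message, and the function name is hypothetical.

```python
import glob


def list_console_devices(prefix="/dev/C0-"):
    """Return console device paths matching prefix + '*', sorted.

    The wildcard is expanded in Python, so no shell is needed.
    """
    return sorted(glob.glob(prefix + "*"))
```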

**How to verify it**
1. Verified on Nokia-7215 MC0 device.
2. Verified by UT

Sign-Off By: Zhijian Li <[email protected]>
In the previous commit with hash a3cf5c that aimed to address the issue
where sfputil incorrectly interpreted page numbers as decimal instead of
hexadecimal, there was an inadvertent double conversion from hexadecimal
to decimal. For instance, inputting 11 resulted in conversion to 17 and
then further to 23. To rectify this, the second conversion is removed.

A related UT has also been added.
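The arithmetic above is easy to reproduce: `int("11", 16)` is 17, and reinterpreting the decimal string "17" as hexadecimal gives 23. Below is a minimal sketch contrasting the single conversion with the double conversion. The function names are hypothetical and do not come from sfputil.

```python
def parse_page(page_str):
    """Parse a page number given as hexadecimal, converting exactly once."""
    return int(page_str, 16)  # '11' -> 17; a '0x' prefix is also accepted


def parse_page_double_converted(page_str):
    """Reproduce the bug: the hex result is stringified and parsed as hex again."""
    return int(str(int(page_str, 16)), 16)  # '11' -> 17 -> 23
```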

Signed-off-by: Yuanzhe, Liu <[email protected]>
… (sonic-net#3372)

* Improve load_minigraph to wait for eth0 to restart before exiting
- What I did
Back up the DB after syncd and swss are stopped. I observed an issue with fast-reboot where, in rare circumstances, a thread inside syncd might write a queued FDB event to ASIC_DB after FLUSHDB ASIC_DB had been called.
That left ASIC_DB only with one record about that FDB entry and caused syncd to crash at start:

Mar 15 13:28:42.765108 sonic NOTICE syncd#SAI: :- Syncd: syncd started
Mar 15 13:28:42.765268 sonic NOTICE syncd#SAI: :- onSyncdStart: performing hard reinit since COLD start was performed
Mar 15 13:28:42.765451 sonic NOTICE syncd#SAI: :- readAsicState: loaded 1 switches
Mar 15 13:28:42.765465 sonic NOTICE syncd#SAI: :- readAsicState: switch VID: oid:0x21000000000000
Mar 15 13:28:42.765465 sonic NOTICE syncd#SAI: :- readAsicState: read asic state took 0.000205 sec
Mar 15 13:28:42.766364 sonic NOTICE syncd#SAI: :- onSyncdStart: on syncd start took 0.001097 sec
Mar 15 13:28:42.766376 sonic ERR syncd#SAI: :- run: Runtime error during syncd init: map::at
Mar 15 13:28:42.766376 sonic NOTICE syncd#SAI: :- sendShutdownRequest: sending switch_shutdown_request notification to OA for switch: oid:0x0
Mar 15 13:28:42.766518 sonic NOTICE syncd#SAI: :- sendShutdownRequestAfterException: notification send successfully

- How I did it
Backup DB after syncd/swss have stopped.

- How to verify it
Run fast-reboot.

Signed-off-by: Stepan Blyschak <[email protected]>
* [pbh]: Fix show PBH counters when cache is partial.

Signed-off-by: Nazarii Hnydyn <[email protected]>
* [DPB]Fix return code in case of failure

* Updating UT
What I did
Show techsupport is designed to collect logs and core files since a given date.
I found that some core files were missing when the given date is relative, for example "5 minutes ago".
Microsoft ADO: 28737486

How I did it
Create the reference file at the start of the script, and don't update it in find_files.
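The fix freezes the cutoff once, when the script starts, instead of re-evaluating a relative date on every search. Below is a Python sketch of the same idea using a timestamp rather than a reference file. The function names are hypothetical, and the `now` parameter exists only to make the example deterministic.

```python
import os
import time


def make_since_timestamp(minutes_ago, now=None):
    """Resolve a relative 'N minutes ago' into an absolute epoch time, once."""
    now = time.time() if now is None else now
    return now - minutes_ago * 60


def find_files_since(paths, since_epoch):
    """Return the paths whose mtime is at or after the frozen cutoff."""
    return [p for p in paths if os.path.getmtime(p) >= since_epoch]
```

Because every search reuses the same `since_epoch`, the window cannot drift forward while earlier collection steps are still running.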

How to verify it
Run end to end test: show_techsupport/test_auto_techsupport.py
…cs (sonic-net#3448)

What I did
Due to a conflict while cherry-picking PR#3369 to branch 202311, this pull request was re-created so that it could be merged manually.

Add a debug group and a sub-command loopback under the sfputil command for debugging and module diagnostic purposes.

How I did it
Implement the loopback command by directly calling the set_loopback_mode() API.

How to verify it
Tested under Cisco8111 with Credo C1 cable.

Turn off loopback mode
sfputil debug loopback Ethernet88 none
Turn on host input loopback
sfputil debug loopback Ethernet88 host-side-input

MSFT ADO: 26677525

Signed-off-by: xinyu <[email protected]>
)

#### What I did
If reading the EEPROM fails while executing `decode-syseeprom`, the script catches the exception and logs the error. However, `log` was not defined in `decode-syseeprom`, so the error path itself raised:
```
Traceback (most recent call last):
  File "/usr/local/bin/decode-syseeprom", line 264, in <module>
    sys.exit(main())
             ^^^^^^
  File "/usr/local/bin/decode-syseeprom", line 246, in main
    print_serial(use_db)
  File "/usr/local/bin/decode-syseeprom", line 171, in print_serial
    eeprom = instantiate_eeprom_object()
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/local/bin/decode-syseeprom", line 36, in instantiate_eeprom_object
    log.log_error('Failed to obtain EEPROM object due to {}'.format(repr(e)))
    ^^^
NameError: name 'log' is not defined
```
In this PR, I add the definition of log to avoid such error. 

#### How I did it
Add the definition of log. 
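The traceback above shows `log` being used inside the exception handler without ever having been bound. Below is a minimal sketch of the fix pattern: define the logger at module scope before any function uses it. The sketch uses the standard `logging` module as a stand-in, because the logger class the real script uses is not shown here. The `factory` parameter is a hypothetical hook for the EEPROM constructor.

```python
import logging

# Bound at module scope so every function, including exception handlers,
# can use it. stdlib logging stands in for the script's actual logger.
log = logging.getLogger("decode-syseeprom")


def instantiate_eeprom_object(factory):
    """Return factory(), or None after logging the error if it raises."""
    try:
        return factory()
    except Exception as e:
        log.error("Failed to obtain EEPROM object due to {}".format(repr(e)))
        return None
```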

#### How to verify it
```
admin@vlab-01:~$ sudo decode-syseeprom -s                
Failed to read system EEPROM info
```
…nd used save_cmd() instead of running the command directly and manually storing the output in a file
tshalvi force-pushed the 202311_adding_cmis_host_mgmt_files_to_show_techsupport branch from 4893f69 to 6a27b47 on August 28, 2024 08:19