[sonic-mgmt][dualtor-aa] Fix fdb/test_fdb_mac_learning.py failures #15675

Open
wants to merge 2 commits into
base: master

vkjammala-arista
Contributor

@vkjammala-arista vkjammala-arista commented Nov 21, 2024

Description of PR

Summary: [dualtor-aa] Fix "fdb/test_fdb_mac_learning.py" failures
Fixes # https://github.com/aristanetworks/sonic-qual.msft/issues/329

Type of change

  • Bug fix
  • Testbed and Framework(new/improvement)
  • Test case(new/improvement)

Back port request

  • 202012
  • 202205
  • 202305
  • 202311
  • 202405

Approach

What is the motivation for this PR?

The test currently fails on dualtor-aa topologies because:

  1. Packets sometimes go to the unselected DUT (because of the active-active topology), which leads to MAC learning failures.

  2. After the interfaces are brought up (from the shutdown state), there is a time.sleep of 30 seconds, which does not seem to be enough for the muxcable status on the duthost to become consistent with the mux server_status (SERVER_STATUS is shown as unknown below). We need to wait for SERVER_STATUS to match the STATUS field before MAC learning can happen.

PORT       STATUS    SERVER_STATUS    HEALTH     HWSTATUS      LAST_SWITCHOVER_TIME
---------  --------  ---------------  ---------  ------------  ----------------------
Ethernet0  active    unknown          unhealthy  inconsistent
  3. Because the test brings down all the interfaces (including portchannels), the error "ERR swss#tunnel_packet_handler.py: All portchannels failed to come up within 3 minutes, exiting." is logged during the test and causes a test failure (log_analyzer flags it).
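A minimal sketch of how the expected error line from the last item could be filtered out of a list of log lines. This is not the PR's code and does not use the log analyzer's own API; `IGNORE_PATTERNS` and `unexpected_lines` are hypothetical names, and the second sample line is made up for illustration.

```python
import re

# Pattern for the error that is expected while the test keeps all
# portchannels down. The message text is taken from the PR description.
IGNORE_PATTERNS = [
    re.compile(
        r"ERR swss#tunnel_packet_handler\.py: "
        r"All portchannels failed to come up within 3 minutes, exiting"
    ),
]


def unexpected_lines(lines, ignore_patterns=IGNORE_PATTERNS):
    """Return the lines that do not match any ignore pattern (hypothetical helper)."""
    return [
        line for line in lines
        if not any(pattern.search(line) for pattern in ignore_patterns)
    ]


sample_lines = [
    "ERR swss#tunnel_packet_handler.py: All portchannels failed to come up within 3 minutes, exiting.",
    "ERR swss#example: some other failure",  # made-up line, for illustration only
]
remaining = unexpected_lines(sample_lines)
```

Only the made-up second line is left in `remaining`, so an unrelated error would still be reported.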

How did you do it?

  1. Add a fixture to set up the topology in active-standby mode. This is needed to make sure packets go to the selected DUT (so that MAC
    learning happens correctly).
  2. Introduce logic to wait for the mux status to become consistent before sending traffic (instead of relying on a time.sleep delay).
  3. Ignore the "All port channels failed to come up ..." syslog message, which is expected because the test brings down all the
    portchannels.
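The wait in step 2 can be sketched roughly as below. This is an illustration only, not the implementation in this PR: it assumes the mux status is available as text shaped like the table in the motivation section, and `parse_mux_status`, `mux_status_consistent` and `poll_until` are hypothetical stand-ins (the real test uses `wait_until` and `check_mux_status_consistency`, whose bodies are not shown here).

```python
import time

# Sample text copied from the table in the PR description.
SAMPLE = """\
PORT       STATUS    SERVER_STATUS    HEALTH     HWSTATUS      LAST_SWITCHOVER_TIME
---------  --------  ---------------  ---------  ------------  ----------------------
Ethernet0  active    unknown          unhealthy  inconsistent
"""


def parse_mux_status(text):
    """Return {port: (status, server_status)} parsed from the table text."""
    result = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines, the header row and the dashed separator row.
        if len(fields) < 3 or fields[0] == "PORT" or set(fields[0]) == {"-"}:
            continue
        result[fields[0]] = (fields[1], fields[2])
    return result


def mux_status_consistent(text, ports):
    """True when STATUS equals SERVER_STATUS for every port in `ports`."""
    table = parse_mux_status(text)
    for port in ports:
        if port not in table:
            return False
        status, server_status = table[port]
        if status != server_status:
            return False
    return True


def poll_until(timeout, interval, condition, *args):
    """Call condition(*args) every `interval` seconds until it is true or `timeout` elapses."""
    deadline = time.monotonic() + timeout
    while True:
        if condition(*args):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

With the sample table, `mux_status_consistent(SAMPLE, ["Ethernet0"])` is false because SERVER_STATUS is still unknown; polling a check like this replaces the fixed 30-second sleep with a bounded wait that ends as soon as the two fields match.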

How did you verify/test it?

Stressed the test on the Arista-7260CX3-D108C8 platform with dualtor-aa[-56] deployed; the test passes.

Any platform specific information?

Supported testbed topology if it's a new test case?

Documentation

1) Add fixture to setup topo in active-standby mode. This is needed to
   make sure packets goto selected dut (for mac learning to happen
   correctly).
2) Introduce logic to wait for mux status to become consistent before
   sending traffic (instead of relying on time.sleep delay).
3) Ignoring "...All port channels failed to come up within 3 minutes"
   syslog, as test is bringing down portchannels and restores them at
   the end.
@mssonicbld
Collaborator

The pre-commit check detected issues in the files touched by this pull request.
The pre-commit check is a mandatory check, please fix detected issues.

Detailed pre-commit check results:
trim trailing whitespace.................................................Passed
fix end of files.........................................................Passed
check yaml...........................................(no files to check)Skipped
check for added large files..............................................Passed
check python ast.........................................................Passed
flake8...................................................................Failed
- hook id: flake8
- exit code: 1

tests/fdb/test_fdb_mac_learning.py:17:1: E302 expected 2 blank lines, found 1
tests/fdb/test_fdb_mac_learning.py:29:1: E302 expected 2 blank lines, found 1
tests/fdb/test_fdb_mac_learning.py:195:43: E225 missing whitespace around operator
tests/fdb/test_fdb_mac_learning.py:235:121: E501 line too long (128 > 120 characters)

flake8...............................................(no files to check)Skipped
check conditional mark sort..........................(no files to check)Skipped

To run the pre-commit checks locally, you can follow below steps:

  1. Ensure that default python is python3. In sonic-mgmt docker container, default python is python2. You can run
    the check by activating the python3 virtual environment in sonic-mgmt docker container or outside of sonic-mgmt
    docker container.
  2. Ensure that the pre-commit package is installed:
    sudo pip install pre-commit
  3. Go to repository root folder
  4. Install the pre-commit hooks:
    pre-commit install
  5. Use pre-commit to check staged file:
    pre-commit
  6. Alternatively, you can check committed files using:
    pre-commit run --from-ref <commit_id> --to-ref <commit_id>

time.sleep(30)
target_ports = [target_ports_to_ptf_mapping[0][0]]
duthost.shell("sudo config interface startup {}".format(target_ports[0]))
pytest_assert(wait_until(150, 5, 0, self.check_mux_status_consistency, duthost, target_ports))
Contributor

@lolyu lolyu Nov 22, 2024

do we need to check if this is dualtor testbed first? What if this is a t0?

Contributor Author

Thanks @lolyu for catching this, yeah for t0 mux status is irrelevant (as muxcable is specific to dualtor), will update check_mux_status_consistency method to handle this case.
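One possible shape for that update, as a rough sketch only: it assumes the topology name contains "dualtor" on dualtor testbeds, and the function signature and the `get_mux_rows` callable are hypothetical, not the actual method in the test.

```python
def mux_check_needed(topo_name):
    """Assumption: dualtor testbeds have 'dualtor' in their topology name."""
    return "dualtor" in topo_name


def check_mux_status_consistency(topo_name, get_mux_rows, ports):
    """Return True on non-dualtor testbeds, else compare STATUS and SERVER_STATUS.

    `get_mux_rows` is a hypothetical callable returning
    {port: {"STATUS": ..., "SERVER_STATUS": ...}}.
    """
    if not mux_check_needed(topo_name):
        # Muxcable is specific to dualtor, so there is nothing to wait for on t0.
        return True
    rows = get_mux_rows()
    return all(
        port in rows and rows[port]["STATUS"] == rows[port]["SERVER_STATUS"]
        for port in ports
    )
```

On a t0 testbed the mux status is never queried, so the surrounding wait returns on its first poll.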
