Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

next: Changes queued for the v2016.07 release #956

Merged
merged 144 commits into from
Jul 13, 2016
Merged

next: Changes queued for the v2016.07 release #956

merged 144 commits into from
Jul 13, 2016

Conversation

lukego
Copy link
Member

@lukego lukego commented Jun 29, 2016

Changes queued up for v2016.07.

Fixes needed:

  • Selftest in core.timer is printing tens of thousands of debug messages. Have to account for why this has started happening and likely remove some debug printouts (seems excessive to print a message on every timer invocation even when developer_debug is enabled... could be a use case for timeline instead soon).
  • NFV selftest is failing CI. Could be related to migration of davos from Ubuntu to NixOS that is ongoing. @eugeneia?
  • Performance regression detected by Hydra tests developed over on NFV appears to allocate #951. This benchmark is automatically executing as the issue-951 jobset. Results can be found by clicking an Evaluation (i.e. test run) and locating the benchmark-report job.

lukego and others added 30 commits February 22, 2016 13:35
Set the shared memory path (shm.path) to a private namespace for each
app with prefix "app/$name".

This means that apps can create shm objects such as counters and by
default these will appear in a local namespace for that app.
I rewrote the git-workflow explanation with the goal of briefly but
clearly explaining both how to submit a change to Snabb Switch and
also how to be a subsystem maintainer.

This is intended to be a chapter in the manual.
Contains "XXX rewrite" where old sections have been removed but
replacements not written yet.
This file now contains a diagram so it needs to have separate .src.md
and .md versions. This can be cleaned up when these changes eventually
merge with the on-demand markdown (#829).
also stopped capitalizing "Pull Request" because it seemed awkward.
 - Use "apps/" instead of "app/" for uniformity
 - Set shm path to "apps/$name" when calling `app:stop' too
 - Unlink "apps/$name" after `app:stop' using `shm.unlink'
 - Add a test case to core.app selftest
Resolved conflicts with incoming changes from master branch:

*.src.md replaced with simply *.md

"Snabb Switch" being renamed to "Snabb"
Ditaa images are not kept in tree anymore...
option and support injecting a function to determine the
current time.
MRG_RXBUF is enabled by default in the beginning, and QEMU will initially
negotiate a feature set with Snabb NFV that includes MRG_RXBUF.  This
adds a field onto the virtio header, in legacy mode.  However if we
later negotiate to not have MRG_RXBUF, we need to re-set this value to
not have the extra fields.

(cherry picked from commit df8e83c)
@lukego
Copy link
Member Author

lukego commented Jul 1, 2016

I cherry-picked the virtio-net fix df8e83c from lwaftr repo for renegotiating virtio-net options with guests that switch drivers while running.

…dicate."

Reason: its actually slower than the initial naive version.

This reverts commit c186591.

# Conflicts:
#	src/apps/vhost/vhost_user.lua
@eugeneia
Copy link
Member

eugeneia commented Jul 4, 2016

@lukego I have tracked down the regression to the statistics counters in vhost_user. I pushed e3fcbbf to #931 which reverts a premature optimization I did, and this removes half of the slowdown. That leaves us with a 2.5% regression. Now looking into what can be done about the remaining slowdown.

Edit: well “premature optimization” is the wrong term here really as I tried to outsmart the compiler and failed.

@eugeneia
Copy link
Member

eugeneia commented Jul 4, 2016

Profiling tells me that 10% of time is spent in counter.add, and does not highlight vhost_user in particular. Theory: more counters → more work. Possible solutions I can think of:

  • optimize counter.add
  • update vhost_user to invoke counter.add less, e.g. aggregate updates like intel10g does

@eugeneia
Copy link
Member

eugeneia commented Jul 4, 2016

Alternatively we could omit counting successfully processed rx/tx packets in vhost_user, and rely on the existing link statistics to reflect these. (rx/tx packes/bytes app-counters are mostly redundant with respect to link stats.)

@lukego
Copy link
Member Author

lukego commented Jul 5, 2016

@eugeneia Looking at a new Hydra run it looks like the same performance regression is there on both next and your statistics-superset branch. If you push new code to your branch it should automatically rerun. Hopefully you can find results by clicking around the snabb-new-tests project on Hydra. Currently we have two relevant CI jobs, regression-min-2016.07 running a quick 5-iteration benchmark and regression-2016.07 running a thorough 30-iteration benchmark.

@domenkozar I would like to generalize these CI jobs. Instead of specifically testing for a performance regression on the 2016.07 release they could be continuously showing the comparative performance of a collection of branches: master, next, lukego/optimize, eugeneia/optimize, wingo/optimize, kbara/optimize, etc. That way everybody can always interact with the CI simply by pushing to their optimization branch and waiting to see the result. Just a matter of renaming the jobsets and locking in the branch names?

Meta: I reckon that all "out of band" information we are discussing should be brought "in band" somehow. For example, if profiler information is required for interpreting results then the CI tests should include runs that are profiled and publish the reports. This way we are all looking at the same information and can see exactly how it was produced.

@lukego
Copy link
Member Author

lukego commented Jul 5, 2016

@domenkozar This is really cool testing to have up and running, btw! I am especially impressed that Hydra is smart enough to reuse all previous results that are still valid. So if one branch is updated then its benchmarks will be rerun but the results for the unmodified branches will be reused. This seems extremely efficient and means we can include a lot of branches in the benchmark campaigns.

@domenkozar
Copy link
Member

domenkozar commented Jul 5, 2016

@lukego I've renamed the CI jobs to next-regression-benchmarks and next-regression-benchmarks-small: https://hydra.snabb.co/project/snabb-new-tests

Once /optimize branches exist we can also change those :)

By "profiler information" do you mean we should write those tests and include them? I'm +1 on that, let's add more regressions tests.

The counter value for the interface speed should be in units
of bps.  The ifSpeed SNMP object must obey RFC3635 sec. 3.2.8.
@domenkozar
Copy link
Member

domenkozar commented Jul 5, 2016

Looks like SnabbBot errored out due to snabblab/snabblab-nixos#52

@eugeneia eugeneia mentioned this pull request Jul 8, 2016
@eugeneia eugeneia merged commit 3e1f4dc into master Jul 13, 2016
eugeneia added a commit that referenced this pull request Jul 13, 2016
dpino pushed a commit to dpino/snabb that referenced this pull request Sep 25, 2017
Allow multiple CPUs via --cpu argument
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

7 participants