next: Collected changes for the v2016.03 release #778
Conversation
* lib/luajit/src/lj_record.c (lj_record_tailcall): Tail calls don't contribute to loop unroll limit.
* src/Makefile (MDSRC): Rename from RMSRC, and search for any .md.src file. Update users.
  (MDOBJS): Rename from RMOBJS.
… functions
* src/lib/ctable.lua (make_equal_fn): Add specialized implementations for 2-byte and 6-byte keys as well.
  (hash_32): Rename from hash_i32, as it works on any 32-bit number.
  (hashv_32, hashv_48, hashv_64): New functions.
  (selftest): Update to test the new equal functions.
* src/doc/genbook.sh: Add section for specialized data structures.
* src/lib/README.ctable.md: New file.
Thanks to Kristian Larsson for catching it.
Convert all shell scripts from path-dependent shebangs like `#!/bin/bash` to `#!/usr/bin/env bash`. The immediate utility of this change is to fix annoying breakage on NixOS: for example, "make doc/snabbswitch.html" now works. The exception is that I did not descend into the subdirectories of dependencies, e.g. ljsyscall and pflua. I made the changes with an Emacs macro in the hope of avoiding an unfortunate typo that would break something.
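A conversion like this can also be scripted. Here is a minimal sketch of the idea (illustrative Python, not the Emacs macro actually used; the skip list for vendored dependencies is an assumption based on the description above):

```python
import os

OLD = "#!/bin/bash\n"
NEW = "#!/usr/bin/env bash\n"
SKIP = {"ljsyscall", "pflua"}  # assumed vendored dependencies, left untouched

def fix_shebangs(root):
    """Rewrite path-dependent bash shebangs under `root` in place."""
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories so os.walk never descends into them.
        dirnames[:] = [d for d in dirnames if d not in SKIP]
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                with open(path, "r", encoding="utf-8") as f:
                    lines = f.readlines()
            except (UnicodeDecodeError, OSError):
                continue  # skip binaries and unreadable files
            if lines and lines[0] == OLD:
                lines[0] = NEW
                with open(path, "w", encoding="utf-8") as f:
                    f.writelines(lines)
```

Only files whose first line is exactly `#!/bin/bash` are touched, which keeps the change as mechanical as the macro-based approach.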
Streamline the core.memory API:
- Rewrite the blurb.
- Keep dma_alloc() and virtual_to_physical().
- Remove dma_min_addr and dma_max_addr (unused).
- Deprecate huge_page_size (used, but should it be?)

The backstory here is that the core.memory module has exported low-level information to other modules: the VFIO module (since removed) that needed to whitelist all DMA memory with the kernel, a /dev/vhost-net accelerator for tap devices (since removed) that also needed to whitelist all packet memory, and to a lesser extent the vhost-user implementation (which is still alive and well but may be overusing the memory API, e.g. in dead code that should be removed). Adding the deprecation note on "huge_page_size" may or may not be a good idea. I have to look more closely at the usage and clarify with a later commit.
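For context on what virtual_to_physical() has to do, the underlying Linux mechanism is the /proc/self/pagemap interface: each virtual page maps to one 64-bit entry carrying its physical frame number. This is an illustrative sketch of that mechanism in Python, not Snabb's actual Lua/FFI code; it assumes standard 4 KiB pages, and reading PFNs typically requires root on modern kernels.

```python
import struct

PAGE_SIZE = 4096  # assumption: standard 4 KiB pages (hugepages differ)

def pagemap_offset(vaddr, page_size=PAGE_SIZE):
    """Byte offset of a virtual address's 64-bit entry in pagemap."""
    return (vaddr // page_size) * 8

def virtual_to_physical(vaddr, page_size=PAGE_SIZE):
    """Translate a virtual address of the current process, or return
    None if the page is not present (or the PFN is hidden)."""
    with open("/proc/self/pagemap", "rb") as f:
        f.seek(pagemap_offset(vaddr, page_size))
        (entry,) = struct.unpack("<Q", f.read(8))
    if not (entry >> 63) & 1:
        return None                    # bit 63: page present in RAM
    pfn = entry & ((1 << 55) - 1)      # bits 0-54: physical frame number
    return pfn * page_size + (vaddr % page_size)
```

Keeping only this translation (plus dma_alloc) in the public API, as the commit does, hides the whitelisting-era details from other modules.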
Remove the variable 'pagebits' and the function 'VirtioNetDevice:translate_physical_addr()'. These are not used and they depend on parts of the core.memory API that I want to remove.
This reverts commit 9666348.
Thanks to Max Rottenkolber for review.
Rework the parsing of the Snabb command line to allow us to ship a nicely named snabb-lwaftr binary while mapping it internally to the also nicely named 'programs.lwaftr' module tree. This gets rid of the nasty "snabb" name duplication in the module tree.
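The mapping described here can be sketched as a pure string transformation (illustrative Python, not the actual Lua implementation; the `snabb-` prefix convention and the fallback behavior are assumptions drawn from the description above):

```python
import os

def program_module(argv0):
    """Map an executable name like 'snabb-lwaftr' to a module path
    like 'programs.lwaftr'. A plain 'snabb' invocation would instead
    take the program name from its first command-line argument."""
    name = os.path.basename(argv0)
    if name.startswith("snabb-"):
        return "programs." + name[len("snabb-"):]
    return None  # not a program-specific binary name
```

This way the binary's friendly name and the module tree's internal name are derived from one another rather than duplicated.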
This is intended to make GitHub match the ".md" suffix and treat the files as Markdown. This should enable better support in the web interface, e.g. showing diffs with the rendered Markdown. Link to the relevant GitHub feature: https://help.github.com/articles/rendering-differences-in-prose-documents/
syscall.util.if_nametoindex() only needs to be passed the name of the interface. The version of the function that accepts an open socket as a second argument is internal to ljsyscall and not even exported to client code. Also, this moves the call to syscall.util.if_nametoindex() to the top of RawSocket:new() to simplify error handling: this way, one less explicit call to syscall.close() is needed in case of errors.
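The error-handling simplification is the general "fail before allocating" pattern. A hedged sketch in Python (socket.if_nametoindex plays the role of the ljsyscall function here; AF_PACKET raw sockets are Linux-specific and need CAP_NET_RAW, and this is not the actual RawSocket code):

```python
import socket

def raw_socket_new(ifname):
    # Resolve the interface index first: a bad name fails here,
    # while no socket exists yet, so nothing needs closing.
    ifindex = socket.if_nametoindex(ifname)   # raises OSError on a bad name
    sock = socket.socket(socket.AF_PACKET, socket.SOCK_RAW)
    try:
        sock.bind((ifname, 0))
        return sock, ifindex
    except OSError:
        sock.close()   # the one remaining cleanup path
        raise
```

Moving the fallible, resource-free call to the top means only one acquire/release pair needs protecting.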
…nametoindex) into max-next
…) into max-next
@eugeneia I'll take a look at this tomorrow
@domenkozar You would need to add a NixOS package for https://github.com/eugeneia/snabbswitch-docker (see
Thus spake the wise man:
I have taken @sleinen's bait and started learning R for making sense of test results. Bear with me, I am surely doing it all wrong. Here is a first visualization of around 200 test results from

This seems to be telling us an encouraging message: that 256-byte packets consistently hit line rate (~4.4 Mpps), that 128- and 64-byte packets both exceed this, that variance is low, and that master and next are much the same. If this is indeed the message, then this seems like a valuable kind of test to run (particularly on release candidates). There is one outlier with next and 64-byte packets: it would be interesting to make that dot into a hyperlink to the relevant log file (which I know can be done in R using SVG output). I also found the function

If I am reading this correctly (massive "if") then it is saying that different packet sizes have a major impact on performance (as expected) but different branches do not (as desired, i.e. the same performance from head and next). I like the fact that with R this is all scripted, so analysis like this could easily be integrated into the CI. End braindump. It may be that we look back on this post and chuckle at how I managed to completely misinterpret every aspect of R on my first day trying to use it. But that is then and this is now, so I am feeling quite proud for the moment :).
(The cool thing about the analysis of variance is that it can tell you both how much effect a variable has and how likely that is to really be caused by random background noise. I am really keen to explore looking for relationships between multiple factors in the context of #755, e.g. how sensitive memcpy/checksum are to alignment and how much that depends on which level of the cache hierarchy they are hitting.)

cc @domenkozar: it could be that we have Hydra run these kinds of analyses and make the results available for every PR.

Here is a picture of the same test above but with the DPDK 2.1 version of l2fwd: On the one hand it is good for the

I am excited to press ahead with parallel efforts of measurement and optimization. These are both really valuable and mutually beneficial activities. There is a lot that can be done in both areas and for more applications. (I am focusing on the NFV application here only because it is the one that has upstream performance tests available; going forward I see this activity as encompassing all programs.) Onward! One step at a time...

@mwiget mentioned seeing lower performance (~10%) when leaving an application idle for a while before starting to process traffic. This is likely some JIT behavior that we need to understand and correct (following @wingo's fine example with #707). The way forward that I see would be to define a test case where the waiting time before processing traffic can be controlled, so that we can measure its impact and see which code changes successfully reduce variance. This is all extremely interesting from my perspective :-)

I can believe that. I feel like sometimes if we start taking too many side traces, LuaJIT doesn't react appropriately, and if you do a jit.flush() things get back to good performance. Something to look at in the future.
Hey wow cool awesome! I asked R to do an analysis of variance with the branch, the packet size, and the container (DPDK version) and here is what it says:
and if I am interpreting this correctly then it is really providing valuable information that is consistent with the graphs above:
This starts to look like exactly the thing I was looking for in #755 as a means to search for interactions between data alignment (src/dst/len) and memory hierarchy for different functions. And we already have the data in a CSV file to try this out with!
Charging ahead with R and surely making a generous helping of mistakes... Post hoc analysis with Tukey's test is another way to use the ANOVA results. This can be easier to interpret because it tells us how much the average changes between groups of results: the best estimate and the +/- range that can be predicted with 95% confidence. Here is what Tukey's test says about the effect on Mpps of branch
This says that the estimated difference is 0.0083 Mpps (i.e. very small) and we predict with 95% confidence that if we reran the test the effect would be within -0.177 and +0.194 of that value (i.e. it is not clear that there is a positive or negative difference overall). Here is the same analysis, but instead comparing the DPDK/L2FWD version in the guest:
This estimates that DPDK 1.7 shows +1.64 Mpps in this dataset and we can say with 95% confidence that the real difference is between +1.55 and +1.74 Mpps. Cool information! Just for another angle, maybe pointlessly fancy, we can also look at the effect of branch and DPDK version in combination:
which makes it look like the effects are independent: the same results with the same DPDK version, and a consistent 1.6 Mpps boost with DPDK 1.7 vs 2.1.
(FYI: My mum tells me that this use and interpretation of Tukey's test is right.)
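The numbers Tukey's test reports — a best-estimate mean difference plus a 95% interval — can be sketched with a plain two-sample comparison (illustrative Python with made-up Mpps samples, using a normal approximation rather than Tukey's studentized range, so the intervals are only analogous to R's TukeyHSD output):

```python
from statistics import mean, stdev, NormalDist

def diff_with_ci(a, b, confidence=0.95):
    """Estimate mean(b) - mean(a) with an approximate confidence interval."""
    diff = mean(b) - mean(a)
    # Standard error of the difference of two independent sample means.
    se = (stdev(a) ** 2 / len(a) + stdev(b) ** 2 / len(b)) ** 0.5
    z = NormalDist().inv_cdf(0.5 + confidence / 2)  # ~1.96 for 95%
    return diff, diff - z * se, diff + z * se

# Made-up Mpps samples for two branches (not the real dataset):
master = [4.40, 4.41, 4.39, 4.42]
next_  = [4.41, 4.40, 4.42, 4.41]
d, lo, hi = diff_with_ci(master, next_)
# If the interval [lo, hi] straddles zero, the data do not show a clear
# difference between the branches -- the branch comparison above.
```

When the interval excludes zero, as in the DPDK 1.7 vs 2.1 comparison, the direction of the effect is credible rather than noise.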
This file contains only the documentation for the ctable module and not the whole lib/ API so README.md was a misnomer.
Extracted the documentation from the pmu.lua source file into a markdown-formatted README.
Extracted the documentation for the checksum module from the source code and into a separate markdown-formatted README file.
@eugeneia ship'able from my perspective.
Require link and engine
I am declaring that `next` is now a release candidate for the v2016.03 release. I plan to focus on testing and stability and have it in good shape on March 1st for potential merge by @eugeneia onto master. I leave the `[wip]` tag for the next few days so that we have a chance to test.