CP-52226: Implement XS_DIRECTORY_PART call #2
Conversation
CP-52226 - oxenstored: Implement partial directory call for dom0 with per-connection cache

Implements XS_DIRECTORY_PART (originally specified in https://lore.kernel.org/xen-devel/[email protected]/). Like in CXenstored, it is limited to dom0. It is intended to list directories larger than the size of a single payload (4096 bytes) over several calls (not necessarily inside a single transaction).

To alleviate the impact of the differences in how a node's children are represented between the C and OCaml Xenstoreds, this implements a per-connection cache that allows the string representation of a node's children to be generated once and reused later, as long as the underlying list of children doesn't change (and the connection is reused).

Whereas CXenstored uses a global generation count, modified at transaction creation and on node changes, the OCaml version relies on comparisons of immutable trees. A per-node generation count is instead faked in the connection cache - it lets clients know that the underlying information has changed and that they should restart the sequence of partial directory calls to get a consistent picture. (While the generation count returned by CXenstored is globally unique, that is not useful for this particular function, since the generation count is only compared against the results of previous invocations of partial directory calls on the same node.)

The cache is refreshed by OXenstored itself and is limited to 8 entries (by default, configurable).

A direct translation of the C code when it comes to string handling and allocation.

Signed-off-by: Andrii Sultanov <[email protected]>
CP-52226 - oxenstored: Return XS_ERROR (E2BIG) on socket connections too

The original XSA fix only returns an XS_ERROR payload with E2BIG inside for domU connections. libxenstore, however, expects an E2BIG response for proper handling of partial directory calls for dom0 as well.

In practice this removes the need for the "large packets" patch: https://github.com/xenserver/xen.pg/blob/9cde7213128d777c4856d719c10945c052df0ce1/patches/oxenstore-large-packets.patch - it should now be superseded by proper partial directory support.

Signed-off-by: Andrii Sultanov <[email protected]>
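The restart-on-generation-change behaviour described in the commit message can be sketched as a client-side loop. This is only a hedged illustration: the type and function names are hypothetical (not the libxenstore or mirage/ocaml-xenstore API), the `part` callback stands in for one XS_DIRECTORY_PART round-trip, and the offset counts entries here for simplicity, whereas the real protocol uses byte offsets into the children string.

```ocaml
(* Hypothetical sketch of how a client consumes XS_DIRECTORY_PART:
   keep calling with an increasing offset, and restart from scratch
   whenever the returned generation count changes. *)
type chunk = {gen: int64; names: string list; last: bool}

let list_directory ~(part : offset:int -> chunk) () =
  let rec go offset taken gen =
    let c = part ~offset in
    match gen with
    | Some g when g <> c.gen ->
        (* Children changed between calls: restart the sequence. *)
        go 0 [] None
    | _ ->
        let taken = taken @ c.names in
        if c.last then
          taken
        else
          go (offset + List.length c.names) taken (Some c.gen)
  in
  go 0 [] None
```

This is also why global uniqueness of the generation count doesn't matter here: the value is only ever compared against the one returned by the previous call on the same node.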
Client support for this call in mirage/ocaml-xenstore will come separately.
NOTE: the cache size is said to be configurable but isn't yet - I wasn't sure whether it's worth doing and, if so, through what means. Whatever the size, it doesn't stop any calls from making progress; it just makes them potentially slower (if you are running X+1 chains of calls on different nodes at the same time).
let child_length = ref 0 in
let i = ref offset in
let cache_length = String.length children in
while !buffer_length + !child_length < maxlen && !i < cache_length do
This could be written as a fold to avoid the mutations. I think the goal is to add items, but only entire items as long as they fit, right?
I've thought about expressing this in a nicer way, but Array.fold, iter and the like are implemented with mutations anyway, and I need to mutate several things, so the niceness wouldn't really be maintained anyhow.
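For reference, the entry-taking part of the loop can be phrased as a tail-recursive scan without refs. This is only a sketch under assumed names: `children` is taken to be the cached NUL-separated string of child names, `maxlen` the remaining payload budget; the function returns how many bytes of whole entries fit and the index reached.

```ocaml
(* Sketch: take as many *whole* NUL-terminated entries as fit in [maxlen],
   starting at byte [offset], without mutable state. *)
let take_whole_entries ~maxlen ~offset children =
  let n = String.length children in
  let rec go i used =
    if i >= n then (used, i)
    else
      (* End of the next entry, including its terminating NUL. *)
      let stop =
        try String.index_from children i '\000' + 1 with Not_found -> n
      in
      let len = stop - i in
      if used + len > maxlen then (used, i) else go stop (used + len)
  in
  go offset 0
```

Whether this is actually clearer than the ref-based loop is debatable, as discussed above; it is functionally the same chunking rule (only entire entries, as long as they fit).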
…m0 with per-connection cache Signed-off-by: Andrii Sultanov <[email protected]>
| Some (_, generation_count, _, _) ->
    Int64.add generation_count 1L
| None ->
    1L
I think we need to remember what the largest previous value was and start there, or ensure uniqueness some other way. Otherwise, if we have to evict something from the cache (because the size got exceeded) and we re-add it, it'll go back to 1 and the client may not notice that it has changed.
We could store a highest-evicted counter, or a counter of how many operations we've done on the connection cache, etc.
I can do that, but then if the cache entry was evicted, the client will be forced to restart the sequence of calls, as the generation count will have some new value. That is better than being oblivious to changes in between calls, of course.
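One possible shape for the suggestion above is a per-connection counter that never decreases, so even an evicted-and-re-added entry gets a fresh generation. Field and function names here are illustrative, not code from the patch:

```ocaml
(* Sketch: generation counts drawn from a single per-connection counter,
   so a re-added entry can never repeat a previously issued value. *)
type cache = {
    mutable next_gen: int64 (* monotonically increasing, never reset *)
  ; entries: (string, int64) Hashtbl.t (* path -> generation *)
}

let make () = {next_gen= 1L; entries= Hashtbl.create 8}

(* Assign the next generation to [path], even if an entry for it was
   evicted earlier; the client then sees a changed count and restarts. *)
let touch cache path =
  let g = cache.next_gen in
  cache.next_gen <- Int64.add g 1L ;
  Hashtbl.replace cache.entries path g ;
  g
```

The trade-off is exactly the one noted in the reply: an eviction forces the client to restart the sequence, which is still safer than silently reusing a stale count.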
in
String.blit generation_s 0 buffer 0 genlen ;
String.blit children offset buffer genlen !buffer_length ;
if last_chunk then
Should we remove the cached entry if this was the last chunk?
If the connection is kept and another sequence of calls is issued, it's still useful.
Signed-off-by: Andrii Sultanov <[email protected]>
child_length := 0
) ;
i := !i + 1
There's `incr`, but perhaps that's less clear. Is there a reason why some variables are `foolen` but others are `foo_length`?
some were created in the morning, others towards the evening...
Oh, `genlen` in particular is just a copy of the C variable name.
; transactions= Hashtbl.create 5
; directory_cache
; next_tid= initial_next_tid
; watches= Hashtbl.create 8
Is the size here indicating an intent? It's an implementation concern, but it's basically redundant to give a small initial size of < 16 - these will all start with the same initial size.
@@ -205,11 +224,22 @@ let create xbcon dom =
| Some _ ->
    0
in
let directory_cache =
  match dom with
  | Some dom ->
Could be expressed using `when` to make Dom0 its own isolated case.
`None` and `Some dom when ... = 0` would still need to be separate arms, because a variable must occur on both sides of an or-pattern.
| Some (cached_node, gen_count, cache) ->
    (* Update the timestamp in the cache *)
    Hashtbl.replace directory_cache path
      (cached_node, gen_count, cache, Unix.gettimeofday ()) ;
I'd rather use a monotonic clock here, but it doesn't look like there's one in the standard library :(
I don't think we're limited to the standard library anymore now that oxenstored is forked. Maybe I'm missing something?
It's good practice to keep the dependency cone as small as is practical.
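For the record, if an external dependency were acceptable, the mtime package is the commonly used monotonic clock in OCaml (this assumes linking its `mtime.clock.os` library; the snippet is a sketch, not part of the patch):

```ocaml
(* Assumes the external mtime package; not part of the stdlib.
   A counter-based reading is unaffected by wall-clock jumps such as
   NTP adjustments, unlike Unix.gettimeofday. *)
let counter = Mtime_clock.counter ()

(* Monotonic time elapsed since [counter] was created. *)
let elapsed_span () = Mtime_clock.count counter
```

Cache timestamps would then be compared as monotonic spans rather than wall-clock floats.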
These changes passed Ring3 BST+BVT tests (Run ID #207701)
Force-pushed from 339f065 to 680ccc8 (compare)
CP-52226 - oxenstored: Implement partial directory call for dom0 with per-connection cache
CP-52226 - oxenstored: Return XS_ERROR (E2BIG) on socket connections too
====
This has been tested on large directories, where OXenstored used by C clients (which did not know about partial packets and did not handle larger packets) no longer errors out and complies with the existing protocol.
I've also confirmed that the generation count is incremented on changes to a node's children list between calls, and that the cache does indeed serve its purpose.