
CP-52226: Implement XS_DIRECTORY_PART call #2

Open
last-genius wants to merge 4 commits into main
Conversation

@last-genius (Contributor) commented Nov 12, 2024:

CP-52226 - oxenstored: Implement partial directory call for dom0 with per-connection cache

Implements XS_DIRECTORY_PART (originally specified in
https://lore.kernel.org/xen-devel/[email protected]/).
Like in CXenstored, it's limited to dom0. It is intended to list
directories larger than the size of a single payload (4096 bytes) over
several calls (not necessarily inside a single transaction).
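
For illustration, here is a minimal client-side sketch (not part of this PR, and not the libxenstore API) of how such a sequence of calls might be driven. directory_part is a hypothetical stub standing in for the real request; it returns the node's generation count, the next chunk of child names, and the offset to pass to the following call (None once the listing is complete):

let directory_part ~path:_ ~offset:_ : int64 * string list * int option =
  (* Stub standing in for the real XS_DIRECTORY_PART request. *)
  (1L, [], None)

let rec list_directory path =
  let rec loop gen offset acc =
    let gen', children, next = directory_part ~path ~offset in
    if gen' <> gen then
      (* The children changed between calls: restart for a consistent view. *)
      list_directory path
    else
      let acc = List.rev_append children acc in
      match next with
      | None -> List.rev acc
      | Some offset -> loop gen offset acc
  in
  let gen, children, next = directory_part ~path ~offset:0 in
  match next with
  | None -> children
  | Some offset -> loop gen offset (List.rev children)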

To alleviate the impact of the differences in how a node's children are
represented between the C and OCaml Xenstoreds, this implements a per-connection
cache that lets the string representation of a node's children be generated
once and reused later, as long as the underlying list of children doesn't change
(and the connection is reused).
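
Roughly, the per-connection cache can be pictured like the sketch below; the names and exact types are illustrative, not necessarily those used in the patch. Each entry keeps the node whose children were rendered, the faked generation count, the rendered children string, and a timestamp used for eviction:

type node (* abstract here: the store's immutable node type *)

type cache_entry = {
  node : node;         (* compared against the store to detect changes *)
  generation : int64;  (* bumped whenever the entry is regenerated *)
  children : string;   (* child names rendered once into a single string *)
  last_used : float;   (* timestamp used to pick an entry to evict *)
}

type directory_cache = (string, cache_entry) Hashtbl.t (* keyed by path *)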

Whereas CXenstored uses a global generation count, modified at
transaction creation and on node changes, the OCaml version relies on
comparisons of immutable trees. A per-node generation count is instead
faked in the connection cache: it lets clients know that the
underlying information has changed and that they should restart the sequence
of partial directory calls to get a consistent picture.
(That is, while the generation count returned by CXenstored is globally unique,
this is not useful for this particular function, since the generation count
is only compared against the results of previous invocations of partial
directory calls on the same node.)

The cache is refreshed by OXenstored itself, and is limited to 8 entries
(by default, configurable).
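
As a rough sketch of what the entry limit implies (assuming entries shaped like the (node, generation, children, timestamp) tuples visible in the diff excerpts below, and a hypothetical cache_size_limit; the patch's actual policy may differ), eviction would drop the least recently used entry:

let cache_size_limit = 8 (* the default from the commit message *)

let evict_if_needed cache =
  if Hashtbl.length cache >= cache_size_limit then
    (* Drop the entry with the oldest timestamp. *)
    let victim =
      Hashtbl.fold
        (fun path (_, _, _, ts) oldest ->
          match oldest with
          | Some (_, ts') when ts' <= ts -> oldest
          | _ -> Some (path, ts))
        cache None
    in
    match victim with
    | Some (path, _) -> Hashtbl.remove cache path
    | None -> ()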

String handling and allocation are a direct translation of the C code.


CP-52226 - oxenstored: Return XS_ERROR (E2BIG) on socket connections too

The original XSA fix only returns an XS_ERROR payload with E2BIG inside for
domU connections. libxenstore, however, expects an E2BIG response for dom0
connections as well in order to handle partial directory calls properly.

In practice this removes the possibility of carrying the "large packets" patch:
https://github.com/xenserver/xen.pg/blob/9cde7213128d777c4856d719c10945c052df0ce1/patches/oxenstore-large-packets.patch

It should now be superseded by proper partial directory support.
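
Conceptually the change amounts to something like the following sketch (the function and constant names here are stand-ins, not the actual oxenstored identifiers):

let xenstore_payload_max = 4096

let respond ~send_error ~send_reply payload =
  if String.length payload > xenstore_payload_max then
    (* Previously only ring (domU) connections took this branch; socket
       (dom0) connections now do too, so libxenstore can fall back to
       partial directory calls. *)
    send_error "E2BIG"
  else
    send_reply payload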

====

This has been tested on large directories: OXenstored no longer errors out and complies with the existing protocol when used by C clients that neither know about partial directory calls nor handle larger packets:

xenstore-write /test "test"

for i in `seq 100 500`
do
    xenstore-write /test/entry_with_very_long_name_$i $i
done

xenstore-ls
xenstore-rm /test

I've also confirmed that the generation count is incremented on changes to a node's children list between calls, and that the cache does indeed serve its purpose.

@last-genius (Contributor, Author):

Client support for this call in mirage/ocaml-xenstore will come separately.

@last-genius (Contributor, Author):

NOTE: the cache size is said to be configurable, but isn't yet; I wasn't sure whether it's worth doing and, if so, through what means. Whatever the size, it doesn't stop any calls from making progress; it just makes them potentially slower (if you are running more chains of calls on different nodes at the same time than there are cache entries).

let child_length = ref 0 in
let i = ref offset in
let cache_length = String.length children in
while !buffer_length + !child_length < maxlen && !i < cache_length do
Contributor:
This could be written as a fold to avoid the mutations. I think the goal is to add items, but only entire items as long as they fit, right?

Contributor Author:
I've thought about expressing this in a nicer way, but all of Array.fold, iter and such are implemented with mutations anyway, and I need to mutate several things, so the niceness isn't particularly maintained anyhow.
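
For reference, a sketch of the fold-based shape being discussed, phrased over a list of child names rather than the NUL-separated string and byte offsets the patch actually works with, so it is illustrative only: whole entries are accumulated until the next one no longer fits.

let take_fitting ~maxlen children =
  let fitting, _used =
    List.fold_left
      (fun (acc, used) child ->
        (* +1 for the separator each entry carries on the wire. *)
        let len = String.length child + 1 in
        if used + len <= maxlen then (child :: acc, used + len)
        else (acc, maxlen) (* saturate so no later entry is added *))
      ([], 0) children
  in
  List.rev fitting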

oxenstored/store.ml: outdated review thread (resolved)
…m0 with per-connection cache

Signed-off-by: Andrii Sultanov <[email protected]>
| Some (_, generation_count, _, _) ->
Int64.add generation_count 1L
| None ->
1L
Contributor:
I think we need to remember what the largest previous value was and start there, or ensure uniqueness in some other way. Otherwise, if we have to evict something from the cache (because the size got exceeded) and we re-add it, it'll go back to 1 and the client may not notice that it has changed.

We could store a highest-evicted counter, or a counter of how many operations we've done on the connection cache, etc.
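
For example, a per-connection counter that only ever grows could provide such a value (a sketch with illustrative names, not the patch's actual types):

type conn_cache = {
  entries : (string, int64 * string) Hashtbl.t; (* path -> (generation, children) *)
  mutable next_generation : int64;
}

let fresh_generation cache =
  let g = cache.next_generation in
  cache.next_generation <- Int64.add g 1L ;
  g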

Contributor Author:

I can do that, but then if the cache entry was evicted, the client will be forced to restart the sequence of calls, as the generation count will have some new value. This is better than being oblivious to changes in between calls, of course.

in
String.blit generation_s 0 buffer 0 genlen ;
String.blit children offset buffer genlen !buffer_length ;
if last_chunk then
Contributor:
Should we remove the cached entry if this was the last chunk?

Contributor Author:
If the connection is kept and another sequence of calls is issued, it's still useful.


child_length := 0
) ;
i := !i + 1
@contificate commented Nov 12, 2024:

There's incr, but perhaps that's less clear. Is there a reason why some variables are named foolen but others foo_length?

Contributor Author:
some were created in the morning, others towards the evening...

Contributor Author:

Oh, genlen in particular is just a copy of the C code's variable name.

@last-genius changed the title from "Implement XS_DIRECTORY_PART call" to "CP-52226: Implement XS_DIRECTORY_PART call" on Nov 12, 2024
Comment on lines 241 to 244
; transactions= Hashtbl.create 5
; directory_cache
; next_tid= initial_next_tid
; watches= Hashtbl.create 8

Is the size here indicating an intent? It's an implementation concern, but it's basically redundant to give an initial size of less than 16; these will all start with the same initial size.

@@ -205,11 +224,22 @@ let create xbcon dom =
| Some _ ->
0
in
let directory_cache =
match dom with
| Some dom ->
Contributor:

Could be expressed using when to make dom0 its own isolated case.

Contributor Author:

None and Some dom when ... = 0 would still need to be separate arms, because a variable must occur on both sides of an or-pattern.
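
A tiny illustration of that constraint: in an or-pattern every variable must be bound on both sides, so the None case cannot be folded into the guarded Some dom arm.

let is_dom0 = function
  (* | None | Some dom when dom = 0 -> true   <- rejected: dom is unbound in None *)
  | Some dom when dom = 0 -> true
  | Some _ | None -> false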

| Some (cached_node, gen_count, cache) ->
(* Update the timestamp in the cache *)
Hashtbl.replace directory_cache path
(cached_node, gen_count, cache, Unix.gettimeofday ()) ;
Member:

I'd rather use a monotonic clock here, but it doesn't look like there's one in the standard library :(

Contributor Author:

I don't think we're limited to the standard library anymore now that oxenstored is forked. Maybe I'm missing something?

Member:

It's good practice to keep the dependency cone as small as is practical.
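
The mtime library would provide a monotonic clock, but if the timestamp is only ever compared for relative recency, a dependency-free alternative would be a per-connection counter bumped on every cache access. A rough sketch, with illustrative names:

let cache_clock = ref 0L

let touch () =
  cache_clock := Int64.add !cache_clock 1L ;
  !cache_clock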

@last-genius (Contributor, Author):
These changes passed Ring3 BST+BVT tests (Run ID #207701)
