don't reschedule auction's price notifier if we already have one #10615

Chris-Hibbert · 2024-12-04T00:48:34Z

Description

It was observed, after applying upgrade 18 to EmeryNet, that a whole slew of promises resolved when a price was provided. The current belief is that observeQuoteNotifier will be called every auction round until a price is available, and that creates a new observer with each call that wait until a price is published, and then they all continue waiting for each successive update.

This change adds an interlock, so if there's already a notifier waiting, we don't add a new one.

Security Considerations

No security implication.

Scaling Considerations

Processing about 19000 actions waiting on a notifier in EmeryNet took several hours. If we're correct that the notifiers will continue to cycle, we expect to see a similar wait for each price update on that currency. That's unsustainable.

The only current theory about dropping all those actions waiting for notifiers is to kill the vat. We can't kill the priceAuthority vats that hold the notifiers, but we might be able to cleanly kill the abandoned auctioneers.

Documentation Considerations

Not needed.

Testing Considerations

Tough to test in unit tests. It's conceivable that we could recreate the situation in a3p-integration, though it would be hard to observe the results.

Upgrade Considerations

We probably shouldn't ship upgrade 18 with something to address this problem.

warner · 2024-12-04T01:03:27Z

packages/inter-protocol/src/auction/auctionBook.js

@@ -464,6 +465,8 @@ export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => {
            AmountMath.make(collateralBrand, QUOTE_SCALE),
            bidBrand,
          );
+          baggage.set(QUOTE_NOTIFIER, quoteNotifier);


I think that's storing a promise in (durable) baggage, which isn't allowed (also that const quoteNotifier = should probably be quoteNotifierP = to make that more obvious). Would it hurt anything to await and store the actual Presence instead?

No.

I also notice that we don't seem to make a new baggage for each book. I need to figure out if that's true, and either make new baggages, or make unique names for each notifier

Oh, no, that wouldn't be a trivial change. It would make observeQuoteNotifier async, but also it wouldn't be setting the flag at the right time: if two callers invoke observeQuoteNotifier in the same turn, we'd still wind up with multiple outstanding calls to makeQuoteNotifier, and we'd get multiple observers again.

I think observerQuoteNotifier() needs to be named/intentioned "maybeObserveQuoteNotifier" or "maybeStartObservingQuoteNotifier". And it needs to be "async idempotent" (not sure if that's a term of art but you get the idea): no matter how many times you call it in a row, you still only wind up with a single outstanding async request, but if that request fails for any reason, it resets back to a state where you can call it again and it won't short-circuit.

The flag needs to be checked on entry to the function, during the synchronous prelude, before sending any messages. Then it needs to send off the makeQuoteNotifier, and if that fails it needs to clear the flag. If it succeeds then it can build the observer, which clears the flag if/when the observer fails or finishes.

updated via pairing.

cloudflare-workers-and-pages · 2024-12-04T17:34:21Z

Deploying agoric-sdk with Cloudflare Pages

Latest commit:	`466d3b8`
Status:	✅ Deploy successful!
Preview URL:	https://6c571184.agoric-sdk.pages.dev
Branch Preview URL:	https://cth-auction-pricenotifier.agoric-sdk.pages.dev

View logs

warner

submitting review now so @Chris-Hibbert can consider the map-vs-set thing, will re-review when that's settled

warner · 2024-12-04T19:22:28Z

packages/inter-protocol/src/auction/auctioneer.js

+// auctionBook changed to create a sub-baggage for each book (to distinguish
+// their quoteNotifier flags) so older auctions will not be upgradable to this
+// version. We believe this version is saving all the necesary state to be
+// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'.


My understanding of those meta flags is that the first sentence ("older versions will not be upgradable to this version") is the reason we don't mark this as canUpgrade. The second sentence ("saving all the necessary state") is what justifies our use of canBeUpgraded. The fact that we haven't verified it is either good note to add but doesn't change the flags, or a reason to not use canBeUpgraded.

My opinion is that canBeUpgraded is appropriate, but I don't know how much of a stickler we are for "there must be tests to prove this claim before we're allowed to make it".

So maybe add ", so we do not mark this as canUpgrade" to the first sentence, and maybe make the second sentence be like "We believe.., so we mark this as canBeUpgraded (but this hasn't been verified with an a3p test)".

The fact that we haven't verified it is either good note to add but doesn't change the flags, or a reason to not use canBeUpgraded.

If we had verified it, it would be 'canUpgrade. The only time it should be canBeUpgraded` is when we hope it's upgraded, but haven't proven it.

I thought canUpgrade is backwards-looking, while canBeUpgraded is forward-looking, and that there was no way to express "probably upgradable but not tested".

If I understand it correctly, canUpgrade means that this code is capable of upgrading a previous version, which we know is false here because the sub-baggage we added.. (actually we just changed that, we're no longer modifying the way baggage is used, so maybe this could upgrade the previous version, but I imagine there are other reasons that might not work).

I never did know the difference. Ooh! we have docs now:

* - `canUpgrade` means this code can perform an upgrade * - `canBeUpgraded` means that the contract stores kinds durably such that the next version can upgrade

So I concur that canBeUpgraded is correct and that any test for canUpgrade from the previous version is certain to fail.

warner · 2024-12-04T19:27:19Z

packages/inter-protocol/src/auction/auctionBook.js

@@ -118,6 +122,12 @@ export const makeOfferSpecShape = (bidBrand, collateralBrand) => {
 export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => {
  const makeScaledBidBook = prepareScaledBidBook(baggage);
  const makePriceBook = preparePriceBook(baggage);
+  // a map from collateralBrand to true when the quoteNotifier has an observer
+  // the brand is absent when there's no observer.
+  const quoteNotifierFlags = provideDurableMapStore(


oh, hey, should this just be a DurableMapSet?

warner

Code looks good to me. Nits on the comments, might pull in @dckc on the meta flag question (and to do another pass in general).

warner · 2024-12-04T20:25:04Z

packages/inter-protocol/src/auction/auctionBook.js

@@ -645,8 +668,8 @@ export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => {

          trace(`capturing oracle price `, state.updatingOracleQuote);
          if (!state.updatingOracleQuote) {
-            // if the price has feed has died, try restarting it.
-            facets.helper.observeQuoteNotifier();
+            // if the price feed has died, restart it.


maybe add "(or hasn't been started for this incarnation yet)"

warner · 2024-12-04T20:28:27Z

packages/inter-protocol/src/auction/auctioneer.js

+// auctionBook changed to create a sub-baggage for each book (to distinguish
+// their quoteNotifier flags) so older auctions will not be upgradable to this
+// version. We believe this version is saving all the necesary state to be
+// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'.


I thought canUpgrade is backwards-looking, while canBeUpgraded is forward-looking, and that there was no way to express "probably upgradable but not tested".

If I understand it correctly, canUpgrade means that this code is capable of upgrading a previous version, which we know is false here because the sub-baggage we added.. (actually we just changed that, we're no longer modifying the way baggage is used, so maybe this could upgrade the previous version, but I imagine there are other reasons that might not work).

Chris-Hibbert · 2024-12-04T21:28:42Z

The declaration of ContractMeta says

  /**
   * - `none` means that the contract is not upgradable.
   * - `canUpgrade` means this code can perform an upgrade
   * - `canBeUpgraded` means that the contract stores kinds durably such that the next version can upgrade
   */

I'd be happy to take it out. It's not required.

dckc

FWIW...

dckc · 2024-12-04T20:40:22Z

packages/inter-protocol/src/auction/auctionBook.js

+        // Ensure that there is an observer monitoring the quoteNotifier. We
+        // assume that all failure modes for quoteNotifier eventually lead to
+        // fail or finish.


Should I read this as "We rely on SOMETHING to see that all failure modes for quoteNotifier eventually lead to fail or finish"? What is the SOMETHING?

If we don't rely on something else, we have to ensure it here, no?

The assumption is baked into the design of the notifiers and observers. There's no way to interrupt it from the consumer end, and no way to verify that it's still alive. Its contract says if it every ceases to continue producing new values it will either call fail or finish, or return an exception because the vat died.

dckc · 2024-12-04T21:45:21Z

packages/inter-protocol/src/auction/auctionBook.js

+  // Brands that have or are making active quoteNotifier Observers
+  const observedBrands = new Set();


I'm struggling to get the relevant invariant in my head.

Something that combines observedBrands with state.updatingOracleQuote

state.updatingOracleQuote has a value when we've received an update since the last auction. observedBrands contains the brand if the priceNotifier observer has been started and hasn't returned an error or called finish.

In the normal case, observedBrands doesn't change, but state.updatingOracleQuote gets reset every auction.

The interesting cases are startup and if the notifier ever breaks. The problem with startup is that it can take an arbitrary timespan for the oracles to start reporting prices after the auction starts. We want to start the observer, but we don't want to do it multiple time (as we were previously doing) because each creates a new observer that persists.

dckc · 2024-12-04T21:53:42Z

packages/inter-protocol/src/auction/auctioneer.js

+// auctionBook changed to create a sub-baggage for each book (to distinguish
+// their quoteNotifier flags) so older auctions will not be upgradable to this
+// version. We believe this version is saving all the necesary state to be
+// upgraded, but that hasn't been verified, so we don't mark it as `canUpgrade'.


I never did know the difference. Ooh! we have docs now:

* - `canUpgrade` means this code can perform an upgrade * - `canBeUpgraded` means that the contract stores kinds durably such that the next version can upgrade

So I concur that canBeUpgraded is correct and that any test for canUpgrade from the previous version is certain to fail.

mhofman

It's in one of these cases I wish state with shapes could grow new props.

mhofman · 2024-12-04T21:45:45Z

packages/inter-protocol/src/auction/auctionBook.js

@@ -118,6 +118,8 @@ export const makeOfferSpecShape = (bidBrand, collateralBrand) => {
 export const prepareAuctionBook = (baggage, zcf, makeRecorderKit) => {
  const makeScaledBidBook = prepareScaledBidBook(baggage);
  const makePriceBook = preparePriceBook(baggage);
+  // Brands that have or are making active quoteNotifier Observers
+  const observedBrands = new Set();


Can we explain why it's ok to have this be a heap Set? Both from a cardinality and upgradability pov.

Brian and I talked through this. If we decide to upgrade auctions, then the restart won't have an observer running, but it won't have reset any tracking variable we create, so it's important that the tracking go away on restart.

I think the cardinality question is about the fact that we're sharing a Set across auctoinBooks. We don't have ephemeral per-object state, but we can have ephemeral shared state. The number of brands handled by the auction is small and seldom grows, so the set won't get too big. We're already keeping all the auctionBooks in a single vat.

I figured as much for the upgradability side, but I would prefer this to be written out as comment.

Sorry, we lost the race with the merge. Please let me know if you think it's worth a new PR to add the comment. I'm happy to do it if you ask.

mhofman · 2024-12-04T21:48:59Z

packages/inter-protocol/src/auction/auctionBook.js

+                },
+              }),
+            e => {
+              trace('makeQuoteNotifier failed, resetting', e);
              state.updatingOracleQuote = null;


Why is it necessary to reset the state here?

It's safe because we'll check for a notifier again at the next auction. It's a good idea because we can't count on the notifier continuing to produce events if there's an error at this level.

I just don't quite understand in which case this isn't already null. This is an error making the notifier, so afaiu we would never have updated this state in the first place?

We didn't explore it deeply, but I think if a previous version was happily updating the state, then it upgrades, then anything which samples the price at the beginning of the new incarnation (before any new updates happen) will get the old price. If an auction then starts and initiates the observer process, but fails (maybe because the price feed vat is getting upgraded too?), then we'll hit this case, clear the ephemeral flag (so the next auction can initiate a new one), and also clear the durable state.updatingOracleQuote, which at least will tell clients to not rely on the old quote.

I suspect that's a bit weird. Not sure if it's a problem though.

On line 469, we add the brand to the set, and on line 472, we attempt to create the notifier. On line 477, we await the notifier. If the creation promise fails, we end up here, and want to ensure the brand is removed from the set, or we'll never try again to create a notifier.

ensure the brand is removed from the set, or we'll never try again to create a notifier.

I have no problem with the ephemeral set being reset on error. My concern is with how the durable updatingOracleQuote state is managed. Resetting it at this state doesn't seem correct to me, and might be an indication something else is amiss.

Brian mentions a case where this durable state may not be null: after an upgrade. I suppose the product question is whether we want to maintain or not the old "updating" quote while we attempt to restart the observer.

But beyond that question, the problem is that the ensureQuoteNotifierObserved call is gated on !updatingOracleQuote in the first place, so really it seems like restarting the vat wouldn't trigger the observer creation anyway because that state wouldn't be reset by the upgrade, and the contract would think it still has a notifier.

IMO, we need to switch the call to ensureQuoteNotifierObserved to be unconditional, return true if the observer is in the Set, and if not immediately reset the updatingOracleQuote (if we want to avoid pretending we have an updating quote), start the notifier, and return false so that captureOraclePriceForRound can continue bailing out early. In that case updatingOracleQuote wouldn't need to be reset if we fail to make the observer since we'd be guaranteed for it to already be null (again if we don't want to pretend we have an updating quote while re-establishing the observer).

Regarding the early bail out of captureOraclePriceForRound, I'm a little surprised that we would skip capturing a price for round when the observer isn't yet setup instead of delaying the capture, but that's a different topic.

Oh, yeah, if the auctionBook has a price (in durable state.updatingOracleQuote), then gets restarted/upgraded, the new incarnation won't start a new observer at startup (because nothing special happens at restart, it's not like we walk through all the pre-existing brands and do something for each one), and when the auction timer fires and it calls captureOraclePriceForRound(), that will see the old price and assume it's ok, and won't start a replacement observer then either.

I agree with @mhofman that the captureOraclePriceForRound() needs to check the ephemeral set instead of the durable state, or simply always call ensureQuoteNotifierObserved() and rely on its internal check to prevent duplicates.

mhofman · 2024-12-04T21:50:31Z

packages/inter-protocol/src/auction/auctionBook.js

          const { state, facets } = this;
          const { collateralBrand, bidBrand, priceAuthority } = state;

+          if (observedBrands.has(collateralBrand)) {


Why are we using the collateralBrand as key instead of the identity of this auctionBook? Are we guaranteed there is only ever a 1:1 relationship between auctionBook and collateralBrand?

The rest of auctionBook and auctioneer use the collateralBrand, so we're already reliant on it being 1:1.

So there is never multiple bid brands for a given collateral? I'm just wary of assumptions like this that aren't super clear or documented in code comments.

All of the bids are in IST, so there's no chance of that.

With Vaults, we explicitly intended that we might want to have multiple vaultManagers for a given currency that could have different minimums, or other requirements, and charge different rates.

With auctions, it's a mistake to split the bidding equity across multiple auctions, so the design was always that each auctionBook would handle all the bids and asks for a particular asset.

Yeah I think having a single bidding denom is the assumption that wasn't clear to me.

) ## Description It was observed, after applying upgrade 18 to EmeryNet, that a whole slew of promises resolved when a price was provided. The current belief is that `observeQuoteNotifier` will be called every auction round until a price is available, and that creates a new observer with each call that wait until a price is published, and then they all continue waiting for each successive update. This change adds an interlock, so if there's already a notifier waiting, we don't add a new one. ### Security Considerations No security implication. ### Scaling Considerations Processing about 19000 actions waiting on a notifier in EmeryNet took several hours. If we're correct that the notifiers will continue to cycle, we expect to see a similar wait for each price update on that currency. That's unsustainable. The only current theory about dropping all those actions waiting for notifiers is to kill the vat. We can't kill the priceAuthority vats that hold the notifiers, but we might be able to cleanly kill the abandoned auctioneers. ### Documentation Considerations Not needed. ### Testing Considerations Tough to test in unit tests. It's conceivable that we could recreate the situation in `a3p-integration`, though it would be hard to observe the results. ### Upgrade Considerations We probably shouldn't ship upgrade 18 with something to address this problem.

### Description Cherry-picks the following commits from master: - #10551 (9e19321) - #10635 (ad4e83e) - #10615 (e596a01 ) - #10634 (a1856f3) Since we plan to verify this rc on devnet rather than emerynet, there is no apparent need for a new upgrade name. Not aware of any deployments on devnet before so can reuse the previous upgrade name. However, skipping emerynet is dependent on comms with the validators so I have added a new upgrade name `agoric-ugprade-18-emerynet-rc3` just in case. Only added to bypass the need for a new rc if for some reason emerynet validation is needed anyways - best case scenario is that it remains unused. commits added using git cherry-pick

refs: #10660 ## Description in testing upgrade 18 candidates, we realized that vaultManager was subject to the same error as auctions fixed in #10615. ### Security Considerations No Security implications. ### Scaling Considerations Not fixing this would add a new notifier to the vaultManager's quote watcher for every hour that elapsed between upgrade and the oracles starting to provide price updates. ### Documentation Considerations Unnecessary. ### Testing Considerations This was detected by trawling slogs. It's not clear that we can do better than that to verify the fix. ### Upgrade Considerations It would probably be a mistake to upgrade vaults without this fix.

feat: don't reschedule notifier if we already have one

3e19377

Chris-Hibbert added bug Something isn't working resource-exhaustion Threats to availability from resource exhaustion attacks contract-upgrade auction labels Dec 4, 2024

Chris-Hibbert requested a review from warner December 4, 2024 00:48

Chris-Hibbert self-assigned this Dec 4, 2024

Chris-Hibbert requested a review from a team as a code owner December 4, 2024 00:48

warner reviewed Dec 4, 2024

View reviewed changes

feat: robustify startup of the quoteNotifier observer

a12a965

feat: use a map based on collateralBrand for each auctionBook

ba64933

Chris-Hibbert force-pushed the cth-auction-priceNotifier branch from b68ac12 to ba64933 Compare December 4, 2024 18:38

Chris-Hibbert requested review from warner December 4, 2024 18:57

warner reviewed Dec 4, 2024

View reviewed changes

Chris-Hibbert added 2 commits December 4, 2024 11:52

chore: drop the map; use an instance variable in state

77f2260

refactor: use a non-durable set to reflect the observers' lifetime

ee6c0a1

warner approved these changes Dec 4, 2024

View reviewed changes

chore: drop meta declaration

bff3484

Chris-Hibbert added the automerge:squash Automatically squash merge label Dec 4, 2024

Chris-Hibbert and others added 2 commits December 4, 2024 13:31

chore: update comment

59406eb

Merge branch 'master' into cth-auction-priceNotifier

e88aebe

dckc reviewed Dec 4, 2024

View reviewed changes

mhofman reviewed Dec 4, 2024

View reviewed changes

Merge branch 'master' into cth-auction-priceNotifier

466d3b8

mergify bot merged commit e596a01 into master Dec 4, 2024
81 checks passed

mergify bot deleted the cth-auction-priceNotifier branch December 4, 2024 23:44

mujahidkay mentioned this pull request Dec 9, 2024

chore: upgrade 18 cherrypicks round 4 #10645

Merged

mhofman mentioned this pull request Dec 9, 2024

chore(release): publish upgrade-18-rc3 #10656

Merged

warner mentioned this pull request Dec 9, 2024

auctioneer will create duplicate QuoteNotifiers until the first price is established #10660

Open

Chris-Hibbert mentioned this pull request Dec 10, 2024

Don't reschedule vault's priceNotifier if one exists #10668

Merged

		// Brands that have or are making active quoteNotifier Observers
		const observedBrands = new Set();

don't reschedule auction's price notifier if we already have one #10615

don't reschedule auction's price notifier if we already have one #10615

Conversation

Chris-Hibbert commented Dec 4, 2024

Description

Security Considerations

Scaling Considerations

Documentation Considerations

Testing Considerations

Upgrade Considerations

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudflare-workers-and-pages bot commented Dec 4, 2024 • edited Loading

Deploying agoric-sdk with Cloudflare Pages

warner left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

warner left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Chris-Hibbert commented Dec 4, 2024

dckc left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhofman left a comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

mhofman Dec 4, 2024 • edited Loading

Choose a reason for hiding this comment

Choose a reason for hiding this comment

Choose a reason for hiding this comment

cloudflare-workers-and-pages bot commented Dec 4, 2024 •

edited

Loading

mhofman Dec 4, 2024 •

edited

Loading