Prevent users from staking on machines that cannot calculate proofs in time #4849

Open
crtahlin opened this issue Oct 4, 2024 · 5 comments

Comments

@crtahlin
Contributor

crtahlin commented Oct 4, 2024

Summary

Some machines are too slow to calculate the proofs required for the storage incentive lottery in time. In that case, although they are staking properly and might have all the data, they fail to participate fruitfully in the lottery.

There is a way to check a machine's performance with rchash, as described in the docs. But users might not know that they can or should do that. Either they miss it in the docs, or they use some other method of installation (e.g. DAppNode).

The proposed new feature would require users to run at least one rchash calculation on a fully synced node as a prerequisite before being allowed to stake. Users could override this, allowing them to stake even if this prerequisite is not met.

Motivation

A user might run a full staking Bee node, investing resources (stake, electricity, hardware, time), and win nothing, without even knowing why: their resources (CPU speed, storage speed) might be insufficient.

Implementation

A full Bee node already knows what the result of the last rchash calculation was - how long it took to calculate:
image

The same field, or another one, could be used to store the duration of an rchash calculation. Whichever field is used, its value should be retrieved when a staking call is executed. If the duration is too long, the staking call should fail with an informative message stating that the machine would probably not be able to play the lottery. If the field holds no information, the staking call should fail with an informative message instructing the user to run rchash.

The user can call the staking endpoint with a special argument that overrides the above-mentioned check of the rchash duration.
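The check and its override could be sketched roughly as follows. This is only an illustration of the logic described above, not Bee's actual code: the function name `checkStakingAllowed`, the `override` flag, the error values, and the `maxRchashDuration` threshold are all hypothetical, and the real limit would have to be derived from the lottery round timing.

```go
package main

import (
	"errors"
	"fmt"
	"time"
)

// maxRchashDuration is a placeholder threshold (assumption); this issue does
// not specify the actual limit.
const maxRchashDuration = 6 * time.Minute

var (
	// Returned when no rchash result has been recorded on this node yet.
	errNoRchashResult = errors.New("no rchash result recorded: run rchash on a fully synced node before staking")
	// Returned when the last recorded rchash run exceeded the threshold.
	errRchashTooSlow = errors.New("last rchash run was too slow: this machine would probably not be able to play the lottery")
)

// checkStakingAllowed decides whether a staking call may proceed.
// lastRchash is nil when no rchash calculation has been recorded.
// override stands in for the hypothetical "special argument" that skips the check.
func checkStakingAllowed(lastRchash *time.Duration, override bool) error {
	if override {
		return nil
	}
	if lastRchash == nil {
		return errNoRchashResult
	}
	if *lastRchash > maxRchashDuration {
		return fmt.Errorf("%w (took %s, limit %s)", errRchashTooSlow, *lastRchash, maxRchashDuration)
	}
	return nil
}

func main() {
	slow := 10 * time.Minute
	fast := 90 * time.Second

	fmt.Println(checkStakingAllowed(nil, false))   // rejected: no rchash result yet
	fmt.Println(checkStakingAllowed(&slow, false)) // rejected: too slow
	fmt.Println(checkStakingAllowed(&fast, false)) // allowed
	fmt.Println(checkStakingAllowed(&slow, true))  // allowed: user overrode the check
}
```

Keeping the two failure cases as distinct errors lets the staking endpoint return a different message for "never measured" and "measured but too slow".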

Also, the information returned by an rchash call should include a user-friendly field explicitly stating whether the calculation was done within the time needed to play in the lottery (perhaps this specific field could be referenced by the logic allowing staking).
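One possible shape for such a user-friendly field is sketched below. The struct, its JSON field names, and the `exampleLimit` threshold are assumptions made for illustration; they are not the existing rchash response format.

```go
package main

import (
	"encoding/json"
	"fmt"
	"time"
)

// exampleLimit is a placeholder threshold (assumption), not a value from Bee.
const exampleLimit = 6 * time.Minute

// rchashSummary is a hypothetical addition to the rchash response.
type rchashSummary struct {
	DurationSeconds float64 `json:"durationSeconds"`
	FastEnough      bool    `json:"fastEnoughForLottery"`
	Message         string  `json:"message"`
}

// summarize turns a measured rchash duration into an explicit verdict.
func summarize(d, limit time.Duration) rchashSummary {
	s := rchashSummary{
		DurationSeconds: d.Seconds(),
		FastEnough:      d <= limit,
	}
	if s.FastEnough {
		s.Message = fmt.Sprintf("rchash finished in %s, within the %s limit", d, limit)
	} else {
		s.Message = fmt.Sprintf("rchash took %s, over the %s limit; this machine would probably not be able to play the lottery", d, limit)
	}
	return s
}

func main() {
	out, err := json.MarshalIndent(summarize(8*time.Minute, exampleLimit), "", "  ")
	if err != nil {
		panic(err)
	}
	fmt.Println(string(out))
}
```

A boolean such as `fastEnoughForLottery` is easy for the staking logic to read, while the message text serves users who call the endpoint by hand.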

Drawbacks

It could affect the onboarding UX negatively, especially if done improperly, e.g. by stopping the user during onboarding without a proper explanation.

@n00b21337

Yes, I missed it, as it's just an endpoint that wasn't of particular interest at the time. I only realised that yesterday, when it was brought up that nodes could miss a lottery. This should be in the docs, with the strongest possible warning in the staking part. @NoahMaizels

I think it would also be useful to make this a more rigorous feature: nodes are tested periodically, and the output is saved to the node and displayed on "status" or something similar, with the average, latest, and worst times, etc. Why?
Because, in my case, I run it on a server with other things on it, and that affects performance: the more other things I install, the fewer resources are left for Bee, so I might start with enough but end up with not enough. I think some kind of monitoring of this would be a good way for node operators to keep things in check.
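The "average, latest, worst" bookkeeping could look roughly like this. It is a minimal sketch with hypothetical names (`rchashStats`, `record`, `average`); it only tracks durations handed to it and says nothing about when or how often rchash is run.

```go
package main

import (
	"fmt"
	"time"
)

// rchashStats is a hypothetical in-memory record of rchash run durations.
type rchashStats struct {
	count  int
	total  time.Duration
	latest time.Duration
	worst  time.Duration
}

// record stores one measured rchash duration.
func (s *rchashStats) record(d time.Duration) {
	s.count++
	s.total += d
	s.latest = d
	if d > s.worst {
		s.worst = d
	}
}

// average returns the mean duration, or zero if nothing has been recorded.
func (s *rchashStats) average() time.Duration {
	if s.count == 0 {
		return 0
	}
	return s.total / time.Duration(s.count)
}

func main() {
	var stats rchashStats
	// Arbitrary example durations, not measurements.
	for _, d := range []time.Duration{2 * time.Minute, 5 * time.Minute, 3 * time.Minute} {
		stats.record(d)
	}
	fmt.Printf("latest=%s average=%s worst=%s\n", stats.latest, stats.average(), stats.worst)
}
```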

@ldeffenb
Collaborator

ldeffenb commented Oct 4, 2024

I would vote against periodically running this rchash check in any internal fashion. If you've ever watched the host performance when a node is calculating the hash, it's a very noticeable load on both the CPU and storage subsystems (read: SSD). If two co-resident nodes in different neighborhoods happen to do an rchash check at the same time, they'll both be longer than they should be, possibly to the point of reporting themselves as "bad" or whatever the intent is for too-long rchash times.

Outside-the-swarm variable host load is one thing, but intentionally adding such a load (random rchash calculation) on every single swarm node is unnecessary, gains nothing, and wastes resources IMHO.

@n00b21337

Yes, I didn't consider those situations. Well, maybe you are right and my suggestion is overkill. But we should make it a strong warning for all node operators to check this regularly, and they must be aware of it as part of "node health".

@n00b21337

Also, make the duration output human readable; currently it is shown in nanoseconds:
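For illustration, Go's standard `time.Duration` already formats nanosecond counts readably. The helper name `humanDuration` and the sample value are made up for this sketch and are not taken from the screenshot.

```go
package main

import (
	"fmt"
	"time"
)

// humanDuration converts a raw nanosecond count into a readable string,
// rounded to the nearest millisecond.
func humanDuration(ns int64) string {
	return time.Duration(ns).Round(time.Millisecond).String()
}

func main() {
	// Arbitrary example value in nanoseconds.
	fmt.Println(humanDuration(378_815_000_000)) // prints "6m18.815s"
}
```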

image

4 participants
@crtahlin @ldeffenb @n00b21337 and others