What should the goal of this tool be? Should it only point out troubled areas, or also dump data from those areas, and can that data be trusted at face value?
How many logs should the tool collect, if any? Just enough, or everything, so that further debugging can be done by grepping the output in an editor of choice or with some back-and-forth commands?
What should the baseline assumption for this tool be (it's turtles all the way down; which turtle should be this tool's last one)? Is it a good idea to assume that K8s is healthy and perfectly managed by the admin?
Background
Right now we have very preliminary support for debugging cStor volumes. It would be good to think along the lines of debugging + creating a GitHub issue + dogfooding, etc.
Right now, cStor volume debugging just points to places that seem off. It would be good to plan and implement debugging in stages, i.e. narrowing down the search space by pointing out what's right, what isn't, and what may not be.
Identify a list of things that need to be checked (is the storage engine replicated? should failing NDM agents affect this volume/pool?)
K8s API server is up & healthy
K8s kube-system components are up; is the kubelet container (for certain setups) up; what does the node heartbeat look like for the concerned nodes (are they alive and kicking, do they have any X-Pressure)?
Networking isn't down (important for replicated storage engines)
Relevant OpenEBS components are up (as identified in step 1)
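To make the staged idea concrete, here is a minimal Go sketch of how the checks listed above could be chained. Everything in it is hypothetical: the `Stage`, `Result` and `RunStages` names are made up for illustration, the probes are stubs rather than real Kubernetes API calls, and stopping at the first failing layer (marking everything above it as unknown) is just one possible policy, not a decided design.

```go
package main

import "fmt"

// Verdict is the three-way outcome: what's right, what isn't, what may not be.
type Verdict int

const (
	OK Verdict = iota
	Failed
	Unknown
)

func (v Verdict) String() string {
	switch v {
	case OK:
		return "OK"
	case Failed:
		return "FAILED"
	default:
		return "UNKNOWN"
	}
}

// Stage is one layer of the stack. Probe is a hypothetical hook; a real
// tool would query the cluster here.
type Stage struct {
	Name  string
	Probe func() (Verdict, string)
}

// Result records the outcome of one stage.
type Result struct {
	Stage   string
	Verdict Verdict
	Detail  string
}

// RunStages probes stages in order. After the first failure, later stages
// are not probed and are reported as Unknown, because they sit on top of
// the failed layer and their state cannot be trusted.
func RunStages(stages []Stage) []Result {
	results := make([]Result, 0, len(stages))
	blockedBy := ""
	for _, s := range stages {
		if blockedBy != "" {
			results = append(results, Result{
				Stage:   s.Name,
				Verdict: Unknown,
				Detail:  "not checked: " + blockedBy + " failed",
			})
			continue
		}
		v, detail := s.Probe()
		if v == Failed {
			blockedBy = s.Name
		}
		results = append(results, Result{Stage: s.Name, Verdict: v, Detail: detail})
	}
	return results
}

func main() {
	// Stub probes standing in for the checks in the list above.
	stages := []Stage{
		{Name: "K8s API server", Probe: func() (Verdict, string) { return OK, "" }},
		{Name: "kube-system and nodes", Probe: func() (Verdict, string) { return OK, "" }},
		{Name: "Networking", Probe: func() (Verdict, string) { return Failed, "stubbed failure" }},
		{Name: "OpenEBS components", Probe: func() (Verdict, string) { return OK, "" }},
	}
	for _, r := range RunStages(stages) {
		fmt.Printf("%-24s %-8s %s\n", r.Stage, r.Verdict, r.Detail)
	}
}
```

With the stubs above, the first two stages report OK, networking reports FAILED, and the OpenEBS stage is reported as UNKNOWN instead of being probed. Whether upper layers should still be probed after a lower one fails is an open design choice.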
The tool has some limitations: it might be hard to figure out (at first) whether the application is failing because of storage or vice versa.
It might be worth considering using the same tool to collect useful information on cluster destruction, which is likely what happens when an E2E test fails. It could replace a bunch of kubectl & shell commands.
Pre-requisites issues for this task: