Skip to content

Commit

Permalink
Update 018-2024-06-26.md
Browse files Browse the repository at this point in the history
added transcript
chore: add recording link

Signed-off-by: Drinkwater <[email protected]>
  • Loading branch information
brewsterdrinkwater authored Jul 19, 2024
1 parent 39d4602 commit 1dea266
Showing 1 changed file with 137 additions and 1 deletion.
138 changes: 137 additions & 1 deletion sig-providers/meetings/018-2024-06-26.md
Original file line number Diff line number Diff line change
Expand Up @@ -64,5 +64,141 @@
- Track provider build upgrades and ongoing testing via the product and engineering project board.
- Attendees to review detailed updates from the SIG clients meeting.
- Damir to create a GitHub repository for the Prometheus and Grafana deployment project.
## Transcript

# **Transcript**

_This editable transcript was computer generated and might contain errors. People can also change the text after it was created._

Tyler Wright: All right, welcome everybody to providers monthly meeting number 18. It is June 26 2024 During the special interest group for providers the Akash Community discusses anything related to the provider on Akash, at the last month's meeting there was a discussion following up on all things happening with the Predator acquisition and next steps as an integrate with console.

Tyler Wright: There was some follow-up discussion around the work on that. Jigarette and Deval are doing around console 2.0 during sick clients yesterday. So again, there was much detail that went into next steps and how Prater is going to be involved there. I'll give some time to the Prater team to talk a little bit about it here. But again, I urge anybody that wants to go into the details to watch the video that will be available shortly from sick clients just yesterday. After that damir had something that he wanted to share and then I wanted to open it up to any other discussions that anybody wanted to have.

Tyler Wright: But again, I'll just get it started Digger if there's anything that you wanted to mention in terms of how things are going greater as it stands today and some of the next steps, or Deval feel free to chime in.

Tyler Wright: Thank I'm gonna drop in the issue that's been created as of tracking around. Some of the provider builds upgrades have been happening. I know that again Scott made some major changes to how providers are built and has connected members of the community as well as members of the core team to test these things out. There's also again, I'll drop in the chat.

Tyler Wright: A number of items. I've been put together by Andre over a little while in terms of testing that is now underway and this testing that jigar just mentioned. So feel free to continue to track a number of items in the product and engineering project board related to this. I think there's one in testing around the provider build upgrades and then again there's a company issue that lays out all the specific testing that's going on related to that issue.

Tyler Wright: Cool again, I would urge anybody that wants to dive deep because Jigar, the ball the Maxes spent some time yesterday during sick clients. I'm going through what's going on next in a little bit more detail and more detail around console 2.0 and some of the five main areas of focus in the short term.

Tyler Wright: Cool, if there's no questions there then we'll continue on with the agenda. I know damir had mentioned wanting to spend some time to discuss something. So I will hand the floor over to damir to talk a little bit about something that he's been working on.

Damir Simpovic: All right. Thanks y So share my not screen, but just say

Damir Simpovic: just the window.

00:05:00

Damir Simpovic: can you guys see my window all…

Tyler Wright: Yes.

Damir Simpovic: so I just started working on it maybe a few days ago. So basically what I found was that there's a Prometheus and grafana Community film charts already available for deployment on the kubernetes cluster. And I toyed around with it and managed to deploy it on your cash provider. so not everything is working yet I just basically started working on it, but most of the features do work, so if you want to

Damir Simpovic: Actually, let me show you grafana dots Euro. - ai.net so basically yeah, it's running on my test provider Euro ai.net. And the dashboards are already created basically, so they have been pulled by Helm charts and they've been automatically installed.

Damir Simpovic: Yeah, I will work on cleaning this up a bit because we don't need all of them. But yeah. most of the stuff already works

Damir Simpovic: it's super simple and Yeah. For instance I don't know the networking. Stuff already works you can sort by.

Damir Simpovic: by namespace and see which pods actually Consumes what amount of traffic is being consumed but by which pod I don't know. Some stuff doesn't work some of these dashboards doesn't work yet. But yeah, I will clean this up. So yeah, if you want to test it I made you can go to this URL. I will leave it running. And I created the test user which is username and Both are test user. So you can see for yourself. So yeah, there you go.

Damir Simpovic: Yeah, if anybody has any desire to help. And contribute. Yeah, please do. I think it would be. A very efficient way of Basic monitoring tools for providers. There's also the alert manager here. It's also automatically installed basically can be configured to send alerts about provider. basically whatever we configure it to ert. So yeah on Discord on email. wherever So yeah, there you go. Not much to show yet. But yeah, it does.

Damir Simpovic: It does work. So yeah, there you go. Thank you.

Damir Simpovic: it's actually very good that you brought this up…

Andrey Arapov: Thank you.

Damir Simpovic: because I didn't know this until I actually tested this but some of the Akash Services provided pods are actually over utilizing the memory. So basically let me see where they find that.

Damir Simpovic: memory utilization 1800 percent so basically some of the ports are basically over utilizing the memory. So yeah. I think it's a Akash node. That's all utilizing. So yeah.

Andrey Arapov: It's interesting and interesting coach something definitely dig into and we could install it in more providers perhaps just to compare, the behavioral and it's also good to track it whenever we released a newer version of node software kind of compared with previous traffic instrument that previously we had the issue open by Andrew Mello and he's in the goal accident Thanks for opening it up that he noticed, the traffic Spike. So it's definitely good thing to have things.

00:10:00

Damir Simpovic: Yeah, I'll start a GitHub repo for this so. Basically on my own GitHub if anyone wants to take a look or whatever. feel free

Andrey Arapov: Yeah, cool. I think I'll just take some provider probably try to deploy there.

Damir Simpovic: I'll deploy first on your plots. Basically, I'll test on this provider on this which it's production but nobody's using it intentionally. or the first Production really? Let's call it version room. I'll test it on europods and go from there.

Andrey Arapov: Yeah, in the end of the day, it's just in all the metric so it's easy to kill it because I think it's been deployed this whole chart namespace you can just destroy it anytime, right?

Damir Simpovic: Yes, exactly. The helm charts is actually being maintained by the Prometheus community. And they're open source. You can just take them and modify them. we like but they work as they are. I only change a few things. I added the English controller. And yeah not a lot of work has been done from on my part yet, but

Andrew Mello: Hi guys, and hi damir. Thanks for taking a look at that. I have a little bit of feedback just from some experience with this. It's a question of really what is the problem the dashboards are looking to solve if you're looking to make a public dashboard for the cluster that's going to be available on a domain name. That is a good solve for it. However, if the cluster goes down, right or there's no network connectivity, then it doesn't really solve the problem for the user checking the status of the cluster. Right so you can look at that as a partial solve for kind of uptime and status but it still is really important to have an external public check basically of your accessibility on your ports and your endpoints for your cluster as So doing that against eight four four three, for example from an external check it's

Andrew Mello: Really important and that's something you should still definitely think about and kind of think through here as you're thinking through the solution because the dashboard you have worked great when the Clusters alive, but if the cluster is not accessible or there's an outage or the power goes out or the internet goes down. Then that is not gonna really do much or help from the outside perspective of…

Damir Simpovic: There are no absolutely I agree.

Andrew Mello: what your provider looks like. So I was just sharing this…

Damir Simpovic: Totally I think there is no dual be all solution,…

Andrew Mello: because again on my clusters, for example,…

Damir Simpovic: honestly, so this part would actually be Just a part of a providers monitoring.

Andrew Mello: there's always public dashboard. And then also again there are solutions for that kind of more local dashboard that would show if the cluster is up and running that is a nice to have…

Damir Simpovic: Let's call it the stack.

Andrew Mello: but you can't always rely on it basically and…

Damir Simpovic: And it will be completely optional,…

Andrew Mello: you can always rely on the external check.

Damir Simpovic: .

Andrew Mello: So just something for you to think through as you continue to develop that Sure. Yep. Got some feedback.

00:15:00

Damir Simpovic: I mean, yeah. Thank you.

Andrew Mello: Just some thoughts. Yeah again great great job.

Damir Simpovic: Thank you. Definitely. I agree with everything he said.

Tyler Wright: Excellent.

Tyler Wright: Yeah, our message in the chat. Thank you for that. Damir. Are there any other questions or comments and Andrew? Thanks for that feedback.

Tyler Wright: All I wanted to see if there's anything else on the provider side that anybody want to discuss or share again. We got an update in depth yesterday in Sig clients about console 2.0 and some of the work that will be involved from prador in that console 2.0 work and then we got again another update from jigar today damir gave a demo on some work that he's been doing around cluster monitoring, but I want to see if there's anything else that anybody wanted to discuss on the provider side. Go ahead Andrew.

Andrew Mello: Yeah, hi guys. So since we all last spoke or you heard from me one of the things that would affect basically everybody on this call and you may have noticed that a couple weeks ago, but a cost mining is well underway, and in the process of building that I have finalized and finished the deployment queue and what that means is some of your providers you may have noticed at one time or another were becoming completely full topped off and that's one of the actual effects and benefits of the queue is that it can size and right-size deployments to the available resources on a given provider. So again, some of you may have noticed every CPU getting full, that was by Design in the testing of the queue. So again, just a heads up to the providers that

Andrew Mello: when that Q runs you can expect a lot of deployments usage and leases basically coming from it. So I just a heads up there and normal activity from my end.

Tyler Wright: Excellent. Does anyone have any questions on that from Andrew?

Tyler Wright: All there's no active discussions that I see that affects providers again. I would ask that anybody that wants to get involved in discussions on GitHub do so to make your thoughts heard, but there's nothing that I can generally see it seems like everything is fairly business as usual on the provider side. I have seen this one user and MF having some real issues, but getting some support from the community and members the Insiders. Go ahead damir.

Damir Simpovic: Yeah about this user. He's basically learning kubernetes. So yeah, don't take it too seriously.

Tyler Wright: Yeah. Yeah. No, I know that documentation. Can you get reference? I think he's just getting literally step by step help hand holding from the community. So again, I do appreciate that. I did want to see if there's anything else on the provider side that anybody wants to discuss. We did talk a little bit on the six support call about updating documentation to make it very clear in terms of where some of the provider audits should go. That's something that is continuing to be handled. Again. There's another way that'll pass through soon. But again, we talked about that already during Sig supports a lookout for just some documentation updates as well as those that are insiders at the next meeting. We'll just do some clarification to make sure that it's very clear on where to send people for various things.

Tyler Wright: Outside of those topics. Is there anything else that anybody wants to just discuss? Otherwise again, I'll let you all go and we have the working group for the cash website in docs to look forward to tomorrow as well as the steering committee, but Anything else anybody wants to discuss before I let y'all go.

Tyler Wright: Terrific. again much appreciate the updates from Jigger Deval damier Andrew appreciate everyone's time on this call today. yeah take some time back and I'll drop in the notes here shortly, but please feel free to join in providers or the providers Discord channels if you have any questions between meetings, Much appreciate everyone's time today. Hope everyone has a great rest of the day. Thank you.

00:20:00

Andrey Arapov: They go.

Andrew Mello: Yeah.

Rodri R: you guys

Jigar Patel: Thanks guys.

Andrey Arapov: right

Meeting ended after 00:20:26 👋


0 comments on commit 1dea266

Please sign in to comment.