GRPC support - in scope? #52

Open
justinsb opened this issue Feb 26, 2024 · 3 comments

Labels
Feature New feature or request stat:awaiting response Status - Awaiting response from author

Comments

@justinsb

I'd like to be able to run gemma.cpp on Kubernetes. A first step in my rough plan is to add a client/server mode, and I thought I would add gRPC support. Is the project open to having a contrib directory where we can collaborate on this sort of thing? In the future, I'm imagining we could put things like Kubernetes manifests in there as well.

I have a simple server (though my C++ is not good!) and an example client in golang, which I will send as a WIP PR to make the discussion more concrete.
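For concreteness, here is a minimal sketch of the kind of service definition such a client/server mode might use. The package, service, and message names below are hypothetical and not taken from the WIP PR:

```proto
// gemma.proto -- hypothetical sketch only; names are illustrative.
syntax = "proto3";

package gemma;

// Minimal unary request/response for a first client/server mode.
message GenerateRequest {
  string prompt = 1;      // UTF-8 prompt text.
  int32 max_tokens = 2;   // Upper bound on generated tokens.
  float temperature = 3;  // Sampling temperature.
}

message GenerateResponse {
  string text = 1;  // Generated completion.
}

service GemmaService {
  // A server-streaming variant could return tokens as they are produced.
  rpc Generate(GenerateRequest) returns (GenerateResponse);
}
```

A Go client along the lines mentioned above would then just dial the server and call Generate on the generated stub.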

@austinvhuang
Collaborator

Hi @justinsb, thanks for taking the initiative. There's been interest in client/server capabilities and I think there are some obvious use cases + value in that.

There are a few things being worked out that intertwine with such an implementation:

  1. The server piece of this would probably be its own frontend entrypoint (basically in place of run.cc); how should these alternative frontends be implemented? (I was working on some example demos, but they're on pause while we're triaging this first wave of post-launch PRs/issues.)
  2. Should these implementations live in this repo (e.g. contrib/) or in separate repos (like https://github.com/namtranase/gemma-cpp-python)?
  3. There's some minor decoupling / code cleanup needed to better support gemma.cpp-as-a-library / alternative frontend use cases (e.g. we probably want to decouple the CLI argument handling more cleanly).

My suggestion is to keep your initial implementation light (some interfaces may change as a result of #3). We can use that as a basis for thinking through design gaps and the cleanup needed. A meta question is where to have more involved design discussions with the community (I've also opened up the Discussions tab up top but haven't made use of it yet; I may look into a Discord).

@austinvhuang austinvhuang added the Feature New feature or request label Feb 26, 2024
@justinsb
Author

> There's been interest in client/server capabilities and I think there are some obvious use cases + value in that.

That's great news, and I agree!

> There are a few things being worked out that intertwine with such an implementation:
>
> 1. The server piece of this would probably be its own frontend entrypoint (basically in place of run.cc); how should these alternative frontends be implemented? (I was working on some example demos, but they're on pause while we're triaging this first wave of post-launch PRs/issues.)

I had a go: my alternative entrypoint, server.cc, basically copies and pastes run.cc and starts swapping out functionality so that it reads and writes over gRPC instead of stdin/stdout. I will give it a cleanup pass, but I think the core of gemma.cpp is already very amenable to reuse, and there's not a ton of boilerplate between run.cc and server.cc, so that's a good sign that this is well architected IMO.
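As a very rough sketch of that shape (not the WIP PR itself), the gRPC side of such a server.cc built on the synchronous grpc++ API could look roughly like this, assuming the hypothetical GemmaService definition sketched earlier and leaving the actual call into gemma.cpp as a placeholder:

```cpp
// server.cc -- rough sketch only, not the WIP PR; assumes stubs generated
// from the hypothetical gemma.proto shown earlier in this thread.
#include <memory>
#include <string>

#include <grpcpp/grpcpp.h>
#include "gemma.grpc.pb.h"  // Hypothetical generated header.

class GemmaServiceImpl final : public gemma::GemmaService::Service {
 public:
  grpc::Status Generate(grpc::ServerContext* /*ctx*/,
                        const gemma::GenerateRequest* request,
                        gemma::GenerateResponse* response) override {
    // Placeholder: tokenize request->prompt() and drive the same generation
    // loop run.cc uses, writing decoded tokens into the response rather than
    // to stdout.
    response->set_text("TODO: run gemma.cpp on: " + request->prompt());
    return grpc::Status::OK;
  }
};

int main() {
  GemmaServiceImpl service;
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);
  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
  server->Wait();
  return 0;
}
```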

> 2. Should these implementations live in this repo (e.g. contrib/) or in separate repos (like https://github.com/namtranase/gemma-cpp-python)?

My 2c: putting it into the same repo simplifies any refactoring we want to do as part of (1). For example, we may want to make small changes to function signatures, or larger changes like supporting batching (I don't think that's there today?). Over time, the core will stabilize and contrib will grow, and we'll probably move things out of contrib into their own repos and encourage more work in other repos - we saw the same pattern in Kubernetes. But when the project is getting started, if you want everything to be working, IMO you have to be able to make changes to the whole ecosystem at once, and one repo is the best solution I've found.

I don't think this should discourage people from doing things in other repos; rather, I think that having some consumers in the same repo acts both as an example and as an existence proof.

> 3. There's some minor decoupling / code cleanup needed to better support gemma.cpp-as-a-library / alternative frontend use cases (e.g. we probably want to decouple the CLI argument handling more cleanly).

Great - I hope that having a few "consumers" in the repo will make it easy to see the impact of these changes on the different frontends, because we'll hopefully make any required changes in the same PR. Consumers in other repos can then mirror those changes.

> My suggestion is to keep your initial implementation light (some interfaces may change as a result of #3). We can use that as a basis for thinking through design gaps and the cleanup needed.

Ack - and that's exactly what I'm hoping to achieve by colocating it in this repo.

> A meta question is where to have more involved design discussions with the community

My view is that the best discussions normally happen in issue comments and PR comments. PRs/code are also part of the conversation: for example, maybe we host the gRPC frontend in contrib/ or examples/ while it is still an open area of discussion, and then over time (once the LLM community has converged on some RPC approach) we remove it. One thing we do need in contrib/ or examples/ is some indication that "the code in this directory is not part of the core and might be removed in the future".

> (I've also opened up the Discussions tab up top but haven't made use of it yet; I may look into a Discord.)

You might also consider hosting occasional video meetings ("office hours"), though usually those grow organically out of a few ad-hoc discussions as a core team emerges. I've not personally seen a lot of uptake of the GitHub Discussions feature. I know Discord is big in the AI community, so that might be a good option.

@jan-wassenberg
Member

Hi @justinsb, after a huge delay, we are catching up on issues and suggestions this week :)

One update is that batching is now supported, but it hasn't seen a lot of testing yet. It seems useful to have that supported in the interfaces.

I like your idea of adding contrib, with the expectation that interfaces may change up to a certain time, and think that gRPC + HTTP would be great additions. Happy to discuss.

FYI, there is also a Discord, but personally I'd also prefer to discuss in issues/PRs as you suggest; it's easier to organize/search that way.

@KumarGitesh2024 KumarGitesh2024 added the stat:awaiting response Status - Awaiting response from author label Aug 19, 2024