GRPC support - in scope? #52

Open
justinsb opened this issue Feb 26, 2024 · 3 comments

Labels
Feature New feature or request stat:awaiting response Status - Awaiting response from author

Comments

@justinsb

I'd like to be able to run gemma.cpp on Kubernetes. A first step in my rough plan is to add a client/server mode, and I thought I would add gRPC support. Is the project open to having a contrib directory where we can collaborate on this sort of thing? In the future, I'm imagining we could put things like Kubernetes manifests in there as well.

I have a simple server (though my C++ is not good!) and an example client in golang, which I will send as a WIP PR to make the discussion more concrete.
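For concreteness, here is a minimal sketch of the kind of service definition such a client/server mode might use. The package, service, and message names below are hypothetical and not taken from the WIP PR:

```proto
// gemma.proto -- hypothetical sketch only; names are illustrative.
syntax = "proto3";

package gemma;

// Minimal unary request/response for a first client/server mode.
message GenerateRequest {
  string prompt = 1;      // UTF-8 prompt text.
  int32 max_tokens = 2;   // Upper bound on generated tokens.
  float temperature = 3;  // Sampling temperature.
}

message GenerateResponse {
  string text = 1;  // Generated completion.
}

service GemmaService {
  // A server-streaming variant could return tokens as they are produced.
  rpc Generate(GenerateRequest) returns (GenerateResponse);
}
```

A Go client along the lines mentioned above would then just dial the server and call Generate on the generated stub.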

@austinvhuang
Collaborator

Hi @justinsb, thanks for taking the initiative. There's been interest in client/server capabilities and I think there are some obvious use cases + value in that.

There are a few things being worked out that intertwine with such an implementation:

  1. The server piece of this would probably be its own frontend entrypoint (basically in place of run.cc); how should these alternative frontends be implemented? (I was working on some example demos, but they're on pause while we're triaging this first wave of post-launch PRs/issues.)
  2. Should these implementations live in this repo (e.g. contrib/) or in separate repos (like https://github.com/namtranase/gemma-cpp-python)?
  3. There's some minor decoupling / code cleanup needed to better support gemma.cpp-as-a-library / alternative frontend use cases (e.g. we probably want to decouple the CLI argument handling more cleanly).

My suggestion is to keep your initial implementation light (some interfaces may change as a result of #3). We can use that as a basis for thinking through design gaps and the cleanup needed. A meta question is where to have more involved design discussions with the community (I've also opened up the Discussions tab up top but haven't made use of it yet; I may look into a Discord).

@austinvhuang austinvhuang added the Feature New feature or request label Feb 26, 2024
@justinsb
Author

> There's been interest in client/server capabilities and I think there are some obvious use cases + value in that.

That's great news, and I agree!

> There are a few things being worked out that intertwine with such an implementation:
>
> 1. The server piece of this would probably be its own frontend entrypoint (basically in place of run.cc); how should these alternative frontends be implemented? (I was working on some example demos, but they're on pause while we're triaging this first wave of post-launch PRs/issues.)

I had a go: my alternative entrypoint, server.cc, basically copies and pastes run.cc and starts swapping out functionality so that it reads and writes over gRPC instead of stdin/stdout. I will give it a cleanup pass, but I think the core of gemma.cpp is already very amenable to reuse, and there's not a ton of boilerplate between run.cc and server.cc, so that's a good sign that this is well architected IMO.
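As a very rough sketch of that shape (not the WIP PR itself), the gRPC side of such a server.cc built on the synchronous grpc++ API could look roughly like this, assuming the hypothetical GemmaService definition sketched earlier and leaving the actual call into gemma.cpp as a placeholder:

```cpp
// server.cc -- rough sketch only, not the WIP PR; assumes stubs generated
// from the hypothetical gemma.proto shown earlier in this thread.
#include <memory>
#include <string>

#include <grpcpp/grpcpp.h>
#include "gemma.grpc.pb.h"  // Hypothetical generated header.

class GemmaServiceImpl final : public gemma::GemmaService::Service {
 public:
  grpc::Status Generate(grpc::ServerContext* /*ctx*/,
                        const gemma::GenerateRequest* request,
                        gemma::GenerateResponse* response) override {
    // Placeholder: tokenize request->prompt() and drive the same generation
    // loop run.cc uses, writing decoded tokens into the response rather than
    // to stdout.
    response->set_text("TODO: run gemma.cpp on: " + request->prompt());
    return grpc::Status::OK;
  }
};

int main() {
  GemmaServiceImpl service;
  grpc::ServerBuilder builder;
  builder.AddListeningPort("0.0.0.0:50051", grpc::InsecureServerCredentials());
  builder.RegisterService(&service);
  std::unique_ptr<grpc::Server> server = builder.BuildAndStart();
  server->Wait();
  return 0;
}
```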

> 2. Should these implementations live in this repo (e.g. contrib/) or in separate repos (like https://github.com/namtranase/gemma-cpp-python)?

My 2c: putting it into the same repo simplifies any refactoring we want to do as part of (1). For example, we may want to make small changes to function signatures, or larger changes like supporting batching (I don't think that's there today?). Over time, the core will stabilize and contrib will grow, and we'll probably move things out of contrib into their own repos and encourage more work in other repos - we saw the same pattern in Kubernetes. But when the project is getting started, if you want everything to be working, IMO you have to be able to make changes to the whole ecosystem at once, and one repo is the best solution I've found.

I don't think this should discourage people from doing things in other repos; rather, I think that having some consumers in the same repo acts both as an example and as an existence proof.

> 3. There's some minor decoupling / code cleanup needed to better support gemma.cpp-as-a-library / alternative frontend use cases (e.g. we probably want to decouple the CLI argument handling more cleanly).

Great - I hope that having a few "consumers" in the repo will make it easy to see the impact of these changes on the different frontends, because we'll hopefully make any required changes in the same PR. Consumers in other repos can then mirror those changes.

> My suggestion is to keep your initial implementation light (some interfaces may change as a result of #3). We can use that as a basis for thinking through design gaps and the cleanup needed.

Ack - and that's exactly what I'm hoping to achieve by colocating it in this repo.

> A meta question is where to have more involved design discussions with the community

My view is that the best discussions normally happen in issue comments and PR comments. PRs/code are also part of the conversation: for example, maybe we host the gRPC frontend in contrib/ or examples/ while it is still an open area of discussion, and then over time (once the LLM community has converged on some RPC approach) we remove it. One thing we do need in contrib/ or examples/ is some indication that "the code in this directory is not part of the core and might be removed in the future".

> (I've also opened up the Discussions tab up top but haven't made use of it yet; I may look into a Discord.)

You might also consider hosting occasional video meetings ("office hours"), though usually those grow organically out of a few ad-hoc discussions as a core team emerges. I've not personally seen a lot of uptake of the GitHub Discussions feature. I know Discord is big in the AI community, so that might be a good option.

@jan-wassenberg
Member

Hi @justinsb, after a huge delay, we are catching up on issues and suggestions this week :)

One update is that batching is now supported, but it hasn't seen a lot of testing yet. It seems useful to have that supported in the interfaces.

I like your idea of adding contrib, with the expectation that interfaces may change up to a certain time, and think that gRPC + HTTP would be great additions. Happy to discuss.

FYI, there is also a Discord, but personally I'd also prefer to discuss in issues/PRs as you suggest; it's easier to organize/search that way.

@KumarGitesh2024 KumarGitesh2024 added the stat:awaiting response Status - Awaiting response from author label Aug 19, 2024