Why does the response take so long? #108
-
Hi @tjake! Thanks so much for your work on this project. I love it! I'm just trying it out for the first time, and it's really slow. Is it something I'm doing wrong? Running locally on my MacBook Pro, responses take over 60 seconds; deployed to production, they take 20-30 seconds. Here's my adapter:

```java
import dev.langchain4j.memory.chat.MessageWindowChatMemory;
import dev.langchain4j.model.chat.StreamingChatLanguageModel;
import dev.langchain4j.model.jlama.JlamaStreamingChatModel;
import dev.langchain4j.service.AiServices;
import reactor.core.publisher.Flux;

// AiGateway, StreamingAssistant, and UserMessageDto are my own application types.
public class LangchainAiAdapter implements AiGateway {

    private final StreamingAssistant streamingAssistant;

    public LangchainAiAdapter() {
        // Jlama-backed streaming model, loaded by Hugging Face model name
        StreamingChatLanguageModel model = JlamaStreamingChatModel.builder()
                .modelName("tjake/TinyLlama-1.1B-Chat-v1.0-Jlama-Q4")
                .temperature(0.3f)
                .build();

        this.streamingAssistant = AiServices.builder(StreamingAssistant.class)
                .streamingChatLanguageModel(model)
                .systemMessageProvider(memoryId -> "system message")
                // Keep the last 10 messages per chat as conversation context
                .chatMemoryProvider(memoryId -> MessageWindowChatMemory.withMaxMessages(10))
                .build();
    }

    @Override
    public Flux<String> sendMessageAndReceiveReplies(UserMessageDto userMessageDto) {
        return streamingAssistant.chat(userMessageDto.chatId(), userMessageDto.userMessageText());
    }
}
```
-
Hi @LiveNathan, are you including the native bindings? You need to add them explicitly: https://github.com/langchain4j/langchain4j-examples/blob/main/jlama-examples/pom.xml#L21C1-L34C1. I don't have an x86 Mac, but I built the bindings and can see they work: https://github.com/tjake/Jlama/actions/runs/11677523417/job/32515542812#step:4:11670. So check whether that's the issue. Also, which JVM are you using?
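For reference, here is a minimal sketch of what the linked pom.xml sets up: the core artifact plus the platform-specific native bindings. The `${jlama.version}` property is a placeholder, and the `${os.detected.*}` classifier assumes the `kr.motd.maven:os-maven-plugin` build extension is configured; check the linked example for the exact coordinates.

```xml
<!-- Sketch only: explicit Jlama native bindings alongside the core artifact.
     ${jlama.version} is a placeholder; see the linked pom.xml for exact values. -->
<dependency>
    <groupId>com.github.tjake</groupId>
    <artifactId>jlama-core</artifactId>
    <version>${jlama.version}</version>
</dependency>
<dependency>
    <groupId>com.github.tjake</groupId>
    <artifactId>jlama-native</artifactId>
    <!-- Classifier resolved by os-maven-plugin, e.g. osx-x86_64 or linux-x86_64 -->
    <classifier>${os.detected.name}-${os.detected.arch}</classifier>
    <version>${jlama.version}</version>
</dependency>
```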
-
Ohhh, you're using the old version of Jlama that was bundled with langchain4j. You need to exclude that version and use the latest (sorry, this is the downside of bundling with langchain4j).
As for the JDK, use …
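A hedged sketch of the exclusion this suggests — the artifact IDs below are assumptions based on the usual coordinates (`dev.langchain4j:langchain4j-jlama` pulling in `com.github.tjake:jlama-core`), so verify them against your own dependency tree (`mvn dependency:tree`) before copying:

```xml
<!-- Sketch only: keep the langchain4j integration but swap in a newer Jlama.
     Artifact IDs are assumed; verify with `mvn dependency:tree`. -->
<dependency>
    <groupId>dev.langchain4j</groupId>
    <artifactId>langchain4j-jlama</artifactId>
    <version>${langchain4j.version}</version>
    <exclusions>
        <exclusion>
            <groupId>com.github.tjake</groupId>
            <artifactId>jlama-core</artifactId>
        </exclusion>
    </exclusions>
</dependency>
<!-- Then declare the latest jlama-core (and jlama-native) explicitly,
     as in the dependency snippet in the reply above. -->
<dependency>
    <groupId>com.github.tjake</groupId>
    <artifactId>jlama-core</artifactId>
    <version>${latest.jlama.version}</version>
</dependency>
```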