Issues: abetlen/llama-cpp-python
Issues list
#1803 · low level examples broken after feat: Update sampling API for llama.cpp (#1742) · opened Oct 20, 2024 by mite51
#1801 · Llama.from_pretrained should work with HF_HUB_OFFLINE=1 · opened Oct 16, 2024 by davidgilbertson
#1787 · server: chat completions returns wrong logprobs model · opened Oct 6, 2024 by domdomegg
#1784 · Tool parser cannot parse tool call strings from qwen2.5 · opened Oct 5, 2024 by hpx502766238
#1781 · Why is this not working for the current release? Unable to use GPU · opened Oct 2, 2024 by AnirudhJM24
#1773 · Setting temperature to 100000000000000000 does not affect output · opened Oct 1, 2024 by ivanstepanovftw
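For context on the temperature report above: in standard temperature sampling, logits are divided by the temperature before the softmax, so an astronomically high temperature should flatten the distribution toward uniform and visibly change output. A minimal pure-Python sketch of that math (an illustrative model, not llama-cpp-python's actual sampler code):

```python
import math

def softmax_with_temperature(logits, temperature):
    """Scale logits by 1/temperature, then apply a numerically stable softmax.

    As temperature grows very large, the scaled logits all approach zero,
    so the resulting distribution approaches uniform.
    """
    scaled = [l / temperature for l in logits]
    m = max(scaled)                       # subtract max for numerical stability
    exps = [math.exp(s - m) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]

# Extreme temperature: probabilities flatten toward 1/3 each.
flat = softmax_with_temperature([2.0, 1.0, 0.1], 1e17)

# Low temperature: the largest logit dominates.
peaked = softmax_with_temperature([2.0, 1.0, 0.1], 0.01)
```

If a sampler ignores the configured temperature (for example, because a greedy path is taken first), output would indeed be unaffected, which is what the issue describes.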
#1769 · Error when passing model to deepcopy in llama_cpp_python>=0.3.0 · opened Sep 28, 2024 by sergey21000
#1767 · Inference speed is extremely slow for 72B model with long contexts · opened Sep 27, 2024 by wrench1997
#1764 · FileNotFoundError: Shared library with base name 'llama' not found · opened Sep 26, 2024 by HAOYON-666
#1763 · Feature request: ability to tokenize a list of strings _or_ keep the tokenizer warm [enhancement] · opened Sep 25, 2024 by lsorber
#1762 · Llama.embed crashes when n_batch > 512 [bug] · opened Sep 25, 2024 by lsorber
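For context on the n_batch report above: n_batch bounds how many tokens are submitted per decode/embed call, so inputs longer than n_batch must be split into consecutive chunks. A generic chunking sketch (a hypothetical helper for illustration, not the library's internals):

```python
def chunk_tokens(tokens, n_batch):
    """Split a token list into consecutive chunks of at most n_batch items,
    preserving order so the chunks can be fed to the model one batch at a time."""
    return [tokens[i:i + n_batch] for i in range(0, len(tokens), n_batch)]

# 1300 tokens with n_batch=512 yields chunks of sizes 512, 512, 276.
chunks = chunk_tokens(list(range(1300)), 512)
```

A crash only when the input exceeds n_batch suggests the splitting (or the per-chunk bookkeeping around it) is where the bug lives.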
#1759 · Server crash when context is exceeded (lib version >= v0.2.81) · opened Sep 25, 2024 by carlostomazin
#1756 · chatml-function-calling chat format fails to generate multiple calls to the same tool · opened Sep 23, 2024 by jeffmaury