merged from upstream #23

l3utterfly · 2024-06-12T08:10:17Z

Self Reported Review Complexity:
- Review Complexity : Low
- Review Complexity : Medium
- Review Complexity : High
I have read the contributing guidelines

* ggml : unify rope norm/neox (CPU)
* ggml : fix compile warning
* ggml : remove GLM rope mode ggml-ci
* metal : better rope implementation ggml-ci
* cuda : better rope implementation ggml-ci
* naming : n_orig_ctx -> n_ctx_orig ggml-ci
* dev : add reminders to update backends ggml-ci
* vulkan : fix ggml_rope_ext() usage
* cuda : fix array size + indents ggml-ci

* feat: add changes to handle jina v2 base code
* fix: do not complicate things
* fix: fix the usage of the code model
* fix: fix comments
* fix: fix linting issues
* fix: remove ollama patches
* style : minor
---------
Co-authored-by: Georgi Gerganov <[email protected]>

* grammars: x{min,max} repetition operator + tweak +/*/? to avoid duplication of original over alternates
* grammars: handle `x{n}` and fix `x{n,n}`
* grammars: document new repetition operators
* grammars: uniform use of int for min & max
* grammars: refactor parser test
* grammar: parsing tests w/ natural pretty print of updated expectations
* grammars: much prettier print of expectations (+ TEST_GRAMMAR_PARSER_PRINT_ALL=1 to force all)
* grammars: improve test pretty print again
* grammars: pretty print rules and chars
* grammars: fix copy rule skipping
* grammars: disallow `a{,}` (not allowed in regexps)
* Update common/grammar-parser.cpp
  Co-authored-by: Clint Herron <[email protected]>
* grammars: fix copy rule skipping (again) & display of expectations
* grammars: more test cases
* grammars: update reps parsing to bring ? / * / + closer to before
* json: use new GBNF repetitions{m,n} syntax
* grammars: update performance gotchas w/ repetition advice
* Update examples/json_schema_to_grammar.py
  Co-authored-by: Clint Herron <[email protected]>
* Update examples/server/public/json-schema-to-grammar.mjs
  Co-authored-by: Clint Herron <[email protected]>
* grammars: comment on rule repetitions
* grammars: ensure unambiguous number alternatives
* grammar: nit typo switched error msgs
* grammar: nit numbering in comment
* json: update numeric rule to be unambiguous
* Apply suggestions from code review
  Co-authored-by: Clint Herron <[email protected]>
* Update examples/server/public/json-schema-to-grammar.mjs
  Co-authored-by: Clint Herron <[email protected]>
* json: fix integral-part
* grammar: add repetition tests
---------
Co-authored-by: Clint Herron <[email protected]>
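
To make the meaning of the new `x{min,max}` operator concrete, here is a minimal Python sketch that rewrites a bounded repetition using only sequencing, `?` and `*`. The function name `expand_repetition` is hypothetical, and this is not how common/grammar-parser.cpp implements the operator; it only shows an equivalent grammar in which the optional copies are nested, so that each repetition count has a single parse.

```python
from typing import Optional


def expand_repetition(symbol: str, min_n: int, max_n: Optional[int]) -> str:
    """Rewrite symbol{min_n,max_n} using only sequence, `?` and `*`.

    max_n=None stands for an open upper bound, as in x{min,}.
    Hypothetical helper for illustration only.
    """
    if min_n < 0 or (max_n is not None and max_n < min_n):
        raise ValueError("need 0 <= min_n <= max_n")

    # The mandatory part is simply min_n copies of the symbol.
    required = [symbol] * min_n

    if max_n is None:
        # Open upper bound: required copies followed by a Kleene star.
        return " ".join(required + [f"{symbol}*"])

    # Nest the optional copies: (x (x (x)?)?)? instead of x? x? x?,
    # so a given number of repetitions can be matched in only one way.
    optional = ""
    for _ in range(max_n - min_n):
        optional = f"({symbol} {optional})?" if optional else f"({symbol})?"

    return " ".join(required + ([optional] if optional else []))


if __name__ == "__main__":
    print(expand_repetition("x", 2, 4))
    print(expand_repetition("x", 1, None))
```

Under these assumptions, `x{n}` is the special case `min_n == max_n`, which expands to `n` plain copies with no optional tail.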

…nov#7827) The main change in this PR is to consolidate GGUFWriter.add_key and GGUFWriter.add_val into GGUFWriter.add_key_value. In addition, use_temp_file is now opt-in instead of opt-out, defaulting to False. GGUFWriter also no longer requires an output file name until it actually writes to the file, and it does not need to eagerly prepare the data layout of the metadata.

@teleprint-me

In ggerganov#7075, to fix the conversion of (some) models that use model-00001-of-00001.safetensors instead of model.safetensors for a single model part, we simply used the same logic as the part count to get the part names. But this doesn't always work correctly, for example when unusual additional model files are present, such as consolidated.safetensors in https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3. This commit matches both the prefix and the suffix of the model part names, which should fix this problem without breaking any previously supported upstream models. According to a report by @teleprint-me, some problem still persists, but this should do in the meantime.
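
The prefix-and-suffix matching described above can be sketched as follows. This is a minimal illustration, not the conversion script's code: the function name `select_part_names` and the example file list are hypothetical, and the prefix `model` is taken from the file names quoted in the description.

```python
from typing import Iterable, List


def select_part_names(filenames: Iterable[str], prefix: str, suffix: str) -> List[str]:
    """Keep only files whose name matches both the prefix and the suffix.

    Matching on the suffix alone would also pick up extra weight files
    such as consolidated.safetensors. Hypothetical helper for illustration.
    """
    return sorted(
        name for name in filenames
        if name.startswith(prefix) and name.endswith(suffix)
    )


if __name__ == "__main__":
    # Hypothetical directory listing for a sharded checkpoint.
    files = [
        "config.json",
        "consolidated.safetensors",
        "model-00001-of-00003.safetensors",
        "model-00002-of-00003.safetensors",
        "model-00003-of-00003.safetensors",
    ]
    print(select_part_names(files, "model", ".safetensors"))
```

The same filter also accepts a single-part model stored as either model.safetensors or model-00001-of-00001.safetensors, since both names start with `model` and end with `.safetensors`.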

…and squash merges [no ci] (ggerganov#7700) This commit adds pull_request_template.md and CONTRIBUTING.md. It focuses on explaining to contributors the need to rate the PR complexity level, when to add [no ci], and how to format PR titles and descriptions.
Co-authored-by: Brian <[email protected]>
Co-authored-by: compilade <[email protected]>

Flake lock file updates: • Updated input 'nixpkgs': 'github:NixOS/nixpkgs/ad57eef4ef0659193044870c731987a6df5cf56b?narHash=sha256-SzDKxseEcHR5KzPXLwsemyTR/kaM9whxeiJohbL04rs%3D' (2024-05-29) → 'github:NixOS/nixpkgs/051f920625ab5aabe37c920346e3e69d7d34400e?narHash=sha256-4q0s6m0GUcN7q%2BY2DqD27iLvbcd1G50T2lv08kKxkSI%3D' (2024-06-07) Co-authored-by: github-actions[bot] <github-actions[bot]@users.noreply.github.com>

* Update Vulkan RoPE implementation
* Return nullptr on alloc_buffer when allocation fails, instead of throwing an exception
  Minor fixes
* Fix segfault when running out of VRAM
  Co-authored-by: slaren <[email protected]>
---------
Co-authored-by: slaren <[email protected]>

ggerganov and others added 30 commits June 5, 2024 11:29

CUDA: refactor mmq, dmmv, mmvq (ggerganov#7716)

7d1a378

* CUDA: refactor mmq, dmmv, mmvq
* fix out-of-bounds write
* struct for qk, qr, qi
* fix cmake build
* mmq_type_traits

Fix encoding in python scripts (ggerganov#7733)

7672ade

docker : add openmp lib (ggerganov#7780)

d67caea

docker : build only main and server in their images (ggerganov#7782)

2d08b7f

* add openmp lib to dockerfiles
* build only main and server in their docker images

README minor fixes (ggerganov#7798) [no ci]

a143c04

derievatives --> derivatives

Added support for . (any character) token in grammar engine. (ggergan…

ad675e1

…ov#6467)
* Added support for . (any character) token in grammar engine.
* Add integration tests for any-character symbol.

imatrix : migrate to gpt_params (ggerganov#7771)

f83351f

* imatrix : migrate to gpt_params ggml-ci
* imatrix : add --save-frequency cli arg
* common : fix --no-ppl

server : fix --threads-http arg (ggerganov#7801)

ee459f4

check for nans in imatrix and quantize (ggerganov#7807)

c9ee711

* imatrix : detect nan/inf values
* quantize : check imatrix for nan/inf values
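
As a rough illustration of the kind of check these two commits describe, the Python sketch below scans a sequence of values and reports the first one that is NaN or infinite. The helper name `first_non_finite` is hypothetical; the actual checks live in the C++ imatrix and quantize tools and may differ in where and how they report the problem.

```python
import math
from typing import Iterable, Optional


def first_non_finite(values: Iterable[float]) -> Optional[int]:
    """Return the index of the first NaN or +/-inf value, or None if all are finite.

    Hypothetical helper for illustration only.
    """
    for index, value in enumerate(values):
        # math.isfinite is False for NaN, +inf and -inf.
        if not math.isfinite(value):
            return index
    return None


if __name__ == "__main__":
    data = [0.25, 1.5, float("nan"), 3.0]
    bad = first_non_finite(data)
    if bad is not None:
        print(f"non-finite value at index {bad}")
```

Rejecting such data up front is the design point: a single NaN in the importance data would otherwise propagate into every statistic computed from it.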

[SYCL] fix softmax r2r result wrong issue (ggerganov#7811)

d5c938c

server : do not get prompt in infill mode (ggerganov#7286)

a5cabd7

* avoid to get prompt in infill mode and embedding mode
* remove embedding mode
* refactor format
---------
Co-authored-by: wudexiang <[email protected]>

server: update cache_prompt documentation [no ci] (ggerganov#7745)

7027b27

cmake : fix BUILD_SHARED_LIBS=ON build (ggerganov#7784)

27615f5

common depends on pthreads in Linux

gguf-split : change binary multi-byte units to decimal (ggerganov#7803)

c00fad7

vulkan : reuse parent extra for views (ggerganov#7806)

da799b4

* vulkan : reuse parent extra for views
* Fix validation error when multiple compute contexts are used in a graph
---------
Co-authored-by: 0cc4m <[email protected]>

server : smart slot selection using Longest Common Prefix (ggerganov#…

7a16ce7

…7728)
* server : Smart selection of available slot using Longest Common Substring
* add usage
* remove trailing whitespaces
* Use Longest Common Prefix (LCP) instead of LCS
* Rename argument
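
The idea behind slot selection by Longest Common Prefix can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not the server's implementation: `common_prefix_len` and `pick_slot` are hypothetical names, each slot is represented only by the token list it has cached, and any threshold or fallback policy the real server applies is left out.

```python
from typing import List, Optional, Sequence


def common_prefix_len(a: Sequence[int], b: Sequence[int]) -> int:
    """Number of leading elements shared by a and b."""
    length = 0
    for x, y in zip(a, b):
        if x != y:
            break
        length += 1
    return length


def pick_slot(prompt: Sequence[int], slot_caches: List[Sequence[int]]) -> Optional[int]:
    """Return the index of the slot whose cached tokens share the longest
    prefix with the new prompt, or None if no slot shares any prefix.

    Hypothetical helper for illustration only.
    """
    best_index: Optional[int] = None
    best_length = 0
    for index, cached in enumerate(slot_caches):
        length = common_prefix_len(prompt, cached)
        if length > best_length:
            best_index, best_length = index, length
    return best_index


if __name__ == "__main__":
    caches = [[1, 2, 9], [1, 2, 3, 7], [5]]
    print(pick_slot([1, 2, 3, 4], caches))
```

A prefix match, rather than a substring match, fits this use because only the leading tokens of a cached sequence can be reused without recomputation when a new prompt arrives.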

url: save -mu downloads to new cache location (ggerganov#7826)

d4d915d

* url: save -mu download to new cache location
* url: fs_get_cache_file_path util
* url: tweak sig of fs_get_cache_file

Revert "[SYCL] Update rpc-server.cpp to include SYCL backend (ggergan…

fe1e391

…ov#7682)" (ggerganov#7808) This reverts commit 9422c5e.

convert-hf : set the model name based on cli arg, if present (ggergan…

2decf57

…ov#7693) `--model-name` argument was added a while ago but did not do anything. This commit fixes this issue and enables this feature.

CUDA: revise q8_1 data layout for mul_mat_q (ggerganov#7824)

42b53d1

server: do not remove whitespace at the start of a completion chunk (g…

3e2ee44

…gerganov#7830)

imatrix : handle partial entries (ggerganov#7833)

e95beeb

use the correct SYCL context for host USM allocations (ggerganov#7777)

af4ae50

Signed-off-by: Ben Ashbaugh <[email protected]>

JohannesGaessler and others added 16 commits June 10, 2024 11:45

CUDA: use tensor cores for MMQ (ggerganov#7676)

1f0dabd

* CUDA: int8 tensor cores for MMQ (legacy quants)
* fix out-of-bounds writes
* __builtin_assume -> GGML_CUDA_ASSUME
* fix writeback returning too early

server : improve "prompt" handling (ggerganov#7847)

d9da0e4

examples : remove --instruct remnants (ggerganov#7846)

c28a839

ci : try win-2019 on server windows test (ggerganov#7854)

fd5ea0f

cmake : fix CMake requirement for CUDA (ggerganov#7821)

864a99e

json: document schema conversion in GBNF readme, align manual gramm…

396b18d

…ar examples & converters (ggerganov#7841)
* json: fix char pattern in grammar converters
* json: prevent number precision & whitespace runaways in example grammars
* json: add doc to grammar readme

json: refine constraint for whitespace to avoid runaways yet allow pr…

b61eb96

…etty print (ggerganov#7866)

fix CUDA CI by using a windows-2019 image (ggerganov#7861)

c2ce6c4

* try to fix CUDA ci with --allow-unsupported-compiler
* trigger when build.yml changes
* another test
* try exllama/bdashore3 method
* install vs build tools before cuda toolkit
* try win-2019

CUDA: int8 tensor cores for MMQ (q4_K, q5_K, q6_K) (ggerganov#7860)

bdcb8f4

tests : check the Python version (ggerganov#7872)

4bfe50f

ggml-ci

llama-bench: more compact markdown tables (ggerganov#7879)

148995e

github: move PR template to .github/ root (ggerganov#7868)

6fe42d0

fix broken link in pr template (ggerganov#7880) [no ci]

14f8352

* fix broken link in pr template
* Update pull_request_template.md [no ci]
---------
Co-authored-by: Brian <[email protected]>

vulkan: select only one device for single gpu with multiple drivers (g…

73bac2b

…gerganov#7582)

Fix a typo and add Fedora 40 package to install for Vulkan (ggerganov…

f2b5764

…#7794) [no ci] Fix "appropiate" to "appropriate" and add Fedora 40 packages to install to compile with Vulkan support

l3utterfly merged commit 9e363ad into layla-build Jun 12, 2024
5 of 9 checks passed

github-actions bot added SYCL Nvidia GPU Vulkan testing build examples devops python server ggml Kompute labels Jun 12, 2024
