FlexLLM server demo #1510

goliaro · 2024-09-27T15:37:40Z

Description of changes:

TODOs:

Streamlit app
Chat protocol
Add Llama 3.1 and Llama 3.2 support & check alignment (in particular, RoPE)
Support LoRA in attention projections (see Attention projections (QKV, O) disaggregation #1436)
Be able to add LoRA layers at runtime, and deallocate memory when done
Be able to set parameters (e.g. max sequence length) at runtime for each request
Be able to set generation configs (top_p, temperature, etc.) at runtime for each request
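
A rough sketch of what the last two TODO items could look like on the Python side, assuming server-wide defaults that each request may partially override. The `RequestConfig` class, its field names and default values, and `config_for_request` are all hypothetical names used for illustration; none of them come from this PR or from the existing FlexLLM API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RequestConfig:
    # Hypothetical per-request settings; names and defaults are illustrative only.
    max_sequence_length: int = 256
    max_new_tokens: int = 128
    temperature: float = 0.8
    top_p: float = 0.95

    def validate(self) -> None:
        # Reject values that a sampler or scheduler could not use.
        if self.max_sequence_length <= 0 or self.max_new_tokens <= 0:
            raise ValueError("lengths must be positive")
        if self.temperature <= 0.0:
            raise ValueError("temperature must be positive")
        if not 0.0 < self.top_p <= 1.0:
            raise ValueError("top_p must be in (0, 1]")


def config_for_request(defaults: RequestConfig, overrides: dict) -> RequestConfig:
    """Return a validated copy of `defaults` with this request's overrides applied."""
    unknown = set(overrides) - set(defaults.__dataclass_fields__)
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    cfg = replace(defaults, **overrides)  # defaults stay untouched (frozen dataclass)
    cfg.validate()
    return cfg
```

Because the dataclass is frozen, one request's overrides cannot leak into the defaults used by the next request, which is the property the per-request TODOs seem to need.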

Related Issues:

Linked Issues:

Issue #

Issues closed by this PR:

Closes #

goliaro added 30 commits September 25, 2024 02:19

init

470a40f

update

7f23188

update

a2d2ac0

update

f8c90e6

update

2906e57

add max new tokens parameter

d62d9be

backup

85797e0

update

bb08d69

backup

62275c2

lora configs serialize / deserialize into single file

88d60ca

backup

e453237

.

5c8c448

.

21f8cb9

.

c5e813b

.

aa57f98

frontend

53c408c

bug fix

1691100

fixes

7ff96d7

Merge branch 'inference' into streamlit

7eb953a

fix

92c2c37

updates

fbdf74e

Merge branch 'inference' into streamlit

754abd7

fix

10fb496

fix

79dc3a2

fix

4219806

Merge branch 'inference' into streamlit

61f79ad

small fix

f542fbb

fix

139b643

fix reset input grad for non-activated loras

b56ebd3

fix

3632754

goliaro added 11 commits November 8, 2024 16:13

Merge branch 'inference' into streamlit

39e47a5

update

fca3d95

demo fixes & readme

9a1eae5

load weights in parallel

c71c6b3

cleanup

d54fcf2

cleanup

f748515

load weights faster in inference test

266a1ed

fix

d771f6b

cleanup and fixes

fc626c6

linting

ab5aa4b

fix

7d99cf7
