[compute] Porting Q4_0 and Q8_0 weight FullyConnected compute library #13909

hseok-oh · 2024-09-03T02:25:22Z

What

Port the ggml computation library's Q4_0 FullyConnected (mulmat) kernel into compute/cker or a better place

Why

Prepare to support block-quantized FullyConnected layers in the CPU backend
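For readers unfamiliar with the format, here is a minimal reference sketch of Q4_0 block quantization and a FullyConnected pass over such weights. It follows the ggml Q4_0 layout as I understand it (32 weights per block, one scale `d`, 16 bytes of packed 4-bit values with an offset of 8). The function names are hypothetical and this is not the code that was ported. ggml stores `d` as fp16 and, as far as I know, quantizes activations to Q8_0 and uses integer dot products; this sketch keeps `d` as a Python float and dequantizes weights for clarity.

```python
QK4_0 = 32  # weights per Q4_0 block (ggml convention)


def quantize_q4_0_block(values):
    """Quantize 32 floats into (scale d, 16 packed bytes)."""
    assert len(values) == QK4_0
    # The value with the largest magnitude (sign kept) is mapped to -8.
    vmax = max(values, key=abs)
    d = vmax / -8.0
    inv_d = 1.0 / d if d != 0.0 else 0.0
    half = QK4_0 // 2
    packed = bytearray(half)
    for j in range(half):
        # Element j goes to the low nibble, element j + 16 to the high nibble.
        q0 = min(15, int(values[j] * inv_d + 8.5))
        q1 = min(15, int(values[j + half] * inv_d + 8.5))
        packed[j] = q0 | (q1 << 4)
    return d, bytes(packed)


def dequantize_q4_0_block(d, packed):
    """Recover 32 floats from one block: w = (nibble - 8) * d."""
    half = QK4_0 // 2
    out = [0.0] * QK4_0
    for j in range(half):
        out[j] = ((packed[j] & 0x0F) - 8) * d
        out[j + half] = ((packed[j] >> 4) - 8) * d
    return out


def fully_connected_q4_0(x, weight_rows):
    """Single-threaded reference: one output per weight row.

    x is a list of floats whose length is a multiple of 32;
    each weight row is a list of (d, packed) blocks covering x.
    """
    out = []
    for blocks in weight_rows:
        acc = 0.0
        for b, (d, packed) in enumerate(blocks):
            w = dequantize_q4_0_block(d, packed)
            base = b * QK4_0
            acc += sum(wi * xi for wi, xi in zip(w, x[base:base + QK4_0]))
        out.append(acc)
    return out
```

Each output row is independent of the others, so a multi-threaded version could split `weight_rows` across threads.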


glistening · 2024-09-04T05:32:18Z

We can start with a single-threaded implementation. For the near future, though:

How many threads will the fully connected kernel for block quantization use?
I don't remember exactly how it works in the other kernels (e.g. ruy, eigen, ...).
As I recall, the number of threads is not controlled by the core; each kernel uses threads as it likes.

Config.lst

CONFIG(RUY_THREADS             , int          , "-1")
CONFIG(XNNPACK_THREADS         , int          , "-1")

Also, I don't remember how these relate to the older THREAD environment variable from about 5 years ago.

hseok-oh · 2024-09-04T05:57:16Z

I think we can merge these two configs into ONERT_THREADS.

Usage example:

ONE/tools/stab/backend_scheduler.py

Lines 147 to 149 in c66e299

    
    cmd += [f"BACKENDS={';'.join(backend_list)}"]
    cmd += [f"RUY_THREADS={self.num_threads}"]
    cmd += [f"XNNPACK_THREADS={self.num_threads}"]

I'll make a PR to merge the configs.
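A rough sketch of how a caller such as the scheduler script might look before and after the merge. The helper names are hypothetical, and the merged variable name is an assumption: ONERT_THREADS is the name proposed in this comment, while the linked PR title mentions NUM_THREADS, so the name is passed in rather than hard-coded.

```python
def thread_env_args(num_threads, merged_name=None):
    """Return KEY=VALUE strings that set the kernel thread count.

    merged_name is the single merged config name (an assumption here);
    when it is None, fall back to the two per-library variables.
    """
    if merged_name is not None:
        return [f"{merged_name}={num_threads}"]
    return [
        f"RUY_THREADS={num_threads}",
        f"XNNPACK_THREADS={num_threads}",
    ]


def build_env_args(backend_list, num_threads, merged_name=None):
    """Assemble the environment part of the command, as in the snippet above."""
    cmd = [f"BACKENDS={';'.join(backend_list)}"]
    cmd += thread_env_args(num_threads, merged_name)
    return cmd
```

For example, `build_env_args(["cpu", "ruy"], 4, merged_name="ONERT_THREADS")` yields one thread setting instead of two.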

hseok-oh added this to [ONE] onert - LLM support Aug 23, 2024

hseok-oh converted this from a draft issue Sep 3, 2024

hseok-oh added this to the ONERT LLM Milestone 1 milestone Sep 3, 2024

hseok-oh added the area/onert ONE runtime label Sep 3, 2024

hseok-oh moved this from Ready to Start to In Progress in [ONE] onert - LLM support Sep 4, 2024

hseok-oh mentioned this issue Sep 4, 2024

[onert] Introduce NUM_THREADS config #13929

Merged

hseok-oh assigned glistening Sep 5, 2024

This was referenced Sep 12, 2024

[onert/3rdparty] Introduce ggml #13995

Merged

[onert] Initialize ggml context in CPU ExternalContext #14011

Merged

hseok-oh mentioned this issue Oct 8, 2024

[onert] Support Q4_0 & Q8_0 FC weight #14182

Merged

hseok-oh closed this as completed Oct 11, 2024

github-project-automation bot moved this from In Progress to Done in [ONE] onert - LLM support Oct 11, 2024

glistening assigned hseok-oh and unassigned glistening Nov 14, 2024
