
enable gpt2, falcon has core dump error in PagedAttention.single_quer… #979

Merged: 4 commits into huggingface:paged_attn on Nov 5, 2024

Conversation

jiqing-feng (Collaborator) commented:

Falcon hits a bus error (core dumped) in PagedAttention.single_query_cached_kv_attention:

(torch_new) [jiqingfe@sprocean workloads]$ ./run.sh -t text-generation -m tiiuae/falcon-7b-instruct --model_dtype bfloat16 --input_tokens 32 --output_tokens 32 --num_beams 1 --batch_size 1 --warm_up_steps 1 --run_steps 1 --optimum_intel True
OMP: Warning #42: KMP_BLOCKTIME: "INF" is an invalid value; ignored.
OMP: Warning #39: KMP_BLOCKTIME value "INF" is invalid.
OMP: Info #104: KMP_BLOCKTIME value "200" will be used.
INFO:root:args = Namespace(model_id='tiiuae/falcon-7b-instruct', autocast_dtype='float32', ipex_optimize=False, jit=False, torch_compile=False, model_dtype='bfloat16', quant_type='None', backend='inductor', device='cpu', batch_size=1, num_beams=1, input_tokens=32, output_tokens=32, ipex_optimize_transformers=False, warm_up_steps=1, run_steps=1, optimum_intel=True)
Loading checkpoint shards: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 2/2 [00:00<00:00,  5.41it/s]
INFO:root:input tokens length is 34
Setting `pad_token_id` to `eos_token_id`:None for open-end generation.
./run.sh: line 208: 2544181 Bus error               (core dumped) numactl -C '0-'${CORES} --membind 0 python $task_name/run_$task_name.py --model_id $model_id --model_dtype $model_dtype --quant_type $quant_type --jit $jit --ipex_optimize $ipex_optimize --autocast_dtype $autocast_dtype --torch_compile $torch_compile --backend $backend --device $device --batch_size $batch_size --num_beams $num_beams --input_tokens $input_tokens --output_tokens $output_tokens --ipex_optimize_transformers $ipex_optimize_transformers --warm_up_steps $warm_up_steps --run_steps $run_steps --optimum_intel $optimum_intel
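
For reference, a minimal sketch of the failing path outside run.sh, assuming the standard optimum-intel `IPEXModelForCausalLM` API; the prompt and the exact run_text-generation.py internals here are placeholders, not the script's actual code:

```python
# Hypothetical minimal repro sketch, mirroring the failing command:
# falcon-7b-instruct, bfloat16, greedy decoding (num_beams=1),
# 32 new tokens on CPU via the optimum_intel=True path.
import torch
from transformers import AutoTokenizer
from optimum.intel import IPEXModelForCausalLM

model_id = "tiiuae/falcon-7b-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = IPEXModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.bfloat16)

# Any prompt of roughly 32 tokens works; the crash happens during decoding,
# when PagedAttention.single_query_cached_kv_attention reads the paged KV cache.
prompt = "Paris is the capital of France. " * 4
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=32, num_beams=1)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```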

@jiqing-feng marked this pull request as ready for review on November 5, 2024, 03:02.
@sywangyi merged commit 74eec8b into huggingface:paged_attn on Nov 5, 2024 (1 check passed).