Commit
PEFT support (inference/finetuning) (#1153)
* . * . * Update the default cublas behavior when CUDA_VERSION is not specified * fix bugs in IncMHA peft_bwd kernel * uncomment softmaxbackward * add layernorm to align test * add peft test scripts * fix import * fix * add code to convert peft models * add script to download peft for c++, fix bug * fix * add script to fine-tune models * implement loading lora configs/weights from file * remove peft_bwd assertion failure in embedding * fix download script * add peft dependencies in dockerfile * fix softmax backward * fix bc print indentation * Temporarily Revert "Update the default cublas behavior when CUDA_VERSION is not specified" This reverts commit 4ee710a. * Fix cublas default (#1220) * Fix Legion prebuild workflow (2) (#1208) * fix * fix * fix * fix * Fix Legion prebuild workflow (3) (#1210) * fix hip error * use CUBLAS_COMPUTE_FAST_16F for full-precision gemm --------- Co-authored-by: Zhihao Jia <[email protected]> * fix bugs, work on align opt-lora * update scripts * add code to output peft tensors in hf * update, fixes * linting * fix printing of tensors for numpy * update save_inference_tensors_to_file * linting * update * fix issue with save_inference_tensors_to_file * fix layer names for save_inference_tensors_to_file * fix peft * fix bwd bugs * linting * fixes * fix * fix * fix * add bc fields for peft training * linting * fix * remove ptr check * fix * implement save_operators for bwd * fix bug * implement save tensors for bwd * . * bug fix * fix * align linear * fix * bwd kernel updates * undo use of CUBLAS_COMPUTE_32F_FAST_16F for now * only send dataset entry once * update peft test scripts * loss * . 
* update generate/request api to take both inference and fine-tuning prompts * linting * alignment fixes in lora & linear layer * alignment fix * diagonal * fix * alignment fix ssm * sigmoid-silu-multi now fully aligned * rms norm kernel updates * fix * in-place residual rms * bug fix and linting * align backward of o_proj, attn_heads, qk_prods_softmax, and v_proj with huggingface * cleanup * finished all alignment fixes in attention backward kernel * fix * Update inc_multihead_self_attention.cu * Update inc_multihead_self_attention.cu * use grad to store peft in/output (#1241) * use grad to store peft in/output * format * . * format * enable peft request * several hacks for performance measurement; some of the changes should be reverted * Update sigmoid_silu_multi.cu * RoPE backward * PEFT bug fixes and alignment (#1269) * Revert "several hacks for performance measurement; some of the changes should be reverted" This reverts commit b9c3926. * backup * backup * updates * update * backup * backup * backup * fix * cleanup * linting * Fuse bias + relu in OPT (#1271) * fuse bias and relu in opt * fix * fix * fix * fix * Peft alignment & debugging tools (#1288) * Revert "several hacks for performance measurement; some of the changes should be reverted" This reverts commit b9c3926. * backup * backup * updates * update * backup * backup * backup * fix * cleanup * fix * fix * fix * update * simplify tensor names * fix * fixes and updates * fixes * fix * cleanup * . 
* restore softmax * cleanup * update alignment scripts * newline * fix legion aliasing error * fix warnings * fix * fix pipeline parallelism * fix tp issue in combine op * fix lora weight loading with tensor parallelism * fixes, implement Combine::peft_bwd_task * fix * replicate peft bwd * fixes * fix * fix combine and fwd-bwd pass dependencies * fix replicate bwd * fix * let user control amount of peft memory * only run peft_bwd if peft is enabled * fix rms norm inference region reqs * fix in-place fusion (part 1) * fix inplace fusion (part 2) * fix * disable automatic inplace rms norm for now * fix inf fusion inplace * fix rest input grads for peft without inplace residuals * fix * fix * fix residual rms * fix * fix * enable inf debugging in fusion bwd * hack to silence warning in fused bwd * fix * fix * fix build * fix * fix * add draft peft test * Peft python interface (#1306) * update script * less model renaming * fix * fix * fix * backup * . * update * . * fixes * fix * fix build * fix * fix * fix issues for downloading peft model * solved issues for download peft model * added printouts for debugging * fix * fix seg fault * add test, separate peft script in cpp * fix * fixes * fix * update peft python interface * update * update * update * updates * fix * fixes * fix * fixes --------- Co-authored-by: april-yyt <[email protected]> * fix * update * fix * fix to support prompts larger than max tokens per batch * fixes to support benchmarking of finetuning throughput * many upgrades and updates related to finetuning * add ttft statistics * add warmup phase * add benchmarking code * Add scripts for evaluation with Microsoft Azure trace (#1363) * Add scripts for evaluation * Add absolute request rate value * Fix script for target arrival rate * Fix cpp req rate benchmark * update to use new dataset * Fix infinite loop * update * add data --------- Co-authored-by: Remi Delacourt <[email protected]> Co-authored-by: Gabriele Oliaro <[email protected]> * fix * fix * 
add peft tests to ci * shellcheck * fix * fix python requirements * fix * fix * update ci test * update alignment doc * fix cross entropy loss bug * update alignment test * update test * add llama peft alignment test to ci * Fix values for unused params in incr_decoding * Add PEFTModelID NO_ID singleton instead of None * Fix PEFTModelID::NO_ID reference * reduce logging * fix * fix * Add peft demo * Add readme for demo * fix alignment issue * Peft optimizer (#1290) * add optimizer config, only allocate weights for training * sgd 1 * sgd 2 * update * fix * linting * . * . * fix * fix allreduce bug * update * update * add optimizer hook in hf * update * update script * . * fix * fwd * bwd * start grads * fix gradient misalignment! * update * Add support for llama3 * various fixes --------- Co-authored-by: Remi Delacourt <[email protected]> * Optimizers python interface (#1441) * python interface for optimizer * update lora linear config to support python interface * update python interface * finished lora python interface * fix * fix * update * update * more fixes * fix * initialize lora weights where needed * Add notebook * Update demo to use dataset * Fix' * Save weights after end of finetuning (#1446) * support accumulation of gradients without update * add code to save peft weights * fix * save configs * cleanup * Fully use notebook for demo * Parameterize generation and finetuning configs * Comment out inference for now * fix bug in lora inference only mode * fix * Add finetuning or inference only flags * fix * fix * fix * PEFT model upload (#1450) * upload test * fix * Make demo_class.py executable * fix * add base_model_name_or_path * fix * fix * support llama-3 tokenizer * print output tokens when not benchmarking * Use Llama3 in demo_class * Use Llama3 in demo * fix data loading for llama-3 * Add download models to demo * return/print loss at each finetuning step * fix * Adjust demo parameters * Fix for finetuning * pass finetuning losses to python interface 
* Update demo * Fix upload * Refactor demo * rename demo_class to demo * fix * remove epoch from loss print * Finish demo * fix test * rocm fixes * more rocm fixes * fix rocm build * docker fix * fix inference test * fix workflow * fix makefile * fix peft test * fix all-reduce issue with lora for TP scenario * fix bwd lm head * fixes * more fixes * update * fix alignment up to input ln * finished aligning all backward (tp>1) * align all peft * fix * fix broken link * formatting * fix * update * Revert "update" This reverts commit 90b2c87. * update * fix hip build * fix gpu ci * fix gpu ci * update default gpu ci version to 12.0 * update ci to 12.0 * fix * fix * update * fix * fix * update * fix * add cleanup * downgrade to cuda=11.8 --------- Co-authored-by: Gabriele Oliaro <[email protected]> Co-authored-by: xinhaoc <[email protected]> Co-authored-by: Xinhao Cheng <[email protected]> Co-authored-by: april-yyt <[email protected]> Co-authored-by: Remi <[email protected]> Co-authored-by: Remi Delacourt <[email protected]> Co-authored-by: Rémi Delacourt <[email protected]>