diff --git a/README.md b/README.md
index 74648f4..08b30d1 100644
--- a/README.md
+++ b/README.md
@@ -1,11 +1,6 @@
 Colab one-click training script
 https://colab.research.google.com/drive/1MfP3vt9YrOkjg70dKPPFvB174PBnaPSB?usp=sharing
 
-unsloth local installation package download
-Baidu Netdisk: https://pan.baidu.com/s/17XehOXC2LMbnLnVebV79lQ?pwd=rycn
-Google Drive: https://drive.google.com/drive/folders/1BhhBWfOSqCqhmpi8M_dq-nn0eMEZxR-I?usp=sharing
-Trained model download: https://drive.google.com/file/d/1REtJuRGg2dzRLZ8HyEqfJn8oYuClht8P/view?usp=sharing
-
 Related projects
 unsloth: https://github.com/unslothai/unsloth
 gpt4all: https://gpt4all.io/
@@ -18,15 +13,6 @@ Windows local deployment requirements
 3. Required software: CUDA 12.1 + cuDNN 8.9, Python 3.11.9, Git, Visual Studio 2022, LLVM (optional)
 4. A Hugging Face account, for uploading the training dataset
 
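-Windows deployment steps
-I. Download the installation packages
-1. Install CUDA 12.1 and configure cuDNN 8.9
-2. Install Visual Studio 2022
-3. Extract unsloth
-4. Install Python 3.11
-5. Install Git
-6. Set the LLVM system environment variable (optional)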
-
 II. Install unsloth
 1. Create a virtual environment with Python 3.11
 python311\python.exe -m venv venv
@@ -46,25 +32,43 @@ python -m bitsandbytes
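 5. Run the scripts
 test-unlora.py   test inference before fine-tuning
 fine-tuning.py   fine-tune with the dataset
-test-lora.py   test inference after fine-tuning
-save-16bit.py  merge and save the model in 16-bit
-save-gguf-4bit.py  4-bit quantization to GGUF format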
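 If running fine-tuning.py locally fails because gcc.exe cannot compile, try downloading and extracting llvm-windows-x64.zip, then add the bin directory under llvm to the system PATH environment variable.
 III. 4-bit quantization requires llama.cpp. Install it as follows: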
-1. git clone https://github.com/ggerganov/llama.cpp
-2. Build it following the official documentation
-mkdir build
-cd build
-cmake .. -DLLAMA_CUBLAS=ON
-3. Add the CMake path bundled with Visual Studio 2022 to the system PATH environment variable
-C:\Program Files\Microsoft Visual Studio\2022\Professional\Common7\IDE\CommonExtensions\Microsoft\CMake\CMake\bin
-C:\Program Files\Microsoft Visual Studio\2022\Professional
-4. Build llama.cpp
-cmake --build . --config Release
-5. If the build command above fails to run, do the following:
-Copy the 4 files located in
-C:\Program Files\NVIDIA GPU Computing Toolkit\CUDA\v12.1\extras\visual_studio_integration\MSBuildExtensions
-and paste them into the following directory
-C:\Program Files (x86)\Microsoft Visual Studio\2022\BuildTools\MSBuild\Microsoft\VC\v170\BuildCustomizations
-6. After the build finishes, copy all files from llama.cpp\build\bin\release into the llama.cpp directory
-7. Re-run fine-tuning.py to fine-tune and save as
\ No newline at end of file
+git clone https://github.com/ggerganov/llama.cpp
+cd llama.cpp
+make GGML_CUDA=1  # on Linux without a GPU, run plain `make` instead
+
+# obtain the official LLaMA model weights and place them in ./models
+ls ./models
+llama-2-7b tokenizer_checklist.chk tokenizer.model
+# [Optional] for models using BPE tokenizers
+ls ./models
+<folder containing weights and tokenizer json> vocab.json
+# [Optional] for PyTorch .bin models like Mistral-7B
+ls ./models
+<folder containing weights and tokenizer json>
+
+# install Python dependencies
+python3 -m pip install -r requirements.txt
+
+# convert the model to FP16 GGUF format (run from ./llama.cpp)
+python convert-hf-to-gguf.py ../outputs --outfile ./mymodel/ggml-model-f16.gguf --outtype f16
+
+# quantize to 4 bits (using Q4_K_M method; run from ./llama.cpp)
+./llama-quantize ./mymodel/ggml-model-f16.gguf ./mymodel/ggml-model-Q4_K_M.gguf Q4_K_M
+
+# update the gguf filetype to current version if older version is now unsupported
+./llama-quantize ./mymodel/ggml-model-Q4_K_M.gguf ./mymodel/ggml-model-Q4_K_M-v2.gguf COPY
+
+# run the model directly
+./llama-cli -m ./mymodel/ggml-model-Q4_K_M.gguf -n 128
+
+Interactive mode:
+# default arguments using a 7B model
+./examples/chat.sh
+
+# advanced chat with a 13B model
+./examples/chat-13B.sh
+
+# custom arguments using a 13B model
+./llama-cli -m ./models/13B/ggml-model-q4_0.gguf -n 256 --repeat_penalty 1.0 --color -i -r "User:" -f prompts/chat-with-bob.txt