Merge pull request #479 from TylunasLi/doc
C++: support directly loading HF models of the DeepSeek V2 Lite series
ztxz16 authored Jul 21, 2024
2 parents c1ad08e + 67eba11 commit 506fccf
Showing 12 changed files with 1,186 additions and 1,111 deletions.
24 changes: 13 additions & 11 deletions docs/models.md
@@ -68,20 +68,24 @@
| Qwen/Qwen2-7B-Instruct | [](#其它模型) | [](#qwen模型导出) ||
| Qwen/Qwen2-72B-Instruct | | [](#qwen模型导出) ||

-> 注3: 需要更新,检查 tokenizer_config.json 是否为最新版本
+> 注3: 需要更新,检查 `tokenizer_config.json` 是否为最新版本
### DeepSeek系列

| 模型 | 加载后转换 | 离线转换 | 直接读取 |
|-------------------------------------------: |------------|------------|------------|
-| deepseek-ai/Deepseek-Coder-1.3B-Instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
-| deepseek-ai/Deepseek-Coder-6.7B-Instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
-| deepseek-ai/Deepseek-Coder-7B-Instruct v1.5 | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
-| deepseek-ai/deepseek-coder-33b-instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ❌<sup>4</sup> |
-| deepseek-ai/DeepSeek-V2-Chat ||| √<sup>4</sup> |
-| deepseek-ai/DeepSeek-V2-Lite-Chat ||| √<sup>4</sup> |
-| deepseek-ai/DeepSeek-Coder-V2-Instruct ||| √<sup>4</sup> |
-| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct ||| √<sup>4</sup> |
+| deepseek-ai/Deepseek-Coder-1.3B-Instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ✔<sup>4</sup><sup>5</sup> |
+| deepseek-ai/Deepseek-Coder-6.7B-Instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ✔<sup>4</sup><sup>5</sup> |
+| deepseek-ai/Deepseek-Coder-7B-Instruct v1.5 | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ✔<sup>4</sup> |
+| deepseek-ai/deepseek-coder-33b-instruct | [](llama_cookbook.md#deepseek-coder) | [](llama_cookbook.md#deepseek-coder) | ✔<sup>4</sup> |
+| deepseek-ai/DeepSeek-V2-Chat ||| ✔ |
+| deepseek-ai/DeepSeek-V2-Lite-Chat ||| ✔ |
+| deepseek-ai/DeepSeek-Coder-V2-Instruct ||| ✔ |
+| deepseek-ai/DeepSeek-Coder-V2-Lite-Instruct ||| ✔ |

+> 注4: Python ftllm用AutoTokenizer而不使用Fastllm Tokenizer可以实现加载,但是C++程序尚不支持加载该模型的Tokenizer。
+> 注5: C++端仅支持最早的几个 `tokenizer_config.json` 版本

### LLaMA类模型

@@ -107,8 +111,6 @@
| meta-llama/Meta-Llama-3-8B-Instruct | | [](tools/scripts/llama3_to_flm.py) ||
| meta-llama/Meta-Llama-3-70B-Instruct | | [](tools/scripts/llama3_to_flm.py) ||

-> 注4: Python ftllm用AutoTokenizer而不使用Fastllm Tokenizer可以实现加载,但是C++程序尚不支持加载该模型的Tokenizer。
### 其它模型

| 模型 | 加载后转换 | 离线转换 | 直接读取 |
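The 直接读取 (direct read) column is the feature this PR delivers: the DeepSeek-V2 family can now be pointed at an unconverted Hugging Face folder. A minimal usage sketch from the C++ side follows. `CreateLLMModelFromHF` is the loader this commit touches in src/model.cpp, but the exact signature, the data-type argument, and the `Response` call are assumptions for illustration, not code from this repository.

```cpp
#include <cstdio>
#include "model.h"

// Hypothetical driver: load a local DeepSeek-V2-Lite-Chat HF checkout
// (config.json, tokenizer_config.json, *.safetensors) and run one reply.
int main() {
    auto model = fastllm::CreateLLMModelFromHF(
        "/models/DeepSeek-V2-Lite-Chat",   // assumed local path
        fastllm::DataType::FLOAT16);
    std::string reply = model->Response("你好", nullptr);
    std::printf("%s\n", reply.c_str());
    return 0;
}
```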
4 changes: 4 additions & 0 deletions example/Win32Demo/Win32Demo.vcxproj
@@ -104,6 +104,7 @@
<ConformanceMode>true</ConformanceMode>
<RuntimeLibrary>MultiThreadedDebug</RuntimeLibrary>
<AdditionalOptions>/arch:AVX /source-charset:utf-8 %(AdditionalOptions)</AdditionalOptions>
+<AdditionalIncludeDirectories>$(ProjectDir)..\..\third_party\json11;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
@@ -122,6 +123,7 @@
<ConformanceMode>true</ConformanceMode>
<RuntimeLibrary>MultiThreadedDebug</RuntimeLibrary>
<AdditionalOptions>/arch:AVX /source-charset:utf-8 %(AdditionalOptions)</AdditionalOptions>
+<AdditionalIncludeDirectories>$(ProjectDir)..\..\third_party\json11;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
@@ -141,6 +143,7 @@
<PreprocessorDefinitions>WIN32;NDEBUG;_CONSOLE;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
+<AdditionalIncludeDirectories>$(ProjectDir)..\..\third_party\json11;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
@@ -162,6 +165,7 @@
<PreprocessorDefinitions>NDEBUG;_CONSOLE;_CRT_SECURE_NO_WARNINGS;__AVX__;NOMINMAX;%(PreprocessorDefinitions)</PreprocessorDefinitions>
<ConformanceMode>true</ConformanceMode>
<RuntimeLibrary>MultiThreaded</RuntimeLibrary>
+<AdditionalIncludeDirectories>$(ProjectDir)..\..\third_party\json11;%(AdditionalIncludeDirectories)</AdditionalIncludeDirectories>
</ClCompile>
<Link>
<SubSystem>Console</SubSystem>
3 changes: 3 additions & 0 deletions example/Win32Demo/fastllm-gpu.vcxproj
@@ -237,6 +237,9 @@
<ClCompile Include="..\..\src\models\deepseekv2.cpp" />
<ClCompile Include="..\..\src\models\glm.cpp" />
<ClCompile Include="..\..\src\models\graphllm.cpp" />
+<ClCompile Include="..\..\src\models\graph\fastllmjson.cpp" />
+<ClCompile Include="..\..\src\models\graph\qwen2.cpp" />
+<ClCompile Include="..\..\src\models\graph\telechat.cpp" />
<ClCompile Include="..\..\src\models\internlm2.cpp" />
<ClCompile Include="..\..\src\models\llama.cpp" />
<ClCompile Include="..\..\src\models\minicpm.cpp" />
12 changes: 12 additions & 0 deletions example/Win32Demo/fastllm-gpu.vcxproj.filters
@@ -19,6 +19,9 @@
<Filter Include="源文件\models">
<UniqueIdentifier>{ad23d9cc-65a3-4c41-b87f-5826ce7a6dca}</UniqueIdentifier>
</Filter>
+<Filter Include="源文件\models\graph">
+<UniqueIdentifier>{8ea8c4cc-b75e-4c3a-a4e9-1e4900de6380}</UniqueIdentifier>
+</Filter>
<Filter Include="头文件\utils">
<UniqueIdentifier>{1da1956b-b0ea-4a15-9b4a-11a431319b7d}</UniqueIdentifier>
</Filter>
@@ -185,6 +188,15 @@
<ClCompile Include="..\..\src\models\qwen.cpp">
<Filter>源文件\models</Filter>
</ClCompile>
+<ClCompile Include="..\..\src\models\graph\fastllmjson.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
+<ClCompile Include="..\..\src\models\graph\qwen2.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
+<ClCompile Include="..\..\src\models\graph\telechat.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
<ClCompile Include="..\..\src\devices\cpu\cpudevice.cpp">
<Filter>源文件\devices\cpu</Filter>
</ClCompile>
3 changes: 3 additions & 0 deletions example/Win32Demo/fastllm.vcxproj
@@ -211,6 +211,9 @@
<ClCompile Include="..\..\src\models\deepseekv2.cpp" />
<ClCompile Include="..\..\src\models\glm.cpp" />
<ClCompile Include="..\..\src\models\graphllm.cpp" />
+<ClCompile Include="..\..\src\models\graph\fastllmjson.cpp" />
+<ClCompile Include="..\..\src\models\graph\qwen2.cpp" />
+<ClCompile Include="..\..\src\models\graph\telechat.cpp" />
<ClCompile Include="..\..\src\models\internlm2.cpp" />
<ClCompile Include="..\..\src\models\llama.cpp" />
<ClCompile Include="..\..\src\models\minicpm.cpp" />
12 changes: 12 additions & 0 deletions example/Win32Demo/fastllm.vcxproj.filters
@@ -19,6 +19,9 @@
<Filter Include="源文件\models">
<UniqueIdentifier>{ad23d9cc-65a3-4c41-b87f-5826ce7a6dca}</UniqueIdentifier>
</Filter>
+<Filter Include="源文件\models\graph">
+<UniqueIdentifier>{8ea8c4cc-b75e-4c3a-a4e9-1e4900de6380}</UniqueIdentifier>
+</Filter>
<Filter Include="头文件\utils">
<UniqueIdentifier>{1da1956b-b0ea-4a15-9b4a-11a431319b7d}</UniqueIdentifier>
</Filter>
@@ -179,6 +182,15 @@
<ClCompile Include="..\..\src\models\qwen.cpp">
<Filter>源文件\models</Filter>
</ClCompile>
+<ClCompile Include="..\..\src\models\graph\fastllmjson.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
+<ClCompile Include="..\..\src\models\graph\qwen2.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
+<ClCompile Include="..\..\src\models\graph\telechat.cpp">
+<Filter>源文件\models\graph</Filter>
+</ClCompile>
<ClCompile Include="..\..\src\devices\cpu\cpudevice.cpp">
<Filter>源文件\devices\cpu</Filter>
</ClCompile>
4 changes: 2 additions & 2 deletions include/devices/cuda/fastllm-cuda.cuh
@@ -1,12 +1,12 @@
#include "fastllm.h"

+std::vector <long long> FastllmCudaGetFreeSizes();
+
#ifdef __cplusplus
extern "C" {
#endif
void FastllmInitCublas(void);

-std::vector <long long> FastllmCudaGetFreeSizes();
-
void FastllmCudaMallocBigBuffer(size_t size);
void FastllmCudaClearBigBuffer();
void *FastllmCudaMalloc(size_t size);
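The hunk above moves the `FastllmCudaGetFreeSizes` declaration out of the `extern "C"` block while `FastllmInitCublas` and the allocator entry points stay inside. The likely motivation (my gloss; the diff itself does not say): a function returning a C++ class type such as `std::vector` is not representable in C, so declaring it with C linkage draws diagnostics like MSVC's C4190 and Clang's -Wreturn-type-c-linkage. A sketch of the rule, illustrative rather than repository code:

```cpp
#include <vector>

// Fine: ordinary C++ linkage, a std::vector return type is unproblematic.
std::vector<long long> GetFreeSizes();

extern "C" {
    // Fine: every type in this signature exists in C.
    void InitBackend(void);

    // Problematic if uncommented: std::vector cannot cross a C boundary,
    // so compilers flag the C-linkage declaration (e.g. MSVC C4190).
    // std::vector<long long> GetFreeSizes();
}
```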
11 changes: 8 additions & 3 deletions include/template.h
@@ -49,9 +49,9 @@ namespace fastllm {
// 词法分析后的Token
struct JinjaToken {
enum JinjaToKenType {
-JinjaTokenID = 0, JinjaTokenNUM, JinjaTokenSTRING, JinjaTokenDOT,
+JinjaTokenID = 0, JinjaTokenBOOL, JinjaTokenNUM, JinjaTokenSTRING, JinjaTokenDOT,
JinjaTokenLMB, JinjaTokenRMB, JinjaTokenLSB, JinjaTokenRSB,
-JinjaTokenSet, JinjaTokenFor, JinjaTokenEndFor, JinjaTokenIf, JinjaTokenElse, JinjaTokenEndif,
+JinjaTokenSet, JinjaTokenFor, JinjaTokenEndFor, JinjaTokenIf, JinjaTokenElse, JinjaTokenElseIf, JinjaTokenEndif,
JinjaTokenIn,
JinjaTokenAssign, JinjaTokenNotEqual, JinjaTokenEqual, JinjaTokenAdd, JinjaTokenSub, JinjaTokenMul, JinjaTokenDiv,
JinjaTokenNot, JinjaTokenAnd, JinjaTokenOr,
@@ -86,19 +86,24 @@
{"for", JinjaToken::JinjaToKenType::JinjaTokenFor},
{"endfor", JinjaToken::JinjaToKenType::JinjaTokenEndFor},
{"if", JinjaToken::JinjaToKenType::JinjaTokenIf},
{"elif", JinjaToken::JinjaToKenType::JinjaTokenElseIf},
{"else", JinjaToken::JinjaToKenType::JinjaTokenElse},
{"endif", JinjaToken::JinjaToKenType::JinjaTokenEndif},
{"set", JinjaToken::JinjaToKenType::JinjaTokenSet},
{"in", JinjaToken::JinjaToKenType::JinjaTokenIn},
{"is", JinjaToken::JinjaToKenType::JinjaTokenIn},
{"true", JinjaToken::JinjaToKenType::JinjaTokenBOOL},
{"false", JinjaToken::JinjaToKenType::JinjaTokenBOOL},
{"and", JinjaToken::JinjaToKenType::JinjaTokenAnd},
{"or", JinjaToken::JinjaToKenType::JinjaTokenOr},
{"not", JinjaToken::JinjaToKenType::JinjaTokenNot}
};

// 一个Jinja块
struct JinjaBlock {
enum JinjaBlockType {
JinjaBlockOriginal = 0, JinjaBlockEmpty, JinjaBlockVar, JinjaBlockFor,
-JinjaBlockEndFor, JinjaBlockIf, JinjaBlockElse, JinjaBlockEndIf,
+JinjaBlockEndFor, JinjaBlockIf, JinjaBlockElseIf, JinjaBlockElse, JinjaBlockEndIf,
JinjaBlockSet
};

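Taken together, the template.h changes teach the built-in Jinja2 lexer three things newer chat templates rely on: `elif` blocks, the boolean literals `true`/`false`, and `is`, which the keyword table simply maps to the same token as `in`. A fabricated template fragment of the kind that now tokenizes; the text is made up for this example, not copied from any model's `tokenizer_config.json`:

```cpp
#include <string>

// Chat-template text exercising the new tokens: elif, is, true.
const std::string kTemplate = R"TPL(
{% for message in messages %}
{% if message['role'] == 'system' %}[SYS] {{ message['content'] }}
{% elif message['role'] == 'user' %}User: {{ message['content'] }}
{% else %}Assistant: {{ message['content'] }}
{% endif %}
{% endfor %}
{% if add_generation_prompt is true %}Assistant:{% endif %}
)TPL";
```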
16 changes: 10 additions & 6 deletions src/devices/cuda/fastllm-cuda.cu
@@ -119,7 +119,7 @@ std::vector <long long> FastllmCudaGetFreeSizes() {
}
std::vector <long long> ret;

-// 遍历所有设备
+// 遍历所有设备
for (int i = 0; i < deviceCount; ++i) {
cudaDeviceProp prop;
error = cudaGetDeviceProperties(&prop, i);
@@ -128,7 +128,7 @@
// printf(" Compute capability: %d.%d\n", prop.major, prop.minor);
// printf(" Total global memory: %zu bytes\n", prop.totalGlobalMem);

-// 获取当前设备的显存使用情况
+// 获取当前设备的显存使用情况
size_t free = 0, total = 0;
cudaMemGetInfo(&free, &total);
ret.push_back(free);
@@ -447,8 +447,8 @@ __global__ void FastllmSiluKernel(float* a, float *b, int len) {
__global__ void FastllmSiluKernel(half* a, half *b, int len) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx < len) {
-float x = (float)a[idx];
-b[idx] = (half)(x / (1.0 + expf(-x)));
+float x = __half2float(a[idx]);
+b[idx] = __float2half(x / (1.0 + expf(-x)));
}
}

@@ -531,7 +531,11 @@ __global__ void FastllmMulToKernel(float* a, float *b, float alpha, int len) {
__global__ void FastllmMulToKernel(half* a, half *b, float alpha, int len) {
int idx = threadIdx.x + blockIdx.x * blockDim.x;
if (idx < len) {
+#ifdef CUDA_NO_TENSOR_CORE
+a[idx] = __float2half(__half2float(b[idx]) * alpha * __half2float(a[idx]));
+#else
a[idx] *= (half)((float)b[idx] * alpha);
+#endif
}
}

@@ -2904,14 +2908,14 @@ void FastllmCudaClearBigBuffer() {
auto &bigBuffers = it.second;
std::vector <CudaMemoryBuffer> temp;
long long littleMemSum = 0;
-long long littleMemSumLimit = 300 * 1024 * 1024; // 留一小部分复用
+long long littleMemSumLimit = 300 * 1024 * 1024; // 留一小部分复用
std::vector <std::pair <std::size_t, int > > v;
for (int i = 0; i < bigBuffers.size(); i++) {
if (!bigBuffers[i].busy) {
v.push_back(std::make_pair(bigBuffers[i].size, i));
}
}
-sort(v.begin(), v.end());
+std::sort(v.begin(), v.end());
std::set <int> littleMemIds;
for (int i = 0; i < v.size(); i++) {
littleMemSum += v[i].first;
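Two numeric changes in this file deserve a gloss. The half-precision `FastllmSiluKernel` now converts through the explicit `__half2float`/`__float2half` intrinsics instead of C-style casts, and `FastllmMulToKernel` gains a `CUDA_NO_TENSOR_CORE` path that performs the multiply in float, since targets built with that flag lack fast native half arithmetic. For reference, host-side sketches of what the two kernels compute per element; plain C++, with only the function names being mine:

```cpp
#include <cmath>

// Reference for FastllmSiluKernel: SiLU(x) = x * sigmoid(x) = x / (1 + e^-x).
// The half kernel widens each element to float, applies this formula, then
// narrows the result back to half.
float SiluRef(float x) {
    return x / (1.0f + std::exp(-x));
}

// Reference for FastllmMulToKernel: a[i] *= alpha * b[i]; the
// CUDA_NO_TENSOR_CORE branch keeps this product in float throughout.
void MulToRef(float *a, const float *b, float alpha, int len) {
    for (int i = 0; i < len; ++i) {
        a[i] *= alpha * b[i];
    }
}
```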
2 changes: 1 addition & 1 deletion src/model.cpp
@@ -449,7 +449,7 @@ namespace fastllm {
if (!model->weight.tokenizer.chatTemplate.empty() && model->weight.dicts.find("chat_template") == model->weight.dicts.end())
model->weight.AddDict("chat_template", model->weight.tokenizer.chatTemplate);
std::string tokenizerClass = tokenizerConfig["tokenizer_class"].string_value();
-if (tokenizerClass == "PreTrainedTokenizerFast"
+if (tokenizerClass == "PreTrainedTokenizerFast" || tokenizerClass == "LlamaTokenizerFast"
|| tokenizerClass == "Qwen2Tokenizer"
|| tokenizerClass == "BloomTokenizer") {
// PreTrainedTokenizerFast
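Judging by the commit title, this one-line change is what unlocks the DeepSeek-V2 family in C++: those checkpoints ship a `tokenizer_config.json` whose `tokenizer_class` is `LlamaTokenizerFast`, which now takes the same fast-tokenizer branch as `PreTrainedTokenizerFast`. A self-contained sketch of the check using json11, the library the Win32 projects above add to their include path; the file handling and function name are mine, not the repository's:

```cpp
#include <fstream>
#include <sstream>
#include <string>
#include "json11.hpp"   // vendored under third_party/json11

// Returns true when tokenizer_config.json names a tokenizer class that the
// C++ loader routes through the fast-tokenizer path after this commit.
bool UsesFastTokenizer(const std::string &configPath) {
    std::ifstream file(configPath);
    std::stringstream buffer;
    buffer << file.rdbuf();

    std::string err;
    json11::Json config = json11::Json::parse(buffer.str(), err);
    if (!err.empty()) {
        return false;  // missing or malformed JSON
    }
    std::string cls = config["tokenizer_class"].string_value();
    return cls == "PreTrainedTokenizerFast" || cls == "LlamaTokenizerFast"
        || cls == "Qwen2Tokenizer" || cls == "BloomTokenizer";
}
```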