Comment
Author: Admin | 2025-04-28
Description

CPU: Intel(R) Xeon(R) Silver 4210 CPU @ 2.20GHz
GPU: 2x Tesla V100-PCIE-32GB
Command: `ktransformers --port 10002 --gguf_path /models/ --model_path /models/DeepSeek-R1/`

When running the Docker version, I got `GLIBCXX_3.4.30 not found`. I worked around that by switching to the avx512 build from the Releases page, but then a CUDA Error appeared. Adding `--optimize_config_path myrule/DeepSeek-V3-Chat-multi-gpu.yaml` did not fix it. Following the README and issue #44 ("CUDA error: No kernel image is available for execution on the device"), I modified the configs and recompiled (`bash install.sh`), but that did not help either. (I modified both the DeepSeek-V3-Chat and DeepSeek-V3-Chat-multi-gpu configs.)

The content of myrule/DeepSeek-V3-Chat-multi-gpu.yaml is as follows:

```yaml
- match:
    name: "^model.embed_tokens"
  replace:
    class: "default"
    kwargs:
      generate_device: "cpu"
      prefill_device: "cpu"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3456][0-9])\\."
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3RotaryEmbedding
  replace:
    class: ktransformers.operators.RoPE.YarnRotaryEmbeddingV3
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\.(?!self_attn\\.kv_b_proj).*$" # regular expression
    class: torch.nn.Linear # only match modules matching name and class simultaneously
  replace:
    class: ktransformers.operators.linear.KTransformersLinear # optimized kernel on quantized data types
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"
      generate_op: "KLinearTorch"
      prefill_op: "KLinearTorch"

- match:
    name: "^model\\.layers\\.([3456][0-9])\\.(?!self_attn\\.kv_b_proj).*$" # regular expression
    class: torch.nn.Linear # only match modules matching name and class simultaneously
  replace:
    class: ktransformers.operators.linear.KTransformersLinear # optimized kernel on quantized data types
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
      generate_op: "KLinearTorch"
      prefill_op: "KLinearTorch"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE # mlp module with custom forward function
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3456][0-9])\\.mlp$"
    class: ktransformers.models.modeling_deepseek_v3.DeepseekV3MoE
  replace:
    class: ktransformers.operators.experts.KDeepseekV3MoE # mlp module with custom forward function
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3456][0-9])\\.mlp\\.gate$"
    class: ktransformers.models.modeling_deepseek_v3.MoEGate
  replace:
    class: ktransformers.operators.gate.KMoEGate
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts # custom MoE kernel with expert parallelism
    kwargs:
      prefill_device: "cuda:0"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:0"
  recursive: False # don't recursively inject submodules of this module

- match:
    name: "^model\\.layers\\.([3456][0-9])\\.mlp\\.experts$"
  replace:
    class: ktransformers.operators.experts.KTransformersExperts # custom MoE kernel with expert parallelism
    kwargs:
      prefill_device: "cuda:1"
      prefill_op: "KExpertsTorch"
      generate_device: "cpu"
      generate_op: "KExpertsCPU"
      out_device: "cuda:1"
  recursive: False # don't recursively inject submodules of this module

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention # optimized MLA implementation
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "^model\\.layers\\.([3456][0-9])\\.self_attn$"
  replace:
    class: ktransformers.operators.attention.KDeepseekV2Attention # optimized MLA implementation
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"

- match:
    name: "^model$"
  replace:
    class: "ktransformers.operators.models.KDeepseekV2Model"
    kwargs:
      per_layer_prefill_intput_threshold: 0 # 0 is close layer wise prefill
      transfer_map:
        30: "cuda:1"

- match:
    name: "^model\\.layers\\.(0|[1-9]|[12][0-9])\\."
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:0"
      prefill_device: "cuda:0"

- match:
    name: "(^model\\.layers\\.([3456][0-9])\\.)|(model.norm)|(lm_head)"
  replace:
    class: "default"
    kwargs:
      generate_device: "cuda:1"
      prefill_device: "cuda:1"
```

I have already looked at issues #117 and #44. How can this be solved? Many thanks.
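For reference, here is a quick sanity check of the two layer-range patterns used in the YAML above. This is only my own sketch (it assumes nothing beyond Python's standard re module and that DeepSeek-R1/V3 has 61 decoder layers, 0-60); it confirms each layer index is claimed by exactly one of the two GPU groups:

```python
import re

# The two layer-range patterns from the YAML above (unescaped from YAML quoting).
gpu0 = re.compile(r"^model\.layers\.(0|[1-9]|[12][0-9])\.")  # intended for cuda:0
gpu1 = re.compile(r"^model\.layers\.([3456][0-9])\.")        # intended for cuda:1

# DeepSeek-V3/R1 has 61 decoder layers (0-60): every index should match
# exactly one of the two patterns, never both and never neither.
for i in range(61):
    name = f"model.layers.{i}.self_attn"
    hits = (bool(gpu0.match(name)), bool(gpu1.match(name)))
    assert hits in ((True, False), (False, True)), f"layer {i}: {hits}"

print("ok: layers 0-29 -> cuda:0, layers 30-60 -> cuda:1")
```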
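Also for reference, since the error is "No kernel image is available for execution on the device" on V100s (compute capability 7.0), this is the kind of minimal diagnostic I would run to see which architectures the installed torch build actually contains. Again just a sketch; the idea that sm_70 support and TORCH_CUDA_ARCH_LIST matter for the rebuild is an assumption on my part, not something confirmed in the issues referenced above:

```python
import torch

# Report what this torch build was compiled for versus what each GPU actually is.
# "No kernel image is available for execution on the device" usually means the
# binary contains no code for the device's compute capability.
print("torch:", torch.__version__, "| CUDA:", torch.version.cuda)
print("arches compiled into torch:", torch.cuda.get_arch_list())

for i in range(torch.cuda.device_count()):
    major, minor = torch.cuda.get_device_capability(i)
    print(f"cuda:{i} {torch.cuda.get_device_name(i)} -> sm_{major}{minor}")

# If sm_70 is absent, one thing to try (assumption, not verified here) is rebuilding
# with the V100 architecture included before `bash install.sh`, e.g.:
#   export TORCH_CUDA_ARCH_LIST="7.0"
```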