AMCT DeepSeek-V3.2 Quantization Experiment

张建站

2026/6/6 5:44:07

model quantization sample

amct: AMCT is CANN's model compression toolkit repository, tailored to Ascend AI processors. Project address: https://gitcode.com/cann/amct

Latest News

[2025/12] DeepSeek-V3.2 now supports block-by-block quantization inference.

Overview

The experimental directory contains typical model samples for LLM quantization and inference. The samples are framework-independent and use advanced post-training quantization (PTQ) algorithms to bring quantized-model accuracy close to that of bf16.

Directory Structure Description

    ├── docs                      # Documentation directory
    │   ├── models                # Model documentation directory
    │   │   ├── deepseek-v3.2     # DeepSeek-V3.2 related documentation
    │   │   └── ...
    ├── cores                     # Core algorithm directory
    │   ├── calibrator            # Block-by-block quantization learning interface
    │   ├── models                # Qwen3-MoE model scripts and execution configurations
    │   │   ├── deepseek-v3.2     # DeepSeek-V3.2 related model definitions
    │   ├── quantization          # Quantization layer related definitions
    │   ├── utils                 # Common interfaces
    ├── pp                        # Operator directory
    │   ├── forward               # Multi-card serial inference
    │   ├── run_pp_wiki.py        # Compute wikitext ppl
    ├── eval.py                   # wikitext accuracy calculation
    ├── extract_calib_data.py     # Dump block-by-block data
    ├── main.py                   # Block-by-block learning
    ├── deploy.py                 # Generate the quantized model
    ├── README.md
    └── ...

Usage Instructions

The corresponding scripts are provided in ./scripts/. Examples follow.

- Training phase: set w_bits, a_bits, q_bits, k_bits, and v_bits as needed. For C8 training, make sure cls is set to c8; otherwise the MLA training parameters will have no gradients. When training MoE, set cls to bf16.
- Testing phase: set w_bits, a_bits, q_bits, k_bits, and v_bits as needed.
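The training-phase rules above can be captured in a small command builder. The sketch below is a hypothetical Python helper (build_main_cmd is not part of the AMCT repository); every flag it emits is copied from the main.py sample commands in this README, and the fixed values for a_bits, q_bits, k_bits, v_bits, epoch, and base_lr are simply the sample values. It encodes the cls rule: c8 for MLA (C8) training, bf16 for MoE training.

```python
import shlex
from typing import List


def build_main_cmd(train_mode: str, w_bits: int, model_path: str,
                   data_dir: str, output_dir: str,
                   start: int, end: int, dev: int = 0) -> List[str]:
    """Hypothetical helper: build a main.py command from the README samples."""
    if train_mode not in ("mla", "moe"):
        raise ValueError("main.py trains either 'mla' or 'moe'")
    if w_bits not in (4, 8):
        # The README mentions A8W8 and A8W4 only.
        raise ValueError("w_bits is expected to be 8 (A8W8) or 4 (A8W4)")
    # cls must be c8 for MLA (C8) training, otherwise the MLA parameters
    # receive no gradients; MoE training uses bf16.
    cls = "c8" if train_mode == "mla" else "bf16"
    return [
        "python", "./main.py",
        "--model", model_path,
        "--w_bits", str(w_bits), "--a_bits", "8",
        "--q_bits", "8", "--k_bits", "8", "--v_bits", "8",
        "--cali_bsz", "1", "--epoch", "25", "--base_lr", "1e-2",
        "--lwc", "--lac",
        "--cls", cls,
        "--output_dir", output_dir, "--data_dir", data_dir,
        "--start_block_idx", str(start), "--end_block_idx", str(end),
        "--train_mode", train_mode, "--dev", str(dev),
    ]


if __name__ == "__main__":
    # Example: A8W4 MoE training for the block range 0 to 3 on device 0.
    # All paths are placeholders.
    cmd = build_main_cmd("moe", 4, "/path/to/model", "/path/to/calib",
                         "/path/to/output", start=0, end=3)
    print(shlex.join(cmd))
```

Passing the returned list to subprocess.run(cmd, check=True) from the repository root would launch the run. Whether start and end bound the block range inclusively or exclusively should be checked against main.py itself.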
For testing, also set train_mode, which takes one of mla, moe, block, or origin: quantize only MLA, quantize only MoE, quantize both MLA and MoE, or no quantization, respectively.

Data Extraction

    python3 extract_calib_data.py --model $model_path --output_dir $output_dir

Training

Block-by-block C8 training (MLA):

    python ./main.py \
        --model $model_path \
        --w_bits 8 --a_bits 8 \
        --q_bits 8 --k_bits 8 --v_bits 8 \
        --cali_bsz 1 --epoch 25 --base_lr 1e-2 \
        --lwc --lac \
        --cls c8 \
        --output_dir $output_path --data_dir $data_path \
        --start_block_idx $start --end_block_idx $end --train_mode mla --dev 0

Expert-by-expert training (MoE):

    # Switch between A8W8 and A8W4 via w_bits
    python ./main.py \
        --model $model_path \
        --w_bits 8 --a_bits 8 \
        --q_bits 8 --k_bits 8 --v_bits 8 \
        --cali_bsz 1 --epoch 25 --base_lr 1e-2 \
        --lwc --lac \
        --cls bf16 \
        --output_dir $output_path --data_dir $data_path \
        --start_block_idx $start --end_block_idx $end --train_mode moe --dev 0

Testing

    python3 ./eval.py \
        --a_bits 8 \
        --w_bits 8 \
        --seq_len 4096 \
        --cls c8 \
        --model $model_path \
        --train_mode block \
        --output_dir $output_path \
        --wikitext_final_out $wikitext_out \
        --lac --lwc \
        --start_block_idx 0 --end_block_idx 61 \
        --mla_param_dir $mla_param_dir \
        --moe_param_dir $moe_param_dir

Accuracy

Quantized model accuracy (wikitext PPL, lower is better):

    Model                        PPL
    DeepSeek-V3.2-BF16           2.9987
    DeepSeek-V3.2-Exp-W8A8C8     3.0304
    DeepSeek-V3.2-Exp-W4A8C8     3.2320

Main Function Parameter Description

eval.py
- group: divides all blocks into group groups, which are executed in parallel.
- begin: starting block index, usually 0.
- end: ending block index, e.g. 60 for DeepSeek-V3.2.
- args.seq_len: length of each text segment.
- args.output_dir: output save path.
- num_npus: number of NPU cards to use; defaults to all NPU cards visible in the current session. Each card requires 64 GB of memory.

main.py
- args.data_dir: path where the dumped data is saved.
- train_mode: selects mla or moe training.
- model_path: path where the model files are saved.
- cls: c8 or bf16.
  Use bf16 when training MoE and c8 when training MLA.

deploy.py
- input_weight_path: path of the weights to convert (FP8/BF16).
- output_weight_path: save path for the converted weights.
- quant_type: quantized weight type; currently supports bfloat16, w8a8c16, w8a8c8, w4a8c16, and w4a8c8.
- clip: whether clipping was applied during training.
- mla_param_path: save path of the MLA training results.
- moe_param_path: save path of the MoE training results.

Creation statement: Parts of this article were generated with AI assistance (AIGC) and are for reference only.
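The quant_type names accepted by deploy.py follow a wXaYcZ pattern. The sketch below parses them under an assumption this README does not state outright: the number after w is the weight bit width, after a the activation bit width, and after c the KV-cache bit width, which is consistent with the C8 naming and the k_bits/v_bits flags used in training. parse_quant_type and QuantType are hypothetical names for illustration, not AMCT APIs.

```python
import re
from typing import NamedTuple, Optional

# Values listed for deploy.py's quant_type in this README.
SUPPORTED = ("bfloat16", "w8a8c16", "w8a8c8", "w4a8c16", "w4a8c8")


class QuantType(NamedTuple):
    w_bits: Optional[int]  # weight bit width; None means unquantized bfloat16
    a_bits: Optional[int]  # activation bit width
    c_bits: Optional[int]  # assumption: KV-cache bit width (c8 = 8-bit, c16 = 16-bit)


def parse_quant_type(name: str) -> QuantType:
    """Hypothetical parser for deploy.py quant_type strings."""
    if name not in SUPPORTED:
        raise ValueError(f"unsupported quant_type: {name!r}")
    if name == "bfloat16":
        return QuantType(None, None, None)
    match = re.fullmatch(r"w(\d+)a(\d+)c(\d+)", name)
    assert match is not None  # every non-bf16 entry in SUPPORTED matches
    w, a, c = (int(g) for g in match.groups())
    return QuantType(w, a, c)


for qt in SUPPORTED:
    print(qt, parse_quant_type(qt))
```

Rejecting anything outside the supported list keeps the helper aligned with what deploy.py is documented to accept, rather than guessing at other bit-width combinations.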

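To put the accuracy table in perspective, the relative PPL increase over the BF16 baseline can be computed directly from the reported numbers. The sketch below uses the standard perplexity definition (exp of the mean per-token negative log-likelihood); it is not taken from eval.py or run_pp_wiki.py, which may differ in details such as how text is split into seq_len segments.

```python
import math
from typing import Sequence


def perplexity(token_nlls: Sequence[float]) -> float:
    """Standard definition: PPL = exp(mean negative log-likelihood per token)."""
    return math.exp(sum(token_nlls) / len(token_nlls))


def relative_increase(ppl: float, baseline: float) -> float:
    """Fractional PPL increase of a quantized model over the baseline."""
    return (ppl - baseline) / baseline


BF16_PPL = 2.9987  # DeepSeek-V3.2-BF16 from the accuracy table
for name, ppl in [("W8A8C8", 3.0304), ("W4A8C8", 3.2320)]:
    print(f"{name}: +{relative_increase(ppl, BF16_PPL):.2%} vs BF16")
# W8A8C8: +1.06% vs BF16
# W4A8C8: +7.78% vs BF16
```

By this measure W8A8C8 stays within about 1% of the BF16 perplexity, while W4A8C8 gives up roughly 8%.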