# AMCT DeepSeek-V3.2 Quantization Experiments
A model quantization sample from AMCT, the model compression toolkit that CANN provides for Ascend AI processors. Project address: https://gitcode.com/cann/amct

## Latest News

- [2025/12] DeepSeek-V3.2 now supports block-by-block quantization inference.

## Overview

The experimental directory contains typical model samples for LLM quantization and inference. It is framework-independent and uses advanced PTQ (post-training quantization) algorithms to bring the accuracy of the quantized model close to that of bf16.

## Directory Structure Description

```
├── docs                       # Documentation directory
|   ├── models                 # Model documentation directory
|   |   ├── deepseek-v3.2      # DeepSeek-V3.2 related documentation
|   |   └── ...
├── cores                      # Core algorithm directory
|   ├── calibrator             # Block-by-block quantization learning interface
|   ├── models                 # Qwen3-MoE model scripts and execution configurations
|   |   ├── deepseek-v3.2      # DeepSeek-V3.2 related model definitions
|   ├── quantization           # Quantization layer related definitions
|   ├── utils                  # Common interfaces
├── pp                         # Operator directory
|   ├── forward                # Multi-card serial inference
|   ├── run_pp_wiki.py         # Compute wikitext ppl
├── eval.py                    # wikitext accuracy calculation
├── extract_calib_data.py      # dump block-by-block data
├── main.py                    # Block-by-block learning
├── deploy.py                  # Generate quantization model
├── README.md
└── ...
```

## Usage Instructions

We provide corresponding scripts in `./scripts/`. Examples are as follows.

During the training phase, set `w_bits`, `a_bits`, `q_bits`, `k_bits`, and `v_bits` according to actual needs. For C8 training, make sure `cls` is set to `c8`; otherwise the training parameters of the MLA part will have no gradients. When training MoE, set `cls` to `bf16`.

During the testing phase, set `w_bits`, `a_bits`, `q_bits`, `k_bits`, and `v_bits` according to actual needs.
Also set `train_mode` to `mla`, `moe`, `block`, or `origin` to quantize only MLA, only MoE, both MLA and MoE, or nothing, respectively.

### Data Extraction

```bash
python3 extract_calib_data.py --model $model_path --output_dir $output_dir
```

### Training

#### Block-by-block C8 training

```bash
python ./main.py \
    --model $model_path \
    --w_bits 8 --a_bits 8 \
    --q_bits 8 --k_bits 8 --v_bits 8 \
    --cali_bsz 1 --epoch 25 --base_lr 1e-2 \
    --lwc --lac \
    --cls c8 \
    --output_dir $output_path --data_dir $data_path \
    --start_block_idx $start --end_block_idx $end --train_mode mla --dev 0
```

#### Expert-by-expert training

```bash
# Switch between A8W8 or A8W4 according to w_bits
python ./main.py \
    --model $model_path \
    --w_bits 8 --a_bits 8 \
    --q_bits 8 --k_bits 8 --v_bits 8 \
    --cali_bsz 1 --epoch 25 --base_lr 1e-2 \
    --lwc --lac \
    --cls bf16 \
    --output_dir $output_path --data_dir $data_path \
    --start_block_idx $start --end_block_idx $end --train_mode moe --dev 0
```

### Testing

```bash
python3 ./eval.py \
    --a_bits 8 \
    --w_bits 8 \
    --seq_len 4096 \
    --cls c8 \
    --model $model_path \
    --train_mode block \
    --output_dir $output_path \
    --wikitext_final_out $wikitext_out \
    --lac --lwc \
    --start_block_idx 0 --end_block_idx 61 \
    --mla_param_dir $mla_param_dir \
    --moe_param_dir $moe_param_dir
```

## Accuracy

Quantization model accuracy performance:

| Model | PPL |
| --- | --- |
| DeepSeek-V3.2-BF16 | 2.9987 |
| DeepSeek-V3.2-Exp-W8A8C8 | 3.0304 |
| DeepSeek-V3.2-Exp-W4A8C8 | 3.2320 |

## Main Function Parameter Description

### eval.py

- `group`: Divide all blocks into `group` groups and execute the groups in parallel.
- `begin`: Starting block index, usually 0.
- `end`: Ending block index, such as 60 in DeepSeek-V3.2.
- `args.seq_len`: Length of each text segment.
- `args.output_dir`: Output save path.
- `num_npus`: Number of NPU cards used; defaults to all NPU cards visible in the current window. The memory requirement per card is 64G.

### main.py

- `args.data_dir`: Save path for dumped data.
- `train_mode`: Select MLA or MoE training (`mla`/`moe`).
- `model_path`: Model file save path.
- `cls`: Can be `c8` or `bf16`.
Select `bf16` when training MoE and `c8` when training MLA.

### deploy.py

- `input_weight_path`: Path of the weights to be converted (FP8/BF16).
- `output_weight_path`: Save path for the converted weights.
- `quant_type`: Quantized weight type (currently supports `bfloat16`, `w8a8c16`, `w8a8c8`, `w4a8c16`, `w4a8c8`).
- `clip`: Whether clipping was done during training.
- `mla_param_path`: Save path of the MLA training results.
- `moe_param_path`: Save path of the MoE training results.
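
The pairing between `train_mode` and `cls` described for `main.py` (MLA training with `c8`, MoE training with `bf16`) is easy to get wrong when commands are edited by hand. The following is a minimal sketch of a launcher helper that enforces that pairing. The helper `build_main_args`, the mapping `CLS_FOR_TRAIN_MODE`, and the example paths are hypothetical and not part of AMCT; the flags and their values are copied from the training commands in this README.

```python
import shlex

# Pairing stated in this README: MLA training uses cls=c8, MoE training uses cls=bf16.
CLS_FOR_TRAIN_MODE = {"mla": "c8", "moe": "bf16"}


def build_main_args(train_mode, model_path, data_dir, output_dir,
                    start_block_idx, end_block_idx, w_bits=8, dev=0):
    """Build the argv list for main.py (hypothetical helper, not part of AMCT)."""
    if train_mode not in CLS_FOR_TRAIN_MODE:
        raise ValueError(
            f"training expects train_mode 'mla' or 'moe', got {train_mode!r}"
        )
    cls = CLS_FOR_TRAIN_MODE[train_mode]
    # Fixed values below mirror the example commands in this README.
    return [
        "python", "./main.py",
        "--model", model_path,
        "--w_bits", str(w_bits), "--a_bits", "8",
        "--q_bits", "8", "--k_bits", "8", "--v_bits", "8",
        "--cali_bsz", "1", "--epoch", "25", "--base_lr", "1e-2",
        "--lwc", "--lac",
        "--cls", cls,
        "--output_dir", output_dir, "--data_dir", data_dir,
        "--start_block_idx", str(start_block_idx),
        "--end_block_idx", str(end_block_idx),
        "--train_mode", train_mode, "--dev", str(dev),
    ]


if __name__ == "__main__":
    # Placeholder paths; replace them with real locations before running.
    args = build_main_args("moe", "/path/to/model", "/path/to/calib_data",
                           "/path/to/output", 0, 4, w_bits=4)
    # Print the command instead of running it, so it can be reviewed first.
    print(shlex.join(args))
```

Returning a list rather than a single string means the result can be passed directly to `subprocess.run` without shell quoting issues, while `shlex.join` gives a copy-pasteable form for review.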