Update README

Sengxian, 2 years ago
Commit
f6e869c2f7
1 file changed, 3 insertions(+), 1 deletion(-)
docs/quantization.md

@@ -15,7 +15,9 @@ python tools/convert_tp.py \
     --target-tp 4
 ```
 
-Finally, change the model config file from `configs/model_glm_130b.sh` to `configs/model_glm_130b_{int4/int8}.sh` in your scripts (e.g. `scripts/generate.sh`), then run your scripts just as normal.
+Finally, change the model config file from `configs/model_glm_130b.sh` to `configs/model_glm_130b_{int4/int8}.sh` in your scripts (e.g. `scripts/generate.sh`), then run your scripts just as normal.
+ 
+By default, the full-precision checkpoint is expected to be loaded. Running the conversion script with `--quantization-bit-width <4 or 8>` will produce quantized model weights. To load from a quantized checkpoint, add `--from-quantized-checkpoint` to your model config file.
 
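The steps described in the added paragraph can be sketched as a shell session. This is a minimal sketch based only on the flags shown in this diff (`--target-tp`, `--quantization-bit-width`, `--from-quantized-checkpoint`); any checkpoint paths and other arguments the conversion script may require are assumptions, not part of this change.

```shell
# Sketch: produce an INT4 checkpoint, then run generation with it.
# Only the flags shown in this diff are taken from the source; paths
# and any further required arguments are assumptions.

# 1. Convert the full-precision checkpoint, emitting quantized weights.
python tools/convert_tp.py \
    --target-tp 4 \
    --quantization-bit-width 4

# 2. In your script (e.g. scripts/generate.sh), switch the model config
#    from configs/model_glm_130b.sh to configs/model_glm_130b_int4.sh,
#    and make sure that config passes --from-quantized-checkpoint so the
#    quantized weights are loaded directly.
bash scripts/generate.sh
```

Without `--from-quantized-checkpoint`, the loader would still expect a full-precision checkpoint even when an `int4`/`int8` config is selected.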
 ## Evaluation Results