
Update inference-with-fastertransformer.md

papersnake 2 years ago
parent
commit
62db1c9119
1 changed file with 2 additions and 0 deletions

+2 -0
docs/inference-with-fastertransformer.md

@@ -8,6 +8,8 @@ We adapted the GLM-130B based on Fastertransformer for fast inference, with deta
 
 See [Get Model](/README.md#environment-setup).
 
+To run in int4 or int8 mode, please run [convert_tp.py](/tools/convert_tp.py) to generate the quantized ckpt.
+
 ## Recommend: Run With Docker
 
 Use Docker to quickly build a Flask API application for GLM-130B.
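For reference, the line added by this commit points at [convert_tp.py](/tools/convert_tp.py). Below is a minimal sketch of how such a checkpoint conversion might be invoked; the flag names (`--input-folder`, `--output-folder`, `--quantization-bit-width`) and the paths are assumptions for illustration and should be checked against the script's actual argument parser before use.

```bash
# Hypothetical invocation of tools/convert_tp.py to produce a quantized checkpoint.
# The flag names and paths below are assumptions; consult the script's argument
# parser (e.g. its --help output) for the real options before running.
python tools/convert_tp.py \
    --input-folder /path/to/glm-130b-sat \
    --output-folder /path/to/glm-130b-sat-int8 \
    --quantization-bit-width 8   # use 4 here for int4 mode
```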