
Update inference-with-fastertransformer.md

papersnake, 2 years ago
Commit
ef029153bc
1 changed file with 52 additions and 10 deletions
      docs/inference-with-fastertransformer.md

+ 52 - 10
docs/inference-with-fastertransformer.md

@@ -4,7 +4,55 @@
 
 We adapted GLM-130B for fast inference based on FasterTransformer; see the [benchmark](#benchmark) section for details.
 
-## Setup
+## Download the Model
+
+See [Get Model](/README.md#environment-setup).
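+
+The extracted checkpoints live in a directory such as `.../49300` (as mounted in the `docker run` command below); note its full path.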
+
+## Recommended: Run With Docker
+
+Use Docker to quickly build a Flask API application for GLM-130B.
+
+### Requirements
+
+- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
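+
+To verify the toolkit is working, a quick smoke test (the CUDA image tag here is only an example):
+
+```bash
+docker run --rm --gpus all nvidia/cuda:11.8.0-base-ubuntu22.04 nvidia-smi
+```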
+
+### Build Container Image
+
+```bash
+git clone https://github.com/THUDM/FasterTransformer.git
+cd FasterTransformer
+bash docker/build.sh
+```
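+
+The build script produces the image referenced below as `ftglm:latest`.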
+
+### Run API With Checkpoints
+
+Set `MPSIZE` to the number of GPUs required by the checkpoints, and `DATA_TYPE` to the precision of the checkpoints.
+
+If the checkpoints already exist at the mounted path, `MPSIZE` can be identified automatically.
+
+```bash
+docker run -it --rm --gpus all --shm-size=10g -p 5000:5000 \
+           -v <your path to checkpoints>/49300:/checkpoints:ro \
+           -e MPSIZE=8 -e DATA_TYPE=int4 \
+           ftglm:latest
+```
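+
+Once the container is up, the Flask API listens on port 5000. Below is a minimal smoke test; the endpoint path and JSON fields are assumptions, so consult `examples/pytorch/glm/glm_server_test.py` for the actual request format.
+
+```bash
+# Hypothetical request; the endpoint and payload schema are assumptions,
+# see examples/pytorch/glm/glm_server_test.py for the real ones.
+curl -X POST http://localhost:5000/generate \
+     -H 'Content-Type: application/json' \
+     -d '{"contexts": ["Who is the greatest artist?"], "max_length": 64}'
+```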
+
+### Test
+
+#### Benchmark
+
+```bash
+python3 examples/pytorch/glm/glm_server_test.py
+```
+
+#### Web Demo
+
+```bash
+pip install gradio
+python3 examples/pytorch/glm/glm_server_frontend_test.py
+```
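+
+Gradio serves the frontend on its default port, 7860; publish it (e.g. `-p 7860:7860`) if the demo runs inside the container.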
+
+## Manual Configuration
 
 ### Requirements
 
@@ -16,10 +64,8 @@ We adapted the GLM-130B based on Fastertransformer for fast inference, with deta
 
 ### Setup Using Docker
 
-We recommend use nvcr image like `nvcr.io/nvidia/pytorch:21.09-py3` with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
-
 ```bash
-docker run -it --rm --gpus all nvcr.io/nvidia/pytorch:21.09-py3 /bin/bash
+docker run -it --rm --gpus all nvcr.io/nvidia/pytorch:22.09-py3 /bin/bash
 conda install -y pybind11
 ```
 
@@ -65,10 +111,6 @@ cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
 make -j
 ```
 
-### Download the Model
-
-See [Get Model](/README.md#environment-setup).
-
 ### Run GLM-130B
 
 Generate the `gemm_config.in` file.
@@ -78,10 +120,10 @@ Generate the `gemm_config.in` file.
 ./bin/gpt_gemm 1 1 128 96 128 49152 150528 1 8
 ```
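
 Per FasterTransformer's `gpt_gemm` usage, the arguments are batch size, beam width, max input length, head number, size per head, inter size, vocab size, data type (`1` usually means FP16), and tensor-parallel size; the trailing `8` should match the number of GPUs (`MPSIZE`).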
 
-Running GLM_130B in Pytorch.
+Run GLM-130B with PyTorch and Flask.
 
 ```bash
-bash ../examples/pytorch/glm/benchmark-generation.sh
+bash ../examples/pytorch/glm/glm-server.sh
 ```
 
 You need to check and edit this file to set arguments such as `CHECKPOINT_PATH`.