@@ -4,7 +4,55 @@
 
We adapted GLM-130B to FasterTransformer for fast inference; see the [benchmark](#benchmark) section for details.
 
-## Setup
+## Download the Model
+
+See [Get Model](/README.md#environment-setup).
+
+## Recommended: Run With Docker
+
+Use Docker to quickly build a Flask API application for GLM-130B.
+
+### Requirements
+
+- [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html)
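+
+To check that the toolkit can expose GPUs to containers, you can run the standard smoke test (the CUDA image tag here is only an example):
+
+```bash
+docker run --rm --gpus all nvidia/cuda:12.1.0-base-ubuntu22.04 nvidia-smi
+```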
+
+### Build Container Image
+
+```bash
+git clone https://github.com/THUDM/FasterTransformer.git
+cd FasterTransformer
+bash docker/build.sh
+```
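+
+The run command below assumes the build produced an image tagged `ftglm:latest`; you can verify that it exists with:
+
+```bash
+docker images ftglm
+```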
+
+### Run API With Checkpoints
+
+Set `MPSIZE` to the number of GPUs required by the checkpoints, and `DATA_TYPE` to the precision of the checkpoints.
+
+If the checkpoints are present, `MPSIZE` can be identified from them automatically.
+
+```bash
+docker run -it --rm --gpus all --shm-size=10g -p 5000:5000 \
+ -v <your path to checkpoints>/49300:/checkpoints:ro \
+ -e MPSIZE=8 -e DATA_TYPE=int4 \
+ ftglm:latest
+```
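+
+The container publishes the Flask API on port 5000. As a minimal smoke test once it is up (the endpoint path and JSON fields below are assumptions; see `examples/pytorch/glm/glm_server_test.py` for the actual request format):
+
+```bash
+# Hypothetical request; the route and field names may differ from the real API.
+curl -X POST http://localhost:5000/generate \
+  -H 'Content-Type: application/json' \
+  -d '{"contexts": ["GLM-130B is"], "max_length": 32}'
+```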
+
+### Test
+
+#### Benchmark
+
+```bash
+python3 examples/pytorch/glm/glm_server_test.py
+```
+
+#### Web Demo
+
+```bash
+pip install gradio
+python3 examples/pytorch/glm/glm_server_frontend_test.py
+```
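+
+Gradio listens on port 7860 by default; if you run the demo inside the container, publish that port too (e.g. add `-p 7860:7860` to `docker run`). Check `glm_server_frontend_test.py` in case the script overrides the default.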
+
+## Manual Configuration
 
### Requirements
 
@@ -16,10 +64,8 @@ We adapted the GLM-130B based on Fastertransformer for fast inference, with deta
 
### Setup Using Docker
 
-We recommend use nvcr image like `nvcr.io/nvidia/pytorch:21.09-py3` with [NVIDIA Container Toolkit](https://docs.nvidia.com/datacenter/cloud-native/container-toolkit/install-guide.html).
-
```bash
-docker run -it --rm --gpus all nvcr.io/nvidia/pytorch:21.09-py3 /bin/bash
+docker run -it --rm --gpus all nvcr.io/nvidia/pytorch:22.09-py3 /bin/bash
conda install -y pybind11
```
 
@@ -65,10 +111,6 @@ cmake -DSM=80 -DCMAKE_BUILD_TYPE=Release -DBUILD_PYT=ON -DBUILD_MULTI_GPU=ON ..
make -j
```
 
-### Download the Model
-
-See [Get Model](/README.md#environment-setup).
-
### Run GLM-130B
 
Generate the `gemm_config.in` file.
@@ -78,10 +120,10 @@ Generate the `gemm_config.in` file.
./bin/gpt_gemm 1 1 128 96 128 49152 150528 1 8
```
 
-Running GLM_130B in Pytorch.
+Run GLM-130B with PyTorch and Flask.
 
```bash
-bash ../examples/pytorch/glm/benchmark-generation.sh
+bash ../examples/pytorch/glm/glm-server.sh
```
 
You need to check and edit this file to set arguments such as `CHECKPOINT_PATH`.