@@ -1,7 +1,7 @@
<img src="resources/7D6433A42D189E2E6FBC62BE066BCE91.png">
<p align="center">
- 🌐 <a href="http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/" target="_blank">Blog</a> • ⏬ <a href="https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform" target="_blank">Download Model</a> • 🪧 <a href="https://huggingface.co/spaces/THUDM/GLM-130B" target="_blank">Demo</a> • ✉️ <a href="mailto:glm-130b@googlegroups.com">Email</a> • 📃 <a href="http://arxiv.org/abs/2210.02414" target="_blank">Paper</a><br>
+ 🌐 <a href="http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/" target="_blank">Blog</a> • ⏬ <a href="https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform" target="_blank">Download Model</a> • 🪧 <a href="https://huggingface.co/spaces/THUDM/GLM-130B" target="_blank">Demo</a> • ✉️ <a href="mailto:glm-130b@googlegroups.com">Email</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">Paper</a><br>
</p>
<p align="center">
@@ -21,8 +21,8 @@ GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with
If you find our work and our open-sourced model useful, please star our repo to encourage our future development! :)
-## News
-
+## News
+
- **[2022.10.06]** Our [paper](http://arxiv.org/abs/2210.02414) for GLM-130B is out!
- **[2022.08.24]** We are proud to publish the quantized version of GLM-130B. While keeping the activations in FP16, the model weights can be quantized to as low as **INT4 with almost no degradation of performance**, further reducing the hardware requirements of GLM-130B to **a single server with 4 * RTX 3090 (24G)**! See [Quantization of GLM-130B](docs/quantization.md) for details.
@@ -50,8 +50,8 @@ It is recommended to use an A100 (40G * 8) server, as all GLM-130B evaluatio
The GLM-130B code is built on top of [SAT](https://github.com/THUDM/SwissArmyTransformer). We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to manage your environment and installing additional dependencies via `pip install -r requirements.txt`. Here are the recommended environment configurations (a setup sketch follows the list below):
- Python 3.9+ / CUDA 11+ / PyTorch 1.10+ / DeepSpeed 0.6+ / Apex (**installation with CUDA and C++ extensions is required, see [here](https://github.com/NVIDIA/apex/#linux)**)
-- SwissArmyTransformer>=0.2.11 is required for quantization
-
+- SwissArmyTransformer>=0.2.11 is required for quantization
+
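A minimal environment-setup sketch under these requirements (the environment name and the exact Python version pin are assumptions, not part of the repository's instructions; Apex still needs to be built with CUDA and C++ extensions as linked above):

```bash
# Create and activate an isolated conda environment (the name "glm-130b"
# and the Python 3.9 pin are assumptions; any Python 3.9+ environment works).
conda create -n glm-130b python=3.9 -y
conda activate glm-130b

# Install the additional dependencies listed by the repository.
pip install -r requirements.txt

# Apex must be installed with CUDA and C++ extensions (not the plain pip wheel);
# follow the official instructions: https://github.com/NVIDIA/apex/#linux
```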
#### Model weights
Download the GLM-130B model checkpoint from [here](https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform?usp=sf_link). Make sure all 60 chunks are downloaded completely, then use the following command to merge them into a single archive file and extract it:
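A minimal sketch of the merge-and-extract step, assuming the downloaded chunks follow a `*.tar.part_*` naming pattern (the exact file names are an assumption; check what the download actually produces):

```bash
# Concatenate all 60 downloaded chunks into a single archive, then extract it.
# NOTE: the chunk naming pattern below is an assumption; adjust it to match
# the files you actually downloaded.
cat glm-130b*.tar.part_* > glm-130b.tar
tar xvf glm-130b.tar
```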
@@ -166,9 +166,9 @@ See [Evaluate Your Own Tasks](docs/evaluate-your-own-tasks.md) for details on ho
### 2.5X faster Inference using FasterTransformer
By adapting the GLM-130B model to [FasterTransformer](https://github.com/NVIDIA/FasterTransformer), a highly optimized transformer model library by NVIDIA, we can reach up to a 2.5X speedup on generation; see [Inference with FasterTransformer](docs/inference-with-fastertransformer.md) for details.
-
-
-
+
+
+
<details>
<summary><b>Acknowledgement</b></summary>