
Update README.md

Shaw 2 years ago
parent
commit
4e863af209
1 changed file with 8 additions and 8 deletions

README.md (+8 −8)

@@ -1,7 +1,7 @@
 <img src="resources/7D6433A42D189E2E6FBC62BE066BCE91.png">
 
 <p align="center">
-   🌐 <a href="http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/" target="_blank">Blog</a> • ⏬ <a href="https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform" target="_blank">Download Model</a> • 🪧 <a href="https://huggingface.co/spaces/THUDM/GLM-130B" target="_blank">Demo</a> • ✉️ <a href="mailto:glm-130b@googlegroups.com">Email</a> • 📃 <a href="http://arxiv.org/abs/2210.02414" target="_blank">Paper</a><br>
+   🌐 <a href="http://keg.cs.tsinghua.edu.cn/glm-130b/posts/glm-130b/" target="_blank">Blog</a> • ⏬ <a href="https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform" target="_blank">Download Model</a> • 🪧 <a href="https://huggingface.co/spaces/THUDM/GLM-130B" target="_blank">Demo</a> • ✉️ <a href="mailto:glm-130b@googlegroups.com">Email</a> • 📃 <a href="https://arxiv.org/abs/2210.02414" target="_blank">Paper</a><br>
 </p>
 
 <p align="center">
@@ -21,8 +21,8 @@ GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with
 
 If you find our work and our open-sourced model useful, please star our repo to encourage our future development! :)
 
-## News
-
+## News
+
 - **[2022.10.06]** Our [paper](http://arxiv.org/abs/2210.02414) for GLM-130B is out!
 - **[2022.08.24]** We are proud to publish the quantized version for GLM-130B.  While preserving the activation precision as FP16, the model weights can be quantized to as low as **INT4 with almost no degradation of performance**, further reducing the hardware requirements of the GLM-130B to **a single server with 4 * RTX 3090 (24G)**! See [Quantization of GLM-130B](docs/quantization.md) for details.
 
@@ -50,8 +50,8 @@ It is recommended to use an A100 (40G * 8) server, as all GLM-130B evaluatio
 The GLM-130B code is built on top of [SAT](https://github.com/THUDM/SwissArmyTransformer). We recommend using [Miniconda](https://docs.conda.io/en/latest/miniconda.html) to manage your environment and installing additional dependencies via `pip install -r requirements.txt`. Here are the recommended environment configurations (a minimal setup sketch follows the list):
 
 - Python 3.9+ / CUDA 11+ / PyTorch 1.10+ / DeepSpeed 0.6+ / Apex (**installation with CUDA and C++ extensions is required, see [here](https://github.com/NVIDIA/apex/#linux)**)
-- SwissArmyTransformer>=0.2.11 is required for quantization
-
+- SwissArmyTransformer>=0.2.11 is required for quantization
+
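A minimal setup sketch under the configuration above; the environment name `glm-130b` and the exact install order are assumptions, not from the repo:

```bash
# Assumed workflow: create an isolated env, then install the listed dependencies.
conda create -n glm-130b python=3.9 -y
conda activate glm-130b
pip install -r requirements.txt              # repo's dependency list
pip install "SwissArmyTransformer>=0.2.11"   # required for quantization
# Apex must be built with CUDA and C++ extensions -- follow the NVIDIA/apex
# instructions linked above rather than a plain pip install.
```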
 #### Model weights
 
 Download the GLM-130B model checkpoint from [here](https://docs.google.com/forms/d/e/1FAIpQLSehr5Dh_i3TwACmFFi8QEgIVNYGmSPwV0GueIcsUev0NEfUug/viewform?usp=sf_link), make sure all 60 chunks are downloaded completely, then use the following command to merge them into a single archive file and extract it:
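The merge command itself is elided from this hunk. A plausible sketch, assuming the 60 chunks share a common prefix; the file names here are hypothetical, so check the download page for the real ones:

```bash
# Hypothetical chunk names -- substitute the actual ones from the download.
cat glm-130b-sat.tar.part_* > glm-130b-sat.tar   # concatenate the 60 chunks
tar xvf glm-130b-sat.tar                         # extract the checkpoint
```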
@@ -166,9 +166,9 @@ See [Evaluate Your Own Tasks](docs/evaluate-your-own-tasks.md) for details on ho
 ### 2.5X faster Inference using FasterTransformer
 
 By adapting the GLM-130B model to [FasterTransformer](https://github.com/NVIDIA/FasterTransformer), a highly optimized transformer model library by NVIDIA, we can reach up to 2.5X speedup on generation; see [Inference with FasterTransformer](docs/inference-with-fastertransformer.md) for details.
-
-
-
+
+
+
 
 <details>
 <summary><b>Acknowledgement</b></summary>