🌐 Blog • ⏬ Download Model • 🪧 Demo • ✉️ Email • 📃 Paper [ICLR 2023]
💬 Google Group (Updates) or Wechat Group or Slack channel (Discussions)
GLM-130B is an open bilingual (English & Chinese) bidirectional dense model with 130 billion parameters, pre-trained using the algorithm of General Language Model (GLM). It is designed to support inference tasks with the 130B parameters on a single A100 (40G * 8) or V100 (32G * 8) server. With INT4 quantization, the hardware requirements can further be reduced to a single server with 4 * RTX 3090 (24G) with almost no performance degradation. As of July 3rd, 2022, GLM-130B has been trained on over 400 billion text tokens (200B each for Chinese and English) and it has the following unique features:
This repository mainly focuses on the evaluation of GLM-130B. The training part is open for research purposes; please send an email to glm-130b@googlegroups.com to apply for access. If you find our work and our open-sourced efforts useful, give us a ⭐️ to encourage our future development! :)
For smaller models, please find monolingual GLMs (English: 10B/2B/515M/410M/335M/110M, Chinese: 10B/335M) and a 1B multilingual GLM (104 languages).
Hardware | GPU Memory | Quantization | Weight Offload
---|---|---|---
8 * A100 | 40 GB | No | No
8 * V100 | 32 GB | No | Yes (BMInf)
8 * V100 | 32 GB | INT8 | No
8 * RTX 3090 | 24 GB | INT8 | No
4 * RTX 3090 | 24 GB | INT4 | No
8 * RTX 2080 Ti | 11 GB | INT4 | No
It is recommended to use an A100 (40G * 8) server, as all reported GLM-130B evaluation results (~30 tasks) can be easily reproduced with a single A100 server in about half a day. With INT8/INT4 quantization, efficient inference on a single server with 4 * RTX 3090 (24G) is possible; see Quantization of GLM-130B for details. By combining quantization and weight-offloading techniques, GLM-130B can also run inference on servers with even smaller GPU memory; see Low-Resource Inference for details.
The GLM-130B code is built on top of SAT. We recommend using Miniconda to manage your environment and installing additional dependencies via pip install -r requirements.txt. Here are the recommended environment configurations:
Download the GLM-130B model checkpoint from here and make sure all 60 chunks are downloaded completely, then use the following commands to merge them into a single archive file and extract it:
cat glm-130b-sat.tar.part_* > glm-130b-sat.tar
tar xvf glm-130b-sat.tar
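Before concatenating, it is worth verifying that all 60 chunks are actually present, since a missing part produces a silently corrupt archive. A quick sanity check could look like the sketch below (a hypothetical helper, not part of the repository; the filename prefix matches the archive name above):

```python
from pathlib import Path


def check_chunks(folder, prefix="glm-130b-sat.tar.part_", expected=60):
    """Return the chunk files found under `folder` and whether all `expected` are present."""
    parts = sorted(Path(folder).glob(prefix + "*"))
    return parts, len(parts) == expected
```

If the check fails, re-download the missing parts before running `cat` and `tar`.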
Set CHECKPOINT_PATH in configs/model_glm_130b.sh to the path of the extracted folder. Since the checkpoint file is up to 260G, it is recommended to use an SSD or RAM disk to reduce the checkpoint loading time. Since the checkpoint we distribute uses 8-way tensor parallelism, a conversion script is also provided if you need to change the tensor-parallel dimension.
python tools/convert_tp.py \
--input-folder <SRC_CKPT_PATH> \
--output-folder <DST_CKPT_PATH> \
--target-tp <TARGET_TP>
bash scripts/generate.sh --input-source interactive
You can also specify an input file with --input-source input.txt.
GLM-130B uses two different mask tokens: [MASK] for short blank filling and [gMASK] for left-to-right long text generation. When the input does not contain any mask token, [gMASK] will be automatically appended to the end of the text.
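The convention above can be sketched in a few lines (a hypothetical illustration of the behavior described here, not the repository's actual preprocessing code):

```python
def prepare_prompt(text: str) -> str:
    """Mimic the masking convention: if the input contains neither [MASK] nor
    [gMASK], append [gMASK] so the model performs left-to-right generation."""
    if "[MASK]" not in text and "[gMASK]" not in text:
        return text + " [gMASK]"
    return text
```

So a plain question is treated as a long-generation prompt, while a prompt such as "Ng is an expert in [MASK]." is left unchanged for short blank filling.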
We use the YAML file to define tasks. Specifically, you can add multiple tasks or folders at a time for evaluation, and the evaluation script will automatically collect all YAML files under those folders recursively.
bash scripts/evaluate.sh task1.yaml task2.yaml dir1 dir2 ...
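The recursive collection behavior can be pictured with a small sketch (an illustrative reimplementation, not the evaluation script itself):

```python
from pathlib import Path


def collect_task_files(paths):
    """Expand a mix of YAML files and directories into a flat list of task
    files, descending into directories recursively as the evaluation
    script does."""
    tasks = []
    for p in map(Path, paths):
        if p.is_dir():
            tasks.extend(sorted(p.rglob("*.yaml")))
        else:
            tasks.append(p)
    return tasks
```

Passing a directory is therefore equivalent to listing every YAML file beneath it, however deeply nested.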
Download our evaluation dataset here, and set DATA_PATH in scripts/evaluate.sh to your local dataset directory. The tasks folder contains the YAML files for the 30+ tasks we evaluated for GLM-130B. Take the CoLA task for example: run bash scripts/evaluate.sh tasks/bloom/glue_cola.yaml, which outputs an accuracy of ~65% for the best prompt and ~57% for the median.
Multi-node evaluation can be configured by setting HOST_FILE_PATH (required by the DeepSpeed launcher) in scripts/evaluate_multiple_node.sh. Set DATA_PATH in scripts/evaluate_multiple_node.sh and run the following command to evaluate all the tasks in the ./tasks directory.
bash scripts/evaluate_multiple_node.sh ./tasks
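A DeepSpeed hostfile lists one node per line as `hostname slots=N`, where N is the number of GPUs on that node. The hostnames below are placeholders; substitute your own:

```
# hypothetical hostfile for two 8-GPU nodes
node1 slots=8
node2 slots=8
```

Each node must be reachable via passwordless SSH from the launching machine for DeepSpeed to start workers on it.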
See Evaluate Your Own Tasks for details on how to add new tasks.
By adapting the GLM-130B model to FasterTransformer, a highly optimized transformer model library by NVIDIA, we can reach up to a 2.5x speedup on generation; see Inference with FasterTransformer for details.
This repository is licensed under the Apache-2.0 license. The use of GLM-130B model weights is subject to the Model License.
If you find our work useful, please consider citing GLM-130B:
@inproceedings{
zeng2023glm-130b,
title={{GLM}-130B: An Open Bilingual Pre-trained Model},
author={Aohan Zeng and Xiao Liu and Zhengxiao Du and Zihan Wang and Hanyu Lai and Ming Ding and Zhuoyi Yang and Yifan Xu and Wendi Zheng and Xiao Xia and Weng Lam Tam and Zixuan Ma and Yufei Xue and Jidong Zhai and Wenguang Chen and Zhiyuan Liu and Peng Zhang and Yuxiao Dong and Jie Tang},
booktitle={The Eleventh International Conference on Learning Representations (ICLR)},
year={2023},
url={https://openreview.net/forum?id=-Aw0rrrPUF}
}
You may also consider citing GLM's original work:
@inproceedings{du2022glm,
title={GLM: General Language Model Pretraining with Autoregressive Blank Infilling},
author={Du, Zhengxiao and Qian, Yujie and Liu, Xiao and Ding, Ming and Qiu, Jiezhong and Yang, Zhilin and Tang, Jie},
booktitle={Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers)},
pages={320--335},
year={2022}
}