Update README.md with transformers usage

Yoach Lacombe, 1 year ago
parent commit 488d53e513
README.md

@@ -19,6 +19,7 @@ Links:
 - [Paper](https://dl.fbaipublicfiles.com/seamless/seamless_m4t_paper.pdf)
 - [Demo](https://seamless.metademolab.com/)
 - [🤗 Hugging Face space](https://huggingface.co/spaces/facebook/seamless_m4t)
+- [🤗 SeamlessM4T docs on Hugging Face](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t)
 
 # Quick Start
 ## Installation
@@ -97,6 +98,89 @@ Apart from Seamless-M4T large (2.3B) and medium (1.2B) models, we are also relea
 ## SeamlessAlign mined dataset
 We open-source the metadata to SeamlessAlign, the largest open dataset for multimodal translation, totaling 270k+ hours of aligned Speech and Text data. The dataset can be rebuilt by the community based on the [SeamlessAlign readme](docs/m4t/seamless_align_README.md).
 
+## 🤗 Transformers Usage
+
+SeamlessM4T is also available in the 🤗 Transformers library, which requires only minimal extra dependencies. To get started:
+
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main and [sentencepiece](https://github.com/google/sentencepiece):
+
+```bash
+pip install git+https://github.com/huggingface/transformers.git sentencepiece
+```
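+
+To confirm the installation, an optional quick check that the right `transformers` build is in place:
+
+```py
+import transformers
+
+# SeamlessM4T support requires a recent build from main,
+# so this should print a dev version
+print(transformers.__version__)
+```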
+
+2. Run the following Python code to generate speech samples. Here the target language is Russian:
+
+```py
+from transformers import AutoProcessor, SeamlessM4TModel
+
+processor = AutoProcessor.from_pretrained("facebook/hf-seamless-m4t-medium")
+model = SeamlessM4TModel.from_pretrained("facebook/hf-seamless-m4t-medium")
+
+# from audio
+audio = ... # must be a 16 kHz waveform array (list or numpy array); see the loading sketch below
+audio_inputs = processor(audios=audio, return_tensors="pt")
+audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+
+# from text
+text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+```
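+
+The `audio` placeholder above has to be a 16 kHz waveform before it reaches the processor. A minimal loading sketch, assuming a local `input.wav` file (hypothetical name, not part of this README) and that `torchaudio` is installed, which the install step above does not cover:
+
+```py
+import torchaudio
+
+# torchaudio.load returns (waveform, sample_rate)
+waveform, orig_freq = torchaudio.load("input.wav")
+if orig_freq != 16_000:
+    # the processor expects 16 kHz input, so resample anything else
+    waveform = torchaudio.functional.resample(waveform, orig_freq=orig_freq, new_freq=16_000)
+audio = waveform.mean(dim=0).numpy()  # downmix to a mono numpy array
+```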
+
+3. Listen to the audio samples, either in a Jupyter notebook:
+
+```py
+from IPython.display import Audio
+
+sample_rate = model.config.sampling_rate
+Audio(audio_array_from_text, rate=sample_rate)
+# Audio(audio_array_from_audio, rate=sample_rate)
+```
+
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+
+```py
+import scipy.io.wavfile
+
+sample_rate = model.config.sampling_rate
+scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
+# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+```
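+
+Beyond speech, the same model can return translated text: passing `generate_speech=False` to `generate` yields text tokens instead of a waveform. A short sketch reusing `text_inputs` from step 2 (see the SeamlessM4T docs below for the full API):
+
+```py
+# text-to-text translation with the same model and inputs
+output_tokens = model.generate(**text_inputs, tgt_lang="rus", generate_speech=False)
+translated_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
+print(translated_text)
+```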
+
+For more details on running inference with SeamlessM4T in the 🤗 Transformers library, refer to the
+[SeamlessM4T docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t).
+
 # Citation
 If you use SeamlessM4T in your work or any models/datasets/artifacts published in SeamlessM4T, please cite: