@@ -13,6 +13,8 @@ This unified model enables multiple tasks without relying on multiple separate m
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)
+> [!NOTE]
+> SeamlessM4T v2 and v1 are also supported in the 🤗 Transformers library; see [the dedicated section below](#transformers-usage) for more details.
## SeamlessM4T v1
The v1 version of SeamlessM4T is a multitask adaptation of the *UnitY* architecture [(Inaguma et al., 2023)](https://aclanthology.org/2023.acl-long.872/).
@@ -23,7 +25,6 @@ The v1 version of SeamlessM4T is a multitask adaptation of the *UnitY* architect
The v2 version of SeamlessM4T is a multitask adaptation of our novel *UnitY2* architecture.
*UnitY2*, with its hierarchical character-to-unit upsampling and non-autoregressive text-to-unit decoding, considerably improves over SeamlessM4T v1 in both quality and inference speed.
-

## SeamlessM4T models
@@ -162,6 +163,60 @@ The `target` column specifies whether a language is supported as target speech (
Note that seamlessM4T-medium supports 200 languages in the text modality, and is based on NLLB-200 (see the full list in its [asset card](src/seamless_communication/cards/unity_nllb-200.yaml)).
+## Transformers usage
+
+SeamlessM4T is available in the 🤗 Transformers library, requiring minimal dependencies. Steps to get started:
+
+1. First install the 🤗 [Transformers library](https://github.com/huggingface/transformers) from main and [sentencepiece](https://github.com/google/sentencepiece):
+
+```bash
+pip install git+https://github.com/huggingface/transformers.git sentencepiece
+```
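+
+The speech-input example in step 2 below also loads and resamples audio with [torchaudio](https://github.com/pytorch/audio), which the command above does not install; if it is not already in your environment, `pip install torchaudio` should cover it.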
+
+2. Run the following Python code to generate speech samples. Here, the target language is Russian (a sketch for getting text output instead follows these steps):
+
+```py
+from transformers import AutoProcessor, SeamlessM4Tv2Model
+import torchaudio  # used below to load and resample the example audio clip
+
+processor = AutoProcessor.from_pretrained("facebook/seamless-m4t-v2-large")
+model = SeamlessM4Tv2Model.from_pretrained("facebook/seamless-m4t-v2-large")
+
+# from text: translate English text into Russian speech (T2ST)
+text_inputs = processor(text="Hello, my dog is cute", src_lang="eng", return_tensors="pt")
+audio_array_from_text = model.generate(**text_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+
+# from audio: translate English speech into Russian speech (S2ST)
+audio, orig_freq = torchaudio.load("https://www2.cs.uic.edu/~i101/SoundFiles/preamble10.wav")
+audio = torchaudio.functional.resample(audio, orig_freq=orig_freq, new_freq=16_000) # must be a 16 kHz waveform array
+audio_inputs = processor(audios=audio, return_tensors="pt")
+audio_array_from_audio = model.generate(**audio_inputs, tgt_lang="rus")[0].cpu().numpy().squeeze()
+```
+
+3. Listen to the audio samples, either in a Jupyter notebook:
+
+```py
+from IPython.display import Audio
+
+sample_rate = model.config.sampling_rate
+Audio(audio_array_from_text, rate=sample_rate)
+# Audio(audio_array_from_audio, rate=sample_rate)
+```
+
+Or save them as a `.wav` file using a third-party library, e.g. `scipy`:
+
+```py
+import scipy.io.wavfile  # import the submodule explicitly so scipy.io.wavfile is always available
+
+sample_rate = model.config.sampling_rate
+scipy.io.wavfile.write("out_from_text.wav", rate=sample_rate, data=audio_array_from_text)
+# scipy.io.wavfile.write("out_from_audio.wav", rate=sample_rate, data=audio_array_from_audio)
+```
+
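+The same checkpoint also covers the text-output tasks (T2TT, S2TT and ASR): passing `generate_speech=False` to `generate` returns translated token ids instead of a waveform. A minimal sketch, reusing `processor`, `model`, `text_inputs` and `audio_inputs` from step 2 and following the pattern in the 🤗 docs linked in the note below:
+
+```py
+# text-to-text translation (T2TT): generate Russian token ids from the English text, then decode them
+output_tokens = model.generate(**text_inputs, tgt_lang="rus", generate_speech=False)
+translated_text_from_text = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
+
+# speech-to-text translation (S2TT): same pattern, starting from the audio inputs
+output_tokens = model.generate(**audio_inputs, tgt_lang="rus", generate_speech=False)
+translated_text_from_audio = processor.decode(output_tokens[0].tolist()[0], skip_special_tokens=True)
+```
+
+For ASR, the same call works with `tgt_lang` set to the language of the input audio.
+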
+> [!NOTE]
+> For more details on running SeamlessM4T inference with the 🤗 Transformers library, refer to the
+> [SeamlessM4T v2 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t_v2), the
+> [SeamlessM4T v1 docs](https://huggingface.co/docs/transformers/main/en/model_doc/seamless_m4t), or to this hands-on [Google Colab](https://colab.research.google.com/github/ylacombe/scripts_and_notebooks/blob/main/v2_seamless_m4t_hugging_face.ipynb).
+
## Citation
For *UnitY*, please cite:
```bibtex