# Inference with SeamlessM4T models

SeamlessM4T models currently support five tasks:
- Speech-to-speech translation (S2ST)
- Speech-to-text translation (S2TT)
- Text-to-speech translation (T2ST)
- Text-to-text translation (T2TT)
- Automatic speech recognition (ASR)

## Quick start:

Inference is run with the CLI from the root directory of the repository.

The model can be specified with `--model_name`, set to either `seamlessM4T_large` or `seamlessM4T_medium`:

**S2ST**:
```bash
python scripts/m4t/predict/predict.py <path_to_input_audio> s2st <tgt_lang> --output_path <path_to_save_audio> --model_name seamlessM4T_large
```

**S2TT**:
```bash
python scripts/m4t/predict/predict.py <path_to_input_audio> s2tt <tgt_lang>
```

**T2TT**:
```bash
python scripts/m4t/predict/predict.py <input_text> t2tt <tgt_lang> --src_lang <src_lang>
```

**T2ST**:
```bash
python scripts/m4t/predict/predict.py <input_text> t2st <tgt_lang> --src_lang <src_lang> --output_path <path_to_save_audio>
```

**ASR**:
```bash
python scripts/m4t/predict/predict.py <path_to_input_audio> asr <tgt_lang>
```

## Inference breakdown

Inference requires a `Translator` object instantiated with a multitasking UnitY model, one of:
- `seamlessM4T_large`
- `seamlessM4T_medium`

and a vocoder, `vocoder_36langs`.

```python
import torch
import torchaudio
from seamless_communication.models.inference import Translator

# Initialize a Translator object with a multitask model and a vocoder on the GPU.
translator = Translator("seamlessM4T_large", "vocoder_36langs", torch.device("cuda:0"))
```

Once the `Translator` is created, `predict()` can be called as many times as needed on any of the supported tasks.

Given an input audio file at `<path_to_input_audio>` or an input text `<input_text>` in `<src_lang>`, we can translate into `<tgt_lang>` as follows:

## S2ST and T2ST:

```python
# S2ST
translated_text, wav, sr = translator.predict(<path_to_input_audio>, "s2st", <tgt_lang>)

# T2ST
translated_text, wav, sr = translator.predict(<input_text>, "t2st", <tgt_lang>, src_lang=<src_lang>)
```
Note that `<src_lang>` must be specified for T2ST.

The generated units are synthesized into speech, and the output audio file is saved with:

```python
wav, sr = translator.synthesize_speech(<speech_units>, <tgt_lang>)

# Save the translated audio generation.
torchaudio.save(
    <path_to_save_audio>,
    wav[0].cpu(),
    sample_rate=sr,
)
```

## S2TT, T2TT and ASR:

```python
# S2TT
translated_text, _, _ = translator.predict(<path_to_input_audio>, "s2tt", <tgt_lang>)

# ASR
# This is equivalent to S2TT with `<tgt_lang>=<src_lang>`.
transcribed_text, _, _ = translator.predict(<path_to_input_audio>, "asr", <src_lang>)

# T2TT
translated_text, _, _ = translator.predict(<input_text>, "t2tt", <tgt_lang>, src_lang=<src_lang>)
```
Note that `<src_lang>` must be specified for T2TT.
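
Because the source language is required only for the text-input tasks, it can help to check for it before calling the model. The snippet below is a minimal sketch, not part of `seamless_communication`: `run_task` is a hypothetical helper, and it assumes only that the translator exposes `predict(input, task, tgt_lang, src_lang=...)` with the positional order used in the examples above.

```python
from typing import Any, Optional, Tuple

# Task names as used in the examples above.
SUPPORTED_TASKS = {"s2st", "s2tt", "t2st", "t2tt", "asr"}
# Tasks that take text input and therefore need an explicit source language.
TEXT_INPUT_TASKS = {"t2st", "t2tt"}


def run_task(
    translator: Any,
    task: str,
    model_input: str,
    tgt_lang: str,
    src_lang: Optional[str] = None,
) -> Tuple[Any, Any, Any]:
    """Hypothetical wrapper: validate arguments, then delegate to predict()."""
    task = task.lower()
    if task not in SUPPORTED_TASKS:
        raise ValueError(f"Unsupported task: {task!r}")
    if task in TEXT_INPUT_TASKS:
        if src_lang is None:
            raise ValueError(f"src_lang must be specified for {task}")
        return translator.predict(model_input, task, tgt_lang, src_lang=src_lang)
    # Speech-input tasks (S2ST, S2TT, ASR) are called without src_lang here.
    return translator.predict(model_input, task, tgt_lang)
```

With a real `Translator`, this would be called as `run_task(translator, "t2tt", <input_text>, <tgt_lang>, src_lang=<src_lang>)`; a missing `src_lang` then raises a `ValueError` before any model call is made.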