|
@@ -1,4 +1,4 @@
|
|
|
-# Instructions to run inference with SeamlessM4T models
|
|
|
+# Inference with SeamlessM4T models
|
|
|
|
|
|
SeamlessM4T models currently support five tasks:
|
|
|
- Speech-to-speech translation (S2ST)
|
|
@@ -7,6 +7,40 @@ SeamlessM4T models currently support five tasks:
|
|
|
- Text-to-text translation (T2TT)
|
|
|
- Automatic speech recognition (ASR)
|
|
|
|
|
|
+
|
|
|
+
|
|
|
+## Quick start:
|
|
|
+Inference is run with the CLI, from the root directory of the repository.
|
|
|
+
|
|
|
+The model is selected with `--model_name`, set to either `multitask_unity_large` or `multitask_unity_medium`:
|
|
|
+
|
|
|
+**S2ST**:
|
|
|
+```bash
|
|
|
+python scripts/m4t/predict/predict.py <path_to_input_audio> s2st <tgt_lang> --output_path <path_to_save_audio> --model_name multitask_unity_large
|
|
|
+```
|
|
|
+
|
|
|
+**S2TT**:
|
|
|
+```bash
|
|
|
+python scripts/m4t/predict/predict.py <path_to_input_audio> s2tt <tgt_lang>
|
|
|
+```
|
|
|
+
|
|
|
+**T2TT**:
|
|
|
+```bash
|
|
|
+python scripts/m4t/predict/predict.py <input_text> t2tt <tgt_lang> --src_lang <src_lang>
|
|
|
+```
|
|
|
+
|
|
|
+**T2ST**:
|
|
|
+```bash
|
|
|
+python scripts/m4t/predict/predict.py <input_text> t2st <tgt_lang> --src_lang <src_lang> --output_path <path_to_save_audio>
|
|
|
+```
|
|
|
+
|
|
|
+**ASR**:
|
|
|
+```bash
|
|
|
+python scripts/m4t/predict/predict.py <path_to_input_audio> asr <tgt_lang>
|
|
|
+```
|
|
|
+
|
|
|
+## Inference breakdown
|
|
|
+
|
|
|
Inference calls for a `Translator` object instantiated with a Multitasking UnitY model, using one of the following options:
|
|
|
- `multitask_unity_large`
|
|
|
- `multitask_unity_medium`
|
|
@@ -67,34 +101,3 @@ translated_text, _, _ = translator.predict(<input_text>, "t2tt", <tgt_lang>, src
|
|
|
|
|
|
```
|
|
|
Note that `<src_lang>` must be specified for T2TT.
|
|
|
-
|
|
|
-
|
|
|
-
|
|
|
-# Inference using the CLI, from the root directory of the repository:
|
|
|
-
|
|
|
-The model can be specified with e.g., `--model_name multitask_unity_large`:
|
|
|
-
|
|
|
-S2ST:
|
|
|
-```
|
|
|
-python scripts/m4t/predict/predict.py <path_to_input_audio> s2st <tgt_lang> --output_path <path_to_save_audio> --model_name multitask_unity_large
|
|
|
-```
|
|
|
-
|
|
|
-S2TT:
|
|
|
-```
|
|
|
-python scripts/m4t/predict/predict.py <path_to_input_audio> s2tt <tgt_lang>
|
|
|
-```
|
|
|
-
|
|
|
-T2TT:
|
|
|
-```
|
|
|
-python scripts/m4t/predict/predict.py <input_text> t2tt <tgt_lang> --src_lang <src_lang>
|
|
|
-```
|
|
|
-
|
|
|
-T2ST:
|
|
|
-```
|
|
|
-python scripts/m4t/predict/predict.py <input_text> t2st <tgt_lang> --src_lang <src_lang> --output_path <path_to_save_audio>
|
|
|
-```
|
|
|
-
|
|
|
-ASR:
|
|
|
-```
|
|
|
-python scripts/m4t/predict/predict.py <path_to_input_audio> asr <tgt_lang>
|
|
|
-```
|