@@ -8,7 +8,7 @@ The trainer and dataloader were designed mainly for demonstration purposes. Thei
The M4T training dataset is a multimodal parallel corpus. Each training sample has four parts: the audio and text representations of the sample in the source language, and the corresponding audio and text representations in the target language.
-That kind of dataset can be prepared using `dataset.py` script that downloads FLEURS dataset from [HuggingFace datastes hub](https://huggingface.co/datasets/google/fleurs), (optionally) extracts units from the target audio samples, and prepares a manifest consumable by `finetune.py`. Manifest is a text file where each line represents information about a single dataset sample, serialized in JSON format.
+That kind of dataset can be prepared using the `dataset.py` script, which downloads the FLEURS dataset from the [HuggingFace datasets hub](https://huggingface.co/datasets/google/fleurs), (optionally) extracts units from the target audio samples, and prepares a manifest consumable by `finetune.py`. The manifest is a text file where each line describes a single dataset sample, serialized in JSON format.
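For illustration, a single manifest line can be produced and read back with the standard `json` module. The field names below (`source`/`target` objects with `lang`, `audio_local_path`, `text`, `units`) are an assumption for the sketch, not taken from `dataset.py` itself:

```python
import json

# Hypothetical manifest entry; the exact field names are an assumption,
# chosen only to mirror the four parts described above.
sample = {
    "source": {
        "lang": "eng",
        "audio_local_path": "/data/eng/000001.wav",
        "text": "Hello, world.",
    },
    "target": {
        "lang": "fra",
        "audio_local_path": "/data/fra/000001.wav",
        "text": "Bonjour, le monde.",
        "units": [12, 875, 431],  # optional extracted units
    },
}

# Each manifest line is one JSON-serialized sample.
line = json.dumps(sample)

# Reading a line back yields the same nested structure.
parsed = json.loads(line)
print(parsed["target"]["lang"])  # fra
```

A full manifest is simply one such JSON object per line of a text file.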
List of input arguments for `dataset.py`:
@@ -18,7 +18,7 @@ List of input arguments for `dataset.py`:
--target_lang TARGET_LANG
M4T langcode of the dataset TARGET language
--split SPLIT Dataset split/shard to download (`train`, `test`)
- --save_dir SAVE_DIR Directory where the datastets will be stored with HuggingFace datasets cache files
+ --save_dir SAVE_DIR Directory where the datasets will be stored with HuggingFace datasets cache files
```
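A possible invocation, using only the arguments listed above (the language code, split, and directory are placeholders; the script may accept further arguments not shown in this excerpt):

```shell
# Hypothetical example: download the FLEURS train split for French
# and write the manifest plus HuggingFace cache files under ./fleurs_cache.
python dataset.py \
  --target_lang fra \
  --split train \
  --save_dir ./fleurs_cache
```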
Language codes should follow the notation adopted by M4T models.