2 years ago · cd34d582a0
--- a/README.md
+++ b/README.md
@@ -3,8 +3,8 @@
 SeamlessM4T is designed to provide high quality translation, allowing people from different linguistic communities to communicate effortlessly through speech and text.
 
 SeamlessM4T covers:
-- 📥 101 languages for speech input
-- ⌨️ 96 Languages for text input/output
+- 📥 101 languages for speech input.
+- ⌨️ 96 Languages for text input/output.
 - 🗣️ 35 languages for speech output.
 
 This unified model enables multiple tasks without relying on multiple separate models:
@@ -50,12 +50,12 @@ Seamless Communication depends on 3 libraries developed by Meta.
 fairseq2 is our next-generation open-source library of sequence modeling components that provides researchers and developers with building blocks for machine translation, language modeling, and other sequence generation tasks. All SeamlessM4T models in this repository are powered by fairseq2.
 
 ## [SONAR and BLASER 2.0](https://github.com/facebookresearch/SONAR)
-SONAR, Sentence-level multimOdal and laNguage-Agnostic Representations is a new multilingual and -modal sentence embedding space which outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multilingual similarity search tasks. SONAR provides text and speech encoders for many languages. SeamlessAlign was mined based on SONAR embeddings.
+SONAR, Sentence-level multimodal and language-Agnostic Representations is a new multi-lingual and modal sentence embedding space which outperforms existing sentence embeddings such as LASER3 and LabSE on the xsim and xsim++ multi-lingual similarity search tasks. SONAR provides text and speech encoders for many languages. SeamlessAlign was mined based on SONAR embeddings.
 
 BLASER 2.0 is our latest model-based evaluation metric for multimodal translation. It is an extension of BLASER, supporting both speech and text. It operates directly on the source signal, and as such, does not require any intermediate ASR sytem like ASR-BLEU. As in the first version, BLASER 2.0 leverages the similarity between input and output sentence embeddings. SONAR is the underlying embedding space for BLASER 2.0. Scripts to run evaluation with BLASER 2.0 can be found in the [SONAR repo](https://github.com/facebookresearch/SONAR).
 
 ## [stopes](https://github.com/facebookresearch/stopes)
-As part of the seamless communication project, we've extended the stopes library. Version 1 provided a text-text mining tool to build training dataset for translation models. Version 2 has been extended thanks to SONAR to support tasks around training large speech translation models. In particular, we provide tools to read/write the fairseq audiozip datasets and a new mining pipeline that can do speech-speech, text-speech, speech-text and text-text mining, all based on the new SONAR embedding space.
+As part of the seamless communication project, we've extended the stopes library. Version 1 provided a text-to-text mining tool to build training dataset for translation models. Version 2 has been extended thanks to SONAR, to support tasks around training large speech translation models. In particular, we provide tools to read/write the fairseq audiozip datasets and a new mining pipeline that can do speech-to-speech, text-to-speech, speech-to-text and text-to-text mining, all based on the new SONAR embedding space.
 
 
 # Resources and usage