
added SONAR and SeamlessAlign to readme

alexmourachko 2 years ago
parent
commit
3694542b79
1 changed file with 10 additions and 8 deletions

+ 10 - 8
README.md

@@ -50,23 +50,25 @@ Seamless Communication depends on 3 libraries developed by Meta.
 ## [fairseq2](https://github.com/facebookresearch/fairseq2)
 fairseq2 is our next-generation open-source library of sequence modeling components that provides researchers and developers with building blocks for machine translation, language modeling, and other sequence generation tasks. All SeamlessM4T models in this repository are powered by fairseq2.
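+
+As a quick illustration, here is a minimal sketch of running SeamlessM4T through this repo's `Translator` API (model card and vocoder names follow the current docs and may change between releases):
+
+```python
+import torch
+from seamless_communication.models.inference import Translator
+
+# Load SeamlessM4T-Large together with its vocoder; model cards are
+# resolved and downloaded by fairseq2.
+translator = Translator(
+    "seamlessM4T_large",
+    vocoder_name_or_card="vocoder_36langs",
+    device=torch.device("cuda:0" if torch.cuda.is_available() else "cpu"),
+)
+
+# Text-to-text translation (task "t2tt"): English -> French.
+translated_text, _, _ = translator.predict(
+    "Hello, world!", "t2tt", tgt_lang="fra", src_lang="eng"
+)
+print(translated_text)
+```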
 
+## [SONAR](https://github.com/facebookresearch/SONAR)
+SONAR (Sentence-level multimOdal and laNguage-Agnostic Representations) is a new multilingual and multimodal sentence embedding space which outperforms existing sentence embeddings such as LASER3 and LaBSE on the xsim and xsim++ multilingual similarity search tasks. SONAR provides [text and speech encoders for many languages](https://github.com/facebookresearch/SONAR). SeamlessAlign was mined based on SONAR embeddings.
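+
+For instance, embedding a batch of sentences into the SONAR space is a one-liner once the encoder is loaded (a sketch based on the SONAR repo; model names may change):
+
+```python
+from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
+
+# Load the SONAR text encoder and its tokenizer.
+t2vec = TextToEmbeddingModelPipeline(
+    encoder="text_sonar_basic_encoder",
+    tokenizer="text_sonar_basic_encoder",
+)
+
+sentences = ["My name is SONAR.", "I can embed sentences into a vector space."]
+embeddings = t2vec.predict(sentences, source_lang="eng_Latn")
+print(embeddings.shape)  # one fixed-size (1024-d) vector per sentence
+```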
+
 ## [stopes](https://github.com/facebookresearch/stopes)
 As part of the Seamless Communication project, we've extended the stopes library. Version 1 provided a text-text mining tool to build training datasets for translation models. Version 2 has been extended, thanks to SONAR, to support tasks around training large speech translation models. In particular, we provide tools to read/write the fairseq audiozip datasets and a new mining pipeline that can do speech-speech, text-speech, speech-text and text-text mining, all based on the new SONAR embedding space.
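+
+The core idea behind the mining pipeline is that parallel speech and text land close together in the SONAR space. A toy sketch of that similarity signal (encoder names are taken from the SONAR repo; the audio path is a placeholder, and stopes itself uses a margin-based variant of this score at scale):
+
+```python
+import torch
+from sonar.inference_pipelines.speech import SpeechToEmbeddingModelPipeline
+from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
+
+speech_enc = SpeechToEmbeddingModelPipeline(encoder="sonar_speech_encoder_eng")
+text_enc = TextToEmbeddingModelPipeline(
+    encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder"
+)
+
+speech_emb = speech_enc.predict(["/path/to/utterance.wav"])  # placeholder path
+text_emb = text_enc.predict(["Hello, world!"], source_lang="eng_Latn")
+
+# Candidate pairs are ranked by similarity in the shared embedding space.
+score = torch.nn.functional.cosine_similarity(speech_emb, text_emb)
+print(score.item())
+```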
 
 ## [BLASER 2.0](https://github.com/facebookresearch/SONAR)
-BLASER 2.0 is our latest model-based evaluation metric for multimodal translation. It is an extension of BLASER, supporting both speech and text. It operates directly on the source signal, and as such, does not require any intermediate ASR sytem like ASR-BLEU. As in the first version, BLASER 2.0 leverages the similarity between input and output sentence embeddings. SONAR is the underlying embedding space for BLASER 2.0. Scripts to run evaluation with BLASER 2.0 can be found in the [SONAR repo](https://github.com/facebookresearch/SONAR)
+BLASER 2.0 is our latest model-based evaluation metric for multimodal translation. It is an extension of BLASER, supporting both speech and text. It operates directly on the source signal and, as such, does not require any intermediate ASR system like ASR-BLEU does. As in the first version, BLASER 2.0 leverages the similarity between input and output sentence embeddings. SONAR is the underlying embedding space for BLASER 2.0. Scripts to run evaluation with BLASER 2.0 can be found in the [SONAR repo](https://github.com/facebookresearch/SONAR).
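+
+A sketch of reference-free (QE) scoring with BLASER 2.0 on text, following the SONAR repo (model and encoder names may change between releases):
+
+```python
+from sonar.inference_pipelines.text import TextToEmbeddingModelPipeline
+from sonar.models.blaser.loader import load_blaser_model
+
+# Load the quality-estimation (reference-free) BLASER 2.0 model.
+blaser = load_blaser_model("blaser_2_0_qe").eval()
+embedder = TextToEmbeddingModelPipeline(
+    encoder="text_sonar_basic_encoder", tokenizer="text_sonar_basic_encoder"
+)
+
+src_emb = embedder.predict(["Le chat s'assit sur le tapis."], source_lang="fra_Latn")
+mt_emb = embedder.predict(["The cat sat down on the carpet."], source_lang="eng_Latn")
+
+# Higher scores indicate better translations (roughly a 1-5 scale).
+print(blaser(src=src_emb, mt=mt_emb).item())
+```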
 
 
 # Resources and usage
 ## SeamlessM4T models
-| Model Name | #params | checkpoint |  metrics |
-| - | - | - | - |
-| SeamlessM4T-Large | 2.3B |[model](https://dl.fbaipublicfiles.com/seamlessM4T/models/large/seamlessM4T_large.pt) | [metrics](https://dl.fbaipublicfiles.com/seamlessM4T/metrics/seamlessM4T_large.zip) |
-| SeamlessM4T-Medium | 1.2B |[model](https://dl.fbaipublicfiles.com/seamlessM4T/models/medium/seamlessM4T_medium.pt) | [metrics](https://dl.fbaipublicfiles.com/seamlessM4T/metrics/seamlessM4T_medium.zip) |
+| Model Name         | #params | checkpoint                                                                              | metrics                                                                              |
+| ------------------ | ------- | --------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------ |
+| SeamlessM4T-Large  | 2.3B    | [model](https://dl.fbaipublicfiles.com/seamlessM4T/models/large/seamlessM4T_large.pt)   | [metrics](https://dl.fbaipublicfiles.com/seamlessM4T/metrics/seamlessM4T_large.zip)  |
+| SeamlessM4T-Medium | 1.2B    | [model](https://dl.fbaipublicfiles.com/seamlessM4T/models/medium/seamlessM4T_medium.pt) | [metrics](https://dl.fbaipublicfiles.com/seamlessM4T/metrics/seamlessM4T_medium.zip) |
 
 We provide the extensive evaluation results of SeamlessM4T-Large and SeamlessM4T-Medium reported in the paper (as averages) in the `metrics` files above.
 
-
 ## Evaluating SeamlessM4T models
 To reproduce our results, or to evaluate using the same metrics over your own test sets, please check out the [README here](https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/eval_README.md).
 
@@ -77,8 +79,8 @@ TODO
 ## On-device models
 Apart from the SeamlessM4T-Large (2.3B) and SeamlessM4T-Medium (1.2B) models, we are also releasing a small model (281M) targeted for on-device inference. To learn more about the usage and model details, check out the [README here](https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/on_device_README.md).
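+
+The on-device checkpoints are exported to TorchScript, so inference needs only PyTorch (a sketch; the file names below are illustrative, see the on-device README for the actual artifacts and call signatures):
+
+```python
+import torch
+import torchaudio
+
+# Load a 16 kHz mono waveform and the exported TorchScript model.
+audio_input, _ = torchaudio.load("input.wav")  # placeholder path
+s2st_model = torch.jit.load("unity_on_device.ptl")  # illustrative file name
+
+with torch.no_grad():
+    # Speech-to-speech translation into the target language.
+    text, units, waveform = s2st_model(audio_input, tgt_lang="eng")
+```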
 
-## Data
-We open-source metadata for reconstructing the dataset we used for training our models. You can find the data and how to rebuild the dataset [here](https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/data_README.md).
+## SeamlessAlign mined dataset
+We open-source the metadata for SeamlessAlign, the largest open dataset for multimodal translation, totaling 270k+ hours of aligned speech and text data. The dataset can be rebuilt by the community by following the [SeamlessAlign readme](https://github.com/facebookresearch/seamless_communication/blob/main/docs/m4t/seamless_align_README.md).
 
 # Citation
 If you use SeamlessM4T in your work, or any models/datasets/artifacts published with SeamlessM4T, please cite: