
aligner readme added with examples (#168)

* aligner readme added with examples

* Update docs/m4t/unity2_aligner_README.md

* Update docs/m4t/unity2_aligner_README.md
Ilia Kulikov 1 year ago
parent
commit
df2816adf3
3 files changed, with 76 additions and 0 deletions
  1. BIN
      docs/m4t/en_alignment.png
  2. BIN
      docs/m4t/ru_alignment.png
  3. 76 0
      docs/m4t/unity2_aligner_README.md

BIN
docs/m4t/en_alignment.png


BIN
docs/m4t/ru_alignment.png


+ 76 - 0
docs/m4t/unity2_aligner_README.md

@@ -0,0 +1,76 @@
+# UnitY2 forced alignment extractor
+
+Please refer to Section 3.3.2 of the paper for more details about the aligner design and training.
+
+We provide a lightweight wrapper to extract alignments between a given text and an acoustic unit sequence. The unit extractor is also available from the wrapper itself.
+
+## Alignment extractor codebase
+
+The entire codebase is located in `/src/seamless_communication/models/aligner`. It is built using the fairseq2 library. This time we release a multilingual checkpoint (38 languages, following the SeamlessM4Tv2 target languages) to load the alignment toolkit. This checkpoint corresponds to the `nar_t2u_aligner` asset card.
+
+## Usage examples
+
+For large-scale alignment extraction, offline unit extraction is preferred. Refer to `/src/seamless_communication/cli/m4t/audio_to_units` for more details on offline unit extraction.
+
+**Alignment extractor initialization:**
+
+```python
+from seamless_communication.models.aligner.alignment_extractor import AlignmentExtractor
+from fairseq2.typing import Device
+import torch
+
+extractor = AlignmentExtractor(
+    aligner_model_name_or_card="nar_t2u_aligner",
+    unit_extractor_model_name_or_card="xlsr2_1b_v2",
+    unit_extractor_output_layer=35,
+    unit_extractor_kmeans_model_uri="https://dl.fbaipublicfiles.com/seamlessM4T/models/unit_extraction/kmeans_10k.npy",
+)
+```
+* the large unit extractor checkpoint will be downloaded, which takes time
+
+* by default the CPU device is used, but fp16 (`dtype=torch.float16`) and CUDA (`device=Device("cuda")`) are supported; see the class constructor for details
+
+
+
+
+**Extracting alignment**
+
+Ru audio example:
+
+* audio link: `https://models.silero.ai/denoise_models/sample0.wav` (thanks to the Silero team for the public audio samples)
+
+* ru_transcription: `первое что меня поразило это необыкновенно яркий солнечный свет похожий на электросварку`
+
+```python
+alignment_durations, _, tokenized_text_tokens = extractor.extract_alignment("sample0.wav", ru_transcription, plot=True, add_trailing_silence=True)
+```
+* the audio will be resampled to 16 kHz for unit extraction
+
+* `alignment_durations` contains the number of units (20 ms frames) aligned to each token in `tokenized_text_tokens`.
+
+* `add_trailing_silence` appends an extra silence token to the end of the given text sequence. This is useful when the text itself has no terminal punctuation.
+
+Ru alignment plot:
+![Ru alignment pic](ru_alignment.png)
+
+En audio example: 
+
+* audio link: `https://dl.fbaipublicfiles.com/seamlessM4T/LJ037-0171_sr16k.wav`
+
+* en_transcription: `the examination and testimony of the experts enabled the commission to conclude that five shots may have been fired.`
+
+```python
+alignment_durations, _, tokenized_text_tokens = extractor.extract_alignment("LJ037-0171_sr16k.wav", en_transcription, plot=True, add_trailing_silence=False)
+```
+* here we set `add_trailing_silence` to False since terminal punctuation exists, but True will also work
+
+En alignment plot:
+![En alignment pic](en_alignment.png)
+
+## Integration test
+
+If you encounter issues with the produced alignments, please run the integration test for the alignment extraction toolkit to make sure that your environment works correctly.
+
+Run from the repo root:
+
+`pytest -vv tests/integration/models/test_unity2_aligner.py`
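
Note on using the output: the README above states that `alignment_durations` holds the number of 20 ms units aligned to each token in `tokenized_text_tokens`. A minimal sketch of turning those counts into per-token time spans could look like the following. It is not part of the commit. The helper name `durations_to_spans_ms` and the sample values are hypothetical, and it assumes the durations are available as a flat list of integers (if the extractor returns a tensor, it would need to be converted to a list first).

```python
from typing import List, Tuple

FRAME_MS = 20  # one unit covers 20 ms, as stated in the README


def durations_to_spans_ms(
    durations: List[int], tokens: List[str], frame_ms: int = FRAME_MS
) -> List[Tuple[str, int, int]]:
    """Hypothetical helper: map per-token unit counts to (token, start_ms, end_ms)."""
    if len(durations) != len(tokens):
        raise ValueError("expected one duration per token")
    spans = []
    start_frame = 0
    for token, n_frames in zip(tokens, durations):
        # Each token occupies the next n_frames units after the previous token.
        end_frame = start_frame + n_frames
        spans.append((token, start_frame * frame_ms, end_frame * frame_ms))
        start_frame = end_frame
    return spans


# Illustrative values only, not real extractor output.
example_spans = durations_to_spans_ms([3, 5, 2], ["a", "b", "c"])
for token, start_ms, end_ms in example_spans:
    print(f"{token}: {start_ms} ms to {end_ms} ms")
```

Keeping the times as integer milliseconds avoids floating-point rounding when accumulating frame offsets; divide by 1000 at the point of use if seconds are needed.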