
Updating audio_to_units README to reflect links to XLSR paper.

Kaushik Ram Sadagopan 2 years ago
parent
commit
89c993cca0

+ 2 - 2
scripts/m4t/audio_to_units/README.md

@@ -3,8 +3,8 @@
 Raw audio needs to be converted to units to train UnitY models and vocoders. Units act as supervision for UnitY models, and are the input to the vocoders which synthesize speech from these units.
 
 The unit extraction pipeline comprises the following steps:
-- Compute features from layer 35 (determined empirically) of the pretrained XLSR v2 model, which is a wav2vec2 model at the core.
-- Assign features for each timestep to a collection of precomputed K-Means centroids to produce a sequence of units.
+- Compute features from layer 35 (determined empirically) of the pretrained XLSR v2 model ([paper](https://arxiv.org/abs/2111.09296)), which is a wav2vec2 model at the core.
+- Assign features for each timestep to a collection of precomputed K-Means centroids to produce a sequence of units similar to extracting Hubert units as described in this [paper](https://arxiv.org/pdf/2107.05604.pdf).
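
The second step above (assigning each timestep's features to its nearest precomputed K-Means centroid) can be sketched roughly as follows. This is a minimal illustration and not the repository's implementation: the function name `assign_units`, the toy 2-dimensional features, and the three toy centroids are all assumptions made for the example. In the real pipeline the features would come from layer 35 of XLSR v2 and the centroids from a trained K-Means model.

```python
import numpy as np


def assign_units(features: np.ndarray, centroids: np.ndarray) -> np.ndarray:
    """Map each timestep's feature vector to the index of its nearest centroid.

    features:  (T, D) array, one feature vector per timestep (hypothetical input).
    centroids: (K, D) array of precomputed K-Means centroids (hypothetical input).
    Returns a (T,) integer array of unit ids in [0, K).
    """
    # Squared Euclidean distance, expanded as ||x||^2 - 2 x.c + ||c||^2.
    feat_sq = (features ** 2).sum(axis=1, keepdims=True)  # shape (T, 1)
    cent_sq = (centroids ** 2).sum(axis=1)                # shape (K,)
    dists = feat_sq - 2.0 * features @ centroids.T + cent_sq  # shape (T, K)
    return dists.argmin(axis=1)


# Toy stand-ins for real model features and trained centroids.
centroids = np.array([[0.0, 0.0], [10.0, 10.0], [-5.0, 5.0]])
features = np.array([[0.1, -0.2], [9.5, 10.3], [-4.0, 6.0], [0.4, 0.3]])

units = assign_units(features, centroids)
print(units.tolist())
```

Each entry of `units` is the id of the closest centroid for that timestep, so a sequence of `T` feature vectors becomes a sequence of `T` discrete units.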
 
 
 
 
 ## Quick start: