- <!DOCTYPE html>
- <html>
- <head>
- <meta charset="UTF-8">
- <title>Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data Augmentation</title>
- <link rel="stylesheet" type="text/css" href="styles.css">
- <script src="jquery-3.5.js"></script>
- </head>
- <body>
- <div class="container">
- <div id="text1">Enhanced Direct Speech-to-Speech Translation Using Self-supervised Pre-training and Data
- Augmentation</div>
- <div id="intro">
- <br>
- <p>
- Sravya Popuri<sup>☆</sup>, Peng-Jen Chen<sup>☆</sup>, Changhan
- Wang, Juan Pino, Yossi Adi,
- Jiatao Gu, Wei-Ning Hsu<sup>†</sup>, Ann Lee<sup>†</sup> <br>
- <font size="-1">(☆ = Equal contribution and † = Equal supervision)</font>
- </p>
- <p>
- [<a href="https://arxiv.org/abs/2204.02967">paper</a>]
- </p>
- </div>
- </div>
- <div class="content-container">
- <p>
- We explore self-supervised pre-training with unlabeled speech data and data augmentation to improve direct
- speech-to-speech model training. We take advantage of a recently proposed speech-to-unit translation (S2UT)
- framework that encodes target speech into discrete representations, and study both speech encoder and discrete
- unit decoder pre-training as well as efficient partial finetuning methods. We conduct experiments under various
- data setups and show that self-supervised pre-training consistently improves model performance compared with
- multitask learning and is complementary to data augmentation techniques that apply ASR and MT models to create
- weakly supervised training data.
- </p>
- <ul>
- <li><a style="color:rgb(90, 4, 83)" href="#ES-EN Comparison with Baselines">Spanish To English</a></li>
- <ul>
- <li><a style="color:rgb(90, 4, 83)" href="#ES-EN Comparison with Baselines">Comparison with
- Baselines</a></li>
- <li><a style="color:rgb(90, 4, 83)" href="#ES-EN Different Data Setups">Different Data Setups</a></li>
- </ul>
- <li><a style="color:rgb(90, 4, 83)" href="#EN-ES Comparison with Baselines">English To Spanish</a></li>
- <ul>
- <li><a style="color:rgb(90, 4, 83)" href="#EN-ES Comparison with Baselines">Comparison with
- Baselines</a></li>
- <li><a style="color:rgb(90, 4, 83)" href="#EN-ES Different Data Setups">Different Data Setups</a></li>
- </ul>
- </ul>
- </div>
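The S2UT framework described above encodes target speech as a sequence of discrete units. A minimal JavaScript sketch of that idea follows, assuming (as in the original S2UT work) that units come from k-means clustering of self-supervised speech features: each frame is mapped to the index of its nearest centroid, and consecutive repeats can optionally be collapsed into a shorter target sequence. The centroids and feature values are toy numbers and the function names are hypothetical; this is an illustration, not the authors' code.

```javascript
// Hypothetical sketch: turn frame-level speech features into discrete unit IDs
// by nearest-centroid (k-means) assignment, then optionally collapse repeats.
// All numbers below are toy values, not real model features or centroids.

// Squared Euclidean distance between two equal-length vectors.
function squaredDistance(a, b) {
  let sum = 0;
  for (let i = 0; i < a.length; i++) {
    const d = a[i] - b[i];
    sum += d * d;
  }
  return sum;
}

// Assign each frame to the index of its closest centroid.
function assignUnits(frames, centroids) {
  return frames.map((frame) => {
    let best = 0;
    let bestDist = Infinity;
    centroids.forEach((c, k) => {
      const dist = squaredDistance(frame, c);
      if (dist < bestDist) {
        bestDist = dist;
        best = k;
      }
    });
    return best;
  });
}

// Remove consecutive duplicate units, e.g. [3, 3, 7, 7, 7, 3] -> [3, 7, 3].
function collapseRepeats(units) {
  return units.filter((u, i) => i === 0 || u !== units[i - 1]);
}

// Toy example: 2-D "features" and three centroids.
const centroids = [[0, 0], [5, 5], [10, 0]];
const frames = [[0.1, 0.2], [0.3, -0.1], [4.8, 5.1], [5.2, 4.9], [9.7, 0.4], [0.2, 0.1]];
const frameUnits = assignUnits(frames, centroids); // [0, 0, 1, 1, 2, 0]
const targetUnits = collapseRepeats(frameUnits); // [0, 1, 2, 0]
console.log(frameUnits, targetUnits);
```

Real systems would use far higher-dimensional features and many more clusters; the assignment rule stays the same.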
- <link rel="stylesheet" href="https://maxcdn.bootstrapcdn.com/font-awesome/4.3.0/css/font-awesome.min.css">
- <div id="ES-EN Comparison with Baselines" class="content-container">
- <script src="wavesurfer.js"></script>
- <div class="content-title">
- <font size="+5">Spanish To English</font>
- </div>
- <div class="content-subtitle">Comparison with Baselines
- </div>
- <p> We provide ground-truth source and target audio with the corresponding reference text,
- as well as audio samples from three systems: <br>
- (1) <strong>S2UT+LNA-D</strong>: the proposed direct speech-to-unit translation
- system, initialized with a wav2vec 2.0 encoder and a unit mBART decoder and finetuned using the LNA-D strategy.<br>
- (2) <strong>Supervised S2UT</strong>: a baseline direct speech-to-unit translation system trained with
- source and target text as auxiliary task targets.
- <br>
- (3) <strong>S2T+TTS</strong>: a baseline cascaded system consisting of a speech-to-text translation model, initialized
- with a wav2vec 2.0 encoder and a randomly initialized decoder, followed by a text-to-speech synthesis model. <br>
- Both (1) and (2) use an open-source HiFi-GAN vocoder to convert units to waveforms.
- </p>
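The LNA-D strategy mentioned above is a form of partial finetuning. As a hedged illustration, assuming "LNA" means updating only LayerNorm and attention parameters inside a restricted component while the other component is finetuned in full, a parameter filter could look like the sketch below. The fairseq-style parameter names, the `isTrainable` helper, and the choice to restrict only the decoder are assumptions made for illustration; this page does not spell out which parameters LNA-D updates.

```javascript
// Hypothetical sketch of an LNA-style ("LayerNorm and Attention") parameter
// filter for partial finetuning. Parameter names are illustrative only, and
// which component is restricted is controlled by the (assumed) options.

// Returns true if the named parameter should be updated during finetuning.
// opts.lnaEncoder / opts.lnaDecoder: restrict that component to LayerNorm and
// attention parameters; an unrestricted component is finetuned in full.
function isTrainable(name, opts) {
  const component = name.split(".")[0]; // "encoder" or "decoder"
  const restricted =
    (component === "encoder" && opts.lnaEncoder) ||
    (component === "decoder" && opts.lnaDecoder);
  if (!restricted) return true;
  // Keep LayerNorm parameters and any attention block (self- or cross-attention).
  return /layer_norm|attn/.test(name);
}

const params = [
  "encoder.layers.0.fc1.weight",
  "decoder.layers.0.self_attn.k_proj.weight",
  "decoder.layers.0.encoder_attn.v_proj.weight",
  "decoder.layers.0.final_layer_norm.weight",
  "decoder.layers.0.fc1.weight",
];
const opts = { lnaEncoder: false, lnaDecoder: true };
// Everything except the decoder feed-forward weight stays trainable here.
console.log(params.filter((p) => isTrainable(p, opts)));
```

Flipping the two flags gives the mirror-image variant, which is one way to compare restricting the encoder versus the decoder.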
- <table border="0" class="inlineTable">
- <tr>
- <th></th>
- <th colspan="2">Ground truth</th>
- <th colspan="3">Predictions</th>
- </tr>
- <tr>
- <th></th>
- <th>Source (Spanish)</th>
- <th>Target (English)</th>
- <th>S2UT+LNA-D</th>
- <th>Supervised S2UT</th>
- <th>S2T+TTS</th>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 1: S2UT+LNA-D performs best</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src_waveform_1"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var src_1 = WaveSurfer.create({ container: '#src_waveform_1', waveColor: 'violet', progressColor: 'purple' });
- src_1.load('./audios/es-en/set1/source/11375_cv.wav'); </script>
- </th>
- <th>
- <div id="target_waveform_1"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var tgt_1 = WaveSurfer.create({ container: '#target_waveform_1', waveColor: 'violet', progressColor: 'purple' });
- tgt_1.load('./audios/es-en/set1/target/11375_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform_1"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad_1 = WaveSurfer.create({ container: '#s2ut_waveform_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad_1.load('./audios/es-en/set1/s2ut_lnd/11375_cv.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform_1"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_mt_1 = WaveSurfer.create({ container: '#translatotron_waveform_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt_1.load('./audios/es-en/set1/s2ut_mt/11375_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform_1"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="cas_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var cas_1 = WaveSurfer.create({ container: '#s2ttts_waveform_1', waveColor: 'violet', progressColor: 'purple' });
- cas_1.load('./audios/es-en/set1/s2t_tts/11375_cv.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>autobuses adicionales normalmente proporcionados por go south coast
- van desde bristol al festival
- </td>
- <td STYLE="text-transform:lowercase"> ADDITIONAL BUSES USUALLY PROVIDED BY GO SOUTH COAST GO FROM
- BRISTOL TO THE FESTIVAL</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">ADDITIONAL BUSES USUALLY PROVIDED BY GO SOUTH COAST GO FROM BRISTOL
- TO THE FESTIVAL </td>
- <td STYLE="text-transform:lowercase">ADDITIONAL UP TO BORSES NORMALLY PROVIDED BY COAST SO CAST BANDS OF
- BRISTOL ALL FESTIVAL</td>
- <td STYLE="text-transform:lowercase">ADDITIONAL BUSES USUALLY PROVIDED BY GO SOUTH COAST GO FROM BRUCE
- TO THE FESTIVAL</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 2: S2UT+LNA-D performs best</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src_waveform_2"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var src_2 = WaveSurfer.create({ container: '#src_waveform_2', waveColor: 'violet', progressColor: 'purple' });
- src_2.load('./audios/es-en/set1/source/2692_cv.wav'); </script>
- </th>
- <th>
- <div id="target_waveform_2"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var tgt_2 = WaveSurfer.create({ container: '#target_waveform_2', waveColor: 'violet', progressColor: 'purple' });
- tgt_2.load('./audios/es-en/set1/target/2692_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform_2"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad_2 = WaveSurfer.create({ container: '#s2ut_waveform_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad_2.load('./audios/es-en/set1/s2ut_lnd/2692_cv.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform_2"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt_2.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt_2 = WaveSurfer.create({ container: '#translatotron_waveform_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt_2.load('./audios/es-en/set1/s2ut_mt/2692_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform_2"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="cas_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var cas_2 = WaveSurfer.create({ container: '#s2ttts_waveform_2', waveColor: 'violet', progressColor: 'purple' });
- cas_2.load('./audios/es-en/set1/s2t_tts/2692_cv.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>encontró un país con dos gobiernos en la capital maximiliano era el
- emperador </td>
- <td STYLE="text-transform:lowercase">HE FOUND A COUNTRY WITH TWO GOVERNMENTS IN THE CAPITAL MAXIMILIAN
- WAS THE EMPEROR </td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase"> HE FOUND A COUNTRY WITH TWO GOVERNMENTS IN THE CAPITAL MAXIMILIAN
- WAS THE EMPEROR</td>
- <td STYLE="text-transform:lowercase"> HE FOUND A COUNTRY WITH TWO GOVERNMENTS IN THE CAPITAL THE MOST
- SIMILIAN CAPITAL WAS THE EMPEROR
- </td>
- <td STYLE="text-transform:lowercase">HE FOUND A COUNTRY WITH TWO GOVERNMENTS AND THE CAPITAL MAXIMILIAN
- WAS AN EMPEROR</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 3: S2T+TTS performs best
- </th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src_waveform_3"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var src_3 = WaveSurfer.create({ container: '#src_waveform_3', waveColor: 'violet', progressColor: 'purple' });
- src_3.load('./audios/es-en/set1/source/1507_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform_3"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var tgt_3 = WaveSurfer.create({ container: '#target_waveform_3', waveColor: 'violet', progressColor: 'purple' });
- tgt_3.load('./audios/es-en/set1/target/1507_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform_3"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad_3 = WaveSurfer.create({ container: '#s2ut_waveform_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad_3.load('./audios/es-en/set1/s2ut_lnd/1507_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform_3"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt_3.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt_3 = WaveSurfer.create({ container: '#translatotron_waveform_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt_3.load('./audios/es-en/set1/s2ut_mt/1507_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform_3"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="cas_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var cas_3 = WaveSurfer.create({ container: '#s2ttts_waveform_3', waveColor: 'violet', progressColor: 'purple' });
- cas_3.load('./audios/es-en/set1/s2t_tts/1507_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td> otro aspecto más institucional es el equilibrio de fuerzas entre
- el parlamento y el consejo</td>
- <td STYLE="text-transform:lowercase"> ANOTHER MORE INSTITUTIONAL ASPECT IS THE BALANCE OF POWER BETWEEN
- PARLIAMENT AND THE COUNCIL</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">ANOTHER MORE INSTITUTIONAL ASPECT IS THE BALANCE OF FORCES BETWEEN
- PARLIAMENT AND THE COUNCIL</td>
- <td STYLE="text-transform:lowercase"> ANOTHER MORE INSTITUTIONAL ASPECT IS THE BALANCE OF FORCES BETWEEN
- PARLIAMENT AND THE COUNCIL</td>
- <td STYLE="text-transform:lowercase"> ANOTHER MORE INSTITUTIONAL ASPECT IS THE BALANCE OF POWER BETWEEN
- PARLIAMENT AND THE COUNCIL</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 4: All systems make errors</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src_waveform_4"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var src_4 = WaveSurfer.create({ container: '#src_waveform_4', waveColor: 'violet', progressColor: 'purple' });
- src_4.load('./audios/es-en/set1/source/1700_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform_4"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var tgt_4 = WaveSurfer.create({ container: '#target_waveform_4', waveColor: 'violet', progressColor: 'purple' });
- tgt_4.load('./audios/es-en/set1/target/1700_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform_4"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad_4 = WaveSurfer.create({ container: '#s2ut_waveform_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad_4.load('./audios/es-en/set1/s2ut_lnd/1700_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform_4"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt_4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt_4 = WaveSurfer.create({ container: '#translatotron_waveform_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt_4.load('./audios/es-en/set1/s2ut_mt/1700_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform_4"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="cas_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var cas_4 = WaveSurfer.create({ container: '#s2ttts_waveform_4', waveColor: 'violet', progressColor: 'purple' });
- cas_4.load('./audios/es-en/set1/s2t_tts/1700_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>además su capacidad de regeneración es muy limitada </td>
- <td STYLE="text-transform:lowercase"> MOREOVER THEIR CAPACITY FOR REGENERATION IS VERY LIMITED</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase"> MOREOVER ITS CAPACITY FOR REGENERATION IS VERY LIMITED</td>
- <td STYLE="text-transform:lowercase"> IN ADDITION HIS REGENERATION CAPACITY IS VERY LIMITED</td>
- <td STYLE="text-transform:lowercase"> MOREOVER ITS RECOVERY IS VERY LIMITED</td>
- </tr>
- </table>
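Each "ASR:" row transcribes a system's output so it can be compared with the reference translation. As a rough, hypothetical aid for reading these rows, the sketch below counts word-level edits (substitutions, insertions, deletions) between a transcript and the reference with standard Levenshtein dynamic programming. It is not necessarily the evaluation metric used in the paper; the example strings are taken from Sample 4 above.

```javascript
// Illustrative sketch: word-level Levenshtein distance between a reference
// and an ASR transcript (case-insensitive, whitespace-tokenized).
function wordEditDistance(reference, hypothesis) {
  const ref = reference.toLowerCase().split(/\s+/).filter(Boolean);
  const hyp = hypothesis.toLowerCase().split(/\s+/).filter(Boolean);
  // prev[j]: edits needed to turn the first (i - 1) reference words
  // into the first j hypothesis words.
  let prev = Array.from({ length: hyp.length + 1 }, (_, j) => j);
  for (let i = 1; i <= ref.length; i++) {
    const cur = [i];
    for (let j = 1; j <= hyp.length; j++) {
      const sub = prev[j - 1] + (ref[i - 1] === hyp[j - 1] ? 0 : 1);
      cur[j] = Math.min(sub, prev[j] + 1, cur[j - 1] + 1);
    }
    prev = cur;
  }
  return prev[hyp.length];
}

const reference = "MOREOVER THEIR CAPACITY FOR REGENERATION IS VERY LIMITED";
// S2UT+LNA-D transcript: one substitution (THEIR -> ITS).
console.log(wordEditDistance(reference, "MOREOVER ITS CAPACITY FOR REGENERATION IS VERY LIMITED")); // 1
// S2T+TTS transcript: two substitutions and two deletions.
console.log(wordEditDistance(reference, "MOREOVER ITS RECOVERY IS VERY LIMITED")); // 4
```

A raw edit count ignores meaning-preserving paraphrases (for example "BALANCE OF FORCES" versus "BALANCE OF POWER" in Sample 3), so it is only a coarse reading aid.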
- </div>
- <div id="ES-EN Different Data Setups" class="content-container">
- <script src="wavesurfer.js"></script>
- <div class="content-title">
- <font size="+5">Spanish To English</font>
- </div>
- <div class="content-subtitle">Different Data Setups
- </div>
- <p> We provide ground-truth source and target audio with the corresponding reference text,
- as well as audio samples from three systems. All three models are initialized with a wav2vec 2.0 encoder and a
- unit mBART decoder and finetuned using the LNA-D strategy, but each is finetuned on different data: <br>
- (1) <strong>S2UT_Base</strong>: finetuned on the combination of the CoVoST2, Europarl-ST and mTEDx datasets.
- <br>
- (2) <strong>S2UT_LR</strong>: finetuned in a low-resource setup with 50 hours of data sampled from the
- combination of the CoVoST2, Europarl-ST and mTEDx datasets.
- <br>
- (3) <strong>S2UT_Aug</strong>: finetuned on the combination of the CoVoST2, Europarl-ST and mTEDx datasets
- plus the ASR data. <br>
- All models use an open-source HiFi-GAN vocoder to convert units to waveforms.
- </p>
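This page does not say how the 50-hour subset for S2UT_LR was drawn. One plausible procedure, shown here only as a sketch, is to shuffle the pooled utterances with a seeded random number generator and keep clips while they fit within the duration budget. The utterance records, the `sampleByDuration` helper, and the mulberry32 generator are illustrative choices, not details taken from the paper.

```javascript
// Hypothetical sketch: sample a duration-capped subset (e.g. 50 hours) from a
// pooled corpus. The actual S2UT_LR sampling procedure is not given on this page.

// mulberry32: a small deterministic PRNG, so the same seed gives the same subset.
function mulberry32(seed) {
  let a = seed >>> 0;
  return function () {
    a = (a + 0x6d2b79f5) >>> 0;
    let t = a;
    t = Math.imul(t ^ (t >>> 15), t | 1);
    t ^= t + Math.imul(t ^ (t >>> 7), t | 61);
    return ((t ^ (t >>> 14)) >>> 0) / 4294967296; // float in [0, 1)
  };
}

// Fisher-Yates shuffle on a copy, leaving the input array untouched.
function shuffled(items, rand) {
  const out = items.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(rand() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// Keep shuffled utterances while the running total stays within budgetSec;
// clips that would overflow the budget are skipped.
function sampleByDuration(utterances, budgetSec, seed) {
  const picked = [];
  let totalSec = 0;
  for (const utt of shuffled(utterances, mulberry32(seed))) {
    if (totalSec + utt.durationSec > budgetSec) continue;
    picked.push(utt);
    totalSec += utt.durationSec;
  }
  return { picked, totalSec };
}

// Toy example with a 10-second budget; a 50-hour setup would use 50 * 3600.
const corpus = [
  { id: "utt_a", durationSec: 4.2 },
  { id: "utt_b", durationSec: 6.5 },
  { id: "utt_c", durationSec: 3.1 },
  { id: "utt_d", durationSec: 2.4 },
];
const subset = sampleByDuration(corpus, 10, 7);
console.log(subset.picked.map((u) => u.id), subset.totalSec);
```

Fixing the seed makes the low-resource subset reproducible across runs, which matters when comparing models finetuned on it.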
- <table border="0" class="inlineTable">
- <tr>
- <th></th>
- <th colspan="2">Ground truth</th>
- <th colspan="3">Predictions</th>
- </tr>
- <tr>
- <th></th>
- <th>Source (Spanish)</th>
- <th>Target (English)</th>
- <th>S2UT_Base</th>
- <th>S2UT_LR</th>
- <th>S2UT_Aug</th>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 1: All systems do well</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src1_waveform1_1"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src1_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src1_1 = WaveSurfer.create({ container: '#src1_waveform1_1', waveColor: 'violet', progressColor: 'purple' });
- src1_1.load('./audios/es-en/set2/source/9756_cv.wav'); </script>
- </th>
- <th>
- <div id="target_waveform1_1"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt1_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt1_1 = WaveSurfer.create({ container: '#target_waveform1_1', waveColor: 'violet', progressColor: 'purple' });
- tgt1_1.load('./audios/es-en/set2/target/9756_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform1_1"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad1_1.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad1_1 = WaveSurfer.create({ container: '#s2ut_waveform1_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad1_1.load('./audios/es-en/set2/s2ut_lnd/9756_cv.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform1_1"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_lr50_1 = WaveSurfer.create({ container: '#translatotron_waveform1_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_1.load('./audios/es-en/set2/s2ut_lr50/9756_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform1_1"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr1 = WaveSurfer.create({ container: '#s2ttts_waveform1_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr1.load('./audios/es-en/set2/s2ut_lnd_w_asr/9756_cv.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>cada uno de ellos es un derecho exclusivo sujeto a ciertas
- limitaciones y excepciones
- </td>
- <td STYLE="text-transform:lowercase"> EACH ONE OF THEM IS AN EXCLUSIVE RIGHT SUBJECT TO CERTAIN
- LIMITATIONS AND EXCEPTIONS</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">EACH OF THEM IS AN EXCLUSIVE RIGHT SUBJECT TO CERTAIN LIMITATIONS
- AND EXCEPTIONS</td>
- <td STYLE="text-transform:lowercase">EACH ONE OF THEM IS AN EXCLUSIVE RIGHT SUBJECT TO CERTAIN
- LIMITATIONS AND EXCEPTIONS</td>
- <td STYLE="text-transform:lowercase">EACH OF THEM IS AN EXCLUSIVE RIGHT SUBJECT TO CERTAIN LIMITATIONS
- AND EXCEPTIONS</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 2: S2UT_LR performs best</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src1_waveform1_2"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src1_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src1_2 = WaveSurfer.create({ container: '#src1_waveform1_2', waveColor: 'violet', progressColor: 'purple' });
- src1_2.load('./audios/es-en/set2/source/12478_cv.flac'); </script>
- </th>
- <th>
- <div id="target_waveform1_2"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt1_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt1_2 = WaveSurfer.create({ container: '#target_waveform1_2', waveColor: 'violet', progressColor: 'purple' });
- tgt1_2.load('./audios/es-en/set2/target/12478_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform1_2"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad1_2.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad1_2 = WaveSurfer.create({ container: '#s2ut_waveform1_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad1_2.load('./audios/es-en/set2/s2ut_lnd/12478_cv.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform1_2"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_2.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_2 = WaveSurfer.create({ container: '#translatotron_waveform1_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_2.load('./audios/es-en/set2/s2ut_lr50/12478_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform1_2"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr2 = WaveSurfer.create({ container: '#s2ttts_waveform1_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr2.load('./audios/es-en/set2/s2ut_lnd_w_asr/12478_cv.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>esta experiencia representa un paso trascendental en la historia
- espacial del país </td>
- <td STYLE="text-transform:lowercase">THIS EXPERIENCE REPRESENTS A TRANSCENDENTAL STEP IN THE SPATIAL
- HISTORY OF THE COUNTRY</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">THIS EXPERIENCE REPRESENTS A TRANSCENDENT STEP IN THE SPACE HISTORY
- OF THE COUNTRY</td>
- <td STYLE="text-transform:lowercase">THIS EXPERIENCE REPRESENTS A TRANSCENDENTAL STEP IN THE SPATIAL
- HISTORY OF THE COUNTRY</td>
- <td STYLE="text-transform:lowercase">THIS EXPERIENCE REPRESENTS A MOVEMENT STEP IN THE SPACE HISTORY OF
- THE COUNTRY</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 3: S2UT_Aug performs best</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src1_waveform1_3"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src1_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src1_3 = WaveSurfer.create({ container: '#src1_waveform1_3', waveColor: 'violet', progressColor: 'purple' });
- src1_3.load('./audios/es-en/set2/source/4109_cv.flac'); </script>
- </th>
- <th>
- <div id="target_waveform1_3"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt1_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt1_3 = WaveSurfer.create({ container: '#target_waveform1_3', waveColor: 'violet', progressColor: 'purple' });
- tgt1_3.load('./audios/es-en/set2/target/4109_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform1_3"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad1_3.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad1_3 = WaveSurfer.create({ container: '#s2ut_waveform1_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad1_3.load('./audios/es-en/set2/s2ut_lnd/4109_cv.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform1_3"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_3.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_3 = WaveSurfer.create({ container: '#translatotron_waveform1_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_3.load('./audios/es-en/set2/s2ut_lr50/4109_cv.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform1_3"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr3 = WaveSurfer.create({ container: '#s2ttts_waveform1_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr3.load('./audios/es-en/set2/s2ut_lnd_w_asr/4109_cv.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td> desde la perspectiva del balance físico químico y biológico está
- en una posición clave</td>
- <td STYLE="text-transform:lowercase"> THE PERSPECTIVE OF PHYSICAL CHEMICAL AND BIOLOGICAL BALANCE IT IS
- IN A KEY POSITION</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">FROM A PHYSICAL CHEMICAL AND BIOLOGICAL BALANCE HE IS IN A KEY
- POSITION</td>
- <td STYLE="text-transform:lowercase">FROM A PHYSICAL PERSPECTIVE OF PHYSICAL CHEMICAL AND BIOLOGICAL
- POSITION</td>
- <td STYLE="text-transform:lowercase">FROM THE PERSPECTIVE OF PHYSICAL CHEMICAL AND BIOLOGICAL BALANCE IT
- IS IN A KEY POSITION</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 4: S2UT_Aug performs best</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src1_waveform1_4"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src1_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src1_4 = WaveSurfer.create({ container: '#src1_waveform1_4', waveColor: 'violet', progressColor: 'purple' });
- src1_4.load('./audios/es-en/set2/source/289_epst.flac'); </script>
- </th>
- <th>
- <div id="target_waveform1_4"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt1_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt1_4 = WaveSurfer.create({ container: '#target_waveform1_4', waveColor: 'violet', progressColor: 'purple' });
- tgt1_4.load('./audios/es-en/set2/target/289_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform1_4"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad1_4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad1_4 = WaveSurfer.create({ container: '#s2ut_waveform1_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad1_4.load('./audios/es-en/set2/s2ut_lnd/289_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform1_4"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_4 = WaveSurfer.create({ container: '#translatotron_waveform1_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_4.load('./audios/es-en/set2/s2ut_lr50/289_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform1_4"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr4 = WaveSurfer.create({ container: '#s2ttts_waveform1_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr4.load('./audios/es-en/set2/s2ut_lnd_w_asr/289_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>desde un punto de vista presupuestario no parece adecuada la
- propuesta de financiación procedente de la comisión de
- desarrollo ya que este dinero no existe al</td>
- <td STYLE="text-transform:lowercase"> IN ANY CASE GIVEN THAT THE FINANCING OF THIS NEW COOPERATION
- INSTRUMENT MUST BE COMPATIBLE WITH THE
- TWO
- THOUSAND SEVEN
- TWENTY THIRTEEN FINANCIAL FRAMEWORK IT IS WORTH</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td STYLE="text-transform:lowercase">IN ANY CASE GIVEN THAT THE FUNDING OF THIS NEW CORPORATION
- INSTRUMENT MUST BE COMPATIBLE WITH THE
- TWO THOUSAND SEVEN
- TWENTY THIRTEEN FINANCIAL FRAMEWORK IT IS IMPORTANT</td>
- <td STYLE="text-transform:lowercase">IN ANY CASE SINCE THE FINANCING OF THIS NEW INSTRUMENT OF
- CORPORATION MUST COMPATIBLE WITH THE
- FINANCIAL FRAMEWORK
- FOR TWENTY THIRTEEN</td>
- <td STYLE="text-transform:lowercase">IN ANY CASE GIVEN THAT THE FINANCING OF THIS NEW CORPORATION
- INSTRUMENT MUST BE COMPATIBLE WITH THE
- TWO THOUSAND
- SEVEN TWENTY THIRTEEN FINANCIAL FRAMEWORK</td>
- </tr>
- </table>
- </div>
- <div id="EN-ES Comparison with Baselines" class="content-container">
- <script src="wavesurfer.js"></script>
- <div class="content-title">
- <font size="+5">English to Spanish</font>
- </div>
- <div class="content-subtitle">Comparison with Baselines
- </div>
- <p> We provide ground-truth source and target audio with the corresponding reference text,
- as well as audio samples from three systems: <br>
- (1) <strong>S2UT+LNA-D</strong>: the proposed direct speech-to-unit translation
- system, initialized with a wav2vec 2.0 encoder and a unit mBART decoder and finetuned using the LNA-D strategy.<br>
- (2) <strong>Supervised S2UT</strong>: a baseline direct speech-to-unit translation system trained with
- source and target text as auxiliary task targets.
- <br>
- (3) <strong>S2T+TTS</strong>: a baseline cascaded system consisting of a speech-to-text translation model, initialized
- with a wav2vec 2.0 encoder and a randomly initialized decoder, followed by a text-to-speech synthesis model. <br>
- Both (1) and (2) use an open-source HiFi-GAN vocoder to convert units to waveforms.
- </p>
- <table border="0" class="inlineTable">
- <tr>
- <th></th>
- <th colspan="2">Ground truth</th>
- <th colspan="3">Predictions</th>
- </tr>
- <tr>
- <th></th>
- <th>Source (English)</th>
- <th>Target (Spanish)</th>
- <th>S2UT+LNA-D</th>
- <th>Supervised S2UT</th>
- <th>S2T+TTS</th>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 1: S2UT+LNA-D performs the best.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src2_waveform2_1"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src2_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src2_1 = WaveSurfer.create({ container: '#src2_waveform2_1', waveColor: 'violet', progressColor: 'purple' });
- src2_1.load('./audios/en-es/set1/source/1149_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform2_1"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt2_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt2_1 = WaveSurfer.create({ container: '#target_waveform2_1', waveColor: 'violet', progressColor: 'purple' });
- tgt2_1.load('./audios/en-es/set1/target/1149_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform2_1"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad2_1.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad2_1 = WaveSurfer.create({ container: '#s2ut_waveform2_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad2_1.load('./audios/en-es/set1/s2ut_lnd/1149_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform2_1"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt2_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_mt2_1 = WaveSurfer.create({ container: '#translatotron_waveform2_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt2_1.load('./audios/en-es/set1/s2ut_mt/1149_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform2_1"></div>
- <button id="written_cas2caded_header" class="play-button-demo btn btn-primary"
- onclick="cas2_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var cas2_1 = WaveSurfer.create({ container: '#s2ttts_waveform2_1', waveColor: 'violet', progressColor: 'purple' });
- cas2_1.load('./audios/en-es/set1/s2t_tts/1149_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>this should also be an important part of our approach to the twenty
- twelve budget</td>
- <td>esto también debería ser una parte importante de nuestro enfoque
- del
- presupuesto dos mil doce
- </td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>esto también debería ser una parte importante de nuestro enfoque al
- presupuesto dos mil doce </td>
- <td>también debería ser una parte importante de nuestro enfoque al
- presupuesto dos mil doce</td>
- <td>esto también debería ser una parte importante de nuestro enfoque al
- presupuesto de dos mildos mil
- dos mil doce</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 2: S2UT+LNA-D performs the best.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src2_waveform2_4"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src2_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src2_4 = WaveSurfer.create({ container: '#src2_waveform2_4', waveColor: 'violet', progressColor: 'purple' });
- src2_4.load('./audios/en-es/set1/source/890_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform2_4"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt2_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt2_4 = WaveSurfer.create({ container: '#target_waveform2_4', waveColor: 'violet', progressColor: 'purple' });
- tgt2_4.load('./audios/en-es/set1/target/890_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform2_4"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad2_4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad2_4 = WaveSurfer.create({ container: '#s2ut_waveform2_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad2_4.load('./audios/en-es/set1/s2ut_lnd/890_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform2_4"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt2_4.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt2_4 = WaveSurfer.create({ container: '#translatotron_waveform2_4', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt2_4.load('./audios/en-es/set1/s2ut_mt/890_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform2_4"></div>
- <button id="written_cas2caded_header" class="play-button-demo btn btn-primary"
- onclick="cas2_4.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var cas2_4 = WaveSurfer.create({ container: '#s2ttts_waveform2_4', waveColor: 'violet', progressColor: 'purple' });
- cas2_4.load('./audios/en-es/set1/s2t_tts/890_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>information encourages citizens interest in public matters and
- their participation</td>
- <td>la información fomenta el interés de los ciudadanos por los
- asuntos
- públicos y su participación</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>la información fomenta el interés de los ciudadanos en asuntos
- públicos y su participación</td>
- <td>la información y el interés de los ciudadanos alientan los
- intereses de las cuestiones públicas y su
- participación
- </td>
- <td>la información alienta el interés de los ciudadanos en asuntos
- públicos y en su participación</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 3: S2UT+LNA-D performs the best.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src2_waveform2_2"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src2_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src2_2 = WaveSurfer.create({ container: '#src2_waveform2_2', waveColor: 'violet', progressColor: 'purple' });
- src2_2.load('./audios/en-es/set1/source/476_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform2_2"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt2_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt2_2 = WaveSurfer.create({ container: '#target_waveform2_2', waveColor: 'violet', progressColor: 'purple' });
- tgt2_2.load('./audios/en-es/set1/target/476_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform2_2"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad2_2.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad2_2 = WaveSurfer.create({ container: '#s2ut_waveform2_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad2_2.load('./audios/en-es/set1/s2ut_lnd/476_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform2_2"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt2_2.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt2_2 = WaveSurfer.create({ container: '#translatotron_waveform2_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt2_2.load('./audios/en-es/set1/s2ut_mt/476_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform2_2"></div>
- <button id="written_cas2caded_header" class="play-button-demo btn btn-primary"
- onclick="cas2_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var cas2_2 = WaveSurfer.create({ container: '#s2ttts_waveform2_2', waveColor: 'violet', progressColor: 'purple' });
- cas2_2.load('./audios/en-es/set1/s2t_tts/476_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>his family who are my constituents are convinced of his innocence
- </td>
- <td>su familia que son mis electores está convencida de su inocencia
- </td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td></td>
- <td></td>
- <td>su familia que son mis electores está convencida de su inocencia
- </td>
- <td>su familia que son mí circunscripciones están convencidas de estos
- inocentes</td>
- <td>su familia que son mis electores están convencidos de su inocencia
- </td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 4: All systems make errors.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src2_waveform2_3"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src2_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src2_3 = WaveSurfer.create({ container: '#src2_waveform2_3', waveColor: 'violet', progressColor: 'purple' });
- src2_3.load('./audios/en-es/set1/source/651_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform2_3"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt2_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt2_3 = WaveSurfer.create({ container: '#target_waveform2_3', waveColor: 'violet', progressColor: 'purple' });
- tgt2_3.load('./audios/en-es/set1/target/651_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform2_3"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad2_3.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad2_3 = WaveSurfer.create({ container: '#s2ut_waveform2_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad2_3.load('./audios/en-es/set1/s2ut_lnd/651_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform2_3"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_mt2_3.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_mt2_3 = WaveSurfer.create({ container: '#translatotron_waveform2_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_mt2_3.load('./audios/en-es/set1/s2ut_mt/651_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform2_3"></div>
- <button id="written_cas2caded_header" class="play-button-demo btn btn-primary"
- onclick="cas2_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var cas2_3 = WaveSurfer.create({ container: '#s2ttts_waveform2_3', waveColor: 'violet', progressColor: 'purple' });
- cas2_3.load('./audios/en-es/set1/s2t_tts/651_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>of the directive on all taxes including social security
- contributions the automatic exchange of information and improved
- cooperation between the member states in matters of taxation</td>
- <td> de la directiva a todos los impuestos incluidas las
- contribuciones a
- la seguridad social el intercambio automático de
- información y la mejora de la cooperación fiscal entre los estados miembros</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>de la directiva a todos los impuestos incluidas las contribuciones
- a la seguridad social el
- intercambio automático
- de información y la mejor cooperación entre los estados miembros en las cuestiones de impuestos</td>
- <td>de la directiva a todos los impuestos impluyendo las contribuciones
- de seguridad social el
- intercambio automático de
- la información y mejorar la cooperación entre los estados miembros y las cuestiones de impuestos
- </td>
- <td>de la directiva para todos los impuestos incluidos las
- contribuciones de seguridad social el
- intercambio automático
- de información y la mejor cooperación entre los estados miembros en la cuestión de la fiscalidad
- </td>
- </tr>
- </table>
- </div>
- <div id="EN-ES Different Data Setups" class="content-container">
- <script src="wavesurfer.js"></script>
- <div class="content-title">
- <font size="+5">English to Spanish</font>
- </div>
- <div class="content-subtitle">Different Data Setups
- </div>
- <p> We provide ground-truth source and target audio with the corresponding reference text,
- as well as audio samples from three systems. All three models are initialized with a wav2vec 2.0 encoder
- and a unit mBART decoder and finetuned using the LNA-D strategy, but each uses different data for finetuning: <br>
- (1) <strong>S2UT_Base</strong>: finetuned on the combination of the Europarl-ST and MuST-C datasets.
- <br>
- (2) <strong>S2UT_LR</strong>: finetuned in a low-resource setup on 50 hours of data sampled from the combination
- of the Europarl-ST and MuST-C datasets.
- <br>
- (3) <strong>S2UT_Aug</strong>: finetuned on the combination of the Europarl-ST and MuST-C datasets plus the ASR
- data. <br>
- All models use an open-source HiFi-GAN vocoder to convert units to waveforms.
- </p>
- <table border="0" class="inlineTable">
- <tr>
- <th></th>
- <th colspan="2">Ground truth</th>
- <th colspan="3">Predictions</th>
- </tr>
- <tr>
- <th></th>
- <th>Source (English)</th>
- <th>Target (Spanish)</th>
- <th>S2UT_Base</th>
- <th>S2UT_LR</th>
- <th>S2UT_Aug</th>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 1: All systems do well.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src3_waveform3_1"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src3_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src3_1 = WaveSurfer.create({ container: '#src3_waveform3_1', waveColor: 'violet', progressColor: 'purple' });
- src3_1.load('./audios/en-es/set2/source/37_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform3_1"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt3_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt3_1 = WaveSurfer.create({ container: '#target_waveform3_1', waveColor: 'violet', progressColor: 'purple' });
- tgt3_1.load('./audios/en-es/set2/target/37_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform3_1"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad3_1.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad3_1 = WaveSurfer.create({ container: '#s2ut_waveform3_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad3_1.load('./audios/en-es/set2/s2ut_lnd/37_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform3_1"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_2_1.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_lr50_2_1 = WaveSurfer.create({ container: '#translatotron_waveform3_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_2_1.load('./audios/en-es/set2/s2ut_lr50/37_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform3_1"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr3_1.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr3_1 = WaveSurfer.create({ container: '#s2ttts_waveform3_1', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr3_1.load('./audios/en-es/set2/s2ut_lnd_w_asr/37_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>we want to see energy poverty as a part of this debate</td>
- <td>queremos ver la pobreza energética como parte de este debate
- </td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>queremos ver la pobreza energética como parte de este deate</td>
- <td>queremos ver la pobreza energética como parte de este date</td>
- <td>queremos ver la pobreza energética como parte de este deate</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 2: S2UT_LR makes errors, while S2UT_Base and S2UT_Aug get
- it right.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src3_waveform3_2"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src3_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src3_2 = WaveSurfer.create({ container: '#src3_waveform3_2', waveColor: 'violet', progressColor: 'purple' });
- src3_2.load('./audios/en-es/set2/source/923_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform3_2"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt3_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt3_2 = WaveSurfer.create({ container: '#target_waveform3_2', waveColor: 'violet', progressColor: 'purple' });
- tgt3_2.load('./audios/en-es/set2/target/923_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform3_2"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad3_2.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad3_2 = WaveSurfer.create({ container: '#s2ut_waveform3_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad3_2.load('./audios/en-es/set2/s2ut_lnd/923_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform3_2"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_2_2.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_2_2 = WaveSurfer.create({ container: '#translatotron_waveform3_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_2_2.load('./audios/en-es/set2/s2ut_lr50/923_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform3_2"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr3_2.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr3_2 = WaveSurfer.create({ container: '#s2ttts_waveform3_2', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr3_2.load('./audios/en-es/set2/s2ut_lnd_w_asr/923_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>in my view one of the most important elements is the follow up of
- legislative initiative requests from parliament</td>
- <td>en mi opinión uno de los elementos más importantes es el
- seguimiento de
- las solicitudes de iniciativa legislativa del
- parlamento</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>n mi opinión uno de los elementos más importantes es el
- seguimiento de las peticiones de la
- iniciativa legislativa
- por parte del pagamento</td>
- <td>en mi opinión uno de los elementos más importantes es el
- seguimiento de las emiendas de iniciativas
- legislativas de
- ley</td>
- <td>en mi opinión uno de los elementos más importantes es el
- seguimiento de las solicitudes de
- iniciativa legislativa
- del pagamento</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 3: S2UT_Aug performs the best.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src3_waveform3_3"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src3_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src3_3 = WaveSurfer.create({ container: '#src3_waveform3_3', waveColor: 'violet', progressColor: 'purple' });
- src3_3.load('./audios/en-es/set2/source/970_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform3_3"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt3_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt3_3 = WaveSurfer.create({ container: '#target_waveform3_3', waveColor: 'violet', progressColor: 'purple' });
- tgt3_3.load('./audios/en-es/set2/target/970_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform3_3"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad3_3.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad3_3 = WaveSurfer.create({ container: '#s2ut_waveform3_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad3_3.load('./audios/en-es/set2/s2ut_lnd/970_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform3_3"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_2_3.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_2_3 = WaveSurfer.create({ container: '#translatotron_waveform3_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_2_3.load('./audios/en-es/set2/s2ut_lr50/970_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform3_3"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr3_3.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr3_3 = WaveSurfer.create({ container: '#s2ttts_waveform3_3', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr3_3.load('./audios/en-es/set2/s2ut_lnd_w_asr/970_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>we must find an open and constructive procedure on the next
- financial framework</td>
- <td> debemos encontrar un procedimiento abierto y constructivo en el
- próximo marco financiero</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>debemos encontrar un procedimiento abierto y constructivo sobre el
- próximo marco financiero</td>
- <td>debemos encontrar un procedimiento abierto y constructivo en el
- sistema financiero financiero
- financiero financiero
- </td>
- <td>debemos encontrar un procedimiento abierto y constructivo en el
- próximo marco financiero</td>
- </tr>
- <tr>
- <th colspan="6" style="text-align:left">Sample 4: All systems make errors.</th>
- </tr>
- <tr>
- <th></th>
- <th>
- <div id="src3_waveform3_5"></div>
- <button id="written_source__header" class="play-button-demo btn btn-primary"
- onclick="src3_5.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var src3_5 = WaveSurfer.create({ container: '#src3_waveform3_5', waveColor: 'violet', progressColor: 'purple' });
- src3_5.load('./audios/en-es/set2/source/651_epst.wav'); </script>
- </th>
- <th>
- <div id="target_waveform3_5"></div>
- <button id="written_target__header" class="play-button-demo btn btn-primary"
- onclick="tgt3_5.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var tgt3_5 = WaveSurfer.create({ container: '#target_waveform3_5', waveColor: 'violet', progressColor: 'purple' });
- tgt3_5.load('./audios/en-es/set2/target/651_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ut_waveform3_5"></div>
- <button id="written_s2ut__header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lnad3_5.playPause()"> <i class="fa fa-play"></i> Play / <i
- class="fa fa-pause"></i>
- Pause </button>
- <script> var s2ut_lnad3_5 = WaveSurfer.create({ container: '#s2ut_waveform3_5', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lnad3_5.load('./audios/en-es/set2/s2ut_lnd/651_epst.wav'); </script>
- </th>
- <th>
- <div id="translatotron_waveform3_5"></div>
- <button id="written_translatotron_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_lr50_2_5.playPause()">
- <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i> Pause
- </button>
- <script> var s2ut_lr50_2_5 = WaveSurfer.create({ container: '#translatotron_waveform3_5', waveColor: 'violet', progressColor: 'purple' });
- s2ut_lr50_2_5.load('./audios/en-es/set2/s2ut_lr50/651_epst.wav'); </script>
- </th>
- <th>
- <div id="s2ttts_waveform3_5"></div>
- <button id="written_cascaded_header" class="play-button-demo btn btn-primary"
- onclick="s2ut_asr3_5.playPause()"> <i class="fa fa-play"></i> Play / <i class="fa fa-pause"></i>
- Pause
- </button>
- <script> var s2ut_asr3_5 = WaveSurfer.create({ container: '#s2ttts_waveform3_5', waveColor: 'violet', progressColor: 'purple' });
- s2ut_asr3_5.load('./audios/en-es/set2/s2ut_lnd_w_asr/651_epst.wav'); </script>
- </th>
- </tr>
- <tr>
- <th> Reference: </th>
- <td>of the directive on all taxes including social security
- contributions the automatic exchange of information and improved
- cooperation between the member states in matters of taxation</td>
- <td>de la directiva a todos los impuestos incluidas las contribuciones
- a la
- seguridad social el intercambio automático de
- información y la mejora de la cooperación fiscal entre los estados miembros</td>
- </tr>
- <tr>
- <th> ASR: </th>
- <td> </td>
- <td> </td>
- <td>de la directiva a todos los impuestos incluidas las contribuciones
- a la seguridad social el
- intercambio automático
- de información y la mejor cooperación entre los estados miembros en las cuestiones de impuestos</td>
- <td>la directiva sobre el impuesto de todos los contribuyentes
- inpluyendo las contribuciones sociales la
- introducción
- automática y mejorada de los estados miembros y mejorar la cooperación entre los estados miembros
- </td>
- <td>de la directiva a todos los impuestos incluidas las contribuciones
- a la seguridad social el
- intercambio automático
- de información y la mejor cooperación entre los estados miembros en materia de impuestos</td>
- </tr>
- </table>
- </div>
- <div class="content-container">
- Template based on <a style="color:rgb(22, 38, 67)" href="https://speechbot.github.io/"> Textless NLP</a> and <a
- style="color:rgb(22, 38, 67)" href="https://daps.cs.princeton.edu/projects/HiFi-GAN/index.php"> HiFi-GAN</a>
- pages.
- </div>
- </body>
- </html>