Softwarová podpora přepisu přednášek z videozáznamů

Rameš, Jan

Title Alternative: Software support for transcription of lectures from video recordings

Files

mgr_18318.pdf (816.96 KB)

Date

2011

Authors

Rameš, Jan

Publisher

Technická Univerzita v Liberci

Abstract

This thesis deals with automatic transcription of spontaneous speech, primarily for transcribing lectures. It covers how such transcripts can be corrected, why the dictionaries (language models) need to be adapted, and how the task differs from dictation systems. It also demonstrates an approach to handling the transcribed text with web technologies, with emphasis on letting several people collaborate on correcting a transcript. The resulting application has a client-server design: the client side uses HTML and JavaScript together with a Flash-based player to provide a complete user interface for correcting and checking transcripts and for displaying the resulting subtitles inside the video for easy playback. The thesis also outlines how indexing of the transcripts can be used to find relevant content in recorded lectures.

The first part discusses the speech recognition techniques used in the transcription software and in the other tools employed. The second part describes the specific methods, libraries and frameworks used to build the application, and closes with a survey of similar technologies that also address transcription of spontaneous speech. The following chapters first outline the approach to the solution and then describe the methods in detail, with emphasis on their use on the web. That part ends with methods for evaluating transcription accuracy, using several measures chosen to reflect the effort required to correct the transcribed text. The final part of the thesis evaluates the results using the methods described in the preceding part. It also outlines a number of areas for further work, stressing above all the need to adapt the dictionaries (language models) to the subject field of each lecture.
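
The abstract refers to measures of the effort needed to correct a transcript but does not name them. As a minimal sketch, assuming word error rate (WER) is one such measure (the record does not confirm which metrics the thesis uses), the word-level edit distance between a reference transcript and a recognizer output can be computed as follows. The function name `word_error_rate` and the sample sentences are hypothetical.

```python
def word_error_rate(reference: str, hypothesis: str) -> float:
    """Word-level Levenshtein distance divided by the reference length.

    Assumption: whitespace tokenization with no case or punctuation
    normalization; a real evaluation would normally normalize both texts.
    """
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference must contain at least one word")

    # prev[j] holds the edit distance between the reference words
    # processed so far (one fewer than in curr) and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        curr = [i] + [0] * len(hyp)
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (ref_word != hyp_word)
            deletion = prev[j] + 1       # reference word missing from hypothesis
            insertion = curr[j - 1] + 1  # extra word in hypothesis
            curr[j] = min(substitution, deletion, insertion)
        prev = curr
    return prev[-1] / len(ref)


if __name__ == "__main__":
    # Hypothetical example: one substituted word out of four.
    print(word_error_rate("the lecture starts now", "the lecture start now"))
```

Each substitution, deletion or insertion corresponds roughly to one manual correction, which is why a measure of this kind is often read as a proxy for editing effort.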

Description

department: ITE; attachments: 1 CD; extent: 56 pp. (86,035 characters)

Subject(s)

hidden Markov models, transcription of spontaneous speech, web technologies

Item identifier

https://dspace.tul.cz/handle/15240/5905

Collections

Faculty of Mechatronics, Informatics and Interdisciplinary Studies
Dean's Award for an Outstanding Diploma Thesis
