Automatická sumarizace textových dokumentů

Rott, Michal


Title Alternative: Automatic summarization of text documents

Files

mgr_23169.pdf(774.43 KB)

opo_23169.pdf(27.31 KB)

ved_23169.pdf(27.31 KB)

obh_23169.pdf(27.31 KB)

Date

2012-01-01

Authors

Rott, Michal

Publisher

Technická Univerzita v Liberci

Abstract

Today's world is flooded with information, and this thesis aims to make working with it easier by producing summaries automatically. As part of the research, methods that create extracts from long articles were studied, mostly from English-language literature. The survey covered the heuristic and statistical summarization methods used in the early days of text digitization as well as modern methods that analyze texts more deeply. The main focus was on Luhn's summarizer and on latent semantic analysis. Both methods were also implemented in C# on the Mono platform. The second part of the thesis addresses the evaluation of the implemented summarization methods. Techniques for measuring and assessing automatically generated summaries were studied from the literature and from scientific articles. The evaluation itself was carried out with the ROUGE program, which is also used for this purpose at the Text Analysis Conference. Several experiments with different summarization settings were performed, and freely available summarizers were evaluated as well.

Description

department: ITE; attachments: 1 DVD; extent: 52

Subject(s)

summarization, summary, Luhn summarizer, latent semantic analysis, evaluation, ROUGE

Item identifier

https://dspace.tul.cz/handle/15240/12037

Collections

Faculty of Mechatronics, Informatics and Interdisciplinary Studies
Rector's Award for an Outstanding Master's Thesis
