Automatická detekce témat
Title Alternative: Automatic detection of topics
Date
2013-12-20
Publisher
Technická Univerzita v Liberci
Abstract
Finding and evaluating information on automatic document classification; becoming familiar with the Perl language and the LWP package for working with text documents; finding classifiers in the WEKA program; comparing various classification methods and methods of text parametrization.
The aim of this diploma thesis is to find a suitable procedure for sorting unlabeled text documents. This requires preparing a large amount of training data for training the classifier. The classifier's performance is then evaluated on test data; newspaper articles from the server zpravy.atlas.cz are used as the test data. The first part of the thesis covers the theory of automatic topic detection. The second part deals with finding a classifier using the WEKA program. The data are processed with the Perl programming language and the LWP package. Plain text is not suitable for further processing, so a global dictionary is created and the documents are converted into feature vectors. These vectors can be expressed in different representations, and several of them are tested in the thesis. WEKA is used for training classifiers, cluster analysis, and attribute selection, and different feature-vector representations and classification algorithms are compared in it.
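The abstract describes the preprocessing only in outline. The following Python sketch (the thesis itself used Perl with LWP, and its code is not reproduced here) shows one common way to build a global dictionary, turn documents into feature vectors, and export them in ARFF, the input format read by WEKA. The three representations (binary, term frequency, TF-IDF), the tokenization rule, and all function names are assumptions chosen for illustration, not the representations or code actually used in the thesis.

```python
import math
import re
from collections import Counter

# Assumed set of representations; the thesis abstract does not name them.
MODES = ("binary", "tf", "tfidf")


def tokenize(text):
    # Lowercase the text and split it into word tokens. In Python 3 the
    # regex class \w is Unicode-aware, so accented Czech letters stay
    # inside words. No stemming or stop-word removal is done here.
    return re.findall(r"\w+", text.lower())


def build_dictionary(docs):
    # Global dictionary: every distinct token in the whole collection,
    # mapped to a fixed column index (sorted for a stable order).
    vocab = sorted({tok for doc in docs for tok in tokenize(doc)})
    return {word: i for i, word in enumerate(vocab)}


def vectorize(docs, dictionary, mode="tf"):
    # Convert each document into a feature vector with one column per
    # dictionary word:
    #   binary - 1 if the word occurs in the document, else 0
    #   tf     - raw count of the word in the document
    #   tfidf  - count * log(N / df), where df is the number of documents
    #            containing the word (a word in every document gets 0)
    if mode not in MODES:
        raise ValueError(f"unknown representation: {mode}")
    counts = [Counter(tokenize(doc)) for doc in docs]
    df = Counter(word for c in counts for word in c)
    n_docs = len(docs)
    vectors = []
    for c in counts:
        vec = [0.0] * len(dictionary)
        for word, tf in c.items():
            j = dictionary.get(word)
            if j is None:
                continue  # word is not in the global dictionary
            if mode == "binary":
                vec[j] = 1.0
            elif mode == "tf":
                vec[j] = float(tf)
            else:
                vec[j] = tf * math.log(n_docs / df[word])
        vectors.append(vec)
    return vectors


def to_arff(vectors, labels, dictionary, relation="articles"):
    # Write the vectors as a dense ARFF file. Attribute names get a
    # hypothetical "w_" prefix so that no word clashes with the class
    # attribute; class labels must not contain commas or spaces.
    words = sorted(dictionary, key=dictionary.get)
    lines = [f"@relation {relation}", ""]
    lines += [f"@attribute 'w_{w}' numeric" for w in words]
    lines.append("@attribute class {" + ",".join(sorted(set(labels))) + "}")
    lines += ["", "@data"]
    for vec, label in zip(vectors, labels):
        lines.append(",".join(f"{x:g}" for x in vec) + "," + label)
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Toy example with made-up articles and category labels.
    docs = ["Vlada schvalila rozpocet", "Sparta porazila Slavii", "Vlada a rozpocet"]
    labels = ["domaci", "sport", "domaci"]
    dictionary = build_dictionary(docs)
    print(to_arff(vectorize(docs, dictionary, "binary"), labels, dictionary))
```

Computing document frequencies on the same set that is being vectorized keeps the sketch short; with a real train/test split, the dictionary and the df counts would normally be taken from the training data only and reused for the test articles.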
Description
Department: ITE; attachments: 1 DVD; extent: 102
Subject(s)
Perl, WEKA, automatic classification, classifier, feature vector, document sorting