dc.contributor	Rott Michal, Ing. : 63020
dc.contributor.advisor	Paleček Karel, Ing. Ph.D. : 61120
dc.contributor.author	Krechler, Tomáš
dc.date.accessioned	2021-03-02T04:20:14Z
dc.date.available	2021-03-02T04:20:14Z
dc.date.committed	2020-05-18
dc.date.defense	2021-02-02
dc.date.submitted	2019-10-09
dc.date.updated	2021-02-02
dc.degree.level	Bc.
dc.description.abstract	Cílem této práce je příprava pro analýzu politického smýšlení obyvatelstva, rozdílnost mezi komentáři různých webů a snaha o zjištění funkčnosti rekurentních neuronových sítí pro český jazyk. Zaměřil jsem se na webové portály novinky.cz a idnes.cz, data byla sbírána po dobu pěti let. K řešení zvoleného problému jsem využil právě rekurentních neuronových sítí s Long short-term memory buňkami. Navrhl jsem tři různé modely. První po natrénování na textu generuje texty po znacích. Druhý použije word2vec slovník slov a jejich příslušných číselných vektorů ke klasifikaci sentimentu komentářů a poslední za použití stejných slovníků generuje texty po slovech. K naprogramování modelů jsem použil jazyk Python a nástroj JupyterLab. Obě metody generování vytvářely text, vypadající jako čeština, ovšem občas postrádající smysl. Jelikož roli hraje náhoda, lépe vygenerované komentáře by mohly být na první pohled zaměnitelné s člověkem psaným textem. Klasifikace sentimentu dosáhla pro web iDnes 61 % přesnosti a pro web Novinky 71 %. Tyto dva modely se při klasifikaci shodovaly v 80 % případů. Provedený výzkum naznačil, s relativně nízkou hladinou významnosti, trend v podobných náladách diskutujících na obou periodikách. Při porovnání podobnosti slov jsou vidět patrné rozdíly v použití slova. Na každém webu je použito v různých větách s rozdílnými citovými zabarveními. Hlavním zjištěním této práce je, že rekurentní neuronové sítě lze dobře použít i pro český jazyk. Vyšších přesností klasifikace a menší chybovosti a smysluplnosti generování by se dalo dosáhnout především delším či paralelním trénováním na více zařízeních. Na stejném principu lze analyzovat i další periodika a utvořit si tak ucelený přehled o politické náladě ve společnosti.	cs
dc.description.abstract	The main goal of this bachelor thesis is to lay the groundwork for analyzing the political mindset of the population, to examine the differences between comments on different websites, and to determine how well recurrent neural networks work for the Czech language. I focused on the web portals novinky.cz and idnes.cz; the data were collected over five years. To solve the problem, I used recurrent neural networks with long short-term memory (LSTM) cells. I designed three different models. The first, once trained on text, generates text character by character. The second uses a word2vec dictionary of words and their corresponding numeric vectors to classify the sentiment of comments. The last uses the same dictionaries to generate text word by word. I programmed the models in Python using JupyterLab. Both text generation models produced text that looked like Czech but sometimes lacked meaning. Since chance plays a role, the better generated comments could at first glance be mistaken for human-written text. Sentiment classification reached 61% accuracy for the iDnes website and 71% for Novinky. The two models agreed on the classification of the same comments in 80% of cases. The research indicated, with a relatively low level of significance, a trend of similar moods among commenters on both periodicals. Comparing word similarities reveals clear differences in how a given word is used: on each site it appears in different sentences with different emotional coloring. The main finding of this thesis is that recurrent neural networks can also be used well for the Czech language. Higher classification accuracy and more meaningful generated text with fewer errors could be achieved mainly by longer training or by parallel training on multiple devices. The same approach can be applied to other periodicals to build a comprehensive overview of the political mood in society.	en
dc.description.mark
dc.format	53 p. (70,000 characters)
dc.format.extent	5 files with the .ipynb extension for use in JupyterLab
dc.identifier.signature	V 202102575
dc.identifier.uri	https://dspace.tul.cz/handle/15240/159857
dc.language.iso	cs
dc.relation.isbasedon	[1] Goodfellow, I., Bengio, Y., Courville, A. Deep learning. MIT Press, 2016. [2] Bishop, C. Pattern Recognition and Machine Learning. 2006. ISBN 13: 978-038731073. [3] Karpathy, A., Johnson, J., Li, F. Convolutional neural networks for visual recognition. Available online: http://cs231n.stanford.edu/
dc.rights	Vysokoškolská závěrečná práce je autorské dílo chráněné dle zákona č. 121/2000 Sb., autorský zákon, ve znění pozdějších předpisů. Je možné pořizovat z něj na své náklady a pro svoji osobní potřebu výpisy, opisy a rozmnoženiny. Jeho využití musí být v souladu s autorským zákonem https://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf a citační etikou https://knihovna.tul.cz/document/26	cs
dc.rights	A university thesis is a work protected by the Copyright Act. Extracts, copies and transcripts of the thesis are allowed for personal use only and at one's own expense. Use of the thesis must comply with the Copyright Act https://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf and with citation ethics https://knihovna.tul.cz/document/26	en
dc.rights.uri	https://knihovna.tul.cz/document/26
dc.rights.uri	https://www.mkcr.cz/assets/autorske-pravo/01-3982006.pdf
dc.subject	rekurentní neuronové sítě	cs
dc.subject	Long short-term memory	cs
dc.subject	Adam	cs
dc.subject	Python	cs
dc.subject	JupyterLab	cs
dc.subject	klasifikace sentimentu	cs
dc.subject	generování textu	cs
dc.subject	po slovech	cs
dc.subject	po znacích	cs
dc.subject	word2vec	cs
dc.subject	periodika	cs
dc.subject	recurrent neural network	en
dc.subject	Long short-term memory	en
dc.subject	Adam	en
dc.subject	Python	en
dc.subject	JupyterLab	en
dc.subject	sentiment classification	en
dc.subject	text generating	en
dc.subject	word level	en
dc.subject	char level	en
dc.subject	word2vec	en
dc.subject	journal	en
dc.title	Analýza textů online periodik pomocí metod strojového učení	cs
dc.title	Analysis of online news text and user comments using machine learning	en
dc.type	bakalářská práce	cs
local.degree.abbreviation	Bakalářský
local.degree.discipline	IT
local.degree.programme	Informační technologie
local.degree.programmeabbreviation	B2646
local.department.abbreviation	ITE
local.faculty	Fakulta mechatroniky, informatiky a mezioborových studií	cs
local.faculty.abbreviation	FM
local.identifier.author	M16000039
local.identifier.stag	40096
local.identifier.verbis
local.identifier.verbis	kpw06676457
local.note.administrators	automat
local.note.secrecy	Povoleno ZverejnitPraci Povoleno ZverejnitPosudky
local.poradovecislo	2575
