dc.contributor.advisor	Poláček Martin, Ing. :68956	cs
dc.contributor.author	Tauchman, Denis	cs
dc.contributor.referee	Halada Martin, Ing. :69482	cs
dc.date.accessioned	2025-07-14T12:43:58Z
dc.date.available	2025-07-14T12:43:58Z
dc.date.committed	9.5.2025	cs
dc.date.defense	10.6.2025	cs
dc.date.issued	2025-06-10	cs
dc.date.submitted	14.10.2024	cs
dc.description.abstract	Tato bakalářská práce se zabývá tvorbou jazykového modelu pro český jazyk, určeného k vektorizaci textu v rámci metody Retrieval-Augmented Generation (RAG). Cílem práce bylo navrhnout a natrénovat model, který umožní efektivní převod vstupních textových dotazů a dokumentů do vektorového prostoru, a tím zlepšit proces vyhledávání informací. Navržený model vychází z předtrénované architektury XLM-RoBERTa-base typu transformer, která byla dále doladěna (fine-tuned) na českých datech, včetně vlastního datasetu vytvořeného pro účely této práce. Experimentální část se zaměřuje na výběr základního modelu, úpravu hyperparametrů a přípravu trénovacích dat. Dosažené výsledky byly porovnány s alternativními přístupy běžně používanými pro podobné úlohy, přičemž navržený model dosáhl lepší přesnosti. V závěrečné části práce je model integrován do webové aplikace pro vektorizaci dokumentů v rámci techniky RAG, čímž je ověřena jeho praktická použitelnost.	cs
dc.description.abstract	This bachelor's thesis deals with the creation of a language model for Czech, designed for text vectorization within the Retrieval-Augmented Generation (RAG) method. The aim of the thesis was to design and train a model that efficiently converts input text queries and documents into a vector space, thereby improving information retrieval. The proposed model is based on the pre-trained XLM-RoBERTa-base transformer architecture, which was fine-tuned on Czech data, including a custom dataset created for this thesis. The experimental part focuses on the selection of the base model, the adjustment of hyperparameters, and the preparation of the training data. The results were compared with alternative approaches commonly used for similar tasks, and the proposed model achieved higher accuracy. In the final part of the thesis, the model is integrated into a web application for document vectorization within the RAG technique, which verifies its practical applicability.	en
dc.format	53 s	cs
dc.identifier.uri	https://dspace.tul.cz/handle/15240/177319
dc.language.iso	CS	cs
dc.subject	Velké jazykové modely	cs
dc.subject	RAG	cs
dc.subject	vektorizace textu	cs
dc.subject	RoBERTa	cs
dc.subject	transformery	cs
dc.title	Tvorba jazykového modelu pro vektorizaci textu	cs
dc.title	Development of a language model for text vectorization	en
dc.type	diplomová práce	cs
local.degree.abbreviation	Bakalářský	cs
local.identifier.author	M22000192	cs
local.identifier.stag	47840	cs
