Influence of ratio of auxiliary pages on the pre-processing phase of web usage mining

Munk, Michal

dc.contributor.author	Munk, Michal
dc.contributor.author	Benko, Ľubomír
dc.contributor.author	Gangur, Mikuláš
dc.contributor.author	Turčáni, Milan
dc.contributor.other	Ekonomická fakulta	cs
dc.date.accessioned	2015-09-02
dc.date.available	2015-09-02
dc.date.defense	2015-09-04
dc.date.issued	2015-09-04
dc.description.abstract	Data mining is one of the important tools of Business Intelligence and a means of increasing a company's competitiveness. Web usage mining applies data mining to web server log files and analyzes users' behavior on a web site. The first step of the web usage mining process is pre-processing the data obtained from a web log file. Data pre-processing is an important part of web usage mining: the discovery of behavior patterns of web visitors depends on the quality of the pre-processing phase, so it is important to understand the methods used. This paper summarizes the pre-processing phases, especially session identification. Two algorithms are introduced for data cleaning and session identification using the Reference Length method. The main aim of this paper is to compare calculations of the cutoff time and their influence on the discovered useful, trivial and inexplicable rules. The cutoff time is an important part of session identification using the Reference Length method. The influence of the ratio of auxiliary pages on the calculation was compared for a ratio based on a sitemap and a ratio based on subjective estimation. Statistical methods were used to determine the difference between these two approaches. The portion of rules found was examined in terms of quantity and quality. The ratio of auxiliary pages affects only the quantity of extracted rules in the files with path completion. It has no impact on the portion of extracted useful rules; on the other hand, an inappropriate estimate of the ratio of auxiliary pages may increase the number of trivial and inexplicable rules.	en
dc.format	text
dc.format.extent	144-159 s.	cs
dc.identifier.doi	10.15240/tul/001/2015-3-013
dc.identifier.eissn	2336-5604
dc.identifier.issn	1212-3609
dc.identifier.uri	https://dspace.tul.cz/handle/15240/13249
dc.language.iso	en
dc.publisher	Technická Univerzita v Liberci	cs
dc.publisher	Technical university of Liberec, Czech Republic	en
dc.publisher.abbreviation	TUL
dc.relation.isbasedon	Abraham, A. Natural computation for business intelligence from Web usage mining. In: Proceedings of Seventh International Symposium on Symbolic and Numeric Algorithms for Scientific Computing. 2005, pp. 3-10. DOI: 10.1109/SYNASC.2005.59.
dc.relation.isbasedon	Agrawal, R., Imieliński, T., Swami, A. Mining Association Rules Between Sets Of Items In Large Databases. In: SIGMOD '93 Proceedings of the 1993 ACM SIGMOD international conference on Management of data. New York: ACM, 1993, pp. 207-216. ISBN 0-89791-592-5. DOI: 10.1145/170036.170072
dc.relation.isbasedon	Agrawal, R., Srikant, R. Fast Algorithms for Mining Association Rules in Large Databases. In: Proceedings of the 20th International Conference on Very Large Data Bases. San Francisco, CA: Morgan Kaufmann Publishers Inc., 1994. pp. 487-499.
dc.relation.isbasedon	Arora, D., Neville, S.W., Li, K.F. Mining WiFi Data for Business Intelligence. In: 8th International Conference on P2P, Parallel, Grid, Cloud and Internet Computing (3PGCIC). IEEE, 2013. pp. 394-398. DOI: 10.1109/3PGCIC.2013.67.
dc.relation.isbasedon	Aye, T. Web log cleaning for mining of web usage patterns. In: Computer Research and Development (ICCRD). Vol. 2. IEEE, 2011. pp. 490-494. ISBN 978-1-61284-839-6. DOI: 10.1109/ICCRD.2011.5764181.
dc.relation.isbasedon	Berry, M., Linoff, G. Data mining techniques for marketing, sales, and customer relationship management. 2nd ed. Indianapolis: Wiley, 2004. 672 p. ISBN 978-0-471-47064-9.
dc.relation.isbasedon	Cooley, R., Mobasher, B., Srivastava, J. Data Preparation for Mining World Wide Web Browsing Patterns. Knowledge and Information Systems. 1999, Vol. 1, Iss. 1, pp. 5-32. ISSN 0219-1377. DOI: 10.1007/BF03325089.
dc.relation.isbasedon	Electronic statistics textbook. Tulsa, OK: Statsoft, 2010.
dc.relation.isbasedon	Frawley, W., Piatetsky-Shapiro, G., Matheus, C. Knowledge Discovery in Databases: An Overview. AI Magazine. 1992, Vol. 13, Iss. 3, pp. 213-228. ISSN 0738-4602. DOI: 10.1609/aimag.v13i3.1011.
dc.relation.isbasedon	Han, J., Lakshmanan, L., Pei, J. Scalable Frequent-pattern Mining Methods: An Overview. In: Tutorial Notes of the Seventh ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. New York: ACM, 2001. pp. 5.1-5.61. DOI: 10.1145/502786.502792.
dc.relation.ispartof	Ekonomie a Management	cs
dc.relation.ispartof	Economics and Management	en
dc.relation.isrefereed	true
dc.rights	CC BY-NC
dc.subject	health care system	en
dc.subject	in-hospital care	en
dc.subject	day surgery	en
dc.subject	Analytical Hierarchy Process	en
dc.subject	functionality of day surgery	en
dc.subject.classification	C88
dc.subject.classification	C69
dc.subject.classification	M15
dc.subject.classification	O33
dc.subject.classification	D89
dc.title	Influence of ratio of auxiliary pages on the pre-processing phase of web usage mining	en
dc.type	Article	en
local.access	open
local.citation.epage	159
local.citation.spage	144
local.faculty	Faculty of Economics
local.fulltext	yes
local.relation.abbreviation	E&M	en
local.relation.abbreviation	E+M	cs
local.relation.issue	3
local.relation.volume	18

Files

Original bundle

Name: 10.15240tul0012015-3-013.pdf
Size: 1.4 MB
Format: Adobe Portable Document Format

Name: licence.txt
Size: 1.74 KB
Format: Item-specific license agreed upon to submission

Collections

Issue 3