CHAPTER 2 Literature Review
The intent of web usage mining is to analyze users' access patterns from the data generated while browsing the web. The output of these analyses has many practical applications, such as personalized web search, targeted marketing, adaptive web sites and sales analysis. This chapter reviews the basic concepts and approaches of web usage mining. We first introduce web usage data, then move on to preprocessing and a review of various pattern discovery approaches for web usage mining. Finally, we summarize the concepts discussed.
2.1 WEB MINING
Web pages and their hyperlinks hold abundant content information. Users access these pages, and this generates a further set of data known as web logs, which record the users' access patterns. The techniques used for mining these logs automatically discover and identify interesting information in them. The inputs for web mining therefore come from several areas, such as databases, information retrieval, machine learning and natural language processing. Web mining techniques can be broadly classified into three types (Fig. 2.1), namely 1. Web content mining, 2. Web structure mining, and 3. Web usage mining.
2.2 WEB CONTENT MINING (WCM)
Web content combines several types of data: structured, semi-structured and unstructured. This data may in turn be text, images, audio or video. The category of algorithms that uncover useful information from these data types or documents is called web content mining.
The main goals of WCM include assisting information finding (e.g., search engines) and filtering information for users based on their profiles. The database view of WCM models the information on the web and integrates it to support a large number of complex queries. Researchers have developed many intelligent tools, namely web agents, for information processing and retrieval, and data mining techniques are used to provide a higher level of abstraction over the semi-structured data on the web. Text mining [4] and multimedia data mining [5] techniques are useful for mining the content of web pages. Some of these efforts are summarized as follows.
2.2.1 Agent-Based Approach
Generally, agent-based web mining systems can be categorized as:
a. Intelligent Search Agents
b. Information Filtering/Categorization
c. Personalized Web Agents
a. Intelligent Search Agents
Various intelligent web agents have been developed that search for relevant information, using domain characteristics and user profiles to organize and interpret the discovered information. Some of these web agents are Harvest [6], FAQ-Finder [7], Information Manifold [8], OCCAM [9] and ParaSite [10].
b. Information Filtering/Categorization
These web agents use various information retrieval techniques [11] and the characteristics of open hypertext web documents to automatically retrieve and assess them [12, 13, 14, 15, 16].
c. Personalized Web Agents
Many web agents learn user interests from their web usage and discover patterns based on their preferences and interests. Examples of such personalized web agents are WebWatcher [17], PAINT [18], Syskill & Webert [19], GroupLens [20], Firefly [21], and others [22]. For instance, Syskill & Webert uses a Bayesian classifier to rate web pages of interest to a user based on the user's profile.
2.2.2 Database Approach
Database approaches organize semi-structured data into structured data. Various database query processing mechanisms and data mining techniques are then used to analyze the structured data available on the web. The database approaches are:
a. Multilevel databases
b. Web query systems
a. Multilevel databases
The main idea behind this approach is that the lowest level of the database contains semi-structured information stored in various web repositories, such as hypertext documents.
b. Web query systems
Many web-based query systems and languages use standard database query languages such as SQL, structural information about web documents, and natural language processing for the queries used in web searches [23].
2.3 WEB STRUCTURE MINING (WSM)
Web structure mining is concerned with extracting the models, patterns or structures that underlie the link structure of the web. It is used to study the hierarchical structure of hyperlinks, which may or may not carry descriptions. Such a model is useful for classifying web pages and for deriving information such as the similarity and relationships among different web sites. WSM can be used to discover authoritative sites. More significant is the structure of the web pages and the nature of the hierarchy of hyperlinks in the web site of a particular domain. Several algorithms have been proposed to model the web topology, for example HITS [24], PageRank [25], and improvements of HITS that add content information to the link structure [26] or use outlier filtering [27]. These models are mainly applied as a method to calculate the quality rank or relevance of each web page. Examples are the Clever system [26] and Google [25]. Other applications of these models include web page classification [28] and the discovery of micro-communities on the web [29].
2.4 WEB USAGE MINING (WUM)
Web usage mining concentrates on techniques that predict user behavior while the user interacts with the web. WUM aims to discover interesting, frequent user access patterns generated while surfing the web, which are recorded in web server logs, proxy server logs or user logs. WUM is about finding patterns of page views by web users, or finding how a particular web site is used. There are many applications of web usage mining, such as targeted advertising. The objective there is to find the set of customers who are most likely to respond to an advertisement. By sending advertising material only to these potential customers, significant savings in mailing costs can be achieved. Another application is in the design of web pages. By studying the seque...
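As a rough illustration of how page-view patterns might be recovered from web server logs, the following minimal sketch parses a few log entries and counts which page-to-page transitions occur most often. It is not taken from any of the surveyed systems. It assumes that entries follow the Common Log Format, that the client host can stand in for a user, and that only successfully served requests (status 200) count as page views. The sample entries and the function names `parse_line`, `page_sequences` and `frequent_transitions` are hypothetical.

```python
import re
from collections import Counter, defaultdict

# Assumed layout (Common Log Format):
# host ident authuser [timestamp] "METHOD path protocol" status bytes
LOG_PATTERN = re.compile(
    r'^(\S+) \S+ \S+ \[([^\]]+)\] "(\S+) (\S+) [^"]*" (\d{3}) \S+$'
)

# Hypothetical sample entries; a real study would read these from a log file.
SAMPLE_LOG = [
    '10.0.0.1 - - [10/Oct/2023:13:55:36 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:55:40 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.1 - - [10/Oct/2023:13:56:01 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [10/Oct/2023:13:56:10 +0000] "GET /index.html HTTP/1.1" 200 2326',
    '10.0.0.2 - - [10/Oct/2023:13:56:30 +0000] "GET /products.html HTTP/1.1" 200 5120',
    '10.0.0.3 - - [10/Oct/2023:13:56:45 +0000] "GET /missing.html HTTP/1.1" 404 310',
    '10.0.0.1 - - [10/Oct/2023:13:57:02 +0000] "GET /cart.html HTTP/1.1" 200 1840',
    '10.0.0.3 - - [10/Oct/2023:13:57:20 +0000] "GET /about.html HTTP/1.1" 200 990',
]


def parse_line(line):
    """Return (host, path, status) for a well-formed entry, else None."""
    match = LOG_PATTERN.match(line)
    if match is None:
        return None
    host, _timestamp, _method, path, status = match.groups()
    return host, path, int(status)


def page_sequences(lines):
    """Group successfully served pages by client host, in log order."""
    sequences = defaultdict(list)
    for line in lines:
        parsed = parse_line(line)
        if parsed is None:
            continue  # skip malformed entries
        host, path, status = parsed
        if status == 200:  # assumption: only status 200 counts as a page view
            sequences[host].append(path)
    return dict(sequences)


def frequent_transitions(sequences):
    """Count consecutive page pairs across all clients."""
    counts = Counter()
    for pages in sequences.values():
        counts.update(zip(pages, pages[1:]))
    return counts


if __name__ == "__main__":
    transitions = frequent_transitions(page_sequences(SAMPLE_LOG))
    for (src, dst), count in transitions.most_common():
        print(f"{src} -> {dst}: {count}")
```

Treating the host as the user is a simplification. Proxies and shared machines blur that mapping, which is one reason preprocessing steps such as user and session identification matter before patterns are mined.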