WEB MINING: AN INTRODUCTORY APPROACH
Lavalee Singh1 Arun Singh2
1 M.Tech (C.S.) Student, IIMT Engineering College, Meerut (U.P.), India
2Associate Professor IIMT Engineering College Meerut (U.P.) India
The World Wide Web contains a large amount of information. Anyone can store information on the web and retrieve it, but finding the relevant piece of information is difficult. Extracting important information from the web is called web mining. Web mining technologies are well suited to web information extraction and information retrieval. Web mining is a mining technology that applies data mining techniques to large amounts of web data in order to improve web services. This paper gives a brief description of web mining and its three categories: web content mining, web structure mining and web usage mining. It also reports applications of web data mining.
Keywords: Web Mining, Information Extraction, Information Retrieval, Web Content Mining, Web Structure Mining, Web Usage Mining, Web Crawling
The World Wide Web is a popular and interactive medium for disseminating information. With the explosive growth of information sources on the web, users increasingly need automated tools to find, extract, filter and evaluate the information and resources they want. The web is a vast source of information of almost every type, ranging from DNA databases to resumes to lists of popular multiplexes. Because it holds so much data, finding the content of interest is not an easy task. Web mining is one technique for addressing this problem. It is not the only one; others include machine learning and natural language processing. Information retrieval is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible. Information extraction aims to extract relevant facts from documents, while information retrieval aims to select relevant documents.
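The difference between the two tasks can be illustrated with a minimal Python sketch. It is not part of the original paper: the toy corpus, the document ids and the function names `retrieve` and `extract_emails` are hypothetical, and keyword matching and a regular expression stand in for real retrieval and extraction methods.

```python
import re

# Hypothetical toy corpus; the documents and ids are made up for illustration.
DOCS = {
    "d1": "Web mining applies data mining techniques to web data. Contact: alice@example.org",
    "d2": "A list of popular multiplexes.",
    "d3": "Web usage mining studies server logs. Contact: bob@example.org",
}

def retrieve(query, docs):
    """Information retrieval: select the documents that contain every query term."""
    terms = query.lower().split()
    selected = []
    for doc_id, text in docs.items():
        words = set(re.findall(r"[a-z]+", text.lower()))
        if all(term in words for term in terms):
            selected.append(doc_id)
    return sorted(selected)

# Simplified email pattern, good enough for this toy corpus only.
EMAIL = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")

def extract_emails(doc_ids, docs):
    """Information extraction: pull one kind of fact (email addresses) out of documents."""
    return {doc_id: EMAIL.findall(docs[doc_id]) for doc_id in doc_ids}

hits = retrieve("web mining", DOCS)   # retrieval returns whole documents
facts = extract_emails(hits, DOCS)    # extraction returns facts found inside them
print(hits)
print(facts)
```

Retrieval returns document identifiers, whereas extraction returns structured facts taken from inside those documents; in practice the two are often chained in this order.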
As shown in Figure 1, Yahoo, Google and MSN are search engines used to retrieve information from the web. The retrieved information may be relevant, but it can also include less relevant and sometimes irrelevant results.
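The mix of relevant and irrelevant results described above is commonly quantified with precision and recall, two standard information retrieval measures that the paper itself does not name. The sketch below is illustrative only: the function name `precision_recall` and the page ids are hypothetical, and the relevance judgments are assumed to be known in advance.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    precision = fraction of retrieved pages that are relevant
    recall    = fraction of relevant pages that were retrieved
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

# Hypothetical result list from a search engine and assumed relevance judgments.
retrieved = ["p1", "p2", "p3", "p4"]
relevant = ["p1", "p3", "p5"]

p, r = precision_recall(retrieved, relevant)
print(f"precision={p:.2f} recall={r:.2f}")
```

Here two of the four returned pages are relevant, and two of the three relevant pages were found. A result list padded with irrelevant pages lowers precision, while relevant pages that the engine misses lower recall.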
2.0 WEB MINING
Web mining is the...