Web Mining Essay

Lavalee Singh 1, Arun Singh 2
1 M.Tech (C.S.) Student, IIMT Engineering College, Meerut (U.P.), India [email protected]
2 Associate Professor, IIMT Engineering College, Meerut (U.P.), India

The World Wide Web contains a large amount of information. Anyone can store information on the web and retrieve it, but finding the relevant piece of information is difficult. Extracting important information from the web is called web mining. Web mining technologies are well suited to web information extraction and information retrieval. Web mining applies data mining techniques to large amounts of web data in order to improve web services. We give a brief description of web mining and its three categories: web content mining, web structure mining and web usage mining. This paper also reports on applications of web data mining.

Keywords: Web Mining, Information Extraction, Information Retrieval, Web content mining, Web structure mining, Web usage mining and Web crawling

The World Wide Web is a popular and interactive medium for disseminating information. With the explosive growth of information sources available on the web, it has become increasingly necessary for users to rely on automated tools to find, extract, filter, and evaluate the desired information and resources. The web provides a vast source of information of almost all types, ranging from DNA databases to resumes to lists of popular multiplexes. Because the web holds so much data, finding the content of interest is not an easy task. Web mining is one technique for addressing this problem. It is not the only one; others include machine learning and natural language processing. Information retrieval is the automatic retrieval of all relevant documents while retrieving as few non-relevant documents as possible. Information extraction aims to extract relevant facts from documents, while information retrieval aims to select relevant documents [1].
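The retrieval goal described above (return all relevant documents and as few non-relevant ones as possible) is commonly measured with recall and precision. The following is a minimal Python sketch of those two measures for a single query. The function name `precision_recall` and the document ids are hypothetical and chosen only for illustration; they do not come from the paper.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one query.

    retrieved: iterable of document ids returned by the system
    relevant:  iterable of document ids judged relevant for the query
    """
    retrieved = set(retrieved)
    relevant = set(relevant)
    hits = retrieved & relevant  # relevant documents that were actually returned

    # Precision: share of returned documents that are relevant.
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    # Recall: share of relevant documents that were returned.
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: four documents returned, three judged relevant.
retrieved_docs = ["d1", "d2", "d3", "d4"]
relevant_docs = ["d1", "d3", "d5"]
precision, recall = precision_recall(retrieved_docs, relevant_docs)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

In this made-up case, two of the four returned documents are relevant (precision 0.50) and two of the three relevant documents were found (recall about 0.67). Retrieving "all relevant documents" corresponds to high recall, and retrieving "as few of the non-relevant as possible" corresponds to high precision.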

Figure (1)
As shown in Figure (1), Yahoo, Google and MSN are search engines used to extract information from the web. The extracted information may be relevant, but it may also include less relevant and sometimes irrelevant information.

Web mining is the...
