Designing New Crawling and Indexing Techniques for Web Search Engines

Pages: 156
ISBN: 9783639204001
Category: New arrivals

Description:

This thesis addresses two problems in Web search engines: how a crawler with limited computing resources can effectively crawl the dynamically changing Web and acquire the most up-to-date Web documents, and how a search engine can provide information-object-oriented indexing methods that let users retrieve desired information with high accuracy and efficiency. To address the first problem, we design a set of sampling policies with various downloading granularities for the sampling method, taking into account the link structure, the directory structure, and content-based features, including a clustering technique. We further extend the clustering-based sampling approach by testing more dynamic features and strategically selecting samples from each cluster. For the second problem, we propose building indexes on the extracted metadata of various information objects instead of on whole documents. We set up a digital library named ArchSeer for the domain of archeology. ArchSeer...

Similar books

Audit Sampling: An Introduction to Statistical Sampling in Auditing, 5th Edition
Author: Dan M. Guy, D. R. Carmichael, O. Ray Whittington
Year: 2002
Digital Alias-free Signal Processing
Author: Ivars Bilinskis
Year: 2007
Monte-Carlo methods
Author: Malvin H. Kalos
Year: 1986
Introduction to derivative-free optimization
Author: Andrew R. Conn
Year: 2009
Introduction to derivative-free optimization
Author: Conn A. R., Scheinberg Katya, Vicente Luis N.
Year: 2009