Sunday, November 14, 2010

Reading Notes- Week 11, Nov 22, 2010

Web Search Engines: Part 1 and 2
The links for Part 1 and Part 2 about search engines led me to the homepage for Computer, the flagship publication of the IEEE Computer Society. I searched for both articles within the journal and found them, but couldn't access them. All I could see were the abstracts. So I went to the ULS website and found them by looking through the electronic journals. Just thought I'd throw that out there in case anyone else was having trouble accessing the articles from the links provided.
I thought these articles provided a good overview of how search engines and web crawlers work. I had never heard of the "politeness delay" Hawking mentions in Part 1, but it makes sense because I guess sending too many requests to the same web server at once would put too much stress on it. Also, I liked the term "politeness delay." Considering how many steps go into crawling web pages, I am amazed at how fast you get search results back. It's hard to believe all that is going on in a fraction of a second. I am just so impressed by how quickly and efficiently search engines work. With all the junk out there on the web, I might expect to get a lot more irrelevant hits, but clearly a lot of thought has been put into designing these search engines, and they generally do a good job.
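To make the politeness delay a little more concrete for myself, here is a minimal sketch in Python of how a crawler might space out its requests to any single host. This is my own illustration, not code from the Hawking articles: the class name PolitenessScheduler, the two-second delay, and the example URLs are all assumptions.

```python
import time
from urllib.parse import urlparse


class PolitenessScheduler:
    """Hypothetical helper: tracks when each host may be contacted again."""

    def __init__(self, delay_seconds=2.0, clock=time.monotonic):
        self.delay_seconds = delay_seconds  # assumed delay, not from the article
        self.clock = clock                  # injectable so it can be tested
        self.next_allowed = {}              # host -> earliest time of next request

    def wait_time(self, url):
        """Return how many seconds to wait before fetching this URL."""
        host = urlparse(url).netloc
        now = self.clock()
        return max(0.0, self.next_allowed.get(host, now) - now)

    def record_fetch(self, url):
        """Note that a request was just sent to this URL's host."""
        host = urlparse(url).netloc
        self.next_allowed[host] = self.clock() + self.delay_seconds


if __name__ == "__main__":
    scheduler = PolitenessScheduler(delay_seconds=2.0)
    scheduler.record_fetch("http://example.com/page1")
    # Same host: the crawler should wait roughly two seconds.
    print(scheduler.wait_time("http://example.com/page2"))
    # Different host: no wait is needed.
    print(scheduler.wait_time("http://example.org/"))
```

The delay is tracked per host, so a crawler can stay busy fetching from other sites while it waits its turn on a site it has just visited.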

Current developments and future trends for the OAI protocol for metadata harvesting
The Open Archives Initiative sounds really interesting. It seems to allow various groups to collect their own metadata and then share it through service providers. But toward the end of the article the authors described how, even though everyone is using Dublin Core, there are still differences in how data are being entered. Will our field ever be able to reach a standard for interoperability? Or are there just too many archives and too many libraries out there with too many diverse and unique collections to make this possible? Maybe that's not even the problem. Is communication between different institutions the issue?
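For anyone curious what harvesting looks like in practice, here is a rough sketch in Python of how a service provider might build an OAI-PMH request for Dublin Core records. The verb ListRecords, the metadataPrefix value oai_dc, and the resumptionToken argument come from the protocol itself; the repository address and the function name build_list_records_url are made up for illustration.

```python
from urllib.parse import urlencode


def build_list_records_url(base_url, set_spec=None, resumption_token=None):
    """Build a ListRecords request URL for an OAI-PMH repository.

    base_url is a hypothetical repository endpoint supplied by the caller.
    """
    if resumption_token is not None:
        # When continuing a partial list, the token is sent on its own,
        # without repeating the other arguments.
        params = {"verb": "ListRecords", "resumptionToken": resumption_token}
    else:
        # oai_dc asks for unqualified Dublin Core metadata.
        params = {"verb": "ListRecords", "metadataPrefix": "oai_dc"}
        if set_spec is not None:
            params["set"] = set_spec
    return base_url + "?" + urlencode(params)


if __name__ == "__main__":
    # Hypothetical endpoint, used only to show the shape of the request.
    print(build_list_records_url("http://archive.example.org/oai"))
```

The request itself is the easy part. The interoperability trouble the authors describe shows up in the records that come back, since each repository fills in the Dublin Core fields in its own way.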

The Deep Web: Surfacing Hidden Value
I am constantly amazed by how large the internet is. It is so gigantic that I can't even imagine its true size. According to this article, when people use search engines, they are only searching 0.03 percent of the internet. That's crazy! How many pages are out there that you might want to see but never will? The article states: "Traditional search engines can not 'see' or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden." It was interesting to read how the web has evolved and how search engines have evolved with it. Does anyone have any predictions for the future of internet search engines? Will they ever penetrate the deep web?

4 comments:

  1. Regarding the Hawking articles, I think that certain innovations such as the "politeness delay" could yield even more tremendous improvements in web crawling. Currently, search engines are capable of performing operations of which so many people are unaware, and I think that if more people knew about these capabilities, they would appreciate them more. One can only wonder how the increasing amount of web information could affect web crawling techniques in the future.

  2. Does anyone have any predictions for the future of internet search engines?

    My assumption is that web search engines will keep evolving and trying to satisfy users' growing needs for more complicated searches until other powerful means of searching become available. Search engines with low efficiency will eventually disappear, while the highly effective search engines will remain. It looks like Google will continue to evolve even more, and I suspect it will move into education and research while keeping the ad business as a supplement.

  3. Is it possible for us as future librarians to "reach" the ever expanding layers of information?

  4. You make a good point about the Deep Web, Christy. It is almost depressing how much information is available to us that we may never find, simply because the web is so vast that our search engines will not uncover it for us.
