Web Search Engines: Part 1 and 2
The links for Part 1 and Part 2 about search engines led me to the homepage for Computer, the flagship publication of the IEEE Computer Society. I searched within the journal and found both articles, but I couldn't access them. All I could see were the abstracts. So I went to the ULS website and found them by looking through the electronic journals. Just thought I'd throw that out there in case anyone else was having trouble accessing the articles from the links provided.
I thought these articles provided a good overview of how search engines and web crawlers work. I had never heard of the "politeness delay" Hawking mentions in Part 1, but it makes sense: if a crawler sent request after request to the same web server without pausing, it could overload that server. Also, I liked the term "politeness delay." Considering how many steps go into crawling and indexing web pages, I am amazed at how fast you get search results back. It's hard to believe a query can be answered in a fraction of a second. With all the junk out there on the web, I might expect to get a lot more irrelevant hits, but clearly a lot of thought has been put into designing these search engines, and they generally do a good job.
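To make the politeness delay a little more concrete for myself, here is a minimal sketch in Python. It is not taken from Hawking's article: the class name, the two-second delay, and the example URLs are all my own assumptions, and a real crawler would be far more involved. The idea is just to remember when each host was last contacted and to work out how long to wait before contacting it again.

```python
from urllib.parse import urlparse


class PolitenessTracker:
    """Hypothetical helper: remembers when each host was last contacted."""

    def __init__(self, delay_seconds=2.0):
        # Minimum gap between two requests to the same host (assumed value).
        self.delay_seconds = delay_seconds
        self.last_request = {}  # host -> time of the most recent request

    def wait_time(self, url, now):
        """Return how many seconds to wait before fetching url at time now."""
        host = urlparse(url).netloc
        last = self.last_request.get(host)
        if last is None:
            return 0.0  # never contacted this host, no need to wait
        return max(0.0, self.delay_seconds - (now - last))

    def record(self, url, now):
        """Note that a request to url's host was made at time now."""
        self.last_request[urlparse(url).netloc] = now


tracker = PolitenessTracker(delay_seconds=2.0)
tracker.record("http://example.com/page1", now=0.0)
# Same host half a second later: the crawler should still wait.
same_host_wait = tracker.wait_time("http://example.com/page2", now=0.5)
# A different host is not affected by the delay.
other_host_wait = tracker.wait_time("http://example.org/", now=0.5)
```

In an actual crawler, the `now` value would probably come from something like `time.monotonic()`, and the crawler would sleep for the returned number of seconds (or go fetch from a different host in the meantime).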
Current developments and future trends for the OAI protocol for metadata harvesting
The Open Archives Initiative sounds really interesting. It seems to allow various groups to collect their own metadata and then share it through service providers. But toward the end of the article the authors described how, even though everyone is using Dublin Core, there are still differences in how data are being entered. Will our field ever be able to reach a standard for interoperability? Or are there just too many archives and too many libraries out there with too many diverse and unique collections to make this possible? Maybe that's not even the problem. Is communication between different institutions the issue?
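For anyone curious what harvesting looks like in practice, here is a rough sketch of how a harvester might build an OAI-PMH request for Dublin Core records. The `ListRecords` verb and the `oai_dc` metadata prefix come from the protocol itself, but the function name and the base URL are made up for illustration, and I am only building the request URL here, not sending it.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL (hypothetical helper)."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec is not None:
        # Optionally restrict the harvest to one set within the repository.
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


# The base URL below is a placeholder, not a real repository.
url = list_records_url("https://archive.example.org/oai")
```

Every repository answers the same kind of request, which is the interoperability part. What comes back inside the Dublin Core fields is where the differences the authors describe show up.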
The Deep Web: Surfacing Hidden Value
I am constantly amazed by how large the internet is. It is so gigantic that I can't even imagine its true size. According to this article, when people use search engines, they are only searching 0.03 percent of the internet. That's crazy! How many pages are out there that you might want to see but never will? The article states: "Traditional search engines can not 'see' or retrieve content in the deep Web — those pages do not exist until they are created dynamically as the result of a specific search. Because traditional search engine crawlers can not probe beneath the surface, the deep Web has heretofore been hidden." It was interesting to read how the web has evolved and how search engines have evolved with it. Does anyone have any predictions for the future of internet search engines? Will they ever penetrate the deep web?