NOTE: Even though it's now somewhat dated, the most valuable single resource for understanding this topic is The Invisible Web, Information Today, 2001, by Gary Price and Chris Sherman. The web site for their book is at http://www.invisible-web.net/
What's Invisible? Some Questions Not Answered by General Search Engines
- Is my plane on time?
- Track the path of an approaching hurricane
- Check current road conditions
- Check hotel availability and prices
Invisible because the data is too current, real-time, or constantly changing -- current stock prices for specific companies, news. (Because of the massive failure of search engines -- or searchers -- on Sept. 11, Google now re-spiders millions of sites daily.)
- Find a recent webcast.
- Watch birds hatching and the mothers feeding them, or other processes observable through continuous webcams
- What does a machine gun sound like?
- Get a visual display of your search results that shows amount of data and relationships between ideas.
Invisible because the format is difficult for crawlers -- PostScript documents, Flash, audio, streaming video, etc. (though search engines are rapidly adding these capabilities -- Google now gives access to Usenet groups, and other search engines are increasing the kinds of file formats they index).
- Does any local library have a copy of American Infidel: Robert G. Ingersoll?
- Find out if anybody has patented your nifty invention
- How much did this stock I inherited cost at the time my uncle purchased it in 1956?
- Get a list of stocks that match my personal criteria
- What concerts are going to be available in Boston next March?
- Where can I buy an out-of-print book called Angels and Spaceships? Or, I have a copy of this book; is it worth anything?
- Find out how to handle a chemical that may be toxic
- Compare nursing homes in my area
- Search for unclaimed property
- Find the author of a quote and verify its exact wording and source
- What are Usenet groups saying about a particular topic?
Invisible because the answers don't exist until the question is asked within a database; to get the answer you have to fill in the form on a specific database -- patent-searching, trip directions, etc.
- Find articles in magazines, newspapers, and reference sources that aren't published free on the web
- Find historical archives of magazine and newspaper articles
- Find internal corporate documents
- Search a national database for local obituaries and death notices
- Find a snapshot of a web site that doesn't exist anymore
Invisible because access is proprietary, forbidden to crawlers, and/or password-protected -- NY Times, commercial databases, intranets. "On the web" vs. "by way of the web". See the recent Wired article, Searching for the New York Times, http://www.wired.com/news/culture/0,1284,64110,00.html?tw=wn_tophead_3, which Gary Price dismisses as nonsense at http://www.resourceshelf.com/2004/07/professional-reading-shelf_108989562745956073.html
Other Reasons for Invisibility
- Nobody linked to it
- Sites that have the information may not be crawled in depth -- how many pages of the EPA's web site have been indexed by engines? Try a search by topic plus domain, with no space after "site:" -- "acid rain" site:epa.gov, "guru interview" site:marylaine.com
- Search engines limit the number of viewable results. [Google only displays two results from any one site unless you specifically click on MORE].
- Some search engines spidered it, others didn't. There's not that much overlap between search engines, so when you don't find what you need with one search engine, or completeness counts, use multiple search engines.
- Maybe it's there, on the search engine you're using, but your search engine's preferences are screening it out. Check the default preferences.
- Maybe it's there, but if it's not in the top ten results, it might as well not be, since most people don't look any further.
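The topic-plus-domain search mentioned in the list above can be built mechanically. What follows is only a rough sketch: the function names, the base URL, and the "q" parameter name are assumptions made for illustration, not a documented interface of any search engine.

```python
from urllib.parse import urlencode


def site_query(topic: str, domain: str) -> str:
    """Build a topic-plus-domain query such as '"acid rain" site:epa.gov'."""
    topic = topic.strip()
    # Quote multi-word topics so they are searched as an exact phrase.
    if " " in topic and not topic.startswith('"'):
        topic = f'"{topic}"'
    # There must be no space between 'site:' and the domain.
    return f"{topic} site:{domain.strip()}"


def search_url(query: str, base: str = "http://www.google.com/search") -> str:
    """Turn a query into a URL. The base URL and 'q' parameter are assumptions."""
    return base + "?" + urlencode({"q": query})


if __name__ == "__main__":
    for topic, domain in [("acid rain", "epa.gov"),
                          ("guru interview", "marylaine.com")]:
        query = site_query(topic, domain)
        print(query)
        print(search_url(query))
```

Running the same topic against several engines (by swapping the base URL) is one way to see how deeply each has crawled a given site.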
When To Use Invisible Web Resources
- When you need real-time information -- flight-tracker, news, etc.
- When you need dynamically-generated information from a specialized database -- trip directions from here to there, sources for an out of print book, all the lawsuits filed by or against a particular company, phone numbers and addresses for people and businesses.
- When you need highly authoritative information from journals and other specialized sources such as LookSmart's FindArticles http://www.findarticles.com/PI/index.jhtml, Bartleby http://bartleby.com/, EbscoHost, Thomas Register http://thomasregister.com/, Making of America http://moa.umdl.umich.edu/, etc.
- When you need more control over the ways to limit the search -- in a history database, you can restrict by period or continent, for instance, and in a patent database, you can search by years and by class number.
- When you need a particular kind of content search engines don't do well with -- images, streaming video, etc.
- When you already know who's likely to have produced the info you need and want to go directly there.
- When you need to search a narrower, more selective universe -- kids' sites, news sites, web cams, government documents, maps, reference sources, full texts of books, etc.
- When you need to do a particular kind of searching -- CiteSeer, for instance, http://citeseer.ist.psu.edu/, allows citation searching, and the Classical Music search http://la.znet.com/~iwamura/page2.html allows you to key in a tune to find out what it is.
Finding Tools
General
- Complete Planet - "discover 70,000+ searchable databases and specialty search engines" http://www.completeplanet.com/
- Direct Search http://www.freepint.com/gary/direct.htm
- The Invisible Web -- companion site to the book by Chris Sherman and Gary Price http://www.invisible-web.net/
- OAIster http://oaister.umdl.umich.edu/o/oaister/ -- simultaneously searches hundreds of major digital collections
- ProFusion http://www.profusion.com/ -- "Target your search by drilling into one of these vertical search groups"
For more info on the scope and content of the invisible web, read: Bergman, M. K. (2001) The deep Web: surfacing hidden value. The Journal of Electronic Publishing, 7 (1) http://www.press.umich.edu/jep/07-01/bergman.html (18 January 2003)
Search Engines Inside the Search Engines
- Google: Google Uncle Sam http://www.google.com/unclesam, Google Groups, Google News, Google Images, directory, catalogs, Linux sites, university search, Google Answers http://answers.google.com/ [see http://www.google.com/advanced_search?hl=en], and more
- Lycos http://www.lycos.com/ -- pictures, news, shopping, yellow pages, etc.
- Yahoo! http://www.yahoo.com/ -- web sites, images, yellow pages, news, products, auctions, classifieds, maps, people search
Directories and Specialized Search Engines
- FIND BOOKS:
AddALL Book Search and Price Comparison http://www.addall.com/ -- defaults to searching in print titles; click on Used and Out of Print for OOP
Amazon http://amazon.com/ -- remember the new Search Inside the Book feature.
Finding Out of Print Books http://marylaine.com/bookbyte/getbooks.html
RedLightGreen http://www.redlightgreen.com/ -- RLG's shared catalog of the 126 million item records of its member libraries
- FIND DISCONTINUED SITES:
Use the cache command on Google http://www.google.com/ to find discontinued pages, e.g., cache:www.qconline.com/myword/perfectc.html
Internet Archive http://www.archive.org/ to search by URL; for topical search, still in beta, http://recall.archive.org/
- FIND IMAGES, VIDEOS, SOUNDS, WEBCAMS, WEBCASTS, ETC.
Finding Images and Sounds on the Web http://marylaine.com/images.html
Kartoo http://kartoo.com/ -- a visual display search engine.
FindSounds http://www.findsounds.com/
Google Directory - Webcams - Directories http://directory.google.com/Top/Computers/Internet/On_the_Web/Webcams/Directories/?tc=1/
WebCam Central http://www.camcentral.com/
Classical Music Search http://la.znet.com/~iwamura/page2.html -- "When you know a melody and you do not know its title or composer, this melody search engine will help you."
Singingfish - Find Audio and Video http://www.singingfish.com/
Streaming News and Video http://www.freepint.com/gary/audio.htm -- another Gary Price gem.
- SOME MISCELLANEOUS FINDING TOOLS:
Bloogz http://www.bloogz.com/ and Daypop http://www.daypop.com/ search blogs and RSS feeds
Epinions http://www.epinions.com/ -- product reviews. Reviews themselves are rated by other readers, and the best rated become trusted reviewers.
FindLaw http://findlaw.com/ -- includes a general law crawler, but also several that search inside specialized FindLaw databases
Kids Click Search http://sunsite.berkeley.edu/KidsClick!/
MedlinePlus http://www.medlineplus.gov/ -- great starting place for vetted medical info.
National Library of Medicine http://www.nlm.nih.gov/ -- has four different metasearch engines plus several highly specific ones.
Search Systems - Largest Free Public Records Database Collection http://www.searchsystems.net/
Finding Strategies
Use Two-step Searching
Use a general search engine to search for the likely source or database, then search inside that source. Sample search statements:
- streaming video + search engine
- diabetes + association (sometimes your best info comes from the primary professional or charitable association involved with the topic)
- patents + database
- "rock music" + encyclopedia -- the only way you're going to find anything about the artist known simply as E
- Hispanics + demographics
- word 6.0 + tutorial
- plumbing + "how to"
- "hp deskjet 5550" + product review
- cataloging + listserv (or discussion)
- whales + webcam
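The first step of the two-step search follows a simple pattern: a topic combined with the kind of source likely to cover it. As a rough sketch only, the pattern might be generated like this. The function names are hypothetical, and the code assumes the search engine treats a space between terms as an implied AND, which is why it joins terms with a space rather than a literal "+".

```python
def quote_phrase(text: str) -> str:
    """Wrap multi-word terms in double quotes so they are searched as a phrase."""
    text = text.strip()
    already_quoted = text.startswith('"') and text.endswith('"')
    if " " in text and not already_quoted:
        return f'"{text}"'
    return text


def first_step_queries(topic: str, source_types: list[str]) -> list[str]:
    """Pair one topic with each likely source type (database, tutorial, etc.)."""
    return [f"{quote_phrase(topic)} {quote_phrase(kind)}" for kind in source_types]


if __name__ == "__main__":
    # Source types taken from the sample search statements above.
    kinds = ["database", "association", "encyclopedia", "how to", "webcam"]
    for query in first_step_queries("rock music", kinds):
        print(query)
```

Each resulting query is meant to surface a database, directory, or organization; the second step, searching inside that source, still has to be done by hand on the source's own search form.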
There are many ways to approach the needle in the haystack problem:
- A known needle in a known haystack
- A known needle in an unknown haystack
- An unknown needle in an unknown haystack
- Any needle in a haystack
- The sharpest needle in a haystack
- Most of the sharpest needles in a haystack
- All the needles in a haystack
- Affirmation of no needles in a haystack
- Things like needles in any haystack
- Let me know whenever a new needle shows up
- Where are the haystacks?
- Needles, haystacks -- whatever
Matthew Koll. "Information Retrieval." http://www.asis.org/Bulletin/Jan-00/track_3.html
Becoming More Visible All the Time
Search engines are changing constantly, trying to give access to the invisible web. To keep up with search engine improvements you can have all of these mailed to you:
- Research Buzz http://www.researchbuzz.com/
- Resource Shelf http://www.resourceshelf.com/ -- daily tips from Gary Price.
- Search Day http://searchenginewatch.com/searchday/
- Search Engine Watch http://searchenginewatch.com/
- Search Engine Showdown http://www.searchengineshowdown.com/
For tips on getting the most out of Google, read Tara Calishain's Google Hacks, O'Reilly, 2003.
For more on blogs, RSS, and site minders, read Steven Cohen's Keeping Current, ALA 2003.
To find library weblogs to keep you current on developments in librarianship and technology, see Peter Scott's Library Weblogs http://www.libdex.com/weblogs.html