pjk pjk at design.eng.yale.edu
Fri Dec 14 02:20:25 EST 2001

Subject: The Deep Web

Dear Colleagues -

The "Invisible Web" has come up in these mailings before, e.g.

This article "The Deep Web: Surfacing Hidden Value"
is quite revealing. Contrasting "surface Web" and "deep Web"
(rather than "invisible" Web), it claims that search engines with the
broadest coverage (e.g. Google) index no more than 16% of available
"surface" Web resources. The staggeringly larger "deep Web" is
untouched. Their study of March 2000, published in Nature, estimates:

*	Public information on the deep Web is currently 400 to 550 times
larger than the commonly defined World Wide Web.
*	The deep Web contains 7,500 terabytes of information compared to
nineteen terabytes of information in the surface Web.
*	The deep Web contains nearly 550 billion individual documents
compared to the one billion of the surface Web.
*	More than 200,000 deep Web sites presently exist.
*	Sixty of the largest deep-Web sites collectively contain about 750
terabytes of information -- sufficient by themselves to exceed the
size of the surface Web forty times.
*	On average, deep Web sites receive fifty per cent greater monthly
traffic than surface sites and are more highly linked to than surface
sites; however, the typical (median) deep Web site is not well known
to the Internet-searching public.
*	The deep Web is the largest growing category of new information on
the Internet.
*	Deep Web sites tend to be narrower, with deeper content, than
conventional surface sites.
*	Total quality content of the deep Web is 1,000 to 2,000 times
greater than that of the surface Web.
*	Deep Web content is highly relevant to every information need,
market, and domain.
*	More than half of the deep Web content resides in topic-specific
databases.
*	A full ninety-five per cent of the deep Web is publicly accessible
information -- not subject to fees or subscriptions.
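
As a rough consistency check on the figures quoted above, the ratios can
be recomputed directly. This is only a sketch: the variable names are my
own, and the inputs are simply the numbers as quoted, not independent
measurements.

```python
# Figures as quoted in the list above; not independently verified.
deep_tb = 7500        # deep Web size, in terabytes
surface_tb = 19       # surface Web size, in terabytes
deep_docs = 550e9     # individual documents on the deep Web
surface_docs = 1e9    # individual documents on the surface Web
top60_tb = 750        # sixty largest deep-Web sites, in terabytes

# How many times larger the deep Web is, by bytes and by documents.
size_ratio = deep_tb / surface_tb
doc_ratio = deep_docs / surface_docs

# How many times the sixty largest sites alone exceed the surface Web.
top60_ratio = top60_tb / surface_tb

print(f"deep/surface by size:      {size_ratio:.1f}x")
print(f"deep/surface by documents: {doc_ratio:.0f}x")
print(f"top sixty sites/surface:   {top60_ratio:.1f}x")
```

By size the ratio comes out just under 400, by document count it is 550,
and the sixty largest sites come to a little under forty times the
surface Web, so the quoted "400 to 550 times" and "forty times" are
consistent with the raw figures.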

How to penetrate the Web more deeply? One answer is to use your local
library's expertise. Those of us in academic institutions are
fortunate with regard to such resources. 
The author also sees the need for "server-side content-aggregation
vertical 'infohubs' for deep Web information to provide answers where
comprehensiveness and quality are imperative." Escalating copyright
and fair-use restrictions will impede such developments, but again I
expect libraries to be leaders in this area.  --PJK

