[EAS]Internet Archive

Fri Nov 2 22:20:38 EST 2001

Subject:   Internet Archive

(from The Scout Report -- November 2, 2001)
<http://scout.cs.wisc.edu/report/sr/2001/scout-011102.html>

Internet Archive
http://www1.archive.org/index.html

The Internet Archive is the "parent" site for two sites previously
reviewed in the Scout Report, Election 2000 (see the July 13, 2001
Scout Report) and September11.archive.org (see the October 19, 2001
Scout Report). The Archive has been cataloging Webpages since its
inception in 1996, and for their fifth anniversary has opened the
archive to the public by launching their "Wayback Machine." To
operate the "Machine," users type a URL into the search box, which
will call up dated, archived pages of the site. The Internet
Archive holds ten billion Webpages, making it the largest known
database. Since announcing public access to the overall database,
the site has experienced a great deal of traffic. They are in the
process of adding servers, but users should be warned that, in the
meantime, access may be tricky. The Internet Archive is a
nonprofit, which has received funding from a number of sources
including the Library of Congress and the National Science
Foundation.

Copyright Internet Scout Project, 1994-2000. http://scout.cs.wisc.edu/ 

===================================================================
Dear Colleagues -

From its early honeymoon days in, say, 1994, the Web has changed in
many ways. Dimensions of e-commerce have blossomed--Amazon.com has
had much business from me since early 1996. Other fledgling
e-enterprises have expired. Out of intellectual property and
security concerns, many early academic Web sites no longer provide
world access. And in many quarters, thinking has matured about
where the Web is useful, and where the overhead is greater than the
payback. Those MIT lab courses where, around 1995/6, I could read
individual students' 'lab notebooks' on the Web (with all the
grumbling about how much work it was to do it all in .html) no
longer offer that level of detail. On the other hand, MIT's
OpenCourseWare Initiative <http://web.mit.edu/ocw/> is a beacon of
academic openness and has been commented on here before
<http://www.yale.edu/engineering/eng-info/msg00831.html>.

In all, about 20% of my 3000-4000 bookmarks have gone stale. Before
I delete one, I now have a way to try to recover information I
valued but find to have disappeared: the Internet Archive. The
brainchild of Brewster Kahle, whom more than a few people
considered 'cracked' for starting it, has grown into a 100+ TB
(terabyte) database, now publicly accessible. It is an archive of
the Internet from about 1996 on. I dare say it managed to capture
only a small fraction of Internet history, a history of nowhere and
everywhere, but it is the best thing we have to explore the
Internet's historical dimensions.

Whether archiving should be attempted at this central "Library of
Alexandria" level, or be the responsibility of local institutions
and organizations as they evolve their Web resources, is a question
I don't have a good answer to. There are those who claim that
without an Internet Archive, we live in the "digital dark age." If
they read a few good medieval histories, they would have a better
sense of the complex intellectual fabric and sensibilities about
the process of publication that existed in the Middle Ages. (All of
St. Augustine's 5 million words survived, because he understood how
to publish.)

Internet archiving will have interesting legal ramifications, in
copyright and privacy areas. (Btw, many a failed dot.com is busy
trying to sell its customer database to the highest bidder, any
earlier privacy assurances notwithstanding. Bankruptcy sale claims
have, I believe, taken precedence over those of privacy.)

Anyway, to the already complex dimensions of the Internet in the
present, the Internet Archive now adds historical depth. Have fun
exploring, though note that serious access still requires an ssh
user account and Unix programming.

    --Peter Kindlmann