WHAT'S NOT ON THE NET
by Marylaine Block.
I have a little chart, designed to blow the minds of students whose first impulse, with any assignment, is to head to the net, and maybe also those folks who ask whether we need libraries now that we have the internet. The chart is my guess -- and I emphasize "guess," because I'm not sure it's even possible to get exact numbers -- about the distribution of all the information that has ever existed.
You will note that I'm guessing the net represents, at most, perhaps 12% of the world's accumulated store of information. Here is my reasoning:
I figure the largest single source of information is in the form of government documents -- federal, state, local, regional, international. Every single governmental unit -- and there have to be millions of them -- has been gathering and producing information since it first existed. And keep in mind that some of the earliest known inscriptions, made several thousand years ago, seem to have been records for tax collection.
Think about it. Start with your local government, which collects birth, death, marriage and divorce records, probate reports, tax records, property assessments, housing code and health code inspection reports, zoning information, crime reports, city budget information, annual reports from each department, transcripts of city council meetings and court proceedings, water quality data, emergency plans, statistics on the number of police, fire and ambulance calls, tapes of calls to 911, and more.
Multiply that by every town in this country, then add the data collected at the county, state and federal levels, where responsibilities are wider, budgets are greater, and the number of agencies is larger: demographic data, business and professional license filings, tax data, driver's license records, public health statistics, the accumulated body of laws, codes, and court decisions, environmental information, road and traffic statistics, regulations, tourist info, sponsored research, and on and on.
Now, compute the same for all the countries in the world and their municipalities. That's a whole lot of data. And though governments have been very good about putting current information online, damn few of them have gone back to input past data retroactively, and doing so would be incredibly expensive.
Worse, some of them don't understand the historical value of their pages, and when they put the current information online, they wipe out the previous data. On January 20, 2001, for instance, five minutes after George W. Bush took the oath of office, every single document from the Clinton administration White House page was erased and replaced with the new Bush White House page. [Happily, thanks to the National Archives and Records Administration, those pages have been preserved.] That's one reason governments should value their libraries, which preserve those historical government records.
I'd guess the second largest amount of information is in the form of books. After all, we've been producing them since the 1400s and they've added up over time. Last year, American publishers alone published over 100,000 books. The OCLC database of items held by some 20,000 libraries lists over 20 million item records, most of them books. Most of those books will NEVER be digitized, because of copyright, expense, or lack of interest. The Online Books page (http://digital.library.upenn.edu/books/), which tries very hard to list all books available full text on the net, links to about 16,000.
The third largest amount of information, I figure, comes from periodicals -- newspapers have been around since the 1600s, and magazines and journals came along soon thereafter. There are at least 150,000 magazines and journals being published right now, and that doesn't count the thousands of magazines and journals that are no longer being published but whose backfiles continue to exist in libraries.
That also doesn't include newspapers. Let's make another conservative estimate here: I'll guess that magazines, which might be weekly, biweekly, monthly, bimonthly, or quarterly, contain an average of 10 articles per issue multiplied by 8 issues per year, for a multiplier effect of 80 articles per year of existence per magazine or journal. The multiplier for newspapers would, of course, be higher, since many newspapers publish daily; let's guess that each contains, say, six articles per page, multiplied by, let's say, an average of ten pages of non-advertising content, or about 60 articles per issue. Over four hundred years, we're talking about a whole lot of articles.
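To make the arithmetic behind those guesses concrete, here is a minimal sketch in Python. The per-issue and per-page figures are the guesses given above, and the 150,000 count is the "at least 150,000" current magazines and journals mentioned earlier. The 365 issues per year for a daily newspaper is an added assumption, not a figure from this article, and all the variable names are purely illustrative.

```python
# Back-of-the-envelope multipliers, using the article's own guesses.
ARTICLES_PER_MAGAZINE_ISSUE = 10
MAGAZINE_ISSUES_PER_YEAR = 8
ARTICLES_PER_NEWSPAPER_PAGE = 6
NEWSPAPER_CONTENT_PAGES = 10      # non-advertising pages per issue
CURRENT_MAGAZINES = 150_000       # "at least 150,000" titles being published

# Assumption (not from the article): a daily paper appears 365 days a year.
NEWSPAPER_ISSUES_PER_YEAR = 365

# Articles per year of existence for one magazine or journal.
magazine_yearly = ARTICLES_PER_MAGAZINE_ISSUE * MAGAZINE_ISSUES_PER_YEAR

# Articles in one newspaper issue, then per year for one daily paper.
newspaper_per_issue = ARTICLES_PER_NEWSPAPER_PAGE * NEWSPAPER_CONTENT_PAGES
newspaper_yearly = newspaper_per_issue * NEWSPAPER_ISSUES_PER_YEAR

# Articles produced in a single year by all current magazines and journals.
all_magazines_yearly = CURRENT_MAGAZINES * magazine_yearly

print(f"One magazine, per year:        {magazine_yearly:,}")
print(f"One newspaper, per issue:      {newspaper_per_issue:,}")
print(f"One daily newspaper, per year: {newspaper_yearly:,}")
print(f"All current magazines, yearly: {all_magazines_yearly:,}")
```

Under these guesses, current magazines and journals alone account for millions of articles a year, before counting defunct titles, newspapers, or any of the preceding centuries.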
Now, some of those articles are available on and by way of the net. Some periodicals put their current issues online, and many periodicals are available full-text in commercial databases delivered by way of the net. But these files are incomplete -- few of the periodicals from before 1980, let alone back as far as the 1700s, have been digitized. In most cases you'll need to go to libraries if you want to read newspaper and magazine and official accounts of, say, the French and Indian wars, the 1810 census, the flu epidemic of 1918, the history of your local diocese, or the presidential campaign of 1884 ("the continental liar from the state of Maine" vs. "Ma, Ma, where's my pa?" "Gone to the White House, ha, ha, ha!").
Furthermore, most of those databases of articles are not free. People only have access to them by way of private subscriptions or through their libraries, which pay thousands of dollars for each database they license for their users.
The smallest, unitemized chunk of the chart is composed of things like movies, audio and video recordings, conference papers, dissertations, diaries, letters, photographs, etc.
That leaves the internet, which I figure, with over 3 billion indexed pages, is the fourth largest information source. It includes some of all of the above, but only a small fraction of them. Because of the inordinate cost of digitizing, lack of interest, and copyright issues, most of the world's information will NEVER be on the internet, or at least not for free.
However, if you look at the Berkeley "How Much Information" project, at projects/how-much-info/charts/charts.html, you'll notice that they think the largest amount of information produced in a given year is actually in the form of Word documents and e-mail.
Which is to say, most of the information produced in any given year is locked inside people's brains and hard drives. (And indeed, when companies that had been housed in the World Trade Center had to reconstruct their businesses, they found that, though they were able to restore data that was backed up on computers elsewhere, the human knowledge they lost was irretrievable.)
So if I were to construct that pie chart again today, I'd have to redraw it to include individual minds. The internet is a great tool but it not only hasn't replaced books and periodicals, it hasn't even replaced our rolodexes and our phones.
Oh, and the name of my pie chart? Let's just call it CONTENTS OF THE WORLD'S LIBRARIES.
* * * * *
URGENT REQUEST FOR INFORMATION: For the book I'm writing, I need to know how librarians are dealing with the holes left by the post-Tasini removal of articles from databases. Are you doing more binding or buying more microfilm instead of relying on the databases? Are you providing notice to users of the databases that articles may have been removed? Have you come up with any other solutions? Please let me know. Thanks for your support.
* * * * *
In the world around us are things that we, or other human beings, have created -- things which play a similar role to intelligence but sit outside us. They are things like libraries, books, and the internet . . . The Discworld concept of L-Space -- library space -- is similar: it's all one thing. These influences, sources of not just information but of meaning, are "cultural capital." They are things that people put out into the culture which can then sit there, or even reproduce, or interact in a way that individuals can't control.
Terry Pratchett, Ian Stewart, and Jack Cohen. The Science of Discworld. Ebury Press, 2000.
* * * * *
You are welcome to copy and distribute or e-mail any of my own articles for noncommercial purposes (but not those by my guest writers) as long as you retain this copyright statement:
Ex Libris: an E-Zine for Librarians and Other Information Junkies.
Copyright, Marylaine Block, 1999-2002.
[Publishers may license the content for a reasonable fee.]