July 29, 2002
WHAT IS NOT ON THE NET
Governors whose budgets are in trouble, which these days means every governor, have been looking for easy budget items to cut. That's why the Governor of Washington zeroed out the state library's entire budget; when his legislature restored it with reduced funding, he also confiscated some of that funding for his rainy day fund. He's not the only one, either. The state of Minnesota eliminated its state library, and Colorado and Arkansas cut huge chunks out of theirs. Their reasoning was that, after all, who needs a library, anyway, when everything's on the internet?
Sadly, that's such a common notion that it threatens the existence of the libraries that store the heritage of human culture, the 90% or so of it that is not on the net and in all probability will never be on the net.
I have a little chart, designed to blow the minds of students whose first impulse, with any assignment, is to head to the net. It's my guess -- and I accentuate guess, because I'm not sure it's even possible to get exact numbers -- about where you will find what percentages of all the information that has ever existed.
You will note that I'm guessing the net represents, at most, perhaps 15% of the world's store of information. Here is my reasoning:
I figure the largest single source of information is in the form of government documents -- federal, state, local, regional, international. Each and every governmental unit -- and there have to be millions of them -- has been gathering and producing information since it first existed. And keep in mind that some of the earliest written records, made thousands of years ago, seem to have been records for tax collection.
Think about it. Start with your local government, which collects birth, death, marriage and divorce records, probate reports, tax records, property assessments, housing code and health code inspection reports, zoning information, crime reports, city budget information, court proceedings, annual reports from each department, transcripts of city council meetings, water quality data, emergency plans, statistics on the number of police, fire and ambulance calls, tapes of calls to 911, and more.
Multiply that by every town in this country, then add the data collected at the county, state and federal levels, where responsibilities are wider, budgets are greater, and the number of agencies is larger: demographic data, business and professional license filings, tax data, driver's license records, public health statistics, the accumulated body of laws, codes, and court decisions, environmental information, road and traffic statistics, regulations, tourist info, sponsored research, and on and on.
Now, compute the same for all the countries in the world and their municipalities. That's a whole lot of data. And though governments have been very good about putting current information online, damn few of them have gone back to input past data retroactively, and doing so would be incredibly expensive. Worse, some of them put the current information online and wipe out the previous data -- on January 20, 2001, for instance, five minutes after George W. Bush took the oath of office, every single document from the Clinton administration White House page had been erased and replaced with the new Bush White House page. That's one reason governments should value their libraries, which preserve those historical government records.
I'd guess the second largest amount of information is in the form of books. After all, we've been producing them since the 1400s and they've added up over time. Last year, American publishers alone published over 100,000 books. The OCLC database of items held by some 20,000 libraries lists over 20 million item records, most of them books. Most of those books will NEVER be digitized, because of copyright, expense, or lack of interest. The Online Books Page (http://digital.library.upenn.edu/books/), which tries very hard to list all books available full text on the net, links to about 16,000.
The third largest amount of information, I figure, comes from periodicals -- newspapers have been around since the 1600s, and magazines and journals came along soon thereafter. There are at least 150,000 magazines and journals being published right now, and that doesn't count the thousands of magazines and journals that are no longer being published but whose backfiles continue to exist in libraries.
That also doesn't include newspapers. Let's make another conservative guess here: figure magazines contain an average 10 articles per issue multiplied by 10 issues per year, for a multiplier effect of 100 articles per year of existence per magazine or journal. The multiplier for newspapers would, of course, be higher, since many newspapers publish daily; let's guess that each contains, say, six articles per page, multiplied by, let's say, an average of ten pages of non-advertising content. Over four hundred years, we're talking about a whole lot of articles.
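The back-of-the-envelope arithmetic above can be sketched in a few lines of Python. This is only an illustration of the guesses already stated in the column, not a measurement. One figure is an added assumption that does not appear in the text: a daily newspaper is taken to publish 365 issues a year. The variable names are likewise invented for the sketch.

```python
# Rough article counts built from the column's own guesses.
# Nothing here is measured data.

# Magazines and journals (figures guessed in the text)
articles_per_magazine_issue = 10
magazine_issues_per_year = 10
magazines_in_print = 150_000  # "at least 150,000" titles being published now

# Newspapers (figures guessed in the text)
articles_per_newspaper_page = 6
content_pages_per_issue = 10  # non-advertising pages

# ASSUMPTION, not in the text: a daily paper publishes every day of the year.
newspaper_issues_per_year = 365

# The "multiplier": articles per year of existence for one magazine.
magazine_articles_per_year = articles_per_magazine_issue * magazine_issues_per_year

# One year's output from all magazines and journals currently in print.
all_magazine_articles_per_year = magazine_articles_per_year * magazines_in_print

# The same multiplier for a single daily newspaper.
newspaper_articles_per_issue = articles_per_newspaper_page * content_pages_per_issue
newspaper_articles_per_year = newspaper_articles_per_issue * newspaper_issues_per_year

print(f"One magazine, one year: {magazine_articles_per_year:,} articles")
print(f"All current magazines, one year: {all_magazine_articles_per_year:,} articles")
print(f"One daily newspaper, one issue: {newspaper_articles_per_issue:,} articles")
print(f"One daily newspaper, one year: {newspaper_articles_per_year:,} articles")
```

Under these guesses, the magazines and journals now in print would add some 15 million articles a year, and a single daily newspaper close to 22,000, before counting any backfiles or defunct titles.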
Now, some of those articles are available on and by way of the net. Some periodicals put their current issues online, and many periodicals are available full-text in commercial databases delivered by way of the net. But these files are incomplete -- few periodicals from before 1980 have been digitized. You'll need to go to libraries if you want to read newspaper and magazine and official accounts of, say, the 1810 census, America's Civil War, the flu epidemic of 1918, the history of your local diocese, or the presidential campaign of 1884 ("the continental liar from the state of Maine" vs. "Ma, Ma, where's my pa?" "Gone to the White House, ha, ha, ha!").
Furthermore, most of those files of articles are not free. People only have access to them by way of private subscriptions or through their libraries, which pay $20,000 and up for each database they license for their users.
The smallest, unitemized chunk of the chart is composed of things like movies, audio and video recordings, conference papers, dissertations, diaries, letters, photographs, etc.
That leaves the internet, which I figure, with over 3 billion indexed pages, is the fourth largest information source. It includes some of all of the above, but only a small fraction of them. Because of the inordinate cost of digitizing, lack of interest, and copyright issues, most of the world's information will NEVER be on the internet, or at least not for free.
However, if you look at the Berkeley "How Much Information" project, at http://www.sims.berkeley.edu/research/projects/how-much-info/charts/charts.html, you'll notice that they think the largest amount of information produced in a given year is actually in the form of Word documents and e-mail.
Which is to say, most of the information produced in any given year is locked inside people's brains and hard drives. (And indeed, when companies that had been housed in the World Trade Center had to reconstruct their businesses, they found that the human knowledge was irretrievable, though they were able to restore data that was backed up on computers elsewhere.)
So if I were to construct that pie chart again today, I'd have to redraw it to include individual minds. The internet is a great tool, but not only has it not replaced books and periodicals, it hasn't even replaced our Rolodexes and our phones.
Oh, and the name of my pie chart? Let's just call it CONTENTS OF THE WORLD'S LIBRARIES.
NOTE: My thinking is always a work in progress. You could mentally insert all my columns in between these two sentences: "This is something I've been thinking about," and "Does this make any sense to you?" I welcome your thoughts. Please send your comments about these columns to: marylaine at netexpress.net. Since I've written a lot of these, some of them many years ago, help me out by telling me which column you're referring to.
I'll write columns here whenever I really want to share an idea with you and can find time to write them. If you want to be notified when a new one is up, send me an e-mail and include "My Word's Worth" in the subject line.