Step back in time to 1450: Johann Gutenberg started a massively collaborative social media phenomenon when he invented the printing press. It freed knowledge from monasteries, clergy, and wealthy aristocrats. A man with a bit of capital could print easy-to-read books, newspapers, and pamphlets that almost anyone could buy or distribute for free. Once almost anyone could learn to read, the church and government lost their monopolies on information. It was a revolution. By 1776, the coffee houses of London were raucous networks of idea exchange about the latest news in print. Lloyd's of London coalesced as a collective securitization of merchant shipping risk. Postal services had volumes of new mail to take from person to person.
Now digital information is being freed from paper, just as printing once freed it from parchment, and old books are being freed from libraries. Printing saved knowledge from isolation and the risk of destruction, and digitizing knowledge has an almost infinitely greater potential.
Project Gutenberg starts with scanned documents, the same as Google Books. Image scans, whether delivered as PDFs or through specialized viewers, have their limits. Books aren't available unless you have the right program or device. What happens when formats become obsolete? Because Project Gutenberg renders documents in text, they can be viewed or converted almost universally. You can download nearly any out-of-copyright book you can think of, for free, from the Project Gutenberg library on the web. Now the project is moving into more obscure and specialized works, those even more in danger of loss.
Getting from scan to text with people
Getting from scans to text is a rocky road. OCR (optical character recognition) software cannot make sense of broken type, smudges on paper, and so on. Here's where the massive collaboration comes in: thousands of human beings, working for no pay, are inspecting the defect-ridden text rendered by OCR, stripping out page headers and footers, and opening up knowledge to anyone with internet access.
The Project Gutenberg Distributed Proofreading (PGDP) network has been around for a while. It has released 18,793 meticulously inspected books to Project Gutenberg. Another 3,016 are in progress, with 688 currently being proofread. How big is the effort? Of nearly 100,000 registered users, 535 people helped in one typical day, more than 1,000 in a week, and more than 2,000 in July 2010.
Figure: Distributed Proofreaders cumulative results
Figure: Not fancy, but functional: The proofreading interface
It is a culture as much as a structure for a work process. People drawn to the community have a shared purpose: making out-of-copyright books freely available, usable in any format on any reader, and preserving knowledge and classic works. While the number of pages in the works at any given time is immense, the goal the community asks of any one proofer is a page a day. It's a takt that is easy to swallow.
A sociable experience
As you peel back the layers of the work, the social media levels begin to reveal the personalities of the leaders, as well as the new entrants and the workers who are developing their skills. A conversation on a project discussion board or in the wiki starts to feel like friendship after a while, and that emotional satisfaction begins to create engagement. The member becomes part of a team.
In a wiki, people don't always explore the “history” and “discussion” tabs, but that is where the social richness is found. In the Distributed Proofreaders wiki, the work process is discussed at a high level of knowledge and conscientiousness, with a sense of holding the members to high standards. I'm in awe of the people who make such a huge commitment to creating defect-free texts to share with the world.
From a lean perspective, you might ask how much inspection is appropriate. PGDP has three levels of proofing, two levels of formatting, and a couple more levels of rendering before a text is released. There is a way to record whatever is changed at each level, to show whether proofreaders are just changing the same word back and forth, and to identify and give feedback to proofreaders who need more training on the standard. There are a couple of experiments that carry a text through many rounds (the one I saw is up to 10) to see how people behave when given an endless opportunity to change things. At all times, the original scan is available, and the project manager usually has the paper text for validation.
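To make the idea of spotting back-and-forth changes concrete, here is a minimal sketch in Python. It is not PGDP's actual code: the function name `find_flip_flops`, the sample texts, and the simplifying assumption that every round keeps the same word count (proofers only substitute words, never insert or delete them) are all illustrative.

```python
from typing import Dict, List


def find_flip_flops(rounds: List[str]) -> Dict[int, List[str]]:
    """Return {word position: change history} for words that were
    changed in one round and later changed back to an earlier value."""
    if not rounds:
        return {}
    tokenized = [text.split() for text in rounds]
    length = len(tokenized[0])
    # Assumption for this sketch: rounds are word-aligned.
    if any(len(tokens) != length for tokens in tokenized):
        raise ValueError("this sketch assumes every round has the same word count")

    flagged: Dict[int, List[str]] = {}
    for position in range(length):
        # Collapse consecutive duplicates so the history lists only real changes.
        history: List[str] = []
        for tokens in tokenized:
            word = tokens[position]
            if not history or history[-1] != word:
                history.append(word)
        # A value that reappears after a change means an edit was reverted.
        if len(set(history)) < len(history):
            flagged[position] = history
    return flagged


# Hypothetical sample: raw OCR output followed by three proofing rounds.
sample_rounds = [
    "Tbe modem reader",   # OCR output
    "The modern reader",  # round 1
    "The modem reader",   # round 2
    "The modern reader",  # round 3
]
print(find_flip_flops(sample_rounds))
```

In this sample, only the second word is flagged, because it bounces between "modem" and "modern"; the first word was corrected once and then left alone. A real tool would need to align texts whose lengths differ between rounds, which this sketch deliberately avoids.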
What can we in the lean community learn from the Distributed Proofreaders? Some of our projects produce documents, so there is a direct set of lessons. We can also learn how much work people are motivated to contribute when they believe in the importance of the result. Motivation comes out of interaction too. There is an extraordinary emphasis on fairness, civility, and respect for people that is not found in an old-style business culture. And the technology does not have to be the latest in order to produce a good project experience.
In fact, lean tells us that when team members work in close proximity, frequent face-to-face interactions, white boards, paint on walls, and pictures are all more effective than computerized "knowledge management systems" that become knowledge prisons.
But as our teams are forming across functions, organizations, and industries, we need online tools to collaborate, conduct team projects and interact. We won't all have satellite conferences, vast simulations, and virtual worlds to work within.
Project Gutenberg took years to create, and we need to move faster. What open tools do you use for team projects? The venerable conference call? Webex or GoToMeeting? Google Docs (my teams use them a lot)? Dropbox? SharePoint? Have you found something else?