Project Runeberg

From Lysator's computer handbook, the ultimate reference.


Project Runeberg (runeberg.org) is one of LYSATOR's many projects: a huge website of classic, out-of-copyright Scandinavian literature that is scanned and proofread by volunteers.

This information is also available in Swedish.

How to get involved

You don't need to be a LYSATOR member to use or contribute to Project Runeberg. Anybody can read, download, or help with scanning, OCR, proofreading, and writing documentation. As much of the user and volunteer interaction as possible is handled through web forms; the rest is handled by e-mail.

The inner circle of the project is a small group of LYSATOR members known as "the editors" (redaktionen). They are members of the LysKOM conference "Redaktion (för) Projekt Runeberg" and have login accounts on the webserver runeberg.org. Volunteers and outsiders can send e-mail to the whole group at editors@runeberg.org.

LYSATOR's general assembly elects a project coordinator, who so far has been the founder, Lars Aronsson. Current or previous editors are: Martin Bergström, Björn Brenander, Anders Brun, Per Cederqvist (ceder), Lisa Hallingström, Erik M. Johansson, Erik S-O Johansson (esoj), Karl-Johan Karlsson (ka-ka), Hans Persson (unicorn), Joakim Ragnvaldsson (jr), Leif Stensson, and Johan Tufvesson (tuben). Project Runeberg's logotype was designed by Leif Nixon.

History

Project Runeberg was founded on the evening of December 13, 1992 by Lars Aronsson, who needed more content to fill up LYSATOR's Gopher server. Gopher was a text- and menu-based precursor to the World Wide Web. The main inspiration came from the American Project Gutenberg. LYSATOR's WWW server was started in February 1993 by Per Hedbor, and Project Runeberg gradually moved to the web during 1993 and 1994. Some of the early works published were Fänrik Ståls sägner (March 1993), Bilder ur Nordens Flora (November 1994), and the Bible in Swedish (March 1996). In the early days, Project Runeberg was often shown as an example of how the web could be used.

In January 1996 the Swedish copyright term was extended from 50 to 70 years after the death of an author. Works by authors who died in 1945, which would have entered the public domain in 1996, will now remain under copyright until the end of 2015. In reality, this extension didn't have much impact on Project Runeberg, as there is plenty of other old literature whose copyright has expired. More important was the negative propaganda about textual quality launched around the same time by literature scholars. This forced Project Runeberg in 1998 to develop new methods for publishing scanned images as well as e-text.
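The arithmetic behind these dates can be sketched in a few lines of Python. This is only an illustration: the function name is hypothetical, and it assumes the term is counted in whole calendar years after the year of death, which is consistent with the dates given above.

```python
def first_public_domain_year(death_year, term_years):
    """Return the first calendar year in which a work is out of copyright.

    Assumption: protection lasts through the end of the calendar year
    death_year + term_years, so the work is free from the following January 1.
    """
    return death_year + term_years + 1


if __name__ == "__main__":
    # An author who died in 1945, under the old and the new term.
    print(first_public_domain_year(1945, 50))  # old 50-year term
    print(first_public_domain_year(1945, 70))  # new 70-year term
```

Under the 50-year term this gives 1996, and under the 70-year term it gives 2016, that is, protection until the end of 2015.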

On May 11, 2003, all 20 + 38 volumes of the two oldest editions of Nordisk familjebok, the largest encyclopedia ever published in the Swedish language, were digitized. With 45,000 pages it then made up almost half of Project Runeberg. On January 18, 2006, Project Runeberg's collections reached 400,000 book pages in digital facsimile.

On December 15, 2004, Project Runeberg moved to its own domain, runeberg.org, which is an alias for the project's own server, officially known as Fatabur.runeberg.lysator.liu.se and kept in LYSATOR's computer room FOO-hallen.

More details are documented in the Project Runeberg Timeline.

Technology

The so-called "editors" are really computer programmers, but Project Runeberg has allowed them to work in an environment very different from what most programmers experience. For the first ten years, Project Runeberg relied almost exclusively on C, Pike, and Perl programs that generated static webpages (or Gopher menus). This was combined with a conservative choice of simple file formats: plain text in ISO 8859-1, a simplified subset of HTML, and images in GIF, PNG, JPEG, and TIFF G4. This made the website very fast and reliable. At a time when flashy new technologies such as Java and "push" were being introduced at a rapid pace, Project Runeberg's electronic texts of classic literature didn't really need any updates or changes. All source text files are kept under the RCS revision control system. Website navigation is not part of the source text but is added automatically by software.
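The idea of keeping navigation out of the source text and adding it at generation time can be sketched as follows. This is a minimal illustration in Python, not the project's actual generator (which was written in C, Pike, and Perl); the function name, the HTML layout, and the file names in the usage example are all assumptions.

```python
import html

SOURCE_ENCODING = "iso-8859-1"  # encoding of the stored source text


def render_page(title, source_bytes, prev_href=None, next_href=None):
    """Wrap plain source text in a static HTML page with generated navigation."""
    text = source_bytes.decode(SOURCE_ENCODING)
    # Assumption: blank lines separate paragraphs in the source text.
    paragraphs = [p.strip() for p in text.split("\n\n") if p.strip()]

    # Navigation is built here, so the source text never contains links.
    nav = []
    if prev_href:
        nav.append('<a href="%s">Previous</a>' % html.escape(prev_href, quote=True))
    if next_href:
        nav.append('<a href="%s">Next</a>' % html.escape(next_href, quote=True))

    lines = [
        "<html>",
        "<head>",
        '<meta http-equiv="Content-Type" content="text/html; charset=%s">'
        % SOURCE_ENCODING,
        "<title>%s</title>" % html.escape(title),
        "</head>",
        "<body>",
    ]
    if nav:
        lines.append('<div class="nav">%s</div>' % " | ".join(nav))
    lines.extend("<p>%s</p>" % html.escape(p) for p in paragraphs)
    lines.extend(["</body>", "</html>"])
    # Characters outside ISO 8859-1 become numeric character references.
    return "\n".join(lines).encode(SOURCE_ENCODING, errors="xmlcharrefreplace")


if __name__ == "__main__":
    source = "Första stycket.\n\nAndra stycket.".encode(SOURCE_ENCODING)
    page = render_page("Fänrik Ståls sägner", source, "0001.html", "0003.html")
    print(page.decode(SOURCE_ENCODING))
```

Because the output is a plain file, it can be regenerated for every page whenever the navigation design changes, while the source text under revision control stays untouched.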

Since 1998, most new books have been published as scanned images ("facsimile") in combination with OCR text. The standard format for scanned images is 600 dpi TIFF G4. Volunteers were originally encouraged to submit proofread OCR text by e-mail, which the editors then installed manually. But this didn't scale once more text was submitted than the editors could handle. Since 2002, proofreading has instead been done through a web form, very much like a wiki. All edits appear on a Recent Changes page, and there are diff and history functions.
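The diff and history functions of such a wiki-like proofreading form can be illustrated with Python's standard difflib module. This is only a sketch under stated assumptions: the class and method names are hypothetical, revisions are kept in memory, and nothing here reflects how Project Runeberg's own software stores its history.

```python
import difflib


class PageHistory:
    """Keep every saved version of one page's text and diff any two of them."""

    def __init__(self, ocr_text):
        # Revision 0 is the raw OCR text; each entry is (editor, text).
        self.revisions = [("ocr", ocr_text)]

    def save(self, editor, new_text):
        """Store a proofread version and return the latest revision number."""
        if new_text != self.revisions[-1][1]:  # skip saves with no changes
            self.revisions.append((editor, new_text))
        return len(self.revisions) - 1

    def diff(self, old, new):
        """Return a unified diff between two revisions as a list of lines."""
        old_lines = self.revisions[old][1].splitlines()
        new_lines = self.revisions[new][1].splitlines()
        return list(difflib.unified_diff(
            old_lines, new_lines,
            fromfile="revision %d" % old,
            tofile="revision %d" % new,
            lineterm="",
        ))


if __name__ == "__main__":
    history = PageHistory("Det var en gang\nen konung")
    rev = history.save("volunteer", "Det var en gång\nen konung")
    print("\n".join(history.diff(0, rev)))
```

A Recent Changes page would then simply list the newest entries across all pages, with a link to the diff against the previous revision.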

During 2005 and 2006, more functions are being moved from static to dynamic webpages. New books are published in the UTF-8 (Unicode) character set, and old books are being converted to this format. Some new books are captured with a digital camera as JPEG images instead of with a scanner. As high-resolution (5 megapixel) digital cameras are becoming affordable and very common, this allows more people than before to contribute.
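Converting an old text from ISO 8859-1 to UTF-8 is a decode followed by an encode. The sketch below shows this in Python. The function name is hypothetical, and the check for already-converted files is a heuristic added for illustration, not a documented part of the project's tooling.

```python
def to_utf8(data):
    """Convert ISO 8859-1 bytes to UTF-8, leaving valid UTF-8 input unchanged.

    Heuristic (an assumption of this sketch): input that already decodes as
    UTF-8 is treated as converted. Pure ASCII is identical in both encodings,
    and ISO 8859-1 text with accented letters is very rarely valid UTF-8.
    """
    try:
        data.decode("utf-8")
        return data
    except UnicodeDecodeError:
        return data.decode("iso-8859-1").encode("utf-8")


if __name__ == "__main__":
    old = "Fänrik Ståls sägner".encode("iso-8859-1")
    new = to_utf8(old)
    print(len(old), len(new))  # each accented letter grows from 1 to 2 bytes
    print(new.decode("utf-8"))
```

Making the conversion safe to run twice matters when old and new books live side by side, since a second pass over an already converted file would otherwise corrupt every accented letter.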