Volunteers’ FAQ
Project Gutenberg welcomes contributions of eBooks from people with
the interest, time, and skillset needed to meet our submission
standards. Details of the process and the standards are at our
copyright clearance site copy.pglaf.org and upload site
upload.pglaf.org.
Join Distributed Proofreaders, Instead
For most people interested in producing eBooks, we recommend starting
with Distributed Proofreaders. With
Distributed Proofreaders, you can get involved with different portions
of the production pipeline described below. This is a much easier way
to get started, and results in very high quality eBooks.
If you only want to suggest a book for digitization, DP has online
forums for this, or you can send an email (contact information
is on the site).
Distributed Proofreaders maintains canonical guidance on production.
See especially:
Being a Solo Producer
If you are interested in producing an eBook yourself, without involving
Distributed Proofreaders, here is some guidance. Start with what’s above,
though, including the DP links.
In a nutshell, the production process typically involves the following:
- Identify a candidate printed book. Confirm it is not already in the
collection, or in process by other volunteers. Use the Collection
Development Policy to guide you on eligibility.
- Obtain scans of the book. This may be done using your own scanner,
or there might be online scans available. Scans must come from the
exact same print edition as your copyright clearance.
- Obtain a copyright clearance for the printed book. Usually this is
based on scans of the title page and verso page demonstrating that the
printed book was published more than 95 years ago. See the Copyright
How-To.
- Perform optical character recognition (OCR) on the scans, to make an
approximate representation of the book in plain text. Depending on the
quality and availability of OCR, you may choose to “type in” all or
part of a text.
- Proofread, proofread, proofread: “Fix” the OCR output by carefully
correcting any errors it introduced. Remove page headers and
footers. De-hyphenate. Add back italics or other formatting.
- Format: Most people start with HTML, and then derive a plain text
version from the HTML. It is strongly recommended that you use an HTML
editor or write HTML “by hand” (using a text editor). Most word
processors do not create usable HTML, which could lead to significant
additional work to meet Project Gutenberg’s HTML requirements. The
plain text version may also require some effort to ensure its
formatting is of high quality, likely starting with “Save as…” from
.htm to .txt, followed by editing of the .txt with a text editor.
- Check, and recheck. The upload site has various tools, including one
that tests proper conversion to derived formats.
- Upload your work using the copyright clearance key obtained
earlier. The “Preview Submission” capability during upload runs
several tests; review those results before choosing “Submit eBook.”
- A Project Gutenberg production volunteer (known as a “whitewasher,”
after the Mark Twain book) will check your upload prior to posting.
Uploads with major problems (invalid HTML/CSS, non-working or
incorrect links, etc.) will not be published and must first be
corrected by the submitter.
- Once the eBook is added to the Project Gutenberg collection, confirm
that it displays correctly and that all metadata are correct. Note that
automatic cataloging (i.e., adding metadata) happens when a new eBook
is published, and is then finalized by a human cataloger a few days
later.
- If possible, stay in touch into the future. If we receive errata
reports that require access to source material, or are stylistic or
subjective in nature, we might get in touch to discuss potential
changes.
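As an illustration of the proofreading step above (removing page headers and footers, then de-hyphenating), here is a minimal Python sketch. It is not a Project Gutenberg tool. The function names, the `min_repeat` threshold, and the assumption that your OCR output is available as one string per page are all hypothetical. The results still need careful proofreading by a human: for example, a genuinely hyphenated compound that happens to break across two lines will be joined incorrectly.

```python
import re
from collections import Counter


def _key(line):
    # Ignore digits, whitespace, and case so that running heads such as
    # "12 MOBY DICK" and "14 MOBY DICK" compare as equal.
    return re.sub(r"[\d\s]+", "", line).lower()


def strip_running_lines(pages, min_repeat=3):
    """Drop a page's first or last line when similar text recurs on at
    least `min_repeat` pages (a rough heuristic for headers/footers)."""
    split_pages = [page.strip("\n").split("\n") for page in pages]

    # Count how many pages start or end with each normalized line.
    counts = Counter()
    for lines in split_pages:
        counts.update({_key(lines[0]), _key(lines[-1])})

    cleaned = []
    for lines in split_pages:
        if lines and counts[_key(lines[0])] >= min_repeat:
            lines = lines[1:]
        if lines and counts[_key(lines[-1])] >= min_repeat:
            lines = lines[:-1]
        cleaned.append("\n".join(lines))
    return cleaned


def dehyphenate(text):
    """Rejoin words that were split with a hyphen at the end of a line."""
    return re.sub(r"(\w)-\n(\w)", r"\1\2", text)


def clean_ocr(pages, min_repeat=3):
    """Strip running lines from each page, join the pages, de-hyphenate."""
    body = "\n".join(strip_running_lines(pages, min_repeat))
    return dehyphenate(body)
```

A typical use would be to call `clean_ocr(pages)` on the list of per-page OCR strings and then proofread the returned text against the scans. Raising `min_repeat` makes the header and footer detection more conservative.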
The importance of using the online tools at the upload
site cannot be overstated. Use these tools
on early versions of your new eBook – even before you are done with
more than a few pages. It is much easier to meet the requirements for
accuracy, validity and quality from the beginning of your work than
to spend hours or days fixing problems later.
Each eBook is different. Generally, though, the time spent on
formatting and proofreading far exceeds the time spent on scanning and
OCR, or on typing.
If you have questions, get in touch! The contact
information page can put you in
touch with the helpdesk.
Thank you for your help and interest in creating new Project Gutenberg
eBooks!