Vital Records of Scituate
Massachusetts to 1850
Description of the project


D. Pane-Joyce
Version 1.0, Jan 2002
Version 1.1, Apr 2003




Description of the project. This project is the Annotated Vital Records of Scituate. The first phase, described below, is to transcribe the Vital Records of Scituate, Massachusetts, to 1850 word for word for the web. There are two volumes, the first for births and the second for marriages and deaths, for a total of about 900 pages. There are numerous advantages to having a version of the vital records on the web. Three main ones are:

What to do after Phase 1? Phase 1 includes no annotations, but annotations are what interest me. Here are a few things I'm considering:

These possibilities aren't all equally easy or important, and I haven't decided what to do. In any case, all annotations will be presented in color so it will be easy to distinguish the original content from what's been added. And, of course, references will be cited for any such additions. I'm sure to start on one or more of these additions before Phase 1 is completed. (Transcription is particularly boring.)

Phase 1, the transcription. The transcription phase will take me several months. The first installment (version 1.0, Jan 2002) of the vital records included over 200 pages of text, about 25% of the two volumes, primarily the pages that include families of personal interest. The second installment (version 1.1, Apr 2003) extends that coverage to about 40%. It's going a little slower than I had imagined, since I haven't gotten much feedback on the project. I've gotten only two email messages since I put up the last version over a year ago. In comparison, I get two messages a day about the genealogical reports I've put up.

As source material, I took photocopied pages and scanned them into image files. I also used the scanned images at http://genweb.net/~blackwell/books.html. These images were then converted into text files with OCR (optical character recognition). The error rate of the conversion was very high. To reduce it, I had to edit some of the images with Adobe Photoshop to align the text horizontally. Still, the OCR errors were innumerable. I think I could have reduced the errors by (1) scanning the original printed pages rather than photocopies, (2) scanning the images at a finer setting (smaller pixels), and probably (3) using better OCR software.

Here is an example of a passage shown (1) right after OCR and (2) in its final version.

The text right after OCR:

The final version:

As you can see, there's a lot of work involved in cleaning up the text. This example is typical. Some pages were in better shape, and a few were so much worse that I had to type them in myself. Each page takes 15 minutes or more of work. Most of that time goes to correcting OCR errors, but there is also markup to add (such as boldface and tags for links), proofing to do, and the state of every page to keep track of. After editing a page, I printed it out and compared it to the original. I did a fairly good job of proofing, but I'm sure plenty of errors remain. If you have any doubt about something here, check the original printed version or the images of it, or, even better, check the microfiche of the original records. Let me know if you find any errors.


D. Pane-Joyce
Scituate Genealogy
© Jan 2002, Apr 2003. All rights reserved
Located at http://aleph0.clarku.edu/~djoyce/gen/scituate/VitalRecords.html