Turn Back Time: The High Street

Tonight is the first in a four-part series looking at the historic high street over six eras. Shopkeepers will experience what it was like to live in these eras, and tonight's episode follows an 1870s baker, butcher, grocer and ironmonger. The show will be aired on BBC1 at 9pm.

We have already included some items in the archive that are related to the high street, especially in the Vanished Leicester collection, the Historical Archives and among the Ghost signs.

Digitisation & OCR

Disclaimer – This is not gospel. For sensible advice on digitisation go to the JISC Digital Media site http://www.jiscdigitalmedia.ac.uk/

The first phase of this project has required a substantial amount of digitisation. Through working with partner organisations we have covered a wide spectrum of media including moving image, audio, and rare books. In the next few weeks each of the groups will report on their travails but the university library will go first to discuss the digitisation of rare books.

A key item identified for digitisation is the important historical study of Leicestershire – “History and Antiquities of the County of Leicester” by John Nichols, which is a large work (slightly larger than A3 and running to 5,000 pages) with around five million words and many illustrations. The work is in four volumes, each of which is split into two parts. It is a very important piece of work and quite unprecedented in terms of scope and detail, even in comparison to similar works being undertaken at the time. It is also quite difficult to navigate due to poor pagination and its sheer size.

The recent BBC series Story of England featured Michael Wood referring to this book several times. What we want to achieve is a digital version of the book which is searchable and available to all online. It would be a nice test of how well this succeeds if we asked Michael Wood to use it for his research, in terms of how his familiarity with the print copy translates to a keyword searchable version.

Due to the size of this book we sent it to a commercial digitisation service and it is due back with the digital files next week. We have seen samples and are very pleased with how it has turned out. There were many worries with the original item as there is significant bleed-through on the pages, and the volumes are very unwieldy.

One major issue that we have had to tackle is OCR. The typography contains the long ‘s’, which resembles an ‘f’, and this is how conventional OCR software reads it. The typography also includes ligatures, which are not well read by OCR. The solution proposed was to use ABBYY FineReader XIX http://www.frakturschrift.com/ high-end OCR software designed to cope with old texts. The samples we have seen are fantastic – the quality of the OCR output is very high indeed, almost as good as having someone transcribe the text. As we are using CONTENTdm for the archive, which uses the OCR output to build an index for full-text searching, we felt this was an important requirement and thus could justify the not insignificant extra cost of this software, which is charged on a per-page basis. The decision was whether we could live with users who search for “Leicester” getting no hits because it is “Leicefter” throughout the index. We didn’t think we could.
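To make the search problem concrete, here is a minimal sketch in Python. It is not how CONTENTdm or ABBYY work internally. The function names are hypothetical, and the rule that every non-final lowercase ‘s’ was printed as a long ‘s’ and misread as ‘f’ is a deliberate simplification of real eighteenth-century typesetting.

```python
def misread_long_s(word: str) -> str:
    """Return the word as naive OCR might index it.

    Simplifying assumption: every lowercase 's' except a word-final one
    was printed as a long s and has been read as 'f'.
    """
    if not word:
        return word
    body, last = word[:-1], word[-1]
    return body.replace("s", "f") + last


def search(index: set, query: str) -> bool:
    """Exact-match lookup of a lowercased query in a set of indexed words."""
    return query.lower() in index


# A toy full-text index built from naive OCR output (hypothetical data).
printed_words = ["Leicester", "antiquities", "county"]
naive_index = {misread_long_s(w.lower()) for w in printed_words}

print(search(naive_index, "Leicester"))   # the spelling a user would type
print(search(naive_index, "Leicefter"))   # the spelling the index holds
```

With an index built this way, a user typing the modern spelling finds nothing, while only the misread form matches. That is the behaviour we wanted to avoid by paying for OCR that recognises the long ‘s’ correctly in the first place.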

Website live

We are pleased to say that the My Leicestershire Digital Archive is now available at www.myleicestershire.org.uk

The archive is still a work in progress and more content will be added in the coming weeks and months. A major software upgrade is scheduled for 8-12 weeks' time, which will introduce some Web 2.0 features such as comments, rating, and tagging, as well as a complete redesign of the interface.

If you have a moment please take a look and give us your feedback.  It would be much appreciated.