ocr

Early Modern Digital Agendas Summer Institute at the Folger

  • Posted on: 22 June 2015
  • By: admin

Today, Jonathan Hope introduced us to the tools made by the Visualizing English Print project, beginning with Ubiquity, a tool that performs word-frequency calculations and writes the results to a CSV file, and that also creates HTML pages from your plain-text files to visualize where words occur. Using Shakespeare's plays, people then built visualizations out of pen, paper, clay, tape, and sticks.
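
For readers who have not seen this kind of output, here is a minimal sketch of a word-frequency calculation that ends in CSV. It is not Ubiquity's code or its file format: the function names, the tokenizing rule, and the column names (`word`, `count`, `relative_frequency`) are all assumptions made for illustration.

```python
import csv
import io
import re
from collections import Counter


def word_frequencies(text):
    """Count lowercase word tokens in a plain-text string (hypothetical helper)."""
    # Assumed tokenizing rule: runs of letters, with an optional internal apostrophe.
    tokens = re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())
    return Counter(tokens)


def frequencies_to_csv(counts):
    """Return CSV text with assumed columns: word, count, relative_frequency."""
    total = sum(counts.values())
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["word", "count", "relative_frequency"])
    # Most frequent words first; ties are broken alphabetically for stable output.
    for word, count in sorted(counts.items(), key=lambda item: (-item[1], item[0])):
        writer.writerow([word, count, f"{count / total:.4f}"])
    return buffer.getvalue()


sample = "To be, or not to be, that is the question."
print(frequencies_to_csv(word_frequencies(sample)))
```

To process a real text, you could read a plain-text file into a string, pass it through the same two functions, and save the returned text to a `.csv` file.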

[pictures coming soon]

I uploaded 7 ECCO-TCP texts to the tool

eMOP Interim Report Now Available Online

  • Posted on: 10 March 2014
  • By: admin

The Early Modern OCR Project (eMOP) has submitted its interim report to the Andrew W. Mellon Foundation, and we now have permission to post sections of that report online. Prepared by PI and IDHMC Director Dr. Laura Mandell and by eMOP co-project managers for year two Matthew Christy and Elizabeth Grumbach, this report details work performed during the first year of the grant project.

eMOP news: blog post on post-processing workflow

  • Posted on: 11 October 2013
  • By: admin

Over the past few weeks, IDHMC lead programmer Matt Christy and I have been settling into our new roles as eMOP co-project managers for Year Two of the Mellon grant project. We've also begun looking towards the future, having a series of planning meetings with our post-processing collaborators.

Look forward to several new things from eMOP this fall, including the release of Franken+, a tool created by eMOP graduate student researcher Bryan Tarpley that makes it easier to create font training libraries.

More from eMOP: OCR Tips and Tricks

  • Posted on: 31 July 2013
  • By: admin

IDHMC Lead Programmer Matt Christy has been working with Gamera over the past few weeks, since it is among the three open-source OCR engines we've chosen to test our theory for improving the OCR of early modern texts. We hope you'll visit the post for a look at the lessons we've learned and the progress we've made.

Especially illuminating is Matt's description of Gamera, the software's OCR toolkit, and the collaborative effort it has taken for eMOP to come to certain conclusions about the effectiveness of the software for large data sets.

OCR Tips and Tricks from eMOP

  • Posted on: 10 July 2013
  • By: admin

IDHMC Lead Programmer and self-dubbed "code monkey" Matt Christy has posted a series of short reflections about eMOP's Tesseract experiments. We hope you'll visit the page to get a look at the lessons we've learned, the goals we're trying to achieve, and the tips and tricks we have to offer the OCR community.