The following abstract has been [Edit: March 2017:] accepted for SHARP 2017: Technologies of the Book (9-12 June, Victoria, BC). It will be part of a panel under the common title “Reading and writing to disk: Sheffield and Books in the Digital Humanities”.
The Impossibilities of Reading Big Book Data: Studying Concepts and Context with EEBO-TCP
Computational techniques enable humans to seek out patterns in collections of texts that exceed what one human can read. This permits the identification of textual and linguistic phenomena that may otherwise defy human recognition. It first requires texts in a suitable digital format, texts that are "machine readable". However, the use of the verb "read" to describe the discrete activities of human and machine can mask considerable differences between the two audiences' needs.
Emerging from a collaborative project that seeks to identify and trace the movement of paradigmatic terms in early modern English, this paper considers different ways of moving from the products of machine reading to the work of human reading (and back again), weighing up their strengths and weaknesses in the context of this work. The paper will respond to questions such as:
- What may be gained, lost, or simply hidden when historical texts are prepared for computational analysis?
- What do the different audiences (computers, humanities scholars) not read?
- How can humanities scholars test claims about collections of texts that are too big to read?
- How does one systematise close reading with big data?
Reflections will be grounded in current work with early modern English text collections, particularly the remediation of Early English Books Online by the Text Creation Partnership (EEBO-TCP).