22 April 2015: It was a bright sunny morning and the taxi driver was keen to impart his tricks for the best route into town. (Look, no traffic lights!) It was also the day I was offered the job on Linguistic DNA.
Before Linguistic DNA, I looked to EEBO-TCP to provide context for shifts in the language of bible translation. It was quantifiable language data, enabling me to work out a loose comparison between the first century of English print (-1569) and the fifty years that followed (-1619) and so sample language change between Geneva and King James. This was especially applicable to my work with translations of the Hebrew chayil. Forms of valour are rare in (what survives of) English (in print) pre-1570, but “mighty man of valour” became the dominant translation for the Hebrew phrase ish gibbor chayil in the King James Version of the Bible. There are thirty times as many instances of valour in the later period, though the document count increases only three-fold. The astute will wonder whether this might be a case of documents getting longer or using the term more densely. At best, statistics raise questions rather than providing answers, and it’s not my intention to deal with the questions my deliberately provocative statistic raises here.
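For readers who want to see why that statistic is provocative, here is a minimal sketch of the arithmetic in Python. The two ratios (thirty-fold and three-fold) come from the paragraph above. The hit and token counts further down are hypothetical, invented only to show how normalising by corpus size could shrink the apparent increase; they are not EEBO-TCP figures, and the function names are my own.

```python
from fractions import Fraction


def per_document_change(hit_ratio, doc_ratio):
    """Change in the average number of hits per document between two periods."""
    return Fraction(hit_ratio) / Fraction(doc_ratio)


def per_million(hits, tokens):
    """Relative frequency: hits per million tokens."""
    if tokens <= 0:
        raise ValueError("token count must be positive")
    return hits * 1_000_000 / tokens


# Ratios from the text: thirty times the hits, three times the documents.
print(per_document_change(30, 3))

# HYPOTHETICAL counts, chosen only to illustrate the normalisation step.
early = per_million(hits=20, tokens=50_000_000)
late = per_million(hits=600, tokens=300_000_000)
print(early, late, late / early)
```

With these made-up numbers the raw count rises thirty-fold but the rate per million tokens rises far less, because the later sample is assumed to hold six times as many tokens. Whether anything like that holds for the real data is exactly the question left open here.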
22 April 2016: A year on, LDNA has strategies to query the fullness of EEBO-TCP (not to be confused with the fullness of English print from the period) and illuminate not only the distribution of individual lemmas, but collective relationships between groups of words; as well as plentiful ideas to expand those queries. As we move closer to a point of analysis and interpretation, we need to treat our processor’s outputs with ever increasing awareness of the heterogeneous nature of EEBO-TCP, the diversity (and sometimes perversity) of what’s in it. (LDNA blog readers can expect reflections on the representativeness of EEBO in a future post.)
Personally, I look forward to scrutinising one of the special issues affecting our data: English as a vehicle for translation. Many of the longest texts printed in sixteenth-century England were translations (and the shorter ones too!). It is in translated texts that a majority of the early “valours” can be found. Is this about genre? (The popularity of knights’ legends and their association with valour and its cognates?) Or the priming of the translator’s lexicon, likely to use Latinate language where French had gone before? Will our analysis show up pairings in line with Gideon Toury’s hypothesis: that where the target language lacks prestige (and the translator confidence), doublets are used to express an idea that took up only one term in the source?
Valour, I should note, is not the kind of epi-concept Linguistic DNA expects to expose; but if a word is known by the company it keeps, it is relevant to ask how much of that company migrates between languages, and to wonder how we (in an evolution of the present project, perhaps) can assess that. How might computational approaches take into account the prevalence of translation in early modern English? A thorough answer is, I suspect, beyond the scope of LDNA; but in the next few months, in consideration of the contextual research theme (RT1), I shall continue to mull over the question of how translated texts contribute to concept formation, and have pledged some first reflections on this at SHARP 2016.
 The quantity of printed works increases throughout the period. Anupam Basu’s image effectively illustrates the relationship between what has survived to be catalogued in the ESTC and what has been transcribed for EEBO-TCP, as well as the growth in both across time.
 It would be better to observe that those who want to take token counts into account are presently well advised to make use of Andrew Hardie’s CQPweb interface, while noting that his EEBO-TCP v 3.0 contains rather less of EEBO-TCP than is now available to UK universities via JISC.
 See G. Toury, Descriptive Translation Studies and Beyond (John Benjamins Publishing, 1995) 102-112.