Friday, October 31, 2014

Report from the Digital Collation Conference in Münster

The following is a report from Peter Gurry who attended the Research Summit on Collation of Ancient and Medieval Texts in Münster on 3-4 October.
* * *

A few weeks ago I attended the Research Summit on Collation of Ancient and Medieval Texts in Münster, Germany, and I thought I would offer a brief summary of some of the papers. The conference was designed to introduce textual scholars to the ins and outs of electronic collation in general and CollateX in particular. The first day was primarily focused on papers from invited speakers and the second day was set up to be more hands-on with CollateX. Readers of this blog will be interested to know that versions of Collate have been used for the Editio Critica Maior (ECM) since 1 John (published in 2003).

The first presentation was from Caroline Macé of the Universität Frankfurt who spoke about her experience editing Gregory of Nazianzus. She spoke about the choice between collating and transcribing and suggested that the right choice depends on the purpose of the edition being made. For Gregory, she had around 140 manuscripts and decided that transcribing these would have been too much work with too little benefit. Her own preference, in fact, would be to have automated transcriptions from digital collations rather than automated collations from digital transcriptions.

Next up was Peter Robinson whose pioneering work as a student at Oxford in the 1980s led to the first version of Collate (history here). Robinson spoke about misconceptions of digital collation, the main one being the belief that the computer does all the work. In actual fact, Robinson wrote Collate only after becoming dissatisfied with other collation software, which he felt was too mechanical; he wanted something that required editorial input during the collation process itself. He went on to argue that the purpose of a digital collation should not simply be to record differences but to use those differences to understand the relationships of witnesses. Like the CBGM, Robinson wants to use all textual variants for genealogy rather than just a selection. The use of complete collations is what led to a revision of previous genealogies in the recent electronic edition of Dante’s Commedia.

The third presentation was offered by Klaus Wachtel and David Parker about their use of Collate for the ECM. For John’s Gospel, the team in Birmingham has incorporated Collate into their own editing software (mentioned here) which allows them to move from transcriptions, to regularization of spelling, to construction of the apparatus, all in one place. It was impressive. In all, Parker said that the new software has made constructing the apparatus faster and more accurate. If my notes are right, he said it took them about 6 months to construct a full apparatus for the Greek witnesses of John.

Barbara Bordalejo presented next on the praxis of collation and gave some fun examples of how hard but also how important it can be to electronically encode the complexities encountered in a manuscript. She showed examples of changes in the first draft of the Declaration of Independence, including one discovered in 2010 from “our fellow subjects” to “our fellow citizens”—a small change that makes a big difference! (But given my current home I shall say no more about that.) At the end of her talk there was a brief but lively back-and-forth over whether an expunction dot should be marked in a transcription as a “deletion” or as “marked for deletion” in order to distinguish it from the ways other scribes in the same manuscript deleted text.

The final talk was offered by Ronald Dekker, one of the programmers behind CollateX, who talked about some of the principles behind the software’s collation algorithms. The hardest part, as any human collator knows, is deciding how to segment the texts for comparison; the actual comparison is the easy part. Peter Robinson told us at one point that only about 1–2 percent of his original code was actually for comparing the texts; most of the rest was used to identify which parts of each text to compare with each other (a process known as “alignment”). Dekker illustrated the complexity of programming these decisions by showing that two witnesses with 100 segments (or “tokens”) could potentially produce as many as 10,201 possible points of disagreement (or “nodes”).
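Dekker’s figure is easy to verify. If an alignment is modeled as a grid of nodes—one node for each pair of positions in the two witnesses, as in classic edit-distance alignment—then two witnesses of 100 tokens each yield a (100 + 1) × (100 + 1) grid. The sketch below is my own illustration of that arithmetic, not CollateX’s actual implementation:

```python
def alignment_grid_nodes(tokens_a: int, tokens_b: int) -> int:
    """Count the nodes in an edit-distance-style alignment grid.

    The grid has one extra row and column for the "empty" starting
    position, giving (tokens_a + 1) * (tokens_b + 1) nodes in total.
    """
    return (tokens_a + 1) * (tokens_b + 1)

# Two witnesses of 100 tokens each, as in Dekker's example:
print(alignment_grid_nodes(100, 100))  # prints 10201
```

This is why alignment, not comparison, dominates the work: the number of nodes grows with the product of the witnesses’ lengths, and a real collation must weigh paths through that grid rather than simply compare tokens pairwise.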

Unfortunately I had to catch a train the next morning so I wasn’t able to attend the second day of the conference. But the first day provided a good sense of where digital collation is and how it is being used. And as always, it was good to meet and talk with scholars editing a variety of other texts. The only real disappointment for me was learning that the location had originally been set for Iceland. I guess there’s always next time.

Finally, my thanks to Joris van Zundert and Klaus Wachtel for all their behind the scenes work in organizing the conference for us.
