Thursday, December 21, 2017

The problem with digitizing our discipline

There is much rejoicing about the benefits of computer technology for the humanities in general and for New Testament textual criticism in particular. I too rejoice as I suspect you do. Who among us is not thrilled, for example, by the ease of access to so many manuscript images or by the wonderful NTVMR or by the fact that the texts of our modern Greek New Testaments are all freely available online?

But here I want to sound a warning about computer technology. We all know how fast technology changes. Probably none of you have a flip phone any more or use floppy disks to save your work (although I know Maurice has some truly old-school tech he still works with). Technology changes rapidly, usually for the better. But therein lies the problem. Technology changes rapidly. That means that tools that were great five or ten years ago may be difficult or even impossible to use now.

This is one of my fears about digital critical editions. The new digital ECM may be great now, but will it be great in ten years? Maybe, but how do we know? We can’t, because we don’t know the future. There is always talk of future-proofing our digital work. But let us be honest: that is a myth. When I worked on the CBGM, there were parts of the software for the Catholic Letters that only ran on Mac OS 9. What happens when the computer running that defunct operating system dies?

Nor is the internet the solution. Look, for example, at the genuinely wonderful Codex Sinaiticus website. When it came out in 2008, it was the baddest manuscript viewer in town. You could zoom in and out, switch to raking light, and even select words from the transcription and watch them be highlighted right there in the image—it was great. And most of it still is great.

But when I use the site in Chrome now, look at what happens.

The zoom disappears in Chrome
The zoom function does not even show up. I have to move my mouse around until it turns into a hand, and then I have to guess how far I am zooming because there is no visual indicator of the zoom level.

Things are better in Microsoft’s Edge browser, but still a little off.

The zoom is not quite right in Edge, but it is usable
Compare this to Tischendorf’s facsimile of 01 which, as a technology, works just as well today as it did on the day it came off the press in 1863. Obviously, the website for 01 has major advantages over Tischendorf’s facsimile. There is no question about that. But that is not my point. My point is that the usability of Tischendorf’s edition has aged less in 150 years than the Sinaiticus website has in 15! Will the Sinaiticus website work at all in 30 years? 50? 100? Who knows.

What I do know from designing websites for the last 17 years is that there is no way to guarantee that a site built today will still be usable in 10 or 15 years. And usually, the more bells and whistles a site has when it’s built, the worse it ages. Part of this is a matter of funding. It is easier to fund an exciting new digital project than to maintain or update an old, flagging one. But I do not see that changing any time soon.

So the problem remains, and it is a serious one we all need to think more about in our mad dash to digitally revolutionize our discipline. Are there still things that are better in analog than digital? If so, what are they? Are there things that can be done digitally but shouldn’t be? How can we ensure that our best digital work is still accessible in 100 years’ time? These are just some of the questions we need to ask ourselves.


  1. Finally, a post on ETC that I can comment on! The phenomenon you describe is very real; many technologies are becoming obsolete in a hurry. The risk of that happening to very popular file formats such as Word or PDF documents is somewhat smaller; there is usually backward compatibility built into newer versions of MS Office. But how long will MS Office be around? Who knows? Probably decades, but there's no way of knowing for sure. Yet, I think that you'll be able to use those file formats for a long time to come.

    Also an issue: 'link rot'. What happens if you link to someone's online article in your blog post, and that person removes their online content some time later? The link in your blog post still exists, but goes to... nothing. That could even happen if the technology behind popular web sites changes (say, switches to a more modern viewer).

    Technology is a wonderful enabler of things that might otherwise not be possible (or very hard to do), such as CBGM, but it is not without its own risks.

    1. Link rot should be a major concern as well. What happens in 10 years when the BnF or the Vatican change all their servers and the URLs to these wonderful images change? I know Troy Griffitts has had to deal with this with the VMR at times. So some of this is a matter of constant maintenance. But therein lies the problem. I suppose the same problem faces libraries, but print books are very low maintenance.
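On the link-rot point: auditing a site for dead links starts with harvesting its outbound URLs, which needs nothing beyond the standard library. A minimal sketch in Python (the page snippet and URL are invented for illustration; a real audit would then request each URL, e.g. with urllib, and log the failures, which is omitted here to keep the sketch offline):

```python
from html.parser import HTMLParser

class LinkExtractor(HTMLParser):
    """Collect the href of every <a> tag — the first step of a link-rot audit."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def external_links(html_text):
    """Return every http(s) link found in a page's HTML."""
    parser = LinkExtractor()
    parser.feed(html_text)
    return [u for u in parser.links if u.startswith(("http://", "https://"))]

# Invented example page: one external link, one internal anchor.
page = '<p>See <a href="http://example.org/ms/01">the images</a> and <a href="#top">top</a>.</p>'
print(external_links(page))  # only the external URL survives the filter
```

Running such a harvest periodically against one's own archive, and checking each collected URL, is the cheapest defence against silent rot.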

  2. The best data preservation strategy is the same as it has always been: multiple and continuous copying. The Internet Archive looked into what physical formats were needed to preserve its data long term. The best answer they came up with was to keep multiple copies and make fresh copies frequently.

    This means that while tools like the VMR or the various library websites are great, at some point we need to have multiple copies of all these things. So the VMR, BnF, Vatican, British Library, CSNTM, etc. should have copies of each other's manuscripts and metadata. Ideally the metadata should be downloadable so that various groups can build their own tools to access it, but I think at this time libraries would be unwilling to release their copyrights on their digital copies.
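The "multiple copies" strategy only works if the copies can be checked against one another, and checksums are the usual mechanism for that. A minimal sketch in Python (the file names and contents are invented; in practice institutions would compare digests across their repositories on a schedule):

```python
import hashlib
import os
import tempfile

def sha256_of(path):
    """Hash a file in chunks so large image scans don't exhaust memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()

def copies_agree(paths):
    """True only if every copy yields the same checksum."""
    return len({sha256_of(p) for p in paths}) == 1

# Demo with two throwaway 'copies' of the same (fake) image data:
with tempfile.TemporaryDirectory() as d:
    a = os.path.join(d, "copy_a.jpg")
    b = os.path.join(d, "copy_b.jpg")
    for p in (a, b):
        with open(p, "wb") as f:
            f.write(b"fake image bytes")
    intact = copies_agree([a, b])       # copies match
    with open(b, "wb") as f:
        f.write(b"bit-rotted bytes")    # simulate silent corruption
    corrupted = copies_agree([a, b])    # copies no longer match

print(intact, corrupted)
```

When a mismatch is found, the damaged copy is replaced from a healthy one, which is exactly the "regular copying" the Internet Archive recommends.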

  3. As a former librarian, here is another part of this issue to consider apart from the technical issues – half-lives of information. Books are written, read and cited, read by others who are following up citations, and then either (rarely) become a long-term standard work or (commonly) become a work that had relevance for a somewhat limited period but over time becomes less cited and less read. In STEM subjects this period of relevance – the half-life where half of all reading and citation of the book occurs – is very short; in the humanities it may be over a decade with a long tail-off period. After a few decades the book is either weeded from collections or relegated to off-site closed-access storage. If later the information in the book becomes relevant again, it can usually be located in a few libraries.

    In contrast to the long half-life of information in humanities books, information in online items such as blog posts, forum posts, web pages and the like has a much shorter half-life, often being out of date within a few days. Some of the information will be archived in multiple places, but not all, and much information within a decade of its creation exists only on a single computer maintained by a single person or organisation. Eventually the computer fails, or the organisation decides the information has no value to them and is not worth the cost of storage, or the person loses interest in the apparently out-of-date information, changes their role, retires, or dies; nobody maintains the information and it ceases to exist. If later the online material becomes relevant again – too bad, it can’t be located anywhere.

    Bob Relyea is correct that it is a good idea to have multiple and continuous copying, but who is responsible for the labour, management, testing, hardware, software, etc. to do this across the full range of online resources including many that seem at the time to be of little long term relevance? Who will continue with this responsibility not just for a few years, or a few decades, but for hundreds of years?

  4. The angst expressed here is accompanied by some very muddled thinking.

    There are basic formats which do not age and which do not go away: plain text, XML, USFM for scripture, standards-compliant HTML. None of these is suddenly rendered inaccessible by technological progress. It is the use of fancy, proprietary technologies that causes the issues described.

    And this is not new. Closed-source, proprietary formats and arbitrary copyright restrictions have been criticised for exactly this reason for as long as computers have existed. Richard Stallman launched the Free/Libre Software movement in the 1980s and wrote his General Public License to counter the risk of data being rendered inaccessible by programmes becoming unavailable. Larry Lessig created the Creative Commons movement and its free licensing regime in 2001 in order to ensure that created works and texts remain usable even when the rights owners are no longer around. OSIS XML was created to provide a reliable international standard for encoding Scripture, including all its variations, its apparatus, etc., in an open and lasting fashion. Around the same time SIL, UBS and others agreed on USFM as another equally accessible and lasting standard for encoding Scripture, particularly translations. No text encoded in this fashion is inaccessible now, nor will it ever become so. The tools to handle such texts are well documented, ubiquitous and often open source and freely licensed themselves.

    It is shortsightedness and unwillingness to do one's homework which leads to inaccessible websites and texts. Not technological progress.

    Peter von Kaehne

    1. Yes, some formats have a longer life than others, and HTML and XML fit that category. For now. But who wants to make a website with static HTML nowadays? And even XML has to be given its specifications by something like TEI. But TEI itself has certainly not been static.
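For what it is worth, the durability claim for XML is easy to demonstrate: an apparatus entry encoded in a TEI-like way stays machine-readable with nothing but a language's standard library. A minimal sketch (element names loosely modeled on TEI's app/lem/rdg; the sigla and readings are invented for illustration, not drawn from a real apparatus):

```python
import xml.etree.ElementTree as ET

# A simplified apparatus entry in the spirit of TEI's <app>/<lem>/<rdg>.
# The witness sigla and readings below are invented for illustration.
entry = """
<app>
  <lem wit="03">θεου</lem>
  <rdg wit="01">κυριου</rdg>
</app>
"""

root = ET.fromstring(entry)
lemma = root.find("lem")
variants = [(r.get("wit"), r.text) for r in root.findall("rdg")]
print(lemma.text, variants)
```

The schema may evolve, as TEI has, but data encoded this way degrades gracefully: even an outdated file remains parseable text, unlike a viewer tied to a dead browser API.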