Thursday, March 30, 2006

Unique variants per manuscript page

In a recent message to Wieland Willker’s textual criticism list (here), Daniel Wallace estimates the number of pages of extant Greek NT manuscript at 2.5 million.

In a Kenneth W. Clark memorial lecture of 1997, Bart Ehrman (here) says:

‘No one knows for sure how many differences there are among our surviving witnesses, simply because no one has yet been able to count them all. The best estimates put the number at around 300,000, but perhaps it’s better to put this figure in comparative terms. There are more differences among our manuscripts than there are words in the NT.’

What is the basis of Ehrman’s calculation? There is no further reference.

If Wallace’s and Ehrman’s estimations are both correct then it strikes me that the overall rate of unique variants per manuscript page is incredibly low. This is only just above one unique variant per 10 ‘pages’ (which I presume in Wallace’s original context, i.e. of photography, means either a single side or the facing sides of an open codex). Obviously the rate of unique variants per page is not the same as the rate of errors since the same error could occur independently in two manuscripts and then only count as one unique variant. Nevertheless, I still find the ratio rather low. Has Ehrman been too conservative?
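
As a quick check of the arithmetic behind that ratio, here is a minimal sketch that takes both published estimates at face value. The variable names are illustrative only.

```python
# Estimates quoted above; both are rough figures, not counts.
variants = 300_000   # Ehrman's estimate of differences among witnesses
pages = 2_500_000    # Wallace's estimate of extant Greek NT manuscript pages

variants_per_page = variants / pages
variants_per_10_pages = 10 * variants_per_page
pages_per_variant = pages / variants

print(f"variants per page:     {variants_per_page:.2f}")
print(f"variants per 10 pages: {variants_per_10_pages:.1f}")
print(f"pages per variant:     {pages_per_variant:.1f}")
```

On these figures there are roughly eight manuscript pages for every unique variant, which is the sense in which the rate looks low.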


  1. PJW asked: 'Has Ehrman been too conservative?'

    ha ha ha. good one.

  2. This penetrating question is exactly what needs to be asked when Ehrman (and others) make statements like this one, which use a numerical estimate of the total number of variants among all extant mss and draw from that estimate (which may be correct for all I know) conclusions that are completely fallacious.

    In one of his anti-majority text articles Wallace made the same fallacy, trying to create the impression that the Byzantine text is not very homogeneous after all, because Kurt Aland had written (prior to the publication of the relevant volume of Text und Textwert) that a short section of the NT exhibited some enormous number of variants among Byzantine mss. Aland's quote was similarly misused by several other authors since Wallace. Enter Maurice Robinson, who presented an article at ETS in 2004 where he outlined the actual variants listed in T&T after it had been published and showed that in the very passage in question the byz was actually very uniform, having only a handful of very minor variants that were repeated in various permutations across the hundreds of Byzantine mss.

    Both of the above examples of this fallacy are cases of scholars using the data tendentiously rather than scientifically. In Wallace's case it was manipulated so as to undermine the impression that the byz is very uniform--which it is, regardless of its quality. In Ehrman's case it was manipulated to make the NT mss appear as unreliable witnesses to the original--which also is inaccurate, regardless of how much he dislikes its Author.

    Kudos to PJ for avoiding the errors of Wallace and Ehrman, who in these cases drew conclusions from text critical data that were in fact predetermined by their theologies.

  3. Eric, thanks for the 'kudos', but I'm not sure I want it at Dan Wallace's expense. His '2.5 million' pages estimate I take to be simply a practical estimate. As for the nature of the Byzantine text it would be good to have a debate about this some time.

    Obviously the way Ehrman brings the number of variants into connection with the number of words in the NT does not show very much. However, that does not mean that his 300,000 figure is wrong. Has someone before him laid out the basis for such an estimate?

  4. Here's a stab at an estimate:

    James 1 has 435 variants from its mainline text in the ECM. Multiply that figure by 250 (for the chapters of the NT) and you get 108,750 variants.

    Luke 10 has approximately 575 variants from the mainline text of IGNTP (but this does not include many of the nonsense readings or spelling variants, unlike the ECM which lists them all). Say we guess there are 1000 variants from the mainline text, we get a figure of 250,000 variants from the Original Text if we extrapolate this figure.

    However, if we take the Luke 10 figure as representative of the textual variation in the 117 Gospels and Acts chapters, and the James figure as representative for elsewhere, we get about 117,000 variants in the Gospels/Acts and about 58,000 variants in the rest of the NT (435 x 133 chapters). This yields roughly 175,000 variants in total.
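
    The three extrapolations in this comment can be reproduced with a few lines. This is only a sketch of the arithmetic as stated: the 250-chapter round number and the guess of 1000 variants for Luke 10 come from the comment itself, and the variable names are illustrative.

```python
# Figures taken from the comment above; the chapter count is the round
# number used there, and LUKE_10_GUESS is explicitly a guess.
NT_CHAPTERS = 250
GOSPELS_ACTS_CHAPTERS = 117
JAMES_1_VARIANTS = 435     # ECM, James 1
LUKE_10_GUESS = 1000       # guessed total for Luke 10

# Whole NT extrapolated from James 1 alone.
james_only = JAMES_1_VARIANTS * NT_CHAPTERS

# Whole NT extrapolated from the Luke 10 guess alone.
luke_only = LUKE_10_GUESS * NT_CHAPTERS

# Mixed estimate: Luke 10 rate for Gospels/Acts, James 1 rate elsewhere.
gospels_acts = LUKE_10_GUESS * GOSPELS_ACTS_CHAPTERS
rest_of_nt = JAMES_1_VARIANTS * (NT_CHAPTERS - GOSPELS_ACTS_CHAPTERS)
mixed = gospels_acts + rest_of_nt

print(james_only, luke_only, gospels_acts, rest_of_nt, mixed)
```

    The mixed estimate comes to 117,000 + 57,855 = 174,855, which rounds to the 175,000 quoted.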

  5. Thanks, Andrew, for these estimates. I'm still inclined to think that you, like Ehrman, are being rather conservative, since there are presumably variations in minuscules that have never been read and variations that have been considered too small for the Editio Critica Maior, which I can hear Klaus Wachtel saying is not an Editio Critica Maxima.

  6. FWIW PJ, I wasn't addressing either Wallace's estimate of ms pages or Ehrman's estimate of total variants. Rather I was pointing out the way the fallacy you debunked (using statistics about the total numbers of variants among large numbers of mss without attention to how many variants are actually the same one repeated multiple times) has been used tendentiously. Textual criticism is an area where scholars in the humanities need to properly apply mathematics, which unfortunately does not always happen.

  7. Exactly. More variants than verses in a single copy of the NT, but not more variants than pages of NT text.

    Big difference.

  8. PJW said: "there are presumably variations in minuscules that have never been read and variations that have been considered too small for the Editio Critica Maior"

    The first presumption is entirely correct, but the second I think not, unless we count itacistic variation; the ECM is "maxima" in the sense that it records wholly those MSS that were selected for inclusion. One problem with this procedure is the need on the part of the editors to discriminate between variant and error, which they sometimes fail to do...

    In my own "Editio Critica Maxima" of Jude I have found many new variants (none of which I can say have any claim of being original, but significant in terms of reception and interpretation).

  9. Tommy is right about the ECM being mind-numbingly exhaustive as far as the witnesses for which it provides evidence. It cannot even be criticised for itacistic variants - it stuffs it all in. It certainly is maxima in that regard.

    But even the fact that the ECM limits itself to (only!) 200 mss for the Catholic epistles means that there are not going to be many more variants out there in other mss.

    All textual critics should be required to re-read Colwell every few years. Here is what Colwell wrote about the later K(r) group:

    "compare 34 verses (Jn 9:1-34) of P66 and P75. They are basically the same genetic strain, yet they differ from each other 39 times in this short block of text - more than one to a verse. From the control period, take two manuscripts: the Isaac gospels and 2322. They belong to Soden's K(r) group - the last of the great Byzantine recensions of the Alpha text-type. In 31 verses of Mark 11 they never differ. Moreover, if ten K(r) mss are compared throughout the Gospel of Mark, "six of the ten agree [in 180 variants from Stephanus]. Every ms has at least 80% of these variants, and seven .. have over 90% ... and only four of the ten mss have more than 15 variants outside this list of 180" in the entire Gospel of Mark". (Hort Redivivus, pp. 168-9).

    When you add to Colwell's comment the fact that Mark is the most textually disturbed book in the NT, then you realise that no extant ms is going to add significantly more variants to the ECM in the Catholic epistles than what it already presents.

  10. Andrew Wilson: "It cannot even be criticised for itacistic variants - it stuffs it all in."

    Not correct. The ECM generally does not record interchanges of vowels αι-ε, ε-η-υ-ι-οι, ο-ω.


    "... the fact that the ECM limits itself to (only!) 200 mss for the Catholic epistles"

    That is ca. 180 in James, but ca. 140-160 in the other Catholic Epistles.


    "then you realise that no extant ms is going to add significantly more variants to the ECM in the Catholic epistles than what it already presents."

    Let me take an example from the first verse of Jude:

    ECM 1,18-20
    a (=base text) εν θεω
    b om.

    Wasserman (my variation unit includes πατρι):
    1. εν θεω πατρι
    2. εν πατρι θεω
    3. εν θεω πατρασιν
    4. εν θεω πατρι ημων (noted in ECM at 1,23)
    5. εν θεω και πατρι (noted in ECM at 1,21)
    6. εν τω πατρι
    7. εν χριστω πατρι
    8. πατρι

    Thus, my 2, 3, 6, 7 are "new".

    I can assure you Andrew that with full collations, even of Mark, many new variants will come to light, many of which will be peculiar, nearly singular, and as I implied, hardly any will be of significance for the reconstruction of the "initial" text.

  11. Ahh - right you are Tommy. They specifically mention making itacism an exception in the ECM Intro 4.1, but they include most other sorts of orthographic variants.

    I have not got myself the ECM for Jude - how many mss is the ECM using for it, and how many did you use? Would you say that the situation you illustrate in verse 1 is typical for the rest of Jude, or not?

    Of course, I would expect more variants for Mark than are reported in critical apparatuses, because of the textual nature of Mark. The same would go for Jude (to a lesser degree) because it is more textually 'interesting' than some of the other catholic epistles, say 2 Peter.

  12. Andrew Wilson: "I have not got myself the ECM for Jude - how many mss is the ECM using for it"


    Andrew: "and how many did you use?"


    Andrew: "Would you say that the situation you illustrate in verse 1 is typical for the rest of Jude, or not?"

    The example is not typical in the sense that variants increase x2 in one variation unit (I just turned the papers and picked the first good example for illustration). Nevertheless, there is almost always an increase, and the number of variation units (places where there is variation) also increases.

  13. There are about five thousand words in the Epistle to the Hebrews. I transcribed and collated the thirty or so papyrus and majuscule MSS of Hebrews for my PhD dissertation. There were over two thousand points of variation. This makes me say the following: If a word can vary, it does vary in the MS tradition. That is, everything but the simplest words is likely to exhibit variants.

    Having said that, most of the variation is purely orthographic. Substantive variants (ones that affect the meaning) are less frequent and most of them are insignificant--involving transpositions, articles, etc. Variants that affect the meaning in a significant way (e.g. a different theological interpretation) are relatively rare.

    When you look at the big picture of, say, Greek NT MS transmission of a particular book, there are some fundamental questions:
    (1) How many MSS were made?
    (2) What is the half-life of a MS (i.e. the time it takes this population of MSS to halve)? (This will change from century to century.)
    (3) What is the scribal "error" function? I.e. How does one describe the probability that a scribe will change a word he or she is copying? This is impossible to answer in a precise manner, but you can make progressively better approximations of it as more information comes to light. Also, the question needs to be qualified--what class of change are you talking about: orthographic or substantive. I put "error" in quotes as it presumes you know the "correct" version. In both classes mentioned here, "correct" is either a wrong description (as with orthography when dictionaries did not exist) or an unknown thing (as with substantive variants).

    If you could answer these questions, you would be in a much better position to understand how far removed we are from what left the apostles' hands--you would know how many generations removed your MS copies are from the first generation of MS copies. (The mathematical form of this knowledge would be a probability distribution: one generation removed is very unlikely. Three generations removed more likely. Where the distribution peaks, I don't know.) You would also know how many changes are likely to have happened to the text.

    I have made initial forays in this direction here:
    (1) my dissertation, esp. chap. 9:
    (2) a MS copying simulation:

    The results of my PhD research tell me there are three primitive groups of texts for Hebrews. I believe that these correspond to three regions of the Empire: Egypt, Palestine and Asia Minor (with rather fuzzy edges to those regions).

    After looking at the results of a few runs of my MS copying simulation (where, by the way, I know the original state of the "text"), it became apparent that there is a simple way to recover a very good approximation to the original text. It involves isolating the readings associated with each region (for each reading, a simple majority within the group), then isolating the group readings of these groups (again, a simple majority).

    This could be tried for the variants recorded in UBS4 apparatus as follows:
    (1) encode the apparatus
    (2) generate dissimilarity matrices
    (3) perform multidimensional scaling on the dissimilarity matrices, thereby identifying groups
    (4) identify majority readings within each group
    (5) identify majority readings among these groups. (You need more than two groups to do this.)
    (6) Rate readings thus:
    A: all groups agree
    B: a majority of groups agree
    C: a minority of groups agree
    D: no group agreement (i.e. all groups differ)
    (7) Publish the result as "The Original Text: A Computer Reconstruction" and make a million bucks.

    Step (7) is optional. I would like to do (1) - (6) but it would take a few months and I don't have that kind of spare time.

    The validity of the approach assumes at least the following:

    (1) there are three or more discernible groups within the data
    (2) my simulation is a reasonably good approximation to what actually happened.
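
    Steps (4) to (6) above can be sketched for a single variation unit as follows. This is a hedged illustration, not the simulation or dissertation code: the function names and the witness readings are invented, "simple majority" is taken here to mean the most common reading, ties go to whichever reading is encountered first, and every group is assumed to have at least one witness. Grouping the witnesses (steps 1 to 3) is assumed to have been done already.

```python
from collections import Counter


def most_common_reading(readings):
    """Return (reading, count) for the most frequent item in a non-empty list."""
    return Counter(readings).most_common(1)[0]


def rate_unit(groups):
    """Pick a reading for one variation unit and rate it A-D.

    groups maps a group name to the list of readings attested by that
    group's witnesses at this variation unit.
    """
    # Step (4): the majority reading within each group.
    group_readings = [most_common_reading(r)[0] for r in groups.values()]
    # Step (5): the majority reading among the groups.
    reading, count = most_common_reading(group_readings)
    n = len(group_readings)
    # Step (6): rate by how many groups support the chosen reading.
    if count == n:
        return reading, "A"      # all groups agree
    if 2 * count > n:
        return reading, "B"      # a majority of groups agree
    if count > 1:
        return reading, "C"      # only a minority of groups agree
    return None, "D"             # all groups differ


# Invented readings "a" and "b" for three hypothetical regional groups.
unit = {
    "Egypt": ["a", "a", "b"],
    "Palestine": ["a", "b", "b"],
    "Asia Minor": ["a", "a", "a"],
}
print(rate_unit(unit))
```

    With exactly three groups a C rating cannot occur, since any two agreeing groups already form a majority; C only becomes possible with four or more groups.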

    A couple of final observations:
    (1) If the early generations of MSS were copied in relative isolation then the most rapid divergence of the whole MS tradition would have occurred then; the fastest rate of change happens when there is nothing (such as comparison with nearby copies) to restrain change. By the same logic, rapid divergence also happens when a local population of MSS is decimated, as in times of severe persecution.
    (2) There were very few MSS early on because the demand for them was low. (Total Christian population at, say, 70 AD might have been 0.3% of the Empire.) By contrast, the demand at 300 AD was huge (perhaps 20% of the Empire Christian?)
    (3) A crucial question is how often exemplars came from other regions.


    Tim Finney

  14. Tim, These points are all incredibly interesting. Have you ever tried to estimate the number of NT manuscripts that must have been produced before Constantine?

  15. "The accumulated errors of fourteen centuries of manuscript copying" (RSV introduction, quoted on an Islamic website)

    A phrase much bandied about, but does it really mean anything? Tommy, in your study of Jude does the number of variants per chapter increase as the age of the manuscript decreases?

  16. I am becoming confused about the talk on this thread about a given manuscript having some number of variants in it that is different than the number of variants in some other manuscript. Isn't it the case that you can only speak of variants when you compare 2 or more manuscripts (in which case both manuscripts have the same number of variants against one another)? Of course you could collate different manuscripts against some preselected standard. But that standard would necessarily be closer to one of the manuscripts being compared than the other, and tell you nothing about their relative number of variants as some absolute property--the data resulting from such a comparison could just be turned upside down by collating them against yet a different manuscript that is nearer to the other manuscript.

  17. Daniel Buck: "Tommy, in your study of Jude does the number of variants per chapter increase as the age of the manuscript decreases?"

    I have only one chapter :-)

    I am not sure I understand your question; could you clarify?

    Nevertheless, the variants my work will add to those presented in the ECM are generally not shared by many MSS. Certain minor variants come and go, they do not survive in the stream of tradition, especially considering the "gravity" of the dominant Byzantine tradition. That is not to say that such variants are not interesting, some are very interesting...

  18. I have tried to estimate the number of MSS. See my MS copying simulation at

    Here is a rough estimate:

    N = P * C * B

    P = total population of Empire (about 50 million in the first three centuries AD)

    C = proportion of Christians in the Empire (20% at 300 AD?)

    B = MSS per Christian (1/100?)

    Using these estimates,

    N = 50 mill. * 0.2 * 0.01 = 100,000(!)

    (1) P is pretty well accepted. C and B are my guesses. I am interested to know what others would guess these to be.
    (2) MS is a broad term. If we want to be specific about language and whether the MS is e, a, p, r, or a combination of these, then N is correspondingly reduced. If, say, you are talking about Greek MSS, then P is the Greek-as-first-language population. If you are estimating the number of p MSS then B is the number of p MSS per Christian.
    (3) This is a point estimate--the number of MSS required to satisfy demand at 300 AD. If you want a cumulative total then you need to know the average lifetime of a MS (or the half-life of a population of MSS). I have heard that with exponential growth, the number of individuals now living equals the total number now dead. I use the logistic growth equation in my simulation. It starts exponential but the growth rate decreases as the population increases, resulting in a sigmoid curve. Even so, an estimated cumulative total of double the 300 AD point estimate is probably not too bad.

    Another, probably better, approach is to estimate the number of churches and assume that each one had a copy of the gospels and a copy of the Pauline corpus. This method doesn't work if you are estimating MS populations before these collections existed (i.e. pre-150 AD(?) for gospels; pre-100 AD(?) for Paul).
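
    The rough estimate N = P * C * B is easy to re-run with other guesses. The sketch below uses the values given above; the function name is illustrative, and the doubling for a cumulative total is only the rule of thumb suggested in point (3), not a result of the simulation.

```python
def manuscripts_in_use(population, christian_share, mss_per_christian):
    """Point estimate N = P * C * B of manuscripts needed to meet demand."""
    return population * christian_share * mss_per_christian


# Values from the estimate above: P is the commonly cited figure,
# C and B are the commenter's guesses.
P = 50_000_000   # population of the Empire
C = 0.20         # proportion of Christians at 300 AD
B = 1 / 100      # manuscripts per Christian

n_300 = manuscripts_in_use(P, C, B)

# Rule of thumb from point (3): cumulative total is about twice
# the 300 AD point estimate.
cumulative = 2 * n_300

print(f"point estimate at 300 AD: {n_300:,.0f}")
print(f"rough cumulative total:   {cumulative:,.0f}")
```

    Because the three factors multiply, halving any one guess halves N, so the result is only as good as the weakest of the three inputs.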

  19. Eric Rowe: Isn't it the case that you can only speak of variants when you compare 2 or more manuscripts?

    Yes. The total number of variants that you know about depends on how many copies you collate. I would like to know the shape of the [no of variants] vs [no of (complete) mss collated] curve. I imagine that it is steep at first then asymptotic.

    One thing you need to decide is how to count variants. One way is to create a synthetic text that includes all of the words of the collated texts in order. (See my PhD for an explanation.) Any one of the texts can then be represented as a string of 1s and 0s, 1 meaning agreement with the synthetic text, 0 disagreement. The number of places where the texts do not agree is the number of points of variation. (How's that for a tautology?) You might choose to count variation units differently, in which case your number of variation units will be less.


    Text 1: The cat sat on the mat.
    Text 2: The cat pounced on the rat.
    Synth:  The cat sat pounced on the mat rat.
    Text 1: 1   1   1   0       1  1   1   0
    Text 2: 1   1   0   1       1  1   0   1

    No of places where all texts do not agree equals four, which is twice what most people would say.
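
    The cat/rat example can be reproduced mechanically. The sketch below is one possible way to build a synthetic text for two witnesses, using Python's standard difflib to align the word lists; it is not the procedure from the dissertation, and the function name is invented for illustration. Where the two texts differ, the words of the first text are placed before those of the second.

```python
from difflib import SequenceMatcher


def synthesize(a, b):
    """Merge two word lists into a synthetic text.

    Returns (synth, bits_a, bits_b), where each bit list marks with 1 the
    synthetic-text words that the corresponding text actually contains.
    """
    synth, bits_a, bits_b = [], [], []
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            for word in a[i1:i2]:
                synth.append(word)
                bits_a.append(1)
                bits_b.append(1)
        else:
            # Words found only in text a, then words found only in text b.
            for word in a[i1:i2]:
                synth.append(word)
                bits_a.append(1)
                bits_b.append(0)
            for word in b[j1:j2]:
                synth.append(word)
                bits_a.append(0)
                bits_b.append(1)
    return synth, bits_a, bits_b


text1 = "The cat sat on the mat".split()
text2 = "The cat pounced on the rat".split()

synth, bits1, bits2 = synthesize(text1, text2)
# Count the positions where the two texts do not agree.
points_of_variation = sum(x != y for x, y in zip(bits1, bits2))

print(" ".join(synth))
print(bits1)
print(bits2)
print(points_of_variation)
```

    Counting instead the runs of adjacent disagreeing positions would give two variation units, which matches the figure most people would give.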

  20. TW:
    "I am not sure I understand. . . clarify.

    Nevertheless, the variants my work will add to those presented in the ECM are generally not shared by many MSS. Certain minor variants come and go, they do not survive"

    Your answer sufficiently demonstrates that the RSV editors were assuming too much. Errors did NOT "accumulate" for 1400 years, rather they "came and went."
    Defining what is an error and what isn't, unfortunately, begs the question. But whatever variants were erroneous, they did not keep accumulating. Many of them, no doubt, have been lost forever.