Tuesday, December 08, 2015

How Many Variants Are There in the Greek New Testament?

The short answer is no one knows because most of our manuscripts remain uncollated. But this has not stopped scholars from offering numbers since at least the publication of John Mill’s 1707 edition of the Greek New Testament.

That edition was said to have 30,000 variants (that number too is an estimate, by the way). In 1848, J. Scott Porter suggested 100,000 in his Principles of Textual Criticism (p. 11) and the number has been rising ever since. Most recently, Eldon Epp has given a “wild guess” as high as 750,000.

What no one has done, so far as I’m aware, is give a reliable justification for their suggested number of variants. This is despite the fact that the number continues to be a matter of genuine apologetic interest, not only to those wanting to defend the NT’s textual reliability but just as much to those wanting to oppose it.

Since a number of major collations have been published in recent years, I thought something could probably be done to set the question on firmer ground. In the latest issue of NTS (online here) you’ll find my best attempt at doing that. I won’t spoil it except to say that my results are larger than most other estimates (including Ehrman’s) but still lower than Epp’s “wild guess.”

In order to produce a good estimate, you need three things: a good data source from which to estimate, a method of extrapolating, and a clear definition of what you’re estimating. In the last case, I decided to exclude spelling differences. This was partly because two of my data sources didn’t include very many of them and partly because I just don’t find the number of spelling differences to be all that significant.
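The post doesn't spell out the extrapolation formula. The comments below suggest a per-word rate of variation scaled up to the length of the whole New Testament. Here is a minimal Python sketch of that idea. It assumes a single sample and a uniform rate. The function name is hypothetical, and the article's actual method, which combines four data sets, may adjust for other factors.

```python
def extrapolate_variants(sample_variants: int, sample_words: int, total_words: int) -> float:
    """Scale a sample's rate of variation (variants per word) to a longer text.

    Assumes variation occurs at a uniform rate per word. That is a
    simplification, because the true rate differs from book to book.
    """
    if sample_words <= 0:
        raise ValueError("sample_words must be positive")
    rate = sample_variants / sample_words  # variants per word in the sample
    return rate * total_words


if __name__ == "__main__":
    # Jude figures quoted in the comments, scaled to Morgenthaler's word count.
    print(round(extrapolate_variants(1_271, 461, 137_490)))
```

James Snapp cites figures for Jude in the comments: 1,271 variants in 461 words, scaled to Morgenthaler's 137,490 words. With those inputs, this gives about 379,067, matching his result. The uniform-rate assumption is the weak point. As he notes, books with more witnesses, such as the Gospels, likely vary at a higher rate.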

Arguably the most important question, however, is how to define “variant.” I don’t think it is always appreciated in our discipline that the term “variant” is necessarily relative. Something can only vary from something else. If you only have one of something, you have no variation. So the question was whether I should define “variant” in relation to the manuscripts or in relation to some fixed, printed text of the New Testament. I decided to go with the former. My estimate is thus an estimate about the number of cases where the manuscripts vary from one another. It is not an estimate of the number of cases where the manuscripts vary from any particular reconstruction of the original text.
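Here is a minimal Python sketch that makes the difference between the two definitions concrete. It assumes a hypothetical data layout in which each variation unit maps witness sigla to the reading each witness attests. The function names and sample readings are illustrative only. They are not drawn from any collation.

```python
# Hypothetical layout: one variation unit maps each witness siglum
# to the reading that witness attests at that point.
Unit = dict[str, str]


def variants_among_manuscripts(unit: Unit) -> int:
    """Count distinct readings, measuring the manuscripts against each other."""
    readings = set(unit.values())
    # A single attested reading has nothing to vary from, so it counts as zero.
    return len(readings) if len(readings) > 1 else 0


def variants_from_base_text(unit: Unit, base_reading: str) -> int:
    """Count distinct readings that differ from a fixed printed text."""
    return len(set(unit.values()) - {base_reading})


if __name__ == "__main__":
    unit = {"01": "A", "03": "A", "05": "B", "P75": "C"}
    print(variants_among_manuscripts(unit), variants_from_base_text(unit, "A"))
```

Take a unit where 01 and 03 read A, 05 reads B, and P75 reads C. The manuscript-relative count is 3, while the count against a base text reading A is 2. The sketch also shows why a newly found reading at a previously uniform place adds two to the first count but only one to the second.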

What this means is that my estimate assumes nothing about whether any of the estimated variants are also original or authorial. But undoubtedly a great many of them are. Naturally, one question I’ve been asked about my estimate is “How many of your estimated variants are original?” The question is actually not hard to answer. If we assume that the original reading has survived in the manuscript tradition at each point of variation, then each variation unit contains exactly one original reading. So it’s simply a matter of counting the number of variation units and extrapolating from there. Thankfully, I did count the number of points of variation in my data.

Based on most of my sources, the number of “original variants” ranges from 17% to 25% of the total number of extant variants. The percentage is much lower in the Text und Textwert volumes (only 9%) because the average number of variants per variant unit is much higher there. I suspect this has to do with the way the variant units were chosen in those volumes. But that’s a post for another day.
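Suppose each variation unit preserves exactly one original reading. Then the share of “original variants” is simply the number of units divided by the number of variants. That share is the reciprocal of the average number of variants per unit, so a source with more variants per unit yields a lower percentage. Here is a minimal Python sketch. The function names are hypothetical.

```python
def original_variant_share(variant_count: int, unit_count: int) -> float:
    """Fraction of extant variants assumed to be original.

    Assumes the original reading survives in every variation unit, so each
    unit contributes exactly one original variant.
    """
    if unit_count <= 0 or variant_count < unit_count:
        raise ValueError("need at least one variant per unit")
    return unit_count / variant_count


def variants_per_unit_from_share(share: float) -> float:
    """Invert the share to recover the average number of variants per unit."""
    return 1 / share


if __name__ == "__main__":
    for share in (0.25, 0.17, 0.09):
        print(f"{share:.0%} -> about {variants_per_unit_from_share(share):.1f} variants per unit")
```

Inverting the reported percentages shows that 17% to 25% corresponds to roughly 4 to 6 variants per unit. The 9% figure corresponds to roughly 11 variants per unit. These numbers are back-calculated from rounded percentages only. They are not taken from the article's tables.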

Besides variants and variant units, I also kept track of the number of singulars and nonsense readings in my data sources. If we add those in, we can graph the results in a way that gives us some perspective on the kind of variation an editor of a large collation must deal with. Obviously, most of these variants don’t (and probably shouldn’t) make their way into our hand editions. But it’s still helpful to know what these percentages are.

The hard numbers for these charts are all in the article, which is now online here. I’ve also put up the pre-pub version (which is basically the same) on my Academia page for those without access to NTS. There is more that can be done with these data, and hopefully I and others will explore some of those in the future.


  1. Thanks Peter, good work. Despite some looking, I couldn't locate your estimate for the number of variation units.

    If we define variation units as non-overlapping, the number of places of variation must be much lower than the number of words in the NT, right? There are a considerable number of words that are not part of any variation unit and serve therefore to demarcate the various units.

    If we allow for overlapping boundaries of variant units we open a whole new can of worms on how to count the number of variants with these overlapping units in relation to one another.
    Any thoughts, Peter?

    1. Great questions, Dirk.

      1. I didn't estimate the number of variation units. But I do give the number of variant units from each data source (Table 2). From there it’s just a matter of extrapolating with the same basic formula.

      2. In my experience with the three big collations, there actually are surprisingly few words in the base text that are not involved in a variation unit. So if you used these as demarcation points, you would end up with pretty large variation units.

      3. As for overlapping units, this has definite ‘can of worms’ potential. But happily it was less of an issue than I expected it to be. Wasserman’s apparatus marks units that overlap. If a witness can’t be cited in one variant unit because of an issue with an overlapping unit, he gives you an arrow rather than a letter address. I did not count these. Usually this happens when there is a long omission, and I didn’t want to count a witness’s omission more than once; his apparatus made it easy to avoid that. I did the same thing for John 7.53-8.11 and Mark 16 in Text und Textwert (see the note with Table 4) and for the other two sources. Morrill’s collation actually had very few overlapping variations, so it wasn’t hard there either.
      Hopefully that makes sense.

  2. Peter,
    Thanks for the thoroughness of your article! An estimate offered without saying what it measures or where it comes from has only let anyone make it say whatever they like. I particularly appreciate the point about the relationship between the number of manuscripts collated and the number of words.


  3. Peter,
    Back in 2008 at one of the old Yahoo forums I posted the following:

    How many variants exist among the Greek manuscripts of the books of
    the New Testament? Estimates have ranged from 30,000 to 50,000 to
    200,000 to 300,000 to 400,000.

    Dr. Tommy Wasserman's book "The Epistle of Jude: Its Text and
    Transmission" meticulously presents the extant Greek attestation of
    Jude's text. Wasserman's reconstructed text of Jude consists of 461
    words. Wasserman lists 1,271 textual variants (I think. Some of
    these are "defective," which means that they cannot be reconstructed
    with certainty.) If we work with the unproven premise that variants
    were created at the same rate in other books as they were created
    in Jude, then if we apply the ratio of 461-to-1,271 to the total
    number of words in the NT (put at 137,490 by Morgenthaler, as cited
    by Metzger on p. 1 of "Lexical Aids for Students of NT Greek"), then
    the total number of variants = 379,067. Or to loosen up the math a
    bit, we could estimate that the number of variants in a given book
    will be 2.75 times the number of words in the book.

    So, it initially looks like the total number of textual variants in
    the Greek NT is in the neighborhood of 380,000. One thing that I'm
    not sure about, though, is whether or not it's sensible to count the
    *authentic* readings as variants. When most folks talk about
    variants, they mean variations from the original text, even though
    technically a contested genuine reading is also a variant. If we
    subtract from 380,000 the *authentic* 137,490 words, with their
    authentic spelling, in their authentic word-order, then the number of
    inauthentic readings seems to drop to 242,510.

    Now, that unproven premise that I mentioned is probably incorrect.
    We should probably expect the rate of variants in the Gospels to be
    much higher than in Jude, since the Gospels have many more
    witnesses. So let's figure in, oh, another 75,000 variants.
    Depending on whether or not the authentic variants are counted, the
    total number of variants in the Greek witnesses to the NT text might
    be about 455,000 or (subtracting the authentic readings) 317,510.

    (Btw, I don't mean to imply that I agree with Morgenthaler's word-count;
    I just used it because it was handy for the calculation.)

    Yours in Christ,

    James Snapp, Jr.

    1. Ah, the old Yahoo forums. Those were good times. Glad to see we were thinking along the same lines. The only difference is in our data. I counted 1,694 variants in Tommy’s dissertation, probably because my definition required me to count errors (marked ‘f’) as distinct variants even where he records them without a separate letter address. As a result, my rate of variation was 3.54 variants per word in Jude. And the NA27 has 138,020 words. Hence my higher estimate from his data. But we’re on the same track, it seems.

  4. One question that might also be helpful: how does any estimate of the number of presumed sensible variants among NT MSS compare with the number of presumed sensible variants found among the MSS of classical works, whether poetic (e.g., Homer) or narrative (e.g., Herodotus)? It would seem that an investigation of that issue might profitably assist the overall evaluation.

  5. Congratulations! It is good that you included Matt Solomon's most recent data.

  6. Peter, thank you for this great work, and for defining so clearly how you arrived at your estimate.

    I appreciate your note of caution regarding how we use the data, but along these lines my first impression when I read your article (similar to James above) is that your estimate seems to facilitate a misinterpretation of the data since it does not subtract an estimated number of textual variants.

    Also, I thought you were headed on the right track when you examined the higher rate of variation in the Text und Textwert test passages, and I thought you were going to tell us how you would take that into account in your estimate. But I was shocked when you said that "100,000-240,000 variants too many... would not be wildly off the mark." That's 17-40% of your specific calculated estimate. That does seem a bit wild to me. Granted, I missed at first that you did in fact trim off 91,044 from your calculated estimate to give us the 500,000 figure as a "reasonable estimate."

    I would prefer to incorporate the data that you showed regarding the consistently higher TuT rate of variation, and subtract something that is based on that data rather than the lesser (more radical) trimming from the 591,044 number to get 500,000. Additionally, subtract an estimated number of textual variants (representing the original) so that we are only talking about variants that differ from the ancestor text. By not including these calculations I think the value of your estimate is greatly lessened.

    I believe a less misleading estimate taking these calculations into consideration would be between 320,000 and 395,000, having subtracted about 170,000 or 95,000 for the higher variation rate of TuT test passages and about another 101,000 for those variants that represent the original text. I averaged your 100,000 and 240,000 figures for the higher TuT variation rate and subtracted 170,000. Better yet, by extrapolating the respective higher rates of variation for the TuT data across the Gospels, Paul, the Catholic Epistles and Revelation separately [using the average for Revelation], I would only subtract 95,000, which is much closer to the amount you did subtract to get 500,000. I just wish you had given the rationale.

    Although I take issue with the value of your 500,000 number, this does not weaken the value of your paper in the least, since you told us exactly how you arrived at your estimate. Any reference to a reasonable estimate in the future would do very well to follow suit. Well done.

    1. Benjamin, just two things.

      1. I took all four of my data sets into consideration for my final estimate, not just TuT. But I intentionally avoided doing so in any formulaic way. The reason is that I wanted a very round number. This was in the (vain?) hope that it would impress on people that the number is an estimate not a count. Perhaps I could have been clearer about that in the paper though.

      2. I am glad to have folks slice the data in different ways (I did just that in this blog post), just as long as we're clear on what we're doing and why. In that vein, I would note that your 320,000-395,000 is not a more accurate estimate; it's an estimate of something different. I estimated variants in the manuscripts, not variants 'from the ancestor text.' But I realize there are other possible definitions. Hence this blog post, where I've calculated the number of 'original variants.'

      It certainly does get complicated, I know. Hopefully this clarifies a bit.

    2. Thanks for your reply, Peter. Actually, I was very pleased that you ended up with a round number as an estimate. I was concerned at first that your estimate was going to be about 591,044. :)

      Again, I'm really grateful for the approach you have taken in terms of defining what your numbers are. So I can appreciate that your estimate is defined differently than my estimate.
      On the other hand, you are comparing your estimate to a long history of estimates, and your definition seems odd in that it includes the original text as a variant, at least in the way that most people like to talk about variants. That means that every time we discover a new variant, your number of variants increases by two. I don't think most other estimates would have included those textual variants in their thinking when estimating the number of variants.

      My 395,000 number could be rounded to 400,000 and then with a rounded estimate of 100,000 textual variants included, that is essentially equivalent to your 500,000 number.

    3. Benjamin, you might be right about past estimates. But unfortunately there is no way to know because, as I say in the article, almost no one tells us what they mean by 'variant.' And no, the definition is not uniform (see the Epp and Fee articles I cite). But I do admit that most people intuit your definition, and I am fine with that definition. It's just not the only one. And, in any case, my data can be useful for either definition (hence this blog post).

  7. My adjusted numbers of 320,000 or 395,000 would mean 1 variant every 5.5 - 6.5 pages rather than the every 4 pages that Peter derives from his estimate.

    But I do agree that a more valuable statistic would be to compare the number of variants to the number of words in the manuscripts. I'm puzzled why that data is not more readily available. Someone could estimate it based on the number of words in the standard text(s) if the extant references were available for all MSS. But that would be no small task.
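The arithmetic in this thread is easy to lose track of, so here is a short Python sketch that re-runs it. It uses only the figures quoted in the comments above. It checks the arithmetic, not the assumptions behind it. The constant and function names are hypothetical.

```python
# Figures as quoted in the comment thread (not re-derived from the sources).
NT_WORDS = 137_490             # Morgenthaler's NT word count, via Metzger
JUDE_WORDS = 461               # words in Wasserman's text of Jude
JUDE_VARIANTS = 1_271          # variants Snapp counted in Wasserman's apparatus
CALCULATED_ESTIMATE = 591_044  # the article's unrounded figure, per Benjamin


def snapp_2008() -> dict[str, int]:
    """Re-run the steps of James Snapp's 2008 forum post."""
    extrapolated = round(JUDE_VARIANTS / JUDE_WORDS * NT_WORDS)
    rounded = 380_000          # his "neighborhood of 380,000"
    total = rounded + 75_000   # his allowance for the better-attested Gospels
    return {
        "extrapolated": extrapolated,
        "rounded_minus_authentic": rounded - NT_WORDS,
        "total": total,
        "total_minus_authentic": total - NT_WORDS,
    }


def benjamin_adjusted(tut_trim: int, original_readings: int = 101_000) -> int:
    """Subtract a TuT-based trim and an estimate of original readings."""
    return CALCULATED_ESTIMATE - tut_trim - original_readings


if __name__ == "__main__":
    print(snapp_2008())
    averaged_trim = (100_000 + 240_000) // 2  # Benjamin's averaged trim
    print(benjamin_adjusted(averaged_trim), benjamin_adjusted(95_000))
    print(CALCULATED_ESTIMATE - 91_044)  # trimmed to the article's round figure
```

Calling these functions reproduces Snapp's 379,067, 242,510, 455,000, and 317,510. It also reproduces Benjamin's 320,044 and 395,044, which he rounds to 320,000 and 395,000. Finally, it confirms that trimming 91,044 from 591,044 gives the article's 500,000.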