Friday, October 06, 2006

Statistical Question

There are some useful statistics for the Greek NT listed here by Felix Just, including the number of chapters, number of verses in each chapter, total number of verses, and total number of words in each book of the Greek New Testament.
But what I would like to know is the number of letters in each book of the Greek New Testament (any critical edition will do, or even the TR for comparison). Any offers?

Update:
Rhalfs (Textual Criticism of the GNT [ET, 1901], 48f.) reports that Zahn (Geschichte des N.T. Kanons, i.76) gives figures from work done by Graux (Revue de Philologie, ii). Rhalfs provides only some of the figures (letters first, then stichoi in parentheses; I can't figure out how to do a table):

Matthew: 89,295 (2,480)
Mark: 55,550 (1,543)
Luke: 97,714 (2,714)
John: 70,210 (1,950)
Acts: 94,000 (2,610)

3 John: 1,100 (31)
Apocalypse: 46,500 (1,292)
Philemon: 1,567 (44)

Looks like a trip to the library to get all the figures, unless someone has Zahn on their shelves.
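A quick sanity check on the figures above is to divide each letter total by its stichoi. The short Python sketch below uses only the numbers reported via Rhalfs (the names `GRAUX` and `letters_per_stichos` are illustrative). Every ratio comes out between roughly 35.5 and 36.0 letters per stichos, which at least raises the question of whether some of the letter totals were worked out from the stichoi rather than counted directly.

```python
# Letter and stichoi figures as reported by Rhalfs (from Zahn, after Graux).
# Only the books quoted in the post are included.
GRAUX = {
    "Matthew": (89_295, 2_480),
    "Mark": (55_550, 1_543),
    "Luke": (97_714, 2_714),
    "John": (70_210, 1_950),
    "Acts": (94_000, 2_610),
    "3 John": (1_100, 31),
    "Apocalypse": (46_500, 1_292),
    "Philemon": (1_567, 44),
}


def letters_per_stichos(figures):
    """Return {book: letters / stichoi}, rounded to two decimals."""
    return {book: round(letters / stichoi, 2)
            for book, (letters, stichoi) in figures.items()}


if __name__ == "__main__":
    for book, ratio in letters_per_stichos(GRAUX).items():
        print(f"{book:<11} {ratio:6.2f}")
```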

16 Comments:

Tommy Wasserman said...

Matthew: 90225 letters

Exported the Matthean text from Accordance (NA27 module) using the strip-accents function. Copied the RTF text file and pasted it into Word. Removed all punctuation with the search/replace facility. Then used the Word Count function.

Perhaps you can do something similar?
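For anyone without Accordance and Word, the same strip-accents / remove-punctuation / count steps can be sketched in Python with the standard `unicodedata` module. This is only an illustration, assuming the text is available as a Unicode string: it decomposes each character (NFD), skips the combining marks (accents, breathings, and iota subscript), and counts a character only if it is a letter, so punctuation of any kind is never counted. The function name is hypothetical.

```python
import unicodedata


def count_greek_letters(text: str) -> int:
    """Count letters after stripping accents, breathings and punctuation.

    NFD splits e.g. 'ῇ' into 'η' plus combining marks; the marks
    (Unicode category Mn) are skipped, so iota subscript is NOT counted.
    Spaces, digits and punctuation are skipped because only characters
    for which str.isalpha() is true are counted.
    """
    decomposed = unicodedata.normalize("NFD", text)
    return sum(1 for ch in decomposed
               if unicodedata.category(ch) != "Mn" and ch.isalpha())


if __name__ == "__main__":
    # Opening words of Matthew, used here only as sample input.
    sample = "Βίβλος γενέσεως Ἰησοῦ Χριστοῦ υἱοῦ Δαυὶδ υἱοῦ Ἀβραάμ."
    print(count_greek_letters(sample))
```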

Anonymous said...

No need to make a trip to the library. A quick Perl script will tell the story. Just a few minutes please...

Casey Perkins

P J Williams said...

'There are more variations among our manuscripts than there are words in the New Testament' (Misquoting Jesus, p. 90)—but probably fewer than there are letters :-)

P J Williams said...

Sorry to return to a favourite subject.

Anonymous said...

Below is the output from my program for each book of NA/UBS, numbered in canonical order (01 = Matthew through 27 = Revelation). The counts exclude spaces, brackets, accents, breathings, and punctuation.

01 90051
02 56533
03 95957
04 71346
05 95808
06 34428
07 32754
08 22273
09 11079
10 11995
11 7992
12 7878
13 7413
14 4042
15 8848
16 6519
17 3717
18 1556
19 26374
20 8820
21 9056
22 6073
23 9458
24 1129
25 1106
26 2571
27 46028

Casey Perkins
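Casey's program was written in Perl and its details aren't given, so the following is only a Python sketch of the same idea: total the letters per book, line by line. It assumes a hypothetical tab-separated input with one verse per line (book number, reference, verse text); a real text file will almost certainly need a different parser.

```python
import unicodedata
from collections import Counter


def count_letters(text: str) -> int:
    """Letters only: accents, breathings, iota subscript, punctuation dropped."""
    return sum(1 for ch in unicodedata.normalize("NFD", text)
               if unicodedata.category(ch) != "Mn" and ch.isalpha())


def letters_by_book(lines):
    """Total letters per book.

    Assumes a hypothetical layout, one verse per line:
        <book number>\t<chapter:verse>\t<verse text>
    """
    totals = Counter()
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        book, _ref, text = line.split("\t", 2)
        totals[book] += count_letters(text)
    return totals


if __name__ == "__main__":
    # Two sample lines (the first is only part of a verse).
    sample = ["01\t1:1\tΒίβλος γενέσεως Ἰησοῦ Χριστοῦ",
              "27\t22:21\tἩ χάρις τοῦ κυρίου Ἰησοῦ μετὰ πάντων."]
    for book, total in sorted(letters_by_book(sample).items()):
        print(book, total)
```

With a real file, `letters_by_book(open(path, encoding="utf-8"))` would produce the same kind of two-column listing.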

Peter M. Head said...

Thanks Casey,

That looks helpful.

Some interesting differences from Rhalfs' figures, esp. for Luke, but that probably reflects some significant textual decisions in Luke.
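Setting the ancient figures beside Casey's first set of counts makes the pattern easier to see. The sketch below uses only numbers already posted (Casey's books 01 to 05, 18, 25 and 27). Luke is the only one of the five narrative books where the ancient figure is higher (+1,757); Matthew, Mark, John and Acts all come out lower. The comparison by itself says nothing about the cause.

```python
# Ancient letter counts (Graux, via Zahn and Rhalfs) and the first set of
# NA/UBS counts posted above (no iota subscript).
ANCIENT = {"Matthew": 89_295, "Mark": 55_550, "Luke": 97_714, "John": 70_210,
           "Acts": 94_000, "3 John": 1_100, "Apocalypse": 46_500,
           "Philemon": 1_567}
MODERN = {"Matthew": 90_051, "Mark": 56_533, "Luke": 95_957, "John": 71_346,
          "Acts": 95_808, "3 John": 1_106, "Apocalypse": 46_028,
          "Philemon": 1_556}


def differences(ancient, modern):
    """Return {book: (ancient - modern, difference as % of modern)}."""
    return {book: (ancient[book] - modern[book],
                   round(100 * (ancient[book] - modern[book]) / modern[book], 2))
            for book in ancient}


if __name__ == "__main__":
    for book, (diff, pct) in differences(ANCIENT, MODERN).items():
        print(f"{book:<11} {diff:+6d} {pct:+6.2f}%")
```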

Tommy came up with a different number using, presumably, a similar technique. Any thoughts on this?

These figures would have all nomina sacra fully spelled out and would presumably not count iota subscript (unlike, say, a manuscript, which might have fewer letters by using nomina sacra and a few more by writing out iotas).
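To get a rough feel for how much nomina sacra could shorten a manuscript, one can count the letters saved when a word is written as its contraction. The sketch below uses only four common nominative forms (ΘΣ, ΚΣ, ΙΣ, ΧΣ). That table is a simplifying assumption: manuscripts vary in which words they contract and how (e.g. ΙΗΣ, ΧΡΣ), and oblique cases have their own forms, so this is an illustration rather than a catalogue.

```python
import unicodedata

# Simplified, assumed table of nomina sacra (nominative forms only).
NOMINA_SACRA = {
    "θεος": "θς",
    "κυριος": "κς",
    "ιησους": "ις",
    "χριστος": "χς",
}


def plain(word: str) -> str:
    """Lower-case a word and strip accents/breathings (combining marks)."""
    decomposed = unicodedata.normalize("NFD", word.lower())
    return "".join(ch for ch in decomposed if unicodedata.category(ch) != "Mn")


def letters_saved(words) -> int:
    """Letters saved if every listed form were written as a nomen sacrum."""
    saved = 0
    for word in words:
        bare = plain(word)
        short = NOMINA_SACRA.get(bare)
        if short is not None:
            saved += len(bare) - len(short)
    return saved


if __name__ == "__main__":
    print(letters_saved("καὶ θεὸς ἦν ὁ λόγος".split()))
```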

Anonymous said...

Hi Peter,
"Tommy came up with a different number using, presumably, a similar technique. Any thoughts on this?"

Tommy's figure for Matthew was only 174 characters different out of about 90,000. I'm not sure how thorough he was in his search-and-replace operation, but I reviewed my output to make sure there were no extraneous characters like brackets or punctuation. (Not that I viewed all the data by eye; I used Vim, my text editor, to search for non-word characters and found none.)
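The same kind of check can be done in Python: after stripping combining marks, tally every character that is neither a letter nor whitespace. An empty result means the text is clean; anything left over (a stray bracket or question mark, say) shows up with its count. A small sketch; the function name is illustrative.

```python
import unicodedata
from collections import Counter


def leftover_characters(text: str) -> Counter:
    """Tally characters that are neither letters nor whitespace after
    removing combining marks; an empty Counter means the text is clean."""
    decomposed = unicodedata.normalize("NFD", text)
    return Counter(ch for ch in decomposed
                   if unicodedata.category(ch) != "Mn"
                   and not ch.isalpha() and not ch.isspace())


if __name__ == "__main__":
    print(leftover_characters("τί [ἐστιν];"))
```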

I guess you could verify how precise my figures are, if you wanted to make the effort, by counting out the characters in 3 John, the shortest of the books, and seeing how close your count and my computer count are. I'm guessing you'll find the computer count very accurate. In any event, the figures are at least good enough for relative comparison.

Casey

Anonymous said...

"These figures would have all nomina sacra fully spelled out and would presumably not count iota subscript..."

I forgot about iota subscript. (The text file I'm using represents iota subscript as a vertical bar character, so I removed it along with all the other such characters.) I tweaked two lines of my program and ran it again, this time retaining iota subscript:

01 91005
02 57094
03 97039
04 72131
05 96761
06 34907
07 33159
08 22544
09 11172
10 12178
11 8090
12 8010
13 7500
14 4099
15 8938
16 6609
17 3753
18 1584
19 26562
20 8888
21 9142
22 6142
23 9581
24 1143
25 1123
26 2594
27 46523
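A vertical bar for iota subscript is also the Beta Code convention, so for a Beta Code-style ASCII text the change amounts to counting `|` as a letter. The sketch below assumes such a text, with letters as plain A to Z and everything else (breathings, accents, the capital marker, punctuation) as other symbols; the `count_iota_subscript` switch is hypothetical. On this reading, the gap between the two sets of figures for a book (954 for Matthew: 91,005 minus 90,051) is simply the number of iota subscripts counted.

```python
def count_beta_code_letters(text: str, count_iota_subscript: bool = False) -> int:
    """Count letters in a Beta Code-style ASCII transliteration (assumed input).

    Letters are ASCII A-Z/a-z. Breathings, accents, diaeresis, the capital
    marker, punctuation and spaces are ignored. Iota subscript ('|') is
    counted only when count_iota_subscript is True.
    """
    total = 0
    for ch in text:
        if ch.isascii() and ch.isalpha():
            total += 1
        elif ch == "|" and count_iota_subscript:
            total += 1
    return total


if __name__ == "__main__":
    line = "E)N A)RXH=| H)=N O( LO/GOS"
    print(count_beta_code_letters(line), count_beta_code_letters(line, True))
```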

Anonymous said...

Interesting. My figures are closer to Casey's (90057 letters in NA27 Matthew).

More info on my blog, ricoblog

I also did counts for Robinson's 2005 edition of the Byzantine and for Scrivener's 1881 edition. Text files with word and letter counts broken out by book are available in the aforementioned blog article.

Hope it helps!

Rick Brannan
ricoblog

Tommy Wasserman said...

PH: "Tommy came up with a different number using, presumably, a similar technique. Any thoughts on this?"

Tommy forgot to remove the question marks, of which there were 167.

90225 - 167 = 90058

Now, there is still a one-letter difference between my corrected figure and Rick Brannan's. On the other hand, I am using an old Accordance database, which probably has an error or two; in fact, I remember noticing one once. Very small differences like these (1-10 letters) can be explained by our using different text releases. At least you now know that there are around 90,000 letters in Matthew (with nomina sacra written out).

Peter M. Head said...

Thanks a lot everyone for this.
Very helpful.

P J Williams said...

Does it make sense to discount iota subscripts, but to leave an adscript like John 18:2 in NA27?
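If one wanted to treat subscripts and adscripts alike, a simple approach with Unicode text is to rewrite each iota subscript (the combining character U+0345 after NFD decomposition) as a full iota before counting, so that ἀρχῇ and ἀρχῆι both count as five letters. A sketch under that assumption; the function name is hypothetical.

```python
import unicodedata

YPOGEGRAMMENI = "\u0345"  # combining iota subscript


def count_with_adscript(text: str) -> int:
    """Count letters, treating every iota subscript as a written iota."""
    decomposed = unicodedata.normalize("NFD", text).replace(YPOGEGRAMMENI, "ι")
    return sum(1 for ch in decomposed
               if unicodedata.category(ch) != "Mn" and ch.isalpha())


if __name__ == "__main__":
    print(count_with_adscript("ἀρχῇ"), count_with_adscript("ἀρχῆι"))
```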

Peter M. Head said...

Rick is quite explicit that he counted words within brackets, but I suspect that some of these counts may still need some tweaking in order to distinguish between single square brackets [in which the bracketed words are considered to be part of the NA27 text] and double square brackets [[in which the bracketed words are NOT considered to be part of the text]]. This may require a little human intervention rather than simply programming the deletion of all bracket characters.
This could significantly impact counts for Mark and Luke.
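One way to make that distinction mechanically, assuming the brackets survive in the exported text as literal ASCII `[[`, `]]`, `[` and `]`: first cut out everything between double brackets (when that material is to be excluded), then drop the remaining single-bracket characters while keeping the words inside them. A minimal regex sketch; the function name is hypothetical.

```python
import re

# Non-greedy, and DOTALL so a double-bracketed passage may span lines.
DOUBLE_BRACKETED = re.compile(r"\[\[.*?\]\]", re.DOTALL)


def apply_brackets(text: str, keep_double: bool) -> str:
    """Resolve NA-style brackets in a plain-text export (assumed ASCII brackets).

    Text in [[double brackets]] is dropped unless keep_double is True;
    text in [single brackets] is kept. All bracket characters are then removed.
    """
    if not keep_double:
        text = DOUBLE_BRACKETED.sub("", text)
    return text.replace("[", "").replace("]", "")


if __name__ == "__main__":
    print(apply_brackets("a [b] c [[d e]] f", keep_double=False))
    print(apply_brackets("a [b] c [[d e]] f", keep_double=True))
```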

Anonymous said...

"...I suspect that some of these counts may still need some tweaking in order to distinguish between single square brackets [in which the bracketed words are considered to be part of the NA27 text] and double square brackets [[in which the bracketed words are NOT considered to be part of the text]]."

That would certainly give us a better idea of the true difference between the NA and the Byzantine text. The current NA counts give us a misleading view of that.

Anonymous said...

Hi Peter,
I found the discrepancy between my figures and Rico's. It was a bug in my program. Our figures are now identical, with the exception of a 1 character difference in Acts, and a 2 character difference in both 2 Cor and Hebrews. It's no doubt attributable to a difference in our source files.

I made a quick alteration to my program to account for double brackets (which were relevant only in Mark, Luke, and John). Below are the new figures. The first column of numbers gives the figures without iota subscript; the second includes it. For Mark, Luke, and John, each column gives two totals: with/without the words in double brackets.

Matthew, 90057, 91011
Mark, 56537/55365, 57098/55915
Luke, 95966/95772, 97048/96852
John, 71348/70526, 72133/71301
Acts, 95811, 96764
Romans, 34434, 34913
1 Corinthians, 32760, 33165
2 Corinthians, 22279, 22550
Galatians, 11085, 11178
Ephesians, 12001, 12184
Philippians, 7998, 8096
Colossians, 7884, 8016
1 Thessalonians, 7419, 7506
2 Thessalonians, 4048, 4105
1 Timothy, 8854, 8944
2 Timothy, 6525, 6615
Titus, 3723, 3759
Philemon, 1562, 1590
Hebrews, 26383, 26571
James, 8827, 8895
1 Peter, 9062, 9148
2 Peter, 6079, 6148
1 John, 9459, 9582
2 John, 1130, 1144
3 John, 1107, 1124
Jude, 2577, 2600
Revelation, 46032, 46527

Regards,

Casey Perkins
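The Mark, Luke, and John rows can also be read the other way: subtracting the second figure in each pair from the first gives the number of letters inside double brackets (presumably mostly the endings of Mark and the pericope adulterae in John). A trivial sketch using only the numbers in Casey's table:

```python
# For each book: ((with [[...]], without [[...]]) without iota subscript,
#                 (with [[...]], without [[...]]) with iota subscript)
TOTALS = {
    "Mark": ((56_537, 55_365), (57_098, 55_915)),
    "Luke": ((95_966, 95_772), (97_048, 96_852)),
    "John": ((71_348, 70_526), (72_133, 71_301)),
}


def double_bracket_letters(totals):
    """Letters contributed by [[double-bracketed]] text in each book."""
    return {book: (plain[0] - plain[1], iota[0] - iota[1])
            for book, (plain, iota) in totals.items()}


if __name__ == "__main__":
    for book, (plain, iota) in double_bracket_letters(TOTALS).items():
        print(book, plain, iota)
```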

Peter M. Head said...

Ah, well done.