Comprehensive NWT Comparison Project (calling all technically skilled members)

by Apognophos 223 Replies latest watchtower bible

  • MeanMrMustard

    @Apognophos,

    I see. Thanks for the explanation. I did a quick search for Diff(DELETE,"YOU") in my change log and came up with this (count by book):

    2 Ecclesiastes
    45 Ephesians
    134 Numbers
    136 1 Corinthians
    11 Ezra
    95 2 Corinthians
    3 Nahum
    7 Song of Solomon
    33 2 Samuel
    34 Hebrews
    148 Psalms
    12 Daniel
    61 Job
    2 Obadiah
    37 Philippians
    4 2 John
    153 Jeremiah
    11 Revelation
    43 2 Chronicles
    72 1 Samuel
    88 Mark
    1 3 John
    6 Hosea
    54 Judges
    39 James
    19 2 Peter
    4 Habakkuk
    11 Joel
    206 Deuteronomy
    7 Haggai
    63 Romans
    7 Jude
    10 Micah
    157 Exodus
    179 Ezekiel
    101 Genesis
    4 Zephaniah
    210 Matthew
    51 1 Thessalonians
    21 Malachi
    25 2 Thessalonians
    17 Proverbs
    13 Zechariah
    50 Galatians
    6 Lamentations
    172 Leviticus
    2 Jonah
    28 1 Kings
    17 Nehemiah
    142 Isaiah
    24 1 John
    1 Philemon
    51 Colossians
    194 John
    2 Esther
    196 Luke
    24 Amos
    39 2 Kings
    109 Acts
    15 1 Chronicles
    51 1 Peter
    83 Joshua
    11 Ruth
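
    A per-book tally like the one above could be produced along these lines. This is only a sketch: the layout of the change log is not shown in the thread, so the (book, operation, text) rows, the sample data, and the name count_deletions are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical change-log rows: (book, operation, text).
# The real change log's structure is not shown in the thread.
change_log = [
    ("Matthew", "DELETE", "YOU"),
    ("Matthew", "INSERT", "you"),
    ("Matthew", "DELETE", "YOU"),
    ("Ruth", "DELETE", "YOU"),
    ("Ruth", "DELETE", "came to be"),
]


def count_deletions(rows, needle):
    """Count, per book, the DELETE entries whose text equals `needle`."""
    counts = Counter()
    for book, op, text in rows:
        if op == "DELETE" and text == needle:
            counts[book] += 1
    return counts


you_counts = count_deletions(change_log, "YOU")
for book, n in you_counts.most_common():
    print(n, book)
```

    Matching on the exact deleted text keeps case-only changes ("YOU" to "you") separate from other deletions that merely contain the word.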

    MMM

  • MeanMrMustard

    @wallsofjericho,

    Looks like Titus 2:13 is the only scripture with a change of Jesus Christ from Christ Jesus... ok then. :)

    MMM

  • doneandout

    I got your pdf links. :)

    This will save me tons of time to read through to see the changes manually.

  • lrkr

    So... it looks like they took out any magic or mystery that was left in the text. I understand the "seed of the woman" is now the "offspring". To me, seed is very different from offspring (one is animate and current, the other could be potential and latent). Also, eliminating all of the "came to be" removes that beautiful reference to "TO BE" in the text.

    Just when you thought it couldn't get any worse than replacing "grace" with "undeserved kindness", they further castrated the poetry.

    I'm an atheist, so I don't really find religious meaning, but the poetry and symbolism have been centuries in the making.

  • Phizzy

    I think they have moved the average JW a lot further away from ever getting an appreciation of what the Bible writers were saying.

    When I was an active JW I used to pick up on those rather strange expressions and idioms and research what they were about.

    Now the bland paraphrase by non-scholars that is the RNWT has removed any term that would prompt further research.

    The good translations (good, notice; none are perfect, I guess) do not emasculate the text.

  • wallsofjericho

    Sorry if this has already been asked, but... are there more occurrences of the name Jehovah? If so, where are they inserted?

  • fastJehu

    @ wallsofjericho

    I found these verses in MMM's database:

    Book      Chapter  Verse
    Judges    19       18
    1 Samuel  6        3
    1 Samuel  10       26
    1 Samuel  23       14
    1 Samuel  23       16

  • slii

    I don't think the .PUB files in the Watchtower Library are in this same format. I've lately done some reverse engineering of the 2006 version, and I now understand a bit about the format of the publication files.

    First of all, I have so far seen nothing to indicate that any portion of the files are encrypted. Large parts of them are compressed, though, using a compression algorithm resembling Huffman which I'm starting to understand (I still don't fully understand the construction of some lookup tables used for decompression).

    Some pieces of the textual data (mainly titles) are in uncompressed form, yet this is not immediately obvious from inspecting the files. This is because the system internally uses a 16-bit MEPS-specific character set, which I believe is able to represent multiple scripts but predates Unicode. Overall, I get the impression that whoever designed this knew quite well what they were doing, but there's obviously lots of arcane legacy baggage involved. As to why WTL, or at least the 2006 version, still uses MEPS-coded documents internally, I do not know; perhaps they consider it a useful obfuscation to throw at possible reverse engineers (it did make me scratch my head for a while), or maybe it's just a legacy thing that hasn't been enough of a problem to touch in code that could be in maintenance-only mode.

    For example, if you look at wte.lib, it contains lots of uncompressed strings (publication names, if I remember correctly). Most (all?) .pub files also contain some uncompressed strings. In English-language files, look for places where every other byte is 08 (hex); most of those will likely be strings. Uncompressed MEPS strings, at least, tend to be stored in a Pascal-string-like style: the first 16 bits give the length of the string (in bytes, so it's always an even number), and the string is not null-terminated.
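
    Reading one such length-prefixed string might look like the sketch below. It assumes the 16-bit length prefix is little-endian like the characters themselves, which the post does not state explicitly; read_meps_string and the sample bytes are made up for illustration.

```python
import struct


def read_meps_string(buf, offset=0):
    """Read one length-prefixed string; return (code_units, next_offset).

    Assumption: the 16-bit length prefix is little-endian and counts bytes.
    """
    (nbytes,) = struct.unpack_from("<H", buf, offset)
    if nbytes % 2:
        raise ValueError("byte length should be even for 16-bit characters")
    start = offset + 2
    end = start + nbytes
    if end > len(buf):
        raise ValueError("length prefix runs past the end of the buffer")
    # Unpack nbytes // 2 little-endian 16-bit code units.
    units = list(struct.unpack_from("<%dH" % (nbytes // 2), buf, start))
    return units, end


# Hand-built sample: length 4, then 0800 and 081b stored low byte first.
sample = bytes([0x04, 0x00, 0x00, 0x08, 0x1B, 0x08])
units, next_offset = read_meps_string(sample)
print([format(u, "04x") for u in units], next_offset)
```

    Returning the next offset lets a caller walk through several strings stored back to back, if that is how a given file lays them out.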

    Some 16-bit values seem to be mapped to some kind of control codes that most probably specify things like italics. Most of the Latin alphabet seems to be in the 08xx range. Being a typesetting-oriented coding system, it also seems to contain codes for ligatures; for example, the "ff" ligature is apparently represented as 0851.

    Specifically, the English alphabet is mapped to 16-bit values, which are stored in little-endian format, as follows (numbers in hex):

    0800..0819 A-Z; 081a..0833 a-z; 0834..083d numbers, "1234567890" (BTW this is the only character set I know of where 0 comes after 9, not before 1)

    0841..0844 ":;.,"; 0845, 0846 left and right single quotation marks; 0847..0848 "?!"; 084b..084e "()/-"; 084f em-dash; 0850 en-dash; fb61 <SPACE>

    0851 ff ligature; 0865 é (small e with acute); 08fb hyphenation point (used a lot e.g. in NWT to show proper hyphenation of names).

    fb57 and fb58 are often seen around dashes, as in <fb57>--<fb58>. Perhaps they prevent breaking the line between them?
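
    The mapping listed above can be turned into a small decoder. This is a sketch covering only the code points named in this post; the choice of U+2027 for the hyphenation point and the <xxxx> placeholder for unknown codes (such as fb57 and fb58) are assumptions, and the helper names are hypothetical.

```python
def build_meps_table():
    """Map the 16-bit codes listed in the post to Unicode characters."""
    table = {}

    def add_run(first, chars):
        # Consecutive codes starting at `first` map to consecutive chars.
        for i, ch in enumerate(chars):
            table[first + i] = ch

    add_run(0x0800, "ABCDEFGHIJKLMNOPQRSTUVWXYZ")
    add_run(0x081A, "abcdefghijklmnopqrstuvwxyz")
    add_run(0x0834, "1234567890")      # note: 0 comes after 9
    add_run(0x0841, ":;.,")
    add_run(0x0845, "\u2018\u2019")    # left/right single quotation marks
    add_run(0x0847, "?!")
    add_run(0x084B, "()/-")
    table[0x084F] = "\u2014"           # em dash
    table[0x0850] = "\u2013"           # en dash
    table[0x0851] = "\ufb00"           # ff ligature
    table[0x0865] = "\u00e9"           # e with acute
    table[0x08FB] = "\u2027"           # hyphenation point (assumed equivalent)
    table[0xFB61] = " "
    return table


MEPS_TABLE = build_meps_table()


def decode_meps(data):
    """Decode little-endian 16-bit codes; unknown codes become <xxxx>."""
    out = []
    for i in range(0, len(data) - 1, 2):
        code = data[i] | (data[i + 1] << 8)
        out.append(MEPS_TABLE.get(code, "<%04x>" % code))
    return "".join(out)


sample = bytes.fromhex("00 08 28 08 61 fb 3d 08 57 fb")
print(decode_meps(sample))
```

    Leaving unmapped codes visible as <xxxx> makes it easier to spot control codes and to extend the table as more values are identified.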

    I guess the overall format of the .pub files and the compression are a topic for another post.

  • MeanMrMustard

    @slii, Hi there! This looks like your one and only post. I've let this thread slide, and I missed your post. Sorry about that.

    I don't think the .PUB files in the Watchtower Library are in this same format. I've lately done some reverse engineering of the 2006 version, and I now understand a bit about the format of the publication files.

    Good for you! I don't have the stomach to get into those PUB files. I didn't really think the PUB files were encoded the way the mobile app is encoding its data.

    First of all, I have so far seen nothing to indicate that any portion of the files are encrypted. Large parts of them are compressed, though, using a compression algorithm resembling Huffman which I'm starting to understand (I still don't fully understand the construction of some lookup tables used for decompression).

    You may be correct here.

    Some pieces of the textual data (mainly titles) are in uncompressed form, yet this is not immediately obvious from inspecting the files. This is because the system internally uses a 16-bit MEPS-specific character set, which I believe is able to represent multiple scripts but predates Unicode. Overall, I get the impression that whoever designed this knew quite well what they were doing, but there's obviously lots of arcane legacy baggage involved. As to why WTL, or at least the 2006 version, still uses MEPS-coded documents internally, I do not know; perhaps they consider it a useful obfuscation to throw at possible reverse engineers (it did make me scratch my head for a while), or maybe it's just a legacy thing that hasn't been enough of a problem to touch in code that could be in maintenance-only mode.

    Agreed. At this point I think MEPS is dead. Unicode can take its place quite easily and would, in fact, be a lot easier to work with. But I think there are a lot of things about the WT Lib that are arcane (more on that below).

    For example, if you look at wte.lib, it contains lots of uncompressed strings (publication names, if I remember correctly). Most (all?) .pub files also contain some uncompressed strings. In English-language files, look for places where every other byte is 08 (hex); most of those will likely be strings. Uncompressed MEPS strings, at least, tend to be stored in a Pascal-string-like style: the first 16 bits give the length of the string (in bytes, so it's always an even number), and the string is not null-terminated.

    Interesting.

    Some 16-bit values seem to be mapped to some kind of control codes that most probably specify things like italics. Most of the Latin alphabet seems to be in the 08xx range. Being a typesetting-oriented coding system, it also seems to contain codes for ligatures; for example, the "ff" ligature is apparently represented as 0851.

    Specifically, the English alphabet is mapped to 16-bit values, which are stored in little-endian format, as follows (numbers in hex):

    0800..0819 A-Z; 081a..0833 a-z; 0834..083d numbers, "1234567890" (BTW this is the only character set I know of where 0 comes after 9, not before 1)

    0841..0844 ":;.,"; 0845, 0846 left and right single quotation marks; 0847..0848 "?!"; 084b..084e "()/-"; 084f em-dash; 0850 en-dash; fb61 <SPACE>

    0851 ff ligature; 0865 é (small e with acute); 08fb hyphenation point (used a lot e.g. in NWT to show proper hyphenation of names).

    fb57 and fb58 are often seen around dashes, as in <fb57>--<fb58>. Perhaps they prevent breaking the line between them?

    I guess the overall format of the .pub files and the compression are a topic for another post.

    Very interesting! You have definitely gotten into the weeds. Ultimately, however, I think there might be an easier way. To date I have successfully extracted every piece of text from the 2011 and 2012 WTLibs. I did this without looking at the PUB files and without any manual clicking. I planned on mentioning it in a future thread with some statistics calculated from the text.

    I thought it would be cool to see if there are any meaningful differences between the 2011 and 2012 versions of the text. That is, we expect some differences: new content for the 2012 version, and probably some new entries in the publication index. But aside from that, I am just wondering if there are any textual changes they snuck in. The NWT project turned out to be difficult because there were so many changes involved, and it was meant to be that way. But if you take, say, the Jan 1, 1980 WT as it appears in the 2011 version and in the 2012 version, you would expect no changes between these two documents.

    Anyhow, the code base is the same between the 2011 and 2012 versions, and probably all the previous versions. They just add new content. I can tell because the 2011 version has a serious memory leak in it, and the same issue is carried over to the 2012 version.
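
    A comparison like that could be sketched as follows, assuming the extracted text of each library is available as a mapping from a document identifier to its text. The identifiers, sample text, and the name changed_documents are hypothetical; the thread does not say how the extracted text is stored.

```python
import difflib


def changed_documents(old, new):
    """Yield (doc_id, diff_lines) for documents in both libraries whose text differs."""
    for doc_id in sorted(old.keys() & new.keys()):
        if old[doc_id] != new[doc_id]:
            diff = list(difflib.unified_diff(
                old[doc_id].splitlines(),
                new[doc_id].splitlines(),
                fromfile="2011/" + doc_id,
                tofile="2012/" + doc_id,
                lineterm="",
            ))
            yield doc_id, diff


# Hypothetical extracted text, keyed by a made-up document identifier.
lib_2011 = {"doc-a": "line one\nline two", "doc-b": "same text"}
lib_2012 = {"doc-a": "line one\nline 2", "doc-b": "same text", "doc-c": "new"}

changed = dict(changed_documents(lib_2011, lib_2012))
for doc_id, diff in changed.items():
    print("\n".join(diff))
```

    Restricting the comparison to identifiers present in both versions filters out the expected differences (new 2012 content) and leaves only documents that were silently altered.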

    MMM

  • DS211

    Who do I message to get a PDF?
