Comprehensive NWT Comparison Project (calling all technically skilled members)

by Apognophos, 223 replies

  • MeanMrMustard

    Here is a sample of the NWT 1984 and NWTx2, diffed and merged into one text. Let's see if this comes out OK on the forum... It's Genesis 1, the entire chapter. Red text with a line through it was removed from the 1984 text. Green text was added in 2013.

    PFFT... it rendered fine in preview, but not when I submitted: the forum stripped out all the formatting. *sigh* I'll try to post a picture.

    MMM

  • Apognophos

    Sorry, I was away for a couple of days. Glad to see your progress with the diff code; I hadn't realized it would produce such nice output. I have some ideas on how to process and group the changes, but I'll need to work with the same version of the diff code that you are using, so I can wrap more code around it.

  • MeanMrMustard

    Apognophos,

    No problem. The google-diff-match-patch library fundamentally diffs character by character. To make it a word-level diff, you have to define what a word is (are spaces included? are they attached to the front or the back?) and then encode the string: assign one Unicode character to each distinct word, and let the normal character-level diff operate on the encoded string. Then decode the result to bring the diffs back to the word level. Very neat. The trick was in defining a word. As it turns out, the best results come from defining a word as any run of non-whitespace characters, with each run of whitespace counted as a separate word and each punctuation mark counted as a one-character word.
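    For anyone who wants to play with the encoding trick, here is a minimal Python sketch of it. This is not MMM's code: it uses the standard library's difflib.SequenceMatcher as a stand-in for the diff-match-patch character-level diff, and it returns (op, text) tuples using the diff-match-patch convention of -1 for delete, 0 for equal and 1 for insert. The names tokenize, encode and word_diff are made up for this example, and the token regex is one reading of the word definition above.

```python
import difflib
import re

# Token definition from the post: each run of whitespace is one token, each
# punctuation character is a one-character token, and any other run of
# non-whitespace characters is a word.
TOKEN_RE = re.compile(r"\s+|\w+|[^\w\s]")


def tokenize(text):
    """Split text into tokens; joining the tokens gives back the original text."""
    return TOKEN_RE.findall(text)


def encode(tokens, vocab, words):
    """Replace every token with a single character, growing a shared vocabulary."""
    chars = []
    for tok in tokens:
        if tok not in vocab:
            words.append(tok)
            vocab[tok] = chr(len(words))  # first token gets chr(1), an arbitrary choice
        chars.append(vocab[tok])
    return "".join(chars)


def word_diff(old, new):
    """Word-level diff as (op, text) tuples: -1 delete, 0 equal, 1 insert."""
    vocab, words = {}, []
    a = encode(tokenize(old), vocab, words)
    b = encode(tokenize(new), vocab, words)

    def decode(chars):
        return "".join(words[ord(c) - 1] for c in chars)

    diffs = []
    # Stand-in for the character-level diff: difflib instead of diff-match-patch.
    matcher = difflib.SequenceMatcher(None, a, b, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            diffs.append((0, decode(a[i1:i2])))
            continue
        # "replace", "delete" and "insert" become a delete and/or an insert.
        if i2 > i1:
            diffs.append((-1, decode(a[i1:i2])))
        if j2 > j1:
            diffs.append((1, decode(b[j1:j2])))
    return diffs


print(word_diff("[the] earth", "the earth"))
# [(-1, '['), (0, 'the'), (-1, ']'), (0, ' earth')]
```

    Because both texts share one vocabulary, the same word always maps to the same character in both, so any character-level diff can be dropped in at the marked step. The brackets come out as separate deletions because each punctuation mark is its own token.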

    Here is what Genesis chapter 1 looks like. The baseline is the 1984 version. Red text with strikethrough was removed in the 2013 version. Text with a green background does not appear in the 1984 version but does appear in the 2013 version. You'll see that brackets are always eliminated. I've got the entire Bible like this in my DB... I just need to dump it to a file. I was hoping to make a PDF, but haven't gotten around to it yet.
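    To turn diff tuples into the red-strikethrough / green-background view described above, something along these lines would work. This is a sketch: the colors and the function name diffs_to_html are assumptions, and it expects (op, text) tuples with -1 for delete, 0 for equal and 1 for insert. Browsers draw <del> with a strikethrough by default. diff-match-patch also ships its own diff_prettyHtml helper that produces similar markup.

```python
import html


def diffs_to_html(diffs):
    """Render (op, text) diff tuples as HTML: deletions red with strikethrough,
    insertions on a green background, unchanged text as-is."""
    parts = []
    for op, text in diffs:
        escaped = html.escape(text)  # keep &, <, > in verse text from breaking the markup
        if op == -1:
            parts.append(f'<del style="color:red">{escaped}</del>')
        elif op == 1:
            parts.append(f'<ins style="background:#cfc">{escaped}</ins>')
        else:
            parts.append(escaped)
    return "".join(parts)


print(diffs_to_html([(-1, "["), (0, "the"), (-1, "]"), (0, " earth")]))
```

    The same HTML could be fed to a browser's print-to-PDF, which is one low-effort way to get the PDF mentioned above.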

    It's interesting how much of the extent of the changes you can see from Genesis 1 alone...

    MMM

  • zound

    The sun is still created after the plants.

  • DATA-DOG

    The first "plants" were actually fungus. They required no sunlight. Other plants came later.... Sounds good to me! LOL!! BTW, you computer guys are awesome!!

    DD

  • Apognophos

    Yep, the removal of brackets was actually mentioned in the talks at the AGM when the Bible was introduced. That output is probably good enough to make a PDF for those who want to read through it carefully and deliberately (I guess we're looking at distributing by torrent after all).

    I'm embarrassed to say this, but I am having serious second thoughts about whether I should still take the time to group and filter changes by kind. Unfortunately, I am right in the middle of a super-important project that I don't think I should postpone. That has been true for a while; I didn't mention it because I had planned to put that work on hold while I worked on this NWT project. But this weekend my business partner moved up the timetable my project is tied to, and now I don't know if I can afford to delay it by even a week.

    Just as importantly, it seems to me that a program that sorts changes by type will be of only limited use, because the importance of a change is determined not just by its nature but also by its position. For instance, an inserted comma could be incredibly important one time out of a hundred, and that one comma would be lost in the mix if I shunted all punctuation changes into their own group, because no one would have the patience to look through an Added/Removed Commas group to see if any were noteworthy. The same goes for other kinds of changes, whether it's dropping the progressive action tenses (mostly unimportant, but occasionally significant) or anything else. It might be better after all to present all changes in one place, like MMM's last sample output above, and let readers look up the verses that are doctrinally significant and take in all the changes to those verses at once.

    I am irritated with myself for starting this project and then contributing absolutely nothing. It paints me as much more of a flake than I actually am. But the timing, as it turns out, is simply very poor for me. I might still come back to the idea of grouping changes when time permits, but for now I need to focus on the other project, which could be life-changing for me if things pan out and I give it my all. I am also going to disconnect from the forum at large again, as it takes too much of my time. However, I will keep actively checking this thread and respond as warranted.

    I'm very grateful that someone showed up who could actually do this work better than I could, and had the time to do it! Thanks to MeanMrMustard for all his work so far. If anyone else wants to work on grouping and filtering changes, I would be glad to give advice, though I think the problem I outlined above is a significant one to overcome.

  • konceptual99

    Wow! I haven't checked this thread for a while, but it seems the project has made significant progress. Top work!

  • besty

    One way to filter for important verses might be to find out how often each one has been cited in the WT Lib over the last few years, and then check those high-value targets first.
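    A rough sketch of that idea: count how often each verse reference appears in a body of text (for example, articles exported from the WT Lib, if someone has a copy), then sort the changed verses by that count. The reference pattern is a deliberately simplistic assumption (it won't handle multi-word book names like "Song of Solomon"), and the function names and sample articles are hypothetical.

```python
import re
from collections import Counter

# Simplistic, hypothetical pattern for references such as "John 1:1" or
# "2 Tim. 3:16"; multi-word book names are not handled.
REF_RE = re.compile(r"\b((?:[1-3] )?[A-Z][a-z]+\.? \d+:\d+)")


def count_citations(texts):
    """Count how often each verse reference appears across the given texts."""
    counts = Counter()
    for text in texts:
        counts.update(REF_RE.findall(text))
    return counts


def rank_changed_verses(changed_verses, counts):
    """Order changed verses most-cited first; ties are broken alphabetically."""
    return sorted(changed_verses, key=lambda ref: (-counts[ref], ref))


# Hypothetical sample data standing in for exported articles.
articles = [
    "Compare John 1:1 and 2 Tim. 3:16.",
    "As John 1:1 shows, ...",
]
counts = count_citations(articles)
print(rank_changed_verses(["Gen 1:1", "2 Tim. 3:16", "John 1:1"], counts))
# ['John 1:1', '2 Tim. 3:16', 'Gen 1:1']
```

    In practice the reference strings would need normalizing (e.g. "2 Tim." vs "2 Timothy") so that citations and diffed verses use the same keys.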

  • Emery

    Amazing work, guys, thanks for all the effort!

  • MeanMrMustard

    @besty:

    Interesting. Do you have a copy of the WT Lib available for download? I would be interested in any and all older versions too. I would love to see how they compare with one another (to catch changes sneaked in between editions).

    MMM
