Comparing WT Library - 2011 vs. 2012

by MeanMrMustard 31 Replies latest watchtower bible

  • MeanMrMustard

    Hello All,

    A couple of years ago, I participated in a thread on this site comparing the old NWT (1984) to the new NWT (2013). I was able to produce a NWT mash up PDF showing the side-by-side differences between the two versions of the NWT.

    Here

    This site claims it will hold the file for download for a long time. Don’t click the Download button, however. Click the file link toward the middle.

    NWT Compare

    The project was originally started as a way to see what kind of new interpretations the WT was trying to slip into the Bible text; however, the results were ambiguous. There were too many minor changes, and any large change in meaning or interpretation was drowned out by a sea of meaningless modifications. Nevertheless, the NWT Diff project did give me an idea for a new project, one that employed the same diff code, but would produce more interesting results. The idea: Compare the WT Library from one year to the next. That is, see what changes/revisions occur to the content within the WT library from year-to-year. This would especially be interesting for the content of older publications. After all, why would the WT fuss with the language of a 1980s WT, for example?

    Also, for me, writing the code to tackle this problem was the main driver. It is possible that after all the work is done, no surprising results will be found – perhaps only minor changes. In that case, I would still consider the project a success. The goal, to me, is solving the problem of getting all the content of the WT library in two different years and producing a change set between the two. If you are the type that likes computer programming, and you are wondering how one might go about getting the content of the entire WT library and doing a compare, then this might be the thread for you. Also, if you are interested in taking this project further, I can share the code. For fun, I did the WT library export in C# and the Diff program in VB.NET. If you want the code, you will need Visual Studio 2010 or later. You will have to get your own copy of the WT library.

    I started the project a few days after the NWT mash up, and a couple of weeks later I had a working version. However, I switched jobs, life got very busy, and the project was pushed onto a shelf. I almost completely forgot about it. But the other night, while watching a nice warm fire with my fat cat sleeping on my lap, I suddenly remembered the work I had done, so I decided to clean up what I had and post it. What follows are some of the results. For the project, I originally used the 2011 and 2012 WT library. I think I picked those versions initially because I was going to increment through new versions (2012 vs 2013, etc).

    The project was broken down into two parts. PART 1: get all of the content from the WT into a readable format. Early on, I decided if I could export the WT into a folder structure that mimicked the WT library structure, with plain text files, then that would be ideal. I would compare file-to-file with the same name on different versions. PART 2: create a comparison program to traverse the WT library exported file structure and produce files for any differences found. If no differences are found, then produce no output for that content.
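
    The file-to-file pairing described in PART 2 can be sketched roughly as follows. This is a minimal Python illustration, not the actual VB.NET Diff program; the function names and the tiny demo trees are hypothetical.

```python
import os
import tempfile


def changed_files(old_root, new_root):
    """Yield relative paths of files present in both trees whose text differs."""
    for folder, _subfolders, files in os.walk(old_root):
        for name in sorted(files):
            old_path = os.path.join(folder, name)
            rel = os.path.relpath(old_path, old_root)
            new_path = os.path.join(new_root, rel)
            if not os.path.isfile(new_path):
                continue  # no counterpart in the other year; not a text change
            with open(old_path, encoding="utf-8") as f_old:
                old_text = f_old.read()
            with open(new_path, encoding="utf-8") as f_new:
                new_text = f_new.read()
            if old_text != new_text:
                yield rel


def write(root, rel, text):
    """Helper for the demo: create a text file, making folders as needed."""
    path = os.path.join(root, rel)
    os.makedirs(os.path.dirname(path), exist_ok=True)
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)


def demo():
    """Build two tiny made-up export trees and list the files that changed."""
    with tempfile.TemporaryDirectory() as tmp:
        old_root = os.path.join(tmp, "2011")
        new_root = os.path.join(tmp, "2012")
        write(old_root, os.path.join("Books", "a.txt"), "same text")
        write(new_root, os.path.join("Books", "a.txt"), "same text")
        write(old_root, os.path.join("Books", "b.txt"), "old wording")
        write(new_root, os.path.join("Books", "b.txt"), "new wording")
        write(old_root, os.path.join("Books", "c.txt"), "only in 2011")
        return sorted(changed_files(old_root, new_root))
```

    Only the file that exists in both trees with different text is reported. Unchanged files produce no output, and files missing from one year are skipped here, though a fuller tool would probably log them separately.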

    PART 1 – Getting All the Content of the WT Library

    The WT library encrypts the content of its library files. Some work was done on the previous Comprehensive NWT thread to decrypt the contents of the WT files directly by using the Java code behind the mobile app. However, I chose a different route this time: automating the WT library itself. By getting the window handles of the top-level WT window, the ListView on the left, and the middle content window, I was able to force the WT library to navigate itself through the entire library tree. Once each article was displayed in the content window, I used the clipboard to get it out. All of this was accomplished by using standard Windows API calls to send the proper integer messages to the appropriate windows.

    You can see the program in action below. It finds the WT ListView window and displays the window handles. It then flashes the WT ListView to confirm to the user that it found the right window. The processing begins and the text is extracted. The full extraction runs overnight.

    Video 1


    Memory Leaks

    As it turns out, the WT library never actually releases the memory it consumes when it loads content from an encrypted library file. Why doesn’t this ever become visible to the user? Because the user would have to open hundreds and hundreds of articles before the memory usage becomes burdensome on the system. As soon as the user closes the WT library, all the memory is naturally released, as the OS reclaims everything the process had reserved.

    But for an automation program, it presents a problem. I want to dump the entire WT library. As the program runs, the memory usage rises and reaches a critical point. The OS steps in and kills the process. When the WT library is killed, the location in the hierarchy goes away, as well as the window handle. I got around it by keeping track of the location in the WT hierarchy and then re-executing the program when I detected my window handles became invalid.
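
    The restart-and-resume idea can be illustrated with a small simulation. This is a hedged Python sketch, not the real C# exporter: `FakeViewer` and `ViewerCrashed` are made-up stand-ins for the WT library process and for the "window handle became invalid" condition.

```python
class ViewerCrashed(Exception):
    """Raised when the (simulated) viewer process has died."""


class FakeViewer:
    """Stand-in for the real application: dies after opening `limit` articles."""

    def __init__(self, limit):
        self.limit = limit
        self.opened = 0

    def read(self, article):
        if self.opened >= self.limit:
            raise ViewerCrashed(article)
        self.opened += 1
        return "text of " + article


def export_all(articles, start_viewer):
    """Export every article, relaunching the viewer whenever it dies.

    Assumes each fresh viewer can open at least one article; otherwise
    the loop would never make progress.
    """
    exported = {}
    restarts = 0
    position = 0                      # checkpoint: index of the next article
    viewer = start_viewer()
    while position < len(articles):
        try:
            exported[articles[position]] = viewer.read(articles[position])
            position += 1             # advance only after a successful read
        except ViewerCrashed:
            viewer = start_viewer()   # relaunch, then retry the same article
            restarts += 1
    return exported, restarts


articles = ["a1", "a2", "a3", "a4", "a5", "a6", "a7"]
exported, restarts = export_all(articles, lambda: FakeViewer(limit=3))
```

    The key point is that the position only moves forward after a successful read, so a crash never skips an article. A real overnight run would probably also write the position to disk so the exporter itself could be restarted.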

    Preview (you can see the memory increase, and then the WT is killed. The traversal program brings it back):

    Video 2


    PART 2 – Calculating the Differences

    Two traversals are needed, one for WT 2011 and one for WT 2012, and both take about 12 hours. But once it is exported, we don’t have to export again. There will be some expected differences. For example, between 2011 and 2012, there will be new entries for the daily text. There may be some publications removed, perhaps to save space on the CD. Also, some folders have a date range, like 1984-2011. In the 2012 version that folder will be different: 1984-2012. The WT versions also have some insignificant differences. Some Greek letters were changed from version to version – the Greek character is the same, but the Unicode value used is a bit different. So the Diff program detects a change. I included a place to ignore certain changes.
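
    One way to ignore such look-alike changes is to normalize both texts before comparing them. The Python sketch below is an assumption about how that could be done, not the project's actual ignore list: the specific characters (alpha with oxia versus alpha with tonos, a curly apostrophe, a no-break space) are illustrative examples only.

```python
import unicodedata

# Hypothetical case-by-case exceptions: variants to treat as the same character.
IGNORED_VARIANTS = {
    "\u2019": "'",   # right single quotation mark -> ASCII apostrophe
    "\u00a0": " ",   # no-break space -> ordinary space
}


def normalize(text):
    """Fold canonically equivalent code points together, then apply exceptions."""
    text = unicodedata.normalize("NFC", text)
    return "".join(IGNORED_VARIANTS.get(ch, ch) for ch in text)


def is_significant(old, new):
    """True when the two texts still differ after normalization."""
    return normalize(old) != normalize(new)


# U+1F71 (alpha with oxia) and U+03AC (alpha with tonos) look the same but
# are different code points; NFC maps both to U+03AC.
greek_same = not is_significant("\u1f71", "\u03ac")
```

    Unicode normalization handles the canonically equivalent cases in one step, while the small table is the place to add further exceptions as annoying little changes turn up.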

    I decided to do a character level difference, then take the markup and mash it up, and export the changes only, with some text before and after. This way I could produce a small library of changes I could post here, but not be worried about copyright issues. These are just small quotes around the differences only.
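
    A character-level diff that keeps only the changes plus a little surrounding text might look like the following. This is a rough Python sketch using the standard `difflib` module rather than the diff code the project actually uses; the `[-removed-]` and `{+added+}` markers are a plain-text substitute for colour, chosen here for illustration.

```python
import difflib


def change_snippets(old, new, context=20):
    """Return one marked-up snippet per changed region.

    Each snippet holds up to `context` characters of unchanged text on
    either side of the change, taken from the old version.
    """
    snippets = []
    matcher = difflib.SequenceMatcher(None, old, new, autojunk=False)
    for tag, i1, i2, j1, j2 in matcher.get_opcodes():
        if tag == "equal":
            continue
        before = old[max(0, i1 - context):i1]
        after = old[i2:i2 + context]
        removed = "[-" + old[i1:i2] + "-]" if i2 > i1 else ""
        added = "{+" + new[j1:j2] + "+}" if j2 > j1 else ""
        snippets.append(before + removed + added + after)
    return snippets


example = change_snippets("a dot. Next", "a dot, Next", context=3)
# example == ["dot[-.-]{+,+} Ne"]
```

    Identical texts yield an empty list, which corresponds to producing no output file when nothing changed.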

    An example of the output is below. It comes from “God’s Love”, 2008 pp 144-159:

    The red characters are removed from the 2011 version. The green characters are included in the 2012 version. The white characters exist in both. If you take your time, you can make it out. But if you reference the WT library itself, this is the change:

    2011 version:

    Easter has also been linked to the worship of the Phoenician fertility goddess, Astarte, who had as her symbols the egg and the hare. Statues of Astarte have variously depicted her as having exaggerated sex organs or with a rabbit beside her and an egg in her hand.

    2012 version:

    Eostre (or Eastre) was also a fertility goddess. According to The Dictionary of Mythology, “she owned a hare in the moon which loved eggs and she was sometimes depicted as having the head of a hare.”

    I wonder if they found the 2011 version to be inaccurate in some way.

    Below is the link to the results. It is a ZIP file that contains the structure of the WT library, with all differences logged in individual files. If there is no file or folder, it means there were no changes between the versions for that particular part of the WT library.

    Results Click Here

    MMM

  • Londo111

    Amazing work!

  • Anders Andersen

    Awesome!

    I have been thinking about doing something like this for quite some time, but never got to it.

    I'll be studying your result now :-)

  • ab.ortega
    Fantastic!
  • Anders Andersen

    So after a first quick look, I have to say: very well done! Thanks!

    I noted a lot of changes seem to be just spaces and punctuation (e.g. a dot changed into a comma).

    Would it be a lot of work to rerun the diff checker while excluding these (almost certainly) meaningless changes?

    It would be much easier to find the really interesting changes.

    Also, would you be willing to share the wtlib contents you harvested? A lot of interesting analysis could be done on them :-D

    PM me if your public response is no ;-)

  • ab.ortega

    The watchtower 1988: "Unmasking the serpent" paragraph 8

    2011 says: " Had it been otherwise, it would not have been a true test of Job's integrity." - Job 1:21; 2:9, 10.

    2012 says: "Thus, this became a very severe test of Job's integrity."- Job 1:21; 2:9, 10.

    The Watchtower 2010: "Balsam of Gilead - The Balm that Heals"

    They removed "at the beginning" from: But it was not left unanswered. In the synagogue in Nazareth at the beginning of the year 30 C.E., Jesus read from the scroll of Isaiah, saying:

  • darkspilver

    Hi ab.ortega!

    The watchtower 1988: "Unmasking the serpent" paragraph 8

    2011 says: " Had it been otherwise, it would not have been a true test of Job's integrity." - Job 1:21; 2:9, 10.

    2012 says: "Thus, this became a very severe test of Job's integrity."- Job 1:21; 2:9, 10.


    Original individual magazine version:

    Watchtower 1 September 1988, page 10, paragraph 8, end

    "Had it been otherwise, it would not have been a true test of Job's integrity."

    Bound volume version:

    Watchtower 1 September 1988, page 10, paragraph 8, end

    "Thus, this became a very severe test of Job’s integrity."



  • darkspilver

    Hi ab.ortega!

    The Watchtower 2010: "Balsam of Gilead - The Balm that Heals"

    They removed "at the beginning" from: But it was not left unanswered. In the synagogue in Nazareth at the beginning of the year 30 C.E., Jesus read from the scroll of Isaiah, saying:

    Original individual magazine version:

    Watchtower 1 June 2010, page 22, paragraph 5

    "In the synagogue in Nazareth at the beginning of the year 30 C.E., Jesus read from the scroll of Isaiah, saying"


    New CD ROM version:

    Watchtower 1 June 2010, page 22, paragraph 5

    "In the synagogue in Nazareth in the year 30 C.E., Jesus read from the scroll of Isaiah, saying"
  • wifibandit

    Top Notch work MeanMrMustard! You really are trying to save paper.

    Just to keep the links alive:

    NWT Compare

  • MeanMrMustard

    @Anders Andersen,

    Sure, I have no problem sharing. I will be posting the code soon, once I have removed all the bin and obj files. I imagine anything from Visual Studio 2010 up will work. This was originally done on a Windows 7 virtual machine in VS 2010 (you can see it in the videos).

    The WT library contents themselves are about 300 MB unzipped, 164 MB zipped. This is somewhat of a larger upload. But I will give it a shot at some point here. The code is definitely going up soon.

    About the small changes: Yes, I ran across a lot of those. I filter a great deal out. Since the diff code works on the character level, it detects all these annoying little changes. As I mentioned, the WT, in some cases, decided to change the Unicode characters it used for the Greek letters. The characters render the same, but are different byte-wise. You have to handle these things on a case-by-case basis. When you see an annoying little change, you can code an exception and the diff will ignore it.

    Note: if you have your own version of WT library, and you get the code running, you can export your own copy of the WT library.

    I think at some point I will post just the raw program itself. This way a non-programmer could get their own copy (for fun I guess?)

    The code has some fun little gems in it, for those Windows developers that want to mess with the API. For example, you can iterate through the index numbers of a ListView without a lot of trouble. But as soon as you want to get the text value of that ListViewItem, it gets harder. The Windows API allows you to allocate memory and provide the memory address so that it can be filled with the item text. But that allocated memory has to be a part of the ListView process - in this case, the WT library. So at one point I had to allocate virtual memory in the WT library process and then copy the memory back into the automated program.

    MMM
