Comprehensive NWT Comparison Project (calling all technically skilled members)

by Apognophos 223 Replies latest watchtower bible

  • besty
    besty

    wow

  • MeanMrMustard
    MeanMrMustard

    As another update! - DONE! I have both versions. Now on to diffing the text.

    Apognophos, I was thinking of using google-diff-match-patch. There are articles on turning it into word level diff with just a short patch. Plus it seems to be more of what we want: http://code.google.com/p/google-diff-match-patch/
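    For anyone following along, here is a minimal sketch of what a word-level diff looks like. It uses Python's standard `difflib` as a stand-in rather than google-diff-match-patch itself, and the `word_diff` helper and the sample sentences are made up for illustration (they are not verse text).

```python
import difflib


def word_diff(old: str, new: str) -> list[tuple[str, str, str]]:
    """Return (operation, old_words, new_words) for every span of words that differs."""
    a, b = old.split(), new.split()
    # Compare lists of words instead of characters to get a word-level diff.
    matcher = difflib.SequenceMatcher(a=a, b=b, autojunk=False)
    changes = []
    for op, i1, i2, j1, j2 in matcher.get_opcodes():
        if op != "equal":
            changes.append((op, " ".join(a[i1:i2]), " ".join(b[j1:j2])))
    return changes


# Placeholder sentences, not actual verse text.
old_verse = "the quick brown fox jumps over the lazy dog"
new_verse = "the quick red fox leaps over the lazy dog"

for change in word_diff(old_verse, new_verse):
    print(change)
```

    Splitting on whitespace keeps punctuation attached to the neighboring word, so a moved comma shows up as a changed word. That may or may not be what we want.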

    I am also going to try to work on dumping my DB to files, one file, one verse. There will be 62,206 files when all is said and done. I'm not sure how to share such a thing at the moment...
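    A rough sketch of the one-file-per-verse dump, bundled into a single zip for sharing. The `(book, chapter, verse)` key layout, the file naming scheme, and the helper names `dump_verses` and `bundle` are all assumptions for illustration, not the actual DB layout.

```python
import shutil
import tempfile
from pathlib import Path


def dump_verses(verses: dict[tuple[str, int, int], str], out_dir: Path) -> int:
    """Write each verse to its own UTF-8 text file and return the number written."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for (book, chapter, verse), text in verses.items():
        # Hypothetical naming scheme: Book_CCC_VVV.txt
        name = f"{book}_{chapter:03d}_{verse:03d}.txt"
        (out_dir / name).write_text(text + "\n", encoding="utf-8")
    return len(verses)


def bundle(out_dir: Path, archive_base: Path) -> Path:
    """Zip the whole directory so it can be passed around as one file."""
    return Path(shutil.make_archive(str(archive_base), "zip", root_dir=out_dir))


# Placeholder data standing in for rows pulled from the database.
sample = {("BookA", 1, 1): "First sample verse.", ("BookA", 1, 2): "Second sample verse."}

with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    count = dump_verses(sample, root / "verses")
    archive = bundle(root / "verses", root / "verses")
    print(count, archive.name)
```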

    MMM

    (also: my last two posts, made late, had a few spelling mistakes. When I said that John 8 1-11 are "not left completely out" of 2013 version, the "not" should be "now". " partse" should be "parse" )

  • besty
    besty

    MMM - did it seem to you the WTS had gone to extra lengths to prevent what you have done? Speculate why?

  • jgnat
    jgnat

    In my limited knowledge, it seems to me that MMM and Simon are saying that they protected it stupid. They encrypted each verse separately, but with the same code. Once the code is broken, the entire package unwraps. But all that extra coding and encrypting means the PDF bible is far bigger than it needs to be.

    Now, I am not in IT, so if I got any of this wrong, feel free to correct me.

  • Apognophos
    Apognophos

    Cool. We might be able to use that patch file approach, though as you said, we'll have to fiddle with it to get the right results. I'd like to try a few things, would you mind providing the verse text files to me privately (since it's copyrighted material)? I guess they shouldn't be too big if they are zipped together. I'd like to experiment with diffing, and grouping the diffs by kind, and seeing how reader-friendly I can make it.

  • Terry
    Terry

    For the non-cognoscente, what are the practical uses of this hack?

    How does the Ex-JW community benefit?

    If it proves to be very useful, shouldn't somebody use the temporary vulnerability to maximum effect before it gets locked down and changed?

  • Simon
    Simon

    google-diff-match-patch is great and there's lots of support for it. Not sure if word level is necessary - doing a line level diff (i.e. each verse) is probably better for a first pass and then more specific diffs can be done of those later if needed but most times someone will need to read the whole verse and see the change, less so single words in isolation.
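    A minimal sketch of that verse-level first pass, assuming each version has already been loaded into a dict keyed by verse reference. The `changed_verses` name, the reference format, and the sample data are hypothetical.

```python
def changed_verses(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """Group verse references by whether they changed, were added, or were removed."""
    report: dict[str, list[str]] = {"changed": [], "added": [], "removed": []}
    for ref in sorted(old.keys() | new.keys()):
        if ref not in new:
            report["removed"].append(ref)
        elif ref not in old:
            report["added"].append(ref)
        elif old[ref] != new[ref]:
            report["changed"].append(ref)
    return report


# Placeholder text, not real verse content.
old_version = {"Book 1:1": "alpha", "Book 1:2": "beta", "Book 1:3": "gamma"}
new_version = {"Book 1:1": "alpha", "Book 1:2": "BETA", "Book 1:4": "delta"}

print(changed_verses(old_version, new_version))
```

    Only the verses listed under "changed" would then need a finer word-level diff.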

    In my limited knowledge, it seems to me that MMM and Simon are saying that they protected it stupid. They encrypted each verse separately, but with the same code. Once the code is broken, the entire package unwraps. But all that extra coding and encrypting means the PDF bible is far bigger than it needs to be.

    That's kind of expected. It would be weird and slow to have a separate key for every verse although some salt / hash could be used to make things like this more difficult. Ultimately it's a tradeoff between protection and performance.

    It is a little weird that they seem so protective of the text of "God's word" which surely, no religious order would claim to own or have copyright on ... would they?

    So yes, compressed plain-text would be better than encrypting it which will typically reduce the effectiveness of the compression by increasing the entropy (or is it reducing the entropy? I always mix those up!)
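    A quick way to see the effect, as a sketch: compress some ordinary text and an equal amount of random bytes. The random bytes are only a stand-in for encrypted data (assuming good cipher output looks like random noise), and the sample paragraph is made up.

```python
import os
import zlib

# Ordinary prose repeats words and letter patterns, so it compresses well.
sample = (
    "In this sample paragraph the same common words keep coming back, "
    "as they do in any long text written in one language. "
) * 40
plain = sample.encode("utf-8")

# Assumption: random bytes stand in for the output of a good cipher.
noise = os.urandom(len(plain))

packed_plain = zlib.compress(plain, 9)
packed_noise = zlib.compress(noise, 9)

print("original size:     ", len(plain))
print("compressed text:   ", len(packed_plain))
print("compressed 'noise':", len(packed_noise))
```

    The noise barely shrinks at all, because high-entropy data leaves the compressor no patterns to exploit. So encryption increases the entropy of the data, and compression has to happen before encryption to do any good.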

  • Apognophos
    Apognophos

    Terry: Well, there are a few different things going on here, but they were all just tangential to getting the text of the Bible into a format that we could perform comparisons on, text that is already available to anyone who wants to download the Society's PDF of the Bible or read it online. I assume the hack you are referring to is MMM figuring out the PUB format used for the JW Library's NWT. This is primarily useful only for getting that text into the format we needed; however, if the WT Library CD-ROM's files are in the same format, this would allow us to rip that text too.

    I only see this as potentially useful for posting online the older publications that are in public domain. One site I found provides scanned images of the public domain literature, which would benefit from having plain-text companion versions because they cannot be searched. That being said, I think that there are probably plain-text versions out there, since I see members here posting text from pre-1950 literature. So, in short, there's probably no direct benefit to the community from any of the work so far :-)

    The key is now taking the Bible texts and making a comparison in the way that people want to see it, which includes straining out the more interesting changes from the boring ones. I'm open to suggestions from everyone on how they would want to read the list of changes, keeping in mind that we can't distribute the full Bible texts. At this point I'm thinking the final product will probably be a word-processing document, or a separate document for each kind of change, which lists the verses that changed. Although I will attempt to give the reader a head start by screening out minor changes, it will ultimately be up to the reader to determine how significant a change is. For instance, a moved comma could be minor, or incredibly significant ("Truly I tell you today...").
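    One possible way to sort changes by kind, as a rough sketch. The categories and the `classify_change` helper are assumptions, and the sample sentences are placeholders rather than verse text.

```python
import string

# Translation table that deletes ASCII punctuation. Curly quotes and other
# non-ASCII marks are NOT covered and would need to be added for real text.
_STRIP_PUNCT = str.maketrans("", "", string.punctuation)


def classify_change(old: str, new: str) -> str:
    """Label a changed verse as punctuation-only, capitalization-only, or wording."""
    if old == new:
        return "identical"
    old_words = old.translate(_STRIP_PUNCT).split()
    new_words = new.translate(_STRIP_PUNCT).split()
    if old_words == new_words:
        return "punctuation"
    if [w.lower() for w in old_words] == [w.lower() for w in new_words]:
        return "capitalization"
    return "wording"


# Placeholder sentences illustrating each category.
print(classify_change("I tell you, today we leave", "I tell you today, we leave"))
print(classify_change("the King spoke", "the king spoke"))
print(classify_change("he went up", "he went away"))
```

    A "punctuation" label only says what kind of change it is, not how much it matters, so a moved comma still has to be judged by a reader.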

    No matter the format, I don't see the average ex-JW being able to quickly scan through a list of changes generated by our work, and getting any benefit. Being that the Bible is a complex subject, it will take someone with understanding ("Let the reader use discernment!") to grasp what changes are important and to then tell others about them. For instance, Jeffro has already looked at the scriptures in the revised NWT that concern the 70-year prophecy to see if anything important changed; it takes specialized knowledge to understand whether a change in one of those verses is significant. So in that sense, the product of this work will probably have to undergo a trickle-down effect of gradual analysis by individual readers before the average member of the forum learns anything interesting.

  • MeanMrMustard
    MeanMrMustard

    MMM - did it seem to you the WTS had gone to extra lengths to prevent what you have done? Speculate why?

    It is somewhat reasonable to think that they don't want to let other developers create programs against their data. So if they come out with an android app for the NWTx2, the WT wants people to download their app, not Mr. ex-JW that might link to other resources from jwfacts.com. :)

    But then I catch myself... if that was their reason, they undermine the entire thing by posting the Bible online. In the end, even after getting past the DB encryption, I ended up just taking the text from the web version. Why? It was a lot easier - the text is right there for anyone to get. All I had to do was download it via HTTP and use standard objects to pull the text from the HTML. Why does this matter? Well, if I were making an android (or PC) app, I don't have to hack anything - I would simply link to the scripture and download it live from jw.org (and then insert commentary from jwfacts).
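    Roughly what "use standard objects to pull the text from the HTML" can look like, as a sketch with Python's built-in `html.parser`. The markup below (a `span` with class `verse` and an `id`) is invented for illustration and is not the real page structure, and the download step is left out so the example runs offline.

```python
from html.parser import HTMLParser


class VerseExtractor(HTMLParser):
    """Collect the text inside <span class="verse" id="..."> elements (hypothetical markup)."""

    def __init__(self) -> None:
        super().__init__()
        self.verses: dict[str, str] = {}
        self._current: str | None = None

    def handle_starttag(self, tag, attrs):
        attr_map = dict(attrs)
        if tag == "span" and attr_map.get("class") == "verse":
            self._current = attr_map.get("id") or "?"
            self.verses[self._current] = ""

    def handle_endtag(self, tag):
        # Simplification: any closing </span> ends the verse, so nested spans
        # inside a verse are not handled.
        if tag == "span":
            self._current = None

    def handle_data(self, data):
        if self._current is not None:
            self.verses[self._current] += data


def extract_verses(html: str) -> dict[str, str]:
    parser = VerseExtractor()
    parser.feed(html)
    return {ref: text.strip() for ref, text in parser.verses.items()}


sample_html = (
    '<div><span class="verse" id="v1">First sample <b>verse</b> text.</span> '
    '<span class="verse" id="v2">Second sample verse.</span></div>'
)
print(extract_verses(sample_html))
```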

    Ultimately, I don't know what it gains them to encrypt it.

    As I mentioned before, even the WT library is vulnerable. I don't need to know how the PUB files in the WT library are formatted to read the content there. Why? Because I know that the WT library program itself will decrypt it and display it in a window, and I know that you can copy the text to the clipboard. You think that can't be automated? PFFT! It surely can. I could create my own WT lib database if I wanted to without knowing any of the encryption.

    So you might be asking why I spent time cracking the DB in the first place? Because the NWTx2 is new and there wasn't an easier way of getting it. The PDF was horrible to parse through, and it looked like my only choice until castthefirststone brought our attention to the Android DB of it. If the DB was cracked then I could pull the text out of there without all of the PDF parsing issues that would most likely cause errors to appear in the text.

    But, at the last moment, the WT put out an online version...

    MMM

  • MeanMrMustard
    MeanMrMustard

    For the non-cognoscente, what are the practical uses of this hack?

    How does the Ex-JW community benefit?

    If it proves to be very useful, shouldn't somebody use the temporary vulnerability to maximum effect before it gets locked down and changed?

    I don't know about anyone else, but it gave me this sort of feeling toward the WT:

    http://www.youtube.com/watch?v=FWBUl7oT9sA

    Beneficial? Emotionally - very much so.

    MMM
