Comprehensive NWT Comparison Project (calling all technically skilled members)

by Apognophos 223 Replies latest watchtower bible

  • MeanMrMustard
    MeanMrMustard

    I was going to extract each verse from the PDF and the website into a database (I have access to plenty of SQL servers). Then normalize each verse: take out all of those cross-referencing symbols and make sure the spacing is uniform (I'm sure extracting the text from the PDF will cause excessive whitespace). Then run a diff program like Simon suggests, and finally display it all back out in another PDF, formatted with the OLD verse and NEW verse side by side, with the changes highlighted.
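    A minimal sketch of the normalize-then-diff step, in Python (the thread does not name a language). The marker characters in MARKERS are an assumption, since the real cross-reference symbols in the PDF and on the website may differ, and normalize and word_diff are hypothetical helper names.

```python
import difflib
import re

# Assumption: cross-reference markers are "*", "+" or "#".
# Adjust this set once the real symbols are known.
MARKERS = re.compile(r"[*+#]")


def normalize(verse):
    """Strip the assumed cross-reference markers and collapse whitespace."""
    without_markers = MARKERS.sub("", verse)
    return " ".join(without_markers.split())


def word_diff(old, new):
    """Return only the changed words, prefixed with '- ' (old) or '+ ' (new)."""
    old_words = normalize(old).split()
    new_words = normalize(new).split()
    # ndiff also emits unchanged ("  ") and hint ("? ") entries; drop those.
    return [d for d in difflib.ndiff(old_words, new_words) if d[0] in "+-"]


if __name__ == "__main__":
    for change in word_diff("In [the]   beginning*", "In the beginning"):
        print(change)
```

    Diffing word by word rather than line by line keeps a one-word change from flagging the whole verse.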

    Buuuuttt.. you want to categorize the changes it looks like. That's what it seems like to me... some change types are more interesting to you, I guess?

  • MeanMrMustard
    MeanMrMustard

    Just a thought... I suggest concentrating on the web version of the 1984 edition first... automate a dump of that data. It won't be long before they take it down, I would assume. If they do, it will become a lot harder to get that text (you would have to get it out of WTLIB - BLA!)

    Edit: ok, found my PDF parser... trying it out

  • Apognophos
    Apognophos

    Well, as I mentioned in my first post ;) the Society made a ton of changes to streamline the wording. Past perfect continuous verbs like "had been going" might become "went", etc. etc. Keep in mind that they took like 200,000 words out of the NWT (I forget the actual number now), so a lot of those changes will not be the slightest bit interesting and will inflate the size of the "Bible changes" document to the point where no one will be willing to read through it looking for the meaty bits. Thus, filtering is actually necessary if we're going to produce a useful changeset.

    I do like the idea of outputting from a diff program, if we can get something that visually looks like what you see in the diff program itself. I haven't seen that done, so I didn't think of it.
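    One way to get output that looks like a diff program's side-by-side view is Python's standard difflib.HtmlDiff, sketched below. Note that it writes HTML rather than the PDF mentioned earlier in the thread; that swap is an assumption made here for simplicity. The function name and the column labels are hypothetical.

```python
import difflib


def side_by_side_html(old_verses, new_verses):
    """Build a standalone HTML page with old and new verses in two columns.

    Both arguments are lists of strings, one verse per entry, in the same order.
    Changed, added and removed text is highlighted by difflib's built-in CSS.
    """
    differ = difflib.HtmlDiff(wrapcolumn=60)
    return differ.make_file(
        old_verses,
        new_verses,
        fromdesc="1984 NWT",      # hypothetical column label
        todesc="Revised NWT",     # hypothetical column label
    )


if __name__ == "__main__":
    page = side_by_side_html(
        ["In [the] beginning God created the heavens and the earth."],
        ["In the beginning God created the heavens and the earth."],
    )
    with open("nwt_diff.html", "w", encoding="utf-8") as handle:
        handle.write(page)
```

    The resulting file opens in any browser, and could be printed to PDF from there if a PDF is still wanted.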

    "I suggest concentrating on the web version of the 1984 edition first"

    You did see the PDF download link for the 1984 version, on the right there, right?

  • MeanMrMustard
    MeanMrMustard

    Yes, I saw the PDF link to the 1984 version, but to be honest, the web version provides a much easier way of getting that text - no middle cross reference. If they would just put the NWTx2 on the web, I would get it from there and not the PDF...

    Apognophos, can you answer this: what type of changes would be interesting to you? What would constitute an interesting change? I have a feeling that question is a lot harder to answer than you think...

    MMM

  • Apognophos
    Apognophos

    "Yes, I saw the PDF link to the 1984 version, but to be honest, the web version provides a much easier way of getting that text - no middle cross reference. If they would just put the NWTx2 on the web, I would get it from there and not the PDF..."

    Ah, okay. Well, by all means, if you prefer to grab the text from the browseable versions of the Bibles, we can just grab the 1984 version for now and put the project on hold until the new NWT appears there. I don't see this project as super-urgent, and they'll probably have it up soon anyway.

    "Apognophos, can you answer this: what type of changes would be interesting to you? What would constitute an interesting change? I have a feeling that question is a lot harder to answer than you think..."

    "I'll know it when I see it." :) I think step 1 needs to be done before I can even begin to figure out how to group/categorize the changes into something people want to see. But the short answer is that an interesting change is what you have left over after you filter out all the boring changes. My theory is that the boring changes can be summed up in a file with some simple pattern matching like my earlier example.
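    The "filter out the boring changes" idea could be sketched roughly as follows. The single rule shown is built from the "had been going" / "went" example given earlier in the thread; BORING_RULES and is_boring are hypothetical names, and a real rule file would need many more patterns.

```python
import re

# Hypothetical rule list: each entry rewrites an old-style phrase into its
# streamlined form. Only the thread's own example is included here.
BORING_RULES = [
    (re.compile(r"\bhad been going\b"), "went"),
]


def is_boring(old, new):
    """Return True if the rules alone turn the old verse into the new one.

    If some difference remains after every rule has been applied, the change
    is not explained by the rules and is worth a human look.
    """
    rewritten = old
    for pattern, replacement in BORING_RULES:
        rewritten = pattern.sub(replacement, rewritten)
    return rewritten == new


if __name__ == "__main__":
    pairs = [
        ("He had been going home.", "He went home."),
        ("He had been going home.", "He ran home."),
    ]
    interesting = [pair for pair in pairs if not is_boring(*pair)]
    print(interesting)
```

    Whatever survives the filter is the candidate list of interesting changes, which can then be grouped by hand.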

  • Apognophos
    Apognophos

    Just to be clear, I was going to attempt this whole project myself, but I thought I should check first if anyone else had more expertise or better ideas than me. So whatever I can do to help with step 1 (if that's grabbing the 1984 browseable text, or anything else), I'll be glad to do. I just don't want you to feel like I'm kicking back waiting for you to do the work, MMM :-) I'll do whatever parts anyone else doesn't want to do.

  • DS211
    DS211

    All you need to do is use the interlinear on the JW app haha and most truth comes out. And you can compare the old reference bible to the Revised NWT

  • MeanMrMustard
    MeanMrMustard

    "All you need to do is use the interlinear on the JW app haha and most truth comes out. And you can compare the old reference bible to the Revised NWT"

    Yeah, but then that would deny us programmers the opportunity to do something colorful and useless... don't rob us of that.

    MMM

  • MeanMrMustard
    MeanMrMustard

    BTW - I should be able to dump the text from the website. They surround the Bible text in a <div> tag with the id "bibleText" ... I love it when they make it easy. I can now loop through all the books and chapters and extract the text (no HTML)... Just need to write it to a file, normalize, and insert into my DB. Then on to the PDF.
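    Pulling the text out of that <div> can be sketched with Python's standard html.parser, as below. Only the "bibleText" id comes from the post above; the class and function names are hypothetical, and fetching the pages and looping over books and chapters is left out because the URL layout is not given in the thread.

```python
from html.parser import HTMLParser


class BibleTextExtractor(HTMLParser):
    """Collect the text found inside <div id="bibleText">, ignoring all tags."""

    def __init__(self):
        super().__init__()
        self.depth = 0    # 0 = outside the target div; >0 = div nesting level
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if self.depth:
            if tag == "div":
                self.depth += 1          # nested div inside the target
        elif tag == "div" and dict(attrs).get("id") == "bibleText":
            self.depth = 1               # entered the target div

    def handle_endtag(self, tag):
        if self.depth and tag == "div":
            self.depth -= 1              # reaches 0 when the target div closes

    def handle_data(self, data):
        if self.depth:
            self.chunks.append(data)


def extract_bible_text(html):
    """Return the plain text of the bibleText div with whitespace collapsed."""
    parser = BibleTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())
```

    Tracking the nesting depth matters because any <div> nested inside the target would otherwise end the capture early.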

    One thing to note... The old version of the NWT puts brackets around the words it inserted into the text. For example: "In [the] beginning God created the heavens and the earth", where "the" doesn't appear in the original and they admit that via the brackets. Now, "the" is still there, but the brackets are gone.

    That might be an interesting change... all of the previously inserted words that are still there in NWTv2, but they hide the insert by excluding the brackets.
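    A rough sketch of how those quietly kept insertions could be found: flag every word that sits inside brackets in the old verse, align the old and new word sequences, and report flagged words that still line up with a word in the new text. The function name is hypothetical, and punctuation attached to brackets (for example "[the],") is not handled here.

```python
import difflib
import re

# A token is either a whole bracketed span or a run of non-space characters.
TOKEN = re.compile(r"\[[^\]]+\]|\S+")


def kept_insertions(old, new):
    """Return words bracketed in the old verse that survive, unbracketed, in the new."""
    old_words, was_bracketed = [], []
    for token in TOKEN.findall(old):
        if token.startswith("["):
            # A bracketed span may hold several words; flag each of them.
            for word in token[1:-1].split():
                old_words.append(word)
                was_bracketed.append(True)
        else:
            old_words.append(token)
            was_bracketed.append(False)

    new_words = new.split()
    matcher = difflib.SequenceMatcher(None, old_words, new_words, autojunk=False)

    kept = []
    for block in matcher.get_matching_blocks():
        for index in range(block.a, block.a + block.size):
            if was_bracketed[index]:
                kept.append(old_words[index])
    return kept


if __name__ == "__main__":
    print(kept_insertions(
        "In [the] beginning God created the heavens and the earth",
        "In the beginning God created the heavens and the earth",
    ))
```

    Aligning the sequences, rather than just searching for the word, avoids false hits on common words like "the" that appear elsewhere in the same verse.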

    MMM

  • DS211
    DS211

    Haha true! With everything we've been robbed of, I wouldn't conceive of it.
