Comprehensive NWT Comparison Project (calling all technically skilled members)

by Apognophos 223 Replies latest watchtower bible

  • trackregister99

    Hello. The web pdf version is the 2006 edition, not the 1984 edition.

    Col 1:16:

    2006 edition: "all other things"

    1984 edition: "all [other] things"

  • MeanMrMustard

    @trackregister99:

    The web version that I parsed through and recorded (from jw.org, not the PDF) for the "old text", the text before the current 2013 version, has "all [other] things".

    MMM

  • trackregister99

    Could you please share the dump so we can have both versions?

    WinMerge can be used to compare two plain-text files.
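    For anyone without WinMerge, a rough equivalent can be sketched with Python's standard difflib module. This is only an illustration: the helper name word_diff is made up here, and the two strings are the Col 1:16 wordings quoted earlier in the thread, not a real dump.

```python
import difflib


def word_diff(old: str, new: str) -> list[str]:
    """Return the word-level differences between two versions of a verse."""
    diff = difflib.ndiff(old.split(), new.split())
    # Keep only removed ("- ") and added ("+ ") words.
    # Unchanged words ("  ") and difflib's hint lines ("? ") are dropped.
    return [entry for entry in diff if entry.startswith(("- ", "+ "))]


old_verse = "all [other] things"  # 1984 wording quoted in this thread
new_verse = "all other things"    # 2006 wording quoted in this thread

for change in word_diff(old_verse, new_verse):
    print(change)
```

    With full dumps, the same function could be run verse by verse, assuming both texts can be keyed by book, chapter and verse.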

  • MeanMrMustard

    @trackregister99:

    I can try to do a dump from my database to a series of text files for just the old version. I do not have the new version. So far the WT has only released this in a PDF version. If you go back you can see there are some difficulties extracting the text from the PDF. So as of now, there is nothing to compare the old version with...

    MMM

  • Django_Unchained

    i'm very much a novice at programming, so if i ask something stupid...lol keep in mind it's coming from a newb :P

    looking at the first page, i see the / marks. would it be possible to take the text from the new version and separate it into csv columns, then delete the column you don't want?

    i would love to help, but this would be a learning experience for me and i'd probably slow you guys down

  • MeanMrMustard

    @Django_Unchained:

    No, it's not that simple. I can extract the text, and I can even get that nasty middle column out of there. If I convert to an HTML format first, the "Sect" <div> tags hold the column information. But there are several other problems after that.

    1) Sometimes words are joined together.

    2) The cross-reference symbols are ordinary letter characters, so they get mixed into the text and we end up with many words spelled incorrectly.

    MMM

  • MeanMrMustard

    Apognophos,

    So I see why it was so hard to extract. The text in the PDF is stored in tiny, tiny chunks: not lines, not even words, just a character here and there. So if you loop through the text chunks, you are confronted with a horrible task: where are the word breaks?

    I fussed with it a bit and was able to extract based on the position of each chunk. If the start of a chunk is pushed over from the previous one by more than the width of a space character, it is considered a new word. The same goes for a new line: if the vertical position changes, it's a new word. Take a look at the screen shot. I fixed the "acrossthe" issue because the chunks change on the vertical.

    BUT notice the other red arrows. The WTB&TS justify their text. So on one line the words are very spaced out, which is great for detecting word breaks. But on the following line the words are pushed together to such an extent that the whitespace is actually narrower than some printable characters. This causes everything to stick together. Horrible, horrible parsing problem.
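    The position-based splitting described above can be sketched roughly as follows. This is not the actual extraction code: the Chunk fields, the function name chunks_to_lines and the tolerance values are all assumptions for illustration, and it assumes the PDF library hands back chunks in reading order with their coordinates.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    text: str   # one or more characters from the PDF
    x0: float   # left edge of the chunk
    x1: float   # right edge of the chunk
    y: float    # vertical position (baseline)


def chunks_to_lines(chunks, gap_threshold, y_tolerance=0.5):
    """Group positioned chunks into lines of space-separated words."""
    lines, words, current = [], [], ""
    prev = None
    for chunk in chunks:
        if prev is not None and abs(chunk.y - prev.y) > y_tolerance:
            # Vertical change: close the current word and the current line.
            words.append(current)
            lines.append(" ".join(words))
            words, current = [], ""
        elif prev is not None and chunk.x0 - prev.x1 > gap_threshold:
            # Horizontal gap wider than the threshold: start a new word.
            words.append(current)
            current = ""
        current += chunk.text
        prev = chunk
    if current:
        words.append(current)
    if words:
        lines.append(" ".join(words))
    return lines


sample = [
    Chunk("acr", 0, 3, 10), Chunk("oss", 3, 6, 10),
    Chunk("the", 0, 3, 20), Chunk("ex", 5, 7, 20), Chunk("panse", 7, 12, 20),
]
print(chunks_to_lines(sample, gap_threshold=1.0))
```

    A single fixed gap_threshold is exactly what breaks on justified text: on tightly set lines the real gaps between words can fall below it, so the words stick together.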

    MMM

  • slimboyfat

    Maybe you could contact Watchtower headquarters and ask if they could send you the two versions in a format you could easily compare.

  • MeanMrMustard

    @slimboyfat:

    Ug... that seems like it would almost be worse than going to a Sunday meeting.

    I can see it now: "No, no... I just need the new one..... Yeah, I have the old one. I parsed it from your web page yesterday..... Oh sorry, I didn't mean to cause all the traffic trouble, I didn't think it would make it hard for elders to log in... ... SO you won't give it to me?"

    MMM

  • MeanMrMustard

    FYI:

    I was able to get around it by carefully choosing the split threshold: a gap counts as a word break if it is wider than the font's standard whitespace width divided by 2.6:

    20 Then God said: “Let the
    waters swarm with living crea-
    tures, and let flying crea-
    tures fly above the earth across
    the expanse of the heavens.”
    a
    21 And God created the great
    sea creatures and all living
    creatures that move and swarm
    in the waters according to their
    kinds and every winged flying
    creature according to its kind.
    And God saw that it was good.
    22 With that God blessed them,
    saying: “Be fruitful and become
    many and fill the waters of the
    sea,
    b
    and let the flying crea-
    tures become many in the earth.”
    23 And there was evening and
    there was morning, a fifth day.
    24 Then God said: “Let the
    earth bring forth living crea-
    tures according to their kinds,
    domestic animals and creeping
    animals and wild animals of the
    earth according to their kinds.”
    c
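    Output like the above still has line-end hyphenation and cross-reference letters sitting on their own lines. A possible cleanup pass is sketched below. It rests on two assumptions that will not always hold: every line consisting of a single lowercase letter is a cross-reference marker, and every hyphen at the end of a line is a soft break rather than a real hyphen. The function name clean_extracted is made up for this example.

```python
import re


def clean_extracted(lines):
    """Join extracted lines into running text.

    Assumptions (may be wrong for some verses):
    - a line that is a single lowercase letter is a cross-reference marker
    - a trailing hyphen marks a word split across lines
    """
    text = ""
    for line in lines:
        line = line.strip()
        if not line or re.fullmatch(r"[a-z]", line):
            continue  # skip blank lines and assumed cross-reference markers
        if text.endswith("-"):
            text = text[:-1] + line  # rejoin a word split across lines
        elif text:
            text += " " + line
        else:
            text = line
    return text


sample = ["sea,", "b", "and let the flying crea-", "tures become many in the earth."]
print(clean_extracted(sample))
```

    A genuine one-letter line or a genuinely hyphenated word at a line end would be mangled by this, so the result would still need a spot check against the PDF.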

    MMM
