Corpus Linguistics - Text Analysis of All Watchtower Publications Since the 1800s

by objectivetruth, 48 replies, in watchtower beliefs

  • LogCon

    m

  • EndofMysteries

    Nice project. marked

  • Julia Orwell

    Cool, you should make your database thing into a website. Also, I noticed that in those old magazines you posted they don't use the name Jehovah. They are always saying "the Lord". Guess they got new light on what God wants to be called under Rutherford, eh?

  • objectivetruth

    Update for everyone:

    I have found a MUCH BETTER method to convert old Watchtower documents that exist only as PDF images into rich text.

    This will make the process much faster and more effective.

    I have also partnered with someone else on the site who has a domain and is a web developer.

  • Quendi

    Interesting research, and I will be very happy to see what fruitage it will bear. Jwfacts raises an interesting question about when Jehovah gained ascendancy over Jesus in Witness theology. I wouldn't venture to be precise, but my guess would be that this trend became pronounced with the publication of the 1 January 1926 Watchtower. That issue had as its lead article "Who Will Honor Jehovah?" and I remember other WTS publications citing this as a very important development in the changes that came with Rutherford's presidency. The climax came with the 1931 Columbus, Ohio convention, when the new name "Jehovah's Witnesses" was unveiled.

    Quendi

  • smiddy

    I look forward to seeing this project completed.

    smiddy

  • likeabird

    Hi objective truth!

    Great project!

    You probably already know this, but I'll ask anyway ;-) Have you seen the text-analysis work that is already available online? I came across this site a few months ago while researching text analysis: http://www.wordandphrase.info/analyzeText.asp

    The guy behind it also specialises in corpus linguistics.

  • Simon

    There are a few good tools / apps for dealing with text.

    The Natural Language Toolkit (Python) is good, and there are equivalents for other platforms. Also worth looking at is Pattern, which has document analysis and clustering tools.

    Elasticsearch can also provide good insights into large amounts of text (it's a nice application built on top of Apache Lucene).

    Usually, dropping stopwords (very common words such as "the", "of" and "and") removes a lot of noise from any corpus and lets you focus on the meaningful words and phrases (n-grams). These can then be mapped as usage over time: when each one first appeared, and whether it is growing or declining in use.
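    A minimal sketch of that stopword-then-n-gram idea using only the Python standard library (NLTK would do the same with more polish). It assumes each issue is available as a (year, text) pair; that input format, the tiny stopword list and the function names are illustrative assumptions, not anything from an existing tool:

```python
import re
from collections import Counter, defaultdict

# Illustrative stopword list only (an assumption for this sketch); a fuller
# list such as NLTK's English stopwords would normally be used instead.
STOPWORDS = {"a", "an", "and", "in", "is", "it", "of", "that", "the", "to",
             "was", "who", "will"}


def tokenize(text):
    """Lowercase the text and split it into word tokens (keeps "jehovah's")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def ngrams(tokens, n):
    """Return every run of n consecutive tokens as a tuple."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def ngram_usage_by_year(docs, n=1):
    """Count n-grams per year after dropping stopwords.

    docs: iterable of (year, text) pairs, a hypothetical input format.
    Returns {year: Counter({ngram_tuple: count})}.
    """
    usage = defaultdict(Counter)
    for year, text in docs:
        tokens = [t for t in tokenize(text) if t not in STOPWORDS]
        usage[year].update(ngrams(tokens, n))
    return usage
```

    For example, `ngram_usage_by_year(docs, n=2)[1930].most_common(10)` would list the ten most frequent bigrams for 1930. This sketch drops stopwords before forming n-grams, so a bigram can bridge a removed word; if exact phrases matter, form the n-grams first and then discard those made up only of stopwords.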

    Really interesting project!

  • objectivetruth

    Thanks for your advice, likeabird and Simon.

    likeabird - I had not seen this tool, but it's very encouraging. A tool like this is MUCH more useful if it is readily accessible to anyone who wants to look something up or research a trend. I'll see whether it could be a solution for the future online version of this project.

    Simon - I will look at each of the tools you recommended. Thank you very much.

    Currently I'm hung up on the OCR step.

    I first used Adobe Acrobat Pro, but it didn't produce reliable results (too many erroneous characters, misspelled words, and lost text).

    I then moved on to Tesseract OCR (originally developed at HP, later maintained by Google). This gave better results, but still not optimal ones.
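    Tesseract also runs from the command line (`tesseract <image> <outputbase> -l <lang>`, which writes `<outputbase>.txt`), so a batch run is easy to script. A minimal sketch, assuming the `tesseract` executable is on the PATH and the scanned pages have already been exported as PNG images into one folder; the folder layout and the names `tesseract_command` and `batch_ocr` are hypothetical:

```python
import subprocess
from pathlib import Path


def tesseract_command(image, out_dir, lang="eng"):
    """Build one Tesseract CLI call: tesseract <image> <outputbase> -l <lang>.

    Tesseract appends ".txt" to <outputbase> itself, so the base name is the
    image's file name without its extension.
    """
    image = Path(image)
    out_base = Path(out_dir) / image.stem
    return ["tesseract", str(image), str(out_base), "-l", lang]


def batch_ocr(image_dir, out_dir, pattern="*.png", lang="eng", dry_run=False):
    """OCR every page image in image_dir that matches pattern.

    Returns the list of commands; they are only executed when dry_run is False.
    """
    commands = [tesseract_command(p, out_dir, lang)
                for p in sorted(Path(image_dir).glob(pattern))]
    if not dry_run:
        Path(out_dir).mkdir(parents=True, exist_ok=True)
        for cmd in commands:
            # check=True stops the batch on the first page Tesseract rejects.
            subprocess.run(cmd, check=True)
    return commands
```

    Running with `dry_run=True` first is a cheap way to confirm which pages will be processed before committing to a long batch.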

    I'm now experimenting with a tool called ABBYY. It provides the quickest and most accurate text recognition so far, but it's still not perfect.

    If anyone has tips or experience with batch OCR and batch spell-checking, please let me know.
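    One simple approach to batch spell-checking is to snap each unknown word to its closest entry in a trusted word list. A minimal sketch using Python's standard `difflib.get_close_matches`; the vocabulary, the 0.8 similarity cutoff and the function names are assumptions for illustration, not a tested OCR pipeline:

```python
import difflib
import re


def correct_tokens(text, vocabulary, cutoff=0.8):
    """Replace each out-of-vocabulary word with its closest vocabulary entry.

    vocabulary: set of known-good lowercase words (hypothetical; e.g. a
    dictionary word list plus Bible names and Watchtower terms).
    Words with no match scoring at least `cutoff` are left unchanged.
    """
    vocab_list = sorted(vocabulary)

    def fix(match):
        word = match.group(0)
        if word.lower() in vocabulary:
            return word
        best = difflib.get_close_matches(word.lower(), vocab_list, n=1, cutoff=cutoff)
        if not best:
            return word
        # Keep an initial capital from the OCR output (e.g. at sentence start).
        return best[0].capitalize() if word[0].isupper() else best[0]

    return re.sub(r"[A-Za-z]+", fix, text)


def batch_correct(pages, vocabulary, cutoff=0.8):
    """Spell-correct a list of OCR'd page strings."""
    return [correct_tokens(page, vocabulary, cutoff) for page in pages]
```

    Automatic correction can also "fix" legitimate rare words and names, so the vocabulary should include Bible names and Watchtower terminology, and a higher `cutoff` is the safer choice. All-caps words come back with only an initial capital.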

    Once I have a satisfactory database of Watchtower texts, I will export all words, along with each issue's publication date, to Excel. In Excel I will run COUNTIF (or similar) formulas on words such as "Jehovah" and "Jesus". The result will be a line chart of each word's usage by year, showing increases and decreases over time.
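    The same COUNTIF-style tally can also be produced before the data reaches Excel. A minimal Python sketch, assuming each issue is available as a (year, text) pair (a hypothetical input format; the function names are illustrative). It counts whole-word matches per year and adds a rate per 10,000 words, because raw counts will rise in any year in which more pages were simply published:

```python
import csv
import io
import re
from collections import defaultdict


def yearly_counts(docs, words):
    """Count whole-word occurrences of each target word per year.

    docs: iterable of (year, text) pairs (hypothetical input format).
    Returns {year: {"_total": token_count, word: count, ...}}.
    """
    targets = [w.lower() for w in words]
    counts = defaultdict(lambda: defaultdict(int))
    for year, text in docs:
        tokens = re.findall(r"[a-z]+", text.lower())
        counts[year]["_total"] += len(tokens)
        for tok in tokens:
            if tok in targets:
                counts[year][tok] += 1
    return counts


def to_csv(counts, words):
    """Render per-year raw counts and rates per 10,000 words as CSV text."""
    targets = [w.lower() for w in words]
    buf = io.StringIO()
    writer = csv.writer(buf, lineterminator="\n")
    writer.writerow(["year"] + targets + [f"{w}_per_10k" for w in targets])
    for year in sorted(counts):
        row = counts[year]
        total = row["_total"] or 1  # avoid dividing by zero for empty years
        raw = [row[w] for w in targets]
        rates = [round(10000 * c / total, 2) for c in raw]
        writer.writerow([year] + raw + rates)
    return buf.getvalue()
```

    Saving the returned text as a .csv file and opening it in Excel gives one row per year, ready for a line chart; charting the `_per_10k` columns rather than the raw counts keeps busy publishing years from looking like spikes in usage.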

  • frankiespeakin

    This should be interesting to chart.
