Goal: To digitize all WT statistical data

by ILoveTTATT2 16 Replies latest jw friends

  • ILoveTTATT2
    ILoveTTATT2

    I have the goal of digitizing all WT Statistical Data, as much as possible, as far back as possible.

    User 88JM has already done 1981-2016:

    https://drive.google.com/open?id=0B_bzL2A4XRiodmp0OWpWUmJjaFU//

    I recently did 1980 and I had a friend help me with 1979 and 1978.

    Do we have any volunteers to help me with previous years??

    It's a lot of work, but if you know Excel you can write some macros that will speed it up.

  • ILoveTTATT2
    ILoveTTATT2

    Anyone?

    Could you please PM me if you'd like to help, even with just one year (e.g. 1977, 1976, etc.)?

    I would really appreciate the help!

  • konceptual99
    konceptual99

    What's the process? How long does it take? Do you have to type in individual figures or proof read tables that have been OCRed?

  • ILoveTTATT2
    ILoveTTATT2

    Hi Konceptual99,

    It really depends on your skills in Excel. If all goes well, you just have to copy something that has been OCRed, paste it in Excel with the text paster that does the brunt of the work, and do clean up. About 80% of the data gets pulled in correctly.

    It takes about an hour to clean up. I use macros that speed up the process (such as recording a macro to shift cells left, insert a row of a specific length, etc.).
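
    For anyone working outside Excel, the two clean-up operations mentioned above (shifting cells left and inserting a row of a fixed length) can be sketched in Python. This is only an illustration under assumed conventions: the function names are hypothetical, and each row is assumed to be a plain list of strings.

```python
def shift_left(row, start, width):
    """Delete the cell at index `start`, moving later cells one place left,
    then pad with empty strings so the row still has `width` cells."""
    fixed = row[:start] + row[start + 1:]
    fixed += [""] * (width - len(fixed))
    return fixed[:width]


def insert_blank_row(table, index, width):
    """Return a copy of `table` with an empty row of `width` cells at `index`."""
    return table[:index] + [[""] * width] + table[index:]


# A stray empty cell pushed the Fiji figures one column to the right.
print(shift_left(["Fiji", "", "8", "12"], 1, 4))
```

    Both helpers return new lists rather than editing in place, so the OCR output stays available for comparison.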

    As long as it is exactly what's on the report but in Excel, I can work with that to make it in the same format as 88JM.

  • darkspilver
    darkspilver

    Hey ILoveTTATT2

    ok, hope you don't mind me jumping in - I had a test run - so konceptual99 this is what I did....

    FIRST - Get the best original scan image that you can (that is also straight on the page)

    I used one that was 4623 x 3600 pixels (so that's a 16Mb image saved as a 2.9Mb jpeg) - http://imgur.com/a/hMSLM

    I edited to remove the horizontal lines (they confuse the OCR)

    SECOND - Convert to Excel format

    I used my favourite free online OCR website (yours might be different)

    http://www.onlineocr.net/

    Get it converted to Excel, and download in Excel

    TIP: you'll find that the OCR converts a few of the commas in the numbers to full-stops - so do a global search-and-replace

    Place each part of the converted chart onto one Excel sheet, with the columns lined up
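
    The comma/full-stop tip can also be handled in code rather than by search-and-replace. A minimal Python sketch, assuming every figure in the chart is a whole number (so both ',' and '.' can only be thousands separators); `clean_number` is a hypothetical helper name:

```python
import re


def clean_number(cell):
    """Turn an OCR'd figure such as '7.362,431' into the int 7362431.

    Returns None for an empty cell and raises ValueError for anything
    that is not purely digits once the separators are removed.
    """
    text = cell.strip()
    if not text:
        return None
    digits = re.sub(r"[.,\s]", "", text)
    if not re.fullmatch(r"[0-9]+", digits):
        raise ValueError(f"not a number: {cell!r}")
    return int(digits)


print(clean_number("7.362,431"))
```

    Raising on non-numeric cells is deliberate: entries such as "New" in the percentage column, or OCR misreads like a letter O in place of a zero, get flagged instead of silently dropped.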

    THIRD: Use the Excel column autosum feature to add up each column and compare with the given totals

    I think 8 out of 13 columns in my example added up as the given totals - so I assume the individual numbers are all good in those columns

    About 4 columns didn't add up correctly - but they had obvious errors when a cell had no number in it or something - easy to spot and correct - once done they added up as the given total

    One column did NOT total up but was out by exactly '1,000' - a bit weird - I took a few minutes to double check the individual numbers and all seemed OK, so actually I think it is a WT mistake....
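
    The same column check can be sketched in Python. The function name is hypothetical, the three rows are the 1950 Av Pubs and Peak Pubs figures for Alaska, Bahamas and Fiji from the chart, and the "given" totals are invented for this three-row subset purely to show what a mismatch looks like; they are not WT figures.

```python
def column_differences(rows, given_totals):
    """Return (computed sum - given total) for each column.

    Empty cells (None) count as zero. A difference of 0 means the
    column agrees with the printed total.
    """
    sums = [0] * len(given_totals)
    for row in rows:
        for i, value in enumerate(row):
            sums[i] += value or 0
    return [s - t for s, t in zip(sums, given_totals)]


rows = [[52, 72], [74, 90], [12, 19]]   # Alaska, Bahamas, Fiji
illustrative_totals = [138, 1181]       # made up for this example only
diffs = column_differences(rows, illustrative_totals)
print(diffs)
```

    A round difference such as 1,000 usually points to a single mistyped digit, either in one cell or in the printed total itself, whereas an irregular difference suggests a missing or misread cell.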

    End result is something like this: http://imgur.com/a/9C6mH

    Think it took about 30 minutes-all-in (I was interrupted by a phone call) but that is just converting hard-copy to electronic format which is kinda the easy bit because...

    What does take the time is the country-matching from year-to-year....

  • ILoveTTATT2
    ILoveTTATT2

    Thank you Darkspilver!

    However, that is per page... and there are about three pages per year so... 1.5 hours per year.

    The country matching from year to year... I have done that actually!

    You mean the names???

  • darkspilver
    darkspilver

    Hey ILoveTTATT2

    No, it was the whole year I did - hence being able to use Excel's column Autosum feature to match up the totals

    I copied the Excel sheet into my text editor, and did a global find-and-replace to replace all the 'tabs' with an 'equals' symbol (=) - hopefully you can copy the data below and do the reverse before pasting it back into Excel.....


    EXCEPT, the forum software keeps stripping the data out of my post



    START

    Country=1949 Av Pubs=1950 Av Pubs=% Inc over 1949=Peak Pubs 1950=Av Pio Pubs=No Public Meet'gs=No of Comp's=Total Literature=Total Hours=New Subs=Individual Magazines=Back-calls=Av Bible Studies
    U. S. of America=82,958=98,468=19=108,144=5,273=71,964=2,941=7,362,431=15,796,063=322,805=9,028,995=5,631,903=76,377
    Alaska=32=52=63=72=5=40=3=5,607=11,213=473=7,282=4,570=59
    Azores==5=New=8===1=234=511=4=16=155=2
    Bahamas=52=74=42=90=9=47=1=5,335=16,934=240=12,914=7,779=127
    Bermuda=8=3==5===1=27=181=3==159=2
    Fr. Equ. Africa=15=21=40=29==238=3=787=4,180=57=37=3,633=117
    Gambia, B.W.A.=3=2==2=2=3==1,083=2,405=80=447=1,426=10
    Guadeloupe=35=43=23=54=2=72=1=1,240=8,590=51=1,857=2,528=32
    Iceland=4=6=50=9=3=7=1=8,901=4,197=64=2,921=1,495=10
    Korea=13=35=169=61=8=2=1=14,690=15,911=16=484=5,979=119
    Liberia=18=30=67=36=8=9=2=7,411=15,147=428=5,107=8,830=151
    Martinique==6=New=7=4==1=1,630=5,655=12=2,063=2,255=32
    Palestine=12=18=50=25=3=16=5=2,270=7,105=41=619=1,418=17
    Portugal=30=48=60=58=1=16=1=2,895=5,976=140=358=2,820=19
    Sierra Leone=22=43=96=58=3=13=1=4,786=10,026=133=2,427=4,328=84
    Spain=53=79=49=93=1==6=2,615=9,300=73=628=4,326=41
    Argentina=1,135=1,292=14=1,416=74=979=58=60,870=247,150=3,495=153,320=112,693=973
    Australia=3,774=4,502=19=5,163=249=5,288=226=211,326=673,008=8,852=413,044=254,100=3,015
    Fiji=8=12=50=19=1=15=1=1,354=3,016=60=1,876=1,850=16
    North Borneo==2=New=2====78=77=17=2=34=1
    Republic of Indonesia=13=20=54=25=3==1=17,052=6,805=69=286=2,872=27
    Singapore=20=48=140=69=7=27=1=9,202=14,767=704=4,422=8,607=174
    Austria=1,615=2,162=34=2,377=70=1,939=143=206,464=378,829=3,347=333,745=176,038=1,406
    Belgium=1,617=2,150=33=2,462=87=1,069=69=135,661=341,551=2,737=70,201=111,728=1,446
    Luxembourg=53=79=49=88=4=53=5=3,472=17,173=113=9,364=6,766=69
    Bolivia=37=48=30=59=15=19=3=7,257=22,496=340=5,535=9,110=152
    Brazil=1,775=2,858=61=3,873=218=1,734=99=403,456=534,219=6,625=88,122=159,624=1,924
    British Guiana=187=206=10=244=29=217=15=18,332=53,966=515=22,910=19,312=315
    British Honduras=55=65=18=82=6=51=6=1,767=12,259=91=5,993=5,062=106
    British Isles=17,239=20,842=21=22,678=1,126=26,107=624=1,819,284=3,314,965=73,683=510,246=1,438,414=11,757
    Eire=73=94=29=108=35=157=5=13,168=61,465=565=5,272=19,811=140
    Malta=1=1==1====28=40==53=35=3
    British W. Ind.=1,057=1,520=44=1,701=110=2,053=53=56,094=310,321=2,590=88,369=116,618=1,996
    Burma=57=70=23=87=8=75=2=23,539=18,123=513=4,741=8,028=110
    Canada=14,305=16,013=12=18,709=770=9,691=627=470,814=2,079,891=34,456=1,410,539=596,504=7,833
    Chile=211=361=71=547=52=137=14=46,973=97,662=1,194=27,277=40,827=588
    China=86=46==132=10=56=4=7,173=15,297=256=1,554=6,454=115
    Colombia=97=144=48=162=16=36=5=26,561=38,577=280=8,175=16,110=205
    Costa Rica=943=1,139=21=1,345=44=447=32=18,506=161,049=852=25,457=50,913=1,061
    Cuba=5,485=6,619=21=7,505=349=3,216=178=109,838=899,099=3,761=127,278=249,496=4,184
    Cyprus=141=204=45=242=10=143=8=4,751=31,291=144=3,540=8,492=120
    Czechoslovakia=1,290=2,403=86=2,882=2=36=271=14,360=220,792==1,089=91,456=1,599
    Denmark=3,774=4,552=21=4,936=137=2,786=174=182,957=573,832=8,440=376,105=205,669=1,966
    Dominican Republic=216=245=13=292=42=189=8=14,309=73,451=491=15,831=31,474=521
    Ecuador=56=100=79=160=17=20=2=13,006=29,891=164=9,243=13,867=190
    Egypt=134=184=37=221=9=190=8=6,134=34,590=530=9,182=11,161=96
    Anglo-Egyptian Sudan==1=New=1=====155===41=
    Libya==3=New=5=1=1=1=174=409=8=91=274=5
    El Salvador=171=207=21=250=16=76=4=8,136=32,596=305=7,466=14,221=206
    Finland=3,293=3,985=21=4,354=207=6,209=391=204,999=607,013=9,124=173,163=185,312=2,319
    France=3,236=4,526=40=5,441=142=2,562=150=332,833=641,451=10,146=158,861=244,584=2,295
    Saar=326=441=35=549=8=267=12=37,236=77,755=1,166=17,720=42,836=259
    Germany=38,897=47,853=23=52,473=1,765=17,413=1,652=1,472,495=9,154,166=26,877=1,752,285=4,605,783=39,084
    Gold Coast=1,412=2,120=50=2,856=73=2,211=77=51,921=443,735=981=37,004=64,381=1,350
    Ivory Coast==2=New=2=1===23=404==22=145=7
    Greece=2,299=2,676=16=3,441=21=1,139=222=55,332=230,943=2,818=48,802=83,560=659
    Turkey=30=47=57=60=4=1=2=3,948=8,738=57=1,439=2,805=27
    Guatemala=188=210=12=286=20=155=6=17,804=38,820=774=8,511=18,578=327
    Haiti=58=86=48=99=12=94=5=10,451=26,174=199=4,844=10,283=180
    Hawaii=216=290=34=332=30=155=9=37,460=64,452=2,192=26,033=29,030=539
    Honduras=256=208==260=14=120=7=6,923=31,805=326=5,224=12,838=212
    Hungary=1,410=1,910=35=2,307=35==251=19,797=307,643===115,504=1,990
    India=293=376=28=401=26=438=30=31,117=87,446=1,283=25,353=26,618=388
    Ceylon=28=29=4=35=6=45=1=6,141=12,705=355=7,965=5,194=55
    Iran=2=1==1====114=193=7=93=61=1
    Pakistan=27=33=22=37=3=100=1=8,104=10,005=483=4,245=3,876=37
    Italy=593=1,005=69=1,211=47=379=87=90,283=157,107=1,116=12,581=60,309=665
    Jamaica=1,773=2,120=20=2,380=85=1,256=131=36,251=364,206=1,357=61,825=114,235=2,264
    Japan=9=106=1,078=169=20=380=5=36,345=50,148=51=2,626=16,889=401
    Latvia==6=New=6====4=227==22=128=2
    Lebanon=123=211=103=271=9=100=7=13,484=35,768=322=4,238=8,323=85
    Syria==36=89=46=2=2=3=526=5,276=15=444=851=12
    Mexico=5,547=6,669=20=8,052=245=431=344=220,660=983,218=7,727=164,664=231,538=4,541
    Netherlands=4,691=5,365=14=5,716=200=2,639=134=78,822=822,195=3,285=141,766=237,956=2,532
    Netherlands W. Ind.=73=102=40=121=6=121=2=16,614=18,465=933=18,273=9,066=112
    Newfoundland=110=151=37=222=15=180=15=11,704=33,992=696=16,910=10,910=137
    New Zealand=880=1,038=18=1,213=54=880=60=79,019=159,630=3,834=119,738=57,774=769
    Nicaragua=136=147=8=190=14=162=4=6,461=27,245=264=6,548=9,875=117
    Nigeria=6,711=7,549=12=8,370=282=9,356=337=84,320=1,441,451=1,983=103,378=211,747=2,685
    Cameroun=114=149=31=190=4=143=14=2,492=35,887=101=1,744=6,355=110
    Dahomey=155=170=10=290=6=107=5=598=50,488=15=466=6,745=126
    Fernando Po==3=New=7=1==1=113=1,053===153=
    French Togoland=1=1=New=1=1====100===13=4
    Northern Rhodesia=12,857=13,560=6=15,837=17=673=265=77,808=2,627,315=1,021=8,590=389,721=6,315
    Belgian Congo=22=36=64=71===1=236=3,236===863=19
    Kenya==2=New=3====336=290=12=212=183=2
    Tanganyika=89=75==113=1=19=11=935=21,730=6=60=5,248=114
    Uganda==2=New=2====65=69=5=1=21=
    Norway=1,226=1,465=19=1,647=45=998=106=158,172=181,610=3,816=129,127=70,613=555
    Nyasaland=6,833=8,310=22=10,336=107=16,065=610=47,195=1,984,843=674=8,030=429,347=6,841
    Portuguese E. Afr.=318=273==352==276=21=728=53,702=19=132=13,634=193
    Panama=375=461=23=496=41=403=14=24,769=97,678=1,114=35,618=46,155=811
    Paraguay=67=105=55=133=9=24=11=5,036=20,822=288=5,829=6,696=57
    Peru=67=114=70=147=35=42=5=26,300=53,811=375=8,307=22,715=370
    Philippine Republic=5,763=8,648=50=10,055=252=2,850=345=264,928=1,155,139=5,267=39,659=184,375=2,582
    Poland=12,162=14,900=23=18,116=236=9,070=864=106,682=1,523,124=14,622=139,053=350,049=6,874
    Puerto Rico=213=306=44=359=43=204=10=39,639=79,713=1,851=31,192=35,215=601
    Virgin Islands=47=55=17=65=4=61=2=5,839=11,244=564=4,316=5,319=95
    Romania=2,612=2,832=8=4,361=15==389=938=237,274==316=121,476=6,855
    South Africa=5,506=7,074=28=7,658=456=5,223=330=350,604=1,848,838=13,821=281,268=453,341=6,182
    Angola==9=New=14==6=1=20=1,681===464=8
    Basutoland=8=14=75=30=5=3=3=233=9,263=1=43=2,114=32
    Bechuanaland=29=67=131=112=4=37=4=133=12,589=25=66=2,045=50
    St. Helena=10=10==12==12=1=92=600==27=107=2
    South-West Africa==7=New=15=3=6=3=4,896=3,756=244=1,527=1,724=22
    Swaziland=33=60=82=104=2=49=5=102=21,094=6=4=4,689=79
    Southern Rhodesia=4,786=5,773=21=7,060=296=5,193=161=97,437=1,499,070=1,443=44,470=276,113=6,227
    Surinam=74=55==67=10=82=2=2,821=16,360=45=4,240=7,679=156
    French Guiana=1=1==1====11=94===31=1
    Sweden=3,702=4,244=14=4,460=178=3,894=372=197,624=603,128=12,786=527,762=237,884=2,167
    Switzerland=1,933=2,247=16=2,394=58=1,959=96=140,600=305,485=4,281=335,061=136,455=1,675
    Thailand=62=71=15=89=14=60=6=31,007=21,564=363=3,908=6,803=68
    Uruguay=304=404=33=468=35=120=12=21,300=89,872=784=19,394=38,842=597
    Venezuela=91=224=146=353=30=333=7=35,560=64,730=357=15,129=24,770=271
    Yugoslavia=460=422==517====3,940=15,611===2,761=322
    Miscellaneous=8,004============
    TOTALS=279,421=328,572==373,430=14,093=223,941=13,238=15,954,418=54,707,445=622,094=17,376,611=18,782
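
    To reverse the tab-to-equals substitution described above without a text editor, a short Python sketch follows. The function names are hypothetical, and the sample is just the header (first three columns) plus two rows copied from the data.

```python
import csv
import io


def equals_to_rows(text):
    """Split '='-separated lines into lists of cells, skipping blank lines."""
    return [line.strip().split("=") for line in text.splitlines() if line.strip()]


def rows_to_tsv(rows):
    """Render rows as tab-separated text, ready to paste into a spreadsheet."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter="\t", lineterminator="\n").writerows(rows)
    return buffer.getvalue()


sample = """Country=1949 Av Pubs=1950 Av Pubs
Alaska=32=52
Azores==5"""

rows = equals_to_rows(sample)
tsv = rows_to_tsv(rows)
print(tsv)
```

    Comparing each row's length with the header's is a cheap integrity check. The TOTALS line as posted splits into 13 cells against the header's 14, which suggests the end of that line was lost in posting.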



  • darkspilver
    darkspilver

    Think I've cracked the pasting into the post - can you see the data in my above post ok now??

    I also used the chart from the WT and not the yearbook - it was just a test - I presume more recent years will be progressively harder as more and more countries are added, so they will take longer

    And, yes, the names! - it's an interesting geography lesson in changing names of countries etc - I also struggle with how the WT has actually listed the countries, along with the indented countries..... and of course they listed USA at the top....
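
    One way to handle the changing names when matching countries from year to year is a lookup table from the name printed in an old report to a single canonical name. The Python sketch below is an assumption about how this could be done, not the method used in the existing spreadsheets; the table covers only a few of the 1950 names and would need extending.

```python
# Historical name as printed in the 1950 chart -> present-day name.
ALIASES = {
    "Gold Coast": "Ghana",
    "Ceylon": "Sri Lanka",
    "British Honduras": "Belize",
    "Northern Rhodesia": "Zambia",
    "Nyasaland": "Malawi",
    "Basutoland": "Lesotho",
    "Bechuanaland": "Botswana",
    "British Guiana": "Guyana",
    "Dahomey": "Benin",
}


def canonical_name(name):
    """Map a historical name to its canonical form; unknown names pass through."""
    cleaned = name.strip()
    return ALIASES.get(cleaned, cleaned)


print(canonical_name("Gold Coast"))
```

    Names that pass through unchanged are worth reviewing by hand, since a pure rename table cannot handle territories that later split or merged.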

  • ILoveTTATT2
    ILoveTTATT2

    Hi Darkspilver,

    Could you upload the excel spreadsheet onto a Google Drive and share the link?

    I have the list of countries as listed in the yearbook, alphabetically, and also the years in which they appeared.

    (I am going to upload it soon, just needs some tweaking).

  • OrphanCrow
    OrphanCrow
    I have the list of countries as listed in the yearbook...

    Haha! Yeah..."lands", according to the org

    A person has to be careful when extrapolating figures from the yearbooks so that actual countries are represented. For example, for many years Newfoundland reported their own figures that were separate from the Canadian report. Likewise with Alaska, Hawaii etc

    I recently did some number crunching using the old yearbook numbers and it was really frustrating to deal with the way that the org breaks down the figures into their idea of what makes up a "land" as opposed to a country

    I remember Angus Stewart touching briefly on this issue during Jackson's testimony and I was puzzled about it at the time. After I worked with the yearbook figures, it occurred to me that Stewart may have been trying to highlight how the org doesn't respect secular boundaries
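
    The "land" versus country problem can be handled by rolling separately reported lands up into a parent country before comparing years. A minimal Python sketch, using the 1950 Av Pubs figures posted earlier in the thread; the mapping is an assumption for illustration, and whether a given land should be merged depends on the year and on the analysis (Newfoundland joined Canada in 1949, while Alaska and Hawaii became US states in 1959).

```python
from collections import defaultdict

# Assumed roll-up of separately reported "lands" into a parent country.
PARENT = {
    "Alaska": "U. S. of America",
    "Hawaii": "U. S. of America",
    "Newfoundland": "Canada",
}


def merge_lands(figures, parent=PARENT):
    """Sum each land's figure into its parent country (or itself if unmapped)."""
    totals = defaultdict(int)
    for land, value in figures.items():
        totals[parent.get(land, land)] += value
    return dict(totals)


av_pubs_1950 = {
    "U. S. of America": 98468,
    "Alaska": 52,
    "Hawaii": 290,
    "Canada": 16013,
    "Newfoundland": 151,
}
merged = merge_lands(av_pubs_1950)
print(merged)
```

    Keeping the mapping in a separate table rather than editing the data means the original per-land figures stay intact and the roll-up can be changed or switched off per year.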
