jw-media.org robots

by izobcenec, 9 replies (jw friends)

  • izobcenec

    The Standard for Robot Exclusion (SRE) is a means by which web site owners can instruct automated systems not to crawl their sites. Web site owners can specify files or directories that are allowed or disallowed from a crawl, and they can even create specific rules for different automated crawlers. All of this information is contained in a file called robots.txt. While robots.txt has been adopted as the universal standard for robot exclusion, compliance with robots.txt is strictly voluntary. In fact, most web sites do not have a robots.txt file, and many web crawlers are not programmed to obey the instructions.
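    The per-crawler rules described above can be illustrated with Python's standard-library urllib.robotparser module. This is only a sketch: the robots.txt text, the crawler names ExampleBot and OtherBot, and the example.com URLs are all hypothetical, and the parser only reports what a crawler that chooses to obey the file would do.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: one crawler ("ExampleBot") is shut out of the
# whole site, while every other crawler is only kept out of /private/.
RULES = """\
User-agent: ExampleBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(text: str) -> RobotFileParser:
    """Parse robots.txt text directly, without fetching anything."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser


parser = build_parser(RULES)

# Ask the parser what each (hypothetical) crawler may fetch.
for agent in ("ExampleBot", "OtherBot"):
    for url in ("https://example.com/index.html",
                "https://example.com/private/report.html"):
        print(agent, url, parser.can_fetch(agent, url))
```

    ExampleBot is refused both URLs, while OtherBot, which falls under the `*` group, is only kept out of /private/.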

    And here is the robots.txt file for the jw-media.org site:

    User-agent: *
    Disallow: /scripts/
    Disallow: /images/
    Disallow: /releases/
    Disallow: /video/
  • Winston Smith :>D

    So what are the implications of these instructions for something like Google?

    What is the effect?

  • RevMalk

    It just means that the webmaster of jw-media.org doesn't want Google (among other search engines) to spider and index these directories:

    jw-media.org/scripts/
    jw-media.org/images/
    jw-media.org/releases/
    jw-media.org/video/
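    To check that reading, the four rules quoted at the top of the thread can be fed to Python's urllib.robotparser. A sketch, assuming the file looks exactly as quoted (it parses a local copy instead of fetching the live file); the user-agent name Googlebot is just an example, and the page paths other than the four directories are made up.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted earlier in the thread, one directive per line.
JW_MEDIA_RULES = [
    "User-agent: *",
    "Disallow: /scripts/",
    "Disallow: /images/",
    "Disallow: /releases/",
    "Disallow: /video/",
]

parser = RobotFileParser()
parser.parse(JW_MEDIA_RULES)  # no network access; parses the list directly

# Hypothetical URLs: only paths under the four directories are blocked.
urls = [
    "http://www.jw-media.org/",
    "http://www.jw-media.org/releases/example.htm",
    "http://www.jw-media.org/images/logo.gif",
    "http://www.jw-media.org/about.htm",
]
for url in urls:
    verdict = "allowed" if parser.can_fetch("Googlebot", url) else "disallowed"
    print(f"{verdict:10} {url}")
```

    The URLs inside /releases/ and /images/ come back disallowed, while the site root and the made-up about.htm page stay allowed, which matches "just not those directories". As the quoted text says, this only describes what a crawler that honors robots.txt would do.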

  • Crit

    Very interesting. Does it allow indexing of anything?

  • RevMalk

    Sure, just not those directories. The robots.txt rules are meant to stop spidering (crawling), which is how search engines discover pages to index. URLs within these directories may still turn up in some search engines, for example if other sites link to them, but they will be few and far between. Most likely they don't want to waste bandwidth, since folders like these typically hold large multimedia files.

    I don't know what they have in scripts or releases, but image and video files can use up a fair amount of bandwidth if search engines hit them often enough, and Google, for one, comes around quite often. There's really no need to index images, and disallowing your image folder in robots.txt also cuts down on people swiping them, since they won't turn up in image searches from engines that obey the file.

  • Crit

    I am in the know on this subject. I will go grab the txt file myself, along with the header info, and see whether they are deliberately keeping search engine bots (the ones that respect the instructions) from spidering the entire site or just those directories. It will be interesting if they are not 'inviting' them either...
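    If "header info" refers to the robots meta tag in a page's HTML head (an assumption; it could equally mean the HTTP response headers), a check along these lines would show whether a page also asks bots not to index or follow it. A minimal sketch using Python's built-in html.parser; the sample page is hypothetical, and you would feed in the HTML you actually downloaded.

```python
from html.parser import HTMLParser


class RobotsMetaFinder(HTMLParser):
    """Collects directives from any <meta name="robots"> tags in a page.

    Assumption: we are checking the HTML meta tag, not HTTP headers.
    """

    def __init__(self):
        super().__init__()
        self.directives = []  # e.g. ["noindex", "follow"]

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # attrs is a list of (name, value) pairs; value may be None.
        attributes = {name.lower(): (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            # Split "NOINDEX, FOLLOW" into normalized, lowercase directives.
            self.directives.extend(
                part.strip().lower()
                for part in attributes.get("content", "").split(",")
                if part.strip()
            )


def robots_meta(html: str) -> list:
    """Return the robots meta directives found in an HTML document."""
    finder = RobotsMetaFinder()
    finder.feed(html)
    finder.close()
    return finder.directives


# Hypothetical page head, for illustration only.
SAMPLE = """<html><head>
<title>Example</title>
<meta name="ROBOTS" content="NOINDEX, FOLLOW">
</head><body></body></html>"""

print(robots_meta(SAMPLE))
```

    An empty list means the page sets no robots meta directives, so only robots.txt would apply to it.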

  • drwtsn32

    Seems like a reasonable robots.txt file to me... what's the big deal?

  • izobcenec

    There is no big deal... I was trying to view the old jw-media page
    on the Internet Archive (www.archive.org), and I got a message
    saying that robots.txt doesn't allow them to index the jw-media
    website.

    http://www.jw-media.org/robots.txt

    It was interesting to me that they disallowed releases...

  • ignored_one

    Does this mean you can't use www.archive.org to show a JW where they've changed those 'for public' Q&As, like the one about shunning former members?

    -

    Ignored One.

  • izobcenec
