On 09/14/2009 12:01 PM, Plato L. Smith II wrote:
>
> FSU has been using the etd-db platform for all FSU ETDs 2003-Present. At
> the ETD 2009 conference in Pittsburgh in June 2009, a colleague
> expressed concerns of Google crawling, accessing, and caching etds that
> were restricted (i.e. either campus-community only or embargoed) using
> the etd-db platform developed by Virginia Tech. FSU has experienced
>
By way of preface, I should say that I consider easy findability an essential
part of the move to ETD publication. So I'm pro-crawler. Also very pro-OAI
(hints and nudges to several of y'all).
If a restricted ETD's text is truly unavailable to non-FSU users--protected by
IP checking or password access--Google will have no way to download, store, and
index it. So that's a non-issue, unless I'm misunderstanding it.
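
For illustration, here is roughly what an IP check amounts to. This is only a sketch, not etd-db's actual code: the network range (a reserved documentation block) and the function name are made up, and a real deployment would enforce this in the web server configuration rather than in a script.

```python
import ipaddress

# Hypothetical campus range; 192.0.2.0/24 is a reserved documentation block.
CAMPUS_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def is_campus_address(client_ip):
    """Return True if client_ip falls inside one of the campus networks."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in net for net in CAMPUS_NETWORKS)


# A crawler connecting from outside the campus range fails the check,
# so the server would refuse to send it the restricted file at all.
print(is_campus_address("192.0.2.17"))
print(is_campus_address("203.0.113.5"))
```

If a check like this sits in front of the restricted directories, a crawler never receives the full text, whatever robots.txt says.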
> Has any institution encountered issues of Google crawls of restricted
> etds using the etd-db software platform? If so, could you please share
> troubleshoot and response assist in determining if this may be a bug
> with the software (i.e. restricted not preventing Google crawls and/or
> if “norobot.txt” has to be applied in addition to but not in lieu of
> etd-db campus-community/embargoed restrictions directories).
Since you (and your usage statistics) probably don't want Google downloading
the same 2004 thesis every week, I recommend you set up a sitemap.xml file,
which will give you more control over crawler behavior than just robots.txt by
itself. <http://www.sitemaps.org/protocol.php>
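
A minimal sketch of generating such a file, assuming you can pull a list of publicly available ETD URLs out of your system. The function name, the example.edu URLs, and the dates are placeholders I made up; the element names and namespace come from the sitemaps.org protocol. Note that changefreq is a hint to crawlers, not a command.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(entries):
    """Build sitemap XML from (loc, lastmod, changefreq) tuples."""
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod, changefreq in entries:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc
        ET.SubElement(url, "lastmod").text = lastmod        # W3C date, e.g. YYYY-MM-DD
        ET.SubElement(url, "changefreq").text = changefreq  # e.g. "yearly" or "never"
    return ET.tostring(urlset, encoding="unicode")


# Placeholder entries: list only the ETDs you want crawled.
entries = [
    ("http://etd.example.edu/available/etd-0001/", "2004-05-01", "never"),
    ("http://etd.example.edu/available/etd-0002/", "2009-08-15", "yearly"),
]
print(build_sitemap(entries))
```

Leaving restricted and embargoed items out of the list keeps the sitemap focused on what you actually want indexed; it does not replace the access restrictions themselves.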
Thomas Dowling