web crawler - What does the plus sign mean in robots.txt?
For a site, I want to do web crawling at the /telecommandes path. Its robots.txt is:
User-agent: *
Disallow: *telecommande++*
My questions are:
- What does the plus sign mean in this case?
- Is it appropriate to crawl a URL like /telecommandes-box-decodeur.html? Would that respect the robots.txt file?
Per the original robots.txt specification, + has no special meaning in Disallow values, and neither does *.
So crawling /telecommandes-box-decodeur.html is allowed.
Disallowed would be, for example, crawling /*telecommande++*.html (literally).
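A minimal sketch of this literal reading, in Python. The function name literal_rule_matches is hypothetical, and it rests on one assumption: a rule that lacks a leading "/" is treated as if it had one. The value in the question does not start with "/", so a strict prefix comparison against a URL path would never match it.

```python
def literal_rule_matches(rule: str, path: str) -> bool:
    """Prefix match in which "+" and "*" are ordinary characters."""
    # Assumption: tolerate a rule without a leading "/" by adding one.
    if not rule.startswith("/"):
        rule = "/" + rule
    return path.startswith(rule)


RULE = "*telecommande++*"

print(literal_rule_matches(RULE, "/telecommandes-box-decodeur.html"))  # False (allowed)
print(literal_rule_matches(RULE, "/*telecommande++*.html"))            # True (disallowed)
```

Only a path that literally contains the asterisks and plus signs at the start is blocked under this reading.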
If you want to be polite, take "proprietary" robots.txt extensions into account, e.g., those from Google and other search engines. Many authors might not realize that these aren't part of the official specification, and expect them to work for other crawlers too.
Per Google's robots.txt documentation, + has no special meaning, but * has one (it means: any sequence of characters).
So crawling /telecommandes-box-decodeur.html is still allowed.
Disallowed would be, for example, crawling /foo/telecommande++bar.html (and still /*telecommande++*.html).
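The wildcard reading can be sketched in Python roughly as follows. This is an illustration, not Google's actual matcher: the function name wildcard_rule_matches is hypothetical, and it assumes that * matches any sequence of characters, that a trailing $ anchors the end of the path, and that every other character (including +) is literal.

```python
import re


def wildcard_rule_matches(rule: str, path: str) -> bool:
    """Match a rule against a URL path, treating "*" as a wildcard."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape each literal piece so "+" stays an ordinary character,
    # then join the pieces with ".*" in place of each "*".
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += "$"
    # re.match anchors at the start of the path only.
    return re.match(pattern, path) is not None


RULE = "*telecommande++*"

print(wildcard_rule_matches(RULE, "/telecommandes-box-decodeur.html"))  # False (allowed)
print(wildcard_rule_matches(RULE, "/foo/telecommande++bar.html"))       # True (disallowed)
print(wildcard_rule_matches(RULE, "/*telecommande++*.html"))            # True (disallowed)
```

The first URL stays allowed because its path never contains the literal text "telecommande++".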