web crawler - What does the plus sign mean in robots.txt?
For a site, I want to do some web crawling at the /telecommandes path. Its robots.txt is:

    User-agent: *
    Disallow: *telecommande++*

My questions are:
- What does the plus sign mean in this case?
- Is it appropriate to crawl a URL like /telecommandes-box-decodeur.html? Would that respect the robots.txt file?
Per the original robots.txt specification, + has no special meaning in Disallow values, and neither has *.
So crawling of /telecommandes-box-decodeur.html is allowed.
Disallowed would be, for example, crawling of /*telecommande++*.html (literally).
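To make the literal reading concrete, here is a minimal sketch of a prefix match with no wildcard handling. The function name is_disallowed_literal is hypothetical, and the sketch assumes that a Disallow value without a leading slash is read as relative to the site root, which is how the examples above treat it.

```python
def is_disallowed_literal(path: str, rule: str) -> bool:
    """Plain prefix match: every character in the rule, including * and +, is literal."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    # Assumption: a rule without a leading slash is relative to the site root.
    if not rule.startswith("/"):
        rule = "/" + rule
    return path.startswith(rule)


RULE = "*telecommande++*"

print(is_disallowed_literal("/telecommandes-box-decodeur.html", RULE))  # False
print(is_disallowed_literal("/*telecommande++*.html", RULE))            # True
```

Under this reading only a path that really contains the characters * and + at those positions is blocked.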
If you want to be polite, you can take "proprietary" robots.txt extensions into account, e.g., those from Google and other search engines. Many authors might not realize that these aren't part of the official specification, and expect them to work for other crawlers too.
Per Google's robots.txt documentation, + has no special meaning, but * has one (it means: any sequence of characters).
So crawling of /telecommandes-box-decodeur.html is still allowed.
Disallowed would be, for example, crawling of /foo/telecommande++bar.html (and still /*telecommande++*.html).
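The wildcard reading can be checked in a similar way. This is a rough sketch, not Google's actual matcher: the function name is_disallowed_wildcard is hypothetical, and only the * wildcard described above is handled, by translating the rule into a regular expression in which + stays a literal character.

```python
import re


def is_disallowed_wildcard(path: str, rule: str) -> bool:
    """Match from the start of the path, treating * as 'any sequence of characters'."""
    if not rule:
        # An empty Disallow value blocks nothing.
        return False
    # Escape every character (so + is literal), then turn the escaped * back into ".*".
    pattern = re.escape(rule).replace(r"\*", ".*")
    return re.match(pattern, path) is not None


RULE = "*telecommande++*"

print(is_disallowed_wildcard("/telecommandes-box-decodeur.html", RULE))  # False
print(is_disallowed_wildcard("/foo/telecommande++bar.html", RULE))       # True
print(is_disallowed_wildcard("/*telecommande++*.html", RULE))            # True
```

The first path stays allowed because it never contains two literal plus signs after "telecommande".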