web crawler - What does the plus sign mean in robots.txt?


For a site, I want to do some web crawling at the /telecommandes path. Its robots.txt is:

    User-agent: *
    Disallow: *telecommande++*

My questions are:

  • What does the plus sign mean in this case?
  • Is it appropriate to crawl the URL /telecommandes-box-decodeur.html? Would doing so respect the robots.txt file?

Per the original robots.txt specification, + has no special meaning in Disallow values, and neither has *.

So crawling of /telecommandes-box-decodeur.html is allowed.

Disallowed would be, for example, crawling of /*telecommande++*.html (literally).
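The literal reading above can be sketched in a few lines of Python. This is only an illustration, not a reference implementation: the function name is made up for this example, and it assumes that a Disallow value without a leading slash is treated as if it began with one, which is the reading the example URL above implies. A stricter crawler might decide that such a value matches nothing at all.

```python
def is_disallowed_literal(path: str, disallow_value: str) -> bool:
    """Original-spec style check: the Disallow value is a plain path prefix.

    Neither '+' nor '*' is special here; both are compared character by character.
    """
    if not disallow_value:
        # An empty Disallow value means nothing is disallowed.
        return False
    # ASSUMPTION: normalize a value that lacks the leading slash.
    if not disallow_value.startswith("/"):
        disallow_value = "/" + disallow_value
    return path.startswith(disallow_value)


rule = "*telecommande++*"
print(is_disallowed_literal("/telecommandes-box-decodeur.html", rule))  # False
print(is_disallowed_literal("/*telecommande++*.html", rule))            # True
```

Under this reading only URLs that literally contain the asterisks and plus signs are blocked, so the /telecommandes-box-decodeur.html page stays crawlable.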


If you want to be polite, you could take the "proprietary" robots.txt extensions into account, e.g., those from Google and other search engines. Many authors might not realize that these aren't part of the official specification, and expect them to work for all other crawlers too.

Per Google's robots.txt documentation, + has no special meaning, but * has one (it means: any sequence of characters).

So crawling of /telecommandes-box-decodeur.html would still be allowed.

Disallowed would be, for example, crawling of /foo/telecommande++bar.html (and still /*telecommande++*.html).
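A rough way to check the wildcard reading yourself is to translate the rule into a regular expression. The sketch below is an approximation of the behaviour described in Google's documentation (* matches any sequence of characters, a trailing $ anchors the end of the URL, everything else including + is literal); the function names are invented for this example and it is not Google's actual matcher.

```python
import re


def wildcard_rule_to_regex(rule: str) -> re.Pattern:
    """Translate a wildcard-style Disallow value into a compiled regex."""
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    # Escape every literal piece so '+' stays a plain plus sign,
    # then join the pieces with '.*' wherever the rule had a '*'.
    pattern = ".*".join(re.escape(piece) for piece in rule.split("*"))
    if anchored:
        pattern += "$"
    return re.compile(pattern)


def is_disallowed_wildcard(path: str, rule: str) -> bool:
    """Return True if the path matches the rule from its first character."""
    if not rule:
        # An empty Disallow value means nothing is disallowed.
        return False
    return wildcard_rule_to_regex(rule).match(path) is not None


rule = "*telecommande++*"
print(is_disallowed_wildcard("/telecommandes-box-decodeur.html", rule))  # False
print(is_disallowed_wildcard("/foo/telecommande++bar.html", rule))       # True
print(is_disallowed_wildcard("/*telecommande++*.html", rule))            # True
```

The first URL is not matched because it never contains the literal text telecommande++, which is why crawling it is allowed under both readings.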

