web crawler - What does the plus sign mean in robots.txt?

April 15, 2012

For a site, I want to crawl pages under the /telecommandes path. This is its robots.txt:

User-agent: *
Disallow: *telecommande++*

My questions are:

What does the plus sign mean in this case?
And is it appropriate to crawl the URL /telecommandes-box-decodeur.html? Would that respect the robots.txt file?

Per the original robots.txt specification, + has no special meaning in Disallow values, and neither has *.

So crawling of /telecommandes-box-decodeur.html is allowed.

Disallowed would be, for example, crawling of /*telecommande++*.html (literally).

If you want to be polite, take "proprietary" robots.txt extensions into account, e.g., those from Google and other search engines. Many authors might not realize that these aren't part of the official specification, and expect them to work for other crawlers too.

Per Google's robots.txt documentation, + has no special meaning, but * has one (it means: any sequence of characters).

So crawling of /telecommandes-box-decodeur.html is still allowed.

Disallowed would be, for example, crawling of /foo/telecommande++bar.html (and still /*telecommande++*.html).
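
To check such cases yourself, here is a minimal Python sketch of the Google-style reading, where * stands for any sequence of characters and + is just a literal character. The function name google_style_match is made up for this illustration, and the sketch only handles the * wildcard; it is not a full robots.txt parser and does not claim to reproduce any crawler's actual implementation.

```python
import re


def google_style_match(pattern: str, path: str) -> bool:
    """Hypothetical helper: test a Disallow pattern against a URL path.

    Assumption: '*' matches any sequence of characters (including none),
    every other character, '+' included, is taken literally, and the
    pattern is matched from the start of the path.
    """
    # Escape each literal chunk so '+' loses its regex meaning,
    # then join the chunks with '.*' to stand in for each '*'.
    regex = ".*".join(re.escape(chunk) for chunk in pattern.split("*"))
    return re.match(regex, path) is not None


rule = "*telecommande++*"
for path in (
    "/telecommandes-box-decodeur.html",
    "/foo/telecommande++bar.html",
    "/*telecommande++*.html",
):
    verdict = "disallowed" if google_style_match(rule, path) else "allowed"
    print(path, "->", verdict)
```

With this reading, only paths that actually contain the literal text telecommande++ are blocked, which is why /telecommandes-box-decodeur.html stays crawlable.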
