Comments for YaCy
06 Mar 2006 15:42
Re: YAcY is a badly behaved robot
> 1. YAcY doesn't ask for robots.txt, let
> alone follow it.
> 2. YAcY posts the yacy web address as
> the HTTP Refer[r]er header similar to
> spam bots.
These issues have been resolved for some time now.
27 Feb 2006 17:43
YAcY is a badly behaved robot
1. YAcY doesn't ask for robots.txt, let alone follow it.
2. YAcY posts the YaCy web address as the HTTP Refer[r]er header, similar to spam bots. Well-behaved bots may put their URL into the User-Agent header.
I only came across this project whilst researching how to counter HTTP referrer spammers. Nice idea, shame about the implementation.
Re: YAcY is a badly behaved robot
Neither is true:
1) YaCy has respected robots.txt since mid-2005, and it never ignored robots.txt on purpose. That was simply when robots.txt support was first implemented.
2) There is no referrer spam. The referrer shows that the page was indexed by a YaCy peer. Since the page is then referenced not only by this one peer but by all peers, there must be a central address through which the owner of a referred page can see that it was referenced by a non-centralized web crawler. This is a problem that centralized crawlers do not have. In this case YaCy is just being honest and refers to the YaCy project page. This feature was removed in YaCy 0.43 because too many people were confused by this referrer.
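For anyone following this thread, the behaviour described above as well behaved (checking robots.txt before fetching, and identifying the crawler in the User-Agent header rather than in the Referer header) can be sketched roughly as follows. This is only an illustration in Python using the standard library, not YaCy's actual code. The agent string, the example.org URLs, and the helper names allowed_by_robots and build_request are made up for the example.

```python
from urllib import robotparser
from urllib.request import Request

# Hypothetical agent string: a bot name plus a project URL, so site owners
# can find out who is crawling them. Not the string YaCy really sends.
USER_AGENT = "examplebot/0.1 (+http://example.org/bot.html)"


def allowed_by_robots(robots_txt: str, url: str, agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits `agent` to fetch `url`."""
    parser = robotparser.RobotFileParser()
    # parse() takes the robots.txt content as a list of lines, so no network
    # access is needed once the file has been downloaded.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


def build_request(url: str, agent: str = USER_AGENT) -> Request:
    """Build a request that identifies the crawler via User-Agent only.

    No Referer header is set, so the project URL never shows up in the
    target site's referrer statistics.
    """
    return Request(url, headers={"User-Agent": agent})


if __name__ == "__main__":
    robots = "User-agent: *\nDisallow: /private/\n"
    for page in ("http://example.org/public/page.html",
                 "http://example.org/private/page.html"):
        if allowed_by_robots(robots, page):
            req = build_request(page)
            print("would fetch", req.full_url)
        else:
            print("skipping", page)
```

A real crawler would also download and cache each host's robots.txt and rate-limit its requests; those parts are left out here to keep the sketch short.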