Check for fake googlebot scrapers

I noticed a bot scraping using fake GoogleBot useragent string.

Here is a one liner that can detect the IPs to ban:

$ awk 'tolower($0) ~ /googlebot/ {print $1}' /var/www/httpd/access_log | grep -v 66.249.71. | sort | uniq -c | sort -n

It does a case-insensitive awk search for keyword "googlebot" from apache log file removing IPs with "66.249.71." which belongs to google and prints the output in a sorted hit count.

You can validate the IPs with:

IP= ; reverse=$(dig -x $IP +short | grep ; ip=$(dig $reverse +short) ; [ "$IP" = "$ip" ] && echo $IP GOOD || echo $IP FAKE

Replace the IP value with the one you want to check.

Google Tricks and hacks by d00m is undoubtedly the most popular search engine in the world. It offers multiple search features like the ability to search images and news groups.However it's true power lies in it's powerful commands that can be used and misused.I am writing this article on the basis of my experience using google and trying out ideas when i am bored.Now enough of lecturing...let's get

down to business.)

Syndicate content