Network connections
This morning I was attacked by a robot. I quickly noticed that my websites were slow and saw a pretty large amount of traffic on port 80: 208 connections!
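On Linux, one quick way to get that kind of connection count is to count the entries for local port 80 (hex 0050) in /proc/net/tcp. A minimal Python sketch (the helper name is mine, and this only covers IPv4):

```python
# Count current TCP sockets on local port 80 by reading /proc/net/tcp.
# Roughly equivalent to: netstat -an | grep ':80 ' | wc -l
# (IPv4 only; /proc/net/tcp6 would need the same treatment for IPv6.)

def count_port80_connections(proc_file="/proc/net/tcp"):
    count = 0
    with open(proc_file) as f:
        next(f)                        # skip the header line
        for line in f:
            fields = line.split()
            local_address = fields[1]  # e.g. "00000000:0050"
            port = int(local_address.split(":")[1], 16)
            if port == 80:
                count += 1
    return count

print(count_port80_connections())
```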
That many simultaneous connections almost always means you're under attack. I wish there was a way for the firewall to auto-detect and then auto-block such robots. Maybe I'll find a good tool that does that for us at some point.
"Working" connections
So, I looked at the Apache logs. Two of my sites were attacked. I'll show the second one first, since the robot had more success there:
As you can see, the robot went to a few pages and found one with a /comment/reply/... link. First on linux.m2osw.com and then on win32.m2osw.com (note that both represent the exact same site.)
At that point, it attempts to POST some random spam messages. All of those failed because of the CAPTCHA, and because Drupal forms include an identifier which is required on each POST. Each new POST resets the previous identifier, so the robot would need to re-read the form every time, which it doesn't do. (Surprising: if they know about Drupal, they should know that's required...)
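The mechanism described here can be sketched as a one-time form token. This is only an illustration of the idea, not Drupal's actual implementation; all names are made up:

```python
import secrets

# Each rendered form carries a one-time identifier; a POST is accepted only
# if it echoes the identifier last issued, and the identifier is invalidated
# as soon as it is checked.  A robot replaying an old token always fails.
_tokens = {}  # session id -> current form token

def render_form(session_id):
    token = secrets.token_hex(16)
    _tokens[session_id] = token
    return token  # embedded in the form as a hidden field

def accept_post(session_id, token):
    expected = _tokens.pop(session_id, None)  # one-time use
    return expected is not None and secrets.compare_digest(expected, token)
```

This is why the robot's repeated POSTs all fail: it never re-reads the form to pick up the fresh identifier.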
However, notice how fast this goes. The robot posts once every few seconds. In any case, I have an anti-spam system that would have blocked the robot had it been any faster.
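That kind of speed check can be sketched as a per-IP sliding-window rate limiter. This is a hypothetical sketch, not my actual anti-spam system; the limits are made-up values:

```python
import time
from collections import defaultdict, deque

MAX_POSTS = 3    # made-up limit: at most 3 POSTs...
WINDOW = 10.0    # ...per 10-second window, per IP

_recent_posts = defaultdict(deque)  # ip -> timestamps of recent POSTs

def allow_post(ip, now=None):
    now = time.time() if now is None else now
    posts = _recent_posts[ip]
    while posts and now - posts[0] > WINDOW:
        posts.popleft()            # drop timestamps outside the window
    if len(posts) >= MAX_POSTS:
        return False               # too many POSTs: likely a robot
    posts.append(now)
    return True
```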
When it gets blocked
So... before trying to post on my Linux website (i.e. the one you're reading now!) it hit my http://animals.m2osw.com/ website. That other site got hit over 7,300 times. The reason? The comment pages were refusing the robot with a 403 or a 503. I'm not too sure why it got 403's, as you're supposed to be able to access those forms on that system too, although again there's a CAPTCHA.
However, there is a quite interesting side effect to this one. The log below shows 300 lines out of the 7,300. Click on the link to open the log.
First of all, the robot checks the home page and attempts a POST. The request itself went through (the POST failed, though, probably because of the CAPTCHA.)
Second, it finds the AdSense search feature and uses it to search the site.
Then, it checks the home page comment section again (I'm wondering whether the logs were saved in the order I received the requests...) and hits it twice on the same day and a third time the next day.
Today, all of a sudden, it looks like it switched to a different mode, reading many pages at once, including the home page (which was already read once in the first step.)
A few interesting facts:
- The first 8 requests happened at 09:39:13
- Most of the others happened within 2 or 3 seconds of each other
- Fairly quickly, it starts getting 503 errors (service unavailable)
- The browser designation changes on nearly every access
- Then it starts getting 403 errors (forbidden access)
- The server continues to return code 200 once in a while, so the robot continues...
What I find quite interesting is the browser designation. This makes me think I can write an anti-spam tool that checks that string. If it changes more than once in X seconds, then it's a spam robot and that IP address gets blocked.
I do change the string once in a while myself, to test features such as the iPhone capabilities. But a system whose string changes more than once within a second cannot be human! It takes me at least a few seconds between accesses to switch from one browser designation to another.
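That detection rule (more than one User-Agent change from the same IP within X seconds) could look like this. A hypothetical sketch, not an existing tool; the names and the 2-second window are made up:

```python
import time

# Record the User-Agent string each IP presents and flag the IP as a spam
# robot when that string changes more than once within a short window.
_state = {}  # ip -> (last_user_agent, list of change timestamps)

def is_spam_robot(ip, user_agent, now=None, window=2.0):
    now = time.time() if now is None else now
    last_agent, changes = _state.get(ip, (None, []))
    if last_agent is not None and last_agent != user_agent:
        changes.append(now)                          # the designation changed
    changes = [t for t in changes if now - t <= window]
    _state[ip] = (user_agent, changes)
    return len(changes) > 1  # more than one change within the window
```

A human switching browser strings by hand takes seconds per change, so two changes within a couple of seconds is a strong robot signal.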
Oh! And the exact same URLs are being checked over and over again. That's sad. Why would you do that unless you wanted to be detected? Of course, another, more complex, detection method is to notice that the robot reads the data contained in the page but not the linked files such as JavaScript, images, and CSS files.
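That second heuristic can be sketched as a simple check over the paths an IP has requested (a hypothetical sketch; the function name and extension list are mine):

```python
# An IP that fetches HTML pages but never the JavaScript, CSS, or image
# files those pages link to is probably a robot: a real browser downloads
# the linked assets as well.
ASSET_EXTENSIONS = (".js", ".css", ".png", ".jpg", ".gif", ".ico")

def likely_robot(requested_paths):
    """requested_paths: URL paths one IP requested, from the access log."""
    fetched_pages = any(not p.endswith(ASSET_EXTENSIONS) for p in requested_paths)
    fetched_assets = any(p.endswith(ASSET_EXTENSIONS) for p in requested_paths)
    return fetched_pages and not fetched_assets
```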
So... a few things I can work on to better block these attacks.