Site Security Keeping Your Community Safe from Hackers and Other Unwelcome Visitors.

#1
05-27-2007, 10:27 AM
blackjack
twiceler Robot
I've seen this robot around my forum quite regularly for the past week. Is it anything to worry about? A quick Google search showed it as an experimental bot... what does that mean exactly?

#2
05-30-2007, 06:59 PM
Big_doug
Blocking Twiceler
Twiceler is a bad bot.

Twiceler has been rampaging on one of the sites I administer. It leeched an enormous amount of bandwidth, nearly 2 GB this month, until it was blocked. (It visited nearly 70,000 times!)

Twiceler does not obey normal robots.txt directives and can only be blocked by denying access in the .htaccess file. (Be careful with this file, and back up its existing contents first.)

This site has a good tutorial for blocking robots

http://www.clockwatchers.com/htaccess_block.html

Twiceler was using a range of IP addresses starting with 38.99.

Inserting this code in the .htaccess file blocked it from leeching.

# Allow everyone except addresses beginning with 38.99
order allow,deny
deny from 38.99
allow from all

Other IP addresses can be blocked this way
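
For anyone unsure what a partial address such as 38.99 actually covers: Apache treats it as matching every address that begins with those bytes, which is the same set of addresses as 38.99.0.0/16. Below is a minimal Python sketch (nothing to do with Apache itself; the function name and the test addresses are illustrative assumptions) for checking whether a given visitor address would be caught by such a rule before you put it live.

```python
import ipaddress

# "deny from 38.99" covers the same addresses as the network 38.99.0.0/16.
# Add further networks here if you block more prefixes.
BLOCKED_NETWORKS = [ipaddress.ip_network("38.99.0.0/16")]


def is_blocked(address, networks=BLOCKED_NETWORKS):
    """Return True if the address falls inside any of the blocked networks."""
    ip = ipaddress.ip_address(address)
    return any(ip in net for net in networks)


if __name__ == "__main__":
    # 192.0.2.10 is a documentation-only address used as a neutral example.
    for candidate in ["38.99.13.123", "64.1.215.164", "192.0.2.10"]:
        print(candidate, "blocked" if is_blocked(candidate) else "allowed")
```

Note how wide the net is: a two-byte prefix blocks about 65,000 addresses, so legitimate visitors in that range are shut out too.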


Twiceler caused major bandwidth leeching on numerous sites that one of my sons administers. In one case it made a site unavailable by using up all of its bandwidth. He had to insert the above code in 40+ other sites to prevent bandwidth leeching.

I hope this is of help.

Doug

#3
06-01-2007, 04:17 AM
blackjack
Oh boy!
Thanks for the heads up. I will continue to monitor the 'visits' and my bandwidth carefully, and let my host know too.
I will try to work out how to block them in the .htaccess file. A little scary for me!
Many thanks.

#4
06-02-2007, 10:29 AM
Big_doug
Twiceler has come back this morning after two days' rest, with an alternative IP address starting with 64.1.
------------------------------
IP address record info for the current attack -
http://whois.domaintools.com/64.1.215.164

-------------------------------

This code in the .htaccess file will block it, but it will also block anyone legitimately using an address in that range; however, I don't think there will be that many ;-) You just need to change the first two blocks of the IP address if it changes.

order allow,deny
deny from 38.99
deny from 64.1
allow from all

It does mean that you have to periodically check your site visitors.
So far I only know of Twiceler using IP addresses starting with 38 and 64

If your server allows it, the following code in .htaccess will block it and other user agents. (It will also stop EmailSiphon and Exabot, as shown.) You can add rules for other unwanted agents that are indexing your site!
It does not depend on the IP address.

Options +FollowSymLinks
RewriteEngine On
RewriteBase /
# Return 403 Forbidden when the User-Agent matches any of these bots
RewriteCond %{HTTP_USER_AGENT} EmailSiphon [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Exabot [NC,OR]
RewriteCond %{HTTP_USER_AGENT} Twiceler [NC]
RewriteRule .* - [F,L]
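
Before putting the user-agent rules above on a live site, it can help to try the same patterns against a few real User-Agent strings from your logs. This is only a rough Python sketch of the matching idea, not how mod_rewrite works internally; the function name is made up for illustration, and it matches case-insensitively, which corresponds to RewriteCond with the [NC] flag.

```python
import re

# The same three bot names used in the rewrite rules.
# re.IGNORECASE mirrors the [NC] flag; without [NC] Apache matches case-sensitively.
BLOCKED_AGENT = re.compile(r"EmailSiphon|Exabot|Twiceler", re.IGNORECASE)


def would_be_forbidden(user_agent):
    """Return True if this User-Agent string matches one of the blocked names."""
    return BLOCKED_AGENT.search(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)",
        "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB) Firefox/2.0",
    ]
    for agent in samples:
        print(would_be_forbidden(agent), agent)
```

Keep in mind that a user-agent rule only stops bots that identify themselves honestly; anything that fakes a browser string gets straight through.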


Doug

#5
06-29-2007, 11:56 AM
lcx
These are the IPs I found in my logs:

208.36.144.10
208.36.144.7
208.36.144.8
38.99.13.123
38.99.13.124
38.99.13.125
38.99.13.126
64.1.215.163
64.1.215.164
64.1.215.165

I blocked them all at the firewall.

#6
06-29-2007, 12:12 PM
julia44
I'm going to sound like an idiot, but how do you learn anything if you don't ask? Where do I go to check my logs of who has visited, so I can start seeing if this bot has visited my sites?

#7
06-30-2007, 05:21 AM
lcx
Well, this depends on your server and OS.
Give us some more information and then we can help.

#8
06-30-2007, 08:28 AM
minstrel
Julia, look to see what statistics package your host offers - common ones are AWSTATS and Webalizer.

If you access those packages (perhaps through cPanel), you should be able to see the "robots" who have visited your site in order of frequency and their IP addresses.

#9
06-30-2007, 10:07 AM
julia44
Took a few, but I finally did see the logs in my AWStats. Thanks guys!

#10
08-22-2007, 05:34 AM
pingu
The .htaccess method seems to stop it. My forum had several thousand connections from this piece of crap last night and it killed my database server...

For those interested, it's allegedly pioneering a new approach to search... yeah, by simulating a DoS attack.

Contact details for the muppets that designed it are:

Cuill, Inc.
66 Willow Place
Menlo Park, Ca 94025
(650) 325 1701 Office
(650) 325 1702 Fax


and the whois entry is

Registrant:
Tom Costello
Tom Costello
1127 Thorntree CT
San Jose CA 95120
US
Email: costello@cs.stanford.edu
Registrar Name....: REGISTER.COM INC.
Registrar Whois...: whois.register.com
Registrar Homepage: www.register.com
Domain Name: cuill.com
Created on..............: Thu Apr 07 2005
Expires on..............: Mon Apr 07 2008
Record last updated on..: Mon Feb 27 2006
Administrative Contact:
Tom Costello
Tom Costello
1127 Thorntree CT
San Jose CA 95120
US
Phone: (408) 323-1065
Email: costello@cs.stanford.edu
Technical Contact:
Register.Com
Domain Registrar
575 8th Avenue 11th Floor
New York NY 10018
US
Phone: 1-902-7492701
Email: domain-registrar@register.com
DNS Servers:
dns23.register.com
dns24.register.com


So complaining to Stanford University may be in order too...

#11
08-22-2007, 06:34 AM
Hawke
Twiceler is an experimental bot designed by Cuill Inc., and many webmasters will see it skimming through their sites on a frequent basis. I think it was designed for the Cuill search engine (which is claimed to be better than Google because it takes a different strategy).

It was also designed by ex-employees of Google (in management positions, I think). That's about all I know about it. I was not worried about it, as it has been at my site for two months now, but I guess I should do some more research on this.

#12
08-22-2007, 12:49 PM
pingu
Interestingly, I had a response from the company.

Fair dos for replying, but I wish they would do their testing in a lab rather than on live sites.

Quote:
Twiceler is an experimental crawler that we are developing for our new search engine.
It is important to us that it obey robots.txt, and that it not crawl sites that do not
wish to be crawled. If you wish I will glad to add your site to our list of sites
to exclude, but I need you to tell the site name to block as email return addresses
frequently from the domains that wish to be blocked.

I apologize for any inconvenience this has caused you.
Please feel free to contact me if you have any further questions.

Sincerely,

James Akers
Operations Engineer
Cuill, Inc.

#13
08-31-2007, 09:40 AM
mc1457
That's BS...I have no issues with any other crawler. Twiceler has DOS'd my server SEVERAL times in the past two weeks. So I need to block ALL bots from my site to keep THEM from bringing it to its knees? Or ask them to please not DOS my server every morning. Bull @#$#.

The 38.99... addresses belong to Cogent. The abuse address is abuse@cogentco.com.

64.1... and 208.36... belong to XO Communications. Abuse address is abuse@xo.com.

Suggest you all send an email to all of these abuse addresses detailing the abusive traffic and cc contact@cuill.com.

It makes me furious that this ass-clown makes it OUR problem for his bot's poor behavior. Maybe he should throttle the requests so he doesn't slam our servers. But that would take actual software architecture and programming skills...

I emailed Cuill this morning. As soon as I hear back I plan to complain to their ISPs and include the response if I don't get some assurance that they acknowledge this is their problem and needs to be fixed.

#14
08-31-2007, 11:39 AM
minstrel
I get occasional visits from twiceler showing up in my logs but I'm certainly not being pounded - Yahoo Slurp is much more greedy in spidering my sites.

That said, if I had your problem, I'd be complaining to any and all I could find as well. Start by taking them up on this statement, with copies to the ISPs you see sending the bot:

Quote:
If you wish I will glad to add your site to our list of sites to exclude, but I need you to tell the site name to block as email return addresses frequently from the domains that wish to be blocked.
Then you have some documentation that your "cease and desist" request has been ignored if it continues.

#15
08-31-2007, 11:58 AM
pingu
Quote:
Originally Posted by mc1457
That's BS...I have no issues with any other crawler. Twiceler has DOS'd my server SEVERAL times in the past two weeks. So I need to block ALL bots from my site to keep THEM from bringing it to its knees? Or ask them to please not DOS my server every morning. Bull @#$#.

The 38.99... addresses belong to Cogent. The abuse address is abuse@cogentco.com.

64.1... and 208.36... belong to XO Communications. Abuse address is abuse@xo.com.

Suggest you all send an email to all of these abuse addresses detailing the abusive traffic and cc contact@cuill.com.

It makes me furious that this ass-clown makes it OUR problem for his bot's poor behavior. Maybe he should throttle the requests so he doesn't slam our servers. But that would take actual software architecture and programming skills...

I emailed Cuill this morning. As soon as I hear back I plan to complain to their ISPs and include the response if I don't get some assurance that they acknowledge this is their problem and needs to be fixed.

No need to block all bots. The .htaccess method stops it (for now).

#16
12-15-2007, 10:03 PM
prepress_forums
Experimental Robot? Experimenting in what? The bot was logged going around IP blocks in .htaccess by just rotating to another IP address. They list 22 different IP addresses on their site.

These guys are saying on another site that they will obey robots.txt after 7 days? WTF, the site says the owners are Ex Google Folks. So, they know very well what they are doing. They have no search capability for a human visitor to search their results... So, they offer the webmaster no traffic at all that is legitimate visitors. Why are they beating the heck out of our sites from 22 servers that autohack around an IP block and ignore a robots.txt?

If you email the admin, they will stop visiting. But what kind of protocol is that?? This should be shut down.

#17
02-21-2008, 01:03 AM
islandgirl
First post around here. I googled cuill.com and ended up here. Thanks for the .htaccess assistance, Big_doug.

I have noticed this bot around my various logs lately.

Since when does a search engine harvest email addresses or try to do injections on a database driven site? My suspicions are now warranted that this is spam activity, not a legitimate startup:

/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:12:38:40 -0600] "GET /directory/weblist.php?cat=1 HTTP/1.0" 200 8223 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:12:48:45 -0600] "GET /directory/weblist.php?cat=2 HTTP/1.0" 200 9459 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:14:45:09 -0600] "GET /directory/weblist.php?cat=3 HTTP/1.0" 200 8443 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:14:54:42 -0600] "GET /directory/'mailto:someemailaddy@aol.com' HTTP/1.0" 404 8519 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:21:00:35 -0600] "GET /directory/'list.php?id=34' HTTP/1.0" 404 8519 "-" "Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"
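
For anyone who would rather check the raw log than rely on a stats package, here is a rough Python sketch that counts hits per IP address for any User-Agent containing "twiceler". It assumes log lines shaped like the ones above (Apache combined format, optionally with the "filename:" prefix that grep adds); the function name and the regular expression are my own illustration, so adjust them if your log format differs.

```python
import re
from collections import Counter

# Captures the client IP (after an optional "filename:" prefix left by grep)
# and the last quoted field on the line, which is the User-Agent.
LINE_RE = re.compile(r'^(?:\S+:)?(\d{1,3}(?:\.\d{1,3}){3}) .*"([^"]*)"\s*$')


def count_bot_hits(lines, agent_substring="twiceler"):
    """Count requests per IP whose User-Agent contains agent_substring."""
    hits = Counter()
    wanted = agent_substring.lower()
    for line in lines:
        match = LINE_RE.match(line.strip())
        if match and wanted in match.group(2).lower():
            hits[match.group(1)] += 1
    return hits


if __name__ == "__main__":
    sample = [
        '/var/log/httpd/access_log:64.1.215.166 - - [20/Feb/2008:12:38:40 -0600] '
        '"GET /directory/weblist.php?cat=1 HTTP/1.0" 200 8223 "-" '
        '"Mozilla/5.0 (Twiceler-0.9 http://www.cuill.com/twiceler/robot.html)"',
        '192.0.2.10 - - [20/Feb/2008:12:39:01 -0600] '
        '"GET /index.php HTTP/1.1" 200 5120 "-" "Mozilla/5.0 (Windows) Firefox/2.0"',
    ]
    for ip, count in count_bot_hits(sample).most_common():
        print(ip, count)
```

To run it against a real log, pass an open file instead of the sample list, for example `count_bot_hits(open("/var/log/httpd/access_log"))`, using whatever path your server writes to.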

#18
06-10-2008, 09:01 AM
amoona
actually, I don't read my logs very often. Hmm~~seems I need to take some time and check it from now on.

#19
07-28-2008, 04:57 PM
Big_doug
Twiceler update

For all of you, including me, who have suffered from the 'rampaging' twiceler robot, an interesting bit of news! ;-)

Big_doug

Cuil, the search engine responsible for the Twiceler robot, is now available to use. I typed Twiceler into the search box on Cuil, which gave 34,127,017 results for twiceler. After a few pages of results, this message was displayed!

"No results because of high load...

Due to excessive load, our servers didn't return results. Please try your search again."

Webmaster Info for Cuil -

http://www.cuil.com/info/webmaster_info/

This gives the following information -

Webmaster Info

Cuil is the biggest search engine on the planet. In our quest to let users search as much of the Internet as possible, Cuil has indexed more than 120 billion pages so far.

If you would like Cuil to crawl your site and have it included in our index, please let us know.

Twiceler is the name of our robot Web crawler. The user-agent is “twiceler.” We understand that many small sites are bandwidth-limited, so we support the robots.txt Crawl-delay directive. You can read about robots.txt at robotstxt.org and there is a simple generator of the file at mcanerin.com.

If you have modified your robots.txt file for Twiceler, it may take several days for us to re-read the file. If you need something blocked right away, please let us know.
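
To see what a robots.txt entry with Crawl-delay looks like to a crawler that honours it, here is a small Python sketch using the standard library's robots.txt parser. The delay of 30 seconds and the /forum/ path are made-up values for illustration, and the sketch only shows how a well-behaved parser reads the file; it says nothing about whether Twiceler actually follows it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: slow Twiceler down and keep it out of /forum/,
# while leaving every other crawler unrestricted.
ROBOTS_TXT = """\
User-agent: twiceler
Crawl-delay: 30
Disallow: /forum/

User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

if __name__ == "__main__":
    # Delay (in seconds) requested of Twiceler between fetches.
    print(parser.crawl_delay("twiceler"))
    # Whether each agent may fetch a page under /forum/.
    print(parser.can_fetch("twiceler", "/forum/index.php"))
    print(parser.can_fetch("SomeOtherBot", "/forum/index.php"))
```

Since the re-read can take several days by Cuil's own account, a server-side block is still the quicker fix if the bot is hammering your site right now.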

Got a Twiceler question? If you have questions or concerns about Twiceler you can contact Jim. Jim’s the guy who keeps track of Twiceler, when he’s not busy with his horses.

If you would prefer that we not crawl your site at all we are happy to oblige. Just drop Jim a note to that effect and he will place your site or IP address on our do-not-crawl list. Be sure to be explicit about the site to block as email address domains frequently differ from the site in question.

Occasionally, we have seen other Web crawling robots masquerading as Twiceler. You can be sure it’s Cuil crawling your site if the robot has one of the following IP addresses:

38.99.13.121 38.99.44.101 64.1.215.166 208.36.144.6
38.99.13.122 38.99.44.102 64.1.215.162 208.36.144.7
38.99.13.123 38.99.44.103 64.1.215.163 208.36.144.8
38.99.13.124 38.99.44.104 64.1.215.164 208.36.144.9
38.99.13.125 38.99.44.105 64.1.215.165 208.36.144.10
38.99.13.126 38.99.44.106

To all those who have contacted us to let us know that they are happy to have their site included in a Web index for the first time, thank you for being a part of the biggest search engine on the Web—Cuil!
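
Since Cuil says other robots sometimes masquerade as Twiceler, the address list above can be used to tell the two apart. A minimal Python sketch, assuming the 22 addresses quoted from the webmaster page are still current (the set and function names are illustrative, not anything Cuil provides):

```python
# The 22 addresses quoted from Cuil's webmaster page, built from their ranges.
CUIL_ADDRESSES = (
    {f"38.99.13.{n}" for n in range(121, 127)}
    | {f"38.99.44.{n}" for n in range(101, 107)}
    | {f"64.1.215.{n}" for n in range(162, 167)}
    | {f"208.36.144.{n}" for n in range(6, 11)}
)


def is_listed_cuil_address(address):
    """Return True if the address is on the published Twiceler list."""
    return address.strip() in CUIL_ADDRESSES


if __name__ == "__main__":
    # The last address is a documentation-only example, not a real visitor.
    for candidate in ["64.1.215.166", "38.99.44.103", "192.0.2.10"]:
        verdict = "listed" if is_listed_cuil_address(candidate) else "not listed"
        print(candidate, verdict)
```

A visitor claiming to be Twiceler from an address that is not on the list is, by Cuil's own description, some other crawler borrowing the name.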

#20
07-28-2008, 09:34 PM
minstrel
Isn't Cuil associated with Google?