
View Full Version : robots.txt Manager


MUG
08 Feb 2003, 20:59
This script allows you to easily create a dynamically generated robots.txt file, based on specified rules.

If you use this hack, please click 'Install' :)

Screenshots will be attached...

MUG
08 Feb 2003, 21:00
Control Panel

MUG
08 Feb 2003, 21:01
Editor (All Robots)

MUG
08 Feb 2003, 21:01
Editor (Specific Robot)

MUG
08 Feb 2003, 21:01
Generated File (:banana:)

Neo
08 Feb 2003, 21:08
Excellent G.

Dean C
08 Feb 2003, 21:31
Wow very nice!

wooolF[RM]
08 Feb 2003, 22:04
Very good job, tho' I can edit one txt file by hand :) No offence; again, very good job :)

May I ask if you have a list with the IPs of some major/all search engines? I kinda need it :) Thanx a lot and again, a nice hack!

MUG
08 Feb 2003, 22:07
The point of the hack is to make administration easier. It keeps track of robots requesting the robots.txt file, allowing you to ban or restrict a bot without having to dig through the server logs. I wrote the hack today, so the only bots already included in the database are the ones that spidered my site in that time. The list is in the .sql file.
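The flow MUG describes (log each bot that fetches robots.txt, then serve it a dynamically generated file) can be sketched roughly like this. The hack itself is a PHP script; this is an illustrative Python sketch, and all function and field names here are hypothetical, not taken from the hack:

```python
def log_robot(log, user_agent, ip):
    """Record each bot that requests robots.txt, keyed by user-agent,
    so it can later be banned or restricted without reading server logs."""
    entry = log.setdefault(user_agent, {"ip": ip, "hits": 0})
    entry["hits"] += 1

def generate_robots_txt(rules):
    """Render a {user_agent: [disallowed paths]} mapping into the
    robots.txt format served to each crawler."""
    sections = []
    for agent, disallowed in rules.items():
        lines = [f"User-agent: {agent}"]
        lines += [f"Disallow: {path}" for path in disallowed]
        sections.append("\n".join(lines))
    return "\n\n".join(sections) + "\n"
```

A bot requesting /robots.txt would first be passed through something like log_robot, then served the output of generate_robots_txt for whatever rules were saved in the control panel.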

wooolF[RM]
08 Feb 2003, 22:37
thanx for the answer :)

djr
08 Feb 2003, 22:50
Is it supposed to write a new robots.txt file every time, or do the bots see the robots.php file as robots.txt?

If your answer is that it's supposed to write a new robots.txt file, it isn't working for me :-(

And: do I still need a robots.txt file?

MUG
08 Feb 2003, 23:06
Originally posted by djr
Is it supposed to write a new robots.txt file every time, or do the bots see the robots.php file as robots.txt?

If your answer is that it's supposed to write a new robots.txt file, it isn't working for me :-(

And: do I still need a robots.txt file?

It uses mod_rewrite to send requests for robots.txt to robots.php. You have to create an .htaccess file with the following:

RewriteEngine on
RewriteRule robots.txt /robots.php

(Note: this is for the old version; read the new install file. :))

Upload robots.php to the root web directory (usually public_html). Make sure you run robots.sql using phpMyAdmin. :)

djr
08 Feb 2003, 23:33
I did that already ;)
So the robots are redirected to robots.php, which is feeding them a perfectly rendered robots.txt file? Sorry 'bout asking, but I don't want to break my (high) ranking(s).

- djr

MUG
09 Feb 2003, 00:10
Originally posted by djr
I did that already ;)
So the robots are redirected to robots.php, which is feeding them a perfectly rendered robots.txt file? Sorry 'bout asking, but I don't want to break my (high) ranking(s).

- djr

Yup. The only difference is the X-Powered-By header generated automatically by PHP.

BigCheeze
09 Feb 2003, 00:16
Thanks! I just installed it. Let's see if I can control those bots a little more!!

SphereX
09 Feb 2003, 04:09
very nice!


***installs

djr
09 Feb 2003, 11:51
Hi MUG,

Can you add another column 'Owner' and 'Origin' (or whatever you might want to call it) where we can add the owner and origin of the spider?

For example:

googlebot | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 216.239.46.19 | Google | http://www.google.com | 4 Edit - Delete |


Not every spider describes itself fully, e.g. Mercator-2.0 is one of AltaVista's robots, but there's no link to AltaVista whatsoever.

Thanks,
- djr

djr
09 Feb 2003, 11:54
I found some good overviews of spiders here (http://www.robotstxt.org/wc/active/html/index.html) and here (http://www.devmag.net/suchmaschinen/robots_namen.htm). If anyone has more of these lists, please add them to this thread.

Thanks,
- djr

MUG
09 Feb 2003, 12:24
Originally posted by djr
Hi MUG,

Can you add another column 'Owner' and 'Origin' (or whatever you might want to call it) where we can add the owner and origin of the spider?

For example:

googlebot | Googlebot/2.1 (+http://www.googlebot.com/bot.html) | 216.239.46.19 | Google | http://www.google.com | 4 Edit - Delete |


Not every spider describes itself fully, e.g. Mercator-2.0 is one of AltaVista's robots, but there's no link to AltaVista whatsoever.

Thanks,
- djr

Ooh, thanks. I was wondering what Mercator-2.0 was. :paranoid:

I'll add a description field, but there's not enough room for it to show on the main page, so you'll have to click Edit to view it.

MUG
09 Feb 2003, 12:40
Version 1.0 final released. :pirate:

MUG
09 Feb 2003, 13:26
Can this thread be moved to the Full Releases forum?

Velocd
09 Feb 2003, 18:31
I have a slight problem with googlebots, and that is they storm my forum in huge numbers. Currently, for example, I have 7 googlebots crawling my forum. That seems excessive to me, and I would like to somehow limit the number of googlebots to maybe 2.

What is the robots.txt directive to do this? Or maybe there is some other alternate method.

Thanks ;)

MUG
09 Feb 2003, 18:42
Originally posted by Velocd
I have a slight problem with googlebots, and that is they storm my forum in huge numbers. Currently, for example, I have 7 googlebots crawling my forum. That seems excessive to me, and I would like to somehow limit the number of googlebots to maybe 2.

What is the robots.txt directive to do this? Or maybe there is some other alternate method.

Thanks ;)

Honestly, I don't think that is possible with robots.txt. If you created something that would dynamically insert text into a robots.txt file based on the number of Googlebots spidering your site, Google might "take the hint" and never come back. :ermm:
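One partial workaround worth noting: robots.txt cannot cap how many crawlers hit a site at once, but some engines honor the non-standard Crawl-delay directive, which asks a bot to wait a number of seconds between requests (support varies by engine, and Google in particular ignores it):

```
User-agent: Slurp
Crawl-delay: 10
```

This slows the rate of requests rather than the number of concurrent bots, but it can reduce the bandwidth hit from an aggressive crawler.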

Velocd
10 Feb 2003, 03:43
Drat.. :ermm:

Wish it were possible somehow, oh well. My current bandwidth is being consumed quickly by these googlebots, so I guess I'll simply have to restrict them from the threads.

Automated
10 Feb 2003, 13:40
Originally posted by Velocd
Drat.. :ermm:

Wish it were possible somehow, oh well. My current bandwidth is being consumed quickly by these googlebots, so I guess I'll simply have to restrict them from the threads.

Restricting them from the threads? :confused: What's the point of getting spidered then?

djr
11 Feb 2003, 22:04
We have two different domains, but only one MySQL-database. Is it possible to place the robots.php on both the domains (and thus using the same tables)?

- djr

djr
13 Feb 2003, 10:46
Already found it. Just rename the robots_log table to robots_log_domain1, create another one with _domain2, and update the table names in robots.php.
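The rename djr describes might look like the following sketch (the table name comes from his post; note that CREATE TABLE ... LIKE requires MySQL 4.1 or later, so on older versions you would copy the CREATE statement from robots.sql instead):

```sql
RENAME TABLE robots_log TO robots_log_domain1;
CREATE TABLE robots_log_domain2 LIKE robots_log_domain1;
```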

- djr

mheinemann
16 Feb 2003, 16:48
Installed, works great!

MUG
17 Feb 2003, 00:54
Glad that you like it. :cool:

Any suggestions? :)

mheinemann
17 Feb 2003, 14:19
The only suggestion I can think of is being able to import your current robots.txt

I had disallowed "turnitin" and would like to be able to still block them.

MUG
17 Feb 2003, 18:56
Originally posted by mheinemann
I had disallowed "turnitin" and would like to be able to still block them.

I thought I already included TurnitinBot in the .sql file?

Originally posted by mheinemann
The only suggestion I can think of is being able to import your current robots.txt

Good idea... it shouldn't be too hard to implement. :)

mheinemann
02 Mar 2003, 01:47
And maybe being able to manually edit it as well.

stryka
09 Mar 2003, 03:18
My current robots.txt file is not being overwritten when I click submit.

I made the changes to the .htaccess file... is there anything else I should look at?

Thanx

MUG
12 Apr 2003, 16:05
1.1 Beta released. It includes the following bug fixes / additions:

1. Strips comments from the generated file (although comments are in the robots.txt specification, some bots choke on them)
2. Repairs newlines in the generated file (the old version sometimes produced \r\r\n)
3. Cleaner interface for the control panel
4. Several other things I can't remember :confused:
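The first two fixes in that list can be sketched as follows (Python for illustration; the hack itself does this in PHP, and the function name here is made up):

```python
import re

def clean_robots_output(text):
    """Normalize broken newlines (\\r\\r\\n -> \\n) and drop comment
    lines, mirroring the two 1.1 fixes described above."""
    # Repair newlines: any run of \r characters before \n becomes one break.
    text = re.sub(r"\r+\n", "\n", text)
    # Strip comment lines: robots.txt comments start with '#'.
    lines = [ln for ln in text.split("\n") if not ln.lstrip().startswith("#")]
    return "\n".join(lines)
```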

Mickie D
29 May 2003, 14:36
Thanks for this hack, it's very useful, and I think it should be a full release :)

I have one problem, and it has nothing to do with your script :)

Where can I find info on which bots I should ban? I never had TurnitinBot banned before... why is that bot bad?

PixelFx
29 May 2003, 18:34
very cool, now I don't need to do this manually all the time ;) thank you for sharing :)

stryka
30 Jul 2003, 23:00
I get an error after I updated to 1.1

Fatal error: db_connect(): Failed opening required '' (include_path='') in /home/name/public_html/robots.php on line 63

MUG
30 Jul 2003, 23:38
Did you change $vB_Config_Path to the correct path?

daFish
07 Oct 2003, 10:06
Great addition.
Are there any plans for a new version?

-Fish

sabret00the
04 Nov 2003, 21:19
nice little hack this :)

gmarik
12 Nov 2003, 20:00
.htaccess error

I wanted to have "club/rules":

RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteRule ^club/rules\/?$ /announcement.php?s=&forumid=10

but this does not work; other rewrites work, but I could not make it work with forum names either (club/f12, club/f45, taking the IDs).
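This is hard to diagnose from the post alone, but two common culprits with rules like this are a missing [L] flag (so a later rule rewrites the result again) and rule-ordering in the same .htaccess. A variant worth trying, as a sketch using gmarik's target URL (the \/ escape in the original is harmless but unnecessary):

```
RewriteEngine On
Options +FollowSymlinks
RewriteBase /
RewriteRule ^club/rules/?$ /announcement.php?s=&forumid=10 [L]
```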

peterska2
01 Dec 2003, 15:00
I like this idea but how do I create a .htaccess file?

Is it just an extension like .txt or .php?

Or is it something completely different?

*blushes* Still a n00b to this *blushes*
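To answer the question directly: .htaccess is an ordinary plain-text file whose entire filename is ".htaccess" (there is nothing before the dot). You can create it with any text editor, or from a shell like this; the rule below is equivalent to the one in this hack's old install instructions, with anchors and the [L] flag added for safety:

```shell
# Create an .htaccess file in the web root containing the rewrite rule.
cat > .htaccess <<'EOF'
RewriteEngine on
RewriteRule ^robots\.txt$ /robots.php [L]
EOF

# Show what was written.
cat .htaccess
```

On Windows, some editors refuse to save a file with no base name; saving as ".htaccess" (with quotes) usually works around that.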