Cache Cannon - Search Engine friendly + plus archive tool.


buro9
11 Mar 2002, 20:34
This is a first release beta.

I had hoped to finish it before now, but the past week of vb 2.2.3 > 2.2.3b > 2.2.3c > 2.2.4 means that I have lost nearly all my spare time in upgrading boards and re-applying hacks!!! Thanks vb!

Anyhow... the cache cannon is a simple script that will loop through the database and will, in essence, splurge every post onto the docroot in the form of a static file.

In this first beta the functionality of splurging onto the docroot is intact, as is the control panel interface.

Still to be done is indexing the resultant directory structures (for the search engine spiders to follow) and prettying up the resultant file.

This is released here as a first beta only, and is for developers only, to aid its improvement.

I do aim to finish this within the next fortnight, but in light of my lost time thought it wiser to just get it out there for the meantime!

Cheers

David K

PS: Should've stated more clearly its purpose! It's designed to perform this splurging primarily to aid with sites being indexed by search engines. Because they are static files they can be indexed quicker, and you can tweak the format to enhance the hit rate.

A second purpose of it is as a static archive of your board. If you are going to upgrade... want to start over, or are just spring cleaning and would like to archive posts in a static form outside of the database, then this will also serve you.

Edited for new version on 2002-03-20 21:11 GMT

Brian
11 Mar 2002, 22:03
I installed and changed the chmod from 755 to 777 as I am running php in safe mode. However, I get the following error:

Please wait... doing page 1
Dumping to ../archive/11335/8679.htm
error opening file

MrLister
12 Mar 2002, 04:18
any demo maybe?

Robert9
12 Mar 2002, 06:02
yeah, demo would be great!

buro9
12 Mar 2002, 06:59
Yeah, where's the demo!

erm... well give me a chance to finish it and a demo will be up.
Probably be my live site...

As there is no indexer yet, the only demo would be looking at the docroot to see what is produced.

I'm currently working on foxserv at home... once I finish the indexer and move to live I shall make available the indexes as the demo.

Again, at this point this beta is for developers and is not a finished hack by any means... core functionality in place... yet to finish indexer, without the indexer this script is pretty useless (no point in dumping files if there's no links between them as nothing for users or spiders to follow).

Brian, re: the 777 permissions... this will grant read/write/execute permissions... are you sure?

755 is bad enough with the global read/execute, does this not work for you?

i'm not running php safe mode, but i shall investigate precisely the implications of this and will provide a switch to accommodate this in the final version.

for the meantime, any developers wishing to install are recommended to try it at home on foxserv first, and then move to their live environment when happy.

Cheers

David K

buro9
12 Mar 2002, 07:11
Brian,

Had a look at safe mode, it shouldn't affect what we're doing.

The chmod I had left in there merely sets the directory permissions. 0755 means owner (read/write/execute), group (read/execute), world (read/execute).
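
The three octal digits of a mode such as 0755 can be decoded mechanically. The following is a minimal sketch, not part of the Cache Cannon code; the function name describeMode is hypothetical, and owner/group/world are the usual Unix names for the three permission classes.

```php
<?php
// Hypothetical helper (not in the hack): spell out a Unix permission mode.
function describeMode(int $mode): string
{
    // Bit offsets of the three permission classes inside the mode
    $classes = ['owner' => 6, 'group' => 3, 'world' => 0];
    $parts = [];
    foreach ($classes as $name => $shift) {
        $bits = ($mode >> $shift) & 7;   // one octal digit
        $perms = [];
        if ($bits & 4) { $perms[] = 'read'; }
        if ($bits & 2) { $perms[] = 'write'; }
        if ($bits & 1) { $perms[] = 'execute'; }
        $parts[] = $name . ': ' . ($perms ? implode('/', $perms) : 'none');
    }
    return implode(', ', $parts);
}
```

Calling describeMode(0755) gives "owner: read/write/execute, group: read/execute, world: read/execute", while describeMode(0777) grants write to all three classes, which is why some hosts refuse it.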

The files were not affected since they are static, we just need permissions to write the file to the folder.

In the PHP manual there was concern from one user that chmod didn't work whilst in safe mode:
http://www.php.net/manual/en/function.chmod.php
But I shall find out for sure.

You should not need to change from 0755 for the purpose of creating directories. Some providers treat 0777 as a security flaw and will block writing to the folder (mine does this), so using 0777 may actually prevent it from working.

Does this still happen under 0755?

Let me know.

Cheers

David K

rawnet
12 Mar 2002, 14:58
Can we use this under a Windows2000 environment or is this *nix only?

buro9
12 Mar 2002, 16:19
You should be fine for use in Windows 2000, since that is what I developed it in.

I unfortunately don't have a dedicated *nix box, so I develop using FoxServ on Windows XP or Windows 2000.

Whilst this means I may have to add tweaks once I get it to my redhat shared server, it will mean that it will always work on Windows environments.

Cheers

David K

jamesdasher
13 Mar 2002, 00:04
I'm excited but for a different reason :-) you are saying that this dumps all posts from a dynamic page into a static page?? If so it sounds like it will give me a start on my Palm/AvantGo related thing that I wanted to mess with ;)

James

buro9
13 Mar 2002, 07:28
Indeed, as static files there would be no reason you couldn't zip and download to a palm, or even format as wml... the process would be built for you, you'd just need to create a new template for its display!

Wait until we finish it though ;)

eva2000
13 Mar 2002, 07:45
Originally posted by buro9


eva2000, I shall endeavour to make sure that this does not generate files for private forums. This will be perfect for you since entering private forum id's would not be possible, since the files are static. Though it should be noted that as this will generate static files... should you later turn a public forum private then you would have to delete those files manually, hence including the $forumId in the proposed folder structure.

Proposed storage:

The folder structure...

$forumpath/archive/$forumId/$year/$month/$day

For the file names...

$postId.htm
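
As a rough illustration of the proposed layout, the path for one post could be assembled like this. This is only a sketch: the function name archivePath and its parameters are hypothetical, and zero-padded month and day folders are an assumption, since the proposal does not say how they are formatted.

```php
<?php
// Hypothetical helper: build $forumpath/archive/$forumId/$year/$month/$day/$postId.htm
function archivePath(string $forumPath, int $forumId, int $postId, int $timestamp): string
{
    // gmdate() keeps the folder names independent of the server timezone.
    // Assumption: month and day are zero-padded (03, 11).
    $dir = sprintf(
        '%s/archive/%d/%s',
        rtrim($forumPath, '/'),
        $forumId,
        gmdate('Y/m/d', $timestamp)
    );
    return $dir . '/' . $postId . '.htm';
}
```

Because $forumId is the first folder under archive, everything belonging to one forum sits in a single subtree.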

I shall start this on Tuesday next week, and hope to have it finished by Saturday next week (I'd do it sooner, but it's my birthday and this isn't that important!).

The files will be standalone and I shall develop them with vb v2.2.2 though as I shall only be accessing user, post, thread and forum (I guess... I'll have to look at the schema) this should be backwards compatible to at least 2.x boards. Though I will only be supporting the latest version at any time.

If I run into trouble or need assistance with the schema I shall let you know.
i knew there's a reason i should use email notifications :o

how about having the ability to rerun a script to regenerate the static html files if you later change a public forum to private ? you could have a forumid setting in the script which you can edit to either

1. remove all static html files and recreate the static files based on new forumid settings

OR

2. remove only the static html files for the forumid which went private

would be nice to be able to have the option to set the path to where the static html files are to be placed
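
Option 2 amounts to deleting one forum's folder from the archive. A minimal sketch of that step, assuming each forum's files live under a single directory as in the proposed structure; removeTree is a hypothetical name and is not part of the released script.

```php
<?php
// Hypothetical helper: delete a directory and everything beneath it.
// Returns true if the directory itself was removed.
function removeTree(string $dir): bool
{
    if (!is_dir($dir)) {
        return false;
    }
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        $path = $dir . '/' . $entry;
        if (is_dir($path) && !is_link($path)) {
            removeTree($path);   // recurse into subfolders
        } else {
            unlink($path);       // plain file or symlink
        }
    }
    return rmdir($dir);
}
```

Something like removeTree($archiveRoot . '/' . $forumId) would then clear a forum that has gone private ($archiveRoot is an assumed variable here). Option 1 would be the same call on the archive root followed by a fresh run of the cannon.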

nuno
13 Mar 2002, 12:51
any demos out there?
seems a nice hack :)

buro9
14 Mar 2002, 07:47
OK, one step further... I've sketched out the indexer and shall be trying to apply it early next week (massive party in Taunton this weekend means that I will be too drunk to be let near code of any sort.. generally means I look when I'm sober and spend hours unpicking each line I added!).

The basis of this goes like this:

The directory and file names will follow this naming convention

k_id

Where k = key, it can be one of these:

f = forum
p = post
t = thread
y = year
m = month
d = date, e.g. 31, as in day of month

id = primary key or data.

if k = f, then id = forumid, e.g. f_2 is forum 2
if k = y, then id = year, e.g. y_2002 is year 2002

You get the idea.

This way the indexer can be relatively dumb, but knows that anything before the underscore is the key to the data after the underscore.
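
A small sketch of how a "dumb" indexer might read such a name. The function name parseName and the returned array shape are hypothetical; only the key letters and the underscore rule come from the convention above.

```php
<?php
// Hypothetical helper: split a k_id name into its key and id.
// Returns null for names that do not follow the convention.
function parseName(string $name): ?array
{
    $labels = [
        'f' => 'forum', 'p' => 'post', 't' => 'thread',
        'y' => 'year',  'm' => 'month', 'd' => 'day',
    ];
    // Drop a file extension such as .htm before splitting
    $base = pathinfo($name, PATHINFO_FILENAME);
    $parts = explode('_', $base, 2);
    if (count($parts) !== 2 || $parts[1] === '' || !isset($labels[$parts[0]])) {
        return null;
    }
    return ['key' => $parts[0], 'label' => $labels[$parts[0]], 'id' => $parts[1]];
}
```

For example, parseName('f_2') yields key f, label forum, id 2, while a name such as index.htm has no underscore and is rejected.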



The next step is the recursive index proc.

One function will simply index a single directory, based upon the above rules. Templates will be provided for the display end of the index.

Another function will wrap the 'index this directory' function: it simply calls that function for each subfolder, and is itself called from the 'index this directory' function, resulting in recursive indexing.

I don't suppose I explained that well, but I've done it before in TCL and ASP, so I know how it works ;)
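
The two-function recursion described above might look roughly like this. It is a sketch only, not the indexer that was later released: the function names, the bare list markup (standing in for the templates) and the index.htm file name are all assumptions.

```php
<?php
// Hypothetical sketch: index one directory, then hand its subfolders
// to the wrapper. Returns the number of index files written.
function indexDirectory(string $dir): int
{
    $links = [];
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..' || $entry === 'index.htm') {
            continue;
        }
        // Folders link to their own index, files link to themselves
        $href = is_dir($dir . '/' . $entry) ? $entry . '/index.htm' : $entry;
        $links[] = '<li><a href="' . htmlspecialchars($href) . '">'
                 . htmlspecialchars($entry) . '</a></li>';
    }
    file_put_contents($dir . '/index.htm', "<ul>\n" . implode("\n", $links) . "\n</ul>\n");
    return 1 + indexSubfolders($dir);
}

// The wrapper: calls indexDirectory() for each subfolder, which in turn
// calls back into this function, giving the recursion.
function indexSubfolders(string $dir): int
{
    $count = 0;
    foreach (scandir($dir) as $entry) {
        if ($entry === '.' || $entry === '..') {
            continue;
        }
        if (is_dir($dir . '/' . $entry)) {
            $count += indexDirectory($dir . '/' . $entry);
        }
    }
    return $count;
}
```

A single call to indexDirectory() on the archive root walks the whole tree, so spiders can follow links from the top index down to every post.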



Once the indexer is in place, hopefully before the end of next week, I shall ask you all to try it out and see if there are any improvements that could be made at this point. Once we're happy we'll launch version 1.



I have in mind that there will be three versions.

Version 1: Splurge and Index
Version 2: More file management, deletion of pre-indexed items.
Version 3: Possibility for more than one archive, additional templates for views (these allow you to archive for support on other platforms).

A very last feature that I'll leave someone else to consider is the possibility that the Cache Cannon could be fired in single instances when a post is made on the board... thus never requiring firing manually. But without Oracle and Triggers, I'm not sure how clean adding this feature may be... could possibly be a very nasty hack, whereas at the moment it's quite elegant and standalone.

Cheers

David K

buro9
20 Mar 2002, 21:15
New version uploaded now includes indexer.

Please test it and let me know what you think.

I have also activated the auto-redirect stuff, and put all of the meta tags into the headers.

It is pretty ready in this state... but I would like to apply the forumhome default style to all these templates (just the stylesheet and replacement variables) for the sake of prettiness... so if anyone can tell me how the hell to access this stuff i'd be grateful... spent a few hours poking around and haven't conclusively got an answer yet (don't want to just hit replacementset for the -1 default set.).

Cheers

David K

PS: Attachment is on the first post of the thread. I've just updated it.

rawnet
20 Mar 2002, 21:26
David - you are a star.

My FTP has just died, but I'll be giving this a go as soon as I can.

As well as Posts, could you consider writing out Members (along with their related userfields), so people searching for information they've entered in their userfields will come to the boards. It would be useful for me as I have a lot of user fields, but I'm sure others would benefit too!!

fastforward
21 Mar 2002, 05:40
buro9,

Nice work!

I successfully installed this tonight and test archived a few forums. I'm going to edit the templates and during the quiet periods over the weekend I'll start archiving everything.

I'm on an older version of vB so I had to make a few modifications to get the forum select dropdown working and the control panel redirect.

I also changed the .htm extension to .html. Other than that, it worked flawlessly.

Thanks.

buro9
21 Mar 2002, 07:06
Now that the major part of the application is in place, there is no reason why you couldn't add additional functionality such as caching member information too.

It would only take minor changes to add this functionality.... which would probably be best under a folder of its own (../archive/members/) and then a file for each member's details named u_userid, so as an admin you'd likely be u_1.htm.

A separate button on the cache cannon page would fire caching of members... or it could be tied to the end of the posts firing mechanism.

Indexing would be done at the same time as the rest of the indexing.

Yeah, that all makes sense ;)

I'm not going to start such additions though, not until what is currently there is finished to a higher spec. I'm not one of those developers who never quite finish something before moving on. My work is used widely because I do finish it properly, and I hope that is true of this too.

Remaining on Version 1 is merely the ability to create a single stylesheet *.css file and place a reference to that in each of the created files. This stylesheet I want based upon the default style for the forum... so please, anyone who knows the best way to get hold of this from within a hack... let me know ;)

Once it's created, what remains is a change to the amount of data in the posts file (adding more, such as user details for who posted), and applying the stylesheet that would've been created.

After that I'll launch version 1.

Again, plan is:

Version 1 = Splurge posts, index them.
Version 2 = File management, flush archive, etc.
Version 3 = Additional templates for prettiness, additional functionality.

That's just the order in which I feel functionality will be desired.
Each version will be a finished hack implementable safely.

Cheers

David K

jamesdasher
22 Mar 2002, 01:35
Two Changes, that have to be made :)

1. It is indexing Private Forums...this has got to be changed, my moderator and administrator forums can be read...I understand it is for indexing but it shouldn't be accessible.

2. Strip Smilies...you really need to strip the smilies out of the pages...they are coming up as missing images.

James

buro9
22 Mar 2002, 07:08
Private forums, I understood that it wasn't, but I shall create a private forum to test this and then stop it from doing so.

Regarding the smilies, I was converting them to the images, but it's probably best to strip them totally.

I'll put both of these on my todo list.

Cheers and thanks for the feedback.

David K

rawnet
22 Mar 2002, 09:03
Thanks David. We all appreciate the effort you are putting into perfecting this.

99SIVTEC
24 Mar 2002, 17:53
Please wait... doing page 1

Warning: MkDir failed (Permission denied) in /home/sportcom/public_html/vbulletin/admin/cachecannonfunctions.php on line 44

Warning: MkDir failed (No such file or directory) in /home/sportcom/public_html/vbulletin/admin/cachecannonfunctions.php on line 51

Warning: MkDir failed (No such file or directory) in /home/sportcom/public_html/vbulletin/admin/cachecannonfunctions.php on line 51
Dumping to ../archive/f_19/t_298/p_3335.htm

Warning: fopen("../archive/f_19/t_298/p_3335.htm", "w") - No such file or directory in /home/sportcom/public_html/vbulletin/admin/cachecannonfunctions.php on line 94
error opening file

I got that error after installing and attempting to run. I'm on a linux server and it appears to be permissions problems. What would I do to remedy this?

buro9
24 Mar 2002, 18:13
line 44 is part of the following:

function dumpFile($dirArray,$fileName,$content) {
    global $path;

    // Path to where ALL dumped files will start
    $thisPath = $path;

    // Create $path if it does not exist
    if (!file_exists($thisPath)) {
        mkdir($thisPath,0755);
    }

    // Create directory structure as per $dirArray
    foreach ($dirArray as $dir) {
        $thisPath = $thisPath . "/" . $dir;
        if (!file_exists($thisPath)) {
            mkdir($thisPath,0755);
        }
    }

    // Append .htm if $fileName has no extension
    if (!strstr($fileName, ".")) {
        $fileName = $fileName . ".htm";
    }

    // Put together the full path and file name
    $fileName = $thisPath . "/" . $fileName;

    // If $content is empty, there's been an error, output
    // <!-- NULL --> to show that the cannon has at least
    // fired successfully and to prevent the creation of
    // zero byte files.
    if (strlen($content) < 2) {
        $content = "<!-- NULL -->";
    }

    // Print status
    echo "Dumping to ".$fileName."<br>";

    // Write file
    writeFile($fileName,$content);
}

specifically it's the first mkdir line.
if a directory doesn't exist, it will attempt to create it.
it probably failed on the first iteration of this... that's the 'archive' directory if you kept the defaults.

what this means is that you need to make the 'forum' directory local writeable so that the script can create the directory.

once it's got one created, the others should be fine as it assigns the relevant permissions as it goes.

you can change the permissions on the forum directory by telnetting in and using the chmod command, something like:

chmod 0755 forum

this should solve it
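
To see which step is actually failing before reaching for chmod, a check along these lines can help. This is a sketch only and not part of cachecannonfunctions.php; the name ensureDir and its status strings are hypothetical.

```php
<?php
// Hypothetical diagnostic: try to create one directory and report why
// it could not be created, instead of a bare "MkDir failed" warning.
function ensureDir(string $path, int $mode = 0755): string
{
    if (is_dir($path)) {
        return is_writable($path) ? 'ok' : 'exists but not writable';
    }
    $parent = dirname($path);
    if (!is_dir($parent)) {
        return 'parent missing: ' . $parent;
    }
    if (!is_writable($parent)) {
        // This is the "Permission denied" case: the user PHP runs as
        // cannot create entries in the parent directory.
        return 'parent not writable: ' . $parent;
    }
    return @mkdir($path, $mode) ? 'created' : 'mkdir failed';
}
```

If the first directory reports "parent not writable", the later "No such file or directory" warnings are just a consequence: the subfolders cannot be created inside a folder that was never made. Note that is_writable() reflects the user PHP runs as, which on shared hosts is often not the account that owns the files.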

cheers

david k

boatdesign
01 Apr 2002, 00:47
Very nice! I am excited to see how this progresses. Exactly what I was looking for in my attempt to make my forums more search engine friendly yet at the same time not risk overloading the SQL server due to mod rewrite full speed spiders...

fastforward
05 Apr 2002, 23:53
How is this progressing Mr Buro9? I'm waiting patiently. :)

99SIVTEC
08 Apr 2002, 02:16
Awesome job on this hack. I have it installed over at www.sportcompactracing.com/vbulletin/archive/index.htm and at www.haiparts.com/vbulletin/archive/index.htm

The only thing I can suggest is for you to allow us to specify the number of posts to INDEX at once. Mine does fine pulling the threads from the forum, but when it comes to indexing it always times out before it finishes.
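
One way to read this suggestion is to index a fixed number of folders per request and carry an offset from one page to the next. A minimal sketch of that idea follows; indexBatch, its parameters and the callback are all hypothetical, and nothing here is taken from the hack itself.

```php
<?php
// Hypothetical sketch: index at most $batchSize directories per call.
// Returns the offset for the next call, or null when everything is done.
function indexBatch(array $dirs, int $offset, int $batchSize, callable $indexOne): ?int
{
    foreach (array_slice($dirs, $offset, $batchSize) as $dir) {
        $indexOne($dir);   // index a single directory
    }
    $next = $offset + $batchSize;
    return $next < count($dirs) ? $next : null;
}
```

The control panel page would pass the returned offset to the next page load until it gets null, so no single request has to index the whole archive within the script time limit.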

JackG
18 Apr 2002, 00:00
Hi, I also get the permission error, but on my Windows IIS the SYSTEM is given Full Control. Am I missing something?

Thanks in advance.

buro9
18 Apr 2002, 21:04
Bad news for you all I'm afraid, I've been fighting with safe_mode for the past few weeks and I simply cannot code around it.

Believe me I've tried.

safe_mode is on at my host, and the person I was really making this for also has safe_mode on (though I didn't really appreciate the implications of that!).

Without a way to write files, overwrite them, delete from within PHP and without root permissions this is of no use to me.

As such I'm going to stop development.

It's heavily commented and pretty cleanly laid out (not bad for my first php stab) so if anyone else wants to pick it up, please feel free.

It's good enough to work on all non-safe_mode boxes, and with small tweaks you could add the other features that I had intended.

And if you ever need to splurge files onto a docroot, it's quite neatly in here ;)

Hope this isn't too much of a blow, but you must appreciate that I've been trying to make this hack for myself, someone here and a few other vb users... all of whom this is useless to.

Though I am considering writing the same thing in Perl and ASP so that we can all have it anyway but not get restricted by a silly option in php.ini.

Cheers and thanks for the support... not a bad experience for a first hack, so I'll be tempted to try another one equally ambitious soon ;)

Cheers

David K

http://www.buro9.com/forum/

jamesdasher
18 Apr 2002, 21:51
Hey, buro9 I am not all that good at this kind of thing, but I did like where you were going with it and would like to work with it more. Could you email me what you have?

jamesdasher@wwdb.org

Thanks

James

Learner29
02 May 2002, 05:44
dommage......

it was great to read you guys all over from the very beginning of overgrow's thread to here...

filburt1
08 Jul 2002, 02:06
It's not working for me:
Warning: Unable to jump to row 0 on MySQL result index 6 in d:\web\flatfile\admin\cachecannon.php on line 53
So every single file generated has <!-- NULL --> in it and nothing else.