vbSpiderFriend - Search Engine Friendliness

Overgrow
30 Apr 2001, 22:27
I am tired of my 200,000 posts not being listed in Google. I was inspired by phpbuilder.com this morning and I wrote:

vbSpiderFriend - the search engine indexer for all of your posts

Purpose: Allow search engine spiders to crawl a linked list of all of your posts.

Project Requirements:

-Friendly URLs (no query strings)
-Good dynamic meta tags
-Never have to touch the script again.. It is Y3K compliant, simply re-submit to the engines to update your listings

Install Requirements:

-vBulletin 1.x or 2.x
-about 10 minutes


1) Download the attached Zip.

2) Open class.mysql.php and put your database login info at the top.

3) Create a new directory called archive under your forum, like /forum/archive

4) Open the included .htaccess and change the Error 404 to your new archive path.

5) Open index.php and change the self-explanatory variables at the top of the file.

6) Upload all 3 files to your archive directory.

7) Submit (http://128.121.225.20/submit/index.html) /forum/archive/index.php to search engines and watch em crawl


DISCLAIMER: I don't use 2.x but I checked the schema and this should work fine.

NOTES: This uses ErrorDocument and query string parsing to get the variables needed. I do not have the time or energy to troubleshoot this if it does not work on your server. Sorry!

Overgrow
30 Apr 2001, 22:29
...

VERSION 1.0

NEW VERSION, May 21 9:38 am PST

Upgrading: Unzip and copy your variables from the top of the old index.php to the new index.php.


FIXES:

v1.1b release

-That forwarding scheme added in 1.0b is considered 'cloaking' by Google so it has been turned off by default. If you wish to enable it, make $refresh=1 in the top options.

-Added a new link at the top saying that this is the text-only version, click for the real thing. Idea by robertusss.

v1.0b release:

-Added a forwarding scheme. If the archive is spidered they will see the search-friendly version of the thread. If a user clicks onto that page from a search engine, they will be automatically forwarded to the real showthread.php. This is done with the REFERER tag.

-Added the top link to the footer as well for more keyword density

-Fixed all minor bugs

v0.1a:

-Made the private forums actually hide themselves

-Made the query string parser more flexible to work on any install location

-Added $privateForums variable so they will not be shown

-Added header("Status: 200 OK"); for the really picky engines

Overgrow
30 Apr 2001, 22:32
View an online example:

http://www.overgrow.com/edge/archive

eva2000
30 Apr 2001, 22:58
woah nicely laid out output too

Overgrow
30 Apr 2001, 23:06
I know you must be piling on the sarcasm.. cracking on my UI like that! :) Look how flexible it is-- you can change the spacing to however many nbsp's you want! hehe

This is never meant to be viewed by a user except when they click through the search engine to the plain-text thread. Then the point is to get them to click on one of the top links to bring them into your real forum system. Before I launch this on my site I am definitely going to "pretty it up" but I figured I'd release the raw code and let everyone else have at it as well.

p.s. my site is damn slow today, I'm sure when installed on a real system it won't be so slow as that example.

eva2000
30 Apr 2001, 23:30
strange i just installed this on my vb 2 rc2 forum and it loads with all forums but no threads are shown when i click on any of the forum links/dates ?

i'd post the url but my private forums are revealed as well

is this meant to be set to mysql

var $CONN = "";

Overgrow
30 Apr 2001, 23:40
Does it tell you "No posts, please go back" ?

Does it give you a totally blank screen?

or does it at least show what forum you are in?

This change is just for eva to troubleshoot... FIND


echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread
WHERE lastpost > '$ts1' AND lastpost < '$ts2'
AND forumid='$forumID' ORDER BY dateline ASC";


change to


echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread
WHERE lastpost > '$ts1' AND lastpost < '$ts2'
AND forumid='$forumID' ORDER BY dateline ASC";

echo $query;


Then when it spits the query out, run it in phpmyadmin and see if it is a valid query.

eva2000
30 Apr 2001, 23:43
it outputs the 'No posts, please go back' message

eva2000
30 Apr 2001, 23:48
i ran this in phpmyadmin and it got mysql query error


SELECT title,threadid,lastpost FROM thread WHERE lastpost > '1136102400' AND lastpost < '1136620800' AND forumid='2000' ORDER BY dateline ASC

must be the forumid ? i only have 94 forums

Overgrow
01 May 2001, 02:36
It's just parsing the query string incorrectly, using the year instead of the forum. If you're having trouble like eva, change this:


$forumID=$urlArray[3];
$year=$urlArray[4];
$month=$urlArray[5];
$week=$urlArray[6];
$threadID=$urlArray[7];


to this


$a=0;

while($urlArray[$a] != "archive") {
$a++;
}

$forumID=$urlArray[($a+1)];
$year=$urlArray[($a+2)];
$month=$urlArray[($a+3)];
$week=$urlArray[($a+4)];
$threadID=$urlArray[($a+5)];
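
For readers on a current PHP version, the same idea (find the "archive" segment, then read the five values after it) can be sketched as a small function. This is only an illustration: parseArchivePath() is a hypothetical name, not something in the zip, and it assumes the URL layout described in this thread (/archive/forum/year/month/week/thread).

```php
<?php
// Hypothetical helper (not part of vbSpiderFriend): split an archive URL path
// into its parts, wherever the "archive" directory sits in the path.
function parseArchivePath(string $requestUri): array
{
    // Drop any query string, then split the path on slashes.
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $segments = explode('/', trim($path, '/'));

    $keys = ['forumID', 'year', 'month', 'week', 'threadID'];
    $result = array_fill_keys($keys, null);

    // If there is no "archive" segment, return all nulls instead of
    // walking past the end of the array.
    $pos = array_search('archive', $segments, true);
    if ($pos === false) {
        return $result;
    }

    foreach ($keys as $offset => $key) {
        $value = $segments[$pos + 1 + $offset] ?? '';
        // Accept digits only, so stray text never reaches a SQL query.
        if (preg_match('/^[0-9]+$/', $value) === 1) {
            $result[$key] = (int) $value;
        }
    }
    return $result;
}
```

Note that values come back as integers, so a month of "05" becomes 5; pad it again before building links if you need the two-digit form.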

eva2000
01 May 2001, 02:55
great works perfectly now

one last thing.. how do i prevent private forums from being listed/outputted and indexed ? :D

thanks much appreciated :)

Chris Schreiber
01 May 2001, 03:25
Excellent and easy to install hack, thanks :)

I even liked the URL submission tool you linked to!

Overgrow
01 May 2001, 03:25
There ya go.. that is an important addition, eh? :)

There's a new zip file up top yonder.. the index.php has a new variable that holds your private forumid's.

Brian
01 May 2001, 04:44
One suggestion would be to show a suffix (.php, .shtml, etc.) so the search engines know the page is dynamic and spider it more slowly than they would plain HTML. If they assume it's HTML they might take on a ton at once; doing that to normal HTML files should be fine, but this is actually doing all the DB calls, etc.

Just a suggestion but this is very nice!!

-Brian

Streicher
01 May 2001, 08:06
I have tested your hack and my private forums are not hidden.

Also, some threads are not found by index.php.

When I click on some forum links, the forum page is simply reloaded.

Overgrow
01 May 2001, 10:49
Thanks guys,

Streicher, you need to specify which forums are private at the top of index.php. Look for

$privateForums="|17|18|";

and put whatever forums in there that should be hidden. Enclose them with |pipes|. Anyone else having these same problems with reloading, etc?


Brian, my main target here is google. It does fine on other parts of my site where I do these type of query strings. If you do find that some engines are hesitant, let me know or make the change and tell us what you did.

Stephan Whelan
01 May 2001, 11:03
Overgrow,

I'm interested in the hack but can't seem to get it to work.

I've followed your instructions but am getting an Error 500 Internal Server Error when I try and run the script.

I'm running on a Cobalt RAQ4i so I don't know if there is a problem with the .htaccess file and the way the RAQ handles it.

Any ideas?

NickyNet
01 May 2001, 11:07
hi..

great hack.. :)

but not working for me..

http://www.nicky.net/foren/archiv/ :(

Overgrow
01 May 2001, 11:15
Nicky:

Here is your error

"The requested URL /foren/archiv/11 was not found on this server.
Additionally, a 404 Not Found
error was encountered while trying to use an ErrorDocument to handle the request."

That bottom line is the key.. it says your .htaccess is not pointing to the proper file. Edit the .htaccess and play around with it until you get it right. Sometimes the link has to be ../relative sometimes you can make it absolute /foren/archiv/index.php


Stephan can you give me a link? I use a Raq4i as well and it works for me. I did have to change the access.conf and put AllowOverride All in there so htaccess could control the ErrorDocument.

Stephan Whelan
01 May 2001, 11:29
Originally posted by Overgrow
Stephan can you give me a link? I use a Raq4i as well and it works for me. I did have to change the access.conf and put AllowOverride All in there so htaccess could control the ErrorDocument.

http://forums.deeperblue.net/archive/index.php

Streicher
01 May 2001, 11:33
Originally posted by Overgrow
Thanks guys,

Streicher, you need to specify which forums are private at the top of index.php. Look for

$privateForums="|17|18|";

and put whatever forums in there that should be hidden. Enclose them with |pipes|. Anyone else having these same problems with reloading, etc?

I know that and i have done it already. But it does not work.

eva2000
01 May 2001, 11:44
Originally posted by Streicher
I know that and i have done it already. But it does not work.

same here, some private forums disappeared but some didn't :(

KeithMcL
01 May 2001, 11:57
Just finished installing this hack but am having some problems.

I'm getting the error:

"Fatal error: Call to unsupported or undefined function htmlheader() in /home/keith/webdevforums-www/archive/index.php on line 76"

where line 70-78 reads:
header("Status: 200 OK");

//echo "<br>&nbsp;<br>$forumID - $year - $month - $week - $threadID";

if (empty($forumID)) {
htmlHeader();
forumList();
exit;

You can see the page at http://www.webdevforums.com/archive/

Overgrow
01 May 2001, 12:15
Sorry... please forgive my sloppy coding. I'm running a fever and worshipping the porcelain god. You can either replace the forumList function with that below, or download the new zip:


function forumList() {

global $db,$baseURL,$privateForums;

echo "Archives<br>";

$privateForums=preg_replace("/^\|/","",$privateForums);
$privateForums=preg_replace("/\|$/","",$privateForums);

$pfs = explode("|",$privateForums);
$whereclause="";
$wherecounter=0;


while($pf=array_shift($pfs)) {
if(ereg("[0-9]",$pf)) {

if($wherecounter==0) {
$whereclause= " WHERE forumid!='$pf'";
$wherecounter++;
} else {
$whereclause .= " AND forumid!='$pf'";
}
}
}



$query = "SELECT title,forumid FROM forum$whereclause ORDER BY forumid ASC";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$forum ) = each ($data)) {

echo spacer(1)."<a href=\"$baseURL/$forum[forumid]\">$forum[title]</a><br>";
}
}

}

Overgrow
01 May 2001, 12:21
No clue, coolKeith... Why is your server not capitalizing the function name? See in the code it is htmlHeader() and your server says htmlheader()... Look down in the very bottom of the index.php and you will see the function htmlHeader()... it is there and working. All I can say is try and download it again. If that still doesn't work, change the function name to lower case and see if that helps.

Streicher
01 May 2001, 12:52
Thanks Overgrow. It works fine.

And I have found out that only forums with id <10 (1-9) cause a reload of index.php when clicking on the link.

Peteruk
03 May 2001, 18:51
Originally posted by Overgrow

Anyone else having these same problems with reloading, etc?
.

I have sorted it now it was a problem within .htaccess

Peteruk
03 May 2001, 19:04
Just one problem (I hope :) ) it all loads up ok no errors or anything but when you click on a forum it loads up but where it should look like below

May
week1
week2
week3
week4

It just displays a page like below

Archives
Competition
2001




vbSpiderFriend by ~shabang~ of Overgrow

any idea what the problem is no doubt something I've done :)

Peteruk
03 May 2001, 19:08
Here I go again. I have sorted the above: the problem was that the date is in MM/DD/YYYY; I thought I saw DD/MM/YYYY :rolleyes: :p

Overgrow
03 May 2001, 19:10
Sorry.. typical N.American-centric time code eh! But that's the way that the php function strtotime is written...

I'm glad people have this working. Now let's hear the success stories in a few months when spiders actually list our posts.
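
A side note on the date handling: one way to sidestep the MM/DD versus DD/MM question is to build the timestamps from separate integers with mktime() rather than from a date string. The sketch below is only an illustration under that assumption; weekRange() is a hypothetical name, and the week buckets (1-7, 8-14, 15-21, 22 to month end) are taken from the script's own week list. It returns an exclusive end, so a query would use lastpost >= start AND lastpost < end. As far as I can tell, strtotime() reads a string like 05/07/2001 as midnight at the start of that day, so ranges built from those strings may skip posts made on the last day of each week.

```php
<?php
// Hypothetical helper, not from the posted script: timestamps bounding one
// archive "week" (1 = days 1-7, 2 = 8-14, 3 = 15-21, 4 = 22 to month end).
function weekRange(int $year, int $month, int $week): array
{
    $week = max(1, min(4, $week));      // clamp out-of-range week numbers
    $firstDay = ($week - 1) * 7 + 1;     // 1, 8, 15 or 22

    // mktime() takes month, day and year as separate integers, so there is
    // no MM/DD vs DD/MM ambiguity as there is with a date string.
    $start = mktime(0, 0, 0, $month, $firstDay, $year);

    // Exclusive end: midnight after the last day of the bucket. For week 4
    // that is the first day of the next month (mktime normalises month 13).
    $end = ($week === 4)
        ? mktime(0, 0, 0, $month + 1, 1, $year)
        : mktime(0, 0, 0, $month, $firstDay + 7, $year);

    return [$start, $end];
}
```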

eiko
03 May 2001, 19:27
Having read this thread, I'm getting the same error:

"Fatal error: Call to unsupported or undefined function htmlheader() in /home/user/domain-www/forums3/archive/index.php on line 77"

I downloaded the latest zip as well. Any thoughts?

Also, the question was asked but I didn't see the answer?
In class.mysql.php, at the top, what is "conn" supposed to be set to?

Thanks

Peteruk
03 May 2001, 19:36
lol overgrow :)

Come to daddy you cybersurfin beauties

Top class site by the way, love it, the design is top drawer, and many thanks for this hack, much appreciated.

Overgrow
03 May 2001, 19:50
Thanks peteruk :p

To the htmlheader() folks, are you by chance using PHP3? The reason I ask is because I define my functions last, but PHP3 needs them defined first. If so, try this version... I bet it will give an array_pop error or something since that's not a PHP3 function but that will tell us what's wrong...

ps. conn should be left blank

eiko
03 May 2001, 23:27
yes to php3

the error:
"Parse error: parse error in /home/user/domain-www/forums3/archive/index.php on line 284"

Thanks for the effort!

JenniferS
05 May 2001, 01:26
the above error can be fixed by replacing "=" with "echo"

unfortunately, i don't know what to do about:

Fatal error: Call to unsupported or undefined function array_shift() in /home/kdpublis/kdpublish-www/ssoda/archive/index.php on line 247


such a cool hack! think i'll change my host so i can have php4 already...

Dontom
07 May 2001, 16:04
Cool hack,
hassle-free installation
:D
Tom

veedee
09 May 2001, 09:30
Heh !

great hack !

Looks like Google have caught on and are ready for a bit of mass spidering :D:D:D

see pic !

supernut
09 May 2001, 12:20
would be nice if there is a way to add adverts to the top and bottom of the created pages

Overgrow
09 May 2001, 14:41
Simple!

After this:

$pagetitle="Grow Marijuana @ Overgrow";


Insert this:

$header=""; // OPTIONAL header to insert at the top of all pages
$footer=""; // OPTIONAL footer to insert at the bottom of all pages



After this:

} else {
showThread();
}


Insert this:

echo $footer;



and one more

After this:

<BODY bgcolor="#ffffff">


Insert this:

<?=$header?>



These changes have been reflected in this file if you want to download it and change the other variables again ----> next post

Overgrow
09 May 2001, 14:45
p.s. Funny google gif! I thought you were going to show us how your board was somehow listed already. :rolleyes:

veedee
09 May 2001, 14:52
Originally posted by Overgrow
p.s. Funny google gif! I thought you were going to show us how your board was somehow listed already. :rolleyes:

No mate, i have too much free time and i like to have a laugh :)

I will install this hack when i

a) learn to do it
b) stop pissing about

good luck in your future hacks :)

limey
09 May 2001, 20:58
this is an incredible hack...thank you shabang!

Overgrow
11 May 2001, 16:10
I know the formatting looks really plain but it is really simple to customize. At the bottom of the script, look for

function HTMLHeader() {

and in there you will see normal HTML. Edit it with colors, a style sheet for fonts, etc, however you want. If you look through the script you will find loops that print out normal HTML if you want to change the background color of rows, etc...

It's intended to create simple doorway pages that will rank very high on search engines-- hence the way it dynamically pulls post title into the metas (one reason phpbuilder's pages rank so high). The pages have very little text before the meat of the content for high keyword prominence.

Be sure to add your main site keywords into the options at the top for better ranking overall. Somehow I expect some of my forum posts to rank higher than my normal site because of the search-engine-friendliness of the pages and the number of times keywords are repeated in long threads.
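
The meta handling described above (site keywords plus the words of the thread title) can be sketched roughly as follows. This is an illustration only: buildMeta() is a hypothetical name, and the escaping step is an addition that the posted script does not do.

```php
<?php
// Hypothetical helper: build meta keyword/description strings from the
// site-wide defaults and a thread title.
function buildMeta(string $baseKeywords, string $baseDescription, string $threadTitle): array
{
    // Split the title on whitespace and drop empty pieces.
    $words = preg_split('/\s+/', trim($threadTitle), -1, PREG_SPLIT_NO_EMPTY);

    $keywords = $baseKeywords;
    if ($words) {
        $keywords .= ',' . implode(',', $words);
    }
    $description = $baseDescription . ': ' . $threadTitle;

    // Escape for use inside a double-quoted HTML attribute, so a quote in a
    // thread title cannot break the META tag.
    return [
        'keywords'    => htmlspecialchars($keywords, ENT_QUOTES),
        'description' => htmlspecialchars($description, ENT_QUOTES),
    ];
}
```

The two returned strings would go into the CONTENT attributes of the Keywords and Description META tags in the HTML header function.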

gmtalk
12 May 2001, 03:22
Overgrow. You asked if anyone was having the reload error.

I like the concept of this hack and have installed it on my server. I get the list of the forums. The private forums are hidden and all. When I click one of the listings I get the listing page over again.

Any suggestions?

http://discussions.gmforums.com/archive

TIA for any help

John

gmtalk
12 May 2001, 03:43
Got it. Nevermind my last post. I figured it out and it is working very well.

eiko
13 May 2001, 03:41
MAN! I'd LOVE to make this hack work :(

Fatal error: Call to unsupported or undefined function htmlheader() in /home/name/dir/forums3/archive/index.php on line 80

anybody had any luck with this?

Overgrow
13 May 2001, 16:08
Two options:

1) I continue back-porting this hack to PHP3.

2) You get your host to upgrade to PHP4.

In the ideal world, #2 is the correct choice :D It will help with any other hacks you try and install plus everything, including your VB, will run faster.

eiko
13 May 2001, 16:32
Problem is, it's a dedicated server... it's all up to me, which is unfortunate since I'm a big-time newbie when it comes to installing new software, packages and the like.

I don't hold much hope for the NOS to do it for me. Patience I guess.

limey
13 May 2001, 17:08
Use this and skip the frontpage extensions instructions.
http://www.linuxnewbie.org/nhf/intel/webserving/a_m_f1.html

eiko
13 May 2001, 17:22
Thanks for the link! Good stuff.
I'm wondering about RedHat and "packages" ... will have to look into it, but it seems that it's a matter of uploading a php "package" and executing it. Just beginning to read up.

Thanks for the input.

Gilby
14 May 2001, 16:55
Originally posted by Overgrow
This is never meant to be viewed by a user except when they click through the search engine to the plain-text thread.

How about doing a redirect for users? So when they click on the link from a search engine like google, the script will redirect them to the vbulletin page that has that thread instead of showing them the page that isn't formatted as much.

Overgrow
14 May 2001, 17:02
Oooh yea ya know that thought had crossed my mind.. then I quickly smoked it out of there.

I wondered how spiders treat redirections? Will it piss them off if it's always redirected and get the site delisted? Do they follow javascript hrefs? Should I use meta or jscript or both?

Help me out on how the spiders will handle this and I will get it implemented quickly.. I think it's a great idea.

Good point also is that it won't matter when you fix that-- the search engines will have your info and if you put a redirect in, all further links in will use it.

eiko
14 May 2001, 17:05
Most spiders ignore pages with redirects.

Gilby
14 May 2001, 17:10
Originally posted by Overgrow
I wondered how spiders treat redirections? Will it piss them off if it's always redirected and get the site delisted? Do they follow javascript hrefs? Should I use meta or jscript or both?

I was thinking of a redirect from the script side, it will check out the referrer, and if it is from a link external to the site (such as from a search engine), then it will execute the php code to redirect it, so it'd be:
header("Location:/forums/showthread.php?threadid=1234");

Overgrow
14 May 2001, 18:10
Ok... it's easy to detect if the Referer is from your host. Getting the header(location:) code to work is being a bugger! I've used this many times before.. I know the error that happens if you sent previous headers, but using the header(location:) here just fails! So I used both metas and jscript refresh.

Try this index.php and let me know if it works for you before I make it an overall change in the main zip. You won't notice any difference if you follow a link down the tree like a spider. IF you copy and paste out that final thread URL, go to a new site, then paste that URL back in then you should be forwarded to the real thread...

ie, if you come through a search engine, and not from your own site, the user should be pushed to the real deal

Gilby
14 May 2001, 18:27
Originally posted by Overgrow
ie, if you come through a search engine, and not from your own site, the user should be pushed to the real deal

Didn't work. I got:
Warning: Cannot add header information - headers already sent by (output started at
/home/sites/site3/web/forums/archive/index.php:88) in /home/sites/site3/web/forums/archive/index.php on line 316

Also, when I got that error, it redirected me to the real thread, which it's not supposed to. Make sure that when the referrer is blank, that it doesn't redirect. It should only redirect when the referrer is defined and is from an external link.

Overgrow
14 May 2001, 18:47
This is the if statement


if ((!strstr(getenv(HTTP_REFERER),$homeURL)) or (strlen(getenv(HTTP_REFERER)) < 1)) {


It is checking that the referer is greater than 0 length. This does work for me :( Can you troubleshoot it a bit on your end?

Gilby
14 May 2001, 18:57
Originally posted by Overgrow
This is the if statement


if ((!strstr(getenv(HTTP_REFERER),$homeURL)) or (strlen(getenv(HTTP_REFERER)) < 1)) {


It is checking that the referer is greater than 0 length. This does work for me :( Can you troubleshoot it a bit on your end?

Looking at that line, I see the problem. strstr() is case sensitive, and I defined my $homeURL to be a different case than what I accessed it from, so you should change that to use stristr() instead.
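
Putting the two points from this exchange together (do not forward when the referer is blank, and compare case-insensitively), the check could be sketched like this. shouldForward() is a hypothetical name, and the sketch assumes $homeURL is a non-empty string such as the site's base URL.

```php
<?php
// Hypothetical helper: decide whether a visitor should be forwarded from the
// archive page to the real thread.
function shouldForward(string $referer, string $homeURL): bool
{
    // No referer (typed URL, bookmark, most spiders): show the archive page.
    if ($referer === '') {
        return false;
    }
    // Case-insensitive search, so "WWW.Example.com" and "www.example.com"
    // count as the same site. Forward only when the referer is external.
    return stripos($referer, $homeURL) === false;
}
```

It would be called with something like $_SERVER['HTTP_REFERER'] ?? '' as the first argument. Keep in mind that the referer is sent by the browser and can be missing or forged, that a header("Location: ...") redirect has to be issued before any output, and that per the changelog at the top of this thread the forwarding was later switched off by default because Google treats it as cloaking.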

Overgrow
14 May 2001, 19:16
Thanks! I changed to stristr, good point. As soon as you let me know that this works I will make the changes in the main download-zip. That is a new index.zip up there a few posts back (with stristr).

steven
15 May 2001, 02:04
This is one awesome hack, the only problem that I am having is that the spider is accessing forums that I have specified it not to. Is there a quick fix for this?

Below is my index.php file


<?

/*

vbSpiderFriend v0.1a by ~shabang~

** Free License: YOU MUST LEAVE THE FOOTER INTACT **

*/

$privateForums="|4|5||6||7||8||9||33|"; // Hidden forumids, enclosed by | pipes
$firstPost="07/04/2000"; // MM/DD/YYYY of your forum's first post
$spacer="&nbsp;&nbsp;"; // The characters or spaces to use as one indent
$forumURL="/v2"; // Base URL of your forum

$homeURL="http://www.baddealings.com"; // The link URL for the top of the page
$homeLink="Baddealings.com - Online Source For Consumer Complaints"; // The link text for the top of the page

$keywords="complaints, complain, consumer complaints, consumer, fraud, scams, consumer fraud, grievances, auctions, electronics, feedback, shopping, buyers/sellers, forums, community, ripped off, chat"; // SET META INFORMATION HERE
$description="Baddealings.com provides an arena for members to complain about a product or service. Search our online database for the lastest information on consumer complaints and scams."; // The script will add to these fields
$pagetitle="Baddealings.com - Online Source For Consumer Complaints"; // with the info from the thread

include("class.mysql.php");



// NOTHING MORE TO EDIT BELOW /////////////////////////////////////////////////////////////




$baseURL=$forumURL."/archive";
$dateSplit = split("/",$firstPost);
$firstMonth = preg_replace("/^0/","",$dateSplit[0]);
$firstYear = $dateSplit[2];
$currentYear=date("Y",time());
$currentMonth=preg_replace("/^0/","",date("m",time()));


// setup the DB connection

$db = new MySQL;

if (!$db->init()) {
echo "no DB connection<br>";
exit;
}



// parse query string

if (ereg("/archive/[0-9]",getenv('REQUEST_URI'))) {

$urlArray=explode("/",getenv('REQUEST_URI')); //split the URL path
$a=0;

while($urlArray[$a] != "archive") {
$a++;
}

$forumID=$urlArray[($a+1)];
$year=$urlArray[($a+2)];
$month=$urlArray[($a+3)];
$week=$urlArray[($a+4)];
$threadID=$urlArray[($a+5)];

if (eregi("[a-z]",$forumID) or $forumID < 1 or strstr($privateForums,"|".$forumID."|"))
$forumID=1;
}

header("Status: 200 OK");

//echo "<br>&nbsp;<br>$forumID - $year - $month - $week - $threadID";

if (empty($forumID)) {
htmlHeader();
forumList();
exit;
} else if (empty($year) or empty($month)) {
htmlHeader();
weekList();
} else if (empty($threadID)) {
htmlHeader();
threadList();
} else {
showThread();
}

echo "<center><font size=1><br>&nbsp;<br>vbSpiderFriend by ~shabang~ of <a href=\"http://www.overgrow.com/\">Overgrow</a></font></center></body></html>";


function showThread() {

global $db,$baseURL,$forumID,$threadID,$homeURL,$homeLink,$forumURL,$keywords,$description,$pagetitle;

$query = "SELECT thread.title as ttitle,forum.title as ftitle FROM thread LEFT JOIN forum ON thread.forumid=forum.forumid WHERE thread.threadid='$threadID'";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$forum ) = each ($data)) {

$forumline = "Forum: <a href=\"$forumURL/forumdisplay.php?forumid=$forumID\">$forum[ftitle]</a><br>";
$threadline = "Thread: <a href=\"$forumURL/showthread.php?threadid=$threadID\"><b>$forum[ttitle]</b></a><br>";
$threadtitle = $forum[ttitle];
}
} else {
echo $query;
}

$ks = str_replace(" ",",",$threadtitle);

$keywords.=",$ks";
$description.=": $threadtitle";
$pagetitle.=" : $threadtitle";
htmlHeader();

echo "<center><a href=\"$homeURL\"><b>$homeLink</b></a><br>&nbsp;</center><br>";

echo spacer(1).$forumline;
echo spacer(2).$threadline;

$data=$db->select("SELECT post.dateline as dateline,post.postid as postid,post.pagetext as pagetext,post.username as fakename,
post.title as title,post.userid as userid, user.userid as userid,user.username as username,
user.usertitle as usertitle,user.posts as posts
FROM post
LEFT JOIN user ON (user.userid = post.userid)
WHERE post.threadid=$threadID AND visible=1
ORDER BY dateline");

if(!empty($data)) {

echo "<table border=0 cellpadding=10>";

while ( list ( $key,$posts ) = each ($data)) {

$username=$posts[username];

if(empty($username))
$username=$posts[fakename];

echo "<tr><td valign=top>$username</td><td>".nl2br($posts[pagetext])."</td></tr>";
}

echo "</table>";
}

}



function threadList(){

global $db,$baseURL,$forumID,$year,$month,$week;

echo "<a href=\"$baseURL/index.php\">Archives</a><br>";

$query = "SELECT title FROM forum WHERE forumid='$forumID'";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$forum ) = each ($data)) {

echo spacer(1)."<a href=\"$baseURL/$forumID\">$forum[title]</a><br>";
}
}

switch($week) {

case("1"):
$fw = "01";
$lw = "07";
break;
case("2"):
$fw = "08";
$lw = "14";
break;
case("3"):
$fw = "15";
$lw = "21";
break;
case("4"):
$fw = "22";
$lw = "31";
break;

default:
$fw = "01";
$lw = "07";

}

$date1 = "$month/$fw/$year";
$date2 = "$month/$lw/$year";

$ts1 = strtotime("$date1");
$ts2 = strtotime("$date2");

echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread WHERE lastpost > '$ts1' AND lastpost < '$ts2' AND forumid='$forumID' ORDER BY dateline ASC";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$thread ) = each ($data)) {

echo spacer(3)."<a href=\"$baseURL/$forumID/$year/$month/$week/$thread[threadid]\">$thread[title]</a><br>";
//echo spacer(3)."<a href=\"http://www.overgrow.com/edge/showthread.php?threadid=$thread[threadid]\">$thread[title]</a><br>";
}
} else {
echo spacer(3)."No posts, please <a href=\"".getenv(HTTP_REFERER)."\">go back</a>.";
}

}

function spacer($n) {

global $spacer;

while($n!=0){
$return.=$spacer;
$n--;
}
return $return;
}

function removeZero($n) {

preg_replace ("/./","",$n);
return $n;
}


function weekList() {

global $db,$baseURL,$forumID,$firstYear,$currentYear,$currentMonth,$firstMonth;

$months = array("blank","January","February","March","April","May","June","July","August","September","October","November","December");

echo "<a href=\"$baseURL/index.php\">Archives</a><br>";

$query = "SELECT title FROM forum WHERE forumid='$forumID'";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$forum ) = each ($data)) {

echo spacer(1)."$forum[title]<br>";
}
}

for($y=$firstYear;$y<=$currentYear;$y++) {

echo spacer(2)."$y<br>&nbsp;<br>";

if($y==$firstYear)
$fm=$firstMonth;
else
$fm=1;

if($y==$currentYear)
$lm=($currentMonth + 1);
else
$lm=13;


for($m=$fm;$m<$lm;$m++){

echo spacer(3).$months[$m]."<br>";

for($w=1;$w<5;$w++) {

if (strlen($m) < 2)
$mo="0".$m;
else
$mo=$m;

echo spacer(4)."<a href=\"$baseURL/$forumID/$y/$mo/$w\">week $w</a><br>";
}

echo "&nbsp;<br>";
}
}

}


function forumList() {

global $db,$baseURL,$privateForums;

echo "Archives<br>";

$privateForums=preg_replace("/^\|/","",$privateForums);
$privateForums=preg_replace("/\|$/","",$privateForums);

$pfs = explode("|",$privateForums);
$whereclause="";
$wherecounter=0;


while($pf=array_shift($pfs)) {
if(ereg("[0-9]",$pf)) {

if($wherecounter==0) {
$whereclause= " WHERE forumid!='$pf'";
$wherecounter++;
} else {
$whereclause .= " AND forumid!='$pf'";
}
}
}



$query = "SELECT title,forumid FROM forum$whereclause ORDER BY forumid ASC";

$data = $db->select($query);

if(!empty($data)) {

while ( list ( $key,$forum ) = each ($data)) {

echo spacer(1)."<a href=\"$baseURL/$forum[forumid]\">$forum[title]</a><br>";
}
}
}


function htmlHeader() {

global $keywords,$description,$pagetitle;

?>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<HTML>
<HEAD>
<TITLE> <?=$pagetitle?> </TITLE>
<META NAME="Generator" CONTENT="EditPlus">
<META NAME="Author" CONTENT="">
<META NAME="Keywords" CONTENT="<?=$keywords?>">
<META NAME="Description" CONTENT="<?=$description?>">

<style type="text/css">

BODY {font: 10pt verdana,arial,sans-serif;}
TD {font: 10pt verdana,arial,sans-serif;}

</style>

</HEAD>

<BODY bgcolor="#ffffff">

<?
}

?>

Thanks Steven

steven
15 May 2001, 02:17
It appears as though the spider is only not indexing the first 2 forums that are specified for exclusion, but if there are more than 2 forums specified in the index.php file, then it still spiders the forums that I don't want to be included.
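
A likely cause, judging only from the code posted above: the loop while($pf=array_shift($pfs)) stops as soon as it shifts off an empty string, and a list with doubled pipes such as |4|5||6| explodes to 4, 5, "" and so on, so only the first two ids ever reach the WHERE clause. Writing the list with single pipes (|4|5|6|7|8|9|33|) avoids that. A more forgiving parser could be sketched as below; both function names are hypothetical and not part of the zip.

```php
<?php
// Hypothetical helper: turn "|4|5||6|" into a clean list of forum ids.
function parsePrivateForums(string $privateForums): array
{
    $ids = [];
    foreach (explode('|', $privateForums) as $piece) {
        // Skip the empty pieces produced by leading, trailing or doubled
        // pipes instead of stopping at the first one.
        if (preg_match('/^[0-9]+$/', $piece) === 1) {
            $ids[] = (int) $piece;
        }
    }
    return array_values(array_unique($ids));
}

// Hypothetical helper: WHERE clause that hides those forums ('' when none).
function privateForumsWhere(array $ids): string
{
    return $ids ? ' WHERE forumid NOT IN (' . implode(',', $ids) . ')' : '';
}
```

The clause is built only from integers, so it can be appended to the forum query the same way $whereclause is in forumList().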

steven
15 May 2001, 20:29
Solved my problems, one last question. Does anybody know how often vbSpiderFriend spiders your site? So let's say I got 100 posts on my site today and get 200 tomorrow, when does vbSpiderFriend spider those new posts?

Thanks
Steven

Overgrow
15 May 2001, 20:55
I have a new version that fixes all of the bugs with redirection etc, I'll try and get it uploaded in the morning. It won't require any major changes if you update, just copy over your existing variables.

Glad you got your problem sorted, maybe post the solution too so others that run into it know how to fix it as well.

re: how often does it spider?

vbSpiderFriend is not a spider itself, it is simply a new way to view your forum-- a way that is more friendly for a spider to navigate and archive. The data you see from vbSF is totally fresh and updated all the time since it reads it dynamically from the database.

The question is-- how often do you re-submit your site to the search engines? If you submit it today with 100 posts, it will always have those same 100 posts on a search engine until you re-submit. At that point, the engine spider can crawl all of your new posts. Some spiders automatically return after a certain time, some do not.

MrLister
17 May 2001, 02:44
on mine when i click on one of the forums it says this page cannot be found. that's when it goes to something like /archive/14 why does it do this?

Overgrow
17 May 2001, 17:12
Your .htaccess ErrorDocument is not pointed correctly. You MAY need to edit your access.conf and set AllowOverride All so .htaccess files can take effect.

You probably just have to edit the .htaccess and mess around with the file pointer. It needs absolute directory names most likely.

Overgrow
17 May 2001, 17:22
EVERYONE: Please download the latest version from the first page of this thread. Thanks!

MrLister
17 May 2001, 22:25
i'm using microsoft IIS. how would i do it under there? if i select the file as an error thing it shows the source code and nothing else when you click on one of them.

Overgrow
17 May 2001, 22:49
Sorry :( No clue on how to do the URL trick under IIS.

Gilby
18 May 2001, 04:17
Originally posted by mrlister
i'm using microsoft IIS. how would i do it under there? if i select the file as an error thing it shows the source code and nothing else when you click on one of them.

From http://www.microsoft.com/technet/iis/steps.asp

Custom Error Messages

In Apache, you provide custom error messages by editing the Error Document and referring to it by using the command:
ErrorDocument 404 http://www.domain.com/404.html

To customize error messages in IIS 5.0, in the IIS snap-in open Properties for the Web site. On the Custom Errors tab, you'll see the location of the error message files. From here, you can map custom error messages to a file or to a URL on the local server.

cyrus
19 May 2001, 23:21
hi

I would really like to get this thing working but I can't for some reason :(

here is the link to my archive files

http://66.78.8.167/~thehood/archive/index.php

i keep getting 404 errors

do I need to make archives of the forums or something ?? and if so, how do i make these.

thanx all !!!

Overgrow
20 May 2001, 00:03
OK here is the important line on your error screen:

>>Additionally, a 404 Not Found
error was encountered while trying to use an ErrorDocument to handle the request.

That means your .htaccess ErrorDocument is not set up correctly. Try different paths-- normally absolute, not relative.

cyrus
20 May 2001, 00:20
hi overgrow

i went to my space and i cant seem to see a .htaccess file in it.
i tried uploading it again, but it doesn't seem to want to upload into there.

any hints ?

MrLister
20 May 2001, 05:22
ok i set the 404 error to be at /archive/index.php and now instead of it saying 404 error not found it pretty much just reloads. any ideas?

cyrus
20 May 2001, 08:23
I am still having the same problem, and cannot see a way to resolve it yet.

the addy to it is this :-

http://66.78.8.167/~thehood/archive/index.php

I put this in my .htaccess file which I uploaded to the forum directory:-

ErrorDocument 404 /archive/index.php

I do not see what I am doing wrong :confused:

also, what is the var $CONN = ""; in index.php ??

Can someone please help me !!!!!!!! ?

Thank you

Streicher
20 May 2001, 10:36
The version number in the ZIP file is still 0.1a. I have compared the file with the previously downloaded version and found no fixes.

When I click on a forum link with an ID <10 the page only reloads.

Overgrow
20 May 2001, 15:47
EVERYONE: Please download the newest version on page 1 and make sure it's titled "vbSpiderfriend_v1"... I had some trouble getting the latest version to upload, it seems to be there now. The top of the file will say v1.0b if you have the correct one.

Streicher: Please give a link to the archive-- no one else has said anything about a problem like this with forums < 10.

Cyrus/Lister: $CONN should be left blank.

/archive/index.php is not the full path to the file. It will be something like:

/~thehood/archive/index.php

or

/home/cyrus/archive/index.php

or

/home/wwwdocs/HTML/~cyrus/archive/index.php

etc, etc..

cyrus
20 May 2001, 17:40
hi

i did everything u said, even updated to the latest hack version

but it still refuses to work :(

my htaccess file has this in it now

/home/thehood/archive/index.php

which is the FULL path to it.

It doesnt work still 4 me.

it goes to some file like "2" in archive ... but theres nothing there, how is it supposed to open that ???

Streicher
20 May 2001, 18:07
@Overgrow:

here is the Link: http://www.studenten-city.de/forum/archiv/

I have updated to Version 1.

cyrus
20 May 2001, 18:35
hmm .....


I think this is related to my problem :-

This uses ErrorDocument and query string parsing to get the variables needed

if it is, can anyone help me with this plsssssssssss ?????????

:)

Gilby
21 May 2001, 02:03
Originally posted by Cyrus
http://66.78.8.167/~thehood/archive/index.php

I put this in my .htaccess file which I uploaded to the forum directory:-

ErrorDocument 404 /archive/index.php

I do not see what I am doing wrong :confused:


You need it as
ErrorDocument 404 /~thehood/archive/index.php
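
For anyone else hunting for the right value, here is a minimal sketch in present-day PHP (not the PHP 4 of this thread). It assumes the ErrorDocument target is simply the URL path of the archive's index.php, which matches the line Gilby gives above. The helper name error_document_line is made up for this illustration and is not part of vbSpiderFriend.

```php
<?php
// Hypothetical helper (not part of vbSpiderFriend): build the .htaccess line
// from the URL path of the archive's index.php.
function error_document_line(string $scriptUrlPath): string
{
    // ErrorDocument expects a local URL path beginning with "/",
    // not a filesystem path such as /home/user/public_html/...
    if ($scriptUrlPath === '' || $scriptUrlPath[0] !== '/') {
        $scriptUrlPath = '/' . $scriptUrlPath;
    }
    return 'ErrorDocument 404 ' . $scriptUrlPath;
}

// Under a web server, SCRIPT_NAME holds the URL path of the running script.
// The fallback value is only there so the sketch also runs from the command line.
$path = $_SERVER['SCRIPT_NAME'] ?? '/~thehood/archive/index.php';
echo error_document_line($path), "\n";
```

Request it once through the browser from the archive directory, copy the printed line into .htaccess, then remove the snippet.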

cyrus
21 May 2001, 07:41
hi

thanx Gilby, its finally been done.

one problem tho, when i go to see posts, it says no new posts although I KNOW there should be.

any hints ?

robertusss
21 May 2001, 14:37
On each page you should add a link saying:

this is the text-only lowtech version of this thread.
Click HERE to see this thread with all graphics, options, features and links.

Overgrow
21 May 2001, 16:29
After reading that the forwarding-mechanism I'm using would be considered 'cloaking' by Google, I have now uploaded a new version which initially starts with the auto-refresh turned off. If you wish to have it turned on, simply enable $refresh=1;

Also incorporated robertussss' idea since we're no longer auto-refreshing.


Streicher: Sorry :( I have no idea on your problem-- you are the only one who has reported this and I cannot duplicate it.

cyrus
21 May 2001, 18:25
wat about me overgrow ??

what shall "I" do :(

Overgrow
21 May 2001, 20:09
Hi Cyrus,

The first thing that I will ask you to do is to restore the footer as it was with my name, copyright, and link to Overgrow. I ask nothing from you when I give out my code except that you leave the footer/links intact-- it says so at the top of the file:

** Free License: YOU MUST LEAVE THE FOOTER INTACT **

Restore that and then we can see about the problem. Thanks.

cyrus
21 May 2001, 21:38
lol

oops, sorry. I didnt read that !

itll be restored in 5 mins

sorry

:(

eva2000
22 May 2001, 01:23
Originally posted by Overgrow
After reading that the forwarding-mechanism I'm using would be considered 'cloaking' by Google, I have now uploaded a new version which initially starts with the auto-refresh turned off. If you wish to have it turned on, simply enable $refresh=1;

Also incorporated robertussss' idea since we're no longer auto-refreshing.


Streicher: Sorry :( I have no idea on your problem-- you are the only one who has reported this and I cannot duplicate it.

well i only have the original released you made installed... so what's with the cloaking ?

Overgrow
22 May 2001, 01:28
The cloaking applies to 1.0b so if you have the original you can leave it as is. Cloaking refers to giving the spider a different page than you give the user-- no matter how well intentioned you are-- it is a bannable offense on google.

It was a nice idea for a feature but should not have been included as the default, so now it is an option.

I would upgrade to the latest simply because it provides a better link at the top, but there's no real need.

Overgrow
22 May 2001, 01:40
Cyrus I'm still looking into your problem, I will probably have a piece of code for you to paste in tomorrow.

cyrus
22 May 2001, 10:05
excellent, thanx

ill be waiting for it.

Overgrow
22 May 2001, 16:05
Cyrus, OK here is your problem.. your VB has the month/day reversed from standard American time :rolleyes: hehe...

Find:

$month=$urlArray[($a+3)];
$week=$urlArray[($a+4)];


Replace with:

$week=$urlArray[($a+3)];
$month=$urlArray[($a+4)];


That's the easy fix. It won't have the dates right on the top of some of the pages but it won't affect the spider.
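
To make the segment order concrete, here is a minimal sketch in present-day PHP. It assumes the URL layout seen in the archive links posted in this thread (/archive/forum/year/month/week). The function name parse_archive_url and the $monthFirst switch are inventions for this example, not code from the hack.

```php
<?php
// Sketch only: split an archive URL such as /archive/2/2001/02/3 into its parts.
// $monthFirst = false models the swapped month/week order from the fix above.
function parse_archive_url(string $requestUri, bool $monthFirst = true): ?array
{
    $path = parse_url($requestUri, PHP_URL_PATH);
    if (!is_string($path)) {
        return null;                      // malformed or empty URL
    }
    $urlArray = explode('/', trim($path, '/'));
    $a = array_search('archive', $urlArray, true);
    if ($a === false) {
        return null;                      // not an archive URL
    }
    $third  = (int) ($urlArray[$a + 3] ?? 0);
    $fourth = (int) ($urlArray[$a + 4] ?? 0);
    return [
        'forumID' => (int) ($urlArray[$a + 1] ?? 0),
        'year'    => (int) ($urlArray[$a + 2] ?? 0),
        'month'   => $monthFirst ? $third : $fourth,
        'week'    => $monthFirst ? $fourth : $third,
    ];
}

print_r(parse_archive_url('/archive/2/2001/02/3'));
```

Missing segments come back as 0, so the forum list and the month list can share the same parser.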

robertusss
22 May 2001, 17:55
(removed a stupid question here)

cyrus
23 May 2001, 02:03
hi overgrow,

i made that fix, but i get the same problem :(

what shall I do now :confused:

Overgrow
23 May 2001, 03:14
Uhh Cyrus? It does work sometimes...


http://www.t-hood.com/archive/2/2001/02/3

http://www.t-hood.com/archive/2/2001/04/2

http://www.t-hood.com/archive/9/2001/03/3


Sorry, I don't know what to tell you. The reason you have problems is because your date format is not the one the program was intended to work with.. since it reads the date from the query, it is touchy about how it's formatted.

Here is a great chance to learn PHP and figure out how to fix it. Otherwise I would need FTP access to your site so I can try new scripts, there is no other way for me to troubleshoot it.. (besides switching my vb over to the other format but mine is a bit busy to be doing that with)

Overgrow
23 May 2001, 03:18
robert, I just looked over the class and it seems like it should work. I'm sorry I wrote it using my personal mysql class file instead of just including VB's.. but at the time I didn't have a copy of 2.0 handy and I didn't want to write to 1.x in case it had changed.

Having said all that, I don't see a problem.. I would re-download the latest version, and type in the info again.. Otherwise it would not be difficult to remove my class stuff and just use VB's if you wanted to edit the script.

cyrus
23 May 2001, 03:27
Hi Overgrow,

I know wat u mean.

But seeing the dates arent correct, can I revert the dates to the what they should be ?? Will it work fine then ?? I do not mind doing that, its fine with me :)

Overgrow
23 May 2001, 04:19
No guarantee, but it's a good bet!

krohn
28 May 2001, 07:39
http://www.forumoc.com/archive/

not working here either...

.htaccess looks like this
ErrorDocument 404 /home/web/forumoc.com/archive/index.php

robertusss
28 May 2001, 08:48
@krohn try:
ErrorDocument 404 /archive/index.php

here is the man-page for apache:

http://httpd.apache.org/docs/mod/core.html#errordocument


But on my site it doesn't work either... maybe there is a switch somewhere in httpd.conf to enable customized error-pages globally...

jojo85
28 May 2001, 12:02
Nice hack!!!
Congrats
I love it!

etones
30 May 2001, 21:07
Overgrow ... Ya just a god damm GENIUS!

Your hacks are just toooo good man, keep em up..

and is that you with the afro...? If so.. damm your a kewl succer :D

chrispadfield
30 May 2001, 21:51
just installed in less than 5 minutes, amazing hack thanks so much

Overgrow
30 May 2001, 22:33
Welcome guys... I don't sport the fro no mo... (hey man slap mah fro)


....did anyone get their archive in this last Google update? I was too slow for this one, but I have seen the googlebot back a few hundred times since then so I think he's starting to take the bait.

Overgrow
30 May 2001, 22:38
OK! If everyone has left their footer intact (you did read the big licensing agreement at the top, right?) --- then we officially have our FIRST VBULLETIN SPIDERED INTO GOOGLE!

Congrats Streicher of Studenten-city.de!

The one guy that had some of the worst trouble installing this piece-of-hack-- he's the guy that manages to get in Google first! Only a few pages so far but now the spider knows where you live :)

http://www.google.com/search?q=vbspiderfriend

chrispadfield
30 May 2001, 23:01
someone's referral stats are going to get pushed up by this in google ;) but you deserve it and that 5 times over.

my footer is of course still there :)

www.ascifi.com/forums/archive/

chrispadfield
30 May 2001, 23:03
http://www.alltheweb.com/cgi-bin/search?type=all&query=vbspiderfriend

jamez
31 May 2001, 00:00
I can get it to work, but not on a subdomain. Is there away to make it work with subdomains?

robertusss
31 May 2001, 06:57
sorry, but I do not get it to work. is there a httpd.conf setting I am missing?

I have the .htaccess file in place in /archiv, but i still get the 404-message that is defined in the .htaccess in htdoc-root - doesn't the .htaccess in /archive-directory override the root one?

Overgrow
31 May 2001, 15:46
jamez: sorry no clue on subdomains, you need more of a linux expert than me.

robert! didn't know you were still having trouble.. I believe the setting you are looking for will be in access.conf (possibly httpd.conf)... it is called "AllowOverride" and it controls what a .htaccess file may change at the directory level. It is fairly well documented inside of the file. You need AllowOverride All, or at least AllowOverride FileInfo, which is the group that covers ErrorDocument.

hope this helps... I'm sure someone with more apache knowledge can offer better advice.

robertusss
31 May 2001, 15:57
Yeah, I found that on apache.org and in httpd.conf and I enabled it. Now my customized error document defined via a .htaccess file in / works fine, but not in /archiv.

anyone here that can help me?!

Jesse69
03 Jun 2001, 08:33
Hi - Im very interested at the script - but cant get at the first Page. - please tell me where to get it ....

Jesse

http://161.58.84.213/forum/showthread.php?s=&threadid=15628
produces errors

Jesse69
04 Jun 2001, 05:13
Ok, seems someone fixed that prob - now I got the hack, but:
Fatal error: Call to unsupported or undefined function htmlheader() in /homepages/2/d21873614/htdocs/forum/archive/index.php3 on line 69

any hints ?

Jesse

Overgrow
04 Jun 2001, 13:57
Hi Jesse,

You need to upgrade from PHP3 to PHP4. The script will work and everything will be faster and more pleasant.

og

chrispadfield
04 Jun 2001, 14:08
google is being incredibly slow to index us... um..

Overgrow
04 Jun 2001, 14:19
The google update has already happened for last month and the databases appear stabilized across www-www2-www3. Check next month around this time and I'm sure we'll have more VBSFs in there.

Jesse69
04 Jun 2001, 14:32
@ Overgrow

php4 ? No chance for this year at my provider ... :-(

looks like the search engines will not find us anymore ...

Jesse

Overgrow
08 Jun 2001, 18:01
IMPORTANT MESSAGE FOR SPIDERFRIENDLY FORUMS:

This little piece of Google info came from a great interview about Google's Page Rank and Term Vectors (http://www.webposition.com/mp-current.htm#SEVEN).

...while Google might spider your site, it won't be added to the database until someone else links to it.

I was guilty of this myself with the vbSpiderFriend here. I didn't want this link to be visible to my users-- or they might start demanding a bloat-free text only version of my forum! So I manually submitted my Archive to Google hoping it would be added to the database.

According to that article (and interview with a head Goog) if a website is an island-- with no links leading into it from anywhere-- it will never be added to the database. So! Lesson learned: you must put a link to your archive somewhere on your site that Google normally indexes each month. Also, your Page Rank will of course be higher since it was found and not submitted.

eva2000
08 Jun 2001, 18:12
Originally posted by Overgrow
IMPORTANT MESSAGE FOR SPIDERFRIENDLY FORUMS:

This little piece of Google info came from a great interview about Google's Page Rank and Term Vectors (http://www.webposition.com/mp-current.htm#SEVEN).

...while Google might spider your site, it won't be added to the database until someone else links to it.

I was guilty of this myself with the vbSpiderFriend here. I didn't want this link to be visible to my users-- or they might start demanding a bloat-free text only version of my forum! So I manually submitted my Archive to Google hoping it would be added to the database.

According to that article (and interview with a head Goog) if a website is an island-- with no links leading into it from anywhere-- it will never be added to the database. So! Lesson learned: you must put a link to your archive somewhere on your site that Google normally indexes each month. Also, your Page Rank will of course be higher since it was found and not submitted.

hence i linked my url at the bottom of my forum pages when i installed your hack - i have already been indexed too :)

Brian
14 Jun 2001, 00:17
How could you make this only display the months/weeks when there are posts made? I.e., in less popular forums some areas don't get posted in often and I would rather the search engine not get the "No posts, please go back.:" phrase so often.

Overgrow
14 Jun 2001, 15:48
Doing this would make the display time unbearable I believe. Maybe not if your forum DB is really small.. but if you want to remove the dead weeks you would have to query each week as it is displayed in the list..... adding many many queries to the week-list.

Good idea, easy to implement, but mind-bogglingly slow if it's done.
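
One possible way around the per-week queries, offered as an untested idea rather than something the hack does: fetch the thread timestamps for a forum in a single query and bucket them in PHP. The sketch below is present-day PHP and covers only the bucketing step. The function name active_weeks is hypothetical, and it assumes a week means week-of-month (days 1-7 are week 1), which may not match how vbSF counts weeks.

```php
<?php
// Sketch: given thread timestamps for one forum (fetched in a single query),
// return the year/month/week buckets that actually contain threads.
// Assumption: "week" is the week of the month, days 1-7 = week 1, and so on.
function active_weeks(array $timestamps): array
{
    $buckets = [];
    foreach ($timestamps as $ts) {
        $ts   = (int) $ts;
        $week = intdiv((int) gmdate('j', $ts) - 1, 7) + 1;
        $buckets[gmdate('Y/m', $ts) . '/' . $week] = true;  // de-duplicate by key
    }
    $keys = array_keys($buckets);
    sort($keys);                                            // oldest bucket first
    return $keys;
}

print_r(active_weeks([gmmktime(0, 0, 0, 5, 17, 2001), gmmktime(0, 0, 0, 6, 1, 2001)]));
```

Only the buckets returned would be printed as links. Whether one large fetch is actually cheaper than many small queries depends on the size of the forum.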

Brian
14 Jun 2001, 16:04
gotcha, makes sense

Brian
14 Jun 2001, 17:02
Here is an odd question. If I have more forums private than public it would be easier for me to just list at the top the ones I want displayed. What would I need to do to change it so it only displays the ones I want? I know I could simply put all the ones I don't want at the top, but I have a ton of the ones I don't want displayed and just a few of the ones I do want displayed.

Overgrow
15 Jun 2001, 22:14
(This is untested) Make your changes to the top variable and...

Find:

if (eregi("[a-z]",$forumID) or $forumID < 1 or
strstr($privateForums,"|".$forumID."|"))


Replace with:

if (eregi("[a-z]",$forumID) or $forumID < 1 or
!strstr($privateForums,"|".$forumID."|"))



Find:

while($pf=array_shift($pfs)) {
if(ereg("[0-9]",$pf)) {

if($wherecounter==0) {
$whereclause= " WHERE forumid!='$pf'";
$wherecounter++;
} else {
$whereclause .= " AND forumid!='$pf'";
}
}
}


Replace with:


while($pf=array_shift($pfs)) {
if(ereg("[0-9]",$pf)) {

if($wherecounter==0) {
$whereclause= " WHERE forumid='$pf'";
$wherecounter++;
} else {
// OR, not AND: no row can have two different forumids at once
$whereclause .= " OR forumid='$pf'";
}
}
}
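
The same whitelist can also be expressed with a single IN (...) list. This is a minimal sketch in present-day PHP, assuming the top variable is a pipe-delimited string such as |3|7|12| (the format the strstr() check above implies). The function name whitelist_where is hypothetical.

```php
<?php
// Sketch: turn a pipe-delimited list such as "|3|7|12|" into a whitelist clause.
function whitelist_where(string $forumList): string
{
    $ids = [];
    foreach (explode('|', $forumList) as $pf) {
        if (ctype_digit($pf)) {           // keep purely numeric IDs only
            $ids[] = (int) $pf;
        }
    }
    if ($ids === []) {
        return ' WHERE 0';                // nothing whitelisted: match no forums
    }
    return ' WHERE forumid IN (' . implode(',', $ids) . ')';
}

echo whitelist_where('|3|7|12|'), "\n";
```

Because every ID is cast to an integer before it reaches the SQL string, stray text in the setting cannot end up in the query.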

julius
22 Jun 2001, 00:29
Overgrow, great hack!

I don't know why, but some threads are not displayed.

If I put a link to the archive at the bottom of myhomepage.com (not the forum index) in the same color as the background, so it is invisible, would that be fine with Google?

gmyachtsman
22 Jun 2001, 06:01
This looks great !!

I'd love to get the search engines' notice.

Yesterday I just purchased VBulletin. I am hosted on a server that supports .php, .php3 .phtml, and .php4 (HostRocket). When I went to download from VBulletin, I was faced with this surprise:

Click here to download vBulletin version 2.0.1 with .PHP file extensions (default).

Click here to download vBulletin version 2.0.1 with .PHP3 file extensions.


Well, I chose .PHP3 file extensions, not knowing the differences.

Will your hack work on that version? Should I change to .PHP instead?

jarvis
24 Jun 2001, 04:25
Does anyone have this running under IIS? I have set the custom error 404 to point to my /archive/index.php but all I get from the links generated by the hack is:

a) the unparsed php code of /archive/index.php

or

b) 404 file not found (only if I let IIS do the 404 default redirects)

Ideas?

Thanks!!

etones
24 Jun 2001, 08:45
I'd change to the .PHP version, it will be better for you in the future.

jarvis
24 Jun 2001, 09:15
I'd change to the .PHP version
PHP version of what? My homepage? My question pertains more to getting this running under IIS which is handling vB fine, but many of the hacks are intended for unix-based systems. I'm just curious if anyone has my configuration?

Thanks!

gmyachtsman
25 Jun 2001, 04:48
Thanks, etones. I replaced the php3 version with php. In addition to installing the php files in place of the php3 file, I had to edit my database to replace all the "php3" references with "php". The results were very good!! Just about all my changes were preserved.

Then I went on to install VBSpiderfriend. It is working fine now, but I had the following problems:

1. I had not realized that VB adds a /~databasename folder into the file path. I was trying to make the path work that was in my own directory (not what actually shows up on the URL itself).

2. I had named my archive folder "webarchive" instead of "archive". Before I fixed (1) above, while my first page list of forums showed up fine, when I clicked through, I got a standard 404 file not found error. After I fixed (1) above I got a file not found error right on the front page (this was a different error template...one I had modified for my own site's use). What I learned: the archive folder must have that exact name.

3. Every time I uploaded it, my .htaccess file would disappear without a trace. But I guessed it was there, since it kept overwriting itself and now it is working (though still invisible).

Next for the customizing, the outside links, better Meta tags, etc (the easy parts).

Thanks, Overgrow, for a beauty.

Bane
27 Jun 2001, 04:00
I have tried every variation I can think of in my .htaccess file and cant seem to get this to work.

Either it will go to http://www.influx9.com/archive/3
or http://www.influx9.com/archive/index.php and just relist

I have tried a lot of variations on the .htaccess file.. even just the http://www.influx9.com/archive/index.php (which will make it reload) any ideas what Im doing wrong? My FTP client puts the file at /home/influx9/www/archive/index.php

gmyachtsman
27 Jun 2001, 04:52
Bane, read my last post above; then open your Vbulletin board and look at the address in the URL box.

http://www.influx9.com/index.php


Then go to your VBulletin Forums Admin Control panel and look at the URL there:

I don't know what it is, but it will be different.


I think that difference may have the clue you need.

I am not familiar with the code you use to change the web pages (i.e., php? action=forums), but it is really cool. Is that or is your directory structure (archive not being in the forums folder) making a problem? Too early for me to tell, though others probably could. But before wondering any more do what I just had to above.

I am far less expert than the others around here, but I hope that this may help.

chilliboy
02 Jul 2001, 10:18
Have a look at this - it may be the 'perfect' all-round solution, using a small script in a customised 404 error page:

http://vbulletin.com/forum/showthread.php?s=&threadid=21723

Here's the key stuff if you can't be bothered to read all:

that 404 is not complex, I think that's the best content management system...a lot of hosts let you do a custom 404, but most of us will have their own server/clients server for their files I think?

now the script I posted here has nothing to do with the 404 thing, the 404 is just a few lines, you just put

header("Status: 200 OK"); before any html, and then you do:

$url = explode("/",$REQUEST_URI);
$page=$url[1];

then you check if that content exists

$content = mysql_query("SELECT content FROM content WHERE page='".addslashes($page)."'");
if (!$content or mysql_num_rows($content)==0) header ("HTTP/1.0 404 Not Found"); (which will override the status: 200 OK)

and then you just put the content there!! I didn't test this, but I'm planning to use this if nobody else knows a better content management system, but I believe this is the way it's done. There are peeps using this, so it does work, just not sure all my code is correct ;D
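
The idea above can be sketched as a small function in present-day PHP. An in-memory array stands in for the content table so the example runs without a database, and the names resolve_page and $pages are assumptions made for the illustration, not code from the linked thread.

```php
<?php
// Sketch of the custom-404 front controller idea. An array replaces the
// "content" table; a real handler would look the page up in the database.
function resolve_page(string $requestUri, array $pages): array
{
    $path = (string) parse_url($requestUri, PHP_URL_PATH);
    $url  = explode('/', $path);
    $page = $url[1] ?? '';                 // first segment after the leading "/"
    if ($page !== '' && isset($pages[$page])) {
        return [200, $pages[$page]];       // content exists: answer 200 OK
    }
    return [404, 'Not Found'];             // otherwise keep the 404
}

[$status, $body] = resolve_page($_SERVER['REQUEST_URI'] ?? '/faq', ['faq' => 'FAQ text']);
if (PHP_SAPI !== 'cli') {
    http_response_code($status);           // replaces the 404 status set for the error page
}
echo $body, "\n";
```

Deciding the status before any output is sent avoids relying on one header overriding another.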

Overgrow
03 Jul 2001, 12:40
Congrats to Eva of Animeboards who now has his vBulletin spidered by Google! He has a good page rank evidently since Google went very deep on only its first pass. Animeboards now has 11,700 new pages listed in Google with the spiderFriend!

Also to Deadbodies.org who got their spiderFriend noticed, but without much page rank, the bot didn't travel down the links yet. Hopefully next pass...

julius-- do not use invisible links.. if Google notices this, your site will be banned. Better to use a real link, let a few users see it if they must, but keep your position in Google.

Also another note in case you missed it... do not ORPHAN your archive. It MUST be listed as a link from a page that Google already knows (ie, your home page). If you have an orphan archive and submit that link directly to Google, but the bot never finds its way there itself, your page will never be listed in the search results. Google hates orphans...


edit: changed Eva's gender to more correctly reflect the actual person :)

chilliboy
03 Jul 2001, 12:50
OverGrow - I haven't really had time to check exactly how you are getting this hack to work but I guess you are doing something with the '404' method I posted two posts up. I did some checking on sitepoint and found some threads by you on how you use this method for most of your site.

Do you think it would be possible to develop your hack further so that these 'perfect' URLs are not only used by search engines but users as well? ie this sort of link is used throughout your vB as standard, and not just an extra trick for getting search engine listed.

It would be really cool if you could use the "<!-- breadcrumb, nav links -->" as the URL eg this post would have a URL like:

vBulletin_Community_Forum/Customising_vBulletin/vBulletin_Code_Hacks/Releases_Version 2.x/vbSpiderFriend_ -_ Search Engine Friendliness/

These would then also be available as for use as dynamic meta keywords.

Cheers

Overgrow
03 Jul 2001, 19:30
oh man dynamic meta keywords inserted as the page name........... BRILLIANT! Google will eat it up...

I dunno when I might get around to implementing this, but I love the idea. Thanks a million, I'll be sure to mention your name if I ever publish the hack.

I do use the 404 trick for the rest of my site including user links... now when people post a link to an article that they've bookmarked, its:

http://www.overgrow.com/article/2/3

(article 2, page 3).. same thing for the FAQ...

I haven't done it with vB yet since it would require a huge overhaul for little benefit (in my eyes). OTOH, Wayne Luke has been modifying the SitePoint forums to do just that. I'm sure a search here or there will find the W.Luke threads on vB urls.

Overgrow
03 Jul 2001, 19:35
ps. after thinking it through a bit, I'm not sure if we can do the vB message title as the final part of the URL... without having to query to match the title back with the threadid when the user requests it. I was actually more excited about using it for my articles and FAQ. It would be easy to build a lookup table for those without having to run a huge query like you would for the vB.

So I don't want to sound too excited about implementing your idea with the spiderFriend.. not sure if it's possible or feasible.. but it triggered a thought of how to do it with the rest of the site.
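
A lookup table of that kind could look like the following minimal sketch in present-day PHP. The function names slugify and build_lookup and the sample titles are hypothetical, and underscores are used as separators only because the example URL above uses them.

```php
<?php
// Sketch: build a slug => id lookup table for a small set of titles
// (articles or FAQ entries), so a URL segment maps back to an ID directly.
function slugify(string $title): string
{
    // Collapse every run of non-alphanumeric characters into one underscore.
    $slug = (string) preg_replace('/[^A-Za-z0-9]+/', '_', $title);
    return trim($slug, '_');
}

function build_lookup(array $titlesById): array
{
    $lookup = [];
    foreach ($titlesById as $id => $title) {
        $lookup[slugify($title)] = $id;   // a later duplicate title overwrites an earlier one
    }
    return $lookup;
}

print_r(build_lookup([2 => 'First Article', 3 => 'FAQ: Second Entry!']));
```

Two threads with the same title would collide in such a table, which is one more reason it suits a short article list better than a whole forum.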

julius
04 Jul 2001, 13:29
I've disabled html in Vb posts.
But if someone puts an html or javascript tag in a post, it will run in the spiderFriend pages.
Maybe to prevent this it's better to censor some dangerous words like "javascript" in vB?
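
Rather than censoring words, one common approach is to escape the post text when the archive page prints it. A minimal sketch in present-day PHP follows. The function name safe_post_text is hypothetical, and where exactly vbSpiderFriend prints the post body is not shown in this thread, so treat the hook point as an assumption.

```php
<?php
// Sketch: escape post text before printing it in the archive page,
// so markup typed into a post is displayed as text instead of being run.
// Assumption: the text is UTF-8; ENT_SUBSTITUTE swaps invalid bytes for a
// placeholder character instead of returning an empty string.
function safe_post_text(string $text): string
{
    $escaped = htmlspecialchars($text, ENT_QUOTES | ENT_SUBSTITUTE, 'UTF-8');
    return nl2br($escaped);               // keep the poster's line breaks
}

echo safe_post_text('<script>alert(1)</script>'), "\n";
```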

I found some threads are not in the list. Any idea?

ldydvr
05 Jul 2001, 13:01
Just wondering ...

Is the zip in the second post the latest updated version of the hack?

I noticed the date read 05-21-2001 and just wanted to make sure before starting.

=-)

dwh
10 Jul 2001, 04:48
Originally posted by Overgrow
Some spiders automatically return after a certain time, some do not.

!! That's the first time I heard this! Which spiders (or major ones anyway) crawl only once?

Nice hack btw.

JackG
11 Jul 2001, 05:22
Will this work on Windows NT ?

Has anyone tried?

Godfather1
12 Jul 2001, 22:30
I have the same problem like Streicher when i click on a forumlink with an ID <10 the page reloads only.
you have an idea why?


cu

robertusss
13 Jul 2001, 11:23
soon after I included this hack on my site, google kicked me out completely... :(
Hope it doesn't have to do anything with this hack...

chilliboy
13 Jul 2001, 12:21
Hi Overgrow,

These are just some general questions as regard to this hack.

As google and a few others can spider ? and & URLs what other advantages does this hack have other than all the dynamic meta tags?

What I'm getting at is - if I can do all the dynamic meta tags etc straight into my vB then removing them from the equation why would this hack improve the ability for search engines to spider.

As far as I can see the only difference would be the / in the URL's as opposed to ? and &'s. Are there any other advantages? Are they easier to spider because 'less clutter' from CSS, images etc?

Just wondering as I'm not that enlightened on exactly how search engines (specifically google) work, and what are all the benefits of this script over just inserting dynamic meta tags straight into your actual forum pages.

Cheers

eva2000
02 Aug 2001, 01:28
Originally posted by Overgrow
Congrats to Eva of Animeboards who now has his vBulletin spidered by Google! He has a good page rank evidently since Google went very deep on only its first pass. Animeboards now has 11,700 new pages listed in Google with the spiderFriend!

edit: changed Eva's gender to more correctly reflect the actual person :)

thanks overgrow, my pagerank boost was from linking to my main site eva2000.com which has a very very high rank :)

unfortunately your hack doesn't seem to show threads anymore? :(

any updates?

eva2000
02 Aug 2001, 02:05
never mind i found my problem, each forum upgrade i start with a new database and preserve the previous version database as a copy but forgot to change the config file for your hack to point to the new db :rolleyes:

robertusss
02 Aug 2001, 06:21
eva2000: did you submit your page to google again, or did google visit you automatically?
If you submitted, did you submit www.yourdomain.com or did you submit www.yourdomain.com/forum/archive/ ?

eva2000
02 Aug 2001, 18:28
Originally posted by robertusss
eva2000: did you submit your page to google again, or did google visit you automatically?
If you submitted, did you submit www.yourdomain.com or did you submit www.yourdomain.com/forum/archive/ ?

i submitted my main domain page along with a particular archive thread i really want to see indexed and just wait :)

dwh
07 Aug 2001, 18:28
Are you sure you folks aren't putting out 404 error pages?

http://www.phpbuilder.net/annotate/message.php3?id=1000365

JohnL
15 Aug 2001, 00:01
Did anyone ever figure out why some people had a problem where any of the links reloads the index.php page?

Here is an example http://www.reefcentral.com/vbulletin/archive/index.php

I'd love to get this hack to work.

ForzaGrifo
22 Sep 2001, 00:04
What are the major search engines other than Google that this hack would work on?

macuser
24 Sep 2001, 09:14
The forums with id=1,2,3,4,5,6,7,8,9 reloads the index.php.

http://www.macuser.de/forum/archive/index.php

Some people have the same problem .
Any idea ?

MacUser

ForzaGrifo
24 Sep 2001, 20:55
Originally posted by Overgrow
I am tired of my 200,000 posts not being listed in Google.

I just read in clickz that Google can now crawl dynamically generated pages. Is that true??

the article is here:
http://www.clickz.com/search/opt/article.php/874131

Alun
25 Sep 2001, 15:48
I just read in clickz that Google can now crawl dynamically generated pages. Is that true??

Yup.

Search Google for "Thorikos" and fifth link down is my forum. I think I've had a few referrals from posts to do with oikoi too.

macuser
27 Sep 2001, 11:35
Originally posted by MacUser
The forums with id=1,2,3,4,5,6,7,8,9 reloads the index.php.

http://www.macuser.de/forum/archive/index.php

Some people have the same problem .
Any idea ?

MacUser

Someone Please help. :confused:

JohnL
27 Sep 2001, 18:53
Are you using an IIS server also? I am and can not get the links to work properly either. I would really love to get this working.

Razzie
06 Oct 2001, 13:11
Small Bug with displaying private forums:

If you have a private forum with id 5 and try accessing page - /archive/5 it won't show the threads BUT if you add a zero to the front and try - /archive/05 it WILL list the private forum.

To fix, find:

while($urlArray[$a] != "archive") {
$a++;
}

$forumID=$urlArray[($a+1)];

Add below:

$forumID = $forumID + 0;
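
A short demonstration of why the extra line helps, written as a sketch in present-day PHP. The is_private helper and the sample $privateForums value are made up for the example; the check mirrors the strstr() test quoted earlier in the thread. The forum still gets listed for 05, presumably because MySQL compares '05' to the numeric forumid column as the number 5.

```php
<?php
// Sketch of why "/archive/05" slipped past the private-forum check.
$privateForums = '|5|8|';   // hypothetical setting: forums 5 and 8 are private

function is_private($forumID, string $privateForums): bool
{
    // Same idea as the original strstr() test: look for "|<id>|" in the list.
    return strpos($privateForums, '|' . $forumID . '|') !== false;
}

var_dump(is_private('05', $privateForums));      // the text "|05|" is not in the list
var_dump(is_private('05' + 0, $privateForums));  // adding 0 gives the number 5, so "|5|" is found
```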

CoolaShacka
06 Oct 2001, 14:40
Thanks, Overgrow.
Really nice work.

I have a little problem. Could you please tell me what the bug is?

******************************
link deleted
******************************

Thank you in advance

Razzie
06 Oct 2001, 18:09
Have you set the .htaccess file?

CoolaShacka
06 Oct 2001, 18:25
Yes.
my .htaccess is ErrorDocument 404 /home/csbvu/public_html/archive/index.php I am hosted by VO

JohnL
06 Oct 2001, 19:40
Is there any way to get this to work on a W2K server with IIS?

I did the redirect for the 404 error but it just reloads the index page. Here is an example http://www.reefcentral.com/vbulletin/archive/index.php

Razzie
07 Oct 2001, 01:00
Originally posted by CoolaShacka
Yes.
my .htaccess is ErrorDocument 404 /home/csbvu/public_html/archive/index.php I am hosted by VO

The ErrorDocument target should be a URL path (starting from the web root), not a unix filesystem path!

Replace with:
ErrorDocument 404 /archive/index.php

Razzie
07 Oct 2001, 01:03
Originally posted by JohnL
Is there any way to get this to work on a W2K server with IIS?

I did the redirect for the 404 error but it just reloads the index page. Here is an example http://www.reefcentral.com/vbulletin/archive/index.php

This is probably occurring because the PHP file isn't able to get the original URL that was requested.

Create a new index.php and add <? phpinfo(); ?> in it. Now try to cause a 404 and look at the environment variables to see which one holds the requested URL. Use that environment variable inside the original index.php file.

JohnL
07 Oct 2001, 02:09
THANK YOU...THANK YOU...THANK YOU :)

OK all you IIS guys. Thanks to Razzie...

Do a search for REQUEST_URI and replace it with QUERY_STRING and you are good to go!

Originally posted by Razzie
This is probably occurring because the PHP file isn't able to get the original URL that was requested.

Thanks again Razzie :)
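
For anyone who wants one index.php that copes with both servers, here is a rough sketch. It assumes that Apache exposes the missing URL in REQUEST_URI, and that an IIS custom 404 page receives it in QUERY_STRING in the form "404;http://host:80/path". Check your own phpinfo() output before relying on either. The requestedPath() helper is hypothetical and not part of vbSpiderFriend:

```php
<?php
// Hypothetical helper, not part of vbSpiderFriend. Assumes Apache exposes
// the missing URL in REQUEST_URI, while an IIS custom 404 page receives it
// in QUERY_STRING in the form "404;http://host:80/path".
function requestedPath(array $server)
{
    $query = isset($server['QUERY_STRING']) ? $server['QUERY_STRING'] : '';
    if (strncmp($query, '404;', 4) === 0) {
        // Strip the "404;" prefix, then keep only the path component.
        $path = parse_url(substr($query, 4), PHP_URL_PATH);
        return is_string($path) ? $path : '';
    }
    return isset($server['REQUEST_URI']) ? $server['REQUEST_URI'] : '';
}

// IIS-style input
echo requestedPath(array('QUERY_STRING' => '404;http://www.example.com:80/forum/archive/5')), "\n";
// Apache-style input
echo requestedPath(array('REQUEST_URI' => '/forum/archive/5', 'QUERY_STRING' => '')), "\n";
```

In the real script you would pass $_SERVER (or $HTTP_SERVER_VARS on very old PHP) and split the returned path on "/" as before.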

JohnL
07 Oct 2001, 03:33
Originally posted by Razzie
Small Bug with displaying private forums:

Excellent catch! Razzie, you da man, grandma!

CoolaShacka
07 Oct 2001, 22:55
Originally posted by Razzie


The ErrorDocument target should be a URL path relative to your web root, not a unix filesystem path!

Replace with:
ErrorDocument 404 /archive/index.php
Thank you Man. :D
All works fine now. :)

Pk
21 Oct 2001, 14:49
Originally posted by CoolaShacka

Thank you Man. :D
All works fine now. :)


My god, so many pages, anyway is this hack done now?
If so could someone just post it on the hack release forum? :]

OmniSlash31
25 Oct 2001, 18:15
I just started my forum, but when I go to, for example, vbb/archive/2, it goes to my error 404 page :confused:

JohnL
25 Oct 2001, 18:24
Originally posted by OmniSlash31
I just started my forum, but when I go to, for example, vbb/archive/2, it goes to my error 404 page :confused:

You need to redirect 404 errors to the /archive/index.php file as has been discussed in this thread.

OmniSlash31
25 Oct 2001, 19:03
Ok, i have a folder called "vbb" on my server for the forum.
So, .htaccess should be
ErrorDocument 404 /vbb/archive/index.php, right?

CoolaShacka
25 Oct 2001, 19:08
No.
.htaccess should be
ErrorDocument 404 /archive/index.php, but you have to put the .htaccess in your vbb folder

Chen
25 Oct 2001, 19:09
OmniSlash31 please enter your license information in your profile. Thanks.

Chamber
01 Nov 2001, 11:59
Excellent hack idea!! Wanna install it - is it good to go on V2.2.0 ?

MrLister
01 Nov 2001, 14:31
it sure is... it would work on any..

JJR512
01 Nov 2001, 23:42
Does anybody know how to get this to work with sub-domain addresses (like forums.jjr512.com, instead of www.jjr512.com/forums)?

eva2000
02 Nov 2001, 00:08
Originally posted by Chamber
Excellent hack idea!! Wanna install it - is it good to go on V2.2.0 ?

yup, working on my 2.2.0 forums

as to subdomain, not sure; i think it's the same as for a domain, you just need to put the modrewrite stuff in the virtualhost entry for the subdomain?

Logtenberg
17 Nov 2001, 01:13
Has anyone had their archive site crawled by google yet?

mjames
29 Dec 2001, 17:44
Got it working!

djr
31 Dec 2001, 02:11
Although I installed this a few days ago (which was a breeze actually!), I'm a little confused now where to put the .htaccess file.
In the instructions Overgrow put it like this:


1) Download the attached Zip.
2) Open class.mysql.php and put your database login info at the top.
3) Create a new directory called archive under your forum, like /forum/archive
4) Open the included .htaccess and change the Error 404 to your new archive path.
5) Open index.php and change the self-explanatory variables at the top of the file.
6) Upload all 3 files to your archive directory.
7) Submit /forum/archive/index.php to search engines and watch em crawl


However, in this thread I see several mentions of putting the .htaccess in the forums path (/forums instead of /forums/archive).

Where do I put the .htaccess? In /forums or in /forums/archive?

- djr

FWC
31 Dec 2001, 03:58
Originally posted by djr
Where do I put the .htaccess? In /forums or in /forums/archive?

- djr

Put Overgrow's .htaccess file in /forums/archive.

djr
31 Dec 2001, 12:47
FWC,

Thanks, that clarifies a bit :-) Another question though: do I need some setting (.htaccess/meta) elsewhere to prevent Google/others from spidering the normal forums, so they spider the /archive directory instead?

- djr

LanciaStratos
06 Jan 2002, 04:49
Could someone explain what is so bad about Google considering a page to be "cloaked"? I'd really like to set that $refresh option to "1", but I'm worried about what kind of effect it could have on my forums being indexed. :(

Never mind, I found my answer here... http://www.google.com/webmasters/faq.html

eva2000
06 Jan 2002, 16:47
had to remove this hack as it allowed people to snoop into private forums by entering forum id numbers which were not displayed (invisible) on the page listings :(

djr
06 Jan 2002, 16:55
George,

This worked for me:
http://www.vbulletin.org/forum/showthread.php?postid=181959#post181959

- djr

eternal
13 Jan 2002, 21:30
does this work with the newest version of vbulletin?

djr
13 Jan 2002, 21:34
Installed here (and working: pdaclub (http://pdaclub.nl/forums/archive), dutch language) on vBulletin 2.2.1, so the answer is YES :D

eternal
13 Jan 2002, 22:57
I installed the script and it seems to work; however, I get this error

The requested URL /forums/archive/1 was not found on this server.

Additionally, a 404 Not Found error was encountered while trying to use an ErrorDocument to handle the request.

when I click on a particular link, even though I see all the categories for my website

M.h.f
13 Jan 2002, 23:12
thx

eternal
15 Jan 2002, 07:10
please anyone?

qwaz
16 Feb 2002, 01:57
Same here. I've edited access.conf to set AllowOverride All too.

Still nothing.

fury
20 Feb 2002, 05:39
I DID IT! :D :D :D

Excellent hack! I added page navigation to it though, cause there are some huge long threads on my site and it could kill the server to select all the posts in them.
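
One way page navigation like this might be done is with a LIMIT clause, so each request pulls only one page of posts. This is only a sketch: the limitClause() helper, the page size of 25 and the query shape (a post table with postid, threadid and dateline columns) are assumptions, not the code actually added here.

```php
<?php
// Hypothetical helper: turns a 1-based page number into a MySQL LIMIT clause.
function limitClause($page, $perPage = 25)
{
    $page    = max(1, (int) $page);      // junk or missing input falls back to page 1
    $perPage = max(1, (int) $perPage);
    $offset  = ($page - 1) * $perPage;
    return "LIMIT $offset, $perPage";
}

// Assumed query shape, for illustration only.
$threadID = 1143;
$query = "SELECT postid FROM post WHERE threadid=$threadID "
       . "ORDER BY dateline ASC " . limitClause(3);
echo $query, "\n";
```

The page number would come from an extra URL segment, in the same way the script already reads the forum id and week.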

Brian
21 Feb 2002, 01:05
Hello,

I am writing to see if someone would be interested in taking this hack one step further to allow for the creation of html pages for search engines to index. This would prevent the problem of some search engines overloading a server by trying to index the dynamic content too fast.

What I propose is a script that makes pages identical to what we have with this script, except that it actually writes out static html pages matching the dynamic ones and puts them into a similar folder structure.

The script would need to be able to cycle through all of the posts in batches initially, so it doesn’t cause problems by doing them all at once, and it should then be run via a cron job or manually every so often to archive new posts, or re-archive posts edited since the last run.

I feel this would meet a lot of the needs of sites on shared servers, and if need be would be willing to pay for this to be done.

Let me know if anyone is interested.

-Brian

JJR512
21 Feb 2002, 02:15
Originally posted by eva2000
had to remove this hack as it allowed people to snoop into private forums by entering forum id numbers which were not displayed (invisible) on the page listings :(
This doesn't happen for me. I just logged out of my board and tried putting in some private forum ID numbers. In all cases, even though the URL in the address bar showed the private forumid, the page actually went to forumid 1. Regardless of where I started, if I put in a private forumid number, it went to the first forum.

buro9
22 Feb 2002, 08:06
Originally posted by Brian
Hello,

I am writing to see if someone would be interested in taking this hack one step further to allow for the creation of html pages for search engines to index. This would prevent the problem of some search engines overloading a server by trying to index the dynamic content too fast.

What I propose is a script that makes pages identical to what we have with this script, except that it actually writes out static html pages matching the dynamic ones and puts them into a similar folder structure.

The script would need to be able to cycle through all of the posts in batches initially, so it doesn’t cause problems by doing them all at once, and it should then be run via a cron job or manually every so often to archive new posts, or re-archive posts edited since the last run.

I feel this would meet a lot of the needs of sites on shared servers, and if need be would be willing to pay for this to be done.

Let me know if anyone is interested.

-Brian

I already have what we've called a Cache Cannon on one of our sites.
All it does is whip through the database, and for all search query results for a particular query it will cannon hundreds of small files onto the docroot.

Now, we use this to pre-generate content on our site, thus massively reducing the database hits for the dynamic content (very important for us, we get over 100,000 unique users per hour on our top content sites).

Once a day it is fired and the site is made fresh. News is fired hourly or manually when needed.

In our application it's good for security too, since the database resides on a different machine and no access is needed by the webserver (the Cache Cannon resides on an interim machine and simply copies the files to the web server).

...anyhow, yesterday when I saw this thread I realised that its main flaw is that it is too slow in generating the content for the spiders, that the spiders would prefer static html so they can trawl faster, and that the pages were not optimised enough for a high ranking on the search engines. Also, you probably get hit by several spiders a day (take a peek at your logs for requests for /robots.txt for an indication), and the work to pre-generate is probably less than the work to serve it all each time.

Thus I will probably be making another implementation of this hack 100% new, but based upon our existing Cache Cannon theory.

It will create a single html file for each post, and you could fire it for given date ranges (reducing server load) or forums, at given time intervals (manually, say weekly) or via a cron job.

I shall also include client side javascript in these files to redirect a user to the proper version of the post in the appropriate forum onload. This should be googlebot safe as I believe it ignores client side script, but will ensure that when a user comes from a search engine, they are simply bounced to the correct entry in the real forum.

eva2000, I shall endeavour to make sure that this does not generate files for private forums. This will be perfect for you since entering private forum id's would not be possible, since the files are static. Though it should be noted that as this will generate static files... should you later turn a public forum private then you would have to delete those files manually, hence including the $forumId in the proposed folder structure.

Proposed storage:

The folder structure...

$forumpath/archive/$forumId/$year/$month/$day

For the file names...

$postId.htm
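
A minimal sketch of how a path in that layout could be built from a post's forum id, post id and unix timestamp. The archivePath() helper, the zero-padded month and day, and the use of GMT are all assumptions made for illustration; the proposal above does not specify them.

```php
<?php
// Hypothetical helper illustrating the proposed layout:
// $forumpath/archive/$forumId/$year/$month/$day/$postId.htm
function archivePath($forumpath, $forumId, $postId, $dateline)
{
    // gmdate() keeps the result independent of the server's timezone.
    return sprintf(
        '%s/archive/%d/%s/%d.htm',
        rtrim($forumpath, '/'),
        $forumId,
        gmdate('Y/m/d', $dateline),
        $postId
    );
}

echo archivePath('/forum', 3, 42, 1010469600), "\n";
// prints /forum/archive/3/2002/01/08/42.htm
```

Keeping $forumId as the first folder under archive means a forum that later turns private can be removed by deleting a single directory.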

I shall start this on Tuesday next week, and hope to have it finished by Saturday next week (I'd do it sooner, but it's my birthday and this isn't that important!).

The files will be standalone and I shall develop them with vb v2.2.2 though as I shall only be accessing user, post, thread and forum (I guess... I'll have to look at the schema) this should be backwards compatible to at least 2.x boards. Though I will only be supporting the latest version at any time.

If I run into trouble or need assistance with the schema I shall let you know.

Cheers

David K

http://www.buro9.com/forum/

Brian
27 Feb 2002, 02:03
I just wanted to touch base to see if you had yet worked on this.

-Brian

buro9
27 Feb 2002, 10:18
Started to put the basics in place last night whilst building another PC.

Got reasonably far with the function that will dump files on the docroot. Just subject to load testing for that.

Also built the query that will extract all posts for dumping... have been working on this to make sure it excludes private forums... need to install foxserv at home to test this on a test forum.

I think I shall have the back end fully done over the evenings this week. The front end will be the thing that actually takes time, but I'm hoping that's just gonna be Saturday and not need more work. Problems stem from wanting to decrease server load by breaking the generation into manageable amounts (monthly or by forum).

I'll let you know when I have something substantial, and then we can start a new thread here for discussion whilst it's improved.

I think it's realistic to say it should be ready as a Beta over this weekend, and that a Release version should follow next week once everyone is sure that it does what they want it to (though I'll not be including code to toast bread).

Cheers

David K

Brian
27 Feb 2002, 13:43
Wow cant wait!

rawnet
05 Mar 2002, 13:30
How did this go, Buro? I'm looking for a solution like this which also works on Win2k (without htaccess, etc). I did a search for Cache Cannon as well but couldn't find it.

buro9
05 Mar 2002, 15:02
I've e-mailed Brian offline about this, but in essence it's built.

What I have thus far is an adequate interface offering caching for:

All Forums
Specific Forums
Specific Forums + Sub Forums
Within the past x days.

The Cache Cannon then will fire for all applicable posts, and uses a template to render the display in the html files.

The only missing thing is the final parsing through all resultant folders and files, constructing the index.htm files that will tie it all together for the spiders... and I have plans on the best way to do this already.

Awaiting feedback from Brian, but if you wish I can send you an example of the current code tonight and you can offer your comments on how to progress it.

I do not want to release it until it works fully on the backend, I'm not bothered by cosmetic things at the moment (since that will be template driven and user adjustable), just that it all works a dream... if you wish to be a private beta tester and help me push it forward, then get in contact.

Cheers

David K

Brian
05 Mar 2002, 15:22
It all sounds great! Cant wait to test it out :)

If its ready to test, you can email me at Brian@FutureQuest.net

-Brian

Brian
09 Mar 2002, 01:53
I just wanted to follow up to see if you have a version available for us to download yet.

Thanks,
Brian

buro9
09 Mar 2002, 06:45
yeah, sorry!

what with the upgrade to 2.2.3b, i lost a few evenings whilst i upgraded and re-applied hacks for the four boards i run.

i'm busy this weekend, but i've booked monday off work specifically to add the final touches (the indexer). i'll open a thread in beta's and place a link to it from here on monday afternoon.

so you can download and try it then ;)

cheers

david k

Brian
09 Mar 2002, 15:19
cool

buro9
11 Mar 2002, 19:39
OK, I have created a thread in beta's.

http://www.vbulletin.org/forum/showthread.php?s=&postid=228358#post228358

Having spent the day doing nothing that I wanted, but simply upgrading everything to 2.2.4 I am fuming at vb...
I hope to find time to finish this hack just as soon as vb can release a version that is stable!

limey
11 Mar 2002, 21:24
I feel like a jackass...this hack used to work for me until I recompiled apache...

Now the index page pulls up the forums, but the archive pages end up with 404 errors.

I have tried:

-changing the httpd.conf AllowOverride All
-adding a special directory addition to my domain.conf file..
-restarted apache a bunch of times.

.htaccess is setup as
ErrorDocument 404 /archive/index.php

I dont use srm.conf or access.conf. Everything is in httpd.conf.

I recompiled apache and php last night. Maybe this has something to do with it? I doubt it though. I did recompile php without the --with-apache flag and I still don't know if that matters.

thanks.

eva2000
12 Mar 2002, 23:50
Originally posted by buro9


eva2000, I shall endeavour to make sure that this does not generate files for private forums. This will be perfect for you since entering private forum id's would not be possible, since the files are static. Though it should be noted that as this will generate static files... should you later turn a public forum private then you would have to delete those files manually, hence including the $forumId in the proposed folder structure.

Proposed storage:

The folder structure...

$forumpath/archive/$forumId/$year/$month/$day

For the file names...

$postId.htm

I shall start this on Tuesday next week, and hope to have it finished by Saturday next week (I'd do it sooner, but it's my birthday and this isn't that important!).

The files will be standalone and I shall develop them with vb v2.2.2 though as I shall only be accessing user, post, thread and forum (I guess... I'll have to look at the schema) this should be backwards compatible to at least 2.x boards. Though I will only be supporting the latest version at any time.

If I run into trouble or need assistance with the schema I shall let you know.
i knew there's a reason i should use email notifications :o

how about having the ability to rerun a script to regenerate the static html files if you later change a public forum to private ? you could have a forumid setting in the script which you can edit to either

1. remove all static html files and recreate the static files based on new forumid settings

OR

2. remove only the static html files for the forumid which went private

it would also be nice to have the option to set the path where the static html files are to be placed

buro9
13 Mar 2002, 06:32
You should really post on the thread in the beta's forum.
I hate to have hijacked this one, which is for someone else's release.

Anyhow, deleting by forumid, threadid, or time range is definitely something that could be accommodated, since it would only really mean having to delete certain folders and their contents.

Though to be honest, this would be something built after the initial release.

Phase 1 will concentrate on generating the files.
Phase 2 can be the management of them, including deletions.
Phase 3 can be any advanced formatting and/or allowing multiple templates for cross-platform archives.

Cheers

David K

drazq
14 Mar 2002, 17:02
Hi,

I just installed the hack - and it works ...

however, there seems to be a bug in it .. Not all messages are shown. Eg. in a week with 10 messages, maybe 6-7 are shown in the archive ..

any clues to why I have this problem?

- draz Q.

buro9
14 Mar 2002, 20:29
Originally posted by drazq
Hi,

I just installed the hack - and it works ...

however, there seems to be a bug in it .. Not all messages are shown. Eg. in a week with 10 messages, maybe 6-7 are shown in the archive ..

any clues to why I have this problem?

- draz Q.

Not really, it's hard to determine the reason why not all of the posts were generated... the following spring to mind:

The 7 day date range is too specific, and your calculation of 10 in a week is based upon a broader range.
Some posts may be in hidden forums, that only administrators and mods could see.

I would really need to have a dump of your database and re-create the problem here to be able to debug thoroughly.

I have tested on a forum of 49,767 posts, and when I simply right-clicked the archive folder, the file count matched perfectly. That said, I do not have hidden forums... but the queries are based upon guest access, so these should be excluded from any caching.

For a proof of concept, extend the days younger range to several thousand, this should act as a catch-all and generate for all posts ever (it takes a while!, make a cup of tea)... this way you can verify against the not-logged in users post count on the forum home page.

Cheers

David K

drazq
14 Mar 2002, 21:15
Well,

Just now I posted a new post in one of my open forums, and it didn't show up in 2002 - March - Week 3 or Week 4. It should be there right ..

- draz Q.

pigsy
14 Mar 2002, 23:22
Those folks that have installed this hack, have you seen a significant increase in traffic coming from Google?

buro9
15 Mar 2002, 06:10
Originally posted by drazq
Well,

Just now I posted a new post in one of my open forums, and it didn't show up in 2002 - March - Week 3 or Week 4. It should be there right ..

- draz Q.

Without an indexer I'm unsure where you're looking for that information. The directory structure created in that version doesn't have week numbers!?

The indexer isn't yet in the hack, which is why its status is only Beta.

At the moment it will produce the files on the docroot for each and every post... but there are no index.htm files tying them all together.

In response to pigsy, this means that the few people who have installed would not be seeing gains yet, as no spider would be able to crawl a non-indexed directory structure.

The indexer is work for me next week.
Latter part of next week is improvements to the templates to optimise them for search engines.

Again... unless you are a developer, there's no point in your having this script yet... when it's finished it will be in the normal Full Releases forum and then I shall support it fully. At the moment it is a development Beta, to find bugs on various platforms and take suggestions for improvements prior to its completion.

I have never finished a piece of code construction without a Beta review stage, and just because this is a hack it doesn't mean I'm going to lower my coding standards for the testing and release process.

When it's finished, then go ahead and install and criticise ;)

Cheers

David K

drazq
15 Mar 2002, 10:19
pigsy: I should think so. If you have a message board on a certain topic, and some users discussing that topic, then you will have a lot of messages - with a lot of keywords - all indexed in Google.

So .. people searching on the topic should be able to find one of your messages :)

.. *bump* .. anyone know what my problem is?

- draz Q.

drazq
15 Mar 2002, 10:25
Oops .. buro posted a second before me! :)

>Without an indexer I'm unsure where you're looking for that
>information. The directory structure created in that version
>doesn't have week numbers!?

I'm browsing with IE! :) .. we are talking about the same hack here? Overgrow's hack? ..

I know it doesn't create any .html files, it just forwards the 404 response to that single index.php and retrieves the messages from the db using info from the $REQUEST_URI variable ..

Anyhow, the messages should be there if there isn't some error in the way the messages are retrieved from the database ..

- draz Q.

buro9
15 Mar 2002, 13:09
lol, this is precisely why I didn't want my hack tagged onto the end of overgrow's!

the last few pages are discussing a new hack that produces static files... this hack is in beta and can be viewed here:
http://www.vbulletin.org/forum/showthread.php?s=&threadid=36000

the difference is that overgrow's relies on the database, whereas i've chosen to produce static files and rely on those.

if you have issues with overgrow's hack, you're in the right place. any issues with my hack... move over to the other thread.

cheers

David K

drazq
15 Mar 2002, 13:19
hehe .. :) ok .. well, overgrow, are you alive?? :)

- draz Q.

dlst
23 Mar 2002, 15:37
Hi Overgrow, thanks for the hack and all the hard work.

I'm having the same trouble eva2000 reported a while back, but the fix doesn't work for me, and I'm stumped. The URL to the archive is:

http://www.offshoreonly.com/forum/archive/index.php

I get "no posts" page no matter which week I choose. I did what you suggested below, and here is a sample query:

SELECT title,threadid,lastpost FROM thread WHERE lastpost > '1010469600' AND lastpost < '1010988000' AND forumid='3' ORDER BY dateline ASC

Looks fine to me, and indeed, when run in phpmyadmin it returns no errors, but oddly, it also returns no rows. I thought this might be a date problem but the date format on the boards is stock, and is the US format.

Everything else works, including when you enter in a manual thread id:

http://www.offshoreonly.com/forum/archive/3/2002/01/2 (no posts)

http://www.offshoreonly.com/forum/archive/3/2002/01/2/1143 (there's a thread)

I don't really know if this is a valid way of accessing them, or if it really helps to know, but it seems that everything is working but the query to show a list of thread titles...

Anyway, any help you could offer would be greatly appreciated.

-dlst

Originally posted by Overgrow
Does it tell you "No posts, please go back" ?

Does it give you a totally blank screen?

or does it at least show what forum you are in?

This change is just for eva to troubleshoot... FIND


echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread
WHERE lastpost > '$ts1' AND lastpost < '$ts2'
AND forumid='$forumID' ORDER BY dateline ASC";


change to


echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread
WHERE lastpost > '$ts1' AND lastpost < '$ts2'
AND forumid='$forumID' ORDER BY dateline ASC";

echo $query;


Then when it spits the query out, run it in phpmyadmin and see if it is a valid query.

veedee
05 Apr 2002, 20:24
Is this hack still recent?
Will it work on my version 2.2.4
Can I just use the files listed in the first post?

So many questions...

Learner29
02 May 2002, 02:19
and the question remains open .....

we have two hacks here.....

Overgrow's hack that I truly do not have the courage to install as so many users are complaining about the no-post and other- similar errors....

and now we have the new hack of buro9

Like everybody else, I am going to check buro's hack.

But, what about overgrow's ??? is the hack available for download the final one ???

MyMatchmakerCom
21 May 2002, 09:37
<< url removed >>

Get a 500 error and I tried just about every kind of path and still no cigar.

You should write a self install script for it please.

DarkReaper
22 May 2002, 21:15
Is there a way to get it to know which forums are private automatically, and not display those? I've got too big of a forum to manually maintain all the private forum ids.

Pilot
23 May 2002, 19:26
Absolutely, it would be awful (and very easy) to forget to add a new private forum to the exclude list and find your private posts indexed by search engines, you might even get sued for breach of privacy and you can never get rid of the things once in the search engines. Really a big risk.

MUCH safer to list the forums that you want indexed and have the code ignore any others. No risk of that mistake then.
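
A minimal sketch of that allow-list idea, under the assumption that the public forum ids are kept in a plain array at the top of the script. The isIndexable() helper, the $publicForums name and the ids in it are all hypothetical:

```php
<?php
// Hypothetical allow-list check: only forum ids named here are ever shown.
function isIndexable($forumID, array $publicForums)
{
    // Reject anything that is not a plain run of digits.
    if (!ctype_digit((string) $forumID)) {
        return false;
    }
    // Compare as integers so "05" and "5" are treated as the same id.
    return in_array((int) $forumID, $publicForums, true);
}

$publicForums = array(1, 2, 3);   // assumed ids, for illustration only

var_dump(isIndexable('2', $publicForums));   // bool(true)
var_dump(isIndexable('05', $publicForums));  // bool(false)
var_dump(isIndexable('9', $publicForums));   // bool(false)
```

With this approach a newly created forum stays out of the archive until someone deliberately adds its id.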

tpearl5
27 May 2002, 15:29
I finally got everything indexed on google! YAY! The only problem I see is this:

http://www.google.com/search?num=100&hl=en&lr=&ie=UTF8&oe=UTF8&safe=off&q=+site%3Adegster.com+midibuddy+overgrow&btnG=Google+Search

See all the 'No posts, please go back' ? I'm afraid google indexed too many of these pages and not enough of the actual threads. Is there a way to turn off the pages without topics?

DarkReaper
28 May 2002, 22:19
Originally posted by DarkReaper
Is there a way to get it to know which forums are private automatically, and not display those? I've got too big of a forum to manually maintain all the private forum ids.

...

apfeifer
01 Jun 2002, 04:38
For those of you who were having posts not show up, you may have to change some code. The original skips the last day of each of the 'weeks', because $ts2 is midnight at the start of that day. To fix this, find:

$date1 = "$month/$fw/$year";
$date2 = "$month/$lw/$year";

$ts1 = strtotime("$date1");
$ts2 = strtotime("$date2");

echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread WHERE lastpost > '$ts1' AND lastpost < '$ts2' AND forumid='$forumID' ORDER BY dateline ASC";

And replace it with:

$date1 = "$month/$fw/$year";
$date2 = "$month/$lw/$year";

$ts1 = strtotime("$date1");
$ts2 = strtotime("$date2");
$ts3 = $ts2 + 86399;

echo spacer(2)."Dates: $date1 to $date2<br>&nbsp;<br>";

$query = "SELECT title,threadid,lastpost FROM thread WHERE lastpost > '$ts1' AND lastpost < '$ts3' AND forumid='$forumID' ORDER BY dateline ASC";


This may not be the problem everyone is having, but it is the problem I was having. Hope it helps someone!
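
A small sketch of what that change does, assuming dates in the script's MM/DD/YYYY form. The dates and the sample post time are made up, and the timezone is pinned to UTC only so the result is reproducible:

```php
<?php
// strtotime() on a bare date returns midnight at the START of that day, so
// "lastpost < $ts2" leaves out every post made during the last day itself.
date_default_timezone_set('UTC');   // fixed zone so the sketch is reproducible

$ts1 = strtotime('01/08/2002');     // MM/DD/YYYY, as in the script
$ts2 = strtotime('01/14/2002');
$ts3 = $ts2 + 86399;                // 23:59:59 on the last day

$lastpost = strtotime('01/14/2002 15:30:00');   // a post on the last afternoon

var_dump($lastpost > $ts1 && $lastpost < $ts2);  // bool(false): missed by the original bounds
var_dump($lastpost > $ts1 && $lastpost < $ts3);  // bool(true): caught by the widened bounds
```

Because the comparison is a strict "<", a post stamped at exactly 23:59:59 would still be left out. Adding 86400 instead of 86399 would close that one-second gap.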

apfeifer
01 Jun 2002, 04:44
Another thing, if you don't want it to display categories that can't contain threads find:

$query = "SELECT title,forumid FROM forum$whereclause ORDER BY forumid ASC";

And change it to:

$query = "SELECT title,forumid FROM forum$whereclause AND cancontainthreads='1' ORDER BY forumid ASC";


Just another little nit-picky thing I wanted to add.:)
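
One caveat: appending AND only gives valid SQL if $whereclause already contains a WHERE. The contents of $whereclause are not shown in this thread, so the following is just a sketch of a safer way to combine conditions. The buildWhere() helper and the NOT IN filter are hypothetical:

```php
<?php
// Hypothetical helper: joins any number of conditions into a WHERE clause,
// returning an empty string when there are none.
function buildWhere(array $conditions)
{
    return $conditions ? ' WHERE ' . implode(' AND ', $conditions) : '';
}

$conditions = array("cancontainthreads='1'");
// Assumed example of an extra filter; the real $whereclause is not shown here.
$conditions[] = "forumid NOT IN (8,14)";

$query = "SELECT title,forumid FROM forum" . buildWhere($conditions) . " ORDER BY forumid ASC";
echo $query, "\n";
```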

Thomas P
01 Jun 2002, 16:17
Too bad, doesn't work for me, it simply reloads the page again and again... :(

my code:

<?

/*

vbSpiderFriend v1.1b by ~shabang~


*/

$privateForums="|8|14|9|15|16|12|19|20|30|35|36|"; // Hidden forumids, enclosed by | pipes
$firstPost="11/05/2000"; // MM/DD/YYYY of your forum's first post
$spacer="&nbsp;&nbsp;"; // The characters or spaces to use as one indent
$forumURL=""; // Base URL of your forum
$refresh=0; // Change this to 1 if you want the archive to automatically
// forward the user if they come in from a search engine.
// This option is initially turned off because Google considers
// this to be 'cloaking.'

$homeURL="http://www.mcseboard.de"; // The link URL for the top of the page
$homeLink="Zurück zu MCSEboard.de - Deutsches MCSE Forum zu Windows 2000 & XP"; // The link text for the top of the page

$keywords="(...some stuff...)"; // SET META INFORMATION HERE
$description="(...some stuff...)"; // The script will add to these fields
$pagetitle="MCSE Forum zu Windows 2000 und Windows XP - MCSEboard.de"; // with the info from the thread

include("class.mysql.php");

// NOTHING MORE TO EDIT BELOW /////////////////////////////////////////////////////////////


$baseURL=$forumURL."/archiv";
$dateSplit = split("/",$firstPost);
$firstMonth = preg_replace("/^0/","",$dateSplit[0]);
$firstYear = $dateSplit[2];
$currentYear=date("Y",time());
$currentMonth=preg_replace("/^0/","",date("m",time()));
$fullHomeLink= "<center><a href=\"$homeURL\"><b>$homeLink</b></a><br>&nbsp;<br>";

(...)


Any ideas?

http://www.mcseboard.de/archiv/

Thanks,
-Tom

diettalk
01 Jun 2002, 18:58
Try this line...

$forumURL=""; // Base URL of your forum

should be something like:

$forumURL="/forums"; // Base URL of your forum


Originally posted by Thomas P
Too bad, doesn't work for me, it simply reloads the page again and again... :(

Thomas P
01 Jun 2002, 21:49
Originally posted by diettalk
Try this line...

$forumURL=""; // Base URL of your forum

should be something like:

$forumURL="/forums"; // Base URL of your forum




Thanks for helping, I do have vB installed on my Root, so I tried

$forumURL="http://www.mcseboard.de"; // Base URL of your forum

and

$forumURL="/"; // Base URL of your forum

Both had the same effect as the empty one, strange... :(

Thomas P
02 Jun 2002, 11:48
It works now :)


Yeah, cool and now it says: no posts go back :ermm:


Update: OK, it now works correctly some of the time(?)
I think I still have to tweak it here and there...

Till
09 Jun 2002, 19:45
Hi,
I have an odd error. Whenever I click on a link from the index page (where it lists all the forums), I get a 404. (I took out the ErrorDocument because it would just reroute me to the archive's index, and I wanted to see whether the problem was your script or a real 404. Anyway ...)

I looked at your script and saw that you are reading everything from $REQUEST_URI, instead of using $PATH_INFO. As I would have guessed.

I would parse it like this:


list($dummy,$forumID,$year,$month,$week,$threadID)=explode('/',getenv('PATH_INFO'));


$dummy being "archive", the rest is self-explanatory. (Equal to your variables.)
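
The PATH_INFO parsing suggested above can be sketched a little more defensively. This is only an illustration, not code from vbSpiderFriend: the function name parseArchivePath is made up, and the segment order (forum, year, month, week, thread) is assumed from the archive URLs shown in this thread. Because PATH_INFO normally starts with a slash, a plain explode() puts an empty string in the first slot, so the sketch trims the slashes first instead of relying on a dummy variable.

```php
<?php
// Hypothetical helper (not part of vbSpiderFriend): split an archive path
// such as "/2/1999/08/1/2274" into named parts.
function parseArchivePath($path)
{
    // Drop leading/trailing slashes so explode() yields no empty first element.
    $trimmed = trim($path, '/');
    $parts = ($trimmed === '') ? array() : explode('/', $trimmed);

    // Pad to five entries so missing trailing segments come back as null.
    $parts = array_pad($parts, 5, null);

    return array(
        'forumID'  => $parts[0],
        'year'     => $parts[1],
        'month'    => $parts[2],
        'week'     => $parts[3],
        'threadID' => $parts[4],
    );
}

// PATH_INFO is only set for requests shaped like /archive/index.php/2/1999/...
// getenv() returns false when it is missing, which the cast turns into ''.
$info = parseArchivePath((string) getenv('PATH_INFO'));
```

Missing segments come back as null, so a forum-level URL and a thread-level URL can go through the same function.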

My server runs Apache 1.3.x with PHP 4.x.x (installed as an Apache module). I have no real clue how to fix it, since I am a bit lost in your script (reading someone else's code can be a tough one indeed). :-/

Any help is appreciated. :)

Thanks!

Till

P.S.

A "bug", that I might add.

I have categories and forums. So for example, "Site related" holds "News" and "Chat", yet they all get output on the same page although "Site related" is not allowed to have any threads, let alone posts.

If you have a look at the vBulletin database, table "forum", there is a field called "cancontainthreads", whose value is 0 if it's a category and 1 if it's an actual forum. (Kind of hard to explain, but I am sure you get the drift.) So my suggestion would be to select the categories first and group the forums under them.

Maybe not really a bug, more a feature request. ;)

mvigod
13 Jul 2002, 19:10
Is the 404 page it's sending out going to be archived by Google? When google spiders it they will get a page of text (the error page delivered by this script) but with a 404 header which is "not found"....because of this will they bother to index it? Shouldn't this be a rewrite rule to redirect google transparently rather than a 404 error?

mvigod
18 Jul 2002, 15:51
Has anyone ever witnessed Google going more than one level deep? Since they get the 404 response code with this hack, they don't index those pages or go any deeper. They are not in the habit of indexing "Not Found 404 error" pages or following the links in them. They will follow the first set of links from the index.php, but each of those really doesn't exist, so the 404 error is the end of the line even though the pages have content.

I think this has to be rewritten with mod_rewrite to route the requests for all pages back to index.php, so that a 200 OK response code is returned and the pages get indexed.
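
A possible alternative to mod_rewrite that stays inside the script is to send an explicit status line from PHP. The sketch below assumes PHP runs as an Apache module and that a status line sent with header() replaces the 404 that Apache set for the error document; that is worth confirming with a HEAD request against your own server. The function name archiveStatusLine and the $hasContent flag are made up for illustration and are not part of vbSpiderFriend.

```php
<?php
// Hypothetical helper, not part of vbSpiderFriend: choose the status line
// to send depending on whether the requested archive page has content.
function archiveStatusLine($hasContent)
{
    return $hasContent ? 'HTTP/1.0 200 OK' : 'HTTP/1.0 404 Not Found';
}

// Assumed usage near the top of index.php, before any output is printed.
// $hasContent is a placeholder for whatever check the script already makes.
$hasContent = true;
if (!headers_sent()) {
    header(archiveStatusLine($hasContent));
}
```

Keeping a real 404 for paths with no posts means genuinely missing pages are still reported as missing.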

With all these posts didn't anybody realize this?

James Cridland
21 Jul 2002, 14:32
What might be interesting is to check for Google - or "search" - in the referrer and, instead of giving a <META REFRESH, properly redirect the visitor to the real version of the page automatically.

That means that Google won't be given the changed location when it spiders, but any normal user, following a Google search, will get the proper redirect.

You can do this by...

1. Changing $refresh=0 to $refresh=1

2. Find $forwarding=1

Replace:
$forward= "<script language=\"javascript\">document.location.href='$forumURL/showthread.php?threadid=$threadID';";
$forward.= "document.write('<font face=verdana,arial size=2><b><center>Please wait while the new version is loading...</center></b></font><br>&nbsp;<br>');</script>";
$metas="<META HTTP-EQUIV=\"REFRESH\" CONTENT=\"0;$forumURL/showthread.php?threadid=$threadID\"></head><body></body></html>";
$forwarding=1;

with
header ("Location: $forumURL/showthread.php?threadid=$threadID");
exit;

3. Just above that chunk of text, find
Replace:
if ((!stristr(getenv(HTTP_REFERER),$homeURL)) or (strlen(getenv(HTTP_REFERER)) < 1)) {
with
if ((!stristr(getenv("HTTP_REFERER"),"text")) or (strlen(getenv("HTTP_REFERER")) < 1)) {
where text is a bit from your URL. I.e. if you're hosted on www.mydomain.com then put "mydomain" in there.
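
The intent described at the top of this post (forward real visitors who arrive from another site, leave spiders on the text version) can be sketched as a small standalone check. This illustrates the idea only and is not the condition the script actually uses: the function name shouldForward is made up, and it assumes spiders send no referrer. It compares against the referrer's host rather than the whole URL, so a search query that happens to contain your site name is not mistaken for an internal link.

```php
<?php
// Hypothetical helper, not taken from vbSpiderFriend.
function shouldForward($referer, $siteFragment)
{
    // Spiders and direct visits normally send no referrer: show the archive.
    if (!is_string($referer) || $referer === '') {
        return false;
    }

    // Look at the host only; parse_url() returns null or false when it
    // cannot find one, in which case we do not forward.
    $host = parse_url($referer, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }

    // Forward only when the visitor came from somewhere other than our site.
    return stripos($host, $siteFragment) === false;
}

// Assumed usage; $forumURL and $threadID are the script's own variables.
// if (shouldForward((string) getenv('HTTP_REFERER'), 'mydomain')) {
//     header("Location: $forumURL/showthread.php?threadid=$threadID");
//     exit;
// }
```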

What does the panel think?

veedee
21 Jul 2002, 15:57
I hate to keep asking this in all the hacks but..

I'm running 2.2.6 will it work for this?

Cheers,

veedee

James Cridland
21 Jul 2002, 19:47
Yes it will.

Incidentally, I've used this hack as a basis to create an Avantgo / iSilo PDA version of my forums... check them out at http://forums-lite.mediauk.com/

I've rather heavily hacked this out, though.

(Amusingly, this is not search-engine friendly...!!)

EchoHype.com
25 Jul 2002, 09:39
Great hack mate!

Learner29
27 Jul 2002, 19:18
Oh, I am sorry to bother you, but I really tried to figure this out by myself and could not!!

I installed the hack and the index.php page of the /vb/archive directory is displaying nicely.

Yet all the links on this index.php page point toward non-existent folders!!!

example

http://www.mydomain.com/vb/archive/29
and
http://www.mydomain.com/vb/archive/13

but checking that archive folder by FTP, there are NO Folders called 13 or 29 ....

I am so frustrated as I DID install this hack in the past and it worked nicely then.....

Whoever would help, I would be more than grateful and thankful.

Learner29
30 Jul 2002, 20:22
I can't believe nobody bothered to help me in almost 3 days now....

Anybody ??? Any ideas ???

tiger
03 Aug 2002, 08:35
I have the same problem

Learner29
03 Aug 2002, 13:56
will any kind person step out and help a bit.. ??

James Cridland
04 Aug 2002, 14:33
All the links are supposed to link to non-existent folders, that's how this hack works.

Read carefully the bit about .htaccess

Learner29
04 Aug 2002, 16:12
James,

first thank you for your reply. I really appreciate that you are trying to help.


Second, this hack is supposed to generate an index.html file, something like a site map with links to every week of every month of every year.... See this example: http://www.overgrow.com/edge/archive/2/index.html

in my case, the directory "2" and all other same level directories do NOT exist...

Please help.
:rolleyes: :rolleyes: :rolleyes:

Learner29
04 Aug 2002, 16:14
James,

At the end of the day, the script or the hack is supposed to generate TEXT ONLY versions of your forums and your threads.

see this example and look well at the subfolders of the "archive" folder

http://www.overgrow.com/edge/archive/2/1999/08/1/2274



N.B.: Overgrow has put an "adults only" mark on his site.

James Cridland
04 Aug 2002, 16:50
Again: the pages are not supposed to exist.

The .htaccess thing catches the "this page doesn't exist" error, and runs this script instead. If the .htaccess thing doesn't work on your server, then you need to ask your ISP nicely for a custom 404 error page.

Learner29
04 Aug 2002, 20:36
I thank you very much. I will check and tell you if it works.

Learner29
04 Aug 2002, 23:13
Originally posted by James Cridland
Again: the pages are not supposed to exist.

The .htaccess thing catches the "this page doesn't exist" error, and runs this script instead. If the .htaccess thing doesn't work on your server, then you need to ask your ISP nicely for a custom 404 error page.

I have placed a .htaccess on the root, another .htaccess in the public_html folder, and another .htaccess in its subfolder "vb".

Should I make them all the same .htaccess file with that custom 404 line

ErrorDocument 404 http://www.mydomain.com/vb/archive/index.php

or should I simply delete those .htaccess files in the subfolders

I have SSH / Telnet access to my server and can delete any stray .htaccess files.

Learner29
04 Aug 2002, 23:27
Ok, I have asked and got an answer about .htaccess files.

finally, I deleted ALL of my .htaccess files, except for the one on the root.

that one, holds this ErrorDocument 404 http://www.mydomain.com/vb/archive/index.php correctly.

it should work then now....

I tried, it does not

FWC
05 Aug 2002, 01:39
You need to put an .htaccess file like this in the /forums/archive folder:

ErrorDocument 404 /forums/archive/index.php

Change forums to the correct folder name for your board directory. This is all that's required, along with the index.php and class.mysql.php files.

mvigod
05 Aug 2002, 01:56
People...I guess you missed my earlier post in this thread...the 404 not found error WILL NOT WORK! Yes you will get the content page but the engines receive a 404 header thus thinking the page they are getting is an error page and thus is NOT indexed nor are links followed within it. Don't waste your time on this with the hack as is...

Learner29
05 Aug 2002, 02:16
Originally posted by mvigod
People...I guess you missed my earlier post in this thread...the 404 not found error WILL NOT WORK! Yes you will get the content page but the engines receive a 404 header thus thinking the page they are getting is an error page and thus is NOT indexed nor are links followed within it. Don't waste your time on this with the hack as is...

My friend, this hack is one of the best hacks for search engine friendliness. I have searched thoroughly on vbulletin.org and did not find any serious, competitive alternative.

Plus, this hack worked VERY fine for sooooo many people, why not us then ???
I would highly appreciate any help.....