PDA

View Full Version : How did you solve searching in large forums?


99Percent
21 Jun 2006, 17:23
We are experiencing server performance when a user searches through our fairly large forum. The fulltext search aparently locks thread and post tables causing Apache processes to wait until the search is done. This in turn causes a surge of new Apache connections to open. If the search takes too long everyone experiences a downtime of the forum while the search is done.

What is the solution to this? How did you solve it?

The only thing I can think of right now is to install another instance of mysql on this same server and configure it as slave with replication. This instance would have its own database files and a different port.

Then configure vBulletin to use the slaveserver for the search script.

However I notice that vBulletin (in 3.5.3, function db->connect) checks to see if the master and slave have different names. This will not work if both of them are set to localhost even if they are using different ports. So at least a hack will have to be done here.

Joe
21 Jun 2006, 23:14
I would love to hear some solutions as well. :)

silvrhand
22 Jun 2006, 01:33
Put an alias in your hosts file and connect to that instead, it'll get past the localhost issue.

We are experiencing server performance when a user searches through our fairly large forum. The fulltext search aparently locks thread and post tables causing Apache processes to wait until the search is done. This in turn causes a surge of new Apache connections to open. If the search takes too long everyone experiences a downtime of the forum while the search is done.

What is the solution to this? How did you solve it?

The only thing I can think of right now is to install another instance of mysql on this same server and configure it as slave with replication. This instance would have its own database files and a different port.

Then configure vBulletin to use the slaveserver for the search script.

However I notice that vBulletin (in 3.5.3, function db->connect) checks to see if the master and slave have different names. This will not work if both of them are set to localhost even if they are using different ports. So at least a hack will have to be done here.

Paul M
22 Jun 2006, 01:43
How big is your forum, because we don't have any issues with searches - also which search method do you use ? Do you allow guests to search ?

dbembibre
22 Jun 2006, 13:15
If you dont have load problems with vBulletin search, dont touch.
I have a combination of fulltext search engine without boolean search and google domain web search.
My server a Dual Dual Xeon 2.8 Ghz 2MB cache 2 Gig of Ram and Ultra SCSI with 1.500.000 of posts and 700 - 800 constants users online has a max load of 1.5 and a media of 0.60 - 0.80

InfidelMatt
22 Jun 2006, 16:01
How big is your forum, because we don't have any issues with searches - also which search method do you use ? Do you allow guests to search ?

As 99Percent hasn't responded yet, I will (I help manage the same server).

Our forum has 3.5 million posts with 3000 or so active members, 300-500 constants. We use fulltext searching. We don't allow guests to search. Our post table is in the neighborhood of 3GB.



Here is our problem: Our load averages are just fine, but when someone runs an advanced search, it locks the entire post table for a bit and apache soon hits the max connections limit as a bunch of requests are waiting due to the locked table.

This graph explains better:

http://img20.imageshack.us/img20/6258/19th4ur.gif

As you can see, everything is just fine until someone runs a search requiring the entire post table to be locked (Blue: apache connections).

A big portion of the problem is our hardware. We are unfortunately on IDE non-raided drives so things don't move as fast as they should.

Until we resolve the hardware issues, having a slave DB on the same server will allow for searches locking the post table (now of the slave DB) while not slowing down or outright stopping those accessing the master post table nor overwhelming Apache.

We'll try the alias method recommended above. Thanks.

Matt

BoardTracker
22 Jun 2006, 16:09
Search is always going to be a problem once your board gets big but of course it is an essential service. You may want to have a look at our search mod which offloads ALL the processing (indexing and search) to our servers while the results are still fully integrated in your board. Its currently being used on some of the biggest boards out there such as boards.ie (http://boards.ie)

There could also be other issues affecting server performance such as db configs and we'd be happy to help out in any way we can.

99Percent
22 Jun 2006, 16:22
Put an alias in your hosts file and connect to that instead, it'll get past the localhost issue.
Good idea, I hadn't thought about that.

silvrhand
22 Jun 2006, 18:21
switch to fulltext search, I did last night on 2.2 million post forum and it took 22 minutes to add the first index, then about 2 mins on the second one. So be prepared for a wait.

99Percent
23 Jun 2006, 05:16
switch to fulltext search, I did last night on 2.2 million post forum and it took 22 minutes to add the first index, then about 2 mins on the second one. So be prepared for a wait.
We already have fulltext search.

dbembibre
23 Jun 2006, 08:42
We already have fulltext search.

Disbale boolean search to all users and allow to admin and mods.
Combine your fulltext search with google search (i attach a image to you)

And a link to implement http://www.vbulletin.org/forum/showthread.php?t=109906

99Percent
23 Jun 2006, 23:27
I have a question regarding table locking during searches. Does the search actually cause a table lock? If so, is it done programatically by vBulletin, or is built in into mysql itself?

Why would it need to lock ables if no modifications are actually being done to the them?

Paul M
23 Jun 2006, 23:33
mySQL locks the table(s) with a read lock, obviously because it's reading them. That will prevent any write locks until the query has finished, and as I recall - depending on your settings - those queued write locks will then cause further read lock requests to queue, thus a query queue will build up.

99Percent
24 Jun 2006, 17:26
mySQL locks the table(s) with a read lock, obviously because it's reading them. That will prevent any write locks until the query has finished, and as I recall - depending on your settings - those queued write locks will then cause further read lock requests to queue, thus a query queue will build up.
But if a read lock is issued to a table, that doesn't prevent other reads from happening, only writes, no? So I don't understand why other people cannot still browse the forum while a user is doing a search for example. Only people who are making a post would have to wait until the read lock is cleared.

What we notice happens sporadically in our forum is that a surge of new Apache connections occurs. Here are several Apache Server Status just seconds apart which I managed to capture in real time just before the server crashed:
Current Time: Wednesday, 21-Jun-2006 16:49:28 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 51 minutes 34 seconds
Total accesses: 197069 - Total Traffic: 911.2 MB
CPU Usage: u4947.26 s378.21 cu.8 cs0 - 30.4% CPU load
11.3 requests/sec - 53.3 kB/second - 4848 B/request
115 requests currently being processed, 14 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWW_WWWWWWWWWWWWWWW_WWWWWWRWWWW__WWWWW___W_______W
.................W..............................................
................................................................
................................................................
..............................



Current Time: Wednesday, 21-Jun-2006 16:49:54 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 52 minutes 1 second
Total accesses: 197189 - Total Traffic: 911.5 MB
CPU Usage: u4790.07 s365.54 cu.78 cs0 - 29.4% CPU load
11.3 requests/sec - 53.3 kB/second - 4847 B/request
139 requests currently being processed, 16 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_WWWWW_WWWW
__WWW__W_WWWWWW_WW______W__.....................................
................................................................
................................................................
..............................



Current Time: Wednesday, 21-Jun-2006 16:50:20 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 52 minutes 26 seconds
Total accesses: 197264 - Total Traffic: 911.8 MB
CPU Usage: u4748.97 s361.83 cu.77 cs0 - 29.1% CPU load
11.2 requests/sec - 53.2 kB/second - 4846 B/request
170 requests currently being processed, 15 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
_WWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_WWWWWW_WW___W__W_WW______.......
................................................................
................................................................
..............................



Current Time: Wednesday, 21-Jun-2006 16:50:47 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 52 minutes 54 seconds
Total accesses: 197299 - Total Traffic: 912.2 MB
CPU Usage: u4749.94 s361.89 cu.77 cs0 - 29.1% CPU load
11.2 requests/sec - 53.2 kB/second - 4848 B/request
199 requests currently being processed, 11 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_WWWWWWWWWWWWWWWWWWWWWW
WWWWWW_W_W________..............................................
................................................................
..............................


Current Time: Wednesday, 21-Jun-2006 16:51:39 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 53 minutes 45 seconds
Total accesses: 197505 - Total Traffic: 913.5 MB
CPU Usage: u4754.06 s362.19 cu.77 cs0 - 29% CPU load
11.2 requests/sec - 53.1 kB/second - 4850 B/request
262 requests currently being processed, 14 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW_WWWWWWWWWWWWWWWCWWWWWWWWWWWWWWWW
WW____WW_W__WW______............................................
..............................



Current Time: Wednesday, 21-Jun-2006 16:52:10 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 54 minutes 17 seconds
Total accesses: 197650 - Total Traffic: 914.0 MB
CPU Usage: u4756.07 s362.31 cu.77 cs0 - 29% CPU load
11.2 requests/sec - 53.0 kB/second - 4849 B/request
327 requests currently being processed, 10 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWCWWW__________.............



Current Time: Wednesday, 21-Jun-2006 16:56:06 CDT
Restart Time: Wednesday, 21-Jun-2006 11:57:53 CDT
Parent Server Generation: 0
Server uptime: 4 hours 58 minutes 12 seconds
Total accesses: 197721 - Total Traffic: 915.4 MB
CPU Usage: u4689.97 s356.97 cu.77 cs0 - 28.2% CPU load
11.1 requests/sec - 52.4 kB/second - 4854 B/request
350 requests currently being processed, 0 idle workers

WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWWW
WWWWWWWWWWWWWWWWWWCWWWWWWWWWWW

And then the server status request through http also fails because all the Apache slots have been filled and are busy. At this point, only a restart of Apache service restores the server to normal.

During this time, I can access other non vBulletin pages of the server, while actual vBulletin pages remain hanging, so this indicates to me that its either a PHP or a mySQL problem which is causing the Apache connections to wait and surge. We can make this happening to a degree with a search but sometimes the connections surge to some and then it falls back to normal after a couple of minutes. So we suspect that with a specific type of search is how the server ultimately crashes.

Marco van Herwaarden
24 Jun 2006, 20:00
Have a look here (and many other pages): http://dev.mysql.com/doc/refman/4.1/en/internal-locking.html

Simply put:
MyIsam uses table level locking (ie. it doesn't lock a single row, but the entire table).
There are 2 lock wait queues: WriteLock and ReadLock queue.
Many simultaneous Read-lock can be placed on a single table at once.
Only 1 write-lock can be active on a table.
Writes get priority over reads.

Scenario A with searching and nobody is trying to write to the table:
The search is a longer running read, that places a read-lock on the table.
Others also want to read from the same table. This is not a problem since you can have multilple read-locks on the same table.

Scenario B with searching, 1 want to write to the table, and many others want also to read from teh table:
The search is a longer running read, that places a read-lock on the table.
A write request is made. A write-lock can not be obtained, since there is already a read-lock in place. So write is placed in the write-lock queue and waits for the previous read to finish.
Others also want to read from the same table. Since Writes go before reads, these new reads are placed in the read-lock queue, and will not be server until the write-lock queue is empty.

If you are unlucky with scenario B, new writes will be queued, while we are still waiting for the search to finish. Meanwhile the read-lock queue is building up, etc....

In some situations it might be advisable to change some tables from MyIsam (table-level locking, no transactions) to InnoDB (row-level locking, with transactions). Using this kind of tuning, requires probably an experienced MySQL-administrator.

PS A lot more documentation on tabletypes and locking can be found on this at the mysql.com website.

BoardTracker
24 Jun 2006, 22:18
Marco van Herwaarden's review is correct in principal.
There are some things to take into account when describing MySQL's locking mechanism, including DELAYED INSERT, LOW/HIGH PRIORITY and row level locking which effect how locks and tables interact

However..
In case of FTS search, this doesn't really help. When using MyISAM table which are the foundation of MySQL's FTS built in index, locks are always Table-level locks.
Further more, when causing many reads to be queued, the load on the DB server, memory usage and overall caused slowness, can greatly drag the server performance down causing it great difficulties in getting rid of the heavy load (creating a vicious circle). This is ESPECIALLY true if you define the memory parameters incorrectly, causing Swap to be heavily used, which may create a situation where the only solution is restarting the web server, DB server or even both in some cases.

kmike
25 Jun 2006, 10:20
Basically, once you hit the magic number of 3 million posts, the MySQL fulltext search times start to increase exponentially. And it'll only get worse with every new post added. I guess every board has to go through this once it reaches 3-4 million posts.

No amount of server tweaking is going to help here. Converting to InnoDB won't help as InnoDB doesn't have a fulltext search capability. Your only choice is a slave search server, preferably dedicated, though you can try the approach from the first post, it may work if you have a sufficient hardware in your db server. I'll be interesting to know if it worked for you.

99Percent
25 Jun 2006, 22:39
I think vBulletin can solve this issue if it can programmatically create a mirror post and thread tables for searches, with delayed writes to these tables. Time to hack??? <grin>

Troy Roberts
05 Aug 2006, 04:09
We archive all threads that haven't been replied to for x days to a read only database. When a user does a search they can leave the default, which is the current table, and/or they can choose archives. If they choose archives then they must choose a category for the search and cannot search the entire board at one time.

This is on top of full text search, etc.

Joe
17 Aug 2006, 00:19
We archive all threads that haven't been replied to for x days to a read only database. When a user does a search they can leave the default, which is the current table, and/or they can choose archives. If they choose archives then they must choose a category for the search and cannot search the entire board at one time.

This is on top of full text search, etc.

Can you tell me more about this setup? Sounds ideal...

Troy Roberts
17 Aug 2006, 00:30
Unfortunately I paid someone else to set it up so I don't have all the detailed tech info. It was a feature that I had back when I used zeroforum and wanted in vb. It really should be built in to vbulletin.

You can see it in action at http://forums.corvetteforum.com

Joe
17 Aug 2006, 02:38
Thanks Troy.

Erwin
29 Aug 2006, 08:02
The way I solve this is to offload search to a slave database that replicates the relevant tables. 26 million posts.

Troy Roberts
29 Aug 2006, 15:54
We used to use a dedicated search machine. That can work very well also.

One benefit we found with the archived data method is that if we run into a problem on the live data it's a lot easier to run utilities to fix it. A smaller database is easier to fix when things go wrong.

The archived data is not as likely to have problems since it is read only. Of course you do have other drawbacks like not being able to edit/delete a thread through normal methods once it is archived.

lazytown
01 Sep 2006, 03:53
The way I solve this is to offload search to a slave database that replicates the relevant tables. 26 million posts.

Does this require a separate server or is there still a benefit to doing this on the same server that houses the forum? I would imagine it would free up table locks, etc even if it was on the same server.

-vissa

Erwin
04 Sep 2006, 00:08
Different server. You could put it on the same server to free up table locks, but make sure you have enough CPUs/RAM to handle it. I prefer to offload it to a different box.

Dontom
04 Sep 2006, 21:29
Hi Erwin,
thanks for sharing. we already have a slave db server and would like / need to offload search there due to the locking issues (3M posts)

How did you tellvb to use the slave for searching? did you modify config.php or search.php? Any help is appreciated,
Thomas

Erwin
05 Sep 2006, 05:12
Different ways to do it - for me, I modified search.php quite extensively (I'm running a vB-based Wiki off that slave too).

kerplunknet
06 Sep 2006, 06:32
Erwin, where could I learn more about MySQL replication? If we wanted to offload search to a separate database server, we would need to use MySQL replication, correct? The search database server would be a slave server. I am unsure how to set this up. Any advice? Tips? Pointers? URLs? Thanks. :)

Dontom
06 Sep 2006, 06:37
http://dev.mysql.com/doc/refman/5.0/en/replication.html

kerplunknet
06 Sep 2006, 21:49
http://dev.mysql.com/doc/refman/5.0/en/replication.html
I appreciate your help, but I have already read this. I find most of the documents on MySQL's website are not helpful to me. They are very technical and precise. Documents such as these are more likely to help me and is more what I was asking Erwin for:

http://www.howtoforge.com/mysql_database_replication
http://www.onlamp.com/pub/a/onlamp/2005/06/16/MySQLian.html

Please don't Google "mysql replication" and post the first link that comes up. I can do that myself.

Dontom
06 Sep 2006, 21:58
the reference manual contains a step by step "How to Set Up Replication" which helped us setting it up...

How to put search on slave :
edit includes/config.php

put your slave config into this:

if (THIS_SCRIPT == search)
{

slave config goes here...

}

in vb 3.5x we had to do it slightly different (use a second master config for THIS_SCRIPT == search)
http://www.vbulletin.com/forum/showpost.php?p=1204993&postcount=9

Erwin
07 Sep 2006, 05:49
the reference manual contains a step by step "How to Set Up Replication" which helped us setting it up...

How to put search on slave :
edit includes/config.php

put your slave config into this:

if (THIS_SCRIPT == search.php)
{

slave config goes here...

}

http://www.vbulletin.com/forum/showpost.php?p=1204993&postcount=9
You can do it that way but you end up replicating the whole database when you should only replicate user, forum, thread and post.

Dontom
07 Sep 2006, 06:22
This way we can use it as a fallback db, in case the master dies...

Farcaster
15 Sep 2006, 23:31
In some situations it might be advisable to change some tables from MyIsam (table-level locking, no transactions) to InnoDB (row-level locking, with transactions). Using this kind of tuning, requires probably an experienced MySQL-administrator.

For large forums is there any reason not to use InnoDB? If you use InnoDB, you also could also modify the forum searches to READ UNCOMMITTED and pickup some performance there without any locking at all.

BoardTracker
15 Sep 2006, 23:39
For large forums is there any reason not to use InnoDB? If you use InnoDB, you also could also modify the forum searches to READ UNCOMMITTED and pickup some performance there without any locking at all.
InnoDB doesn't have FTS (Full text search) capabilities.

orban
16 Sep 2006, 12:57
There is a patch though, it's linked on the full text forums on mysql.com.

Would love to hear from somebody if that works.

psico
28 Sep 2006, 20:11
You can do it that way but you end up replicating the whole database when you should only replicate user, forum, thread and post.
Erwin, can you explain pls how you did it?

gorman
29 Sep 2006, 18:04
I asked at vbulletin.com but:

Is anybody able to point me to a working hack to disable totally the search with thread results, enabling only post results? If I understand things correctly, it could do wonders for FullText searches...

Is it just a matter of editing templates or what?

TECK
03 Oct 2006, 08:05
You can do it that way but you end up replicating the whole database when you should only replicate user, forum, thread and post.
I'm interested in that also, Erwin. Thanks.

kmike
03 Oct 2006, 10:00
You may want to read this document:
http://dev.mysql.com/doc/refman/4.1/en/replication-options.html
search for "--replicate-do-table".
Some additional reading: http://dev.mysql.com/doc/refman/4.1/en/replication-rules.html

ubuntu-geek
03 Oct 2006, 11:59
http://www.vbulletin.org/forum/showthread.php?t=127868 amazing speed..

TECK
03 Oct 2006, 16:35
Thanks guys, I will definitelly try Sphinx.

Erwin
17 Oct 2006, 06:36
One day I need to find time to do more How-Tos... I'm just way too busy with life...

kontrabass
07 Nov 2006, 16:37
http://www.vbulletin.org/forum/showthread.php?t=127868 amazing speed..

Is Sphinx really the miracle it seems to be? Any other big-boarders solved their search issues with Sphinx?

mute
10 Nov 2006, 14:45
Is Sphinx really the miracle it seems to be? Any other big-boarders solved their search issues with Sphinx?

We're the uh, 4th? Largest vb forum I think, and I believe the first to switch to Sphinx. It solved our search woes 100%, and we're very happy with it.

kmike
13 Nov 2006, 15:47
We've been running Sphinx-based search since May. It was a miracle indeed, after an usability nightmare of searching in split-per-year archives of our 6+ mln posts.

amcd
16 Nov 2006, 17:39
Is Sphinx really the miracle it seems to be? Any other big-boarders solved their search issues with Sphinx?
sphinx search is the best hack i have ever seen. with plugins in vb 3.5.x and 3.6.x, there is a tendency to avoid code edits, but this one is worth it.