  #361  
05 Jul 2007, 11:53
orban
 
Join Date: Jan 2005
Implementing Sphinx full-text search engine

Based on Sphinx 0.9.7 and vB 3.6.7 PL1. All file edits and config files have only been tested with those two versions. That doesn't mean you cannot make Sphinx work with your vB 3.5 installation, but it will require manual work on your side.

Known limitations
  1. You cannot filter by number of replies
    • Possible fix: add another "sql_group_column" holding the number of thread replies. The search will use the reply counts from the last thread reindex, though (depending on your setup, those can be hours to days old).
  2. Sorting by title/number of replies/views/thread start date/username/forum isn't possible
    • Basically the same issue as in (1.): Sphinx doesn't have the necessary data.
  3. You can only use Sphinx for queries that have a full-text component, so searches by userid/forumid WITHOUT a keyword are not possible. These searches can still run on regular database indices, though, so they shouldn't be an issue.
    • Workaround by kmike: "You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345."
  4. Search results out of order because the timestamps are too old
    • Sphinx doesn't query MySQL to get the latest timestamps. So if your thread had its last reply 3 days ago, was indexed by Sphinx 2 days ago and got a new reply today, Sphinx will still assume its last reply was 3 days ago, and in the search results it will end up way back instead of at the top. There is no easy fix for this, and certainly no fast one, because this is exactly what makes Sphinx so fast: we're sacrificing a bit of up-to-dateness to gain speed. If you are in desperate need of a fix, kmike outlined one. Basically, it sends the results to MySQL and sorts them again, giving you up-to-date results at the cost of speed. It's up to you to decide whether it's worth it.
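kmike's actual code isn't included in this thread. As a rough Python sketch of the idea only: once Sphinx has returned its matching thread IDs, look up their current last-post times in MySQL and sort by those. The function name and the shape of the lookup (a dict of threadid to Unix timestamp, e.g. from a SELECT on vB's thread table at search time) are assumptions for illustration.

```python
from typing import Dict, List


def resort_by_lastpost(sphinx_ids: List[int],
                       current_lastpost: Dict[int, int]) -> List[int]:
    """Re-order thread IDs returned by Sphinx using fresh last-post times.

    sphinx_ids       -- thread IDs in the (possibly stale) order Sphinx gave
    current_lastpost -- hypothetical lookup: threadid -> current lastpost
                        (Unix timestamp), fetched from MySQL at search time
    Threads missing from the lookup (e.g. deleted since the last reindex)
    are dropped.
    """
    live = [tid for tid in sphinx_ids if tid in current_lastpost]
    # Newest first; sorted() is stable, so equal timestamps keep Sphinx's order.
    return sorted(live, key=lambda tid: current_lastpost[tid], reverse=True)


# Sphinx thinks thread 10 is newest, but thread 20 got a reply after indexing.
print(resort_by_lastpost([10, 30, 20], {10: 100, 20: 300, 30: 200}))
```

This only re-sorts the IDs Sphinx actually returned, so a thread that fell outside Sphinx's result limit because of its stale timestamp won't come back; that is part of the speed trade-off.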
What Sphinx can do for you
  1. Incredibly fast full text searches on huge amounts of posts
    • It's really fast, really really really fast, even on intersections of multiple keywords with several hundred thousand results.
  2. Replace forum search, search in this forum and search in this thread
    • Mimicking the default forum search for all but a few details
  3. Nearly instant indexing of new posts
    • Thanks to a special config file setup called "Live Updates"
Setting up Sphinx
  1. Grab Sphinx here: http://www.sphinxsearch.com/downloads.html and compile it
  2. Read a bit of the documentation to get familiar with it; you might want to peek at the installation section.
  3. Grab sphinx.conf.txt at the end of this document and rename it to sphinx.conf. This is my configuration file. At the very least, you have to fill in your database info and adjust the paths /.../
  4. You have to create a counter table that holds information about the last indexed post/thread for the Live Updates:
    [code block not included]

    You can place this table in the same database as vB or in a different one, but in the latter case don't forget to adjust sphinx.conf accordingly (prefix sph_counter with your database name: yourdb.sph_counter).
Running Sphinx
  • You start Sphinx with "searchd --config /.../sphinx.conf"; this creates a new process called "searchd".
  • Indexing documents is handled by "indexer". Before you start an indexing run, you have to know whether searchd is running or not; this is crucial.
    • searchd is running: use "indexer --rotate". It builds new temporary index files and rotates them in, so searching won't be interrupted.
    • searchd isn't running: use "indexer" without --rotate; it will simply replace your current files.
  • For building the full indices it is recommended to shut down searchd, because it might take a while and your server will be quite busy (unless you run Sphinx on a slave). Reindexing all posts and threads is done with "indexer --config /.../sphinx.conf --all", or "indexer --config /.../sphinx.conf --rotate --all" if searchd is running.
  • Creating the delta indices for Live Updates is done with "indexer --config /.../sphinx.conf --rotate postdelta threaddelta".
  • You can test your indices with "search", the third executable installed by Sphinx. Run "search" and it will tell you how to use it.
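To make the --rotate rule above harder to get wrong in scripts, a small Python helper can assemble the indexer arguments from whether searchd is up. The helper name is made up; the flags are exactly the ones quoted above (--config, --rotate, --all, or named indexes such as postdelta and threaddelta), and "/.../sphinx.conf" is the same placeholder path you must fill in.

```python
from typing import List, Sequence


def indexer_args(config: str, searchd_running: bool,
                 indexes: Sequence[str] = ()) -> List[str]:
    """Build an indexer command line following the rules above.

    No index names -> rebuild everything (--all); otherwise rebuild only the
    named indexes. --rotate is added only while searchd is running, so the
    new files are swapped in without breaking live searches.
    """
    args = ["indexer", "--config", config]
    if searchd_running:
        args.append("--rotate")
    args.extend(indexes if indexes else ["--all"])
    return args


print(indexer_args("/.../sphinx.conf", True, ["postdelta", "threaddelta"]))
```

The resulting list can be passed straight to subprocess.run(..., check=True) from a cron-driven script.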
Live Updates
  • You have to settle on a few values: how often to re-index everything, how often to re-index all threads, and how often to run the Live Updates for postdelta and threaddelta.
  • "indexer --all": I do this about once per week at a very quiet time, usually manually.
  • I re-index all threads once per day; we only have 80k, so this takes no time.
  • I recreate the delta indices every five minutes for both posts and threads, so it can take up to about 5 minutes before new threads/posts start showing up in search results.
  • I suggest adding cron jobs for those tasks on *nix; I don't know about other OSes. Can you even run Sphinx on Windows?
Plugging Sphinx into vBulletin
  1. Sadly enough, this requires file modifications. I check every version to see if they finally added a way to plug in a different search system, like WordPress does for example, but no luck so far. There are 5 edits required; I listed them in search.php.txt at the end of this document for easier reference and so you can save it for future use. You will be editing "/.../forums/search.php". Don't forget that the file is overwritten with every vB upgrade, and you will have to apply the changes again.
  2. We also need sphinxapi.php, found in "/.../src/sphinx-0.9.7/api" where your Sphinx source files are. Copy it to "/.../forums", where global.php lives.
  3. The last item is sphinx.php, which will handle the search. Grab sphinx.php.txt, rename it to sphinx.php and put it into "/.../forums/includes". Open it and adjust the values at the top. You can obviously move those files wherever you want; just don't forget to adjust the paths.
  4. Because we cannot offer all the search options the default vB search can, I removed a couple of lines from the "search_forums" template. They are listed in search_forums.txt at the end of this document.
Bugs and Fixes
Contributions
Attached Files
search.php.txt (4.2 KB)
search_forums.txt (2.0 KB)
sphinx.php.txt (5.6 KB)
sphinx.conf.txt (7.3 KB)

Last edited by Paul M; 05 Jul 2007 at 19:10. Reason: Text reduced - far too big (unnecessary)
  #362  
05 Jul 2007, 12:03
orban
 
Join Date: Jan 2005
Can somebody give this a look? I tried to list the limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?

Last edited by orban; 05 Jul 2007 at 12:29.
  #363  
05 Jul 2007, 19:22
ekool
 
Join Date: Jun 2003
Orban,

Very nicely put together. I still have my older Sphinx setup working (thanks to you and many others in here), so I have no need to change anything just yet, but thanks for the wonderful write-up!
  #364  
07 Jul 2007, 20:18
PSS
 
Join Date: Jul 2007
Originally Posted by orban
Can somebody give this a look, I tried to list some limitations/bugs/contributions that we are currently experiencing. Did I miss anything important?
Couple of small things:

1. You used

CREATE TABLE sph_counter

but sphinx_counter in sphinx.conf.

2. It would also be great to have a PREFIX_ placeholder where you put your personal vB table prefix. I added the prefixes myself, but it is not an easy task for those who don't know MySQL syntax.

3. A step by step how to implement sort_search_items() would be nice.
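Point 2 could be handled by shipping sphinx.conf with a literal PREFIX_ placeholder and rendering it once per install. A minimal Python sketch of that rendering step, assuming the placeholder convention suggested above (neither Sphinx nor vBulletin defines it) and a prefix made only of letters, digits and underscores:

```python
import re

# Assumption: vB table prefixes are plain identifiers such as "vb_" (or empty).
_PREFIX_RE = re.compile(r"[A-Za-z0-9_]*")


def apply_table_prefix(conf_text: str, prefix: str) -> str:
    """Replace every PREFIX_ placeholder in sphinx.conf text with the real prefix."""
    if not _PREFIX_RE.fullmatch(prefix):
        raise ValueError(f"unexpected characters in table prefix: {prefix!r}")
    return conf_text.replace("PREFIX_", prefix)


template = "sql_query = SELECT postid, pagetext FROM PREFIX_post"
print(apply_table_prefix(template, "vb_"))
```

An empty prefix simply turns PREFIX_post into post, which covers boards installed without a prefix.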

Thanks for EXCELLENT work!

EDIT: A couple of things I would still like to know: when you have Sphinx search in place, do you need FULLTEXT index(es) in vBulletin at all?

Also, is Sphinx used in the "new posts" search, too?

Last edited by PSS; 08 Jul 2007 at 12:06. Reason: Automerged Doublepost
  #365  
09 Jul 2007, 02:56
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
orban, I don't see any reference in vBulletin or Sphinx to 'timesegments':

[code block not included]

Is there something I'm missing? Thanks for explaining.
Also, if anyone got kmike's trick (searching by username via a userid_12345 keyword) working in their configuration files, could you kindly post the actual code here?

Thanks for taking the time to write this up.
__________________
Floren Munteanu
Axivo Inc.
Axivo Community - Visit the forums to find out more about us
Why Queued - My personal blog

Last edited by TECK; 09 Jul 2007 at 17:00.
  #366  
09 Jul 2007, 16:37
PSS
 
Join Date: Jul 2007
Another question: is there a way to check whether searchd is running and, if not, show "search is offline" on the search page?
  #367  
09 Jul 2007, 20:20
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
Very easy.


[code block not included]

I wrote the check as a function so you can use it in several places.

Now, back to my question. Can anyone help me with the username setup? I can't think how you could use a variable in the Sphinx conf file... because you cannot. Obviously I'm wrong, since kmike did it, but unfortunately he is not available.
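TECK's code isn't visible here. One way to do such a check, sketched in Python, is to see whether anything accepts a TCP connection on searchd's port. The function name is hypothetical, and port 3312 is assumed as the Sphinx 0.9.7 default; use whatever port your sphinx.conf actually sets. Because it checks by host and port, it behaves the same whether searchd runs on the web server or on another machine.

```python
import socket


def searchd_is_up(host: str = "localhost", port: int = 3312,
                  timeout: float = 1.0) -> bool:
    """Return True if something accepts TCP connections on host:port.

    A successful connect only shows the port is open; it does not prove
    searchd will answer queries correctly.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, unknown host...
        return False


if not searchd_is_up():
    print("search is offline")
```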
  #368  
09 Jul 2007, 21:21
orban
 
Join Date: Jan 2005
http://dev.mysql.com/doc/refman/5.0/...unction_concat

In the query that grabs the posts, use two CONCATs:

CONCAT( post, ' ', CONCAT( 'userid_', userid ) )

Untested, but I hope you get the idea. Then you need to modify search.php to transform a given userid into that string...
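A rough Python sketch of both halves: the text that the CONCAT above would hand to Sphinx, and the search-side step that turns a userid into the matching keyword. The function names are hypothetical, and it assumes your charset_table treats underscores and digits as word characters, so userid_12345 is indexed as a single word; check that in your sphinx.conf.

```python
def indexed_text(post_text: str, userid: int) -> str:
    """What CONCAT(post, ' ', CONCAT('userid_', userid)) would produce."""
    return f"{post_text} userid_{userid}"


def userid_query(userid, keywords: str = "") -> str:
    """Build a Sphinx query for posts by one member, optionally with keywords.

    int() rejects anything that isn't a plain number, so user input can't
    sneak extra query syntax in through the userid.
    """
    uid = int(userid)
    if uid <= 0:
        raise ValueError("userid must be positive")
    return f"{keywords.strip()} userid_{uid}".strip()


print(indexed_text("Sphinx is fast", 12345))
print(userid_query("12345"))            # keyword-free search by user
print(userid_query(12345, "sphinx"))    # user plus keyword
```

With the keyword in the index, a search by user alone now has a full-text component, which is what limitation (3) was missing.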
  #369  
10 Jul 2007, 02:51
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
Aha, thanks orban. What I want to do is this:
If a user wants to search for all threads/posts related to a specific user, he enters a username and leaves the keyword field empty. The results will show all threads started by that user, ordered the way you like in Sphinx.

Does anyone want to work with me on this project? I PM'ed kmike hoping he will join us, since he is the only one who managed to get this working, not to mention other little extras.

Originally Posted by DaiTengu
You wouldn't happen to have an easy way to implement that, would you? My PHP knowledge is somewhat lacking

[code block not included]

Then you call it anywhere you like:

[code block not included]

Can you post results from your busy boards and let me know how it impacts performance?
The function above has less processing code than the original sort_search_items() function.

The PHP BBCode at vb.org is broken; it mangles the code lines. I switched back to Code, which is much better.

Last edited by TECK; 10 Jul 2007 at 07:10. Reason: Automerged Doublepost
  #370  
10 Jul 2007, 07:12
amcd
 
Join Date: Oct 2004
Originally Posted by TECK
Very easy.


[code block not included]

I wrote the check as a function so you can use it in several places.

This will not work in a multi-server setup.
__________________
eXBii.com - Indian community
no XB no fun know XB know fun !
  #371  
10 Jul 2007, 07:42
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
Originally Posted by amcd
this will not work in a multi-server setup.
True, I'm not there yet with multiple servers.
Use this instead:


[code block not included]

I run a failsafe on my server: if searchd crashes, the vBulletin search takes over automatically.

Edit: Let me dig into this more... I think searchd may still report an error even when it's running, something like "(no error)".
I will post on the Sphinx site to ask Andrew how exactly the last error works.
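The failsafe idea can be sketched without relying on the error string at all: treat a Sphinx call as failed only when it raises a connection-type error or returns no result object, and fall back to the stock search then. In this Python sketch both search callables are hypothetical stand-ins (a Sphinx query wrapper and vBulletin's default search), and "None means failure" is an assumed convention for the wrapper.

```python
from typing import Callable, List, Optional, Tuple

SphinxSearch = Callable[[str], Optional[List[int]]]
DefaultSearch = Callable[[str], List[int]]


def search_with_fallback(query: str, sphinx_search: SphinxSearch,
                         default_search: DefaultSearch) -> Tuple[List[int], str]:
    """Try Sphinx first; use the default search if Sphinx is unavailable.

    Returns (result ids, engine used). An empty list from Sphinx is a valid
    "no matches" answer and does NOT trigger the fallback.
    """
    try:
        ids = sphinx_search(query)
    except OSError:  # e.g. connection refused because searchd is down
        ids = None
    if ids is None:
        return default_search(query), "default"
    return ids, "sphinx"


def searchd_down(query: str) -> Optional[List[int]]:
    raise ConnectionRefusedError("searchd not listening")


print(search_with_fallback("sphinx", searchd_down, lambda q: [42]))
```

Catching only OSError keeps genuine bugs in the wrapper visible instead of silently switching engines.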

Last edited by TECK; 10 Jul 2007 at 14:09.
  #372  
13 Jul 2007, 07:41
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
Originally Posted by raywjohnson
When to run the indexer seems to be a matter of preference, keeping in mind the usage/size of the database in question. I run two (almost) identical crons, one every 20 minutes (for the deltas) and one every day (for the full index). The LOCKFILE keeps them from stepping on each other.


[code block not included]

You could also replace ">/dev/null 2>&1" with "| mail -s "Sphinx Report" YOUR_EMAIL_HERE" to get an email of the output.

-RayJ
You should use lockrun instead; it is far more robust than a shell script.
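For anyone who can't install lockrun, the same skip-if-already-running behaviour can be sketched in Python with an exclusive, non-blocking flock() on a lock file (Unix only). This is not lockrun itself; the function name is hypothetical.

```python
import fcntl
import subprocess
from typing import Sequence


def run_locked(lock_path: str, cmd: Sequence[str]) -> bool:
    """Run cmd unless another job holds lock_path; return False if skipped."""
    with open(lock_path, "a") as lock_file:
        try:
            # Exclusive and non-blocking: fail at once if the lock is taken.
            fcntl.flock(lock_file, fcntl.LOCK_EX | fcntl.LOCK_NB)
        except BlockingIOError:
            return False  # previous indexer run still in progress
        subprocess.run(cmd, check=True)
        return True  # closing the file releases the lock
```

A cron-driven script would then call something like run_locked("/tmp/sphinx-indexer.lock", ["indexer", "--config", "/.../sphinx.conf", "--rotate", "postdelta", "threaddelta"]), where the lock path is an arbitrary example. Because the kernel drops an flock() lock when the process exits, a crashed run can't leave a stale lock behind, which is one common weakness of lock-file shell scripts.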
  #373  
22 Jul 2007, 20:24
TECK
 
Join Date: Dec 2001
Real name: Floren Munteanu
Never mind, I sorted it out.

Last edited by TECK; 22 Jul 2007 at 21:19.
  #374  
25 Jul 2007, 00:02
PSS
 
Join Date: Jul 2007
I would still like to know: when you have Sphinx search in place, do you need FULLTEXT index(es) in vBulletin at all?

Maybe it is a stupid question, FAQ, RTFM, etc., but please take a second to answer yes or no if you know the answer. Thanks!
  #375  
25 Jul 2007, 00:12
mute
 
Join Date: Dec 2002
Originally Posted by PSS
I still would like to know: when you have Sphinx search in place, do you need to have FULLTEXT index(es) in Vbulletin at all?

Maybe it is a stupid question and FAQ and RTFM etc, but please take a second to answer yes or no if you know the answer, thanks!
Nope.