Thread: Sphinx Search
  #387  
05 Jul 2007, 11:53
orban
Implementing Sphinx full-text search engine

Based on Sphinx 0.9.7 and vB 3.6.7 PL1. All file edits and config files are only tested with those two versions. That doesn't mean you cannot make Sphinx work with your vB 3.5 installation, but it will require manual work on your side.

Known limitations
  1. You cannot filter by number of replies
    • Possible Fix: Add another "sql_group_column" holding the number of thread replies. The search will be using the numbers from the last thread reindexing, though (depending on your setup, the numbers are hours to days old).
  2. Sorting by title/number of replies/views/thread start date/username/forum isn't possible
    • Basically the same issue as in (1.): Sphinx doesn't have the necessary data.
  3. You can only use Sphinx to perform queries that have a full-text component, so searches by userid/forumid WITHOUT a keyword are not possible. These searches can run on regular MySQL indices though, so they shouldn't be an issue.
    • Workaround by kmike. "You can emulate the search by user in sphinx by adding a fake unique keyword per each member in the mix (e.g. "_userid_12345"). Searching by this keyword will return all posts by the member with userid 12345."
  4. Search results can be out of order because the time stamps are outdated
    • Sphinx doesn't query MySQL to get the latest time stamps. So if your thread had its last reply 3 days ago, was indexed by Sphinx 2 days ago and got a new reply today, Sphinx will still assume its last reply was 3 days ago. In the search results, it will be put waaay back instead of being at the top. There is no easy fix for this, and certainly no fast one, because this is just what makes Sphinx so fast: we're sacrificing a bit of "up-to-date-ness" to gain speed. If you are in desperate need of fixing this, kmike outlined a fix. Basically, it sends the result set to MySQL and sorts it again, giving you up-to-date results at the cost of speed. It's up to you to find out if it's worth it.
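The re-sort idea from limitation 4 can be sketched roughly like this. It is not kmike's actual fix, just a minimal illustration: the ids returned by Sphinx are handed back to the database and ordered by the current last-post time. sqlite3 stands in for MySQL so the sketch runs anywhere, and the table and column names (thread, threadid, lastpost) as well as the function name are assumptions.

```python
import sqlite3


def resort_by_lastpost(conn, thread_ids):
    """Order thread ids (as returned by Sphinx) by the database's current lastpost.

    Threads that no longer exist in the database are dropped from the result.
    """
    if not thread_ids:
        return []
    placeholders = ",".join("?" for _ in thread_ids)
    rows = conn.execute(
        "SELECT threadid FROM thread "
        f"WHERE threadid IN ({placeholders}) "
        "ORDER BY lastpost DESC, threadid DESC",
        list(thread_ids),
    ).fetchall()
    return [row[0] for row in rows]


if __name__ == "__main__":
    # Hypothetical demo data: thread 2 got a new reply after the last indexing run.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE thread (threadid INTEGER PRIMARY KEY, lastpost INTEGER)")
    conn.executemany("INSERT INTO thread VALUES (?, ?)", [(1, 100), (2, 300), (3, 200)])
    stale_order = [3, 1, 2]  # order based on the time stamps Sphinx had indexed
    print(resort_by_lastpost(conn, stale_order))
```

The extra query is exactly the speed you give up: one more round trip to the database per result page.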
What Sphinx can do for you
  1. Incredibly fast full text searches on huge amounts of posts
    • It's really fast, really really really fast. Even on intersections of multiple keywords on several hundred thousand results.
  2. Replace forum search, search in this forum and search in this thread
    • Mimicking the default forum search for all but a few details
  3. Nearly instant indexing of new posts
    • Thanks to a special config file setup called "Live Updates"
Setting up Sphinx
  1. Grab Sphinx here: http://www.sphinxsearch.com/downloads.html and compile it
  2. Read a bit of the documentation to get familiar with it; you might want to peek at the installation section
  3. Grab the sphinx.conf.txt at the end of this document (rename it to sphinx.conf). This is my configuration file. You have to, at least, fill in your database info and adjust the paths /.../
  4. You have to create a counter table that holds information about the last indexed post/thread for the Live Updates.

    You can either place this table in the same database as vB or in a different one, but don't forget to adjust sphinx.conf accordingly (prefixing sph_counter with your database name: yourdb.sph_counter).
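To show what the counter table is for, here is a small sketch of the main/delta idea behind Live Updates. The post's own table definition is not shown here, so the columns (counter_id, max_doc_id) follow the main+delta example in the Sphinx documentation and are an assumption, as are the post table layout and the function names. sqlite3 is used instead of MySQL only to keep the sketch runnable.

```python
import sqlite3

POST_COUNTER_ID = 1  # hypothetical row id used for the post counter


def make_demo_db():
    """Create an in-memory stand-in for the forum database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE post (postid INTEGER PRIMARY KEY, pagetext TEXT)")
    conn.execute(
        "CREATE TABLE sph_counter ("
        "counter_id INTEGER PRIMARY KEY NOT NULL, "
        "max_doc_id INTEGER NOT NULL)"
    )
    return conn


def index_main(conn):
    """Full run: remember the highest post id, return the ids the main index covers."""
    conn.execute(
        "INSERT OR REPLACE INTO sph_counter (counter_id, max_doc_id) "
        "SELECT ?, COALESCE(MAX(postid), 0) FROM post",
        (POST_COUNTER_ID,),
    )
    rows = conn.execute(
        "SELECT postid FROM post WHERE postid <= "
        "(SELECT max_doc_id FROM sph_counter WHERE counter_id = ?) ORDER BY postid",
        (POST_COUNTER_ID,),
    )
    return [row[0] for row in rows]


def index_delta(conn):
    """Delta run: only posts newer than the id stored by the last full run."""
    rows = conn.execute(
        "SELECT postid FROM post WHERE postid > "
        "(SELECT max_doc_id FROM sph_counter WHERE counter_id = ?) ORDER BY postid",
        (POST_COUNTER_ID,),
    )
    return [row[0] for row in rows]
```

The full run stores the highest id it has seen; every delta run afterwards only has to look at the few rows above that id, which is why it finishes so quickly.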
Running Sphinx
  • You start Sphinx with "searchd --config /.../sphinx.conf"; this will create a new process called "searchd".
  • Indexing documents is handled by "indexer". Before you start an indexing run you have to know whether searchd is running or not; this is crucial.
    • searchd is running: use "indexer --rotate"; it will create temporary new files and rotate them in, so searching won't be broken
    • searchd isn't running: use "indexer" without "--rotate"; it will just replace your current files
  • For creating the full indices it is recommended to shut down Sphinx because it might take a while and your server will be quite busy (unless you run sphinx on a slave). Reindexing all posts and threads is done by "indexer --config /.../sphinx.conf --all" or "indexer --config /.../sphinx.conf --rotate --all" if searchd is running.
  • Creating the delta indices for Live Updates is done with "indexer --config /.../sphinx.conf --rotate postdelta threaddelta"
  • You can test your indices with "search", the third executable installed by Sphinx. Call "search" and it will tell you how to use it
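The "is searchd running or not" decision above can be sketched as a small helper. This is only an illustration under assumptions: it expects searchd to write a pid file (the location comes from your own config and is passed in by the caller), it relies on POSIX signal semantics, and both function names are made up for this example.

```python
import os


def searchd_is_running(pid_file):
    """Return True if the pid stored in pid_file belongs to a live process (POSIX)."""
    try:
        with open(pid_file) as handle:
            pid = int(handle.read().strip())
    except (OSError, ValueError):
        return False  # no pid file, or no number in it: treat as not running
    try:
        os.kill(pid, 0)  # signal 0 sends nothing, it only checks that the pid exists
    except ProcessLookupError:
        return False  # stale pid file
    except PermissionError:
        return True  # the process exists but belongs to another user
    return True


def indexer_command(config_path, indexes=None, searchd_running=False):
    """Build the indexer argument list; indexes=None means all indices (--all)."""
    command = ["indexer", "--config", config_path]
    if searchd_running:
        command.append("--rotate")  # build new files and rotate them in
    command.extend(indexes if indexes else ["--all"])
    return command


if __name__ == "__main__":
    # The path is left elided as in the post; fill in your own.
    print(" ".join(indexer_command("/.../sphinx.conf", ["postdelta", "threaddelta"], True)))
```

A pid can be reused by an unrelated process after a crash, so treat the check as a hint rather than proof.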
Live Updates
  • You have to figure out a couple of values: how often to re-index the whole thing, how often to re-index all threads, and how often to do Live Updates for postdelta and threaddelta.
  • "indexer --all": I do this about once per week at a very quiet time, usually manually.
  • I re-index all threads once per day; we just have 80k, so this takes no time.
  • I recreate the delta indices every five minutes for both posts and threads, so you have to wait between 1 and 5 minutes before your new threads/posts start showing up in search results.
  • I suggest adding cron jobs for those tasks on *n*x, other OSes I don't know, can you even run Sphinx on Windows?
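As a rough sketch of that schedule, the helper below prints crontab lines for the five-minute delta run and the daily thread run; the weekly full run stays manual, as described above. The name of the full thread index ("thread"), the time of the daily run and the function name are assumptions, and "--rotate" assumes searchd is running when the jobs fire.

```python
def crontab_lines(config_path, delta_indexes, thread_index, daily_at=(4, 32)):
    """Return crontab entries for the delta run and the daily thread run.

    daily_at is (hour, minute); the default minute is offset from the
    five-minute grid so the two jobs do not start at the same moment.
    """
    hour, minute = daily_at
    base = f"indexer --config {config_path} --rotate"
    return [
        f"*/5 * * * * {base} {' '.join(delta_indexes)}",
        f"{minute} {hour} * * * {base} {thread_index}",
    ]


if __name__ == "__main__":
    # "thread" is a hypothetical index name; use the one from your sphinx.conf.
    for line in crontab_lines("/.../sphinx.conf", ["postdelta", "threaddelta"], "thread"):
        print(line)
```

Paste the printed lines into "crontab -e" after replacing the elided path with your own.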
Plugging Sphinx into vBulletin
  1. Sadly enough this requires file modification. I'm checking every version to see if they finally added a way to plug in a different search system like WordPress for example does, but no luck so far. There are 5 edits required; I listed them in search.php.txt at the end of this document for easier reference and so you can save it for future use. You will be editing "/.../forums/search.php". Don't forget that with every vB upgrade the file will be overwritten and you will have to apply the changes again.
  2. We also need sphinxapi.php; it's in "/.../src/sphinx-0.9.7/api", where your Sphinx source files are. Copy it to "/.../forums", where global.php lies.
  3. The last item is sphinx.php, which will handle the search. Grab sphinx.php.txt, rename it to sphinx.php and put it into "/.../forums/includes". Open it and adjust the values at the top. You can obviously move those files wherever you want, just don't forget to adjust the paths.
  4. Because we cannot offer all the search options vB's default search can, I removed a couple of lines from the "search_forums" template. They are listed in search_forums.txt at the end of this document.
Bugs and Fixes
Contributions
Attached Files
File Type: txt search.php.txt (4.2 KB, 479 views)
File Type: txt search_forums.txt (2.0 KB, 375 views)
File Type: txt sphinx.php.txt (5.6 KB, 337 views)
File Type: txt sphinx.conf.txt (7.3 KB, 363 views)

Last edited by Paul M; 05 Jul 2007 at 19:10. Reason: Text reduced - far too big (unnecessary)