Thread: Sphinx Search
View Single Post
  #1  
Old 29 Sep 2006, 20:05
orban orban is offline
 
Join Date: Jan 2005
Sphinx Search

Sphinx Implementation for vBulletin:

Version 0.1 Hooray!

Just sharing as usual, let the discussions begin (in b4 TECK "MINE IS BETTER")

Only tested with Sphinx-0.9.8-rc2 (r1234; Mar 29, 2008).

If you are upgrading from my old tutorial, backup your search.php (you know, just in case you need the old hacked up version again) and restore the original from the zip/tar, no more file modifications!

http://sphinxsearch.com/downloads.html

Tested on 3.6.10, should work on 3.7 if you modify /*insert query*/ on Line 522 (I removed 'prefixchoice' field because it doesn't exist in 3.6)

No support for tags/thread prefix yet, because I don't have access to a 3.7 installation at the moment

Similar threads is also being worked on

Alpha release for some feedback, hopefully it will be production ready soon

I assume you already have Sphinx up and running... see attached sphinx.conf.example for a minimalistic setup

Installation notes inside search_sphinx.php

Well yeah enjoy. And PM me if you need help

The old post is here: http://www.vbulletin.org/forum/showp...&postcount=387

The Good:
  • Search this forum
  • Search this thread
  • Find all posts by User
  • Find all threads started by User
  • "Search Entire Posts"/"Search Titles Only" and "Show Results as Threads"/"Show Results as Posts" in all four combinations supported
  • "Search Entire Posts" can be sorted by rank/post.dateline (postuserid, forumid will sort by integer)
  • "Search Titles Only" can be sorted by rank, last reply date, first post date, number of replies (views if you add that value to sphinx.conf)
  • Really fast

The Bad:
  • This means you can't sort posts by title, number of replies/views, thread start date, last reply date (Sphinx doesn't have this data).*
  • You could possibly add this to sphinx.conf but it will only be as good as your last full post index update
  • "Find Threads with At Least/Most X Replies" doesn't work when "Search Entire Posts"
  • Search results are delayed (depending on how often you run indexer)
  • "New Posts" not supported... too much logic in the query?!

The Ugly:
  • Sorting is kinda messed up (especially when "Search Entire Posts" and "Show Results as Threads" are combined)
  • search_sphinx.php is messy, duplicated code from search.php

*The Infamous Post Sorting Quirk

What happens here is that when you "Search Entire Posts" and "Show Results as Threads", do you want you threads sorted by:
  • First post dateline (vBulletin option)
  • Last post dateline (vBulletin default)
  • The matching post dateline (Sphinx)

Our Sphinx setup does not have first post and last post dateline stored in its post index (and it would be pretty much useless too) so the first two options are not available. vBulletin offers a function called "sort_search_items()" (search.php:633 3.7) which could, in theory, be used to sort the threads by last post dateline.

It does not fix the problem though. Let's assume we set maxresults to 5. We are searching for threads for "funny". We have 7 threads created today:

1. Thread "Cows", Created 08:00, Last Post 17:00 | "Funny Cows", Created 09:00
2. Thread "Cats", Created 09:00, Last Post 14:00 | "Funny Cats", Created 14:00
3. Thread "Dogs", Created 10:00, Last Post 12:00 | "Funny Dogs", Created 11:00
4. Thread "Mice", Created 11:00, Last Post 15:00 | "Funny Mice", Created 13:00
5. Thread "Rats", Created 12:00, Last Post 13:00 | "Funny Rats", Created 12:00
6. Thread "Eels", Created 13:00, Last Post 19:00 | "Funny Eels", Created 18:00
7. Thread "Fish", Created 14:00, Last Post 18:00 | "Funny Fish", Created 17:00

Do we want to show threads 6, 7, 2, 4, 5 (Sphinx)? Or do we want to show threads 6, 7, 1, 4, 2 (vB)?

vBulletin finds all 7 posts, orders them by last post descending, and grabs the top 5.
Sphinx will find the newest 5 matching posts and then returns you the associated threads.

Reordering search results with "sort_search_items()" does not fix the problem because there might be older threads with very recent replies that Sphinx won't even consider. Let's consider an 8th thread:

8. Thread "Bees", Created 2002, Last Post 20:00 | "Funny Bees", Created 2002

vBulletin will list this one on top, Sphinx will not consider it. So even re-sorting the search items will not make this thread appear.
Attached Files
File Type: php search_sphinx.0.1.php (17.3 KB, 655 views)
File Type: txt sphinx.conf.example.txt (3.6 KB, 766 views)

Last edited by orban; 08 May 2008 at 08:58.
Reply With Quote