Sphinx Search


orban
29 Sep 2006, 20:05
Sphinx Implementation for vBulletin:

Version 0.1 Hooray!

Just sharing as usual, let the discussions begin (in b4 TECK "MINE IS BETTER")

Only tested with Sphinx-0.9.8-rc2 (r1234; Mar 29, 2008).

If you are upgrading from my old tutorial, backup your search.php (you know, just in case you need the old hacked up version again) and restore the original from the zip/tar, no more file modifications!

http://sphinxsearch.com/downloads.html

Tested on 3.6.10, should work on 3.7 if you modify /*insert query*/ on Line 522 (I removed 'prefixchoice' field because it doesn't exist in 3.6)

No support for tags/thread prefix yet, because I don't have access to a 3.7 installation at the moment

Similar threads is also being worked on

Alpha release for some feedback, hopefully it will be production ready soon :p

I assume you already have Sphinx up and running... see attached sphinx.conf.example for a minimalistic setup

Installation notes inside search_sphinx.php

Well yeah enjoy. And PM me if you need help

The old post is here: http://www.vbulletin.org/forum/showpost.php?p=1283359&postcount=387

The Good:

Search this forum
Search this thread
Find all posts by User
Find all threads started by User
"Search Entire Posts"/"Search Titles Only" and "Show Results as Threads"/"Show Results as Posts" in all four combinations supported
"Search Entire Posts" can be sorted by rank/post.dateline (postuserid, forumid will sort by integer)
"Search Titles Only" can be sorted by rank, last reply date, first post date, number of replies (views if you add that value to sphinx.conf)
Really fast


The Bad:

You can't sort posts by title, number of replies/views, thread start date, or last reply date (Sphinx doesn't have this data).*
You could possibly add this to sphinx.conf but it will only be as good as your last full post index update
"Find Threads with At Least/Most X Replies" doesn't work when "Search Entire Posts"
Search results are delayed (depending on how often you run indexer)
"New Posts" not supported... too much logic in the query?!


The Ugly:

Sorting is kinda messed up (especially when "Search Entire Posts" and "Show Results as Threads" are combined)
search_sphinx.php is messy, duplicated code from search.php


*The Infamous Post Sorting Quirk

The question here is: when you "Search Entire Posts" and "Show Results as Threads", do you want your threads sorted by:

First post dateline (vBulletin option)
Last post dateline (vBulletin default)
The matching post dateline (Sphinx)


Our Sphinx setup does not have first post and last post dateline stored in its post index (and it would be pretty much useless too) so the first two options are not available. vBulletin offers a function called "sort_search_items()" (search.php:633 3.7) which could, in theory, be used to sort the threads by last post dateline.

It does not fix the problem though. Let's assume we set maxresults to 5. We are searching for threads for "funny". We have 7 threads created today:

1. Thread "Cows", Created 08:00, Last Post 17:00 | "Funny Cows", Created 09:00
2. Thread "Cats", Created 09:00, Last Post 14:00 | "Funny Cats", Created 14:00
3. Thread "Dogs", Created 10:00, Last Post 12:00 | "Funny Dogs", Created 11:00
4. Thread "Mice", Created 11:00, Last Post 15:00 | "Funny Mice", Created 13:00
5. Thread "Rats", Created 12:00, Last Post 13:00 | "Funny Rats", Created 12:00
6. Thread "Eels", Created 13:00, Last Post 19:00 | "Funny Eels", Created 18:00
7. Thread "Fish", Created 14:00, Last Post 18:00 | "Funny Fish", Created 17:00

Do we want to show threads 6, 7, 2, 4, 5 (Sphinx)? Or do we want to show threads 6, 7, 1, 4, 2 (vB)?

vBulletin finds all 7 matching threads, orders them by last post descending, and grabs the top 5.
Sphinx finds the newest 5 matching posts and then returns the associated threads.

Reordering search results with "sort_search_items()" does not fix the problem because there might be older threads with very recent replies that Sphinx won't even consider. Let's consider an 8th thread:

8. Thread "Bees", Created 2002, Last Post 20:00 | "Funny Bees", Created 2002

vBulletin will list this one on top, Sphinx will not consider it. So even re-sorting the search items will not make this thread appear.
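
A minimal sketch of the example above, in Python rather than the PHP that search_sphinx.php uses. Times are plain hours of the day, the field names are made up for illustration, and the "Bees" match time is just an arbitrary value far in the past. It is not how vBulletin or Sphinx are implemented, only a way to replay the two orderings:

```python
from dataclasses import dataclass


@dataclass
class Thread:
    number: int
    title: str
    last_post: int     # hour of the thread's last post
    match_posted: int  # hour of the post that matches "funny"


THREADS = [
    Thread(1, "Cows", 17, 9),
    Thread(2, "Cats", 14, 14),
    Thread(3, "Dogs", 12, 11),
    Thread(4, "Mice", 15, 13),
    Thread(5, "Rats", 13, 12),
    Thread(6, "Eels", 19, 18),
    Thread(7, "Fish", 18, 17),
]

# Thread 8: last post today at 20:00, matching post from 2002
# (hypothetical stand-in value, anything far in the past works).
BEES = Thread(8, "Bees", 20, -35000)


def vb_order(threads, maxresults):
    """vBulletin style: rank all matching threads by last post, keep the top N."""
    ranked = sorted(threads, key=lambda t: t.last_post, reverse=True)
    return [t.number for t in ranked[:maxresults]]


def sphinx_order(threads, maxresults):
    """Sphinx style: take the N newest matching posts, return their threads."""
    ranked = sorted(threads, key=lambda t: t.match_posted, reverse=True)
    return [t.number for t in ranked[:maxresults]]


def resort_by_last_post(threads, numbers):
    """Re-sort an already truncated result list by last post date."""
    by_number = {t.number: t for t in threads}
    return sorted(numbers, key=lambda n: by_number[n].last_post, reverse=True)


if __name__ == "__main__":
    everything = THREADS + [BEES]
    print("vB:     ", vb_order(everything, 5))
    print("Sphinx: ", sphinx_order(everything, 5))
    print("Re-sort:", resort_by_last_post(everything, sphinx_order(everything, 5)))
```

Re-sorting only shuffles the five threads Sphinx already picked, so "Bees" (and "Cows") never show up no matter how the list is ordered afterwards.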

Adrian Schneider
29 Sep 2006, 20:30
Nice find! I'll play around with it once I get some time.

orban
29 Sep 2006, 20:37
Obviously the only options you will have on the advanced search page are:

Key Words:
Search In: Thread Titles/Posts
Sort Results by: Relevancy, Date Asc, Date Desc
Search in Forums:

And I guess searching by username will still be the built in way. (As in, without a search term, just list his posts.)

Gonna try to hack that up, when I make it work I'll release it I hope :)

But the fact you can index 4k posts/second is absolutely insane, and that was with 800 users online... :D

Paul M
29 Sep 2006, 20:39
Hmm, yes, that looks interesting, bookmarked for later. :)

orban
29 Sep 2006, 20:50
Also means I can remove that 400mb fulltext index from post table making MySQL even faster.

The right tool for the job. :)

Filtering by forumid already works, so does sorting by date.

And it still says 0.000003 seconds. Incredible.

forumdude
29 Sep 2006, 21:20
Hmm good timing. I got on here today to see if there were any other resources out there for searching and vbulletin and this showed up in the results.

We've had soooo much trouble keeping our search up. We're using the fulltext search right now with the search on its own server on tables reduced in size. Huge pain and it still doesn't return some results.

Keep us updated please, this looks cool.

forumdude
29 Sep 2006, 22:36
Awesome!

If I get some time tonight (probably not!) I will download Sphinx and give it a look.

What kind of data do you have to test this with?

We're looking at about 9 million records on our live post table (millions more archived). I'm very curious how well this would hold up to that amount of data.

mute
29 Sep 2006, 23:26
Can I get a peek at your sphinx.conf?

mute
29 Sep 2006, 23:33
wow, you are fast! thanks. I'm tossing it 24 million posts to see what it does :)

mute
30 Sep 2006, 00:28
*waits for post index to build*

So far so good. It ripped through 1,652,726 thread titles in about 2 minutes, on a machine replicating a very active forum, and one running a test upgrade from 3.5.5 to 3.6.1 :)

So far, I'm happy! I think with a little work this could be amazing. The API is a little unfriendly when it comes to errors and whatnot, but with some polishing, and once we figure out targeted searches and searches by name, we're good to go.

Orban you are a hero among men!

Just FYI:

thread table:
collected 1658976 docs, 48.1 MB
sorted 5.1 Mhits, 100.0% done
total 1658976 docs, 48070959 bytes
total 148.426 sec, 323872.56 bytes/sec, 11177.16 docs/sec

post table:
collected 8860446 docs, 1416.9 MB
sorted 140.2 Mhits, 100.0% done
total 8860446 docs, 1416892676 bytes
total 3168.862 sec, 447129.84 bytes/sec, 2796.10 docs/sec

that is word length of 4 and no stopwords.

mute
30 Sep 2006, 13:03
Wow, that's crazy. 1.4gb for 8.8million posts....?!

Actually, 1.4gb for 24 million posts. For some reason it gets 1:1 "documents" when indexing thread, but only 1:3 for posts, not sure if that is a bug and it isn't indexing everything, or has something to do with our content?

I'm headed out fishing, but I'm going to play with your updated changes later :)

orban
30 Sep 2006, 13:42
Weird....

mute
30 Sep 2006, 13:55
Yeah, and I recreated it a few times (with stopwords, diff min word length, etc). Not exactly sure why yet.

orban
30 Sep 2006, 19:02
Maybe some posts are too short? Like no words longer than 4 characters?

But then again that would never be two thirds of the posts. I really have no idea :(

kmike
02 Oct 2006, 08:54
Sphinx 0.9.7 will feature an arbitrary number of group id's, so it would be possible to handle "search this thread" and search by user in Sphinx.
Meanwhile, it's easy to hack Sphinx to support 3 groupid columns instead of one by some copy-pasting. Naturally, the index size is larger with additional group id's, 5GB for 6mln post database. We've been running it for some months already with great success.

orban
02 Oct 2006, 09:50
Mind sharing the patch and maybe your implementation in vB? Or at least outlining it?

Would be nice...!

mute
02 Oct 2006, 16:32
orban, what kind of changes do i need to make to my search.php to have it search both the main and delta index?

Prior to setting up the delta index on my end, I noticed that I could search for words in post bodies and get no results back, but if I looked in my query.log I would see many, many results.

orban
02 Oct 2006, 16:38
mmmmm

you don't have to modify search.php, create a fake index that contains the two other indices.
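
A rough sketch of the main + delta idea, in Python for illustration only. It assumes the usual split where the main index holds everything up to the id stored in a counter table (mute mentions a sphinx_counter table) and the delta index holds everything newer; the function names and the (docid, weight) hit format are made up here and are not the Sphinx API. The combined index then simply answers with the merged hits of both:

```python
def split_for_indexing(post_ids, max_indexed_id):
    """Split ids the way a main/delta setup would: main gets ids up to the
    counter value, delta gets everything newer."""
    main = [p for p in post_ids if p <= max_indexed_id]
    delta = [p for p in post_ids if p > max_indexed_id]
    return main, delta


def merge_hits(main_hits, delta_hits, limit):
    """Merge (docid, weight) hits from both indexes, best weight first."""
    merged = sorted(main_hits + delta_hits, key=lambda hit: hit[1], reverse=True)
    return merged[:limit]


if __name__ == "__main__":
    main_ids, delta_ids = split_for_indexing([1, 2, 3, 4, 5], max_indexed_id=3)
    print(main_ids, delta_ids)
    print(merge_hits([(1, 10), (3, 7)], [(5, 9)], limit=2))
```

Because search.php only ever names the combined index, it does not need to know that two indexes sit behind it.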

mute
02 Oct 2006, 16:40
hm ok, i think my config just might be a bit goofy. On my dev board, I created a new post after creating all 4 indexes. Anyway, my test post had a made up word in it, and after I posted I reran the delta updates, saw them pick up one doc, but I don't get any results returned if I use the "search" tool with sphinx.

I'm going to double check my config now.

orban
02 Oct 2006, 17:07
weird....

make sure the indexes get created (check the data files)

ubuntu-geek
02 Oct 2006, 17:59
I know this is a bit ugly right now, but:

http://forums.mtgsalvation.com/search.php

Also the "Search This Forum" is using Sphinx now.

"Search This Thread" and all queries using userids have to be done the old way for the moment until the new Sphinx version is released.

But I'm happy users can search our 1.4 million posts in <1sec again. Without crashing the server, locking any tables or anything.

When new version is out I'll finish the implementation and release it :)
New version of sphinx or vb? ;) I really want to try this out.. searches are killing us.

orban
02 Oct 2006, 18:03
sphinx

mute
02 Oct 2006, 18:05
Hm. this is very strange. I have verified that my config is the same as yours (minus the names of the indexes), and have emptied my sphinx_counter table, nuked all my indexes, and rebuilt.

[root@db2 var]# /home/httpd/sphinx/bin/search -c /home/httpd/sphinx/etc/sphinx.conf purple
Sphinx 0.9.6
Copyright (c) 2001-2006, Andrew Aksyonoff

- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbpostidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbpostdeltaidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbthreadidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec
- loaded 591 stopwords from '/home/httpd/sphinx/etc/sphinx.stopwords'
index 'vbthreaddeltaidx': query 'purple ': returned 0 matches of 0 total in 0.000 sec

I broke something, but I don't know what :)

Ah, I found the problem I think.

For whatever reason, on my initial index, despite having used --rotate, it is leaving *new* index files in my var dir:

[root@db2 var]# ls -la *new*
-rw-r--r-- 1 root root 1356935444 Oct 2 13:39 vbpost.new.spd
-rw-r--r-- 1 root root 10644727 Oct 2 13:39 vbpost.new.spi
-rw-r--r-- 1 root root 54322284 Oct 2 13:42 vbthread.new.spd
-rw-r--r-- 1 root root 879893 Oct 2 13:42 vbthread.new.spi

Sphinx won't search against these, but I'm not sure why they didn't roll over.

orban
02 Oct 2006, 18:30
Yeah I don't have .new. ones, just .old. ones.

Permissions?

mute
02 Oct 2006, 19:07
I believe it is a bug in Sphinx. If you start searchd with no preexisting indexes and then index with --rotate, it won't rotate. The solution is to stop searchd, nuke everything, index, then start searchd.

It took me a while to figure it out, I'm not sure why it isn't smart enough to see that there aren't preexisting indexes when searchd tries to rotate.

orban
02 Oct 2006, 19:11
Oh :(

Yeah I did the first index without searchd started.

Report it so it can be fixed :)

ubuntu-geek
02 Oct 2006, 20:46
I was just curious is your search_sphinx.php posted a few threads back the most current one? or have you made more adjustments?

mute
02 Oct 2006, 20:54
Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...

orban
02 Oct 2006, 20:59
What you mean? Other users with ssh access?

Yeah or kmike can share his patch http://www.vbulletin.org/forum/showpost.php?p=1088212&postcount=21

:(

ubuntu-geek
02 Oct 2006, 21:14
Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...
Not sure what you mean. It seems the permission system works on private/non private when doing searches with sphinx.

orban
02 Oct 2006, 21:17
search can be called by anyone with server access on the command line

so he gets access to all your indexes and thus to all your posts

so if you give a designer ssh access to upload stuff he can basically read your private forums

ubuntu-geek
02 Oct 2006, 22:27
True.. Not an issue for us..

orban
02 Oct 2006, 22:29
Neither here, I'm the only with access.

mute
03 Oct 2006, 00:35
Just an FYI, make sure you limit access to that search on your dev boxes if you don't potentially want people searching for info in your private forums :)

I guess now we just get to wait patiently for 0.9.7 to come out...

Oh, I thought at this point the search wasn't excluding forums users don't have permissions to view :)

orban
03 Oct 2006, 00:37
They aren't, but all posts/threads are filtered again on the results page.

kmike
03 Oct 2006, 12:58
Attached is the patch for Sphinx 0.9.5 which adds two more group columns.
You'll have to have something like this in your sphinx.conf:

sql_query=SELECT postid, \
pagetext, post.title, forumid, \
IF(post.userid=0,99999999,post.userid) AS userid, \
post.threadid AS threadid, post.dateline AS dateline \
FROM post
.....

sql_group_column = forumid
sql_group2_column = userid
sql_group3_column = threadid
sql_date_column = dateline

The part with IF(post.userid=0) is needed because Sphinx doesn't like zero column values (you'll have them if a board has some posts by the guests or deleted users), so we replace them with an arbitrary high number (99999999) which is guaranteed not to happen in the real data.

sphinxapi.php supports two more grouping functions: SetGroups2(array) and SetGroups3(array).
So search.php will have to call $sphinx->SetGroups2($userids) when searching by user(s), where $userids is an array containing their userid's.
And similarly, $sphinx->SetGroups3(array($searchthreadid)) will be called when searching in a thread.
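
To make the mapping concrete, here is a small Python sketch of how a search script might decide which group filters to set. Everything in it is hypothetical (the function names and the "group1/2/3" keys are invented for illustration); only the column order (forumid, userid, threadid) and the 99999999 stand-in for userid 0 come from the config above:

```python
GUEST_SENTINEL = 99999999  # same stand-in value as the IF() in the sql_query above


def index_userid(userid):
    """Mirror the SQL IF(): guests/deleted users (userid 0) are stored as the sentinel."""
    return GUEST_SENTINEL if userid == 0 else userid


def build_group_filters(forumids=None, userids=None, threadid=None):
    """Return only the filters that apply to this search.

    group1 = forumid, group2 = userid, group3 = threadid,
    matching the three sql_group*_column settings.
    """
    filters = {}
    if forumids:
        filters["group1"] = sorted(set(forumids))
    if userids:
        filters["group2"] = sorted({index_userid(u) for u in userids})
    if threadid is not None:
        filters["group3"] = [threadid]
    return filters


if __name__ == "__main__":
    # "Find all posts by user 42 and by guests in forum 73"
    print(build_group_filters(forumids=[73], userids=[0, 42]))
    # "Search this thread"
    print(build_group_filters(threadid=1234))
```

The point of translating userid 0 on the query side as well is that the filter has to use the same value the indexer stored, otherwise guest posts can never match.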

orban
03 Oct 2006, 13:20
Thank you. Gonna try this out :)

ubuntu-geek
03 Oct 2006, 14:10
Thank you. Gonna try this out :)
Curious to see how this works out.. :)

TECK
03 Oct 2006, 15:29
Thanks Orban (and others) for this solution.
0.9.6 is out, it fixes the following issues:
- added support for empty indexes (solves the previous issues we had with indexes)
- added support for multiple sql_query_pre/post/post_index
- fixed timestamp ranges filter in "match any" mode
- fixed configure issues with --without-mysql and --with-pgsql options
- fixed building on Solaris 9

orban
03 Oct 2006, 15:32
Yes, but the patch for more than one group won't work for this...

I'm trying to get a snapshot of 0.9.7....

kmike
03 Oct 2006, 16:00
Unfortunately, 0.9.7-dev is still too buggy to be used in production.

mute
03 Oct 2006, 21:01
Unfortunately, 0.9.7-dev is still too buggy to be used in production.

What kind of bugs are you running into?

kmike
04 Oct 2006, 06:39
The groupid has to be <4096...?! I'm sure you have more than 4096 users... Where did you get that number? We have much more than 4096 members and everything is working fine.
*edit* Ah, found it. You're mistaken - 4096 is the limit on the number of groupid's listed in one request. A groupid is an unsigned 32bit integer AFAIK, so the limit of roughly 4 billion values should be enough for everybody (the famous last words)
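
A minimal Python sketch of the difference between the two limits. The 4096 figure is the per-request count limit described here; the 32-bit value range is the "AFAIK" assumption from this post, and the function name is made up for illustration:

```python
MAX_GROUP_IDS_PER_REQUEST = 4096  # how many ids may be listed in one request
MAX_GROUP_ID = 2**32 - 1          # assumed largest value of a single id (unsigned 32-bit)


def check_group_filter(ids):
    """Validate a list of group ids before sending it with a search request."""
    ids = list(ids)
    if len(ids) > MAX_GROUP_IDS_PER_REQUEST:
        raise ValueError(
            f"{len(ids)} group ids in one request, at most "
            f"{MAX_GROUP_IDS_PER_REQUEST} allowed"
        )
    for gid in ids:
        if not 0 <= gid <= MAX_GROUP_ID:
            raise ValueError(f"group id {gid} does not fit in 32 bits")
    return ids


if __name__ == "__main__":
    # A single large userid is fine: the limit is on the count, not the value.
    print(check_group_filter([175576]))
    try:
        check_group_filter(range(5000))
    except ValueError as err:
        print("rejected:", err)
```

So a board with far more than 4096 members is fine as long as a single search does not list more than 4096 ids at once.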

What kind of bugs are you running into?
Frequent crashes when searching.

TECK
05 Oct 2006, 05:02
Go ahead and post it. :)
Thanks Orban.

mute
05 Oct 2006, 05:08
Indeed, conf, patch and search would be fantastic :)

ubuntu-geek
05 Oct 2006, 14:09
Cool, I'll give this a go this morning and see what happens..

Edit:
http://dragy.de/public/sphinx.api.diff the file is giving a 404 back :(

mute
05 Oct 2006, 15:10
I'm getting a 404 on http://dragy.de/public/sphinx.api.diff, and am having some issues getting the src patch to apply, has anyone else managed to get it to apply?

orban, is there a reason you've removed the "Sort results by", "Find threads with", and "Find posts from" options from your search_forums template? They are still "doable" with multiple groups in sphinx, right?

Ideally I'm looking to replicate the existing vb search, minus the "find as posts and threads" option because I just think that is confusing.

mute
05 Oct 2006, 22:42
I'm dumb, I didn't realize you didn't "make clean" prior to creating your diff, and didn't notice it was breaking on the lack of a Makefile as I was building off of a pristine src dir.

mute
05 Oct 2006, 22:56
Yeah the patch is fine, if you've run configure before. I hadn't as I was using a fresh tarball so it won't apply cleanly. If you were to "make distclean" prior to generating your diff, it would apply cleanly for someone who had just untar'd the 0.9.6 source :)

I am rebuilding my indexes now, this is exciting! I think with date ranges this would probably be good enough to go live with!

orban
05 Oct 2006, 22:59
Fixed the diffs now, yeah the configure was the problem. Sorry about this.

date ranges: I added them...(changed template search_forums, search.php and includes/sphinx.php, it's all edited in my howto post already)...I didn't realise this was built in because it's not used in api/test.php or "search". (It is though in sphinxapi.php).

Now I got a few users wanting the "Show as threads" "Show as posts" back, what did vB think when they added that >.<

I mean what does the search show when you are searching for posts and select "display as posts"? The first post in the thread?

And when searching in thread titles and choose "display as threads"? All threads the posts that are found are in?

The latter is impossible to run on large forums because let's say you get 150.000 posts back, then you'd have to sort 150.000 threadids...I think those were those queries I had in my slow log with hundreds of thousands threadids in them...that were killing the server....smart vB.

mute
06 Oct 2006, 00:54
Try to download the patches again....I'm really sorry about this but I never created patches before :(

diff -Naur sphinx-0.9.6/src sphinx-0.9.6-multigroup/src > /home/xxxxxxx/www/public/sphinx.src.diff
diff -Naur sphinx-0.9.6/api sphinx-0.9.6-multigroup/api > /home/xxxxxxx/www/public/sphinx.api.diff

This is what I used.

No need to be sorry! I got it to apply before you fixed it! :D I'm playing with it now. I appreciate you sharing your progress with the rest of us, it saves us a lot of headaches :)

gorman
06 Oct 2006, 11:37
Could anybody create this as a standard plugin?

And... are others seeing the same extraordinary benefits?

orban
06 Oct 2006, 11:43
It is not possible to make this a plugin unless they add a ton of hooks to search.php.

Not to speak of general *n*x knowledge you need to install this anyway.

Owwwww

I forgot a step

Copy the sphinxapi.php to..hmm..some folder. :)

kmike
06 Oct 2006, 13:37
gorman: there is simply no comparison at all between MySQL embedded fulltext search and Sphinx-based search, both in terms of speed and relevance.

BTW, that's what I meant when I was replying to you at vb.com forums, about custom search solution.

ubuntu-geek
06 Oct 2006, 13:38
5. Uncomment "unset($datecut);" (-> "#unset($datecut);") so includes/sphinx.php can use it (for date range search).

Where exactly is this?

ubuntu-geek
06 Oct 2006, 13:52
Doh, I mean comment it. So it doesn't get unset. >.<


line 12xx

In the section

// ############################################################################
// check if we are searching for posts from a specific time period

Before

// #############################################################################
// check to see if there are conditions attached to number of thread replies
perfect.. Was racking my brain on that one.. So far the implementation has been smooth. Only thing I am not keen on is the screen after a search that says please wait ;)

gorman
06 Oct 2006, 13:53
gorman: there is simply no comparison at all between MySQL embedded fulltext search and Sphinx-based search, both in terms of speed and relevance.

BTW, that's what I meant when I was replying to you at vb.com forums, about custom search solution.
Cool. Thanks. And... at least you replied. I'm kind of annoyed that a recognized problem of this magnitude is being left "on its own" by the development team.

ubuntu-geek
06 Oct 2006, 13:55
That's always been there :O

$vbulletin->url = 'search.php?' . $vbulletin->session->vars['sessionurl'] . "searchid=$searchid";
eval(print_standard_redirect('search'));

Modify these lines and add a straight header("Location:") maybe....
:0 I guess I am tired.. lol

gorman
06 Oct 2006, 13:55
It is not possible to make this a plugin unless they add a ton of hooks to search.php.
I'm mainly worried about upgrades... at the rate the vB team is churning them out, it could become a serious hassle to hand-modify templates each time.

ubuntu-geek
06 Oct 2006, 13:59
I'm mainly worried about upgrades... at the rate the vB team is churning them out, it could become a serious hassle to hand-modify templates each time.
For me the speed increase is worth the few template edits..

ubuntu-geek
06 Oct 2006, 14:14
Yeah I have yet to find a better forum solution. And at this point with Threads: 269,003, Posts: 1,588,154, Members: 175,576 I am not going to try and move it. We average like 4,000 users online at once during peaks and a lot of that is search traffic.

So cheers orban for finding and sharing this search solution. I was about to implement a google search wrap in the forum template. (ghetto style)

orban
06 Oct 2006, 14:17
I wish google offered a service to crawl your website in a closed environment for $ and then a search form. Like that google search appliance (?) but as an online service.

Oh well :)

ubuntu-geek
06 Oct 2006, 14:24
Shrug yep.. Gee I knew people would complain about three letter searches... :)

orban
06 Oct 2006, 14:26
Haha ;)

ubuntu-geek
06 Oct 2006, 14:55
Any thoughts on adding the option to display results as threads/posts?

orban
06 Oct 2006, 15:05
Well let's have a look:

Search posts and display as threads:

Let's say somebody searches for "book" and returns 150.000 posts. Those 150.000 posts are in 40.000 threads. If you find any way to fetch all 150.000 threadids, sort them and make a unique list of them, then let me know, but I really have no idea how to do that. I also think that this is a major problem of the vB search...(there are queries with several tens of thousands threadids in them).

Search threads and display as posts

I assume that "posts" mean "first posts in a thread"? You can probably add "firstpostid" as a new group for the thread index and then grab those...

Curse vB for adding those options :(

ubuntu-geek
06 Oct 2006, 15:10
Gotcha now I understand.. The users will just have to adjust.. Easy as that..

orban
06 Oct 2006, 15:28
I don't understand anyway what exactly the problem is....

If you are searching in thread titles, then the search returns a list of threads.

If you are searching in posts, then the search returns a list of posts.

ubuntu-geek
06 Oct 2006, 15:30
Its all good :)

orban
06 Oct 2006, 15:56
Its all good :)

No, I meant why vB implemented this behaviour, not why you're asking for it. My English, haha :confused:

ubuntu-geek
06 Oct 2006, 16:10
Hmm got an interesting issue going on. When doing a search from forumdisplay i get this..

Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

ubuntu-geek
06 Oct 2006, 16:26
Looking at the html source of the forumdisplay it looks like its getting set..

<input type="hidden" name="forumchoice[]" value="73" />

ubuntu-geek
06 Oct 2006, 16:32
Yeah.. exactly what I have.

orban
06 Oct 2006, 16:36
Add a "if ($vbulletin->userinfo['userid'] == 1) echo $forumchoice;" somewhere....to check if the value gets set...dunno... :(

ubuntu-geek
06 Oct 2006, 16:55
Yeah its getting set.. hrm... What version of php do you use?

137
Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

ubuntu-geek
06 Oct 2006, 18:36
Ok going to try this out now..

Ok that seemed to clean up the assertion issue. Last issue it seems is..

Query failed: searchd error: invalid group5 count 272485 (should be in 0..4096 range).

Hmm ok this seems to be related to how the groupid is counted in the searchd.cpp hrm..

mute
06 Oct 2006, 19:30
Hmm got an interesting issue going on. When doing a search from forumdisplay i get this..

Warning: assert(): Assertion failed in /includes/sphinxapi.php on line 249
Query failed: searchd error: invalid or truncated request.

I'm getting this too. My searches don't appear to be hitting searchd, I'm trying to debug it now as well.

Do only searches that HIT searchd get logged in query.log? My searches from the command line are working fine, but I can't seem to get them to hit searchd anymore via my test site.

Ok, i added that last bit of code but it doesn't seem to be fixed for me. Here's the output of a search targeted to a specific forum:

SphinxClient Object ( [_host] => db2 [_port] => 3312 [_offset] => 0 [_limit] => 250 [_mode] => 0 [_weights] => Array ( [0] => 100 [1] => 1 ) [_groups] => Array ( [0] => 394 ) [_groups2] => Array ( ) [_groups3] => Array ( ) [_groups4] => Array ( ) [_groups5] => Array ( ) [_sort] => 1 [_min_id] => 0 [_max_id] => 4294967295 [_min_ts] => 0 [_max_ts] => 4294967295 [_min_gid] => 0 [_max_gid] => 4294967295 [_error] => searchd error: invalid or truncated request [_warning] => ) Query failed: searchd error: invalid or truncated request.

ubuntu-geek
06 Oct 2006, 19:41
Same issue here..

mute
06 Oct 2006, 19:42
Ok here's what is and isn't working for me:

1) Searching all open forums for keywords - Works
2) Searching in a specific forum by keyword - Does not work
3) Searching all open forums by username - Works
4) Searching in a specific forum by username - Works

ubuntu-geek
06 Oct 2006, 19:46
Exactly my issue.. what version of php do you run?

mute
06 Oct 2006, 19:57
I'm using 5.1.5 at the moment. If I printout $forumchoice in sphinx.php, it is getting set, as well as making its way into the Sphinx request array.. I'm a bit puzzled.

ubuntu-geek
06 Oct 2006, 20:32
Hmm nothing yet from me.. You making any progress?

mute
06 Oct 2006, 20:41
nada

ubuntu-geek
07 Oct 2006, 00:09
The one that puzzles me is..

Query failed: searchd error: invalid group5 count 271308 (should be in 0..4096 range).

ubuntu-geek
07 Oct 2006, 00:29
No errors on the command line..

orban
07 Oct 2006, 00:31
Try to use the sphinxapi.php from my tar.gz.....?

Download it again

I think I found the error...I fixed something in my sphinxapi.php and didn't copy it back to /api/

I'm so sorry :(

ubuntu-geek
07 Oct 2006, 00:36
sphinxapi from the gz worked perfect... Orban you rock! Can i send you a donation for this effort? :)

ubuntu-geek
07 Oct 2006, 00:38
<- has had a few beers already.. Yeah i'll send them a donation for sure! Will you be updating this when 0.9.7 is released?

orban
07 Oct 2006, 00:40
Sure thing.

I'll also try to make "show as posts" "show as threads" happen, but right now I don't see how it's possible. But you never know what I'll come up with ;)

ubuntu-geek
07 Oct 2006, 00:51
Right on.. :)

mute
07 Oct 2006, 01:58
Yay! That fixed it for me too!

So, I'm thinking if you do plan on cleaning things up and releasing it at some point as a hack, that it would be best to gather up the "settings" into one file or at the top of the sphinx include. For example I have a multi server setup, so I specify the searchd server's ip rather than localhost, and I've renamed my indexes. To the average joe they might not notice or know how to make those changes to get things working. I'm going to do some more testing later on but things are looking very good :)

orban
07 Oct 2006, 10:47
Yeah...to be honest I intend to do that when 0.9.7 comes out where more than one group is supported natively and things should be a lot cleaner (and prolly faster too).

I also hope I can figure out the show as posts and show as thread until then (tho I believe best would be to use subscriptions for that [a member told me he was searching for his posts + show as threads to track threads he posted in: SUBSCRIPTIONS]).

kmike
07 Oct 2006, 12:46
Search posts and display as threads:

Let's say somebody searches for "book" and returns 150.000 posts. Those 150.000 posts are in 40.000 threads. If you find any way to fetch all 150.000 threadids, sort them and make a unique list of them, then let me know, but I really have no idea how to do that. I also think that this is a major problem of the vB search...(there are queries with several tens of thousands threadids in them).

I assume you're storing threadid as a group attribute, to support searching within threads. So you'll get it back in the search results for every post found.
Just collect all threadid's in an array, throw out the duplicates using array_unique, and voila, you have your results as threads.

Search threads and display as posts

There's no such option, looks like you mean "search titles only". But posts have titles too, you know?

orban
07 Oct 2006, 13:07
I assume you're storing threadid as a group attribute, to support searching within threads. So you'll get it back in the search results for every post found.
Just collect all threadid's in an array, throw out the duplicates using array_unique, and voila, you have your results as threads.

Yeah, just when there's 120.000 posts found......you'd have to increase the limit in sphinx.conf to 200.000 or so, and loop through ALL of them, then throw out the uniques, and then sort by lastpost....!?

There's no such option, looks like you mean "search titles only". But posts have titles too, you know?

Yes there is...you can select "Search Titles Only" and then "Show as Posts"....

I think it returns the first post of all threads found....

ubuntu-geek
07 Oct 2006, 13:31
Orban just curious, how often do you re-index the big index?

ubuntu-geek
07 Oct 2006, 14:30
Right on, I'll give it a go. I do have a weird one though. If I do a search for just a username and leave everything else default it will pull only older threads nothing new. hrm...

orban
07 Oct 2006, 14:32
If you don't enter any search terms the default vB search should be used....

kmike
07 Oct 2006, 20:17
Yeah, just when there's 120.000 posts found......you'd have to increase the limit in sphinx.conf to 200.000 or so, and loop through ALL of them, then throw out the uniques, and then sort by lastpost....!?
What is your search results limit, is it really that high (120000)? I highly doubt it because your current search implementation would choke on that number, too, since the part of the script responsible for the search results display already goes through all the returned results.

So I guess you have a more reasonable limit on the number of returned search results (around 1000?). At which point going through all of them suddenly doesn't look so bad.

Yes there is...you can select "Search Titles Only" and then "Show as Posts"....
"Search titles only" combined with "show as posts" should search within the titles of the posts. They happen to be the same as the titles of the threads in the case of a first post in a thread (well, at least in most cases).
Now, the original vB search implementation (the non-fulltext one) follows this logic. But the vB fulltext implementation throws this concept away and searches within the titles of the threads, displaying only the first posts of the threads found. I'll let you judge if this is correct or not.

Personally, I, too, think it's too confusing, but it's the legacy of the decision to allow each post to have its own title. Most of the members don't bother to type anything in the post title field when replying, and even if they do, it's completely inconspicuous in the default vB layout (and in most vB layouts I've seen, for that matter).
But it's there, and it's there for good, so we should bear with it.

*edit*: cool, 100 posts! I'll let it sit there for some time ;-)

orban
07 Oct 2006, 20:21
What is your search results limit, is it really that high (120000)? I highly doubt it because your current search implementation would choke on that number, too, since the part of the script responsible for the search results display already goes through all the returned results.

Well...

Let's assume you have

thread1 - 100 times "word"
thread2 - 50 times "word"
thread3 - 10 times "word"
thread4-50 5 times "word"

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine) this will only return like 30 threads. So you're missing out 20......I'm actually seeing this on very common words (when searching post and "show as threads").
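
The effect described above can be reproduced with a small shell sketch. The counts listed in the example do not quite add up to 2500, so the sketch assumes 50 threads with 50 matching posts each (2500 matches in total), returned grouped by thread, and a result limit of 1000. All of these numbers and both function names are illustrative only and are not taken from the actual Sphinx setup.

```bash
#!/bin/bash
# Hypothetical illustration: capping matched posts BEFORE collapsing them
# to threads loses threads. All numbers are assumptions, not real data.

# Emit one threadid per matching post: $1 threads with $2 posts each.
gen_matches() {
    local threads=$1 per_thread=$2 t p
    for ((t = 1; t <= threads; t++)); do
        for ((p = 1; p <= per_thread; p++)); do
            echo "$t"
        done
    done
}

# Count the distinct threadids among the first $1 matches read from stdin.
threads_within_limit() {
    awk -v n="$1" 'NR <= n' | sort -u | wc -l | tr -d ' '
}

total=$(gen_matches 50 50 | sort -u | wc -l | tr -d ' ')
seen=$(gen_matches 50 50 | threads_within_limit 1000)
echo "distinct threads overall: $total"
echo "distinct threads within the first 1000 posts: $seen"
```

Under these assumptions only 20 of the 50 threads survive the cut. With matches interleaved across threads the number would be higher, but some threads can still be lost whenever the post limit is applied first.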

----------

1. Search Titles Only - Show as Threads = full text index on thread titles
2. Search Titles Only - Show as Posts = full text index on post titles
3. Search Entire Posts - Show as Threads = full text index on posts but grab threadids and display them, basically grouped by thread
4. Search Entire Posts - Show as Posts = full text index on posts

1., 3., 4. is working already. 2. is not (yet). I'll need to fix this then. (At the moment it's searching thread titles only and displaying the first post).

Also it's not weighting post titles/bodies yet (I think).

TECK
08 Oct 2006, 08:32
Guys, when you compiled Sphinx, did you specify the mysql directory or did you simply use --with-mysql?
Thanks.

orban
08 Oct 2006, 11:01
I didn't add anything, make sure you have the mysql-dev stuff installed. What error are you getting?

TECK
08 Oct 2006, 23:13
Not getting any errors, just wanted to make sure before I compile it.
I'll let you know if anything comes up.

ubuntu-geek
09 Oct 2006, 15:02
I also added a "$vbulletin->GPC['nocache'] = true;" to the search_process_start hook, I had some queries that stuck and I think that's because vB cached some queries and did some very bad re-sorting on those....try it out.

Could you give me a hint on this one :)

orban
09 Oct 2006, 15:33
Meh I think it was because I deleted the full text indices and ran a MATCH (...) AGAINST query....and mysql kinda crashed...

Should be safe to enable it again.

ubuntu-geek
09 Oct 2006, 15:39
gotcha ok

mute
09 Oct 2006, 16:10
I think I'll do it every 3 days but I don't know yet, we're not very busy right now so the delta indices are quite small.

--------------------------------------------------------------------------------------------------------

I fail to see why this works and I still think there's missing data in these results....

Updated sphinx.conf

Added "IF(firstpostid=0,99999999,firstpostid) as firstpostid" to fields list and "sql_group3_column = firstpostid"

You only need to rebuild the thread indices.

http://dragy.de/public/sphinx.conf

Updated includes/sphinx.php

http://dragy.de/public/sphinx.php.txt

Update search_forums template

Readded the show as threads, show as posts options...

Rolled back navbar and FORUMDISPLAY templates....back to "show as thread" "show as posts"....

http://dragy.de/public/sphinx_search_forums.template.txt

Updated search.php

Remove

else
{
    // bug fix because we don't have "show as posts/threads" anymore
    if ($vbulletin->GPC['starteronly'])
    {
        $vbulletin->GPC['showposts'] = 0;
    }
    else
    {
        $vbulletin->GPC['showposts'] = 1;
    }
    // end bug fix
}




I also added a "$vbulletin->GPC['nocache'] = true;" to the search_process_start hook, I had some queries that stuck and I think that's because vB cached some queries and did some very bad re-sorting on those....try it out.

Can someone summarize what's going on here? I got sorta lost. Are you guys trying to figure out how to do the "view as posts, view as threads" options using sphinx, or making it so those options fall back on the vb search?

orban
09 Oct 2006, 16:13
"Are you guys trying to figure out how to do the "view as posts, view as threads" options using sphinx"

Yes and it seems to work, too.

mute
09 Oct 2006, 16:15
hm, I suppose I will give it a shot then!

Mine seems to be working as intended! Do you guys think the "$vbulletin->GPC['nocache'] = true;" bit in the search hook is needed? I love how this hack seems to be getting simpler as time goes on :)

TECK
09 Oct 2006, 18:47
I just made a script that makes compiling Sphinx easier.
It's for people who are not really comfortable with Unix.

1. Open your SSH utility and type vim installscript > Press Enter.

2. Press i (Insert).

3. Paste the following script:
#!/bin/bash
# -----------------------------------------------------------
# Sphinx Compiler
# -----------------------------------------------------------
# This script will compile the Sphinx search engine.
# Make sure you verify all file locations and versions
# before you run this script!
#
# ---------------------------
# Directory Extensions
# ---------------------------
DST_DIR=${HOME}/dist
SPH_DIR=${HOME}/sphinx
SRC_DIR=${HOME}/source
SQL_DIR=/usr
#
# ---------------------------
# File Versions
# ---------------------------
SPHINX="sphinx-0.9.6"
#
# ---------------------------
# File Locations
# ---------------------------
SPHINX_URL="http://sphinxsearch.com/downloads"
#
# ---------------------------
# Install Functions
# ---------------------------
# Print a highlighted status line.
function print_step()
{
    tput cud1 ; tput bold
    echo "$1"
    tput sgr0
}
# Check the exit status of the command that ran just before this call.
function install_ok()
{
    if [ $? -ne 0 ] ; then
        tput bel
        print_step "An error occurred during the install. Exiting now..."
        exit 1
    else
        tput bold
        echo "OK."
        tput sgr0
    fi
}
#
# ---------------------------
# SOURCE Directories
# ---------------------------
print_step "Creating the source directories..."
mkdir -p ${DST_DIR}
mkdir -p ${SRC_DIR}
install_ok
#
# ---------------------------
# Download SPHINX archive
# ---------------------------
cd ${DST_DIR}
print_step "Downloading the ${SPHINX} archive..."
wget -nc ${SPHINX_URL}/${SPHINX}.tar.gz
install_ok
cd ${SRC_DIR}
print_step "Extracting the ${SPHINX} archive..."
tar -xzf ${DST_DIR}/${SPHINX}.tar.gz
install_ok
#
# ---------------------------
# SPHINX Install
# ---------------------------
print_step "Creating the install directories..."
mkdir -p ${SPH_DIR}
install_ok
cd ${SRC_DIR}/${SPHINX}
print_step "Starting the ${SPHINX} install..."
sh ./configure --prefix=${SPH_DIR} --with-mysql=${SQL_DIR}
make
install_ok
make install
install_ok
#
# ---------------------------
# Install Cleanup
# ---------------------------
print_step "Deleting files and folders that are no longer needed..."
cd ~
rm -fr ${DST_DIR}
rm -fr ${SRC_DIR}
install_ok
print_step "Install completed successfully."
#
# -------------------------------------------------------
# END
# -------------------------------------------------------

4. Press ESC.

5. Type :wq (Write Quit) > Press Enter.

6. Type chmod +x installscript > Press Enter.

7. Type ./installscript > Press Enter.

Wait for install completion and read the messages.
Post any weird errors here. You are done. :)

mute
09 Oct 2006, 18:52
I also made a diff against the hacked search.php for vBulletin 3.6.2. To apply, just "patch -p0 < sphinx_search_362.diff" in your src dir.

http://junglist.org/sphinx_search_362.diff

orban
09 Oct 2006, 18:56
TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.

TECK
09 Oct 2006, 18:57
Orban and other guys, please feel free to edit the script, in order to include all extra patches needed for vBulletin.
Post here the edits and let us know.

TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.
Nope, just the basic install, with SQL validation... in case the server does not find it by default. It will get rid of some weird messages the regular Sphinx install might spit out.
That's the reason I posted the script, so you can edit it and add the patches.
It's pretty straightforward: you can add the Unix commands there, following the same pattern.

I did not look into the patches, because I'm not familiar with them yet.
I was hoping you will take care of it and post the edits. :)
Also please explain more in detail what you did, others will understand better.

Be aware of those locations:
DST_DIR=${HOME}/dist
SPH_DIR=${HOME}/sphinx
SRC_DIR=${HOME}/source
SQL_DIR=/usr

Type ${HOME} to see what it returns at your Unix prompt:
$ ${HOME}
bash: /home/user: is a directory

You still use mysqli for the forums and mysql for the search, right?
I have my forums set on mysqli.

ubuntu-geek
09 Oct 2006, 19:50
TECK, does that apply the multiple group patch?

Also gonna try to add basic sorting (date asc, date desc, relevance) later and fix the post title search.
Looking forward to those changes ;P

TECK
10 Oct 2006, 02:38
Ya, I will work on it. :0
Pretty new at patching me also...
Question: in your config file, you don't have any table prefixes?
http://dragy.de/public/sphinx.conf

I'm probably missing something. Are you using a recent vB version, where it has table prefixes?
Thanks for clearing this up.

mute
10 Oct 2006, 02:44
Ya, I will work on it. :0
Pretty new at patching me also...
Question: in your config file, you don't have any table prefixes?
http://dragy.de/public/sphinx.conf

I'm probably missing something. Are you using a recent vB version, where it has table prefixes?
Thanks for clearing this up.

Technically he doesn't need dbname.sphinx_counter either, just sphinx_counter would suffice.

TECK
10 Oct 2006, 02:57
Editing the sphinx.conf file as we speak.

mtgsalvation is a new database where all sphinx tables were created, I believe?
Let me know why you did not create the sphinx table in the vBulletin database. Thanks.

This part:
sql_query = \
SELECT postid, forumid, post.threadid as threadid, IF(post.userid=0,99999999,post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, pagetext, post.dateline \
FROM post ...
Does not have any table prefixes???

mute
10 Oct 2006, 03:13
We're not storing the sphinx data IN mysql, so only one table needs to be created, and that is the sphinx_counter table. mtgsalvation is the name of his vbulletin installation.

TECK
10 Oct 2006, 04:14
Thanks mute... however, I'm not clear with the query above.
It does not make sense. The sql_query posted in his .conf file will not work, if the database tables have prefixes.
Please explain more in detail why you don't need table prefixes.

Also, from his .conf file:
sql_db = mtgsforums

That's why I'm confused...

orban
10 Oct 2006, 09:27
Okay, in my example:

mtgsforums: my vbulletin database
mtgsalvation: the database with the counter table in it

I do NOT have table prefixes. You have to add those.

TECK
10 Oct 2006, 11:48
That makes a lot of sense... I was expecting this answer from my previous post above, Orban.
Thanks for the explanation. :)

mute
10 Oct 2006, 17:19
OOC, why'd you put the counter table in a different database than your vb forum, just standard practice?

orban
10 Oct 2006, 17:22
Yes, and I intend to probably have more sphinx indices for other things in the future. It has nothing to do with vB so I want it separate.

mute
12 Oct 2006, 17:27
Hm, I've got your latest changes running, but sorting results by date doesn't seem to be working, I get the same results if i choose relevancy or by date.

orban
12 Oct 2006, 17:41
Sorry, I once again forgot to update the downloadable includes/sphinx.php file >.< Try now. Relevancy asc/desc doesn't work, it's always descending (most relevant at top, obviously).

mute
12 Oct 2006, 17:56
I just wget'd the sphinx.php (to make sure i wasn't caching), edited it to my liking, and my search results when selecting "Show as threads", "one month ago", "search all forums", and "sort by last post, descending" aren't sorted in any sort of method I can figure out, am I missing something?

orban
12 Oct 2006, 18:05
It's working fine for me :(

Are you sure the search isn't getting cached?

mute
12 Oct 2006, 18:23
This is why I shouldn't be working on an empty stomach! It is working as intended :)

Brains
12 Oct 2006, 20:06
HOE LEE SHIYYYTEE!!!! This absolutely rocks. I was a little skeptical, but I went ahead and built the "worst case" index with Sphinx (no stopwords, 4.7M posts, min word length of 1) and tried some typically VERY difficult searches (from the command line). WOW... This sucker is unbelievably fast...

Time to stitch it into my forums. This is amazing, GREAT find, and THANK YOU for sharing!

ubuntu-geek
12 Oct 2006, 21:02
updated and working awesome!

ALanJay
13 Oct 2006, 17:30
Ah, I found the problem I think.

For whatever reason, on my initial index, despite having used --rotate, it is leaving *new* index files in my var dir:

[root@db2 var]# ls -la *new*
-rw-r--r-- 1 root root 1356935444 Oct 2 13:39 vbpost.new.spd
-rw-r--r-- 1 root root 10644727 Oct 2 13:39 vbpost.new.spi
-rw-r--r-- 1 root root 54322284 Oct 2 13:42 vbthread.new.spd
-rw-r--r-- 1 root root 879893 Oct 2 13:42 vbthread.new.spi

Sphinx won't search against these, but I'm not sure why they didn't roll over.

I found the same thing but just renamed the files without the ".new" and all was fine. :)

The suggestion to nuke everything and start again without searchd running didn't seem to work either.

This is a case where the first time you create stuff there is an issue but after that it all works fine - when I update the DELTA files there doesn't seem to be an issue.

orban
13 Oct 2006, 17:32
Yeah maybe --rotate doesn't work when you create them first time.

ALanJay
13 Oct 2006, 22:00
Curiously, when I added another database to the config file, that sphinx database also refused to rotate and just created .new files. Looks like some kind of bug.

On another track has anyone tried accessing the sphinx searchd from another host?

I tried using the php api and at first test it refused to connect from a remote host but works when on the same machine (but referencing an IP address rather than localhost).

orban
13 Oct 2006, 22:05
Firewall?

That rotate seems to be bugged mm...yeah..just if you have a new config file entry just create that one alone...without --rotate, first time.

ALanJay
13 Oct 2006, 22:23
Firewall?

Turned out that as well as a place holder in sphinxapi.php the "localhost" was also hard coded into the test.php code provided :)

That rotate seems to be bugged mm...yeah..just if you have a new config file entry just create that one alone...without --rotate, first time.

Yes it is very odd. With the new database that I have added if I don't use --rotate it overwrites the current file but if I use --rotate it creates .new files (even subsequently). The other files are still in place and rotate correctly when I install the DELTA files on the main forum databases every 5 minutes.

So as you say a bit confusing, but other than that pretty impressive :)

An update to this - I realised that maybe the problem is that searchd needs to be fully restarted to re-read the config file before it knows about the new files and allows them to be rotated.

Starting a new data set in Sphinx seems to require:

1) creating without the --rotate flag
2) Stopping searchd completely (kill `cat /var/log/searchd.pid`)
3) restarting searchd with the updated config file sphinx.conf

After those changes things seem to once again work. :)
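
The three steps above could be scripted roughly as follows. This is only a sketch under assumptions: the index name `newindex`, the `DRY_RUN` switch and the helper names are hypothetical, the pid file path is the one quoted in step 2, and the script defaults to printing the commands instead of running them.

```bash
#!/bin/bash
# Sketch of first-time setup for an index that was just added to sphinx.conf.
# DRY_RUN=1 (the default here) only prints the commands.
DRY_RUN=${DRY_RUN:-1}
CONFIG=${CONFIG:-sphinx.conf}
PIDFILE=${PIDFILE:-/var/log/searchd.pid}

# Run a command, or just print it when DRY_RUN=1.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

first_time_index() {
    local index=$1
    # 1) build the new index WITHOUT --rotate
    run indexer --config "$CONFIG" "$index"
    # 2) stop searchd completely
    if [ "$DRY_RUN" = "1" ]; then
        echo "+ kill \$(cat $PIDFILE)"
    else
        kill "$(cat "$PIDFILE")"
    fi
    # 3) start searchd again so it re-reads the updated config
    run searchd --config "$CONFIG"
}

first_time_index newindex
```

Once searchd knows about the new index, later delta updates can go back to using --rotate as before.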

Hi another update / query

Well having managed to get sphinx up and running and the test.php element searching the data we thought we would try the next steps.

Unfortunately we are still using 3.0.x and the search.php has changed a huge amount :(

I don't suppose anyone has tried adding sphinx search to 3.0.x?

Looking at the changes suggested I can find c1 and c2 - though the variable names have changed along with c4 and c5.

Obviously with the variable name changes, orban, your very useful sphinx.php will need various changes to the variables.

But if anyone has tried this with 3.0.x please let me know :)

ALanJay
15 Oct 2006, 22:06
Hi another update / comment / query :)

We seem to have managed to get things working with 3.0.x but when testing see:

Warning: assert() [function.assert]: Assertion failed in /includes/sphinxapi.php on line 209

This doesn't seem to be fatal in any way and the search function works. Any ideas what this is trying to achieve? :)

Overall, thanks to orban for the code to make this all work; it seems to do an excellent job.

orban
16 Oct 2006, 07:20
What's on line 209?

ALanJay
16 Oct 2006, 09:17
Hi,

Well, further to the above, doing various test searches (which all seem to produce the correct results) I have discovered a couple more of these anomalies :)

When I set various search options (i.e. user, forums or date as well as a text search) I get these errors in sphinxapi.php. There are various assertion tests, i.e.

line 209:

assert ( is_int($limit) );

line 234

/// set groups
function SetGroups ( $groups )
{
    assert ( is_array($groups) );
    foreach ( $groups as $group )
        assert ( is_int($group) );

    $this->_groups = $groups;
}

It looks like the defaults set in sphinx.php line 75

$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( $sphinx_weights );
$cl->SetLimits ( 0, $vboptions['maxresults'] );
$cl->SetMatchMode ( SPH_MATCH_ALL );
$cl->SetGroups ( $sphinx_groups );
$cl->SetGroups2 ( $sphinx_groups2 );
$cl->SetGroups3 ( $sphinx_groups3 );
$cl->SetGroups4 ( $sphinx_groups4 );
$cl->SetGroups5 ( $sphinx_groups5 );
$cl->SetSortMode ( $sphinx_sort );

And before this, for some searches, they are sometimes set to strings (line 52):

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

This doesn't seem to affect the results but the assertion fails when the elements are not integers.

In the case of line 209 (sphinxapi.php) and line 75 (sphinx.php) these can be forced to integers as they are obviously numbers, i.e.

$cl->SetLimits ( intval(0), intval($vboptions['maxresults']) );

But I am not certain about the other elements and options which because the defaults are text strings don't work in the same way.

Anyway hope that helps.

By the way if anyone wants the recipe for using Sphinx with 3.0.X then let me know and I can remove my specific defaults and post it here. The biggest change is the recoding from OOP to the old style referencing of variables. But there always seem to be ones that meet the same requirements.

The only other things to spot are the changes to search.php from the vB code, which follow the examples that orban gave but obviously in slightly different locations in search.php, i.e.

Make change c1 at around line 304
Make change c2 at around line 331
Make change c3 at around line 1210
Make change c4 at around line 1414
Make change c5 at around line 1147

sphinx.php see the diff file attached.

Once again very cool work orban, and we should also thank Andrew Aksyonoff over at www.sphinxsearch.com

ALanJay
16 Oct 2006, 12:57
Hi,

Having done more research the warning errors can be switched off by adding:

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

At the top of the sphinxapi.php script. It might be better to fix the reasons the warnings are being created, but at least it gives one the option to see or not to see them.

Another curiosity is that on our forum the searches all seem lightning quick EXCEPT when you look exclusively for "thread started by user"; this can take over a minute to give back a result.

If you add additional restrictions (limit the date / thread content / forums to search) the time it takes is reduced.

Finally, when searches are processed, for every one of them you see the redirect page showing that it is being processed. When you do a "thread started by user" search you don't see that page, so I'm not sure if my changes to search.php have caused this. Any ideas, or does the same happen on 3.6.x?

Regards
ALan

mute
16 Oct 2006, 22:28
orban, did you ever fix the searching of post titles? I thought I was running your latest code, but it seems to be broken on my devel install.

orban
16 Oct 2006, 22:32
It also searches them in my latest version (you can set relevancy in the sphinx.php) but it doesn't quite work like the default vB search (yet).

ALanJay: I would not remove the asserts, because without them invalid requests might be sent to searchd. Also, the "being processed" page is a vB thing.

ALanJay
17 Oct 2006, 08:40
ALanJay: I would not remove the asserts, because without them invalid requests might be sent to searchd. Also, the "being processed" page is a vB thing.

OK - does anyone else get "assert" warnings?

What I have done is set the warning messages off

assert_options(ASSERT_ACTIVE, 0); // 0 off or 1 on

in sphinxapi.php

As far as I can see the assert errors are generated because the asserts all check whether things are integers, and some of the input defaults are either numerics stored as text strings or plain text strings.

These warnings don't seem to affect the output, which seems to work pretty well. But with some of the more complex searches it is possible to produce array warning errors, i.e.

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))

So this warning implies that one of the items is the wrong datatype - checking back through the code on line 34 and 50 these are set to:

$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group2'; //threadid
$sphinx_userid_group = 'group3';

Is this the issue? Should they be numeric?

For anyone interested this is now live at:

www.digitalspy.co.uk/forums/ (http://www.digitalspy.co.uk/forums/)

We have 11,158,584 Posts and 464,239 Threads. And the main data file is a little over 4Gb in size.

It is still a work in progress but it does seem to produce the correct results :)

But with some of the more complex searches it is possible to produce array warning errors ie

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 125

The line there looks like

if (!can_moderate($docinfo[$sphinx_forumid_group]) AND in_array($docinfo[$sphinx_userid_group], $Coventry))



After much thought we realised that we don't use the $Coventry feature and I suspect that is the reason it does not work. As I'm not sure what $Coventry should resolve to, I have removed the whole line from my implementation. It seems to say: if not a moderator and sent to Coventry, then don't do the search. As we have no people in the second category, removing it seems to be the best short-term solution.

I'm not sure if this is an issue between 3.0.x and 3.5/3.6 but thought I would share my thoughts on this as it kept me on my toes and I now have a much better understanding of the way the code works :)


PS the docinfo[$sphinx????] elements turn the group defaults into numerical output as required. I'm still not sure why the assert errors are being seen though, will delve deeper :)

PPS Well, after more searching and playing I am no further forward as to why the assert warnings are occurring. Trying to force the elements to be integers with intval breaks the code :) so I now have a system that seems to work but generates warnings that I have switched off. I assume no one using 3.6 is having these issues with these assert warnings?

orban
17 Oct 2006, 10:58
Why does intval() break any code?

And maybe the $Coventry variable is something else in vB 3.0...

I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

ALanJay
17 Oct 2006, 12:07
Why does intval() break any code?

That is a very good question. I suspect I am not using it 100% correctly but in the simplest example line 32

$sphinx_groups2 = $sphinx_userids;

to

$sphinx_groups2 = intval($sphinx_userids);

Seemed to cause odd behaviour.

I was also seeing if using it in:

if (!empty($userids)) $sphinx_userids = explode(',', $userids);
else $sphinx_userids = array();
if ($forumchoice != '') $sphinx_groups = explode(',', $forumchoice);

But wasn't sure I could use it in this context.

My problem is that not entirely understanding the logic of what is going on here (but learning as I go along). I'm not sure why I am seeing the "Warnings" yet they generate perfect results.

Depending on the search, each of the "SetGroups", "SetGroups2" and "SetGroups3" calls generates these warnings, but because these are arrays I need to build the array with integers and, I assume, not numerics that are text(?)

And maybe the $Coventry variable is something else in vB 3.0...

It is possible - from talking to my system admin, it lets you stop users from doing certain things. After thinking about this I don't think it is an issue as we don't use it. So for me removing it solves the problem, as the second element of the if statement, which checks whether the user has been sent to Coventry, isn't necessary.

I'm really sorry I can't be of any further assistance here but I'm not running vB 3.0 :(

No problem without your code we wouldn't have been able to do this at all. So thanks so much.

I assume you don't see any of the assertion errors in vB 3.6 ?

Anyway, as you can see (if you register on our site), the Sphinx search does work, very smoothly and quickly, and it is a great solution for offloading the search function out of the main database.

One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?

Swamper
21 Oct 2006, 09:51
One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?

Why not have that specific search just redirect to the standard vB search.php? It's fast.

----

Found my way here via the Big Boards Thread on vB.com - wow - I'm going to get on this right away! :D We're moving from a heavily modded 6.5+ million post vB2 to 3.6 in the coming weeks, and for over a year now we've survived only because our search was split up into separate tables according to date range - updated nightly - and stored on another drive, but with 'Search this Thread', 'View New Posts' and 'Find all posts by User' acting on the live post table.

kmike
23 Oct 2006, 06:48
We're moving from a heavily modded 6.5+ million post vB2 to 3.6 in the coming weeks
Be warned that vB 3.6 is much more CPU demanding than vB2 (and even vB3), so you'd better beef up your web frontend(s) before the final switch.


Let's assume you have

thread1 - 100 times "word"
thread2 - 50 times "word"
thread3 - 10 times "word"
thread4-50 5 times "word"

A search for "word" will return us 2500 posts. BUT there are only 50 different threads.

If your limit is 1000 (like mine) this will only return like 30 threads. So you're missing out 20......I'm actually seeing this on very common words (when searching post and "show as threads").
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.

orban
23 Oct 2006, 09:49
Yes, that's exactly how vB search works in this specific case.
The solution? Don't search for the common words, it won't do any good in any case. Or better, narrow your search by adding more specific keywords.

Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.

ALanJay
23 Oct 2006, 15:40
It's because they are both arrays, or a string of comma seperated numbers?

You'd have to use array_walk, lemme know if you need help.

I eventually worked this out but never managed to get it to work successfully. I assume some difference between the way 3.0.x and 3.6 handle these causes a problem. Because it is only a warning I have left it - maybe next time there is an opportunity to play I will have another go with array_walk, if I can fathom the syntax to get everything switched from numerals as text to integers.

With or without key words?

If it's without it's using the default search and I can't really help with that.

Without. I now understand why it is slow, and we have removed it from our choices; 1 minute to bring back the answer was a little long.


Overall it has been running now for a week, and once we sorted a few things out it has been excellent. Using your cool current and DELTA index setup, the databases are updated every 15 minutes and the whole site is reindexed every night.

Thanks for the ideas this has been an excellent tool and remarkably easy to implement.

orban
23 Oct 2006, 15:56
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

Glad to hear it works for you!

ALanJay
23 Oct 2006, 16:08
function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.

I will have a play - thanks.

Glad to hear it works for you!

Seems to :) Of the various hacks and attempts to solve the text search issue, this one seems to have delivered on its goals. There are still a few things I don't understand which would probably improve performance, but overall it works well.

Maybe you have some ideas on the issues:

morphology = none
stopwords =
min_word_len = 3
charset_type = sbcs
}

What do morphology and stopwords do / offer, and how are they best used?

and

mem_limit = 256M
}

mem_limit for creating the index: does anyone have any views on a sensible optimum value for this? We are running this on a machine with 8Gb of RAM, and as it started at 32M I didn't want to make it too big, but it still complains it could be better :)


==============

Looking in the original configuration file I think I have a handle on the morphology, word_len and char set.

Would I be right in saying that the stopwords file is a list of words NOT to index?

If so does anyone have a good list of 2 and 3 letter words that can happily be removed from an index :)

==============

Looking on the sphinxsearch forums there is discussion on creating stop words and the indexer can produce list of most used words for you to work with ie

/usr/local/bin/indexer --config sphinx.conf --rotate --buildstops sphinx-stop.txt 1000 --buildfreqs

This builds a file with the most commonly used words in the index and the frequency with which they occur in your index.

If I understand this correctly it should allow you to remove a few of the obvious things.
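
As a rough sketch of how such a frequency file could be turned into a list of short stopword candidates: the helper below is hypothetical, and it assumes each line of the file looks like "word count" separated by whitespace, which should be checked against the file your indexer version actually writes.

```php
<?php
// Hypothetical helper: pick short words out of a word-frequency listing.
// Assumption: each line is "word count" (whitespace separated).
function short_stopword_candidates(array $lines, $maxLen)
{
    $candidates = array();
    foreach ($lines as $line) {
        $parts = preg_split('/\s+/', trim($line));
        $word = $parts[0];
        if ($word === '') {
            continue;                       // skip blank lines
        }
        if (strlen($word) <= $maxLen) {     // byte length; fine for an sbcs charset
            $candidates[] = $word;
        }
    }
    return $candidates;
}

// Made-up sample lines standing in for the generated file.
$sample = array('the 90210', 'and 80000', 'forum 500', '', 'is 70000');
print_r(short_stopword_candidates($sample, 3)); // the, and, is
```

The result is only a list of candidates; it still needs a manual pass, since some short words may be worth keeping searchable on a given forum.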

function intvalArray(&$item, $key)
{
$item = intval($item);
}

array_walk($array, "intvalArray");

untested, but that's the idea.



Hi,

Looking at the code in sphinx.php:


if ($titleonly)
{
// searching thread titles
$sphinx_index = $sphinx_thread_index_name;
$sphinx_groups2 = $sphinx_userids;
$sphinx_forumid_group = 'group';
$sphinx_switch_group = 'group3'; //firstpostid
$sphinx_userid_group = 'group2';
// only titles, nothing to weight
$sphinx_weights = array ( 1 );
}


Where do you put the array_walk manipulation?

As far as I can tell one needs the results of the various items above to be so processed.

or do you implement it:


$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( $sphinx_weights );
// $cl->SetLimits ( 0, $vboptions['maxresults'] );
$cl->SetLimits ( intval(0), intval($vboptions['maxresults']) );
$cl->SetMatchMode ( SPH_MATCH_ALL );
$cl->SetGroups ( $sphinx_groups );
$cl->SetGroups2 ( $sphinx_groups2 );
$cl->SetGroups3 ( $sphinx_groups3 );
$cl->SetGroups4 ( $sphinx_groups4 );
$cl->SetGroups5 ( $sphinx_groups5 );
$cl->SetSortMode ( $sphinx_sort );



ie

$cl->SetGroups4 ( array_walk( $sphinx_groups4, "intvalArray") );

I assume it doesn't matter that sometimes the array will be one element long.

=====================================




Originally Posted by ALanJay
One final question. Everything runs very quickly and smoothly except one search, "Find Threads Started by User", which is extremely slow. Do you have the same problem with 3.6?

With or without key words?

If it's without it's using the default search and I can't really help with that.


Originally Posted by Swamper
Why not have that specific search just redirect to the standard vB search.php? It's fast.

Searches without keywords already are redirected to the default search.


Just curious, "orban": having done some more checks, when doing just a user "Find Threads Started by user" it is over a minute with the size of files we have, and from what you are saying this is the standard vB result. Whereas once you add an additional key (search string) it all works much faster, as it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?

kmike
24 Oct 2006, 10:16
Yeah but you can't really control user behaviour. There'll always be the guy to put the keyword in the search form that's used in 100.000 threads.
Well, it's their own fault then ;-)

orban
24 Oct 2006, 22:49
Well if it's crashing the server it's not :<


Just curious, "orban": having done some more checks, when doing just a user "Find Threads Started by user" it is over a minute with the size of files we have, and from what you are saying this is the standard vB result. Whereas once you add an additional key (search string) it all works much faster, as it is using Sphinx (is that right?).

Is there a reason you didn't code that using Sphinx?

Yeah but you'd have to add a fake string to all posts...mm....that's what you do right?

ALanJay
25 Oct 2006, 07:29
Yeah but you'd have to add a fake string to all posts...mm....that's what you do right?

If only it was that easy :)

If you enter a space it throws it away and does the standard search; if you enter a single character it just says there are no such matches. If you have a longer string it only finds occurrences that match, not all of them :)

mute
25 Oct 2006, 18:26
We're rolling out our sphinx search when we upgrade our site to 3.6.2 on Thursday. I'm hopeful that it will live up to my testing, but I am a tad worried that the "find posts by user" searches will be a bit pokey. I don't think it warrants a lot of concern given how often that particular type of search is actually done though..

Orban, have you been working on any other surprises lately? :)

orban
25 Oct 2006, 20:10
You mean "find posts by user" without key words yeah?

No, haven't had a lot time lately. ;(

ALanJay
25 Oct 2006, 21:22
We're rolling out our sphinx search when we upgrade our site to 3.6.2 on Thursday. I'm hopeful that it will live up to my testing, but I am a tad worried that the "find posts by user" searches will be a bit pokey. I don't think it warrants a lot of concern given how often that particular type of search is actually done though..

Just to make clear that "find posts by user" is fine and works very fast; it is "Find THREADS STARTED by user" that is slow (it still uses the internal code in vB, I think, from the comments made).

Overall we have been using sphinx search now for nearly a week and it seems to work very nicely. The DELTA file is generated every 5 minutes, with the full file being rebuilt each night in the early hours of the morning. Our vB main data file is around 4Gb in size (Threads: 467,561, Posts: 11,271,241, Members: 173,321).

We have disabled the "Find Threads STARTED by user" option for non-admin users from the search options as a temporary measure. I'm not sure anyone used it in any case.

Obviously in an ideal world it would be nice if this was searchable but it isn't a deal breaker.

kmike
26 Oct 2006, 05:49
AlanJay, what's your MySQL version? I remember dealing with a bug in MySQL 4.0.x when "find threads started by user" query didn't use the proper indexes. Repairing the thread and post tables fixed that, but after some time the problem crept back in.

Update: found the same bug in the bookmarks, it appears not only 4.0.x are affected: http://www.vbulletin.com/forum/bugs.php?do=view&bugid=4159
Looks like an intermittent index loss or corruption.

Also, FYI sphinx-0.9.7-rc1 has been released.
Another update: forgot to say that the crash bug has been fixed in RC1.

ALanJay
26 Oct 2006, 07:22
AlanJay, what's your MySQL version? I remember dealing with a bug in MySQL 4.0.x when "find threads started by user" query didn't use the proper indexes. Repairing the thread and post tables fixed that, but after some time the problem crept back in.

We are using 4.1 version of mySQL (clients 4.1.18 to 4.1.21 and server 4.1.19).

Though I suppose the real goal is to remove this from the standard vB search and put it into a search done by sphinx (if possible) as with tables as big as ours I can understand why the results might take a little time to be returned.

Update: found the same bug in the bookmarks, it appears not only 4.0.x are affected: http://www.vbulletin.com/forum/bugs.php?do=view&bugid=4159
Looks like an intermittent index loss or corruption.

Thanks will take a look.

Also, FYI sphinx-0.9.7-rc1 has been released.
Another update: forgot to say that the crash bug has been fixed in RC1.

Interesting from discussions it sounds like there is quite a number of changes so I assume there might be more to upgrading than a simple rebuild once it is properly released.

orban
26 Oct 2006, 10:35
Yeah, I just saw 0.9.7-RC1 got released.

I think I will wait for one more release candidate or even for the final version, because I'm sure there'll be bugs. Once it's released I'll create a new how-to. It will probably not just be a simple rebuild of indices, yeah.

ALanJay
26 Oct 2006, 11:59
Yeah, I just saw 0.9.7-RC1 got released.

I think I will wait for one more release candidate or even for the final version, because I'm sure there'll be bugs.

:) - as the current version works so well I'm in no hurry.

Once it's released I'll create a new how-to. It will probably not just be a simple rebuild of indices, yeah.

I suspected as much. Good luck when he gets that far.

By the way, did you have a suggestion as to where the best place is to do the array_walk in your sphinx.php code?

Can it be implemented in the lines like?


ie something like


$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );

Obviously with the function elsewhere in the code.

function intvalArray(&$item, $key)
{
$item = intval($item);
}

When I try I get another error:

Invalid argument supplied for foreach()

in Sphinxapi.php in the places that the int_val check takes place.

kmike
26 Oct 2006, 12:22
Well, as the creator of the multi group column support for 0.9.5 I can honestly say that it was a terrible copy/paste hack. 0.9.7 has them implemented properly, and the resulting index takes much less space which is always good from the I/O standpoint.

orban
26 Oct 2006, 12:25
By the way, did you have a suggestion as to where the best place is to do the array_walk in your sphinx.php code?

Can it be implemented in the lines like?


ie something like


$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );

Obviously with the function elsewhere in the code.

function intvalArray(&$item, $key)
{
$item = intval($item);
}

Yeah that's alright.

Well, as the creator of the multi group column support for 0.9.5 I can honestly say that it was a terrible copy/paste hack. 0.9.7 has them implemented properly, and the resulting index takes much less space which is always good from the I/O standpoint.

Hm okay. I'll have a look then. Thanks for telling me :)

ALanJay
26 Oct 2006, 13:16
Well having tried that:

$cl->SetGroups2 ( array_walk($sphinx_groups2, "intvalArray") );


I end up with errors from sphinxapi.php

Invalid argument supplied for foreach()

in the code that looks in the array for the values to be checked as integers. I'll have to do some more playing when I have some time. :(

orban
26 Oct 2006, 13:26
Oh I'm sorry.

array_walk doesn't return the new array.

Do
array_walk($sphinx_groups2, "intvalArray");
$cl->SetGroups2 ( $sphinx_groups2 );
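
For reference, a small standalone sketch of the difference (the sample IDs are made up). array_walk changes the array in place and returns a boolean, which is why passing its return value straight into a setter hands over true instead of an array. array_map, shown as an alternative, returns a new array and can be nested in a call.

```php
<?php
// Callback in the style used in this thread: converts each item in place.
function intvalArray(&$item, $key)
{
    $item = intval($item);
}

$ids = array('12', '7', '305');              // made-up IDs held as text

$returned = array_walk($ids, 'intvalArray'); // modifies $ids in place
var_dump($returned);                         // bool(true), not the array
var_dump($ids === array(12, 7, 305));        // bool(true)

// Alternative: array_map returns the converted array.
$converted = array_map('intval', array('12', '7', '305'));
var_dump($converted === array(12, 7, 305));  // bool(true)
```

Both forms also work on a one-element array, so a single forum or user ID needs no special handling.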


---

Also at the moment rewriting sphinx.php for 0.9.7-RC1.

In sphinx.conf just minimal changes were necessary, sphinx.php quite some changes. Currently recreating indices so I can start playing :)

---

Running 0.9.7-RC1. Minimal changes to sphinx.conf, a huge change to sphinx.php and it's running :) Will upload upgrade howto and full howto later! Going to the gym now, need to get strong! "Strong Mind, Strong Body"!?

ALanJay
26 Oct 2006, 13:55
Oh I'm sorry.

array_walk doesn't return the new array.

Do
array_walk($sphinx_groups2, "intvalArray");
$cl->SetGroups2 ( $sphinx_groups2 );



Thanks - that seems to work. Curiously I am now getting assertion errors further down in the date element:

/// set timestamps to match
function SetTimestampRange ( $min, $max )
{
assert ( is_int($min) );
assert ( is_int($max) );
assert ( $min<=$max );
$this->_min_ts = $min;
$this->_max_ts = $max;
}

Which is most odd. Looks like in 3.0.x everything is held as text.

It appears adding:

$datecut = intval($datecut);

Just before datecut is first looked at seems to sort that out.

I assume intval doesn't do anything harmful, so it can be used generically.

Thanks orban for the guidance.

orban
26 Oct 2006, 13:57
Yeah that's weird. Just call

SetTimestampRange

with SetTimestampRange ( intval ( ... ), intval ( ... ) );
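
A quick standalone check of why the assertions fire (the timestamp value is made up): is_int() is false for a numeric string, so a value that arrives as text fails assert ( is_int($min) ) until it is cast. One thing to keep in mind is that intval() quietly turns non-numeric text into 0 rather than raising an error.

```php
<?php
$datecut = '1161871200';            // made-up timestamp, held as a string

$before = is_int($datecut);         // false: numeric strings are not ints
$datecut = intval($datecut);
$after = is_int($datecut);          // true after the cast

var_dump($before, $after);          // bool(false) bool(true)
var_dump(intval('not a number'));   // int(0): bad input silently becomes 0
```

So the cast is harmless for values that really are numbers, but it will not flag garbage input; a date cut of 0 would simply mean "no lower bound".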

ALanJay
26 Oct 2006, 14:23
Yeah that's weird. Just call

SetTimestampRange

with SetTimestampRange ( intval ( ... ), intval ( ... ) );


Thanks - I ended up just forcing datecut with intval (see above) and leaving it like that, as the other elements in the various places where SetTimestampRange is set don't seem to be causing an issue.

mute
26 Oct 2006, 18:35
Orban, lookin good. I have mine up and running just in time for our upgrade tonight :)

One thing i had to change though -- I limit my search results to 500, and on line 101 of sphinxapi.php there is a

$this->_maxmatches = 1000;

This will throw an error if you try to request 1000 results via php from a searchd that is limited to less than 1000. I changed my sphinxapi to match and its all good :)

Just an FYI!

orban
26 Oct 2006, 18:40
Cool, glad to hear it works :) I honestly didn't do much testing but the few things I tried seemed to work and it's just a few different function calls. And if something is wrong I'm sure to get a notice here quickish :D

mute
26 Oct 2006, 19:13
Edit: Nm! I forgot that i had sphinx pointed at a vb 3.5.2 database, not a 3.6.2 :)

orban
26 Oct 2006, 19:22
Well, dateline is supposed to be the sql_date_column? I thought that was the case, but I guess it's not. Looks like all groups even the date one have the sql field as their name.

Edit: OOooo! This means we can also implement sort by userid and forumid!? (And all other sorting options but they will have a quite big delay...like if you update only every 24 hours they will be 24 hours old).

mute
26 Oct 2006, 19:25
Eh, I'm not sure it is needed. Since my main site isn't 3.6 yet (tonight is the upgrade), I had my sphinx looking at my live data, and my vb install looking at a test db (with older data).

The dateline column on thread was added in 3.6, which is why I think it was broken. I'll know more tonight when we upgrade :)

Were you running into the dateline error on thread title?

orban
26 Oct 2006, 19:30
I thought "dateline" was the internal name for whatever field you have defined as the sql_date_column. But it looks like that isn't the case. I wonder why there even still is an sql_date_column. Because the new SetSortMode() can take ANY column. This confused me a bit.

mute
26 Oct 2006, 19:32
Ahh, you are right. I updated my sphinx.php and it seems better now.

orban
26 Oct 2006, 19:35
It's still bugged. Lemme figure this out. When I fix one the other breaks. Haha.

mute
26 Oct 2006, 19:36
Hm, getting something similar targeting a specific forum with an "entire posts" search:

Warning: assert() [function.assert]: Assertion failed in /sphinxapi.php on line 284
Query failed: searchd error: index 'vbpost': sort-by attribute 'lastpost' not found.

Again, could be my currently messed up setup, it is hard to tell until I get the upgrade done tonight :)

Edit: haha didn't see that something else was broken. I'll leave the testing to you hehe

kmike
26 Oct 2006, 19:39
I thought "dateline" was the internal name for whatever field you have defined as the sql_date_column. But it looks like that isn't the case. I wonder why there even still is an sql_date_column. Because the new SetSortMode() can take ANY column. This confused me a bit.
Because AFAIK Sphinx sorts by dateline even when requested sort is by another column - e.g. for the relevance you'll get results basically as ORDER BY relevance DESC, dateline DESC.

Also, better replace "docinfo = inline" with "docinfo = extern" if you have RAM to spare, according to sphinx.conf.dist. Check the relevant section.

orban
26 Oct 2006, 19:45
Okay, I had to add a new variable (that gets modified by whether you search in threads or posts) and now there's a much nicer sorting code block.

http://dragy.de/public/sphinx/sphinx.php.txt

I tried all four combinations so it must work now!? Did I say this before? ;)

Hmm, about the docinfo, gonna have a look. We're only gifted with 2GB RAM for our forums and in a few months we'll have 2 million posts. Not good. I'll make a note in the installation.

mute
26 Oct 2006, 19:48
The box I'm running sphinx on has 2gb of ram (it is just a slave db server). MySQL has been using a ton of memory lately because we're slaving searches to it, but after we switch to sphinx tonight I think it'll even out a lot. I'm thinking that extern might be better (we have 25 million posts). I'm not entirely sure, but I guess I can try both ways w/o much reconfiguration :)

orban
26 Oct 2006, 19:50
Also, better replace "docinfo = inline" with "docinfo = extern" if you have RAM to spare, according to sphinx.conf.dist. Check the relevant section.

( 1 + number_of_attrs )*number_of_docs*4 bytes

( 1 + 5) * 1.5 million posts * 4 bytes = 34 megabytes.

Might be worth keeping external, you're right.

I'm running everything on one box, so I'm kinda really short on ram, but 34mb seems to be doable. :D
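
The estimate above as a small calculation, using the formula quoted from sphinx.conf.dist (the function name is just for illustration). Note that the 34 figure comes out when the byte count is divided by 1024*1024.

```php
<?php
// (1 + number_of_attrs) * number_of_docs * 4 bytes, as quoted above.
function docinfo_extern_bytes($numAttrs, $numDocs)
{
    return (1 + $numAttrs) * $numDocs * 4;
}

$bytes = docinfo_extern_bytes(5, 1500000);
$mib = $bytes / (1024 * 1024);

echo $bytes, "\n";                   // 36000000
echo round($mib, 1), "\n";           // 34.3
```

Plugging in a different post count or attribute count gives the corresponding figure for another forum.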

mute
26 Oct 2006, 20:10
Hm, 600mb for 25 million posts, I guess I'll give it a shot :)

So, we just rolled out our sphinx search. All is well, but some users are reporting a ton of warnings after clicking submit and before they get their results.

I'm not exactly sure what is causing it, but the error is:

Warning: in_array() [function.in-array]: Wrong datatype for second argument in /includes/sphinx.php on line 142

Looks as though $sphinx_conventry_userid is not getting set on this line?

if (!can_moderate($docinfo['attrs'][$sphinx_can_moderate_forumid]) AND in_array($docinfo['attrs'][$sphinx_conventry_userid], $Coventry))

Any thoughts?

orban
27 Oct 2006, 15:00
Second argument is $Coventry...? That variable is from vBulletin... :(

No idea what's causing this. Do you have Coventry disabled maybe?

And I strongly suggest turning off errors for users and log them to a file.

mute
27 Oct 2006, 15:03
erm yeah, it's $Coventry. We don't use it, so that is probably why. For now i just set error_reporting to 0 and it went away :)

ALanJay
27 Oct 2006, 15:04
I think it is slightly amusing that with this upgrade people are now seeing very similar errors to the ones that I have with vB 3.0.x

The assert function errors are caused by text numbers being passed rather than integers. You will see further up a solution using array_walk to make sure that that does not happen.

The coventry ID one I also had and if you don't use that function just remove the query line in sphinx.php short term until the real problem/solution is discovered :)

Good luck. :)

orban
27 Oct 2006, 15:05
It's weird though that it doesn't even get set...should at least be an empty array. And why is it even a separate global and not $vbulletin->coventry?

Just remove the line if you aren't using coventry then.

Such a mess ;)
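
An alternative to removing the line or silencing errors would be to guard the check, roughly like this. This is only a sketch: is_in_coventry() is a hypothetical helper, not a vBulletin function, and how $Coventry ends up unset in the first place is still unknown.

```php
<?php
// Hypothetical helper: treat a missing or non-array Coventry list as empty,
// so in_array() never receives a wrong datatype as its second argument.
function is_in_coventry($userid, $coventry)
{
    if (!is_array($coventry)) {
        return false;                 // Coventry not configured: nobody is in it
    }
    // Compare as integers so "5" and 5 are treated as the same user.
    return in_array(intval($userid), array_map('intval', $coventry), true);
}

var_dump(is_in_coventry(5, null));               // bool(false), no warning
var_dump(is_in_coventry('5', array('5', '9')));  // bool(true)
var_dump(is_in_coventry(3, array(5)));           // bool(false)
```

In sphinx.php the in_array(...) part of the condition would then be replaced by a call to such a helper, keeping the Coventry filtering for forums that do use it.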

mute
27 Oct 2006, 15:09
Alanjay, you mean you're surprised that jelsoft code is buggy and hard to work with? I'm not :)

orban
27 Oct 2006, 15:12
Haha

:D

I really wish they removed all the global variables. They're the pain in the ass to be honest.

ALanJay
27 Oct 2006, 16:21
as orban says :) ha ha

We should be very grateful that we can hack it to bits and get it to do stuff that makes it so expandable.

orban
27 Oct 2006, 16:24
IMO they must add a hook before and after every statement. So you can completely modify any part.

weeno
30 Oct 2006, 01:24
does this require a specific version of mysql? (I'm at 4.0.x)

I get an error on indexing

ERROR: sql_range_query: You have an error in your SQL syntax. Check the manual that corresponds to your MySQL server version for the right syntax to use near 'SELECT max_doc_id FROM sphinx_counter WHERE counter_id = 1 )'

thanks
arn

orban
30 Oct 2006, 04:16
Have you created the sphinx_counter table?

weeno
30 Oct 2006, 05:32
Have you created the sphinx_counter table?

yeah... I think it has to do with 4.0.x not allowing nested SELECT queries. (subqueries).

that support didn't come until 4.1.

arn

orban
30 Oct 2006, 15:45
Oh. Try asking in the sphinx forums maybe there's a way to avoid the nested queries.

amcd
02 Nov 2006, 11:09
followed the howto and installed flawlessly in one shot

the first look is very very encouraging

will report again after few days

btw, my forum is Threads: 85,829, Posts: 3,686,297, Members: 175,810

orban
02 Nov 2006, 16:05
Glad to hear :)

mute
02 Nov 2006, 20:52
Our search is working flawlessly, and seeing ~4000 searches per day, which isn't too shabby at all.

Question for you guys, as I can't seem to find much in the way of documentation regarding the searches.

Does sphinx support "OR"? If you were to search for "test one two", it searches for all three with an implicit "AND". If you search for "blah not bleh", it will search for "blah -bleh". If you search for "test or task or mask", it will search for that literally (and likely ignore "or" as one of my stopwords)

Anyone? A couple of my more picky users are complaining, and I don't really have an answer for them, as I typically just do keyword searches.

amcd
02 Nov 2006, 21:34
can we go one step deeper on this delta thingy?

main index - rebuilt once every day or maybe twice a week
delta 1 - rebuilt once every hour or maybe 4 times a day
delta 2 - rebuilt every 5 minutes

is it possible? will having the index in 3 parts affect performance?

mute
02 Nov 2006, 22:48
can we go one step deeper on this delta thingy?

main index - rebuilt once every day or maybe twice a week
delta 1 - rebuilt once every hour or maybe 4 times a day
delta 2 - rebuilt every 5 minutes

is it possible? will having the index in 3 parts affect performance?

There really isn't a point in doing so. I run my delta updates every 5 minutes, and it takes ~1 second, and I get about 450 new posts per minute..

amcd
03 Nov 2006, 06:43
There really isn't a point in doing so. I run my delta updates every 5 minutes, and it takes ~1 second, and I get about 450 new posts per minute..
oh. then i suppose the current 2 level system is fine.

how often do you rebuild the main index? i have set it up for once a day.

ALanJay
03 Nov 2006, 07:12
how often do you rebuild the main index? i have set it up for once a day.

I rebuild mine once a day at the slowest part of the night, which seems to be fine. The delta is run every 5 minutes and takes a few seconds to create. We have between 10,000 and 40,000 new posts a day.

mute
03 Nov 2006, 15:28
oh. then i suppose the current 2 level system is fine.

how often do you rebuild the main index? i have set it up for once a day.

I personally have no plans on rebuilding my main index on a regular basis. Given the nature of the way this delta update stuff works, there is really no penalty to letting your delta updates grow in size, so I don't plan on rebuilding my index until sometime in the future that we have a maintenance window or something like that.
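
To make the main/delta split concrete, here is a sketch of the idea as it appears in the configs in this thread: the counter table stores the highest post ID at the time of the last main build, the main index covers everything up to that ID, and the delta covers everything after it. The function names and the sample IDs are made up for illustration.

```php
<?php
// Which index holds a given post, given the max_doc_id stored at the last main build.
function index_for_post($postid, $maxDocIdAtMainBuild)
{
    return ($postid <= $maxDocIdAtMainBuild) ? 'main' : 'delta';
}

// Upper bound on how many posts the delta has to re-index on each run
// (an upper bound because post IDs can have gaps).
function delta_span($currentMaxPostid, $maxDocIdAtMainBuild)
{
    return max(0, $currentMaxPostid - $maxDocIdAtMainBuild);
}

$maxAtBuild = 1000000;                            // made-up counter value
echo index_for_post(999999, $maxAtBuild), "\n";   // main
echo index_for_post(1000001, $maxAtBuild), "\n";  // delta
echo delta_span(1030000, $maxAtBuild), "\n";      // 30000
```

The delta run re-indexes the whole span every time, so the span (and with it the delta run time) keeps growing until the main index is rebuilt and the counter is reset.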

mute
05 Nov 2006, 17:29
Our search is working flawlessly, and seeing ~4000 searches per day, which isn't too shabby at all.

Question for you guys, as I can't seem to find much in the way of documentation regarding the searches.

Does sphinx support "OR"? If you were to search for "test one two", it searches for all three with an implicit "AND". If you search for "blah not bleh", it will search for "blah -bleh". If you search for "test or task or mask", it will search for that literally (and likely ignore "or" as one of my stopwords)

Anyone? A couple of my more picky users are complaining, and I don't really have an answer for them, as I typically just do keyword searches.

So.. anyone smarter than I am have an answer to this? :)

orban
05 Nov 2006, 17:33
Ask in the sphinx forum maybe....:O

mute
05 Nov 2006, 17:37
I think that | is OR and & is AND, but I haven't tested it just yet. I hope they make it a tad more user friendly in the 0.9.7 release, my users aren't all that tech savvy :)

ALanJay
05 Nov 2006, 18:01
As orban says, there has been discussion of these kinds of things on the sphinx forum http://www.sphinxsearch.com/forum/

mute
06 Nov 2006, 04:28
Right, that's where I found it. I'll probably end up doing search and replace additions to my search page to replace natural language operators with their character representation.
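
A minimal sketch of what such a search-and-replace could look like. translate_operators() is a hypothetical helper, and it takes the | and & operators on trust from the post above (they were untested there); whether Sphinx honors them may also depend on the match mode, so check the documentation for your version.

```php
<?php
// Hypothetical helper: rewrite natural-language operators into operator characters.
function translate_operators($query)
{
    // "not word" -> "-word" (only when "not" stands alone as a word)
    $query = preg_replace('/\bnot\s+(\S+)/i', '-$1', $query);
    // " or " -> " | "
    $query = preg_replace('/\s+or\s+/i', ' | ', $query);
    // " and " -> " & "
    $query = preg_replace('/\s+and\s+/i', ' & ', $query);
    return trim($query);
}

echo translate_operators('test or task or mask'), "\n"; // test | task | mask
echo translate_operators('blah not bleh'), "\n";        // blah -bleh
echo translate_operators('knot tying'), "\n";           // knot tying (unchanged)
```

The replacement has to run before stopwords are applied, otherwise "or" and "and" would already be gone by the time the query reaches the search.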

ALanJay
06 Nov 2006, 06:56
A quick question: has anyone successfully configured their search to work with 2 letter words?

I have set my system for max length of 2 but I still only seem to be able to find 3 letter words.

amcd
06 Nov 2006, 07:00
A quick question: has anyone successfully configured their search to work with 2 letter words?

I have set my system for max length of 2 but I still only seem to be able to find 3 letter words.
first tell me how to configure that then i will tell u my results :)

mute
06 Nov 2006, 15:46
first tell me how to configure that then i will tell u my results :)

min_word_len = 2

Rebuild your indexes :)

ALanJay
06 Nov 2006, 15:52
I discovered that when you change the word length (or the stopwords file) you have to fully stop "searchd" and restart it for the changes to be taken into account.

Once searchd was restarted it behaved as expected.

Another query / thought. I have been so impressed by Sphinx that over the last couple of days we have implemented a search of our non-forum content using it. It works well. But I then thought it might be nice to create a simple search for the forums.

If anyone is interested I have some very basic code to do this. It started with the code from test.php, and once a valid result is found it does a search in the forum thread database for the thread $docinfo[group2]

$article_query = "SELECT title, threadid FROM ???_forum.thread WHERE threadid='$docinfo[group2]'";

From there you can select the article thread and create a simple output page.

If anyone is interested in more details, shout and I might clean up the code so it can be looked at by all you pros :)
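
One small hardening for a lookup like the one above: cast the ID to an integer before it goes into the SQL string, so whatever comes back in the result set can never inject anything. This is just a sketch; build_thread_query() and its $threadTable parameter are hypothetical, and the table name is passed in because the real database prefix is elided above.

```php
<?php
// Hypothetical helper: build the thread lookup with the ID forced to an integer.
// $threadTable is whatever your thread table is called (prefix not known here).
function build_thread_query($threadTable, $threadid)
{
    return 'SELECT title, threadid FROM ' . $threadTable
         . ' WHERE threadid = ' . intval($threadid);
}

echo build_thread_query('thread', '42'), "\n";
// SELECT title, threadid FROM thread WHERE threadid = 42

echo build_thread_query('thread', '42; DROP TABLE thread'), "\n";
// SELECT title, threadid FROM thread WHERE threadid = 42
```

The table name itself is not escaped here, so it should only ever come from your own configuration, never from user input.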

DaiTengu
08 Nov 2006, 18:52
It's weird though that it doesn't even get set...should at least be an empty array. And why is it even a separate global and not $vbulletin->coventry?

Just remove the line if you aren't using coventry then.

Such a mess ;)

I'm also having this problem, and the forum I'm using it on _is_ using Coventry. The results are returned & everything, but I'm assuming they'll search posts & return results of posts of users that are in Coventry if I turn this off.

Other than that, I'm completely amazed at how fast search is running :)

orban
08 Nov 2006, 19:09
Are you guys using vB 3.6?

DaiTengu
08 Nov 2006, 20:01
Are you guys using vB 3.6?

It's a very clean 3.6.1 install on a very busy forum. There's no hacks or anything else installed.

amcd
09 Nov 2006, 02:55
i dropped 3 fulltext indexes

title and title_2 on post
title on thread

hope i did the correct thing

what are the postindex and posthash tables for? i also have 3 tables called postindex_temp31480 and similar. what are those?

Neil Lock
12 Nov 2006, 13:14
Would love a little bit of help, if available?

I have tried getting this to work on one of our test forums, which currently has about 1.5mil posts and is running vbul 3.5.4. First of all, I'm not sure whether my conf is correct: when I run the indexer, is it supposed to say

skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes, is that error supposed to be there?

secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents

and there are no queries on my searchd.log

any starting point suggestions- i guess the searchd isnt being queried however when i switch it off the echo alerts me that searchd is not running? is one of the issues that im using 3.5?



Thanks
Nelly

orban
12 Nov 2006, 13:20
skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes, is that error supposed to be there?

Yes

secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents

and there are no queries on my searchd.log

any starting point suggestions- i guess the searchd isnt being queried however when i switch it off the echo alerts me that searchd is not running? is one of the issues that im using 3.5?

Are you starting the searchd with --config path too?

Neil Lock
12 Nov 2006, 13:30
Wow, thanks for the really quick reply, yup am starting with the conf file

and the only lines in the log read:
[Sun Nov 12 14:29:38 2006] [24295] creating server socket on 0.0.0.0:3312
[Sun Nov 12 14:29:38 2006] [24295] accepting connections


Nelly

orban
12 Nov 2006, 13:32
There's two log files.

searchd.log and query.log

What does query.log say?

Neil Lock
12 Nov 2006, 13:35
ahh didnt notice that there was a query log

ok so it must be hitting it
[Sun Nov 12 14:34:13 2006] 0.009 sec [all/1/attr- 0 (0,500)] [vbpostindex] test qry

hmmmm will go back to the search.php - is it likely to be something to do with using 3.5.4 and not 3.6?

Nelly

mute
12 Nov 2006, 13:36
Would love a little bit of help, if available?

skipping index 'vbfulltext' (distributed indexes can not be directly indexed)...
skipping index 'vbfulltextthread' (distributed indexes can not be directly indexed)...
those are the last 2 results - the distributed indexes, is that error supposed to be there?


This is the correct behavior.


secondly when i run a search --config path qry it appears to work and give back results however upon turning on the searchd i dont seem to get any results the echos on my search.php
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'test' found 0 times in 0 documents


Check your settings in sphinx.php, make sure your searchd command line has the proper --config, and.. upgrade to 3.6, as I don't believe you'll have much luck with Orbans search files w/ 3.5, as they were designed for 3.6.

Jeez, there were like 4 replies while I was replying :)

orban
12 Nov 2006, 13:36
I don't really know... :(

you can try to work with the files in the /api/ folder of the archive you downloaded, and try to get that one work.

mute: !! :D

Neil Lock
12 Nov 2006, 13:58
Thanx guys, will take a look at the api stuff but on initial inspection get error messages such as
Query failed: searchd error: index 'vbthreadindex': incompatible schemas: non-virtual attributes count mismatch: 4 in schema '/var/data/vbthreadindex', 5 in schema '/var/data/vbpost'.

if this means anything that can be relayed then please mention otherwise gonna spend the afternoon reading and playing...

as for 3.6 we hope to upgrade soon, but i havent ascertained what they have done to this upgrade (ie how much more 'beef' will our front ends need for this version!)


cheers

mute
12 Nov 2006, 14:00
If i were you i'd verify that your sphinx.conf matches the example.

We upgraded to 3.6.2 about 2 weeks ago and it went pretty well, as far as our front end web traffic goes, I don't really notice a difference in terms of load :)

orban
12 Nov 2006, 14:01
Are you combining wrong indexes together?

Neil Lock
12 Nov 2006, 14:03
If i were you i'd verify that your sphinx.conf matches the example.

We upgraded to 3.6.2 about 2 weeks ago and it went pretty well, as far as our front end web traffic goes, I don't really notice a difference in terms of load :)

Well, that's good news - the upgrade from 3 to 3.5 really hit us hard! We had our db box well tuned for 3 and then they up and moved the load to the front ends... very annoying.

ALanJay
12 Nov 2006, 15:35
Neil,

The other thing to do is use the test.php script to check that you can search the indexes correctly. If that works, then you know you need to tweak the search.php code. It is possible, as I have tweaked it all the way back to 3.0.x :) Though I doubt there will be much to change, as the major changes occurred in the move from 3.0 to 3.5.

Neil Lock
12 Nov 2006, 15:42
Hey all, well I have been playing around with test.php with no real success, which I can only assume means that my conf file is messed up somewhere. If anyone could take a few secs to see whether I have made any glaring errors:

#
# sphinx configuration file sample
#

#############################################################################
## data source definition
#############################################################################

source src1
{
type = mysql
strip_html = 0
sql_host = ******
sql_user = vbupgradeforum
sql_pass = ******
sql_db = vb_upgrade_forums
sql_port = 3306


sql_query_pre = REPLACE INTO sph_counter SELECT 1, MAX(postid) FROM vb_post

sql_query_range = SELECT MIN(postid), MAX(postid) FROM vb_post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, vb_post.threadid as threadid, IF(vb_post.userid=0,99999999,vb_post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, vb_post.title, pagetext, vb_post.dateline \
FROM vb_post \
INNER JOIN vb_thread AS thread ON(thread.threadid = vb_post.threadid) \
WHERE vb_post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid <= ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 1 );

sql_group_column = forumid
sql_group_column = threadid
sql_group_column = userid
sql_group_column = postuserid
sql_date_column = dateline

sql_query_post =
}

source src2 : src1
{

sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 1 ), MAX(postid) FROM vb_post
sql_range_step = 1000
sql_query = \
SELECT postid, forumid, vb_post.threadid as threadid, IF(vb_post.userid=0,99999999,vb_post.userid) AS userid, IF(postuserid=0,99999999,postuserid) AS postuserid, vb_post.title, pagetext, vb_post.dateline \
FROM vb_post \
INNER JOIN vb_thread AS thread ON(thread.threadid = vb_post.threadid) \
WHERE vb_post.visible = 1 AND postid >= $start AND postid <= $end \
AND postid > ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 1 );
}


source src3
{
type = mysql
strip_html = 0
sql_host = *****
sql_user = vbupgradeforum
sql_pass = *****
sql_db = vb_upgrade_forums
sql_port = 3306


sql_query_pre = REPLACE INTO sph_counter SELECT 2, MAX(threadid) FROM vb_thread

sql_query_range = SELECT MIN(threadid), MAX(threadid) FROM vb_thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM vb_thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid <= ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 2 );

sql_group_column = forumid
sql_group_column = postuserid
sql_group_column = firstpostid
sql_date_column = lastpost

sql_query_post =
}

source src4 : src3
{
sql_query_pre =
sql_query_range = SELECT ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 2 ), MAX(threadid) FROM vb_thread
sql_range_step = 1000
sql_query = \
SELECT threadid, forumid, title, IF(postuserid=0,99999999,postuserid) AS postuserid, IF(firstpostid=0,99999999,firstpostid) as firstpostid, lastpost \
FROM vb_thread \
WHERE visible = 1 AND threadid >= $start AND threadid <= $end \
AND threadid > ( SELECT max_doc_id FROM sph_counter WHERE counter_id = 2 );
}

#############################################################################
## index definition
#############################################################################

# local index example
#
# this is an index which is stored locally in the filesystem
# all indexing-time options (such as morphology and charsets) belong to the index
index vbpost
{
source = src1
path = /var/data/vbpost
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbpostindex
{
source = src2
path = /var/data/vbpostindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindex
{
source = src3
path = /var/data/vbthreadindex
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbthreadindexdelta
{
source = src4
path = /var/data/vbthreadindexdelta
docinfo = extern
morphology = none
stopwords =
min_word_len = 4
charset_type = sbcs
}

index vbfulltext
{
type = distributed
local = vbpost
local = vbpostindex
}

index vbfulltextthread
{
type = distributed
local = vbthreadindex
local = vbthreadindexdelta
}

#############################################################################
## indexer settings
#############################################################################

indexer
{
# memory limit
# can be specified in bytes, kilobytes (mem_limit=1000K) or megabytes (mem_limit=10M)
# will grow if set unacceptably low
# will warn if set too low, hurting the performance
# optional, default is 32M
mem_limit = 64M
}

#############################################################################
## searchd settings
#############################################################################

searchd
{
# port on which search daemon will listen
port = 3312


# log file
# searchd run info is logged here
log = /var/log/searchd.log


# query log file
# all the search queries are logged here
query_log = /var/log/query.log


# client read timeout, seconds
read_timeout = 5


# maximum amount of children to fork
# useful to control server load
max_children = 30


# a file which will contain searchd process ID
# used for different external automation scripts
# MUST be present
pid_file = /var/log/searchd.pid


# maximum amount of matches this daemon would retrieve from each index
# and serve to client
#
# this parameter affects per-client memory usage slightly (16 bytes per match)
# and CPU usage in match sorting phase; so blindly raising it to 1 million
# is definitely NOT recommended
#
# default is 1000 (just like with Google)
max_matches = 1500
}

# --eof--




My tables use the prefix vb_ and I went through and modified all the table names as per the instructions. Anything glaringly obvious?

Neil

amcd
13 Nov 2006, 07:25
i have a problem

Warning: assert() [function.assert]: Assertion failed in /sphinxapi.php on line 284
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'barnacles' found 11 times in 11 documents
[several lines of unreadable binary output]

and the gibberish continues for another page or two



to reproduce this error, go to http://www.xboard.us/bbb/forumdisplay.php?f=7 and click on the 'search this forum' link. in the dropdown which opens, type 'barnacles' in the search term field and select 'show results as posts' and click go

happens with other forums and other search terms also, but not always. internet explorer and firefox show only a blank page. to see the error, use opera.

orban
13 Nov 2006, 13:25
Can you try to use the test.php and/or the "search" command tool?

amcd
13 Nov 2006, 13:41
search works 90% of the time, so it's not a total failure

search command line tool gives the following output
search -c /usr/local/etc/sphinx.conf barnacles
Sphinx 0.9.7-RC1
Copyright (c) 2001-2006, Andrew Aksyonoff

index 'postmain': query 'barnacles ': returned 11 matches of 11 total in 0.000 sec

displaying matches:
1. document=19361, weight=1, forumid=4, threadid=169, userid=10, postuserid=219, dateline=Thu Nov 4 20:03:03 2004
2. document=19396, weight=1, forumid=4, threadid=169, userid=12, postuserid=219, dateline=Thu Nov 4 21:07:51 2004
3. document=52950, weight=1, forumid=4, threadid=1239, userid=10430, postuserid=10430, dateline=Thu Dec 9 10:29:54 2004
4. document=836679, weight=1, forumid=4, threadid=13920, userid=477, postuserid=27439, dateline=Thu Jun 30 05:07:27 2005
5. document=1629825, weight=1, forumid=4, threadid=370, userid=477, postuserid=6, dateline=Wed Dec 14 02:03:22 2005
6. document=1788408, weight=1, forumid=4, threadid=36177, userid=72112, postuserid=1061, dateline=Wed Jan 18 04:48:09 2006
7. document=1925410, weight=1, forumid=4, threadid=38680, userid=477, postuserid=35714, dateline=Mon Feb 13 08:46:41 2006
8. document=2000574, weight=1, forumid=4, threadid=80294, userid=477, postuserid=1717, dateline=Sun Feb 26 12:33:41 2006
9. document=2000585, weight=1, forumid=4, threadid=80294, userid=21423, postuserid=1717, dateline=Sun Feb 26 12:35:18 2006
10. document=2921752, weight=1, forumid=10, threadid=69489, userid=477, postuserid=477, dateline=Thu Jul 13 19:51:01 2006
11. document=3371748, weight=1, forumid=13, threadid=81940, userid=103694, postuserid=103694, dateline=Sun Sep 10 14:59:51 2006

words:
1. 'barnacles': 11 documents, 11 hits

index 'postdelta': query 'barnacles ': returned 0 matches of 0 total in 0.000 sec
index 'threadmain': query 'barnacles ': returned 0 matches of 0 total in 0.000 sec
index 'threaddelta': query 'barnacles ': returned 0 matches of 0 total in 0.000 sec

Neil Lock
15 Nov 2006, 11:44
Finally got it to work,

Now I have another question. I want to run this from a slave database, i.e. grab the query data from it, but Sphinx requires the REPLACE INTO, which obviously can't run on a slave instance. So my question is this: is it possible to hook this up to run on the master for the replaces and on the slave for the other queries? I intend to go and play, but wondered beforehand whether anyone had a solution.

Thanks

Neil

orban
15 Nov 2006, 11:48
Why can't you keep the counter table that REPLACE INTO writes to on the slave server?

Master: vB
Slave: Replicated vB + counter table + sphinx

Should work fine :O

amcd
15 Nov 2006, 11:54
orban, any suggestions for my problem?

orban
15 Nov 2006, 12:01
If you can reproduce the error with "search" I'd try the sphinx forums....or does it only happen when using sphinxapi.php?

Neil Lock
15 Nov 2006, 12:31
The slave server, as far as I know (and I'm pretty sure), has only read permissions, hence the replace will have to be run on the master (and is then obviously replicated across), so I cannot write to the slave db at all.

Neil

kmike
15 Nov 2006, 14:46
Nothing can prevent you from creating a new table in the replicated db on the slave - granted your db user has the create/update privileges on that db. So your assertion that the slave is read-only isn't true.

(Actually nothing prevents you from messing with replicated tables on the slave, too - which obviously will break the replication integrity)

amcd
16 Nov 2006, 09:53
If you can reproduce the error with "search" I'd try the sphinx forums....or does it only happen when using sphinxapi.php?
as far as i can tell, it happens only when using the API

Warning: assert() [function.assert]: Assertion failed in /sphinxapi.php on line 284
Query '' retrieved 0 of 0 matches in 0.000 sec.
Query stats:
'barnacles' found 11 times in 11 documents

it does say that it got 11 results, so probably the error is not in the searchd portion

looks like i may have solved the problem by using the array_walk and intval approach as discussed by orban and alanjay. strange thing is that my forums are running vb 3.6.1, not 3.0.x for which the discussion was originally intended.

mute
20 Nov 2006, 23:06
Hm. We're running into that "search results are out of order" bug again, even if a user has the sort set to date rather than relevance.

Has anyone else using this run into it?

ALanJay
21 Nov 2006, 07:29
looks like i may have solved the problem by using the array_walk and intval approach as discussed by orban and alanjay. strange thing is that my forums are running vb 3.6.1, not 3.0.x for which the discussion was originally intended.

I viewed it as a bit of a belt and braces issue :)

The values are numerals, but sometimes they seem to be stored as strings. No idea why, but array_walk and intval seem to resolve the issue simply enough.
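A minimal sketch of that array_walk/intval approach, assuming the IDs arrive as an array of numeric strings. The variable name $forumids and the helper force_int are hypothetical and not taken from orban's search.php. The point is that is_int('4') is false in PHP, so an integer check on such values fails until they are cast; whether the assertion on line 284 of sphinxapi.php is exactly such a check is an assumption here.

```php
<?php
// Hypothetical input: IDs that look numeric but are held as strings.
$forumids = array('4', '10', '13');

// Callback for array_walk: takes the element by reference and casts it in place.
function force_int(&$value, $key)
{
    $value = intval($value);
}

// After this call every element is a real integer, so is_int() checks pass.
array_walk($forumids, 'force_int');

var_dump($forumids);
```

The same cast can be written as $forumids = array_map('intval', $forumids); if you prefer getting a new array back instead of modifying in place.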

By the way, for anyone interested, I have now used Sphinx to create a search interface to both our news and forums databases outside of vBulletin. The next step is to implement the sort order (which just needs to be made pretty).

You can see the tool at http://www.digitalspy.co.uk/search/ds-search.php

DaiTengu
25 Nov 2006, 03:19
Nothing can prevent you from creating a new table in the replicated db on the slave - granted your db user has the create/update privileges on that db. So your assertion that the slave is read-only isn't true.

(Actually nothing prevents you from messing with replicated tables on the slave, too - which obviously will break the replication integrity)

Sure there is, you can put a read-only option in my.cnf to prevent users from writing data to the database (except for the replication user & the root user)


Anyway, I am also running into the results out of order bug. I haven't changed anything, and it just seemed to start cropping up one day.

mute
25 Nov 2006, 04:03
Yeah, my users are complaining about the out of order results, but I haven't had the time lately to delve into it. I swear at one point it was working, but now.. not so much.

ALanJay
25 Nov 2006, 08:24
Finally got it to work,

Now I have another question. I want to run this from a slave database, i.e. grab the query data from it, but Sphinx requires the REPLACE INTO, which obviously can't run on a slave instance. So my question is this: is it possible to hook this up to run on the master for the replaces and on the slave for the other queries? I intend to go and play, but wondered beforehand whether anyone had a solution.

Thanks

Neil

Can I ask why?

searchd can be on any computer, and the database it looks into can be on any other one (that it can see). You obviously have to configure the front end to look at searchd on the correct computer, and change the localhost references to the IP address of the machine that has searchd running on it.

But the load from indexing isn't that great. The way orban has implemented it, with a main index and deltas, means that even on a large board with lots of posts (we get from 10,000 to 40,000 a day), rebuilding the full index once a day at a quiet period will not put a load on the database; in our case, with nearly 12 million posts, it takes under 5 minutes. Creating the delta index, which I run every 5 minutes, takes just a few seconds.

Sphinx's overhead when indexing seems very small (as far as I can tell) on the mySQL database so I don't see the need to complicate things.

In my setup:

HTML front ends (x8 - 10.10.10.11 to 10.10.10.18) sphinx.php points to 10.10.10.19

10.10.10.19 - searchd when index created looks at 10.10.10.1

10.10.10.1 - mySQL master database

As I understand it all Sphinx leaves in the database is a marker to say where the line between the main and delta database is.

Good luck Neil :)

orban
25 Nov 2006, 10:24
Yeah, my users are complaining about the out of order results, but I haven't had the time lately to delve into it. I swear at one point it was working, but now.. not so much.

That's really weird :(

I never had this problem.

amcd
25 Nov 2006, 10:32
That's really weird :(

I never had this problem.
i have the same problem, though no one has complained yet

orban
25 Nov 2006, 10:36
Can you reproduce this with "search" on the same input?

Might be worth asking in the sphinx forums.

DaiTengu
25 Nov 2006, 10:39
For curiosity's sake, can I get rid of any of my indexes on the post table now? The table crashes periodically, and with the fulltext index it takes almost an hour to repair.

Neil Lock
25 Nov 2006, 11:14
hey,

Cheers for the help. Probably against our server people's wishes, I indexed from the master database (our master db is fairly heavily loaded, and we were trying to avoid adding anything new which might "tip it over"). So I now have an indexed db and the searchd daemon running; all I need to do now is play around and write some scripts to manipulate the data searchd returns. My first job is to build a standalone search which can be used for testing before it goes live on our forums. I am actually quite excited about this product and looking forward to using it. Cheers guys. Will keep you posted on progress.

Neil

DaiTengu
01 Dec 2006, 21:55
has anyone managed to fix the out-of-order results on their forum, yet?

kmike
02 Dec 2006, 07:32
I'm taking a wild guess here, but maybe the returned results are sorted by relevance, that's why they seem out of order? (which order btw? date posted?)

DaiTengu
02 Dec 2006, 08:02
Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

ALanJay
02 Dec 2006, 08:14
Yeah, they're supposed to be sorted by date. Apparently I'm not the only person having the problem.

When I wrote a search against another (non-forum) database and added sort by date, I discovered that the place where the sort mode is set matters.

ie

////////////
// do query
////////////
$cl = new SphinxClient ();
$cl->SetServer ( $sphinx_server, $sphinx_port );
$cl->SetWeights ( array ( 100, 1 ) );
// Number of results to display //
$cl->SetLimits ( intval(0), intval($limit) );
// $cl->SetMatchMode ( $any ? SPH_MATCH_ANY : SPH_MATCH_ALL );
$cl->SetMatchMode ( $sp_srch );
$cl->SetSortMode ( $sp_sort );
$cl->SetGroups ( $groups );
$cl->SetGroups2 ( $groups2 );
$cl->SetGroups3 ( $groups3 );
$cl->SetGroups4 ( $groups4 );
$cl->SetGroups5 ( $groups5 );
$res = $cl->Query ( $q, $index );

This works for me, but putting the "SetSortMode" call below SetGroups5 didn't work; not sure why :) It might be worth checking where it appears in your code.
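If the matches still come back out of date order, one stopgap is to re-sort the returned page of matches in PHP before displaying them. This is only a sketch under assumptions: it assumes the result array has matches keyed by document ID, each with an 'attrs' array holding 'dateline', and the sample data and the helper name cmp_dateline_desc are made up for illustration. It only reorders the matches already returned, so it is no substitute for getting the sort mode applied on the searchd side.

```php
<?php
// Made-up result in the assumed shape: matches keyed by document ID.
$res = array(
    'matches' => array(
        19361   => array('weight' => 1, 'attrs' => array('dateline' => 1099598583)),
        3371748 => array('weight' => 1, 'attrs' => array('dateline' => 1157900391)),
        836679  => array('weight' => 1, 'attrs' => array('dateline' => 1120108047)),
    ),
);

// Comparison callback: the match with the newer dateline sorts first.
function cmp_dateline_desc($a, $b)
{
    if ($a['attrs']['dateline'] == $b['attrs']['dateline']) {
        return 0;
    }
    return ($a['attrs']['dateline'] > $b['attrs']['dateline']) ? -1 : 1;
}

// uasort reorders the matches while keeping the document IDs as keys.
uasort($res['matches'], 'cmp_dateline_desc');

// Document IDs in newest-first order.
$sorted_ids = array_keys($res['matches']);
print_r($sorted_ids);
```

uasort is used rather than usort because the document IDs are the array keys, and usort would throw them away.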