  #751  
Old 12 Jan 2010, 17:33
amcd amcd is offline
 
Join Date: Oct 2004
16.5 mil posts
488k threads
vb 3.6

no plans to move to vb 4 until everyone else does it, too.

boolean and phrase search are needed. been missing them.

spending for a search solution - no problem. spending 2k - no way.

I would really love an updated version that runs on 3.6 that takes advantage of all the Sphinx goodies, since I don't see us moving to 4.0 for close to a year. All of the custom code we've written has to be ported and tested, and being the lone admin on a site this big has my hands full a lot of the time.
__________________
eXBii.com - Indian community
no XB no fun know XB know fun !
  #752  
Old 13 Jan 2010, 21:14
eoc_Jason's Avatar
eoc_Jason eoc_Jason is offline
 
Join Date: Dec 2001
Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts added when you run your delta cron job (most run every 2-5 min)...
2. Changes in # views, last poster, deleted threads / posts, etc should be real time updates.
3. Edits to the title or post text will not be updated until next full re-index (usually nightly) unless it is within the delta file.

Will have boolean searching, phrase, etc...
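
A rough illustration of the main + delta split behind constraints 1 and 3, assuming the usual scheme where the main index covers every post id up to the last full re-index and the delta covers anything newer. This is plain Python, not Sphinx code; `Post`, `split_main_delta` and `search` are made-up names for the sketch.

```python
from dataclasses import dataclass


@dataclass
class Post:
    postid: int
    text: str


def split_main_delta(posts, max_main_id):
    """Return (main, delta): main holds ids <= max_main_id, delta the rest."""
    main = [p for p in posts if p.postid <= max_main_id]
    delta = [p for p in posts if p.postid > max_main_id]
    return main, delta


def search(indexes, keyword):
    """Search every index and return the matching post ids in ascending order."""
    hits = set()
    for index_posts in indexes:
        for p in index_posts:
            if keyword.lower() in p.text.lower():
                hits.add(p.postid)
    return sorted(hits)


posts = [
    Post(1, "sphinx install notes"),
    Post(2, "mysql tuning"),
    Post(3, "sphinx delta cron"),
]
# Assumption: the last full re-index stopped at post id 2.
main, delta = split_main_delta(posts, max_main_id=2)
print(search([main, delta], "sphinx"))  # [1, 3]
```

Post 3 is only findable because the delta was rebuilt after it was written, which is why the delta cron interval sets how stale new posts can be.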
__________________
My Site: EXTREME Overclocking

Do not PM me with your iTrader problems or asking for the code. I will just delete your PM without reading it.
  #753  
Old 13 Jan 2010, 22:26
mute mute is offline
 
Join Date: Dec 2002
Originally Posted by eoc_Jason View Post
Okay, I've been working slowly but surely... Here are the constraints so far:

1. New threads/posts added when you run your delta cron job (most run every 2-5 min)...
2. Changes in # views, last poster, deleted threads / posts, etc should be real time updates.
3. Edits to the title or post text will not be updated until next full re-index (usually nightly) unless it is within the delta file.

Will have boolean searching, phrase, etc...
One thing I'd like that we don't currently have is properly ordered search results. If you don't do a full reindex on a regular basis, they tend to get really out of order.
  #754  
Old 15 Jan 2010, 11:47
kmike kmike is offline
 
Join Date: Oct 2002
Apart from using Sphinx to search for similar threads, you can also use it to generate the post excerpts with search keywords highlighted when in the "Show search results as posts" mode.
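
To show what a highlighted excerpt looks like, here is a naive stand-in written in plain Python. It is not the Sphinx excerpt call and ignores stemming and word boundaries; the function name, the `<b>` markup and the `around` window size are all assumptions made for the sketch.

```python
import re


def highlight_excerpt(text, keywords, around=30):
    """Return a snippet around the first keyword hit, with hits wrapped in <b>."""
    if not keywords:
        return text[: 2 * around]
    pattern = re.compile("|".join(re.escape(k) for k in keywords), re.IGNORECASE)
    first = pattern.search(text)
    if first is None:
        # No hit: fall back to the start of the post.
        return text[: 2 * around]
    start = max(first.start() - around, 0)
    end = min(first.end() + around, len(text))
    snippet = text[start:end]
    return pattern.sub(lambda m: f"<b>{m.group(0)}</b>", snippet)


excerpt = highlight_excerpt(
    "Run the delta index from cron every few minutes.", ["delta", "cron"]
)
print(excerpt)
```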

Our stats: almost 14 mln posts, 1.1 mln threads, 300k users, vB 3.8.
We're using our own Sphinx implementation since it predates the hack in this thread.

We got rid of the obscure search and sort modes though (such as sorting by the number of views or replies), and there was not a single complaint from our members. I don't think you should focus too much on 100% compliance with the default search. Having too many document attributes will inflate the index size, resulting in more I/O and more sluggish performance.
If you are worried about the need to edit the default search form template, you could always clone it, make the necessary changes and ship it with the product.
  #755  
Old 15 Jan 2010, 19:26
eoc_Jason's Avatar
eoc_Jason eoc_Jason is offline
 
Join Date: Dec 2001
Thanks for the feedback guys. Another thing I'm pondering: instead of trying to work off just a main + delta index, break the total post count up and constantly rotate smaller indexes...

E.g., if a site has 10,000,000 posts, have 10 indexes, each with 1,000,000 posts. Then have each of the indexes rotate, say, hourly. This would be a shift from the typical one massive re-index nightly (or however often you do it). In theory, too, the last index would contain the most recent posts and could be re-indexed more often.

I dunno, that's just a thought... My concern right now is the core code for searching; the indexes themselves can be manipulated differently at a later time, as that is transparent to everything else.
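
The arithmetic for that split can be sketched as below, assuming the shards are cut by post id range. `shard_ranges` is a hypothetical helper, and since deleted posts leave gaps in the ids, each range holds at most `shard_size` posts rather than exactly that many.

```python
def shard_ranges(max_post_id, shard_size):
    """Split ids 1..max_post_id into consecutive inclusive (low, high) ranges."""
    ranges = []
    low = 1
    while low <= max_post_id:
        high = min(low + shard_size - 1, max_post_id)
        ranges.append((low, high))
        low = high + 1
    return ranges


ranges = shard_ranges(10_000_000, 1_000_000)
print(len(ranges))   # 10
print(ranges[0])     # (1, 1000000)
print(ranges[-1])    # (9000001, 10000000)
```

Each pair could feed a `postid BETWEEN low AND high` condition in that shard's source query, and only the last range would need the frequent re-index.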
  #756  
Old 16 Jan 2010, 06:20
kmike kmike is offline
 
Join Date: Oct 2002
Originally Posted by eoc_Jason View Post
Thanks for the feedback guys. Another thing I'm pondering: instead of trying to work off just a main + delta index, break the total post count up and constantly rotate smaller indexes...
That's what we're doing, too, though the delta is still there. The bonus is that you can set up a distributed index with the number of agents equal to the number of CPUs, as described here, to take advantage of all the CPUs in the server. However, it's more of a manual operation; it would be hard to generate a partitioned sphinx.conf automatically.
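
For reference, the distributed block itself is small enough to render from a script. The sketch below is only that piece, not a full partitioned sphinx.conf: it assumes the shard indexes (hypothetical names `post_part0`, `post_part1`, ...) are already defined elsewhere in the config, and that searchd listens on localhost port 9312 (older installs may use a different port).

```python
import os


def distributed_index_conf(name, shard_names, host="localhost", port=9312):
    """Render a sphinx.conf 'distributed' index block.

    The first shard is searched locally; the rest are listed as agents so
    searchd can query them in parallel.
    """
    lines = [f"index {name}", "{", "    type = distributed"]
    lines.append(f"    local = {shard_names[0]}")
    for shard in shard_names[1:]:
        lines.append(f"    agent = {host}:{port}:{shard}")
    lines.append("}")
    return "\n".join(lines)


# One shard per CPU, as suggested above; shard names are made up.
cpus = os.cpu_count() or 1
shards = [f"post_part{i}" for i in range(cpus)]
print(distributed_index_conf("post_dist", shards))
```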
  #757  
Old 17 Jan 2010, 22:17
eoc_Jason's Avatar
eoc_Jason eoc_Jason is offline
 
Join Date: Dec 2001
kmike - thanks for that info, I must have overlooked that in the docs...

Just curious, how much of a performance difference did you see using the distributed process?


I kind of got sidetracked today... The wife of one of my good friends just got out of the hospital, so I was there for a while today. Then I was coding some anti-spammer measures for my forum registration process...
  #758  
Old 17 Jan 2010, 22:49
mute mute is offline
 
Join Date: Dec 2002
We have 2 post indexes, one for our live post table and one for our archived post table. Each has 30 million posts. I don't see a point in sharding the post indexes aside from being able to take advantage of multiple CPUs when indexing.

The way I see it, if I can keep the old indexes online while I do a full reindex, I don't really care how long the full reindex takes since (at least in our case), the search server is just a slave database server and not our primary.
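
One way Sphinx supports this is the indexer's `--rotate` option, which builds the new index files alongside the live ones and then asks searchd to swap them in. The sketch below only assembles such a command line; whether this matches the setup described above is an assumption, and the config path is a placeholder.

```python
def rotate_command(config_path, index_names=None):
    """Build an 'indexer' command line that rebuilds indexes with --rotate."""
    cmd = ["indexer", "--config", config_path, "--rotate"]
    if index_names:
        # Rebuild only the named indexes.
        cmd.extend(index_names)
    else:
        # No names given: rebuild everything in the config.
        cmd.append("--all")
    return cmd


# Placeholder path; pass the list to subprocess.run() to actually execute it.
print(" ".join(rotate_command("/path/to/sphinx.conf")))
```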
  #759  
Old 18 Jan 2010, 12:20
Kevlar's Avatar
Kevlar Kevlar is offline
 
Join Date: Nov 2001
The only thing I am waiting on before converting to vB4 is sphinx (or a working search alternative). The rest of the little stuff I modded I can do with or without until those developers get upgrades.

1.3 million threads
18 million posts
__________________
KEVLAR
www.bimmerforums.com
  #760  
Old 18 Jan 2010, 13:13
kris kris is offline
 
Join Date: Nov 2001
mute, can you share how you archived the post table? What changes did you make in the code and in MySQL? I want to move my old posts to a separate post_archive table, but I am not sure how to join those tables from the vBulletin code.

eoc_Jason
my forum is 200k threads and 10mil posts, vb 3.8.4. I have only one database (no slave), nginx webserver, Core I7 with 12GB RAM.

I installed Sphinx on the server, and from SSH it works great, but from the modded search.php it behaves very strangely. Sometimes when I search for keywords with the "show results as posts" option, it returns a "no results" message, but if I change the option to "show results as threads" with the same keywords, I get a good number of results shown as threads.

Searching a user's posts does not work at all: search.php?do=finduser&u=xxx always gives a blank screen, with no PHP errors in the log or anywhere else.
  #761  
Old 18 Jan 2010, 19:21
amcd amcd is offline
 
Join Date: Oct 2004
my forum is 200k threads and 10mil posts, vb 3.8.4. I have only one database (no slave), nginx webserver, Core I7 with 12GB RAM
It is time for you to move to dual servers - one for webserver/PHP and another for MySQL.
  #762  
Old 18 Jan 2010, 22:06
kris kris is offline
 
Join Date: Nov 2001
It is time for you to move to dual servers - one for webserver/PHP and another for MySQL.
No money coming in here, only spending.

I think splitting the big post table into smaller read-only archive tables will be a cheaper and even better solution, and of course Sphinx for search.
  #763  
Old 19 Jan 2010, 02:01
mute mute is offline
 
Join Date: Dec 2002
Originally Posted by kris View Post
mute, can you share how you archived the post table? What changes did you make in the code and in MySQL? I want to move my old posts to a separate post_archive table, but I am not sure how to join those tables from the vBulletin code.
It's REALLY nasty. I really don't think you want to do it. In fact, I'm thinking about abandoning it on our site.

Back when we wrote it, we were probably at like 25 million posts, on (if I remember right), like a dual xeon with HT. Now, we're on a Quad Quad xeon box with 16gb of ram. We have 30 million in our archived tables (10x3 mill posts each) + 30 million in our post table. I'm not seeing any slowdowns against the post table, which has me wondering if we'd be seeing any slowdowns if I was pulling against all 60 million in one table, given how much faster our CPUs have gotten and how much ram we have sitting around.
  #764  
Old 19 Jan 2010, 03:41
masons masons is offline
 
Join Date: Jan 2007
Hi,

I have had Sphinx installed for my wiki for a few days, and now I am looking to get it working with my vBulletin setup.

But my server load went a bit overboard this morning (120+) and I have no idea how to deal with that. Any tips on taking some pressure off the load before I add this to vBulletin?

Some server stats (dedicated)
Processor #1 Vendor: GenuineIntel
Processor #1 Name: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
Processor #1 speed: 1998.000 MHz
Processor #1 cache size: 6144 KB

Processor #2 Vendor: GenuineIntel
Processor #2 Name: Intel(R) Core(TM)2 Duo CPU E8300 @ 2.83GHz
Processor #2 speed: 1998.000 MHz
Processor #2 cache size: 6144 KB
  #765  
Old 19 Jan 2010, 11:25
amcd amcd is offline
 
Join Date: Oct 2004
go through this thread over at vb.com