PDA

View Full Version : Tips on Impex Import


Swordy
11 May 2015, 15:59
Hello,

I have a phpBB3 board with ~80000 users and about 9 million posts and am looking at moving over to vBulletin.

We've bought the software and am currently running a few testing installs to see how it works out, but we're finding the database conversion in ImpEx to be terribly slow - literally taking days and forecasting months, and am looking at possibly ways of speeding it up.

Both the source and target database are on Amazon RDS instances (and neither are live databases). We're running PHP5.5 on vB 4.2.3Beta on the EC2 instance with the latest ImpEx download from vbZachary's Git.

We can essentially have as much RAM and processing power as we like so an looking for advice on what the key fields to change are to maybe help it along? We've got PHP running at about 1GB limit at the moment, what else can we bump up? I've followed the advice in the manual regarding the indexes on posts and turning off dupe checking but both were already done by default.

The idea is to get the conversion process of entire database down to under 6-8 hours if possible which doesn't seem too lofty of a goal.

What else can we do?

kh99
11 May 2015, 17:02
So are you saying you basically have two remote databases with impex running on a local machine? Is there any way to export the phpbb3 database and import it to a local machine, do the convertion locally, then do the reverse?

BTW, I don't know what Amazon RDS or EC2 are, so it's possible that I don't understand the situation.

Swordy
11 May 2015, 17:26
Yes we can put it on a local machine but then obviously the resources are going to be shared and we see the resource tuning as the problem.

RDS is an Amazon hosted database as part of their Amazon Web Services. EC2 is essentially the same - a cloud server you can spin up or down based on traffic.

To be honest I was looking more for any ideas on possible variables to tune inside ImpEx/MySQL/PHP that are given out to people attempt to move around big boards as we're struggling to find much current and relevant information on ImpEx and how people are troubleshooting it.

kh99
11 May 2015, 17:45
Yeah, ok. It sounded to me like you were just trying to do a transfer from phpbb3 to vbulletin and get it done faster. I don't know anything about tuning parameters, but hopefully someone else will. Zachery still comes around once in a while.

Zachery
12 May 2015, 03:10
ImpEx really doesn't do a lot of processing in php, (though timeouts, etc can be a problem). Most of the grunt work is done in MySQL.

Do you have access to throw massive amounts of memory at the problem in MySQL? If so you can greatly decrease the time it takes to run the import. You might also want to get both databases locally to each other, or as local as possible.

You're going to be moving that data from the phpbb3 database into the vBulletin database. Network overhead is not your friend here.

Swordy
12 May 2015, 17:52
Yes I can throw massive amounts of memory at MySQL. Is there anything specific that I can change in the MySQL options that will speed it up? I saw a vague mention of buffer_pool_size though the thread was years old and didn't contain much information.

I could potentially run two databases on the same machine, but what balance should I give to MySQL and PHP?

I was under the impression that the PHP memory limit was the largest performance gain.

Server wise we could potentially throw 350GB of RAM and 30 vCPUs at it if needed, it's more trying to get that conversion time down to a few hours at most where the important thing is. The users and threads seem to convert pretty sharpish - 15 minutes for the users and maybe half an hour for ~ 1 million threads - it's the posts that seemingly take forever and ever. Last run was for 2 hours and a SELECT count(*) FROM posts in the target had about 350,000 done.

Brandon Sheley
12 May 2015, 18:44
You should be able to transfer more then 350k post in 2 hours. I've transferred well over a million post board from phpbb to vb and it's never taken days just for the transfer part.
Does the forum have a ton of attachments, I assume they would be taking most of the time to be honest depending on how they are stored.. DB or file system.
Also once you've done the import you'll need to "update the counters" The search one is going to take several hours on it's own with 9 million post to scan.

I can't remember if I've done this with an amazon server or not, I know I've ran this on your basic mysql/phpmyadmin setup numerous times.

BTW.. make sure you've grabbed the latest impex on github.. I know the one vb provided has some issues with a few platforms.

Swordy
12 May 2015, 19:08
The forum has no attachments in it and I am running vbZachary's Impex as of yesterday's code.

I am going to try doing the import as locally as possible, using the same MySQL and transfer across two databases to remove any network overhead from the equation. The target DB has always been a remote database so it could be a potential issue for speed.

Have PHP using about 5GB inside its memory_limit and trying 500,000 posts each tick.

Swordy
13 May 2015, 20:18
Still struggling with this one chaps. Seems that everything works pretty quickly with the exception of the posts table that takes forever and times out semi-regularly. This is on a PHP with a 5GB memory limit, a machine with 30GB and 12 vCPUs that are compute optimised.

During the user/thread imports I see a 70/30 CPU usage split between MySQL and Apache. However when the posts go into import then Apache varies between 5 and 100% whilst MySQL sits pretty happily at 10-20% for a bit until both drop to about 0-1% CPU. Memory usage for both is almost non-existent. This is using a 250,000 post import 'tick' and looking at the MySQL variables they are pretty well within their limits. 0% temp tables on disk, 40% key buffer usage, etc.

Both DB on a local MySQL setup rather than remote.

Again getting this import down in time scale is the most important thing and we're willing to get stick stupid resources behind it if we can get the import time down.

Zachery
15 May 2015, 02:29
I responded to your pm, but what does your IOwait look like while doing the imports?

I remember amazon having some terrible IO issues in the past. I don't know if its in your price ranges but Azure has some wicked fast SSD instances.

Swordy
16 May 2015, 10:14
Had a look at the IO using iotop

I have a process called jbd2/xvda1-8 that is battering it, claiming 60-90% leaving mysql only really writing at 80-90kb/s

We've got Apache2 logging at emerg only and all of the log file sizes seem rational.

This seems to be the most realistic lead for us so far and I'll be looking into it further. Will update if anything comes up but it may just require us to move it out of the Amazon chain if this is something on their end.

Thanks for all of your continued help guys, really appreciate it.

Swordy
19 May 2015, 15:03
We're having a bit more luck with this and just wanted to let people know for any future Amazon EC2 searchers. The jbd2/xvda1-8 was the journaling process pointing at the SSD

The issue was definitely the IO failure on a normal EC2 machine. We've solved this by using a machine with Provisioned IOPS that will be used just for the transfer process and the absolutely massive improvement came from pre-warming the disk:

dd if=/dev/zero of=initialize bs=1024M count=50

That sets up a 50G file in the local directory that should be immediately deleted afterwards. Then run the import and it goes for us from a ~40-50 hour import time to about a ~9 hour transfer time.

Cheers for the heads up Zachary.

Zachery
19 May 2015, 23:46
Glad you got it done :)