Help needed with parsing


Saint
31 Oct 2001, 00:15
I am new to PHP and need help parsing some HTML from an external webpage (with consent from the owner)
and outputting it on the index.php of vB.

I don't need to parse the whole webpage,
just some of it.

Can anyone help?

MrLister
31 Oct 2001, 03:01
post the code here.... you'll get much more response... if the code is too big then post the problem area.

Mark Hensler
31 Oct 2001, 07:13
This will involve pattern matching. You'll need to know what's before and after the text you want.

If you want to read docs on some of the functions you might use...
eregi() (http://www.php.net/manual/en/function.eregi.php), file() (http://www.php.net/manual/en/function.file.php), preg_match() (http://www.php.net/manual/en/function.preg-match.php)
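
To make "what's before and after the text" concrete, here is a minimal sketch using preg_match(). The sample line of HTML and the variable names are made up for illustration; the real page may look different.

```php
<?php
// Made-up sample line; an assumption about what the remote markup looks like.
$html = '<TD><FONT SIZE=+1><a name="login">Login Server:</a></FONT></TD>';

// The text we want sits between '<a name="...">' and ':</a>'.
// [^"]* and [^<]* stop at the next quote / tag, so the match cannot run past it.
$label = null;
if (preg_match('|<a name="[^"]*">([^<]*):</a>|i', $html, $m)) {
    $label = $m[1]; // first parenthesised group
}

echo $label, "\n";
```

The more exactly the pattern pins down the surrounding markup, the less likely it is to pick up unrelated text.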

Saint
31 Oct 2001, 14:26
Originally posted by MrLister
post the code here.... you'll get much more response... if the code is too big then post the problem area.

The HTML code or the php code?

I know nothing about PHP, am still learning. :(

If it's the HTML code, yes I can paste it here.

Saint
31 Oct 2001, 15:16
This is the HTML from the site I want to parse

<TABLE WIDTH="70%" >
<TR>
<TD><FONT SIZE=+1><a name="patch">Patch Server:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 57h 20m 06s</TD>
</TR>
<TR>
<TD><FONT SIZE=+1><a name="login">Login Server:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 97h 49m 06s</TD>
</TR>
<TD><FONT SIZE=+1><a name="AOLLegends">AOL Legends:</a></FONT></TD>
<TD><IMG SRC="http://ultima.lightning.net/uo/img/grnball.gif" HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 1h 31m 06s&nbsp;&nbsp;<A HREF="http://ultima.lightning.net/uo/en/history/AOLLegends.html"><FONT SIZE="-2">[details]</FONT></a></TD>
</TR>


Note that I only need to parse some of it, not the whole HTML, so some stripping needs to be done.
i.e. I only need to parse the code I highlighted in red above.
The page that I'm parsing the HTML from refreshes every 60 secs.

MrLister
31 Oct 2001, 15:38
As Mark already mentioned try looking into eregi(), file(), preg_match() on php.net and i'm pretty sure there are a few scripts that do something like this... you could try and look them up at hotscripts.com and look at the source and get an idea from there.

Saint
31 Oct 2001, 15:55
Ok thanks

Mark Hensler
31 Oct 2001, 17:03
What you'll be doing is pattern matching. So, you have to know what is surrounding the text you want.

Do you only want those two pairs in red?
Or, do you want anything in this pattern:
<TR>
<TD><FONT SIZE=+1><a name="login">TEXT TEXT TEXT</a></FONT></TD>
<TD>TEXT TEXT TEXT HEIGHT=17 WIDTH=17 ALIGN=TOP> UP! for 97h 49m 06s</TD>
</TR>

When you're pattern matching, you want to be very specific.

Saint
31 Oct 2001, 21:39
I just want the 2 pairs in red.

In total there are about 14 pairs like that on the page.

But I'll output them differently on my page:
I want to replace his image file with my own image file.
But I need to know which image file is on his page at that time, because there are 2 types, a grnball.gif and a redball.gif.

I'll name mine the same too, but they will be different pictures.

Thanks for replying Mark.

Mark Hensler
31 Oct 2001, 22:12
Wait.. do you want only those 2 pairs (login server, aol legends), or all 14 pairs? (I'm looking at pairs as the text and image)

Some "Quickie Code" (untested)

// suck the remote file into a string
$remote_site = join('', file("http://remote.domain.com/index.php") );

preg_match_all(
"|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
$remote_site,$matches);

for ($i=0; $i<count($matches[3]); $i++) {
// preg_match_all() fills $matches[group][match], so index by $i
$name = $matches[3][$i];
$image = $matches[5][$i];
if (strstr($image,'grnball.gif')) {
// green ball
}
else {
// red ball
}

// do your thingy

}
functions docs: file() (http://www.php.net/manual/en/function.file.php), join() (http://www.php.net/manual/en/function.join.php), preg_match_all() (http://www.php.net/manual/en/function.preg-match-all.php), strstr() (http://www.php.net/manual/en/function.strstr.php)

Good Luck,

Saint
01 Nov 2001, 19:16
Originally posted by Mark Hensler
Wait.. do you want only those 2 pairs (login server, aol legends), or all 14 pairs? (I'm looking at pairs as the text and image)



Sorry all the 14 pairs.
Yes, pairs as in the text and image.


I'm still trying to absorb your code.
Am a newbie at this. :(


Thanks

Mark Hensler
02 Nov 2001, 03:44
Let me try to break it down for you..

// suck the remote file into a string
$remote_site = join('', file("http://remote.domain.com/index.php") );

// now, pattern match for the desired text, in this case,
// $matches[3] will contain the value of the first red block (the name-like thingy)
//$matches[5] will contain the value of the second red block (the image source)
preg_match_all(
"|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
$remote_site, $matches);

/**
* $matches now looks like this:
* $matches[3][0] = first match for the name block
* $matches[5][0] = first match for the image block
* $matches[3][1] = second match for the name block
* $matches[5][1] = second match for the image block
* etc.
*/

// loop through all the matches
for ($i=0; $i<count($matches[3]); $i++) {
// put the name/image info into more user friendly variables
$name = $matches[3][$i];
$image = $matches[5][$i];

// find out what the image source was...
if (strstr($image,'grnball.gif')) {
// the image source contains "grnball.gif",
}
else {
// the image source does not contain "grnball.gif",
// so it must be "redball.gif"
}

// do your thingy
// you might print a new table using the $name/$image from the other site
}
I hope that helps (probably not =P). If you have a specific question, those are easier to answer.
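
In case it helps to see the layout of $matches on something small, here is a sketch run against a made-up two-line sample (the sample HTML and the simplified pattern are assumptions for illustration, not the real page or the pattern above).

```php
<?php
// Made-up sample: one name/image pair per line.
$html = '<a name="patch">Patch Server:</a> <img src="grnball.gif">' . "\n"
      . '<a name="login">Login Server:</a> <img src="redball.gif">';

preg_match_all('|<a name="(.*)">(.*):</a>.*<img src="(.*)"|Ui', $html, $matches);

// Default order (PREG_PATTERN_ORDER): $matches[group][match]
$names  = $matches[2]; // one entry per match, from the 2nd parenthesised group
$images = $matches[3]; // same positions, from the 3rd group

$lines = array();
foreach ($names as $i => $name) {
    $state = (strpos($images[$i], 'grnball.gif') !== false) ? 'up' : 'down';
    $lines[] = "$name: $state";
}

print_r($lines);
```

So one call collects every pair, and the loop just walks the same index through each group.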

Saint
02 Nov 2001, 08:18
Does that mean I've got to repeat that 14 times for the pairs?

And add $matches(0) for all the pairs?

Mark Hensler
02 Nov 2001, 08:37
No, it is already looping through all the pairs. See where I said "// do your thingy"?

Try it.. just make a new file, and throw this in there....

<?php
$remote_site = join('', file("http://remote.domain.com/index.php") );

preg_match_all(
"|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
$remote_site, $matches);

for ($i=0; $i<count($matches[3]); $i++) {
$name = $matches[3][$i];
$image = $matches[5][$i];

echo "|" . $name . "|" . $image . "|";

if (strstr($image,'grnball.gif')) {
echo "the image is a green ball" . "|";
}
else {
echo "the image is a red ball" . "|";
}

echo "<br>\n";
}
?>

Saint
02 Nov 2001, 08:54
Warning: file("http://ulitma.lightning.net/uo/index.html") - Undefined error: 0 in /usr/local/www/vhosts/nettiq.com/htdocs/serverstats.php on line 2

Warning: Bad arguments to join() in /usr/local/www/vhosts/nettiq.com/htdocs/serverstats.php on line 2


I got this error when I try to run the php script.

Mark Hensler
02 Nov 2001, 16:57
That URL doesn't work for me.
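
The second warning follows from the first: file() returns false when it can't open the URL, and join() then has nothing to join. A rough sketch of checking for that before going further (fetch_page is a hypothetical helper name, and the path below is deliberately one that doesn't exist):

```php
<?php
// Hypothetical helper: returns the page as one string, or null if it could not be read.
function fetch_page($url)
{
    // @ suppresses PHP's own warning so the failure can be reported below
    $lines = @file($url);
    if ($lines === false) {
        return null;
    }
    return implode('', $lines); // join() is an alias of implode()
}

// A path that should not exist, standing in for a mistyped URL.
$page = fetch_page(__DIR__ . '/definitely-missing-page.html');
if ($page === null) {
    echo "Could not read the page; check the URL for typos.\n";
}
```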

Saint
02 Nov 2001, 17:12
my mistake
typo
http://ultima.lightning.net/uo/index.html

I corrected it and when i run the php
it gives me a blank screen.

Mark Hensler
02 Nov 2001, 20:28
try this:

<?php
echo "Yes, I'm running<BR>\n";

$remote_site = join('', file("http://remote.domain.com/index.php") );

preg_match_all(
"|<tr>(.*)<a name=\"(.*)\">(.*):</a>(.*)<IMG SRC=\"(.*)\"(.*)</tr>|Ui",
$remote_site, $matches);

echo "begining loop<BR>\n";

for ($i=0; $i<count($matches[3]); $i++) {
$name = $matches[3][$i];
$image = $matches[5][$i];

echo "|" . $name . "|" . $image . "|";

if (strstr($image,'grnball.gif')) {
echo "the image is a green ball" . "|";
}
else {
echo "the image is a red ball" . "|";
}

echo "<br>\n";
}
?>

Saint
02 Nov 2001, 20:38
trying now. :D

Saint
02 Nov 2001, 20:43
Nope.

I only get these 2 lines

Yes, I'm running
begining loop

I should replace just the

$remote_site = join('', file("http://remote.domain.com/index.php") );

to http://ultima.lightning.net/uo/index.html right?

all the code stays.

Mark Hensler
02 Nov 2001, 21:12
$remote_site = join('', file("http://remote.domain.com/index.php") );

should become

$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

Saint
03 Nov 2001, 01:42
Yup.

That's how I did it.

Only got those 2 lines

Mark Hensler
03 Nov 2001, 06:06
I have some time right now.. let me try playing with it.

Mark Hensler
03 Nov 2001, 06:25
OK.. this works for me. You can edit it to say whatever you need..

<?php
// echo "Yes, I'm running<BR>\n";

$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
"|<td>(.*)<a name=\"(.*)\">(.*):</a>(.*)</td>(.*)<td>(.*)<img src=\"(.*)\"(.*)</td>|Usi",
$remote_site, $matches);

// echo "begining loop<BR>\n";

for ($i=0; $i<count($matches[0]); $i++) {
$name = $matches[3][$i];
$image = $matches[7][$i];

// echo "|" . $name . "|" . $image . "|";

if (strstr($image,'grnball.gif')) {
// echo "the image is a green ball" . "|";
echo "$name is <font color='#00CC00'>online</font><BR>\n";
}
else {
// echo "the image is a red ball" . "|";
echo "$name is <font color='#FF0000'>offline</font><BR>\n";
}
} //END for
?>
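
One difference from the earlier attempts is the extra "s" in the modifiers ("Usi" instead of "Ui"). Without it, a dot does not match a newline, which is probably why a pattern starting at <tr> found nothing when the tags sit on separate lines. A small sketch on a made-up three-line sample:

```php
<?php
// Made-up sample: the row tag and the anchor are on different lines.
$html = "<TR>\n<TD><a name=\"login\">Login Server:</a></TD>\n</TR>";

// Same pattern twice; only the "s" (dot matches newline) modifier differs.
$without_s = preg_match_all('|<tr>(.*)<a name="(.*)">|Ui', $html, $m1);
$with_s    = preg_match_all('|<tr>(.*)<a name="(.*)">|Usi', $html, $m2);

echo "without s: $without_s match(es)\n";
echo "with s: $with_s match(es)\n";
```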

Saint
03 Nov 2001, 06:45
That works!

Thanks a lot for your time Mark!

Mark Hensler
03 Nov 2001, 07:46
With table, and images...

<?php
$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
"|<td>(.*)<a name=\"(.*)\">(.*):</a>(.*)</td>(.*)<td>(.*)<img src=\"(.*)\"(.*)</td>|Usi",
$remote_site, $matches);

echo "<table border=1 cellpadding=1 cellspacing=0>\n";

for ($i=0; $i<count($matches[0]); $i++) {
$name = $matches[3][$i];
$image = $matches[7][$i];

echo " <tr>\n";
echo " <td>\n";
echo "\t<font face='verdana, arial' size=1>";
echo $name;
echo "</font>\n";
echo " </td>\n";
echo " <td>\n";

if (strstr($image, 'grnball.gif')) {
echo "\t<img src='http://nettiq.com/images/image1.gif'>\n";
}
else {
echo "\t<img src='http://nettiq.com/images/image2.gif'>\n";
}

echo " </td>\n";
echo " </tr>\n";

} //END for

echo "</table>\n";
?>

Mark Hensler
05 Nov 2001, 16:48
new pattern:

<tr bgcolor="#ffc858">
<td><font size="+1"><a name=Catskills>Catskills:</a></font></td>
<td width="230" nowrap><img height=17 src="http://ultima.lightning.net/uo/img/grnball.gif" width=17
align=top> UP! for 6h 00m 05s</td>
<td width="170"><font size="-1"><a class=tbl href="http://ultima.lightning.net/uo/en/history/Catskills.html">details &gt;&gt;</a></font></td>
</tr>
revised version:

<?php
$remote_site = join('', file("http://ultima.lightning.net/uo/index.html") );

preg_match_all(
"|<tr.*<a name=.*>(.*):</a>.*src=\"(.*)\".*</tr>|Usi",
$remote_site, $matches);

echo "<html>\n";
echo "<body>\n";

echo "<table border=0 cellpadding=0 cellspacing=0 align=center>\n";

for ($i=0; $i<count($matches[0]); $i++) {
$name = $matches[1][$i];
$image = $matches[2][$i];

echo " <tr>\n";
echo " <td>";

if (strstr($image, 'grnball.gif')) {
echo "<img src='http://nettiq.com/images/image1.gif'>";
}
else {
echo "<img src='http://nettiq.com/images/image2.gif'>";
}

echo "</td>\n";
echo " <td>\n";
echo "\t<font face='verdana,arial,helvetica' size='1'>&nbsp;";
echo $name;
echo "</font><BR>\n";
echo " </td>\n";
echo " </tr>\n";

} //END for

echo "</table>\n";
echo "<br>\n";

echo "<center>\n";
echo "<font face='verdana,arial,helvetica' size='1'>\n";
echo "<a href='" . htmlspecialchars($_SERVER['PHP_SELF'], ENT_QUOTES) . "'>Refresh</a>\n";
echo "</font>\n";
echo "</center>\n";

echo "</body>\n";
echo "</html>\n";
?>
It's always a pain when the remote site changes their pattern. 8[
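
A different technique that may hold up a bit better against layout changes is to parse the HTML instead of pattern matching it. This is only a sketch, not what the scripts above do: it assumes PHP's DOM extension is available, and the sample rows are made up (loosely based on the two layouts in this thread).

```php
<?php
// Made-up sample rows, one in each layout; not fetched from the real page.
$html = '<table>'
      . '<tr><td><a name="login">Login Server:</a></td>'
      . '<td><img src="http://ultima.lightning.net/uo/img/grnball.gif"> UP!</td></tr>'
      . '<tr bgcolor="#ffc858"><td><a name=Catskills>Catskills:</a></td>'
      . '<td nowrap><img height=17 src="http://ultima.lightning.net/uo/img/redball.gif"></td></tr>'
      . '</table>';

$doc = new DOMDocument();
// @ hides the parser's warnings about sloppy real-world markup
@$doc->loadHTML($html);
$xpath = new DOMXPath($doc);

$status = array();
// Every table row that contains both a named anchor and an image
foreach ($xpath->query('//tr[.//a[@name] and .//img[@src]]') as $row) {
    $name = rtrim(trim($xpath->evaluate('string(.//a[@name])', $row)), ':');
    $src  = $xpath->evaluate('string(.//img/@src)', $row);
    $status[$name] = (strpos($src, 'grnball.gif') !== false) ? 'online' : 'offline';
}

print_r($status);
```

Attribute order, extra attributes and line breaks inside a tag no longer matter this way, though a bigger redesign of the remote page would still break it.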