 
29 Jun 2007, 19:22
The Geek
 
Join Date: Sep 2003
I thought I would put down some basics of creating regexps for AME, as many users want to create their own. This isn't an exhaustive list, nor am I an expert on regexps; it is just a rough, basic overview to get you started.

AME takes the regexp you provide and wraps it in [url] tags, boundaries and the codes that make it case insensitive, so you don't have to worry about doing that yourself. You do have to remember, though, that your regexp needs to match the entire URL. If it doesn't, it won't qualify as a match in AME.
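
To illustrate the whole-URL rule, here is a minimal Python sketch. AME itself runs on PHP and its actual wrapping code is not shown in this post, so `re.fullmatch` with `re.IGNORECASE` is only an assumed stand-in for that behaviour, and `ame_style_match` is a hypothetical helper name.

```python
import re

def ame_style_match(pattern, url):
    """Assumed stand-in for AME's wrapping: case-insensitive, whole-URL match."""
    return re.fullmatch(pattern, url, flags=re.IGNORECASE)

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

# Covers only the start of the URL, so it does not qualify as a match.
partial = ame_style_match(r"http://www\.clipfish\.de/player\.php", url)

# Covers the URL from the first character to the last, so it qualifies.
# (".+" means "one or more of any character"; it is used here only as filler.)
whole = ame_style_match(r"http://www\.clipfish\.de/player\.php\?videoid=.+", url)

# Upper/lower case differences in the pattern are ignored.
upper = ame_style_match(r"HTTP://WWW\.CLIPFISH\.DE/player\.php\?videoid=.+", url)

print(partial is None, whole is not None, upper is not None)
```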

The best way I have found to test regular expressions is with RegexBuddy. This is what I do:
  • Start up RegexBuddy
  • Click the test tab
  • Tick the 'case insensitive' option
  • In the box below the tabs, I paste the URL I want to create a regexp for. You need to be able to identify which part of the URL you want to extract. In this instance, I am trying to create a regexp for http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng==, so I want to extract the MzEwODYwfDg2NzY0Ng part.
  • I then paste everything leading up to the part I want to extract into the top window like this: http://www.clipfish.de/player.php?videoid=
  • I then escape the special characters in the URL with the \ character like this: http://www\.clipfish\.de/player\.php\?videoid=. At this stage, RegexBuddy should have highlighted your test URL up to the part we want to extract. If it hasn't, then you are missing something.
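
If you don't have a regex tester handy, the escaping step can be checked with a few lines of Python. This is only a sketch: Python's `re` module is not the PHP engine AME uses, although the two agree on simple patterns like this one, and `re.escape` is used here just as a shortcut for adding the backslashes by hand.

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="
prefix = "http://www.clipfish.de/player.php?videoid="

# Unescaped, "?" makes the preceding "p" optional instead of matching a
# literal question mark, so the pattern fails at "player.php?".
unescaped = re.match(prefix, url)

# re.escape puts a backslash in front of the special characters for us.
escaped_prefix = re.escape(prefix)
escaped = re.match(escaped_prefix, url, flags=re.IGNORECASE)

# The text matched so far, i.e. what a tester would highlight.
highlighted = escaped.group(0) if escaped else None

print(escaped_prefix)
print(highlighted)
```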

I now need to define a character class that will match the pattern I am after. If the pattern is only word characters (i.e. letters, numbers and underscores), then I can use [\w]. If it is letters only, then I would use [a-z]. If it is numbers only, then I can use [\d].
I can also specify additional characters that can appear. For instance, if I wanted a class that allows word characters and hyphens, I could do [\w-] (the underscore is already covered by \w). If I wanted letters, hyphens and underscores only, I could do [a-z_-].
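
Here is a small Python sketch for trying out these classes one character at a time. `every_char_in` is a hypothetical helper, and the sample strings other than the clipfish ID are made up for illustration. The `re.IGNORECASE` flag mirrors the case-insensitive setting described above.

```python
import re

def every_char_in(char_class, text):
    """True if each individual character of text is matched by char_class."""
    return all(
        re.fullmatch(char_class, ch, flags=re.IGNORECASE) is not None
        for ch in text
    )

word_ok = every_char_in(r"[\w]", "MzEwODYwfDg2NzY0Ng")    # letters and digits
word_underscore = every_char_in(r"[\w]", "my_video_01")   # \w also matches "_"
word_rejects_equals = every_char_in(r"[\w]", "Ng==")      # "=" is not a word character
letters_only = every_char_in(r"[a-z]", "Clipfish")        # case-insensitive letters
letters_reject_digit = every_char_in(r"[a-z]", "clip9")   # "9" is not a letter
digits_only = every_char_in(r"[\d]", "310860")
slug_ok = every_char_in(r"[a-z_-]", "my-video_clip")      # letters, "_" and "-"

print(word_ok, word_underscore, word_rejects_equals)
print(letters_only, letters_reject_digit, digits_only, slug_ok)
```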

In the case where I am trying to extract MzEwODYwfDg2NzY0Ng then a word character class would work fine: [\w]

The problem is that a character class on its own matches only one character. In other words, my match would be http://www.clipfish.de/player.php?videoid=M, NOT http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng==, which is what I want!
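
The one-character behaviour is easy to see in a quick Python check (a sketch only; Python's `re` stands in for the tester here):

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

# A bare character class consumes exactly one character.
m = re.match(r"http://www\.clipfish\.de/player\.php\?videoid=[\w]",
             url, flags=re.IGNORECASE)

matched_text = m.group(0)   # everything the pattern covered
leftover = url[m.end():]    # what is still unmatched

print(matched_text)
print(leftover)
```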

This is where special characters come in.
  • . will match any single character that is NOT a line break
  • * will match whatever precedes it 0 or more times
  • + will match whatever precedes it 1 or more times
  • ? will match whatever precedes it 0 or 1 times

So, to make my character class work, I use [\w]+
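
A rough Python sketch of how each of these behaves, using the tail end of the test URL. The `https?` pattern is just a made-up example of the ? character and is not part of the clipfish regexp.

```python
import re

tail = "MzEwODYwfDg2NzY0Ng=="

# + : one or more word characters, stopping at the first "=".
plus_word = re.match(r"[\w]+", tail).group(0)

# * : zero or more digits. The tail starts with a letter, so this still
# matches, but the match is empty.
star_digits = re.match(r"[\d]*", tail).group(0)

# + : needs at least one digit at the start, so there is no match at all.
plus_digits = re.match(r"[\d]+", tail)

# ? : the "s" may appear once or not at all.
http_ok = re.fullmatch(r"https?", "http") is not None
https_ok = re.fullmatch(r"https?", "https") is not None

# . : any single character except a line break.
dot_equals = re.fullmatch(r".", "=") is not None
dot_newline = re.fullmatch(r".", "\n") is not None

print(plus_word, repr(star_digits), plus_digits)
print(http_ok, https_ok, dot_equals, dot_newline)
```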

So now my regexp looks like:

http://www\.clipfish\.de/player\.php\?videoid=[\w]+

Now, that will match the URL up to the end of the ID, but I need to capture whatever is matched by the [\w]+ part. That's where ()'s come into play. If I do this:

http://www\.clipfish\.de/player\.php\?videoid=([\w]+)

Then I get the text that was matched by the part of the pattern inside the ()'s.
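
In Python terms (again only a sketch, not AME's own code), the brackets become group 1 of the match:

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

m = re.match(r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)",
             url, flags=re.IGNORECASE)

whole_match = m.group(0)   # everything the pattern covered
video_id = m.group(1)      # only what the ()'s captured

print(whole_match)
print(video_id)
```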

However!!! It still won't match in AME yet, because there are these annoying == signs at the end of the URL! Since we are not sure how and when they will appear, let's just create another class to accommodate whatever else may come after.

[&\w;=+_-]* That class says "match any single character that is an &, a word character (letter, digit or underscore), a semicolon, an equals sign, a plus, an underscore or a hyphen, 0 to an unlimited number of times" (the asterisk says that last part!). That means that any of those characters may or may not appear, but nothing outside of that class can appear (for instance, a %).
So my final regexp looks like:

http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*
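
To check the finished regexp against the whole-URL rule, here is a Python sketch. `re.fullmatch` with `re.IGNORECASE` is an assumed approximation of how AME wraps the pattern, and the `&autoplay=1` and `%20` suffixes are made-up test inputs.

```python
import re

url = "http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng=="

NO_TAIL = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)"
FINAL = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*"

def whole_url_match(pattern, candidate):
    """Assumed stand-in for AME: case-insensitive match of the entire URL."""
    return re.fullmatch(pattern, candidate, flags=re.IGNORECASE)

# Without the trailing class the "==" is left over, so the whole URL fails.
without_tail = whole_url_match(NO_TAIL, url)

# With the trailing class the "==" is absorbed and the ID is still captured.
with_tail = whole_url_match(FINAL, url)

# Extra characters from the class are tolerated (hypothetical parameter).
extra_params = whole_url_match(FINAL, url + "&autoplay=1")

# A character outside the class, such as "%", makes the match fail.
percent = whole_url_match(FINAL, url + "%20")

print(without_tail, with_tail.group(1), extra_params.group(1), percent)
```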

And in the case of AME, I can put $p1 in the replacement HTML to get the 'movie' id which in this case is MzEwODYwfDg2NzY0Ng.
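
For anyone curious how the $p1 substitution might work in principle, here is a rough Python sketch. It is not AME's code: the `render` helper, the plain string replacement of `$p1` and the example.com embed template are all assumptions made up for illustration.

```python
import re

PATTERN = r"http://www\.clipfish\.de/player\.php\?videoid=([\w]+)[&\w;=+_-]*"

# Hypothetical replacement HTML; a real definition would use the site's own embed code.
TEMPLATE = '<embed src="http://example.com/player?id=$p1"></embed>'

def render(url, pattern=PATTERN, template=TEMPLATE):
    """Return the template with $p1 filled in, or None if the URL does not match."""
    m = re.fullmatch(pattern, url, flags=re.IGNORECASE)
    if m is None:
        return None
    # $p1 stands for the first captured group.
    return template.replace("$p1", m.group(1))

html = render("http://www.clipfish.de/player.php?videoid=MzEwODYwfDg2NzY0Ng==")
no_html = render("http://www.clipfish.de/other.php?videoid=abc")

print(html)
print(no_html)
```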

nJoy
Attached Images
File Type: jpg step-1.jpg (112.0 KB, 170 views)
File Type: jpg step-2.jpg (122.8 KB, 108 views)
File Type: jpg step-3.jpg (127.7 KB, 113 views)
File Type: jpg step-4.jpg (132.1 KB, 94 views)
File Type: jpg step-5.jpg (133.7 KB, 78 views)
File Type: jpg step-6.jpg (138.0 KB, 83 views)
File Type: jpg step-7.jpg (141.0 KB, 122 views)