how to make expression filter case INdependent

dear readers,

is there a way to match every capitalisation of a word (viagra, for example) using one expression, or not? If not, I would have to enter viagra, Viagra, vIagra, viAgra, etc, etc, etc.... (you get the idea)

Case independent matching seems MUCH more logical to me than case dependent... It's the actual word that triggers the rule, not the capitalisation it happens to be in... right..?

This thread was automatically locked due to age.

0 MagicMike over 21 years ago

http://www.regular-expressions.info/modifiers.html

0 mourik_jan over 21 years ago in reply to MagicMike

ok, thanks very much, that works.

For people having the same problem, here's the solution:

(?i)viagra

this example matches any case of viagra (Viagra, viAgra, whatever)
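
For anyone who wants to try the modifier outside the filter first, here is a minimal sketch. It uses Python's `re` module as a stand-in for the filter's regex engine (that is an assumption; the thread does not say which engine the expression filter uses), and the sample strings are made up.

```python
import re

# Assumption: Python's re module stands in for the filter's regex engine.
# (?i) at the very start of the pattern turns on case-insensitive matching.
pattern = re.compile(r"(?i)viagra")

# Made-up sample strings; only the last one should be left alone.
samples = ["viagra", "Viagra", "viAgra", "VIAGRA", "aspirin"]
matches = [s for s in samples if pattern.search(s)]
print(matches)
```

Without the `(?i)` prefix the same pattern would only match the all-lowercase spelling.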

Still think that this should really be the standard behaviour...

Thanks for helping out!

0 SecApp over 21 years ago in reply to mourik_jan

Then the s and m modifiers probably work too.

Now we can search for dangerous HTML elements...

0 mourik_jan over 21 years ago in reply to SecApp

hmm... I'm not that clever... could you post an example..?

0 SecApp over 21 years ago in reply to mourik_jan

Since I just found this out, I don't have a formal example ready to roll;
but, for example, say you wanted to disable script "behaviors" in HTML that use attributes beginning with on... (such as onLoad="scriptlet"). Or for that matter, what if I want to knock out a script block? Whenever these things spanned multiple lines, the damn default behavior of the parser would not look beyond the line the match started on. So a simple RegEx would not catch the following:

     <>...
                onLoad="scriptlet">

or...

<script>
.
.
.
</script>

Also, if you were using sed or awk to strip script from HTML, and had multiple script blocks in your HTML page (which often happens):

<script>...</script>
.
.
.
some HTML markup you want to let through...
.
.
.
<script>...</script>

a match for something that starts with <script> and ends with </script> would find the outermost matching pair (a.k.a. a "greedy" match), and end up gobbling up the HTML markup you wanted to let through (you would end up with a blank page, which, incidentally, I have seen when people were using AntiVirus products...). With ifs (and some of the other new additions shown at that site), I can regulate the behavior to get the closest pairings and knock them out...
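
The two problems described above (matches that stop at the end of a line, and greedy matches that swallow the markup between two script blocks) can be sketched as follows. This is only an illustration under stated assumptions: Python's `re` module stands in for the filter's engine, the page content is hypothetical, and the lazy quantifier `.*?` is one common way to get the closest pairing, not necessarily the construct meant in the post.

```python
import re

# Hypothetical page: two script blocks with wanted markup in between.
page = """<SCRIPT>
evil_one()
</SCRIPT>
<p>some HTML markup you want to let through...</p>
<script>
evil_two()
</script>"""

# Without (?s), "." does not match a newline, so a block that spans
# several lines is never matched.
line_bound = re.compile(r"(?i)<script[^>]*>.*</script>")

# (?s) lets "." cross line breaks, but the greedy ".*" runs from the
# first opening tag to the LAST closing tag.
greedy = re.compile(r"(?is)<script[^>]*>.*</script>")

# The lazy ".*?" stops at the nearest closing tag instead.
lazy = re.compile(r"(?is)<script[^>]*>.*?</script>")

no_dotall = line_bound.sub("", page)   # nothing removed
too_much = greedy.sub("", page)        # everything removed (blank page)
just_right = lazy.sub("", page)        # only the script blocks removed

print(repr(just_right))
```

Whether a given filter applies expressions line by line or to the whole body is something to check in its own documentation; if it works strictly line by line, `(?s)` alone will not help.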

AND:

You can do this with MIME encodings too.
So now you can block dangerous MIME application types (well, you always could, if you knew about this newer additional switch...)

P.S. You could write a program using Perl or gcc to do any of this parsing and/or reformatting, but a collection of RegExes is more elegant, portable, and open to inspection...

0 mourik_jan over 21 years ago in reply to SecApp

right...! [:)]

interesting, thanks for taking the time to explain!

yours,
mourik jan

0 mourik_jan over 21 years ago in reply to mourik_jan

OOPS. there is still a problem with my 'solution' to filter mail case independent. BECAUSE:
"(?i)cialis" also matches this string INSIDE other words. Meaning: blocking cialis (a viagra alternative, from what i gather..?) also blocks mails containing the word "specialising", as it contains "cialis".

So... any idea how to add a space before and after the term..? this doesn't work:

(?i) viagra
(i've simply put a space before and after)

Suggestions?

0 mourik_jan over 21 years ago in reply to mourik_jan

found it: \b matches a word boundary, so

(?i)\bcialis\b

does the trick

this also works:
(?i)\btest test\b

matches only "test test"

Am I right..? [:)]

Thanks anyway.
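
A small check of the word-boundary idea, as a sketch only: Python's `re` module is assumed as a stand-in for the filter's engine, and the sample sentences are invented.

```python
import re

# \b only matches between a word character and a non-word character
# (or the start/end of the text), so "cialis" inside a longer word is skipped.
word = re.compile(r"(?i)\bcialis\b")

samples = ["Buy CIALIS now", "cialis", "specialising in networks", "socialist"]
hits = [s for s in samples if word.search(s)]
print(hits)

# The same works for a phrase: the boundaries guard both outer ends.
phrase = re.compile(r"(?i)\btest test\b")
phrase_hit = phrase.search("a Test TEST here") is not None
phrase_miss = phrase.search("latest tests") is None
print(phrase_hit, phrase_miss)
```

Note that with `(?i)` the phrase pattern matches "test test" in any capitalisation, as long as it stands as whole words.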

0 venom over 21 years ago in reply to mourik_jan

Hmmm, it already IS case independent.

Go to the expression filter and enter "viagrA", then put the word VIAGRA or viagra in the subject line or the message body, and it will match every time for me.

It scans the subject line and the message body and is not case sensitive.

0 MagicMike over 21 years ago in reply to venom

Hmm, I have to say - I have an expression "viagra". It caught neither "VIAGRA", "ABCVIAGRASSSS", "viagrA" nor "v1@gr@" - the last one was rather unexpected though. It caught only "viagra" in my 5 phrase test. Don't know if this is how it was planned, but I guess it's better to be safe than sorry.
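
The five-phrase test is easy to repeat outside the filter. The sketch below assumes Python's `re` module as a stand-in engine (the filter itself may behave differently, which is exactly what is being disputed here) and uses a hypothetical helper named `caught`.

```python
import re

# The five phrases from the test above.
phrases = ["viagra", "VIAGRA", "ABCVIAGRASSSS", "viagrA", "v1@gr@"]

def caught(expression):
    """Return the phrases that the given expression matches (hypothetical helper)."""
    return [p for p in phrases if re.search(expression, p)]

case_sensitive = caught(r"viagra")        # plain expression
case_insensitive = caught(r"(?i)viagra")  # with the inline modifier

print(case_sensitive)
print(case_insensitive)
```

In this stand-in engine the plain expression behaves like the test described above, and even with `(?i)` the obfuscated "v1@gr@" is not matched, because its characters are different, not just its capitalisation.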

0 SecApp over 21 years ago in reply to MagicMike

Filtering for spam only gives you temporary relief; they do 'verticals', all sorts of punctuation permutations, spacing, and let's not forget images... (though it's relief nonetheless, which your users will probably appreciate for a few weeks...)

That's why I was more interested in its application for rudimentary (potential) malicious code content inspection...

Thanks for the tip!

0 MagicMike over 21 years ago in reply to SecApp

SecApp, the online help gives this kind of example for the HTTP Proxy custom content scanning:
(?i)\<script[^>]*\>
So you're on the right track. You can also block scripts with the HTTP Proxy's built-in feature. Or is it lacking something?

0 SecApp over 21 years ago in reply to MagicMike

Obviously did not see that in the help!

There are always tricky new spots to stuff script into being found out.

Also, I wonder how the script detection handles foreign encodings? Some RegExes support detection for alphabetics with [:alpha:]; what happens when you have a tag made with foreign (Unicode) characters? Some American or German browsers will step it down to characters (and consequently tags) you may not want to allow. So does the script detection cover this? Dunno...

Maybe the script detection covers it all, but it's nice to know when they post a new exploit on BugTraq, we can now do something...

0 synopex over 21 years ago in reply to SecApp

hi all,

this discussion is very interesting. over the past ~6 years i was on many mailing lists with different email addresses, and today i get a lot of spam. filters on my mta and asl help me drop most of it, but the idiot spammers find new tricks every day. the last mail was "\/I@gra", with a backslash and a slash. the problem is, i cannot enter all the possible characters that can make up a "bad" word.
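
One partial answer to the "cannot enter all possible characters" problem is to generate the expression from a table of look-alike characters instead of typing every variant. The sketch below is an illustration under assumptions, not a complete spam defence: Python's `re` module stands in for the filter's engine, and the `LOOKALIKES` table, the `obfuscated` helper, and the sample strings are all hypothetical.

```python
import re

# Hypothetical table: for each letter, a sub-pattern of look-alikes.
LOOKALIKES = {
    "v": r"(?:v|\\/)",   # "v" or backslash followed by slash
    "i": r"[i1!|]",
    "a": r"[a@4]",
    "g": r"[g9]",
}

# Optional filler between letters: spaces, dots, underscores, hyphens.
SEPARATOR = r"[\s._-]*"

def obfuscated(word):
    """Build a case-insensitive pattern tolerating look-alikes and filler."""
    parts = [LOOKALIKES.get(ch, re.escape(ch)) for ch in word]
    return re.compile("(?i)" + SEPARATOR.join(parts))

pattern = obfuscated("viagra")

samples = ["\\/I@gra", "v1@gr@", "V.I.A.G.R.A", "v i a g r a", "vinegar"]
caught = [s for s in samples if pattern.search(s)]
print(caught)
```

The generated expression can be printed with `pattern.pattern` and pasted into a filter that accepts the same syntax. The table will always lag behind the spammers, and loose patterns like this raise the risk of false positives, so this only buys the temporary relief mentioned earlier in the thread.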

let's see how this discussion ends.

bye