Smarty Forum Index Smarty
The discussions here are for Smarty, a template engine for the PHP programming language.

Smarty Validate issue validating email addresses

 
ubaldo
Smarty Regular


Joined: 21 Apr 2003
Posts: 35
Location: Barcelona, Spain

Posted: Tue Sep 28, 2004 10:13 am    Post subject: Smarty Validate issue validating email addresses

The code that checks email addresses (snippet from Smarty Validate below) is pretty good, but it fails to detect that foo:bar@xxx.com is not a valid email address. Well, at least qmail fails to deliver to addresses with : in them, so I would think such an address is not valid.

I've been trying to find the official spec (they may even provide a regex) but no luck yet.


Code:

if (preg_match('!@.*@|\.\.|\,!', $email)
    || !preg_match('!^.+\@(\[?)[a-zA-Z0-9\.\-]+\.([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$!', $email))
{
    return false;
}
return true;
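
To make the gap concrete, here is a minimal, self-contained sketch that wraps the snippet above in a function and runs it on a few addresses. The function name `legacyIsEmail` is made up for this illustration and is not part of SmartyValidate.

```php
<?php
// Hypothetical wrapper around the snippet quoted above; the name is an
// assumption made for this example, not SmartyValidate's own.
function legacyIsEmail(string $email): bool
{
    // First pattern rejects two @ signs, consecutive dots, or a comma.
    // Second pattern requires "anything @ host . tld-or-number".
    if (preg_match('!@.*@|\.\.|\,!', $email)
        || !preg_match('!^.+\@(\[?)[a-zA-Z0-9\.\-]+\.([a-zA-Z]{2,4}|[0-9]{1,3})(\]?)$!', $email)
    ) {
        return false;
    }
    return true;
}

// Print what the legacy check says about a few sample addresses.
foreach (['foo:bar@xxx.com', 'bob@hotmail', 'a..b@xxx.com'] as $candidate) {
    echo $candidate, ' => ', var_export(legacyIsEmail($candidate), true), "\n";
}
```

The local part is matched by `.+`, which accepts any character, so the colon slips through; only the host part is restricted to letters, digits, dots and hyphens.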

_________________
Ubaldo Huerta
http://madrid.loquo.com
http://rentals.loquo.com
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7365
Location: Lincoln Nebraska, USA

Posted: Tue Sep 28, 2004 2:36 pm

Give CVS a try. I updated the isEmail criteria to use the regex given by Jeffrey Friedl in Mastering Regular Expressions:

http://public.yahoo.com/~jfriedl/regex/email-opt.pl
ubaldo
Smarty Regular


Joined: 21 Apr 2003
Posts: 35
Location: Barcelona, Spain

Posted: Tue Sep 28, 2004 10:10 pm

Thank you, mohrt, for tackling the problem.

I got the code from CVS:

Code:

       // regex taken from Jeffrey Friedl e-mail validation example
       // http://public.yahoo.com/~jfriedl/regex/email-opt.pl
       $_regex = '[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff]
[^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|
\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*
|(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*
(?:(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")
[^()<>@,;:".\\\[\]\x80-\xff\000-\010\012-\037]*)*<[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[
(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[
(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff]
[^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*
(?:,[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*@[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff]
[^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff]
[^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[
(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*)*:[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)?
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|"[^\\\x80-\xff\n\015"]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff]
[^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|"[^\\\x80-\xff\n\015"]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015"]*)*")[\040\t]*
(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)
[\040\t]*)*)*@[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]
|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+
(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])|\[
(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:\.[\040\t]*(?:\([^\\\x80-\xff\n\015()]*(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*
(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*
(?:[^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff]+(?![^(\040)<>@,;:".\\\[\]\000-\037\x80-\xff])
|\[(?:[^\\\x80-\xff\n\015\[\]]|\\[^\x80-\xff])*\])[\040\t]*(?:\([^\\\x80-\xff\n\015()]*
(?:(?:\\[^\x80-\xff]|\([^\\\x80-\xff\n\015()]*(?:\\[^\x80-\xff][^\\\x80-\xff\n\015()]*)*\))
[^\\\x80-\xff\n\015()]*)*\)[\040\t]*)*)*>)';
   
       if (!preg_match("/^$_regex$/x", $email))
       {
           return false;
       }

       return true;


and took it for a spin. It correctly detects that foo:bar@yyy.com isn't valid. However, it incorrectly says that aaa@bbb is valid.

This is certainly a hard problem. You see regex examples everywhere (even in printed books), get excited for a while, and later realize that they don't cover all cases.

The regex in SmartyValidate is pretty good but, as I said, it lets some addresses through. After spending quite some time on it, I've abandoned all hope of figuring out the regex(es) myself.

It's also hard to believe that there isn't one widely understood and tested regex to do this.
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Tue Sep 28, 2004 10:49 pm

Actually, I do believe aaa@bbb is valid, at least on local domains since bbb can be a valid host on a local domain.
ubaldo
Smarty Regular


Joined: 21 Apr 2003
Posts: 35
Location: Barcelona, Spain

Posted: Tue Sep 28, 2004 11:37 pm

You may also be right about bbb being a valid host. Most regexes take into account that the bbb part could be an IP address, so it's only reasonable that any string that is a valid host name is allowed.

Now, the issue here is one of form validation, though: basically, trying to make sure that the user "bob" isn't making a mistake when entering form data. Almost 99% of the time in regular web apps (not intranets, etc.), you don't want to allow bbb-style hostnames; that is, you want to tell the user that bob@hotmail is wrong and bob@hotmail.com is right.

Would it be possible to have both flavors of the regex: one for the normal case (meaning vanilla addresses like xxx@yyy.com, geared to normal web apps, that is, the intended audience of SmartyValidate) and a special one for serious (or intranet) apps that need xxx@yyy to be valid (which is a legitimate case, granted)?

Am I making any sense here?
boots
Administrator


Joined: 16 Apr 2003
Posts: 5611
Location: Toronto, Canada

Posted: Wed Sep 29, 2004 1:06 am

I'm inclined to agree. Without trying to tweak that GNARLY regex (which looks machine generated, anyhow), it is possible to get the desired effect if we assume that anything that passes the regex will have at minimum: valid username, @, valid hostname. So all we have to do is ensure that the hostname has at least one period and we are set.

However, as we are talking validation, you really may want to take additional steps such as validating the top-level domain against known domains. Another step would be to try to open a socket against the hostname to see if it is reachable. I think that's all a little overkill, though, and better left to an offline process.
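
For illustration, the two extra steps might look roughly like the sketch below. Everything in it is an assumption made for this example: the helper names `hasKnownTld` and `hostHasMailRecord` are hypothetical, the TLD list is a tiny sample rather than a real registry, and the reachability step uses a DNS lookup through PHP's `checkdnsrr()` instead of opening a socket.

```php
<?php
// Hypothetical helper: compares the last label of the host against a
// caller-supplied allow-list of top-level domains.
function hasKnownTld(string $email, array $knownTlds): bool
{
    $at = strrpos($email, '@');
    if ($at === false) {
        return false; // no @ at all
    }
    $host = substr($email, $at + 1);
    $dot = strrpos($host, '.');
    if ($dot === false) {
        return false; // no period in the host, so no TLD to compare
    }
    $tld = strtolower((string) substr($host, $dot + 1));
    return in_array($tld, $knownTlds, true);
}

// Hypothetical helper for the slow, network-dependent step: asks DNS
// whether the host has an MX or A record. Probably best run offline.
function hostHasMailRecord(string $host): bool
{
    return checkdnsrr($host, 'MX') || checkdnsrr($host, 'A');
}

// Sample list only; a real application would need a maintained list.
$sampleTlds = ['com', 'net', 'org', 'es'];
var_dump(hasKnownTld('bob@hotmail.com', $sampleTlds));
var_dump(hasKnownTld('bob@hotmail.cmo', $sampleTlds));
```

The DNS helper is deliberately not called here, since its result depends on the network and the resolver in use.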

I'd say that if anyone really needs a local-host type of validation, it can be added either via a flag or maybe as a separate plugin.

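Code:

<?php
// Reject anything the syntax regex does not accept.
if (!preg_match("/^$_regex$/x", $email)) {
    return false;
// Require a period somewhere in the part after the first '@'.
// strpos() returns false only when there is no match, hence the strict comparison.
} else if (strpos(substr($email, strpos($email, '@') + 1), '.') === false) {
    return false;
} else {
    return true;
}
?>
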
I suppose I could have used array functions to split the string, but I suspect that this is a faster test, particularly since I'm not interested in the value of the host here. I didn't test this expression, though :)
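
As a rough sketch of the flag idea, the check above could be wrapped in a function with an opt-in switch for local hosts. The function name `isEmailForForm`, the `$allowLocalHost` parameter and the short `STAND_IN_PATTERN` are all assumptions made so that the example runs on its own; the real code would use the long `$_regex` from CVS for the syntax step.

```php
<?php
// Stand-in syntax check, an assumption for this sketch only. It is far
// looser than the Friedl regex and merely rejects spaces, extra @ signs,
// and a colon in the local part.
const STAND_IN_PATTERN = '/^[^@\s:]+@[^@\s]+$/';

// Hypothetical function; $allowLocalHost is the flag discussed above.
function isEmailForForm(string $email, bool $allowLocalHost = false): bool
{
    if (!preg_match(STAND_IN_PATTERN, $email)) {
        return false;
    }
    if ($allowLocalHost) {
        return true; // intranet flavor: xxx@yyy is acceptable
    }
    // Web flavor: the host part must contain at least one period.
    $host = substr($email, strpos($email, '@') + 1);
    return strpos($host, '.') !== false;
}

var_dump(isEmailForForm('bob@hotmail'));        // web flavor
var_dump(isEmailForForm('bob@hotmail', true));  // intranet flavor
```

Defaulting the flag to the stricter web behavior matches the intended audience described earlier in the thread, while intranet applications can opt in explicitly.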

NOTE: Ubaldo, For the sake of this thread, I took the liberty of splitting that regex in your post at somewhat arbitrary points and added the 'x' modifier so as to ignore whitespace. That way messages should read a little more clearly. Feel free to remove it altogether since it is available in the CVS and I'd be very surprised if anyone can grok the whole of it anyhow. You get the award, though, for posting the most outrageous regex!

BTW, I learned something: I would have thought that in "/^$_regex$/" PHP would be confused by the trailing $ since it is in double quotes. It turns out PHP leaves a $ in double quotes alone when it is not followed by a character that can start a variable name (such as a letter or underscore) or by an opening brace.
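
A small sketch of that interpolation behavior, using a short stand-in value for `$_regex` (an assumption for brevity; the long pattern behaves the same way as far as string parsing goes):

```php
<?php
$_regex = '[a-z]+'; // short stand-in for the long pattern

// The trailing $ is followed by /, which cannot start a variable name,
// so PHP keeps it as a literal dollar sign.
$interpolated = "/^$_regex$/";

// Equivalent spellings that do not rely on that rule.
$braced       = "/^{$_regex}\$/";
$concatenated = '/^' . $_regex . '$/';

var_dump($interpolated === $concatenated, $braced === $concatenated);
```

The braced or concatenated forms may be easier on readers who share the same doubt about the trailing $.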
All times are GMT