Smarty Forum Index Smarty
The discussions here are for Smarty, a template engine for the PHP programming language.

Regarding smarty_mb_str_replace

 
jrbeaure
Smarty n00b


Joined: 23 May 2014
Posts: 4

Posted: Tue Dec 09, 2014 6:22 pm    Post subject: Regarding smarty_mb_str_replace

What are the character encodings that Smarty should work with?

1. The str_replace function covers single-byte character encodings as well as UTF-8, and probably the other UTF encodings. (UTF-8 takes advantage of the fact that ASCII is 7-bit in order to maintain backwards compatibility in this regard; UTF-16 and UTF-32 are not byte-compatible with ASCII, so I'm less sure about those.)

2. An example of a multi-byte character encoding that str_replace would not always work with is Shift-JIS. However, I can't find a concrete example, especially since I'm not Japanese.

3. The smarty_mb_str_replace function uses mb_split, which relies on the character encoding specified by mb_regex_encoding. Given that, I don't see how the smarty_mb_str_replace implementation is encoding-aware on its own.

Why not just use vanilla str_replace?
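
For point 2, a commonly cited concrete case is the Shift_JIS character "ソ" (U+30BD), whose second byte is 0x5C, the same value as the ASCII backslash. The following is a minimal sketch using only core PHP functions; the variable names are illustrative and nothing here is taken from Smarty itself.

```php
<?php
// "ソ" (U+30BD) is encoded in Shift_JIS as the two bytes 0x83 0x5C.
// The trail byte 0x5C has the same value as the ASCII backslash.
$sjisSo = "\x83\x5C";

// str_replace() compares raw bytes, so replacing the backslash
// also rewrites the trail byte of the two-byte character.
$result = str_replace("\\", "/", $sjisSo);

echo bin2hex($sjisSo), " -> ", bin2hex($result), "\n"; // 835c -> 832f
```

After the replace, the byte pair no longer encodes "ソ", which is the kind of false match a byte-wise replace can produce in a non-self-synchronizing encoding.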
AnrDaemon
Administrator


Joined: 03 Dec 2012
Posts: 1183

Posted: Tue Dec 09, 2014 11:44 pm

Smarty has been built with UTF-8 in mind.
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 5059
Location: Hamburg / Germany

Posted: Wed Dec 10, 2014 12:54 am

You can set Smarty's charset
Code:
Smarty::$_CHARSET = 'UTF-8';

which is the default.
jrbeaure
Smarty n00b


Joined: 23 May 2014
Posts: 4

Posted: Thu Dec 11, 2014 11:50 pm

Okay, so regarding str_replace compatibility with UTF-8:

One-byte characters are encoded as 0xxxxxxx. This is backwards-compatible with ASCII.

Two-byte characters are encoded as 110xxxxx 10xxxxxx. The byte sequence does not overlap with one-byte characters.

Three-byte characters are encoded as 1110xxxx 10xxxxxx 10xxxxxx. The byte sequence does not overlap with either one-byte or two-byte characters (and so on).
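
The bit patterns above can be checked mechanically. This is a small sketch, not Smarty code: `utf8_byte_role` and `utf8_roles` are hypothetical helper names, and the classification looks only at the leading bits of each byte (it does not reject overlong or otherwise invalid sequences).

```php
<?php
// Classify one byte (0-255) by its leading bits, following the patterns above.
function utf8_byte_role(int $byte): string
{
    if (($byte & 0x80) === 0x00) { return 'single'; }       // 0xxxxxxx
    if (($byte & 0xC0) === 0x80) { return 'continuation'; } // 10xxxxxx
    if (($byte & 0xE0) === 0xC0) { return 'lead2'; }        // 110xxxxx
    if (($byte & 0xF0) === 0xE0) { return 'lead3'; }        // 1110xxxx
    if (($byte & 0xF8) === 0xF0) { return 'lead4'; }        // 11110xxx
    return 'invalid';
}

// Return the role of every byte in a non-empty string.
function utf8_roles(string $s): array
{
    return array_map('utf8_byte_role', array_values(unpack('C*', $s)));
}

print_r(utf8_roles("a"));            // single
print_r(utf8_roles("\xC3\xA9"));     // "é": lead2, continuation
print_r(utf8_roles("\xE2\x82\xAC")); // "€": lead3, continuation, continuation
```

Because every byte falls into exactly one role, a lead byte can never be mistaken for a continuation byte or for an ASCII byte, which is the non-overlap property described above.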

In terms of UTF-8, this makes the function smarty_mb_str_replace pretty useless when the code could be using the faster str_replace function instead. As long as the developer isn't trying to mix character sets, there is no issue in using vanilla str_replace for UTF-8 support.
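
As an illustration of that claim, here is a short sketch with a well-formed UTF-8 haystack and needle, written with explicit byte escapes so it does not depend on the source file's encoding. The strings are made up for the example.

```php
<?php
// "café naïve" in UTF-8. Both "é" (C3 A9) and "ï" (C3 AF) share the lead byte C3.
$haystack = "caf\xC3\xA9 na\xC3\xAFve";

// Replace the complete character "é" with a plain "e".
$result = str_replace("\xC3\xA9", "e", $haystack);

// Only the whole two-byte sequence C3 A9 matched; "ï" is left intact.
echo bin2hex($result), "\n"; // 63616665206e61c3af7665
```

The needle is a complete character, so it can only match at a character boundary in well-formed UTF-8 input.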

Additionally, modifiers that include a reference to smarty_mb_str_replace don't have a compiler version. That means that for every reference to the replace modifier, at least one system call has to be made (for the include/require statement), and in some environments system calls can be expensive (which is what made me look into this in the first place).

Edit: Basically, what I'm trying to say here is that I'd like to see smarty_mb_str_replace removed from the codebase and compiler versions created for modifiers such as the replace modifier.
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 5059
Location: Hamburg / Germany

Posted: Fri Dec 12, 2014 3:32 am

Note that str_replace is unsafe when one of its parameters contains a string that is not well-formed UTF-8.

See for example http://stackoverflow.com/questions/3786003/str-replace-on-multibyte-strings-dangerous

We had a couple of issues reported by users. That's why we chose a safe implementation.
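
One way such a problem can arise is sketched below, under the assumption that the search string is malformed (a lone continuation byte). `is_well_formed_utf8` is a hypothetical helper built on `preg_match()` with the `/u` modifier, which returns false for a malformed subject. This is an illustration only, not one of the reported cases.

```php
<?php
// Hypothetical helper: true if $s is well-formed UTF-8.
// preg_match() with /u returns false (not 0) when the subject is malformed.
function is_well_formed_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

$haystack = "caf\xC3\xA9"; // "café", well-formed UTF-8
$needle   = "\xA9";        // lone continuation byte, not well-formed

// The byte-wise replace strips the second byte of "é".
$result = str_replace($needle, "", $haystack);

var_dump(is_well_formed_utf8($haystack)); // bool(true)
var_dump(is_well_formed_utf8($result));   // bool(false), the string now ends mid-character
```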
jrbeaure
Smarty n00b


Joined: 23 May 2014
Posts: 4

Posted: Fri Dec 12, 2014 6:23 pm

Ensuring the same character encoding throughout (i.e. validating input) is important (otherwise the code is inherently flawed), and there's a function for that, which helps prevent invalid-encoding attacks such as the one described in the link you provided.

http://php.net/manual/en/function.mb-check-encoding.php

Are you sure that smarty_mb_str_replace is the right solution? Is there a reason not to validate input when variables are being assigned to the template? I'm pretty sure that's been the general library solution for preventing SQL injection attacks (except in those instances they're "prepared statements" and not "templates").
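
A rough sketch of what validating at assignment time could look like is shown here. `require_utf8` is a hypothetical helper, not part of Smarty, and it assumes the mbstring extension is loaded so that `mb_check_encoding` is available.

```php
<?php
// Hypothetical helper (not part of Smarty): return $value unchanged if every
// string inside it is well-formed UTF-8, otherwise throw.
// Assumes the mbstring extension is loaded.
function require_utf8($value)
{
    if (is_array($value)) {
        foreach ($value as $key => $item) {
            if (is_string($key)) {
                require_utf8($key);   // array keys can carry bad bytes too
            }
            require_utf8($item);      // recurse into nested values
        }
        return $value;
    }
    if (is_string($value) && !mb_check_encoding($value, 'UTF-8')) {
        throw new InvalidArgumentException('Value is not well-formed UTF-8');
    }
    return $value;
}
```

It could then wrap values on their way into the template, for example `$smarty->assign('title', require_utf8($title));`, so that malformed input is rejected once rather than handled inside every modifier.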
AnrDaemon
Administrator


Joined: 03 Dec 2012
Posts: 1183

Posted: Fri Dec 12, 2014 10:58 pm

The whole set of mb_* functions is one-sided and inherently flawed. I don't know how you can call them "safer". The sole reason everyone uses them is that there seems to be no alternative.
From my experience, there is no single reliable set of functions that works with all the diverse encodings. You always need something from some other library, and in the end your code turns into an ugly pile of crap.
jrbeaure
Smarty n00b


Joined: 23 May 2014
Posts: 4

Posted: Tue Dec 16, 2014 8:17 pm

AnrDaemon wrote:
The whole set of mb_* functions is one-sided and inherently flawed. I don't know how you can call them "safer". The sole reason everyone uses them is that there seems to be no alternative.
From my experience, there is no single reliable set of functions that works with all the diverse encodings. You always need something from some other library, and in the end your code turns into an ugly pile of crap.


They work from a functionality standpoint. E.g. mb_strpos gives you the character position rather than the byte position, mb_convert_case works on all bicameral alphabetic characters (which include the Latin, Greek, and Cyrillic alphabets), etc.
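
A small sketch of the two behaviours just mentioned follows. The function names `demo_positions` and `demo_upper` are made up for the example, and both assume the mbstring extension is loaded.

```php
<?php
// Byte offset versus character offset of "x" in "café x" (UTF-8).
function demo_positions(): array
{
    $text = "caf\xC3\xA9 x"; // "é" occupies two bytes
    return [
        'bytes' => strpos($text, 'x'),                // 6
        'chars' => mb_strpos($text, 'x', 0, 'UTF-8'), // 5
    ];
}

// Upper-case a Cyrillic letter: "ж" (D0 B6) becomes "Ж" (D0 96).
function demo_upper(): string
{
    return mb_convert_case("\xD0\xB6", MB_CASE_UPPER, 'UTF-8');
}
```

Calling `demo_positions()` shows the one-position difference caused by the two-byte "é".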

In this instance I'm talking about a performance enhancement. The counter-argument is a safety issue (invalid encoding attack). The counter-counter-argument is that mb_check_encoding can be used at a higher level to prevent that attack outright rather than it leaking into other functionality.
AnrDaemon
Administrator


Joined: 03 Dec 2012
Posts: 1183

Posted: Tue Dec 16, 2014 9:01 pm

mb_check_encoding can be used to ensure data safety. Just don't make the mistake of using mb_detect_encoding. THAT one is pure evil.
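
One possible reason for that distinction, sketched here as an assumption rather than as the poster's own argument: the same bytes can be valid in more than one encoding, so validation against a known encoding is well-defined while detection has to guess. `validates_as` is a hypothetical helper and requires the mbstring extension.

```php
<?php
// Report whether the given bytes are valid under two candidate encodings.
// Assumes the mbstring extension is loaded.
function validates_as(string $bytes): array
{
    return [
        'UTF-8'      => mb_check_encoding($bytes, 'UTF-8'),
        'ISO-8859-1' => mb_check_encoding($bytes, 'ISO-8859-1'),
    ];
}
```

For the bytes `"\xC3\xA9"`, both entries are expected to be true: the pair reads as "é" in UTF-8 and as the two characters "Ã©" in ISO-8859-1, and nothing in the bytes themselves says which one was meant.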
All times are GMT