Smarty Forum Index Smarty
The discussions here are for Smarty, a template engine for the PHP programming language.
option to turn off cache directory sub-dividing
Smarty Forum Index -> Feature Requests
orrd
Smarty Rookie


Joined: 29 Jun 2010
Posts: 11

Posted: Wed Mar 07, 2012 7:58 pm    Post subject: option to turn off cache directory sub-dividing

Hi. Last month I mentioned in a forum thread that I'd like to have the option to turn off the automatic dividing up of cache folders (/0f/0d/2e/0f0d2ef2...) because I'm using cache groups to do my own sub-dividing (I can do it more efficiently myself since I know the structure of the site).

It's a large site that generates as many as 100,000 cached pages, and I don't want the additional filesystem overhead of having so many nested sub-directories for pages where it isn't necessary to divide them up in that way.

Is there a possibility of having the option to turn off the automatic sub-dividing of cache files into nested directories (/0f/0d/2e/0f0d2ef2...) ?
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Wed Mar 07, 2012 8:02 pm

That subdividing only happens if you intentionally turned on $smarty->use_sub_dirs = true. By default Smarty does not make sub folders.
orrd
Smarty Rookie


Joined: 29 Jun 2010
Posts: 11

Posted: Wed Mar 07, 2012 8:23 pm

Oh, I entirely forgot that use_sub_dirs was an option that I had turned on.

But if I turn off use_sub_dirs, will it still divide up the cache into subdirectories based on my cache groups?
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Wed Mar 07, 2012 8:28 pm

IIRC, it does break down the caches by subdir based on group. Give it a test.
orrd
Smarty Rookie


Joined: 29 Jun 2010
Posts: 11

Posted: Thu Mar 08, 2012 3:51 am

I just tested it, and unfortunately, no. Apparently with use_sub_dirs off, it just puts everything in the root directory. Instead of putting cache groups in their own directories, it just names the files with carets separating the group names, like "groupName^subgroupName^filename.html", etc.

So I'm looking for a way to have cache groups go into their own sub-directories, but without Smarty also sub-dividing the cache up beyond that.

So, any other ideas? I'm using version 3.1.7 by the way.
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Thu Mar 08, 2012 2:40 pm

I believe it is an all or nothing approach: subdirs for everything or not. However, have you actually benchmarked the system when use_sub_dirs is enabled? The filesystem should actually be faster at locating a deeply nested directory rather than scanning a single directory full of files. IMHO it is a red herring, it's not going to make any measurable difference even if you could split things up the way you want. If you have more than 10,000 cache files, I'd recommend enabling use_sub_dirs, otherwise leave it off. Don't worry yourself about the file structure in that folder.

FYI you could also find the logic in Smarty where the folders are divided and created and ignore the use_sub_dirs setting just for templates, or make up your own structure.
orrd
Smarty Rookie


Joined: 29 Jun 2010
Posts: 11

Posted: Thu Mar 08, 2012 5:47 pm

I believe it is necessary that we simplify the directory structure of the cache as much as possible because operations like clearing the cache now take an excessively long time (more than 30 minutes on our site with around 70,000 cached pages). It appears that the depth and complexity of the directory structure is a cause of that problem, because before use_sub_dirs was on and everything was in a single directory, that wasn't an issue (clearing the cache would take about a minute). But we do have to have use_sub_dirs on because we ran into a Linux / ext3 limit on the number of files in a directory, and I do think it can be more efficient with the right directory structure.

We're using cache groups to create our own efficient directory structure that is only one or two directories deep, and only where needed, but enough to ensure there aren't too many files in any one directory. If we can eliminate the extra 3 unneeded directories of depth that Smarty adds on top of that for every file, that should bring down the overhead of cache operations significantly.

By the way, I really think the 3 level depth that use_sub_dirs adds to the cache directory structure is excessive for any site ("/0f/0d/2e/..."). A site would have to have 16 million cached pages before that structure would average more than 1 single page per directory. A site would have to have many billions of pages before that depth of nested directories would really be necessary.
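As a rough check on that arithmetic, here is a small shell sketch. It assumes each of the three levels is named by two hex characters of a hash, so there are 16 * 16 = 256 possible names per level; the 70,000 figure is the site size mentioned earlier in this post.

```shell
# Assumption: two hex characters per level, so 16 * 16 = 256 names per level.
per_level=$((16 * 16))
leaf_dirs=$((per_level * per_level * per_level))
pages=70000   # approximate number of cached pages on the site

echo "possible leaf directories: $leaf_dirs"
# Integer arithmetic only: how many pages land in every 1000 leaf directories.
echo "pages per 1000 leaf directories: $((pages * 1000 / leaf_dirs))"
```

With these numbers the leaf count comes out at 16,777,216, and only about 4 pages fall into every 1000 possible leaf directories.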

Anyway, it sounds like my only option is to hope a future version has a feature to use sub dirs but turn off the automatic directory subdividing. Or I think it's possible to make a modified cache handler plugin that could do it, right?
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Thu Mar 08, 2012 5:55 pm

So you are saying a removal of 70,000 files in one directory is faster than the removal of 70,000 files in nested directories? What are you using to remove them?

If you are using ext3 I think there is a limit of 32,000 inodes per directory. This is easily fixed by updating to ext4, virtually unlimited.

Another tidbit that may be of use... if you are trying to clear 70,000 cache files on a "live" system you will most certainly encounter race conditions. This is going to take some time to delete no matter what you do. What I have done in the past: first move the entire cache folder to a new name, say cache-delete, and quickly recreate a new (empty) cache folder. Then start the removal process on the moved folder. This will make the cache clearing "instant" to the web server, and the deletion process can carry on its merry way in the background. So from a unix command prompt, it's something like:

Code:
mv cache cache-delete; mkdir cache; chown apache:apache cache; rm -rf cache-delete &
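The same swap-then-delete idea can be wrapped in a small function that stops at the first failed step. This is only a sketch: the function name swap_and_clear, the "-delete.PID" suffix and the optional owner argument are made up for illustration, and the chown is only needed when the script runs as a different user than the web server.

```shell
# swap_and_clear CACHE_DIR [OWNER]
# Hypothetical helper; the steps mirror the one-liner above.
swap_and_clear() {
    cache_dir=$1
    owner=${2:-}
    doomed="${cache_dir}-delete.$$"

    # Refuse to run if a previous deletion with the same name is still pending.
    if [ -e "$doomed" ]; then
        echo "pending deletion exists: $doomed" >&2
        return 1
    fi

    # Stop at the first failure instead of carrying on with the later steps.
    mv "$cache_dir" "$doomed" || return 1
    mkdir "$cache_dir" || return 1
    if [ -n "$owner" ]; then
        chown "$owner" "$cache_dir" || return 1
    fi

    # The slow part runs in the background; the web server already sees an empty cache.
    rm -rf "$doomed" &
}
```

It would be called as, for example, swap_and_clear /path/to/cache apache:apache, with the path and owner adjusted to the actual setup.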


As for the smarty dir structure, we currently do not provide an interface to customize this. You could extend the object and rewrite the method, or just edit the source. I'll add this to the TODO to take a look at.

[edit] Yes, a custom cache handler could be created to do whatever you like, but that requires setting up the entire handler, not just the file structure.
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 4686
Location: Hamburg / Germany

Posted: Thu Mar 08, 2012 7:53 pm

It's the old behaviour of Smarty 2 that, with use_sub_dirs on, the cache file path also contains levels based on the filename hash.

Indeed this does not make too much sense, because it is not very likely that a very large number of different template files will be used with the same cache_id.

The $smarty->clear() method does slow down because of the number of directory levels that need to be processed by the RecursiveDirectoryIterator.

I have it now on my TODO list to remove the file name hash from the sub_dirs levels.
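To see why the number of levels matters for a recursive walk, here is a rough shell sketch that builds a flat layout and a three-level layout holding the same number of files, then counts the directories a traversal has to visit. The file count, the pseudo-hash and the .cache suffix are made up for illustration and are not Smarty's actual naming.

```shell
# Sketch: count directories in a flat cache versus a three-level hashed layout.
N=500
work=$(mktemp -d)
mkdir "$work/flat" "$work/nested"

i=0
while [ "$i" -lt "$N" ]; do
    # 24-bit pseudo-hash: six hex characters, two per directory level.
    h=$(( (i * 40503) % 16777216 ))
    name=$(printf '%06x' "$h")
    sub=$(printf '%02x/%02x/%02x' $((h / 65536)) $((h / 256 % 256)) $((h % 256)))

    : > "$work/flat/$name.cache"
    mkdir -p "$work/nested/$sub"
    : > "$work/nested/$sub/$name.cache"
    i=$((i + 1))
done

# tr strips the padding some wc implementations add around the number.
flat_dirs=$(find "$work/flat" -type d | wc -l | tr -d ' ')
nested_dirs=$(find "$work/nested" -type d | wc -l | tr -d ' ')
echo "files per layout: $N"
echo "directories to walk, flat:   $flat_dirs"
echo "directories to walk, nested: $nested_dirs"

rm -rf "$work"
```

The flat layout is a single directory, while the nested one ends up with more directories than files, which is the extra work a recursive iterator has to do before it can delete anything.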
orrd
Smarty Rookie


Joined: 29 Jun 2010
Posts: 11

Posted: Thu Mar 08, 2012 9:19 pm

Quote:
I have it now on my TODO list to remove the file name hash from the sub_dirs levels.


Yay!

Quote:
Indeed this does not make too much sense, because it is not very likely that a very large number of different template files will be used with the same cache_id.


Wow... I was assuming the directory hash was based on the cache_id. But I just checked it, and you're right, it's actually based on the template filename. Wow, it really doesn't make any sense at all to be doing that.

Edit: I figured out how to disable it in the Smarty PHP code (at least it seems to work...). For the benefit of anyone who is reading this thread and wants to disable cache directory hashing for now until it's removed from the next update, edit libs/sysplugins/smarty_resource.php and libs/sysplugins/smarty_internal_cacheresource_file.php and remove or comment out these lines from both files:

Code:
        if ($_template->smarty->use_sub_dirs) {
            $_filepath = substr($_filepath, 0, 2) . DS
             . substr($_filepath, 2, 2) . DS
             . substr($_filepath, 4, 2) . DS
             . $_filepath;
        }     


Update: In order to make clearCache() work properly, you also have to change this line in smarty_internal_cacheresource_file.php:

Code:
$_compile_id_offset = $smarty->use_sub_dirs ? 3 : 0;


To this:

Code:
$_compile_id_offset = 0;


Last edited by orrd on Wed Apr 04, 2012 3:45 am; edited 2 times in total
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Thu Mar 08, 2012 9:24 pm

Just a note... yes it is unlikely two pages will get the same cache_id (should be never) however the directory structure uses only the first two chars of the hash, then the next two, etc... so the chances of two hashes starting with the same two chars is actually quite high with tens of thousands+ of cache files. That was the initial thinking anyways.
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 4686
Location: Hamburg / Germany

Posted: Thu Mar 08, 2012 9:35 pm

Quote:
Just a note... yes it is unlikely two pages will get the same cache_id (should be never) however the directory structure uses only the first two chars of the hash, then the next two, etc... so the chances of two hashes starting with the same two chars is actually quite high with tens of thousands+ of cache files. That was the initial thinking anyways.


But it makes no sense, as the hash is built just from the filename. The cache_id already creates the upper levels in the directory structure.
mohrt
Administrator


Joined: 16 Apr 2003
Posts: 7289
Location: Lincoln Nebraska, USA

Posted: Thu Mar 08, 2012 9:53 pm

Yes, it needs to be revisited. The main reason behind using a hash is to make directory names filesystem safe regardless of what is used to generate them.
palypster
Smarty n00b


Joined: 22 Mar 2012
Posts: 1

Posted: Thu Mar 22, 2012 3:27 pm

U.Tews wrote:
The $smarty->clear() method does slow down because of the number of directory levels that need to be processed by the RecursiveDirectoryIterator.

Can you please explain where I am wrong?

If the user specifies $resourceName, why do you need to process the directory levels? It seems to me that you know the exact path of the file you want to delete (you know the $resourceName, so you know the hash, and you know the cacheId, so you also know the right directory if use_sub_dirs is on), so I don't get why you can't just unlink it directly.

If the user does not specify $resourceName, just a $cacheId that contains $groups (= sub-dirs), you can remove that directory (recursively).

No iterating through the whole cache dir needed. Where am I wrong?
U.Tews
Administrator


Joined: 22 Nov 2006
Posts: 4686
Location: Hamburg / Germany

Posted: Thu Mar 22, 2012 4:16 pm

The problem is calls where you specify the template but no cache_id.
This should delete all files for that template, so we have to iterate over all subfolders created by the cache_id. The question is how to find an intelligent order for the subfolder name parts.

Anyway, I found a reasonable solution, which will be implemented in the next major release, 3.2, which we are working on.
All times are GMT
Page 1 of 2