Monique Y. Mudama | 1 Feb 2011 02:00

automated mailbox trimming?

Hello all,

If this is answered in the wiki, I apologize.  Please point me to it
and I'll be on my way ...

This may be more of a MailDir-generic question, except that as I
understand it there's some wiggle room in the naming of the message
files.

I've just switched from an mbox-based mail system to Dovecot with
MailDir.  With mbox, I used logrotate to keep some of my mailboxes
(spam, sent, stuff like that) down to only the most recent messages.

What's the best way to go about mailbox trimming with dovecot's
implementation of maildir?  I was thinking of writing a script to simply
move or delete old files, but would I mess up dovecot's expectations
for directory/file structure that way?

My mail directories are in my user's home directory.

I'm fine with writing a script or application (time permitting, of
course), but before I do so, is there already a solution out there?

In case it's relevant:

$ dovecot --version
1.2.15
$ dovecot -n
# 1.2.15: /etc/dovecot/dovecot.conf
# OS: Linux 2.6.37 x86_64 Debian 6.0

Timo Sirainen | 1 Feb 2011 02:07

Re: automated mailbox trimming?

On 1.2.2011, at 3.00, Monique Y. Mudama wrote:

> I've just switched from an mbox-based mail system to Dovecot with
> MailDir.  With mbox, I used logrotate to keep some of my mailboxes
> (spam, sent, stuff like that) down to only the most recent messages.
> 
> What's the best way to go about mailbox trimming with dovecot's
> implementation of maildir?

See http://wiki.dovecot.org/Plugins/Expire, although it's annoyingly complex
with v1.x. I'm guessing you don't have all that many users, so upgrading to
v2.0 would make this simpler.

>  I was thinking of writing a script to simply
> move or delete old files, but would I mess up dovecot's expectations
> for directory/file structure that way?

There's a "v1.0 cronjob equivalent" on that wiki page too. It'll work fine and
won't mess up Dovecot.
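
A minimal sketch of such a trimming script follows. It is not the wiki's
cronjob, only an illustration of the same idea: delete message files older
than a given age from a Maildir's cur/ and new/ directories. The function
name trim_maildir and its parameters are hypothetical, and treating the
file's mtime as the message's age is an assumption.

```python
import os
import time


def trim_maildir(maildir, max_age_days, now=None, dry_run=False):
    """Delete message files older than max_age_days from cur/ and new/.

    Assumption: a file's mtime stands in for the message's age.
    Returns the sorted list of removed (or, with dry_run, matching) paths.
    """
    now = time.time() if now is None else now
    cutoff = now - max_age_days * 86400
    removed = []
    for sub in ("cur", "new"):  # tmp/ holds in-progress deliveries; skip it
        folder = os.path.join(maildir, sub)
        if not os.path.isdir(folder):
            continue
        # Snapshot the listing first so we never delete while iterating.
        with os.scandir(folder) as it:
            entries = list(it)
        for entry in entries:
            if not entry.is_file(follow_symlinks=False):
                continue
            if entry.stat(follow_symlinks=False).st_mtime < cutoff:
                if not dry_run:
                    os.unlink(entry.path)
                removed.append(entry.path)
    return sorted(removed)
```

Running it first with dry_run=True on a single folder (a spam folder, say)
shows what would be deleted before anything is actually removed.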

Stan Hoeppner | 1 Feb 2011 02:36

Re: Best filesystem?

Ron Leach put forth on 1/31/2011 4:06 AM:

>> git.kernel.org - linux/kernel/git/torvalds/linux-2.6.git/blob -
>> Documentation/filesystems/xfs.txt
>> "fs.xfs.xfssyncd_centisecs     (Min: 100  Default: 3000  Max: 720000)
>>   The interval at which the xfssyncd thread flushes metadata
>>   out to disk.  This thread will flush log activity out, and
>>   do some processing on unlinked inodes."
> 
> http://git.kernel.org/?p=linux/kernel/git/torvalds/linux-2.6.git;a=blob;f=Documentation/filesystems/xfs.txt;hb=HEAD
> 
> 
> Why does this period in the UPS availability-time matter?  Because the UPS
> available-time has, of course, to first be allocated to the application machines
> to close their applications, before the file servers can be asked to 'commit'
> any delayed allocations and close down themselves (I don't want the file servers
> to close down while Dovecot (and any other applications) still have relevant
> data yet unwritten to the file-servers).

You need to read all about the xfssyncd thread before you jump to conclusions
about what the above patch actually does, and does not, do, and how that may or
may not relate to your specific concerns here.

Note that when you do a shutdown, the kernel will flush all buffers, and XFS
will automatically push all writes to disk.  The patch you're looking at above,
IIRC, is a house cleaning parameter, not a normal operations parameter.  For
instance, if you create 10k directories that's not going to fit in the XFS log.
Once it fills up, the inodes at the head of the log will start getting flushed
to disk, and new inodes will start coming in the tail of the log.  If

Timo Sirainen | 1 Feb 2011 02:49

Re: Best filesystem?

On 31.1.2011, at 12.34, Ron Leach wrote:

>> MainConfig - Dovecot Wiki
>> "fsync_disable = no
>> Don't use fsync() or fdatasync() calls. This makes the performance
>> better at the cost of potential data loss if the server (or the file
>> server) goes down."
> 
> http://wiki1.dovecot.org/MainConfig
> 
> Is mail_fsync a v2 item?  We're using Dovecot v1, for now.  Presumably
> 
> fsync_disable = no
> 
> is the default, so that fsyncs take place?

Right.

> As I understand it, Dovecot rebuilds its indexes if they become corrupted
> and, if that's the case, then there is no filesystem vulnerability in
> respect of those.  We're using maildir.  How soon after each mail message
> is written, moved, renamed, etc, does Dovecot issue fsyncs?  Is there much
> 'commit-delay' up to that point, which might be a vulnerability window?

Success isn't returned to dovecot-lda or to an IMAP APPEND call until the mail
has been fsynced. As long as neither the disk nor the filesystem lies, there is
zero data loss with Dovecot unless fsyncing has been disabled.
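
The general "fsync before reporting success" pattern for a maildir delivery
can be sketched as below. This is not Dovecot's code; the function name
deliver and its parameters are hypothetical, and the extra fsync of the new/
directory after the rename is an assumption about what a careful
implementation might do, not something stated in this thread.

```python
import os


def deliver(maildir, unique_name, data):
    """Write data (bytes) into maildir/new/unique_name durably.

    The caller is only told about success (the return) after the file
    contents have been fsynced and the message renamed into new/.
    """
    tmp_path = os.path.join(maildir, "tmp", unique_name)
    new_path = os.path.join(maildir, "new", unique_name)

    # O_EXCL: refuse to overwrite an existing file with the same name.
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        view = memoryview(data)
        while view:  # os.write may write fewer bytes than requested
            written = os.write(fd, view)
            view = view[written:]
        os.fsync(fd)  # flush file contents to stable storage
    finally:
        os.close(fd)

    os.rename(tmp_path, new_path)  # atomic within one filesystem

    # Assumption: also fsync the directory so the rename itself is durable.
    dir_fd = os.open(os.path.join(maildir, "new"), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return new_path
```

If the disk or the filesystem acknowledges the fsync without actually
committing the data, no amount of care at this level helps, which is the
caveat above.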

> Finally, and I do apologise for all the questions, we're wishing to move to
> NFS.  (At the moment we have a 'one box' Dovecot solution, but this makes
> upgrade of OS, upgrade of Dovecot, or upgrade of storage always a problem.
> We have already exported the new XFS filestore over NFS - but Dovecot is
> not (yet) using it, that's the next step for us.) Does the fsync solution
> we've been discussing work just as well when the XFS

Stan Hoeppner | 1 Feb 2011 04:11

Re: Best filesystem?

Frank Cusack put forth on 1/31/2011 3:06 PM:
> On 1/30/11 5:07 PM -0600 Stan Hoeppner wrote:
>> To be clear, for any subscribers who haven't followed all of the various
>> filesystem and data security threads, with any modern *nix system, you
>> WILL lose data when power fails.  How much depends on how many writes to
>> disk were in flight when the power failed, and how one has their RAID
>> controller and inside-the-disk caches configured, whether using barriers,
>> etc.
> 
> That's incorrect.  When you fsync() a file, all sane modern filesystems
> guarantee no data loss, unless you tune that out administratively for
> performance reasons.  If you use a log structured filesystem (like zfs
> or WAFL) you can optimize the performance as well.  With other types
> of filesystems (like xfs), performance suffers severely under heavy
> sync write loads.

This depends on how the dev does his syncs.  If done intelligently, XFS
performance won't suffer.  In fact, the preferred write method to XFS for high
performance applications is using O_DIRECT.  Using O_DIRECT, correctly, with
XFS, actually _increases_ write performance versus going through the buffer
cache.  So you get the best of both worlds:  higher performance and data
guaranteed on disk.

But not all applications use fsync, O_DIRECT, et al.  The point I was making is
that on any general system, you will likely have some applications/daemons
writing without fsync or O_DIRECT, so you will likely suffer some data loss when
the plug is pulled or the kernel crashes.  If the timing of the crash is right
you can even lose data when using fsync.  Depends on how busy the system is and
how many synced writes are in flight when the power drops.  There truly aren't
any guarantees that data will always be on disk.  There are always corner cases

Stan Hoeppner | 1 Feb 2011 04:27

Re: Best filesystem?

Frank Cusack put forth on 1/31/2011 3:13 PM:
> On 1/30/11 5:07 PM -0600 Stan Hoeppner wrote:
>> To be clear, for any subscribers who haven't followed all of the various
>> filesystem and data security threads, with any modern *nix system, you
>> WILL lose data when power fails.
> 
> No, you won't, at least not necessarily.
> 
> I know I'm replying with just about the same content multiple times
> but there are multiple messages where you are spreading this
> misinformation.
> 
> It is possible to configure a file system to not suffer from data
> loss on power loss, and for mail stores that is generally the
> desired behavior.

Maybe not every time, but it should surely motivate OPs to look at their power
continuity solution(s).

Even using fsync et al, you can still lose data with power loss.  It all depends
on what is in flight where, on which bus or cable, and whether the pulses made
it to the platters.  fsync is a best effort.  It can't guarantee all the
hardware was able to play its part correctly before the electrons stopped
flowing to the disk head actuator or spindle motor.

This is common sense.  Anyone with the slightest knowledge of electricity and
background in electronics, and working with computers for any amount of time,
should realize this.

There is no 100% guarantee.  This is one reason why the massive power backup

Stan Hoeppner | 1 Feb 2011 04:37

Re: Best filesystem?

Ron Leach put forth on 1/31/2011 5:00 PM:

> What does everyone else do?  Lose emails?

No.  We have decent power backup systems and management interfaces so systems
don't abruptly lose power.  We also use good hardware with a good mainline
kernel driver track record.

I think you forgot about the two failure scenarios that you need to worry about
in this thread:

1.  Kernel/system crash
2.  Power loss

If you're using decent hardware with decent drivers that have been fleshed out
over the years in mainline, you can forget #1.  If you have decent UPS units,
management interfaces, and shutdown software, you don't need to worry about #2.

After those two are covered, the only thing you need to worry about is hardware
going flaky.  In that case, nothing will save you but good backups.  Thankfully
most hardware today is pretty reliable (system boards, HBAs, etc).

-- 
Stan

Frank Cusack | 1 Feb 2011 04:37

Re: Best filesystem?

On 1/31/11 9:11 PM -0600 Stan Hoeppner wrote:
> Frank Cusack put forth on 1/31/2011 3:06 PM:
>> That's incorrect.  When you fsync() a file, all sane modern filesystems
>> guarantee no data loss, unless you tune that out administratively for
>> performance reasons.  If you use a log structured filesystem (like zfs
>> or WAFL) you can optimize the performance as well.  With other types
>> of filesystems (like xfs), performance suffers severely under heavy
>> sync write loads.
>
> This depends on how the dev does his syncs.  If done intelligently, XFS
> performance won't suffer.  In fact, the preferred write method to XFS for
> high performance applications is using O_DIRECT.  Using O_DIRECT,
> correctly, with XFS, actually _increases_ write performance versus going
> through the buffer cache.  So you get the best of both worlds:  higher
> performance and data guaranteed on disk.

Most applications don't work well with O_DIRECT.  O_DIRECT is meant
as a tunable for write-mostly applications and a few other specific
classes.  A mail store is decidedly not in that class of application.
As a data point, zfs (and all log structured filesystems) does not
support O_DIRECT because it doesn't make sense given the on-disk
layout -- there is no performance benefit to be had.

> But not all applications use fsync, O_DIRECT, et al.  The point I was
> making is that on any general system, you will likely have some
> applications/daemons writing without fsync or O_DIRECT, so you will
> likely suffer some data loss when the plug is pulled or the kernel
> crashes.  If the timing of the crash is right you can even lose data when
> using fsync.  Depends on how busy the system is and how many synced
> writes are in flight when the power drops.  There truly aren't any

Frank Cusack | 1 Feb 2011 04:40

Re: Best filesystem?

On 2/1/11 3:49 AM +0200 Timo Sirainen wrote:
> fsync() makes sure that the data is sent to NFS server. I don't know if
> NFS protocol itself has a fsync() call that guarantees that the data is
> written on disk on the server, but I very much doubt it does. So I don't
> think NFS will help with any data guarantees.

I think it was Stan who pointed this out (sorry if that's a misattribution),
but all calls in v2 are synchronous, and v3 and v4 have specific calls
(invoked by a local fsync()) for which the NFS protocol requires that
the data be committed to disk, so that fsync() semantics are preserved.

But as I noted earlier, if the filesystem lies to the NFS stack, or
the NFS stack intentionally lies to the client, this may not be true.

As far as the NFS protocol is concerned though, there is such a call.

Frank Cusack | 1 Feb 2011 04:42

Re: Best filesystem?

On 1/31/11 9:27 PM -0600 Stan Hoeppner wrote:
> Frank Cusack put forth on 1/31/2011 3:13 PM:
>> On 1/30/11 5:07 PM -0600 Stan Hoeppner wrote:
>>> To be clear, for any subscribers who haven't followed all of the various
>>> filesystem and data security threads, with any modern *nix system, you
>>> WILL lose data when power fails.
>>
>> No, you won't, at least not necessarily.
>>
>> I know I'm replying with just about the same content multiple times
>> but there are multiple messages where you are spreading this
>> misinformation.
>>
>> It is possible to configure a file system to not suffer from data
>> loss on power loss, and for mail stores that is generally the
>> desired behavior.
>
> Maybe not every time, but it should surely motivate OPs to look at their
> power continuity solution(s).
>
> Even using fsync et al, you can still lose data with power loss.  It all
> depends on what is in flight where, on which bus or cable, and whether
> the pulses made it to the platters.  fsync is a best effort.  It can't
> guarantee all the hardware was able to play its part correctly before the
> electrons stopped flowing to the disk head actuator or spindle motor.
>
> This is common sense.  Anyone with the slightest knowledge of electricity
> and background in electronics, and working with computers for any amount
> of time, should realize this.
>

