Simon Tyler | 1 Jun 09:14 2005

Re: IMAP4 List Command Extensions

 

Looks like almost the same idea I had just with slightly different syntax.

 

My main driver behind this is to get UIDVALIDITY and UIDNEXT back. I fully understand Mark's issue with performance - getting the number of recent messages back could be inefficient. Perhaps if we just limited it to the two UID values then these performance issues would go away?

 

Simon


---- Message from Alexey Melnikov <alexey.melnikov <at> isode.com> at 31-May-2005 18:02:00 ------

Simon Tyler wrote:

> Indeed the protocol should consider a wide variety of implementations.
> Such as one which has no large overhead in obtaining the extra
> information I was proposing. Our client will clearly fall back to using
> a STATUS command should the extended LIST response not be found.
>
>  
>
> I fully understand all the issues, that is why I felt it was a
> reasonable extension - servers which have issues returning the UID
> information in short timescales would obviously not choose to add in
> such an extension. Those that can obtain the information could do so.
>
Exactly.

I posted a similar proposal some time ago:

<http://www.imc.org/ietf-imapext/mail-archive/msg02245.html>

which I intend to submit as a draft, once the LIST-EXTENDED is done.



Mark Crispin | 1 Jun 20:44 2005

Re: IMAP4 List Command Extensions


On Wed, 1 Jun 2005, Simon Tyler wrote:
> My main driver behind this is to get UIDVALIDITY and UIDNEXT back. I 
> fully understand Mark's issue with performance - getting the number of 
> recent messages back could be inefficient. Perhaps if we just limited it
> to the two UID values then these performance issues would go away?

It won't.

UIDNEXT is not necessarily in a fixed metadata value; if new mail has been 
appended then it may be necessary to go through a UID assignment step in 
order to calculate UIDNEXT.  [This same problem afflicts the UIDPLUS 
extension; but in a COPY or APPEND command it may be alright if it takes 
a little bit longer.]

UIDVALIDITY is normally constant, but it could be affected by the UIDNEXT 
step if it's determined that something is fouled up in the mailbox.

The bottom line is that unless the mail store has an index of all messages 
for the mailbox (and legacy mail stores do not), these are expensive 
calculations.

It gets worse.

Even opening a file to sniff at metadata can be costly.  The very first 
versions of UW imapd did that 15 years ago, to discriminate between 
mailbox files and non-mailbox files.  This was removed because it resulted 
in a huge performance penalty.  It may be only a fraction of a second, but 
when doing a LIST command that fraction of a second per mailbox adds up 
to several minutes.  Or worse.

This isn't as much of a problem on modern filesystems as in the past, but 
the problem has not been exterminated.

Even inode metadata from stat() can be expensive.  For example, a timing 
test on my home directory shows that "ls -RF" takes about twice as long as 
"ls -R", even when all the directories are in the buffer cache.  That's 
much better than 10 years ago, but still is not good.
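
A rough way to reproduce this kind of comparison is to walk a directory tree twice, once reading only names and entry types and once also calling stat() on every entry. The Python sketch below is illustrative only and is not the test described above: the names `walk_tree` and `time_walk` are made up for this example, and the measured difference depends heavily on the filesystem and on cache state.

```python
import os
import time


def walk_tree(root, with_stat):
    """Count entries under root; optionally stat() every entry as well."""
    count = 0
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    count += 1
                    if with_stat:
                        try:
                            # Extra per-entry metadata lookup: the cost being measured.
                            entry.stat(follow_symlinks=False)
                        except OSError:
                            pass
                    # Symlinks are not followed, so loops cannot trap the walk.
                    if entry.is_dir(follow_symlinks=False):
                        stack.append(entry.path)
        except OSError:
            continue  # unreadable directory: skip it
    return count


def time_walk(root, with_stat):
    """Return (entry_count, elapsed_seconds) for one walk of root."""
    start = time.perf_counter()
    count = walk_tree(root, with_stat)
    return count, time.perf_counter() - start
```

Calling `time_walk(some_dir, False)` and then `time_walk(some_dir, True)` on a large tree gives the two figures to compare; running each more than once helps separate cold-cache from warm-cache behaviour.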

Small systems with few users have less of a problem than large systems 
with tens or hundreds of thousands of users.

Dedicated mail stores designed for IMAP-only use, with their own internal 
hierarchy that does not attempt to mirror a filesystem hierarchy, also 
have less of a problem.  [The problem with mirroring a filesystem 
hierarchy, especially on systems used for more than just IMAP, is that 
users make symlinks that create filename loops.]
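
One common defence against such loops is to remember which directories have already been visited, identified by device and inode number rather than by name. The sketch below assumes a POSIX-style filesystem; `list_mailbox_dirs` is a hypothetical helper written for this illustration and is not taken from any server discussed in this thread.

```python
import os


def list_mailbox_dirs(root):
    """Return directories under root, following symlinks but never revisiting one."""
    seen = set()   # (device, inode) pairs of directories already visited
    found = []
    stack = [root]
    while stack:
        path = stack.pop()
        try:
            st = os.stat(path)  # follows symlinks, so a link resolves to its target
        except OSError:
            continue  # dangling link or permission problem
        key = (st.st_dev, st.st_ino)
        if key in seen:
            continue  # same directory reached under another name: a loop or alias
        seen.add(key)
        found.append(path)
        try:
            with os.scandir(path) as entries:
                for entry in entries:
                    if entry.is_dir():  # follows symlinks by default
                        stack.append(entry.path)
        except OSError:
            continue
    return found
```

Note that the loop check itself costs one stat() per directory, which is exactly the kind of per-entry overhead described above.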

But saying that "the problem can be reduced in some circumstances" does 
not equate to "the problem does not exist."

A well-designed client works well in a less than ideal world.

-- Mark --

http://staff.washington.edu/mrc
Science does not emerge from voting, party politics, or public debate.
Si vis pacem, para bellum.

Simon Tyler | 2 Jun 09:45 2005

Re: IMAP4 List Command Extensions

 

I'm afraid I still disagree on this.

 

Whether or not a server stores UIDNEXT or calculates it is a server implementation decision. It is irrelevant when designing a protocol. A well-designed server will implement the specification in an efficient way: if calculation is expensive then it will store the value as metadata, and if accessing the metadata is expensive it will cache the metadata. Moulding a specification to a perceived implementation issue (such as the performance of obtaining UIDNEXT) will simply result in a poor specification. If we follow your logic then we can equally argue that the SUBSCRIBED option in the draft command extension can also be expensive - how expensive is it to decide if a folder is subscribed? I would argue it is implementation dependent.

 

Simon



Arnt Gulbrandsen | 2 Jun 10:01 2005

Re: IMAP4 List Command Extensions


You seem to be pushing "quality" into the server implementation realm 
and pretending that a protocol itself has no quality.

But it has. A protocol can be good, in which case it's easy to implement 
efficiently, and it can be bad, in which case it usually is difficult 
to implement efficiently.

Arnt

Simon Tyler | 2 Jun 10:17 2005

Re: IMAP4 List Command Extensions

 

I would like to see quality in both realms. I don't see that my UIDNEXT/UIDVALIDITY proposal detracts from protocol quality; I think it actually adds to it. I think it is sufficiently easy to implement efficiently that it can be included. It is also optional, so a server that cannot implement it efficiently can ignore it.


Simon



Arnt Gulbrandsen | 2 Jun 10:12 2005

Re: IMAP4 List Command Extensions


Let me step back. AFAICT, there are two approaches to this:

1. Client sends LIST. When an interesting LIST response arrives, it 
sends a STATUS command. As the STATUS commands' responses arrive, the 
client has the information it wants.

2. Client sends LIST with some extension. For each mailbox matching the 
supplied pattern, the server does the same work that STATUS would do, 
and sends the result in a LIST response.

Do I understand it correctly?

Alternative 1 has a cost of two roundtrips, plus traffic costs for the 
LIST responses and for the responses to STATUS, plus CPU costs for 
LIST, STATUS and quite a bit of parsing.

Alternative 2 has a cost of one roundtrip, plus traffic for the LIST 
responses, plus CPU cost for LIST including STATUS.

If we assume that there are 100 LIST responses and 10 interesting 
mailboxes, then alternative 1 costs one roundtrip more, and alternative 
2 has 90 unnecessary STATUS processings and 90 unnecessary transmitted 
UIDNEXT/UIDVALIDITY pairs. (And of course, a client that implements 2 
has to implement 1 as well, or else it won't work with plain servers.)
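
The arithmetic above can be written out as a small model. This is only a sketch of the comparison as stated in this message: the `Cost` fields and function names are invented for the illustration, the STATUS commands in alternative 1 are assumed to be pipelined into one batch, and traffic and parsing costs are left out.

```python
from dataclasses import dataclass


@dataclass
class Cost:
    roundtrips: int
    status_computations: int  # times the server does STATUS-equivalent work
    wasted_computations: int  # of those, how many the client did not need


def list_then_status(total, interesting):
    """Alternative 1: LIST, then a pipelined batch of STATUS commands."""
    # With nothing interesting there is no STATUS batch, so only one round trip.
    roundtrips = 1 if interesting == 0 else 2
    return Cost(roundtrips, status_computations=interesting, wasted_computations=0)


def extended_list(total, interesting):
    """Alternative 2: one extended LIST that does the STATUS work per mailbox."""
    return Cost(1, status_computations=total,
                wasted_computations=total - interesting)
```

With `total=100` and `interesting=10`, the model gives one extra round trip for alternative 1 and 90 wasted computations for alternative 2, matching the figures in the paragraph above.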

Hm. I suppose I must misunderstand terribly, because I cannot see any 
appeal in 2 at all.

Arnt

Arnt Gulbrandsen | 2 Jun 10:16 2005

Re: IMAP4 List Command Extensions


Simon Tyler writes:
> I would like to see quality in both realms. I don't see that my 
> UIDNEXT/UIDVALIDITY detracts from protocol quality but actually adds 
> to it. I think it is sufficiently easy to implement efficiently that 
> it can be included. It is also optional, so a server that cannot 
> implement it efficiently can ignore it.

It's easy to implement efficiently for an IMAP server that has exclusive 
control over its storage. It's difficult to implement efficiently for 
other IMAP servers (uw-imapd, dovecot, binc, courier et al.) and not 
too easy for clients, either, because they have to implement a fallback 
in addition, and test both.

Arnt

Timo Sirainen | 2 Jun 10:46 2005

Re: IMAP4 List Command Extensions

On Thu, 2005-06-02 at 10:16 +0200, Arnt Gulbrandsen wrote:
> Simon Tyler writes:
> > I would like to see quality in both realms. I don't see that my 
> > UIDNEXT/UIDVALIDITY detracts from protocol quality but actually adds 
> > to it. I think it is sufficiently easy to implement efficiently that 
> > it can be included. It is also optional, so a server that cannot 
> > implement it efficiently can ignore it.
> 
> It's easy to implement efficiently for an IMAP server that has exclusive 
> control over its storage. It's difficult to implement efficiently for 
> other IMAP servers (uw-imapd, dovecot, binc, courier et al.)

I'd just like to point out that while it may be difficult, it's not
impossible. Dovecot has taken ages to get into v1.0 and it's still not
there, partially because it has attempted to implement support for
legacy stores in a very efficient way.

I wouldn't mind implementing combined LIST+STATUS reply. The client I
use in fact already does STATUS for all mailboxes and I like it that
way, it's all over in less than a second (~300MB in 30 mboxes, with
non-Dovecot aware procmail appending mails).

Anyway, I agree that doing a LIST and sending STATUS to non-\Unmarked
mailboxes is better. That could however be implemented in the
LIST+STATUS extension if needed. It wouldn't save more than one
roundtrip and a few bytes of traffic though.
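
On the client side, the LIST-then-STATUS approach might look roughly like the sketch below, using Python's standard imaplib. The helper names `mailboxes_worth_status` and `poll` are hypothetical, the parsing handles only the simple quoted or atom form of mailbox names, and skipping \Noselect mailboxes is an addition of this sketch (STATUS cannot usefully be issued against them). imaplib is synchronous, so this version pays one round trip per STATUS instead of pipelining them.

```python
import imaplib
import re

# Matches an untagged LIST payload as imaplib returns it, e.g.
#   b'(\\HasNoChildren \\Marked) "/" "lists/imap"'
LIST_RE = re.compile(
    rb'\((?P<flags>[^)]*)\)\s+(?P<delim>"[^"]*"|NIL)\s+(?P<name>.+)')


def mailboxes_worth_status(list_lines):
    """Return mailbox names (still IMAP-quoted) lacking \\Unmarked and \\Noselect."""
    wanted = []
    for line in list_lines:
        if not isinstance(line, bytes):
            continue  # literal-encoded names (and None) are ignored in this sketch
        m = LIST_RE.match(line)
        if not m:
            continue
        flags = {f.lower() for f in m.group("flags").split()}
        if b"\\unmarked" in flags or b"\\noselect" in flags:
            continue
        wanted.append(m.group("name").decode("ascii", "replace"))
    return wanted


def poll(conn: imaplib.IMAP4):
    """LIST everything, then STATUS only the mailboxes that may have changed."""
    typ, lines = conn.list()
    results = {}
    for name in mailboxes_worth_status(lines):
        typ, data = conn.status(name, "(UIDNEXT UIDVALIDITY)")
        results[name] = data[0]
    return results
```

`poll` expects an already authenticated connection; it is not exercised here because it needs a live server.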

Better yet would be instant notification of mailboxes which have
changed, and client could then do STATUS for them if it wanted. Mark
said a year ago he was working on something for this, but I don't
remember anything coming out of it?

Mark Crispin | 2 Jun 14:41 2005

Re: IMAP4 List Command Extensions


On Thu, 2 Jun 2005, Timo Sirainen wrote:
> I wouldn't mind implementing combined LIST+STATUS reply. The client I
> use in fact already does STATUS for all mailboxes and I like it that
> way, it's all over in less than a second (~300MB in 30 mboxes, with
> non-Dovecot aware procmail appending mails).

30 is not very many mailboxes.

Consider what happens when there are 300 mailboxes, or 3000 mailboxes, or 
30,000 mailboxes.

There are users who have that many!  And we're not talking about tiny 10MB 
mailboxes either.

Plenty of seemingly-good ideas fail when they are scaled upwards a few 
orders of magnitude more than originally considered.
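
To put rough numbers on this, the sketch below extrapolates linearly from the figure quoted above (about one second for 30 mailboxes, taken here as an upper bound). Linear scaling and the per-mailbox constant are assumptions made purely for illustration; real costs depend on mailbox size, format, and caching.

```python
def projected_seconds(mailboxes, seconds_per_mailbox):
    """Linear projection: total time if every mailbox costs the same."""
    return mailboxes * seconds_per_mailbox


def projection_table(counts, seconds_per_mailbox):
    """Return (mailbox_count, projected_seconds) pairs for each count."""
    return [(n, projected_seconds(n, seconds_per_mailbox)) for n in counts]


# Assumption: about one second per 30 mailboxes, from the figure quoted above.
SECONDS_PER_MAILBOX = 1.0 / 30

for count, secs in projection_table((30, 300, 3000, 30000), SECONDS_PER_MAILBOX):
    print(f"{count:>6} mailboxes: about {secs:,.0f} s ({secs / 60:.1f} min)")
```

Under these assumptions, 30,000 mailboxes would take on the order of a thousand seconds per poll, which is well past what an interactive client can hide.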

I can guarantee that those orders of magnitude higher scaling *will* 
occur.  It's a miracle that IMAP, a protocol designed at a time when total 
mailbox size was limited to 1MB, has managed to survive nearly 20 years. 
IMHO, a large reason for IMAP's durability was because IMAP was designed 
to be scalable.

> Anyway, I agree that doing a LIST and sending STATUS to non-\Unmarked
> mailboxes is better.

Indeed.

-- Mark --


Mark Crispin | 2 Jun 15:14 2005

Re: IMAP4 List Command Extensions


On Thu, 2 Jun 2005, Simon Tyler wrote:
> Whether or not a server stores UIDNEXT or calculates it is a server 
> implementation decision. It is irrelevant when designing a protocol.

This is completely wrong.

For better or worse, Internet protocols are molded to accommodate existing 
implementations.  You will receive strong pushback on any proposed 
protocol or protocol enhancement whose impact is perceived as harmful to 
one or more implementations.  Complaints that "it's a poor specification" 
if your demands are not fulfilled will fall on deaf ears.

In fact, what you want is to have IMAP molded to accommodate *your* client 
implementation.  Rather than fix your client so it doesn't do something as 
ill-conceived as a STATUS on every LISTed mailbox, you want to change the 
protocol so the server does the stupid thing for you automatically, all to 
save your client some RTTs.  It doesn't address the underlying problem.

The entire point of the Internet protocol architecture is a set of 
interoperable protocols which can be implemented across a wide variety of 
systems, including systems with legacy infrastructure.

The design of protocols for a perfect world, in which you never need to 
concern yourself with the other guy's problems, is not the realm of the 
Internet.  It is the realm of proprietary protocols and systems.

In the Internet world, implementors MUST consider the impact of what their 
application does; they can not pretend that they are in a vacuum where 
only they count.  There will be strong pushback against any proposal that 
is perceived as allowing one vendor to count coup against another.

In the proprietary world, you don't have to worry about such things.  You 
control both the client and the server, and any other implementor with the 
temerity to implement your protocol must follow your rules.  At any time, 
you can change the protocol to hurt the other implementor, and you're 
motivated to do so because they are your competitor.

In the Internet world, you have to limit the protocol to what all 
implementations can do well, so that everybody can play ball on a level 
playing field.

In the proprietary world, you can fine-tune the protocol to optimize your 
implementation.  It's your ball and your field.

In the Internet world, interoperability is king, and everything else must 
submit to the demands of that king.

In the proprietary world, the owner is king, and that king receives 
similar submission.

You seem to want the best of both worlds, while paying the price of 
neither.  It doesn't work that way.

-- Mark --


