Discussion:
[Dovecot] Dovecot and NFS with file locking
Nate Sanders
2006-05-02 04:57:12 UTC
Greetings all,

I'm trying to get an understanding of a problem we are facing here.
We're currently running dovecot 1.0-beta3 and have a long-standing issue
of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).

Here's what is happening:

The machine hangs and the system load climbs as high as 80.0+. Yet system
responsiveness is not affected: the command line still responds
instantly. There are multiple running dovecot PIDs, even after I stop the
service. If I try to kill (or kill -9) the PIDs, they will not die. The
machine is DOA and must be forcefully restarted. Issuing a reboot will
cause the machine to hang when it attempts to unmount network shares.


Here's the setup:

- Dovecot 1.0-beta3
- lock_method = dotlock
- mmap_disable = yes

/var/mail is stored locally on the mail server (Mail) and accessed via NFS
by ALL remote machines. All remote machines have /var/mail symlinked to
the NFS share on Mail.

/home on Mail is NFS-mounted from another set of servers, where IMAP mail
folders reside in mbox format. All client machines have /home symlinked
to the second NFS server.

In other words, there are a lot of NFS shares, and one mail transaction
can involve 3 machines.
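With symlinks chained across three machines, it can help to confirm which filesystem each path actually lands on. The following is a minimal Python sketch, assuming a Linux client that exposes /proc/mounts: it follows the symlinks with os.path.realpath() and picks the longest matching mount point. The function names are hypothetical helpers, not part of Dovecot.

```python
import os

def parse_mounts(text):
    """Parse /proc/mounts-style text into (mount_point, fstype, options) tuples."""
    mounts = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        # Fields: device, mount point, fs type, comma-separated options, ...
        # /proc/mounts encodes a space inside a path as \040.
        mount_point = fields[1].replace("\\040", " ")
        mounts.append((mount_point, fields[2], fields[3].split(",")))
    return mounts

def mount_for(path, mounts):
    """Return the entry whose mount point is the longest prefix of path.

    path should already be absolute and symlink-free (use os.path.realpath).
    On a tie the later entry wins, since a later mount hides an earlier one.
    """
    best = None
    for mount_point, fstype, options in mounts:
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            if best is None or len(mount_point) >= len(best[0]):
                best = (mount_point, fstype, options)
    return best

def report(paths, mounts_file="/proc/mounts"):
    """Print where each path really lives after following symlinks."""
    with open(mounts_file) as f:
        mounts = parse_mounts(f.read())
    for p in paths:
        real = os.path.realpath(p)
        print("%s -> %s on %s" % (p, real, mount_for(real, mounts)))
```

Running `report(["/var/mail", "/home"])` on each client shows whether a mailbox path ends up on an `nfs` mount (so every lock on it crosses the network) and which mount options, such as hard, soft, or intr, are in effect there.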


What I'm trying to find out is the current state of NFS locking in
Dovecot. This system hang happens 1-3 times a week. The /home NFS
exports are currently served from SGI machines running IRIX 6.5. Clients
are all Linux: Debian (2.4 kernels) or Ubuntu (2.6 kernels).

Is our setup too much for Dovecot to handle? Are there other variables
we're not looking at here?

Thanks everyone.
--
==============================================
Nate Sanders ***@ima.umn.edu
Associate Systems Manager (612) 624 - 4353
http://www.ima.umn.edu/
==============================================
Institute for Mathematics and its Applications
University of Minnesota
400 Lind Hall, 207 Church St. SE
Minneapolis, MN 55455-0463
==============================================
Ben Winslow
2006-05-02 21:45:59 UTC
Post by Nate Sanders
Greetings all,
I'm trying to get an understanding of a problem we are facing here.
We're currently running dovecot 1.0-beta3 and have a long-standing issue
of system crashes on our mail server (Debian Linux 2.4.27-2-k7-smp).
The machine hangs and the system load climbs as high as 80.0+. Yet system
responsiveness is not affected: the command line still responds
instantly. There are multiple running dovecot PIDs, even after I stop the
service. If I try to kill (or kill -9) the PIDs, they will not die. The
machine is DOA and must be forcefully restarted. Issuing a reboot will
cause the machine to hang when it attempts to unmount network shares.
It sounds like NFS is dying one way or another -- likely due to a bug on
either the client side (you could try compiling a newer 2.4 or 2.6
kernel) or the server side (I know jack about NFS on IRIX.) If you look
at the tasks in ps or top, the 'state' column is probably 'D' indicating
an uninterruptible sleep (which usually means the process is hung
waiting for an IO request to complete.)
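The 'D' state check can also be scripted. On Linux, /proc/<pid>/stat holds the one-letter process state right after the parenthesised command name, and a process in 'D' ignores even SIGKILL until its pending I/O returns, which fits the kill -9 behaviour described above. This is a minimal Python sketch; the default list of process names (dovecot, imap, pop3) is an assumption about what to look for and should be adjusted.

```python
import os

def parse_stat(stat_line):
    """Return (pid, comm, state) from one /proc/<pid>/stat line.

    The command name is wrapped in parentheses and may itself contain
    spaces or ')', so split on the last ')' instead of on whitespace.
    """
    open_paren = stat_line.index("(")
    close_paren = stat_line.rindex(")")
    pid = int(stat_line[:open_paren])
    comm = stat_line[open_paren + 1:close_paren]
    state = stat_line[close_paren + 2]  # the letter right after ") "
    return pid, comm, state

def stuck_processes(names=("dovecot", "imap", "pop3"), proc="/proc"):
    """Return sorted (pid, comm) pairs in state 'D' whose name contains one of names."""
    stuck = []
    for entry in os.listdir(proc):
        if not entry.isdigit():
            continue  # skip /proc/self, /proc/meminfo, ...
        try:
            with open(os.path.join(proc, entry, "stat")) as f:
                pid, comm, state = parse_stat(f.read())
        except (OSError, ValueError):
            continue  # process exited mid-scan or the line was unexpected
        if state == "D" and any(n in comm for n in names):
            stuck.append((pid, comm))
    return sorted(stuck)
```

Calling `stuck_processes()` periodically (say, from cron) would record when processes start piling up in 'D', which can then be lined up against the kernel log.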

Are there any messages in the kernel log indicating NFS timeouts?
Specifying 'intr' in the nfs mount options might enable you to actually
kill the running dovecot processes, unmount, and remount, but that won't
solve your real problem.
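To check for NFS timeouts, it helps to tally kernel log messages per server. This minimal Python sketch assumes the Linux RPC client's wording, "<proto>: server <name> not responding, still trying" and "<proto>: server <name> OK", where <proto> is typically nfs or lockd; verify the exact text against your own logs before relying on it. The log path mentioned below is Debian's usual location and is also an assumption.

```python
import re
from collections import Counter

# Assumed message shapes, e.g.
#   "nfs: server sgi1 not responding, still trying"
#   "lockd: server sgi1 not responding, still trying"
#   "nfs: server sgi1 OK"
NOT_RESPONDING = re.compile(r"\b(nfs|lockd): server (\S+) not responding")
RECOVERED = re.compile(r"\b(nfs|lockd): server (\S+) OK\b")

def summarize_nfs_timeouts(lines):
    """Count 'not responding' and 'OK' messages per (protocol, server)."""
    timeouts = Counter()
    recoveries = Counter()
    for line in lines:
        m = NOT_RESPONDING.search(line)
        if m:
            timeouts[(m.group(1), m.group(2))] += 1
            continue
        m = RECOVERED.search(line)
        if m:
            recoveries[(m.group(1), m.group(2))] += 1
    return timeouts, recoveries
```

For example, `with open("/var/log/kern.log") as f: timeouts, recoveries = summarize_nfs_timeouts(f)`. A server with more timeouts than recoveries was probably still unreachable when the log ended, and lockd entries point at the NFS lock manager (fcntl() locking) rather than plain file I/O.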
--
Ben Winslow <***@bluecherry.net>

Nate Sanders
2006-05-02 22:16:08 UTC
Post by Ben Winslow
It sounds like NFS is dying one way or another -- likely due to a bug on
either the client side (you could try compiling a newer 2.4 or 2.6
kernel) or the server side (I know jack about NFS on IRIX.) If you look
at the tasks in ps or top, the 'state' column is probably 'D' indicating
an uninterruptible sleep (which usually means the process is hung
waiting for an IO request to complete.)
Are there any messages in the kernel log indicating NFS timeouts?
Specifying 'intr' in the nfs mount options might enable you to actually
kill the running dovecot processes, unmount, and remount, but that won't
solve your real problem.
Yes, the PIDs do end up hung in that state. Right now we're working to
migrate all users from these two IRIX machines to a Linux 2.4 NAS. From
there we will do additional testing before we try anything else. I
suspect much of the issue lies in the interaction between Linux NFS and
IRIX NFS.

We end up with quite a few NFS and lock messages in the logs. I'm sure
the setup is not ideal given the current maturity of NFS support in
Dovecot, but that's why I was trying to get a little additional info.
Post by Timo Sirainen
So if you're using mboxes, what about mbox_read_locks and
mbox_write_locks? Maybe it helps if you change them to be dotlocks also.
I will look into these as well, thanks.
Timo Sirainen
2006-05-02 21:52:28 UTC
Post by Nate Sanders
- Dovecot 1.0-beta3
- lock_method = dotlock
- mmap_disable = yes
So if you're using mboxes, what about mbox_read_locks and
mbox_write_locks? Maybe it helps if you change them to be dotlocks also.
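Why dotlocks might help: fcntl() locks on an NFS mount are handled by the separate NFS lock manager (lockd), while a dotlock only needs ordinary file operations on the server. The following is a minimal Python sketch of the generic link()-based lockfile recipe described in the Linux open(2) man page, which avoids relying on O_EXCL over NFS. It is an illustration only, not Dovecot's implementation: the "<mbox>.lock" name follows the usual mbox convention, and stale-lock detection, which a real mail program needs, is left out.

```python
import os
import socket
import time

def acquire_dotlock(mbox_path, timeout=30.0, poll=0.5):
    """Take <mbox_path>.lock using the link() recipe; return the lock path.

    Raises TimeoutError if the lock cannot be taken within timeout seconds.
    Stale-lock override is deliberately not handled in this sketch.
    """
    lock_path = mbox_path + ".lock"
    # Unique temporary name in the same directory (hence same filesystem).
    tmp_path = "%s.%s.%d.%d" % (lock_path, socket.gethostname(),
                                os.getpid(), time.time_ns())
    fd = os.open(tmp_path, os.O_WRONLY | os.O_CREAT | os.O_EXCL, 0o600)
    try:
        os.write(fd, b"%d\n" % os.getpid())
    finally:
        os.close(fd)

    deadline = time.monotonic() + timeout
    try:
        while True:
            try:
                os.link(tmp_path, lock_path)
                return lock_path
            except OSError:
                # Over NFS a lost reply can make a successful link() look like
                # a failure, so also check whether our file gained a second link.
                if os.stat(tmp_path).st_nlink == 2:
                    return lock_path
            if time.monotonic() >= deadline:
                raise TimeoutError("timed out waiting for " + lock_path)
            time.sleep(poll)
    finally:
        # On success the lock name keeps the file alive; drop the temp name.
        os.unlink(tmp_path)

def release_dotlock(lock_path):
    """Release a lock taken by acquire_dotlock()."""
    os.unlink(lock_path)
```

The link-count check is the part that matters on NFS: link() is atomic on the server, and a second link on the temporary file proves that this client created the lock even if the reply was lost.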
Nate Sanders
2006-05-03 02:52:02 UTC
Post by Timo Sirainen
Post by Nate Sanders
- Dovecot 1.0-beta3
- lock_method = dotlock
- mmap_disable = yes
So if you're using mboxes, what about mbox_read_locks and
mbox_write_locks? Maybe it helps if you change them to be dotlocks also.
Here is some more info on lock methods used on the system.

mail:# postconf -d|grep mailbox_delivery_lock
mailbox_delivery_lock = fcntl, dotlock

mail:# grep mbox_read_locks /etc/dovecot/dovecot.conf
mbox_read_locks = fcntl

mail:# grep mbox_write_locks /etc/dovecot/dovecot.conf
mbox_write_locks = fcntl dotlock
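With mbox, a lock only excludes programs that take the same kind of lock, so the MDA and Dovecot need at least one method in common. The following minimal Python sketch parses "name = value" lines like the ones above and reports the methods shared by every setting, plus which settings rely on fcntl (served by lockd when the mbox is on NFS). The helper names are hypothetical. Note also that `postconf -d` prints Postfix's built-in default rather than the value in effect; `postconf mailbox_delivery_lock` without `-d` shows the configured one.

```python
import re

def parse_lock_setting(line):
    """Split 'name = a, b' or 'name = a b' into (name, [a, b])."""
    name, _, value = line.partition("=")
    methods = [m for m in re.split(r"[\s,]+", value.strip()) if m]
    return name.strip(), methods

def shared_methods(settings):
    """Lock methods used by every setting in {name: [methods]}."""
    if not settings:
        return set()
    return set.intersection(*(set(m) for m in settings.values()))

def fcntl_users(settings):
    """Names of settings that include fcntl locking."""
    return [name for name, methods in settings.items() if "fcntl" in methods]
```

With the three lines shown above, the only shared method is fcntl. If mbox_read_locks and mbox_write_locks are both changed to dotlock, the shared method becomes dotlock, and only mailbox_delivery_lock still uses fcntl.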