Discussion:
LDA randomly failing to write email to disk
Andy Dills
2014-09-24 21:11:28 UTC
Permalink
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment
(maildir on netapp, dual dovecot instances where each server is both a
proxy and a backend).

Every now and then (once a month per user, maybe?), users will see a blank
email in their inbox. Investigating further, and we will see that the only
information recorded in the maildir file for the message is the
Return-Path, the Delivered-To, and the first Received line (the one
generated by the local LDA via LMTP).

For example, here is what I found in one such email today:

Return-Path: <xxx at xecu.net>
Delivered-To: <yyy at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
by mail2.xecu.net (Dovecot) with LMTP idX86eBjgaI1RdyQAA3SxDBg
for <yyy at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400

Everything past that is lost, as if the either the LDA on mail2 exited
early or was never sent the information from the dovecot instance on
mail5.

Here is a little more detail from the header of the same email, different
recipient (all other recipients received the email properly, only one
failed to receive properly):

Return-Path: <xxx at xecu.net>
Delivered-To: <yyz at xecu.net>
Received: from mail5.xecu.net ([10.0.1.85])
by mail1.xecu.net (Dovecot) with LMTP id KPh8ItMXI1StBAAA/c3zFg
for <yyz at xecu.net>; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net
by mail5.xecu.net (Dovecot) with LMTP id QsUBFoQZI1RjhgAAXyr1JQ
; Wed, 24 Sep 2014 15:24:20 -0400
Received: from mail5.xecu.net (localhost [127.0.0.1])
by mail5.xecu.net (Postfix) with ESMTP id 30AAE323BB18;
Wed, 24 Sep 2014 15:24:20 -0400 (EDT)
...


This is what I see in the logs of mail2, which is where the failure
happened:

Sep 24 15:24:20 mail2 dovecot: lmtp(51549, yyy at xecu.net):
X86eBjgaI1RdyQAA3SxDBg: sieve: msgid=unspecified: stored mail into mailbox 'INBOX'

When I look at the logs of mail1, which is where the successful delivery
happened:

Sep 24 15:24:20 mail1 dovecot: lmtp(1197, yyz at xecu.net):
KPh8ItMXI1StBAAA/c3zFg: sieve: msgid=<20140924192412.1435.qmail at xxx.xecu.net>: stored mail into
mailbox 'INBOX'

Nothing of note seems to be logged on mail5 (only a message from postfix,
nothing from the proxy side of the lmtp).

I do notice, when I check for the PID of 51549 in the logs, all of its
other transactions seem to register with proper msgids and were delivered
fine.

Also, I notice plenty of other messages that have the msgid=unspecified
error, but which were delivered with no problems and not truncated, so I'm
suspecting what may be happening is that somehow the backend instance is
not receiving the actual data portion, and only getting the envelope from
proxy instance.

How do I approach debugging this? It's very infrequent, but yet quite
annoying. Seems to have started since we upgraded to 2.2.13 (from an older
2.1 build) earlier this year.

Thanks,
Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---
Timo Sirainen
2014-10-03 14:14:59 UTC
Permalink
Post by Andy Dills
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment
(maildir on netapp, dual dovecot instances where each server is both a
proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a blank
email in their inbox. Investigating further, and we will see that the only
information recorded in the maildir file for the message is the
Return-Path, the Delivered-To, and the first Received line (the one
generated by the local LDA via LMTP).
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
Larry Rosenman
2014-10-03 14:20:29 UTC
Permalink
which HG Rev fixes this?
Post by Timo Sirainen
Post by Andy Dills
We're using 2.2.13 with pigeonhole 0.4.3, in a clustered environment
(maildir on netapp, dual dovecot instances where each server is both a
proxy and a backend).
Every now and then (once a month per user, maybe?), users will see a
blank
Post by Andy Dills
email in their inbox. Investigating further, and we will see that the
only
Post by Andy Dills
information recorded in the maildir file for the message is the
Return-Path, the Delivered-To, and the first Received line (the one
generated by the local LDA via LMTP).
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
--
Larry Rosenman http://www.lerctr.org/~ler
Phone: +1 214-642-9640 (c) E-Mail: larryrtx at gmail.com
US Mail: 108 Turvey Cove, Hutto, TX 78634-5688
Andy Dills
2014-10-03 14:43:24 UTC
Permalink
Post by Timo Sirainen
This is fixed in hg. I guess I'll just have to make v2.2.14 release soon.
Thanks Timo, I had given up hope, and was starting to question if maybe I
was having hardware issues.

Should I feel hesitant about rolling out a fresh build from hg into
production? Would I be better off waiting for an official 2.2.14?

Thanks,
Andy

---
Andy Dills
Xecunet, Inc.
www.xecu.net
301-682-9972
---

Loading...