Discussion:
Replication problem
Ve (HOME)
2014-08-19 08:34:58 UTC
Hi,

I'm using mdbox with replication. Because of a configuration error,
synchronization could not run last week. The error has since been
corrected, but synchronization of some mailboxes always fails with an
"I/O stalled" timeout after 600 seconds.

The link between the two servers is quite slow, and multiple syncs run
in parallel, which congests the link.

I can't replicate with rsync, because the mdbox storage has been changed
on both servers to get back to a working state.

What do you think could be done to resynchronize the two Dovecot
servers? Another question: what exactly is this timeout? Is it a
communication timeout, i.e. no data received for 600 seconds (which
looks unlikely to me), 600 seconds for the full sync, or 600 seconds
for syncing one mail?

Thanks for any help.

Regards,

Vincent
Vincent ETIENNE
2014-09-09 23:04:25 UTC
Hi

After some digging, the problem is this 600-second timeout, which in my
case is insufficient to transfer one big mail. So the sync retries with
the same result, again and again.
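
As a rough illustration of why a fixed 600-second limit can make one large mail fail on every retry, here is a small sketch. The throughput and message sizes are hypothetical numbers chosen for the example, not measurements from this setup, and the function names are made up for illustration.

```python
# Hypothetical figures only; nothing here was measured on the servers
# discussed in this thread.
TIMEOUT_SECS = 600  # the fixed limit reported in this thread


def transfer_secs(size_bytes: int, throughput_bytes_per_sec: float) -> float:
    """Time needed to send size_bytes at a steady throughput."""
    if throughput_bytes_per_sec <= 0:
        raise ValueError("throughput must be positive")
    return size_bytes / throughput_bytes_per_sec


def fits_in_timeout(size_bytes: int, throughput_bytes_per_sec: float,
                    timeout_secs: float = TIMEOUT_SECS) -> bool:
    """True if the whole transfer completes before the fixed limit."""
    return transfer_secs(size_bytes, throughput_bytes_per_sec) <= timeout_secs


def max_size_bytes(throughput_bytes_per_sec: float,
                   timeout_secs: float = TIMEOUT_SECS) -> int:
    """Largest message that can complete within the fixed limit."""
    return int(throughput_bytes_per_sec * timeout_secs)
```

With an assumed effective throughput of 25,000 bytes per second, a 20,000,000-byte mail would need 800 seconds, so every attempt would hit the 600-second limit no matter how often it is retried.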

I have verified with strace that data is exchanged continuously between
the two hosts during the sync, but the upload of the file cannot
complete within that time.

Is there a way to configure this timeout?

Or possibly a manual sync with a larger timeout, to restore replication
before I limit the maximum message size in Postfix?

A possible feature would be a shorter timeout applied to the
transmission itself (i.e. nothing received for 30 seconds = timeout),
or a timeout computed from the size (e.g. 300 seconds per 10 MB).
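
The two ideas above can be sketched as follows. This is only a minimal illustration of the proposal, not how Dovecot implements its timeout; the class and function names are hypothetical, and "10 MB" is assumed to mean 10,000,000 bytes.

```python
import time


class IdleTimeout:
    """Expires only when no data has been seen for idle_secs.

    Hypothetical helper: the timer is reset on every chunk received,
    so a slow but active transfer never expires.
    """

    def __init__(self, idle_secs: float, clock=time.monotonic):
        self.idle_secs = idle_secs
        self._clock = clock
        self._last_activity = clock()

    def data_received(self) -> None:
        # Call this whenever a chunk arrives to reset the idle timer.
        self._last_activity = self._clock()

    def expired(self) -> bool:
        return self._clock() - self._last_activity > self.idle_secs


def size_based_timeout(size_bytes: int, secs_per_10_mb: float = 300.0,
                       minimum_secs: float = 30.0) -> float:
    """Timeout that scales with message size, with a floor for small mails."""
    ten_mb = 10 * 1000 * 1000  # assumption: decimal megabytes
    return max(minimum_secs, secs_per_10_mb * size_bytes / ten_mb)
```

With the idle variant, a large mail on a slow link keeps resetting the timer as long as chunks arrive, while a dead link is still detected after 30 seconds. With the size-based variant, a 20,000,000-byte mail would be allowed 600 seconds and a 40,000,000-byte mail 1200 seconds.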

Any help appreciated

Vincent
Teemu Huovila
2014-09-10 09:56:49 UTC
Currently there is no way to change it at run time. As a quick fix, if
you compile your own Dovecot, you could try modifying
DSYNC_IBC_STREAM_TIMEOUT_MSECS in src/doveadm/dsync/dsync-ibc-stream.c.
I think that is the timeout you are bumping up against.

br,
Teemu Huovila
Vincent ETIENNE
2014-09-10 10:49:27 UTC
Thanks, I will try it and keep you informed of the result. It may take
some time (I am not compiling Dovecot myself at the moment).
Many thanks, because for now my replication is broken, so some users do
not receive their mail, depending on which Dovecot instance they
connect to.


Vincent ETIENNE
Teemu Huovila
2014-09-10 11:02:42 UTC
On 09/10/2014 01:49 PM, Vincent ETIENNE wrote:
> On 10/09/2014 11:56, Teemu Huovila wrote:
>> Currently there is no way to change it at run time. As a quick fix, if
>> you compile your own Dovecot, you could try modifying
>> DSYNC_IBC_STREAM_TIMEOUT_MSECS in src/doveadm/dsync/dsync-ibc-stream.c.
>> I think that is the timeout you are bumping up against.
>
> Thanks, I will try it and keep you informed of the result. It may take
> some time (I am not compiling Dovecot myself at the moment).
> Many thanks, because for now my replication is broken, so some users do
> not receive their mail, depending on which Dovecot instance they
> connect to.
Cancel that advice. Timo made a change that should make changing the
timeout by hand unnecessary. If you can, try running Dovecot with the
patch at http://hg.dovecot.org/dovecot-2.2/rev/647162da8423 applied.
There should be no timeouts, even for large mails.

Do you get any error messages when there is a timeout?

br,
Teemu Huovila
Ve (HOME)
2014-09-10 13:28:01 UTC


I have tested with Timo's patch (applied to version 2.2.13) and
successfully synchronized a mail twice the size of the one that caused
trouble before. So the change looks correct.
However, I have not tested whether the timeout still occurs when the
link is down or broken.

Thanks a lot for the quick response. Very helpful.

Vincent ETIENNE