
Re: Sporadic 502 errors with Zimbra proxy

On 01/11/2014 01:14 AM, Steve Hillman wrote:
> Hi folks,
>   Friday afternoon/evening (and Halloween to boot) is probably not the
> best time to send this out, but I'm just wondering if anyone has seen
> any issues under Zimbra 8 with occasional short-lived 502 errors using a
> proxied setup. The errors last precisely 1 minute and only seem to
> affect connections to one of our 4 mailbox servers.
> At first I thought we might be hitting up against zimbraHttpNumThreads,
> which was set to 1000 on our servers. However, increasing that to 2000
> on the affected server had no effect.
> Then I thought for sure I'd nailed it -- the new DoSFilter in Jetty that
> was introduced in Zimbra 8 was logging errors about DoS attacks
> detected, with the target IP being our front-end proxy servers. So I
> changed zimbraHttpDosFilterMaxRequestsPerSec from 30 to 10000. I didn't
> see the errors get logged to zmmailboxd anymore, but I still get the
> same 502 errors. 
> The errors happen at random times during the day (but always during the
> day when the server is busiest), but they always last for precisely 1
> minute (determined by grepping for " 502 1310" in nginx.access.log,
> which is the error code and number of bytes that Nginx returns when this
> happens; the error codes show up in log lines for exactly a minute). The
> 1 minute thing seems really suspicious, and does suggest that a
> server-side back off is happening but on the other hand, during that 1
> minute, the affected server still seems to process requests, so it's not
> like *all* requests get blocked, but most of them do (such that if I
> keep selecting messages in my web client, they all fail with a 502 error).
> So before I open a ticket with Support, anyone seen this already and
> solved it?
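
To see how tightly the 502s cluster, the grep for " 502 1310" described
above can be extended to count hits per minute. A rough sketch in Python,
assuming the usual nginx access-log shape of a bracketed timestamp followed
by the quoted request, the status and the byte count; the function name and
the regular expression are illustrative and may need adjusting to your
log_format:

```python
import re
from collections import Counter

# Assumed layout: [dd/Mon/yyyy:HH:MM:SS zone] "request" status bytes
# Group 1 is the timestamp truncated to the minute.
ENTRY_RE = re.compile(
    r'\[(\d{2}/\w{3}/\d{4}:\d{2}:\d{2}):\d{2} [^\]]*\] "[^"]*" (\d{3}) (\d+)'
)

def count_502_per_minute(lines, size="1310"):
    """Count 502 responses of the given byte size, keyed by minute."""
    counts = Counter()
    for line in lines:
        match = ENTRY_RE.search(line)
        if match and match.group(2) == "502" and match.group(3) == size:
            counts[match.group(1)] += 1
    return counts
```

Feeding it the open nginx.access.log and printing the sorted items shows
whether each burst really fits inside a single minute.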

We had sporadic 502 errors with Zimbra 8.0.x, and it took us a long time
to pinpoint the cause.

We tried everything you mentioned, as well as the suggestions in
https://bugzilla.zimbra.com/show_bug.cgi?id=80135, yet the 502 errors
kept coming at random times.

In the end we spotted that whenever the errors occurred,
log/zmmailboxd.out on one particular mailboxd server reported very long
garbage collection pauses:

"Total time for which application threads were stopped: 106.9952890
seconds" (up to 190 seconds).

That's a long time for all threads to freeze. In our case, it was caused
by a third-party zimlet, ZxBackup from ZeXtras.

So check your log/zmmailboxd.out and your GC zm-stats for long GC runs.
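
A quick way to scan for such pauses, sketched in Python. It assumes the
log contains lines in exactly the form quoted above (which looks like the
output of HotSpot's -XX:+PrintGCApplicationStoppedTime); the function name
and the 5-second threshold are arbitrary choices for illustration:

```python
import re

# Matches the "threads were stopped" line quoted above; the value is in seconds.
STOPPED_RE = re.compile(
    r"Total time for which application threads were stopped: "
    r"(\d+(?:\.\d+)?) seconds"
)

def long_pauses(lines, threshold_seconds=5.0):
    """Yield (line_number, pause_seconds) for pauses at or above the threshold."""
    for number, line in enumerate(lines, start=1):
        match = STOPPED_RE.search(line)
        if match:
            seconds = float(match.group(1))
            if seconds >= threshold_seconds:
                yield number, seconds
```

Open log/zmmailboxd.out, pass the file object to long_pauses(), and
compare the surrounding timestamps with the minutes in which nginx
returned 502.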
