
Re: Bugs with mixed Zimbra 7/Zimbra 8 cluster



Rich,
    Happy hunting and good luck.    

1) When I performed my upgrade, I created new boxes running Zimbra 7, migrated all the users to them, and then upgraded the new machines to 8.0.4. The first thing I saw was higher CPU consumption and a lot of instability on my new machines, so what I ended up doing was disabling NIO for both IMAP and POP. (See bug https://bugzilla.zimbra.com/show_bug.cgi?id=80878 and http://wiki.zimbra.com/wiki/IMAP_NIO .) I was also getting messages like this:


2013-09-09 08:43:04,641 ERROR [qtp1335237548-271491:http://127.0.0.1:80/service/soap/GetInfoRequest] [name=kpe1@psu.edu;mid=1274;oip=24.61.185.142;ua=zclient/8.0.4_GA_5739;] mailbox - Failed to lock mailbox
Lock Owner - qtp1335237548-271788:http://127.0.0.1:80/service/soap/GetInfoRequest prio=5 id=266794 state=BLOCKED
at com.zimbra.cs.account.cache.DomainCache.getByName(DomainCache.java:320)
at com.zimbra.cs.account.ldap.LdapProvisioning.getDomainByAsciiNameInternal(LdapProvisioning.java:2633)
at com.zimbra.cs.account.ldap.LdapProvisioning.getDomainByNameInternal(LdapProvisioning.java:2622)
at com.zimbra.cs.account.ldap.LdapProvisioning.getDomain(LdapProvisioning.java:2567)
at com.zimbra.cs.account.Provisioning.getDomain(Provisioning.java:390)
at com.zimbra.cs.account.AccessManager.checkDomainStatus(AccessManager.java:206)
...
...


That bug is now fixed, but if you run into instability with high CPU utilization, the first thing I would do is disable both:

nio_imap_enabled = false

nio_pop3_enabled = false
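
As a sketch, those two keys are localconfig settings, so applying them would look roughly like this (assuming a standard mailbox server; run as the zimbra user and restart mailboxd afterward):

```shell
# Disable NIO for IMAP and POP3 via localconfig, as the zimbra user
su - zimbra
zmlocalconfig -e nio_imap_enabled=false
zmlocalconfig -e nio_pop3_enabled=false

# Read the keys back to verify the change
zmlocalconfig nio_imap_enabled nio_pop3_enabled

# Restart the mailbox service so the new values take effect
zmmailboxdctl restart
```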


2) The upgrade to 8 also comes with some galsync issues. You can create a galsync account on each machine that syncs for a domain. That part was a little tricky, especially because it takes a while for the new galsync accounts to show up properly in the interface and on the machines, probably due to a split-LDAP and/or multi-node environment. Watch for that, and also remember to sync your galsync accounts.

This mattered because a lot of my customers were mad that their contacts were not automatically showing up when typing in the To: field.
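
A rough sketch of creating and then force-syncing a galsync account from the command line, using zmgsautil (the account, data source, domain, and folder names below are placeholders, not from the original post):

```shell
su - zimbra
# Create a galsync account for the domain (names here are hypothetical)
zmgsautil createAccount -a galsync@example.edu -n InternalGAL \
    --domain example.edu -t zimbra -f _InternalGAL

# Force an immediate sync of that GAL data source
zmgsautil forceSync -a galsync@example.edu -n InternalGAL
```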


3) My customers also ran into a lot of 502 errors from the web client after the upgrade. (See bug https://bugzilla.zimbra.com/show_bug.cgi?id=79707 , which is likewise fixed.) Here is the section of my upgrade notes that explains it.



It may be that connections are being dropped by the mailbox server due to the DoS filtering mechanism. For 'zimbraHttpDosFilterDelayMillis', the recommended setting is '0 = No delay'.

https://bugzilla.zimbra.com/show_bug.cgi?id=79707
DoSFilter: Add ldap attributes for delayMs and maxRequestsPerSec

zimbraHttpDosFilterDelayMillis - Delay imposed on all requests over the rate limit, before they are considered at all. -1 = Reject request, 0 = No delay, any other value = Delay in ms


Commands to run:

Set 'zimbraHttpDosFilterDelayMillis' to '0' so that requests over the rate limit are neither delayed nor rejected on your server, using the commands below:


su - zimbra
zmprov mcf zimbraHttpDosFilterDelayMillis 0

Restart mailbox service

zmmailboxdctl restart
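
To confirm the new value took effect, you can read it back with gcf, the same way the defaults are checked further down (a quick verification step, not in the original notes):

```shell
su - zimbra
zmprov gcf zimbraHttpDosFilterDelayMillis
```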


The following came from Zimbra Support:


We've seen at customer sites a few relatively common problems that create timeouts and failures:

* Use of Firewall or NAT devices between proxies and mailstores that cause sessions to be automatically disconnected after some amount of time (e.g., 1m, 5m, 30m)
* Intermittent network problems
* Mailstore outages (e.g., server down, upgrades, hardware or virtualization problems, etc.)


Reconfiguring a few timeouts makes the proxies more resilient to these types of issues. These include the following:
# this will configure proxy to immediately reconnect on any failure
$ zmprov mcf zimbraMailProxyReconnectTimeout 0
$ zmprov ms `zmhostname` zimbraMailProxyReconnectTimeout 0 # if necessary, on each proxy


# this will configure proxy to ignore disconnect failures (never mark an upstream as failed)
$ zmprov mcf zimbraMailProxyMaxFails 0
$ zmprov ms `zmhostname` zimbraMailProxyMaxFails 0 # if necessary, on each proxy


Then, restart the processes on all proxies:
$ zmproxyctl restart


The key assumption here is that marking an upstream mailstore as "down" or "bad" is never a good idea.
If the mailstore is down, even temporarily, the proxies must try to reconnect as soon as possible, or as
soon as requested, in order to bring that user or mailstore back up as soon as it's available. For reference, these were the defaults:


$ zmprov gcf zimbraMailProxyReconnectTimeout
zimbraMailProxyReconnectTimeout: 60
$ zmprov gcf zimbraMailProxyMaxFails
zimbraMailProxyMaxFails: 1

Regards,
Pablo Garaitonandia
Penn State University
ITS, Administrative Information Services
(814) 865-6385
pablo@psu.edu


From: "Rich Graves" <rgraves@carleton.edu>
To: "zimbra-hied-admins" <zimbra-hied-admins@sfu.ca>
Sent: Friday, June 13, 2014 3:22:19 PM
Subject: Bugs with mixed Zimbra 7/Zimbra 8 cluster

We're looking at a possibly lengthy migration from Zimbra 7 to 8 (due to required platform and SAN switch).

What known (and unknown) bugs lurk in a mixed-version cluster?
-- 
Rich Graves <rgraves@carleton.edu>
Carleton.edu Sr UNIX and Security Admin