Re: Ldap replica high cpu issue



Hi Tim,

Related to overall galsync performance:

In general, you probably want to leave zimbraGalSyncMaxConcurrentClients set to the default of 2. Perhaps experiment with going as high as 4, but going any higher is likely to significantly slow down each individual galsync, which defeats the purpose of adding more concurrent galsync clients. The number of concurrent clients is restricted only during the update of the GAL metadata (mostly just the Contact IDs, rather than the entire GAL data set).

General recommendations for improving galsync performance:
1. Definitely upgrade all ZCO 5 and 6 clients to the current version - this is very important, as ZCO 7 allows for trickle sync and trickle deletion (rather than reloading the entire GAL during updates).

2. Set zimbra_mailbox_galsync_cache to the size of the GAL, for example:

$ zmlocalconfig -e zimbra_mailbox_galsync_cache=100000

The number is derived from the number of items currently in the GAL sync account (set it slightly higher). We do not expect any significant negative impact on performance, due to the small amount of memory used.

3. If possible, put the galsync account on a mailstore by itself. This relieves the memory pressure caused by sharing a mailstore with other active accounts, and improves performance.
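
As a rough illustration of recommendation 2, here is a minimal Python sketch of how the cache value might be derived from the item count. The 10% headroom, the rounding to the next 1000, the function names, and the item count of 92000 are all assumptions chosen for the example; only the zmlocalconfig command form comes from the message above.

```python
def suggested_galsync_cache(item_count: int, headroom_percent: int = 10) -> int:
    """Pad the GAL item count by headroom_percent, then round up to the next 1000.

    Both the headroom and the rounding step are illustrative assumptions,
    not Zimbra recommendations.
    """
    if item_count < 0 or headroom_percent < 0:
        raise ValueError("item_count and headroom_percent must be non-negative")
    # Integer ceiling division avoids floating-point rounding surprises.
    padding = -(-item_count * headroom_percent // 100)
    padded = item_count + padding
    return -(-padded // 1000) * 1000


def localconfig_command(item_count: int) -> str:
    """Build the zmlocalconfig command line for the suggested cache size."""
    value = suggested_galsync_cache(item_count)
    return f"zmlocalconfig -e zimbra_mailbox_galsync_cache={value}"


if __name__ == "__main__":
    # Hypothetical count of items in the GAL sync account.
    print(localconfig_command(92000))
```

The script only prints the command; running it as the zimbra user is left as a manual step so the value can be reviewed first.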

Sincerely,
-thom


----- Original Message -----
From: "Tim Ross" <tross@calpoly.edu>
To: "zimbra-hied-admins" <zimbra-hied-admins@sfu.ca>
Sent: Tuesday, August 21, 2012 1:46:45 PM
Subject: Re: Ldap replica high cpu issue


I wanted to share this response with the list. I received two responses (which were basically the same) sent directly to me rather than the list, but I found them very helpful, so I wanted to make sure the info got into the list archives: 

Tim, 

Check for ZCO 5 and older 6 users. In Zimbra 7 you can move them to a COS that disables the ZCO. I think ZCO 5 was especially bad at hammering the LDAP replica server. That cleared up most of our issues with high load on the LDAP. 

To further improve autocomplete times, we have also moved the galsync account and logger host to their own dedicated mailstore with no other accounts on it. We did this for deployments with 50k or more users. 

I'd bet that once you get ZCO 5 users to stop hammering you, you will see significantly better performance. 

Also, here's a snippet from our support case with Zimbra on this a few months back 

set_cachesize value for the db. 

set_cachesize 0 424370176 0 

Can we get it turned up to this? 

set_cachesize 0 536870912 1 
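
For readers comparing the two values: assuming these are Berkeley DB DB_CONFIG directives (the field layout suggests so), the three numbers are gigabytes, additional bytes, and the number of cache regions. The short Python sketch below converts each line to a total size; the class and function names are made up for illustration.

```python
from typing import NamedTuple

GIB = 1024 ** 3
MIB = 1024 ** 2


class CacheSize(NamedTuple):
    gbytes: int       # whole gigabytes of cache
    extra_bytes: int  # additional bytes on top of gbytes
    ncache: int       # number of cache regions (0 and 1 both mean one region)

    @property
    def total_bytes(self) -> int:
        return self.gbytes * GIB + self.extra_bytes


def parse_set_cachesize(line: str) -> CacheSize:
    """Parse a 'set_cachesize <gbytes> <bytes> <ncache>' line."""
    parts = line.split()
    if len(parts) != 4 or parts[0] != "set_cachesize":
        raise ValueError(f"not a set_cachesize line: {line!r}")
    gbytes, extra_bytes, ncache = (int(p) for p in parts[1:])
    return CacheSize(gbytes, extra_bytes, ncache)


current = parse_set_cachesize("set_cachesize 0 424370176 0")
proposed = parse_set_cachesize("set_cachesize 0 536870912 1")

if __name__ == "__main__":
    print(f"current:  {current.total_bytes / MIB:.1f} MiB")
    print(f"proposed: {proposed.total_bytes / MIB:.1f} MiB")
```

Under that reading, the proposed value is exactly 512 MiB, up from roughly 405 MiB.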

---------- 
And this: 

I just discussed the issue with a senior QA engineer. Apparently, with a large GAL we are very likely to see non-stop GAL sync requests, which lead to very high CPU utilization by slapd. You may also want to set zimbraGalSyncMaxConcurrentClients. 

zimbraGalSyncMaxConcurrentClients 
Maximum number of concurrent GAL sync requests allowed on the system / domain. 

type : integer 
value : 
callback : 
immutable : false 
cardinality : single 
requiredIn : 
optionalIn : domain,globalConfig 
flags : domainAdminModifiable,domainInherited 
defaults : 2 
min : 
max : 
id : 1154 
requiresRestart : 
since : 7.0.0 
deprecatedSince : 

<Respondee's name withheld in case they prefer to keep a low profile> 


----- Original Message -----

From: "Tim Ross" <tross@calpoly.edu> 
To: "zimbra-hied-admins" <zimbra-hied-admins@sfu.ca> 
Sent: Tuesday, August 21, 2012 9:55:25 AM 
Subject: Ldap replica high cpu issue 


I have been working with Zimbra Support on this issue, but haven't dug up the root cause yet. I wanted to throw this out to the list to see if any of you may have experienced this or have some suggestions on what we might look at / settings to tweak. 

We just upgraded from ZCS 6.0.14 NE to 7.2.0. The Monday after the upgrade, users started reporting "Server Slow to Respond" warnings (a higher percentage of these came from IE users). After a little looking around, we found that our LDAP replica server's CPU load had increased dramatically post-upgrade. Pre-upgrade we ran less than 1 (as reported by "top"). Post-upgrade, we would run 5-7 with spikes up to 10 and 11. We created a GAL sync account on the mailstores, and that helped a little. We tweaked a couple of other settings, and those also helped slightly. We now run 3-4, with spikes up to 7-9 once or twice a day during heavy usage times. We have 8 CPUs and 16 GB of RAM on the LDAP replica server, so even with these loads, we wouldn't really expect users to be receiving slow server notices. We are running Red Hat 5, 64-bit. We have approximately 30,000 accounts, but closer to 2,000 moderate to heavy users. All our other servers show very low loads, including the master LDAP server. The slapd process threads are running at 200-400% CPU (again in top, in the CPU% column) most of the time. 

We have gone through the Zimbra Large Deployment Performance Guide and made sure we followed the settings advice there as best we could. We installed Patch 1 for ZCS 7.2.0 and that didn't resolve this issue. 

Any ideas or suggestions? Any info I left out that would be useful? 

Thanks, 


Tim Ross 
Application Administrator 
Enterprise Applications Group 
Cal Poly State University, San Luis Obispo