Thursday May 15th, 2008 :: Spring Quarter, Week 7
NACS Mailbox Services (NMS) slowness problems during Spring quarter 2007 were solved by upgrading the Network Appliance file server that provides incoming mail storage. Earlier in the year we had also upgraded processors and memory, and during the summer we added disk storage. We expected a higher load in the fall, but felt we had sufficient capacity to handle it.
However, not only did the user community grow by about 6%, but usage per individual increased as well (by 60%). In addition, due to a recent upgrade, the POP server no longer “cached” mailbox information in memory, which resulted in excessive file accesses when POP users checked for new mail frequently. We also discovered that the new operating systems on half of our Webmail servers had a flaw that prevented them from using all of their installed physical memory. This left insufficient space to “cache” files in memory, resulting in excessive loads on the Webmail and file servers. Finally, disk quotas for home directories were increased over the summer as well, resulting in more disk reads and writes overall.
As a result of these factors, NMS users experienced significant system slowness while accessing their email during the first few days of the Fall quarter.
We have now implemented “rate limiting” to prevent POP clients from checking for new mail too rapidly, improving the availability of resources for other users. We have also corrected the operating system flaw mentioned above. Performance seems to be significantly improved since approximately Wednesday, October 10th.
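As a rough illustration of the idea, and not the actual NMS implementation, per-user rate limiting can be sketched as a table of the last accepted check time for each user. The class name `PopRateLimiter` and the 300-second default are assumptions made for this example; the limit actually enforced on the POP servers is not stated here.

```python
import time


class PopRateLimiter:
    """Allow each user at most one mail check per `min_interval` seconds.

    Hypothetical sketch: the name and the default interval are assumptions,
    not the configuration actually used on the NMS POP servers.
    """

    def __init__(self, min_interval=300.0, clock=time.monotonic):
        self.min_interval = min_interval
        self.clock = clock
        self._last_allowed = {}  # username -> time of last accepted check

    def allow(self, username):
        """Return True if this check may proceed, False if it is too soon."""
        now = self.clock()
        last = self._last_allowed.get(username)
        if last is not None and now - last < self.min_interval:
            return False  # too soon: refuse before touching the mailbox file
        self._last_allowed[username] = now
        return True
```

Under these assumptions, a client that polls every 60 seconds has four of every five checks refused before they reach the file server, which is where the savings for other users would come from.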
In addition to problems with slow performance, NACS Mailbox Services were largely unavailable between approximately 4 PM and 6:30 PM on October 10th. A problem of unknown origin caused the IMAP servers to develop large loads and become unresponsive. Resetting the IMAP servers did not correct the problem, which was eventually resolved by resetting additional NMS components. We are investigating the cause, but until it is found the problem may recur.
If you use POP, please be sure the interval at which you check for new email is set to 5 minutes or greater. You should also consider moving to IMAP, which has a more efficient way of checking for new mail and provides improved functionality for accessing mail folders from multiple computers.
We will be providing information in the future that will help you move from POP to IMAP.
If you use email software such as Thunderbird or Eudora, be sure you are running a recent, stable version. Often newer versions have important bug fixes that benefit reliability and performance, even if they do not provide any new functionality that you need.
If you are a Webmail user, you may want to consider using email software such as Thunderbird instead. You will likely find the Thunderbird user interface easier to use, and it uses email server resources more efficiently than Webmail. Typically Thunderbird users experience adequate to good response even when central servers are heavily loaded. For more information on installing Thunderbird, please see:
http://www.nacs.uci.edu/email/thunderbird.html
Other locally installed email software such as Eudora works more efficiently as well. In the long run, we will be looking for Webmail software that provides improved functionality in an efficient manner, as it is clear that many of our users prefer to access email through their Web browser.
NACS Webmail provides email to over 20,000 students, faculty and staff each day. On average during each of the first 3 days of week 1 in fall 2007, we had 22,769 different individuals using Webmail. That breaks down to 15,604 undergraduates, 5,783 staff, 925 graduate students, and 447 faculty. These individuals connected to the service an average of 251,638 times each day, which is about 11 connections for each person. Total daily users are up 6% from a similar period last year (undergraduates are up 5%, faculty are up 81%). Total connections are up by 69%, and connections per user are up 60%.
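The per-user and growth figures above can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers quoted in this section; the variable names are illustrative.

```python
# Figures quoted above for the first 3 days of week 1, fall 2007.
daily_users = 22_769
daily_connections = 251_638

connections_per_user = daily_connections / daily_users
print(f"connections per user: {connections_per_user:.1f}")

# Growth in total connections is the product of growth in users
# and growth in connections per user.
user_growth = 1.06       # daily users up 6%
per_user_growth = 1.60   # connections per user up 60%
total_growth = user_growth * per_user_growth
print(f"implied growth in total connections: {(total_growth - 1) * 100:.0f}%")
```

The first figure comes out at about 11 connections per user, matching the text. The second comes out near 70%, which agrees with the reported 69% once rounding in the 6% and 60% inputs is allowed for.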
In short, more students and other individuals are coming to UCI and using Webmail, more faculty, staff and graduate students are using Webmail instead of other options, and the usage per individual is increasing as well.
We have a sophisticated, high-capacity network of servers to handle the large email load: 10 Webmail, 3 IMAP, 3 POP, and 3 Mail Delivery servers. The servers depend on two high-speed Network Appliance file servers for storage (about 14 terabytes of disk space in total). Communication among servers and storage is provided through a series of four 1 gigabit/second switches on a private network. Last, but certainly not least, access to the NMS servers is controlled via a ServerIron application switch from Foundry Networks. The ServerIron is what selects one of the Webmail, IMAP, or POP servers for your use when you access email.
A diagram of NACS Mailbox Services is available here:
http://www.nacs.uci.edu/email/images/NACSMailboxServices.pdf
Over the last year and a half we have increased total NMS processing power by a factor of 3 or 4, and have doubled data read/write capacity. This handled the load well during the last part of Spring 2007, but did not provide a large amount of additional headroom. As a result, overall NMS performance is susceptible to changes in user or software behavior. One example is the flaw mentioned earlier: a bug in the operating systems on some of the Webmail servers kept them from using all of their physical memory. Less memory was therefore available for caching files, and overall storage read/write activity rose across NMS, affecting everyone’s performance.
We will be making significant changes to the NMS architecture to fully accommodate campus growth and to provide adequate headroom to absorb future perturbations in load characteristics. We also want to provide a single quota space for each user and be able to handle very large inboxes and mail folders with good performance.
At the heart of the problem is our current use of the historical Unix “mbox” mailbox format for storing inboxes and mail folders. In the mbox format, each mail folder is a single sequential file: each time a server changes a mail folder, it must read the file into memory, make the change, and write it all out again. Multiply the many message accesses each user makes at any given moment by the 10,000 or more users simultaneously accessing files, and you have a challenging capacity problem.
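To make the cost concrete, here is a deliberately naive model of that read-modify-write cycle. It follows the description above and is not the code our mail servers run; the function names and the toy message layout are assumptions made for the example.

```python
import os
import tempfile


def build_mbox(path, n_messages, body_bytes=2000):
    """Write a toy mbox file: all messages stored back to back in one file."""
    with open(path, "wb") as f:
        for i in range(n_messages):
            f.write(b"From sender@example.com Mon Oct  8 09:00:00 2007\n")
            f.write(b"Subject: message %d\n" % i)
            f.write(b"Status: O\n\n")              # "O": not yet read
            f.write(b"x" * body_bytes + b"\n\n")


def mark_read(path, index):
    """Mark one message as read; return (bytes_read, bytes_written)."""
    with open(path, "rb") as f:
        data = f.read()                            # the whole folder is read
    lines = data.split(b"\n")
    seen = -1
    for i, line in enumerate(lines):
        if line.startswith(b"From "):
            seen += 1                              # start of the next message
        elif seen == index and line == b"Status: O":
            lines[i] = b"Status: RO"               # the only real change
            break
    new_data = b"\n".join(lines)
    with open(path, "wb") as f:
        f.write(new_data)                          # the whole folder is written
    return len(data), len(new_data)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        path = os.path.join(d, "inbox")
        build_mbox(path, n_messages=1000)
        read, written = mark_read(path, index=500)
        print(f"one status flag changed, {read + written} bytes moved")
```

With 1,000 messages of about 2 KB each, flipping a single status flag moves roughly 4 MB between the mail server and the file server in this model. Real mbox software may avoid some of this work, but the cost still grows with folder size rather than with the size of the change.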
We will be moving from mbox to a new mailbox format known as “MIX” which was developed at the University of Washington. MIX offers a series of improvements over the mbox format, allowing messages and status information to be accessed more efficiently.
Moving to MIX is a major task, as it does not work over the “Network File System” (NFS) protocol that our file servers currently use. To solve this problem, and to give us greater flexibility in giving end-users the performance and storage capacity they require, we are also moving to a concept of “island” mail servers. When people access NMS, they will be assigned to a server with fewer dependencies on other parts of NMS. In the long run, this will give us a large amount of flexibility in assigning resources to people to meet their particular requirements, and will provide greater isolation between groups of users so that unexpected problems in one area do not slow things down for everyone.
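One way to picture the island concept is as a lookup that sends each user to a single server holding all of that user's mail. The sketch below is purely illustrative: the server names, the override table, and the hash-based default are all assumptions, and the actual assignment method has not been described here.

```python
import hashlib

# Hypothetical island server names, for illustration only.
ISLANDS = ["island1", "island2", "island3"]

# Hypothetical explicit placements, e.g. for users who need extra capacity.
OVERRIDES = {"heavyuser": "island3"}


def island_for(username, islands=ISLANDS, overrides=OVERRIDES):
    """Return the island server that holds all of this user's mail."""
    if username in overrides:
        return overrides[username]
    # Default placement: a stable hash, so a user always lands on the same island.
    digest = hashlib.sha256(username.encode("utf-8")).digest()
    return islands[int.from_bytes(digest[:8], "big") % len(islands)]
```

Because each user maps to exactly one island, a problem on one island would affect only the users placed there, and an entry in the override table is enough to move someone with unusual requirements.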
Our goal is to move a significant number of NMS users to the new MIX format on island servers by the end of 2007.
This document was created to give interested parties an overall idea of the challenges of running NACS Mailbox Services, and plans for the future. It cannot fully capture all of the issues involved, or all aspects of our enhancement plans. If you have questions, please write to John Mangrich (mangrich@uci.edu) or to Dana Roode (dana.roode@uci.edu).