
Monday November 23rd, 2009 » Fall Quarter, Week 9

 

UCI Storage Brick Project

Summary: The aim of the Storage Brick Project is to provide hardware and software specifications for a basic building block of pretty cheap, pretty reliable, pretty configurable storage that could be used as is, or used in multiples as part of a Storage Area Network.

Details of Storage Brick Hardware and Software

by Harry Mangalam (Please email me with comments, corrections, supplements, data pro and con.)  More discussion on this topic can be seen on the UC Computer Support List for June, 2006

Discussions with Computer Support personnel in other Schools indicate that storage is more important than CPU cycles.  Not only is more storage required, but the kind of storage required is moving towards longer term, more robust storage.  The goal of our Storage Brick project is to define a basic building block of storage for UCI, ~10-15 TB per brick, which can be used as is, or exported for use via a number of protocols, including SMB/CIFS, AppleShare IP, and NFS.

Storage Applications

This storage is to be used in applications which are NOT 'mission critical'.  Mission-critical applications would include anything related to finance (payroll, benefits), medical records and human subject data, and anything in which the loss of the data might cost many times what the machine itself cost.  If the data is so critical that it could not be offline for even a few hours, or if its loss could result in very high liability, that is an indication that it should not be on this kind of machine.  That said, our goals include specifying a system that will survive single disk and power supply failures transparently, and for which the data will survive the crash of the entire machine.  While we are not now considering complete failover, we may consider it later.

Such a Brick could be used in 3 main configurations:
  1. Simplest: used as is to provide very fast storage to the controller nodes.  This configuration provides bandwidth of upwards of 100MB/s.
  2. Slightly more complicated, not much more expensive: used as a Network Attached Storage (NAS) device to provide lower-bandwidth storage to network clients.  On Gb networks, this provides speeds of 20-70MB/s, depending on the protocol used and how heavily requests are being made of it.
  3. Much more complicated, possibly much more expensive: as a basic unit of a Storage Area Network (SAN), essentially a bunch of NASs linked thru fast interconnects (optical Fibre Channel or 10Gb ethernet) and overseen via a supervisor application that allows all the supervised storage to be sliced and diced in a number of ways.  The dataflow to this kind of storage is also limited by the speed of the network.  On Gb networks, this provides speeds of 20-70MB/s, as above.
We could either specify the exact build specification or order a closest fit.  Such a brick would use Linux as the OS and provide the disk space formatted as any modern filesystem and exported via any of the storage protocols Linux supports.  We anticipate that such a device will provide storage at approximately 1/5 to 1/10 the cost of current high-end storage from vendors like Sun, EMC, StorageTek, and Network Appliance.  As noted, this device is not meant to replace such devices in critical areas, but could provide storage that is close to the reliability of such devices at lower cost.
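
To put the throughput figures in the three configurations above in practical terms, here is a minimal sketch that converts a sustained rate into the time needed to move a given amount of data.  The function name and the 1 TB example size are illustrative assumptions; the rates are just the rough figures quoted above, not measurements.

```python
# Illustrative only: rates are the rough figures quoted in the text,
# not benchmark results from an actual brick.

def hours_to_transfer(terabytes: float, mb_per_s: float) -> float:
    """Hours needed to move `terabytes` of data at a sustained `mb_per_s`.

    Uses decimal units (1 TB = 1,000,000 MB), as disk vendors do.
    """
    if mb_per_s <= 0:
        raise ValueError("throughput must be positive")
    seconds = terabytes * 1_000_000 / mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    scenarios = [
        ("local access, 100 MB/s", 100),
        ("Gb network, best case 70 MB/s", 70),
        ("Gb network, worst case 20 MB/s", 20),
    ]
    for label, rate in scenarios:
        print(f"1 TB via {label}: {hours_to_transfer(1, rate):.1f} hours")
```

Even in the best case, filling or restoring a full brick over a Gb network takes days rather than hours, which is worth keeping in mind when planning backups or migrations.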

Risk

Part of the problem depends on what level of risk the individual organizations are willing to tolerate.  I'll repeat the above warning: If the data is so critical that it could not be offline for even a few hours, or if its loss could result in very high liability,  that is an indication that it should not be on this kind of machine.

Here are some scenarios.

Q: Can it suffer a disk failure without losing data? 
A: Yes. RAID5 + hotspare allow the RAID to keep working thru a disk failure, recruiting the hotspare to rebuild the RAID, although at a loss of some performance.

Q: Can it suffer 2 simultaneous disk failures? 
A: Yes, in different RAID5s, but not in the same RAID5. However, if the arrays were configured as RAID6, each array could survive up to 2 simultaneous disk failures (tho at the cost of 1 more disk's capacity per array).

Q: Can it suffer a Power Supply failure and maintain data?
A: Yes. It has redundant PSs.

Q: Can it suffer 2 simultaneous PS failures and maintain data?
A: Yes. Data integrity would be maintained via battery-backed array controllers but the array would be offline until the PS was replaced.

Q: Could it suffer a PC or controller system failure? 
A: Yes, but see answer immediately above.

Q: Can it be set up for complete failover? (completely separate replicate machines set to mirror each other). 
A: Tentatively yes.  The failover mechanism is easy to set up, but the mirroring is tricky.  Depending on the services that need to be replicated and the speed at which they need to be synchronized, there are some utilities that could do this (rsync, Unison), but all such utilities have edge cases where they could fail.  If your systems need to be this robust, this solution is probably inappropriate for you at this point.
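
The disk-failure answers above boil down to a simple rule: an array survives as long as it has no more simultaneous failed disks than its RAID level tolerates (1 for RAID5, 2 for RAID6).  The following is a minimal sketch of that rule only; the function and dictionary names are hypothetical, and it deliberately ignores the hotspare, which helps only after a rebuild has completed.

```python
# Simultaneous member-disk failures each RAID level can tolerate.
TOLERATED_FAILURES = {"raid5": 1, "raid6": 2}


def arrays_survive(level: str, failures_per_array: list) -> bool:
    """True if no array has more simultaneous failures than `level` tolerates.

    `failures_per_array` holds one failure count per independent array,
    e.g. [1, 1] means one failed disk in each of two arrays.
    """
    tolerated = TOLERATED_FAILURES[level]
    return all(failed <= tolerated for failed in failures_per_array)


if __name__ == "__main__":
    print(arrays_survive("raid5", [1, 1]))  # one failure in each RAID5
    print(arrays_survive("raid5", [2, 0]))  # two failures in the same RAID5
    print(arrays_survive("raid6", [2, 2]))  # two failures in each RAID6
```

The first and third cases survive; the second does not, which matches the answers given above.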

Hardware

We are using the following as the basis for such a server:
The cost for such a container is ~ $6500 from a 2nd tier online reseller or VAR such as Serversdirect, ThinkCP, Penguin Computing, Western Scientific, Monarch Computers, etc.

Up to 15 usable TB can be configured in one of these units using 24x750GB disks configured in 2 independent RAIDs (for example: 2x(11 disks in RAID5 + 1 hotspare)).  This many disks costs an additional ~$12K at current (06.06) consumer prices.  This would still bring the entire 15 TB system in at ~$20K with tax and shipping.  Using more economical 500GB disks ($300 each), you can get 10TB of usable space from such a device for an additional $7200, for a total of less than $15K.  These prices are roughly 1/6 to 1/20 the cost of comparably sized devices from 1st tier vendors such as Network Appliance, Sun, HP, StorageTek, and EMC.  However..!
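
The capacity and cost arithmetic above can be checked with a short sketch.  It assumes the layout described (2 arrays, each of 11 disks in RAID5 plus 1 hotspare, with one disk's worth of capacity per array going to parity) and the ~$6500 chassis price.  The $500 price for a 750GB disk is not stated in the text; it is inferred from ~$12K for 24 disks.  The function name and parameters are illustrative.

```python
def brick_estimate(disk_gb, disk_price, chassis_price=6500,
                   arrays=2, disks_per_array=11, hotspares_per_array=1):
    """Return (usable_tb, disk_cost, total_cost) for one brick.

    RAID5 gives up one disk's capacity per array to parity, and the
    hotspares hold no data, so usable space is (disks_per_array - 1)
    disks per array.  Costs exclude tax and shipping.
    """
    total_disks = arrays * (disks_per_array + hotspares_per_array)
    usable_tb = arrays * (disks_per_array - 1) * disk_gb / 1000
    disk_cost = total_disks * disk_price
    return usable_tb, disk_cost, chassis_price + disk_cost


if __name__ == "__main__":
    # 750GB disks at an inferred ~$500 each (assumption: ~$12K / 24 disks).
    print(brick_estimate(750, 500))
    # 500GB disks at the quoted $300 each.
    print(brick_estimate(500, 300))
```

Under these assumptions the 750GB configuration comes to 15 usable TB for $18,500 before tax and shipping, and the 500GB configuration to 10 usable TB for $13,700, consistent with the ~$20K and under-$15K totals quoted above.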

Protocols

From our conversations with faculty and computer support personnel, we want the software to support the following protocols:

Secondary protocols that could be added relatively easily are:

Software Required

The hardware is the easy part.  Software is always harder.  The cost is easiest - it's all free (altho if you want to pay someone else to support the free software, that is also an option).  While choice of operating system can be divisive, my personal choice is Linux for reliability, security, performance, and breadth of applications available.  The BSD-derived OSs are also viable, but I don't have nearly as much experience with them.  I recommend using the Ubuntu base distribution for fast and easy setup and admin.  Others may have a preference for Red Hat Enterprise Linux (RHEL).  In my experience, the Debian-based distros stay closer to the mainstream kernel development, but RHEL is internally consistent over the long term.

The web-based administration tool webmin (now part of the Open Management Consortium) can be used for many administrative functions, altho it is not available via Ubuntu's apt-get system.  The 3ware RAID system can also be managed via a web interface, altho it runs separately from the Webmin tools.  As an aside, I've used the Areca controllers and utilities as well, and while they have some advantages, they're not yet in the mainline kernel (altho they seem to be supported in the latest Ubuntu distribution, Dapper), and their supporting utilities are not up to the level of the 3ware utilities.

In terms of types of filesystems and protocols, Linux supports more than any other OS.  It can export SMB/CIFS shares using samba as well as, or better than, native Windows servers.  Similarly, it can make storage available via NFS, AppleShare IP, DAVfs, the Subversion & CVS version control systems, and even Andrew FS if required.  Extensive volume management can be done in a number of ways, but IBM's open sourced EVMS seems to cover most of them, supposedly even including snapshots.

Authentication to most services can be done by Kerberos (and so it can use UCI's kerberos-based UCINETIDs), local login, LDAP, or combinations thereof, although mixing & matching will involve more complexity.

In terms of backups, I've only considered a single type so far - automated  disk-based backups that would span up to a few months.  The system is called  BackupPC and after about a month of irregular  testing, reading, and posting to the fairly active list, it seems to have  most of what a good backup system should have.  I'll expand on this in  another posting in a bit; this kind of system requires a lot of detail.

Some of the requirements we've heard simply may not map well onto such a system, but many of them are supported.  The system obviously can be maintained remotely so it can be supported centrally or locally as desired. One crucial point that came out of the cross-school discussions was that local administrators wanted to have oversight on the system to enable local users, change permissions, change filesystem quotas, etc.  Certainly this can be done, but it will require an understanding of shared responsibilities if NACS is involved with the system at all.