UCI Storage Brick Project
Summary: The aim of the Storage
Brick Project is to provide hardware and software specifications for a
basic building block of pretty cheap, pretty reliable, pretty
configurable storage that could be used as is, or used in multiples as
part of a Storage Area Network.
Details of Storage Brick Hardware and Software
by Harry Mangalam (Please email me with comments, corrections,
supplements, data pro and con.) More discussion on this topic can be
seen on the UC Computer Support List for June, 2006.
Discussions with Computer Support personnel in other Schools indicate
that storage is more important than CPU cycles. Not only is
increasing storage required, but the kind of storage required is moving
towards longer term, more robust storage. Our Storage Brick
project has as its goal to define a basic building
block of storage for UCI, ~10-15 TB per brick, which can be used as is,
or exported for use via a number of protocols, including SMB/CIFS,
AFP (AppleShare IP), and NFS.
Storage Applications
This storage is to be used in applications which are
NOT 'mission critical'.
Mission-critical applications would be anything related to finance
(Payroll, benefits), medical records and human subject data, and
anything in which the loss of the data might cost many times what the
machine itself cost.
If the data is so critical that it could
not be offline for even a few hours, or if its loss could result in very high liability, that is an indication
that it should not be on this kind of machine. That said, our
goals include specifying a system that will survive single disk and
power supply failures transparently, and for which the data will
survive the crash of the entire machine. While we are not now
considering the solution of complete failover, we may be considering
that later.
Such a Brick could be used in 3 main configurations:
- Simplest: used as is to provide very fast storage to the controller
  nodes. This configuration provides bandwidth upwards of 100MB/s.
- Slightly more complicated, not much more expensive: used as a
  Network Attached Storage (NAS) device to provide lower-bandwidth
  storage to network clients. On Gb networks, this provides speeds of
  20-70MB/s, depending on the protocol used and how heavily requests
  are being made of it.
- Much more complicated, possibly much more expensive: as a basic unit
  of a Storage Area Network (SAN), essentially a bunch of NASs linked
  thru fast interconnects (optical Fibre Channel or 10Gb ethernet) and
  overseen via a supervisor application that allows all the supervised
  storage to be sliced and diced in a number of ways. The dataflow to
  this kind of storage is also limited to the speed of the network. On
  Gb networks, this provides speeds of 20-70MB/s, as above.
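To put the bandwidth figures above in perspective, here is a rough
sketch of how long it would take to move a brick's worth of data at
each rate. The function name transfer_hours and the 10 TB example size
are illustrative choices, not part of the project specification, and
the calculation assumes decimal units (1 TB = 1,000,000 MB) and a
sustained rate with no protocol overhead.

```python
def transfer_hours(size_tb: float, rate_mb_per_s: float) -> float:
    """Hours needed to move size_tb terabytes at a sustained rate.

    Assumes decimal units (1 TB = 1,000,000 MB) and ignores overhead.
    """
    size_mb = size_tb * 1_000_000
    seconds = size_mb / rate_mb_per_s
    return seconds / 3600


if __name__ == "__main__":
    # Rates taken from the three configurations described in the text.
    for label, rate in [
        ("direct-attached, 100 MB/s", 100),
        ("NAS/SAN on Gb network, 70 MB/s", 70),
        ("NAS/SAN on Gb network, 20 MB/s", 20),
    ]:
        hours = transfer_hours(10, rate)
        print(f"{label}: about {hours:.0f} hours to move 10 TB")
```

Even at the best case this is more than a day per 10 TB, which is
worth keeping in mind when planning restores or initial loads.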
We could either specify the exact build specification or order a
closest-fit. Such a brick would use Linux as the OS and
provide the disk space formatted as any modern filesystem, and
supporting any storage protocol. We anticipate that such a
device will provide storage at approximately 1/5 to 1/10 the cost of
current high-end storage like Sun, EMC, StorageTek, and
NetworkAppliance. As noted, this device is not meant to
replace such devices in critical areas, but could provide storage that
is close to the reliability of such devices at lower cost.
Risk
Part of the problem depends on what level of risk the individual
organizations are willing to tolerate. I'll repeat the above warning:
If the data is so critical that it could
not be offline for even a few hours, or if its loss could result in very high liability, that is an indication
that it should not be on this kind of machine.
Here are some scenarios.
Q: Can it suffer a disk failure without losing data?
A: Yes. RAID5 + hotspare allow the RAID to keep working thru a disk
failure, recruiting the hotspare to rebuild the RAID, although at a
loss of some performance.
Q: Can it suffer 2 simultaneous disk failures?
A: Yes, in different RAID5s, but not in the same RAID5. However, it
could survive up to 2 simultaneous disk failures in each of the arrays
as RAID6 (tho with the decrease in 1 more disk's capacity).
Q: Can it suffer a Power Supply failure and maintain data?
A: Yes. It has redundant PSs.
Q: Can it suffer 2 simultaneous PS failures and maintain data?
A: Yes. Data integrity would be maintained via battery-backed array
controllers but the array would be offline until the PS was replaced.
Q: Could it suffer a PC or controller system failure?
A: Yes, but see answer immediately above.
Q: Can it be set up for complete failover (completely separate
replicate machines set to mirror each other)?
A: Tentatively yes. The failover mechanism is easy to set
up, but the mirroring is tricky. Depending on the services
that need to be replicated and the speed at which they need to be
synchronized, there are some utilities that could do this
(rsync, Unison), but all such utilities have edge cases where they
could fail. If your systems need to be this robust, this solution is
probably inappropriate for you at this point.
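As a minimal sketch of the mirroring half of that answer, the
following shows a one-way rsync mirror from one brick to another,
driven from Python. It is not a failover mechanism and not a tested
recipe for this project: the host name brick2.example.edu, the /data
paths, and the helper names build_rsync_cmd and mirror are all
hypothetical. It uses only standard rsync options (-a for archive
mode, --delete to remove files that no longer exist on the source,
-e ssh to run over ssh).

```python
import subprocess


def build_rsync_cmd(src_dir, dest_host, dest_dir, delete=True):
    """Build an rsync command line for a one-way mirror over ssh."""
    cmd = ["rsync", "-a", "-e", "ssh"]
    if delete:
        # Remove files on the mirror that were deleted on the source.
        cmd.append("--delete")
    # A trailing slash on the source copies its contents, not the
    # directory itself.
    cmd += [src_dir.rstrip("/") + "/", f"{dest_host}:{dest_dir}"]
    return cmd


def mirror(src_dir, dest_host, dest_dir):
    """Run the mirror once and return rsync's exit code (0 = success)."""
    return subprocess.run(build_rsync_cmd(src_dir, dest_host, dest_dir)).returncode


if __name__ == "__main__":
    # Hypothetical host and paths, for illustration only.
    print(" ".join(build_rsync_cmd("/data", "brick2.example.edu", "/data")))
```

Run from cron, something like this gives a periodically refreshed
copy. Anything written between runs is not on the mirror, which is one
of the edge cases mentioned above.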
Hardware
We are using the following as the basis for such a server:
- 5U rackmount container
- 2xOpterons or Athlon64, 4GB RAM,
- 3yrs onsite warranty,
- redundant Power Supplies (PS),
- 2200VA Uninterruptible (external) Power Supply (UPS),
- mirrored IDE system disks,
- 2x12port 3ware SATA RAID controllers,
- 24 hotswap trays (but with no disks).
The cost for such a container is ~ $6500 from a 2nd tier online
reseller or VAR such as Serversdirect, ThinkCP, Penguin Computing,
Western Scientific, Monarch Computers, etc.
Up to 15 usable TB can be configured in one of these
units using 24x750GB disks configured in 2 independent RAIDs (for
example, 2x11 disks in RAID5 + 1 hotspare). This
many disks costs an additional ~$12K at current (06.06) consumer
prices. This would still bring the entire 15 TB system in
at ~$20K with tax and shipping. Using more
economical 500GB disks ($300 each), you can get 10TB of usable
space from such a device for an additional $7200, for a total of less
than $15K. These prices are roughly 1/6 to 1/20 the cost of
comparably sized devices from 1st tier vendors such as Network
Appliance, Sun, HP, StorageTek, and EMC. However..!
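The capacity and cost figures above can be checked with a few lines of
arithmetic. The sketch below assumes two independent arrays of 11
disks each, with the remaining trays used as hotspares (one reading of
the layout described above), and uses the prices quoted in the text.
The function names usable_tb and system_cost are illustrative only.
RAID5 gives up one disk per array to parity and RAID6 gives up two.

```python
def usable_tb(disks_per_array, disk_gb, arrays=2, parity_disks=1):
    """Usable capacity in TB; hotspares are not counted in the arrays.

    parity_disks is 1 for RAID5 and 2 for RAID6.
    """
    return arrays * (disks_per_array - parity_disks) * disk_gb / 1000


def system_cost(chassis_cost, disk_count, price_per_disk):
    """Total hardware cost before tax and shipping."""
    return chassis_cost + disk_count * price_per_disk


if __name__ == "__main__":
    for disk_gb in (750, 500):
        r5 = usable_tb(11, disk_gb, parity_disks=1)
        r6 = usable_tb(11, disk_gb, parity_disks=2)
        print(f"{disk_gb}GB disks: RAID5 {r5} TB, RAID6 {r6} TB usable")

    # $6500 chassis plus 24 x 500GB disks at $300 each, as quoted above.
    print("500GB system:", system_cost(6500, 24, 300))
    # $6500 chassis plus ~$12K for 24 x 750GB disks, i.e. ~$500 each.
    print("750GB system:", system_cost(6500, 24, 500))
```

Moving both arrays to RAID6 costs one more disk of capacity per
array, so the 750GB configuration drops from 15 TB to 13.5 TB usable.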
Protocols
From our conversations with faculty and computer support personnel, we
want the software to support the following protocols:
- Windows SMB/CIFS aka Windows shares
- Network File System exports
- Webdav/davfs
- Apple File Server / Apple Filing Protocol (AFS/AFP)
Secondary protocols that could be added relatively easily are:
- Revision Control Systems such as CVS or subversion
- Web servers
- Integrated Wikis
Software Required
The hardware is the easy part. Software is always
harder. The cost is easiest - it's all free (altho if you
want to pay someone else to support the free software, that is also an
option). While choice of operating system can be divisive, my
personal choice is Linux for reliability, security, performance, and
breadth of applications available. The BSD-derived OSs are
also viable, but I don't have nearly as much experience with
them. I recommend using the Ubuntu base distribution for fast and
easy setup and admin. Others may have a preference for Red Hat
Enterprise Linux (RHEL). In my experience, the
Debian-based distros stay closer to the mainstream kernel development,
but RHEL is internally consistent over the long term.
The web-based administration tool webmin (now part of the Open
Management Consortium) can be used for many administrative
functions, altho it is not available via Ubuntu's apt-get
system. The 3ware RAID system can also be managed via a web
interface, altho it runs separately from the Webmin
tools. As an aside, I've used the Areca controllers and
utilities as well and while they have some advantages, they're not yet
in the mainline kernel (altho they seem to be supported in the latest
Ubuntu distribution, Dapper) and their supporting utilities are not up
to the level of the 3ware utilities.
In terms of types of filesystems and protocols, Linux supports more
than any other OS. It can export SMB/CIFS shares using samba
as well as, or better than, native Windows servers.
Similarly, it can make storage available as NFS,
AppleShare IP, DAVfs, subversion & CVS version control
systems, and even Andrew FS if required. Extensive
volume management can be done in a number of ways, but
IBM's open sourced EVMS seems to cover most of them, supposedly
even including snapshots.
Authentication to most services can be done by Kerberos (and
so it can use UCI's kerberos-based UCINETIDs), local login, LDAP, or
combinations thereof, although mixing & matching will involve
more complexity.
In terms of backups, I've only considered a single type so far -
automated disk-based backups that would span up to a few
months. The system is called
BackupPC and after
about a month of irregular testing, reading, and posting to
the fairly active list, it seems to have most of what a good
backup system should have. I'll expand on this in
another posting in a bit; this kind of system requires a lot of detail.
Some of the requirements we've heard simply may not map well onto such
a system, but many of them are supported. The system obviously can be
maintained remotely so it can be supported centrally or locally as
desired. One crucial point that came out of the cross-school
discussions was that local administrators wanted to have oversight on
the system to enable local users, change permissions, change filesystem
quotas, etc. Certainly this can be done, but it will require an
understanding of shared responsibilities if NACS is involved with the
system at all.