March 2000
Store More, Pay Less with Tape Archive Solutions
By Lowell Rapaport
The number-one tape library application may be backup, but this storage choice is also popular
for archive applications like document management, batch processing and data warehousing. Banks,
insurance companies and other organizations that regularly download large blocks of data often
discover that tape archive solutions can be a better choice.
Almost any kind of data that is kept for long periods of time can be kept in a tape archive. Common
examples are images of cancelled checks or medical records. If you have data that can be kept
off-line, tape is an excellent medium. Kept in storage, a tape can last for 10 years or longer. There
is a greater danger of a tape becoming obsolete than of it going bad. Most companies will move their
data to more modern media long before a tape loses its data.
Since off-line archives don't need to be kept in a large library, you can use a smaller library
for just the most current files. With modern, high-capacity tape drives, even a small library can have
a huge storage capacity. As an example, Overland Data (www.overlanddata.com) makes the Enterprise
Xpress, a system with two or four drives and 26 or 52 tape cartridge slots. In the base configuration,
the Enterprise Xpress is a high-end backup system. With add-on modules, it can be built into a library
with up to ten tape drives and 100 slots. Using the current DLT 8000 drives, this gives you a maximum
uncompressed capacity of 4 terabytes (TB).
The new SureStore DLT Tape Libraries from Hewlett Packard (www.hp.com/go/automated.com) also use a
modular design. The base unit is a two-drive, 20-slot library. Up to two identically sized modules can
be added to reach a maximum of six drives and 60 slots (2.4 terabytes with DLT 8000 drives). If 60
slots isn't enough, Hewlett Packard also makes the SureStore E libraries, which can house nearly
700 tapes (for a whopping 28 terabytes). Another small library of note is the Scalar 100 from ADIC
(www.adic.com). The Scalar 100 is a six-drive, 60-slot library. What sets it apart from the comparably
sized SureStore 6/60 is a thin server that lets you connect it directly to a network. Network
connections are generally too slow for backup (100Base-T connections max out at 12.5 Mbytes/sec and
are usually much slower). This makes it pretty certain that a network-attached Scalar 100 will be used
as an archive. At $26,000 for a unit with 2.4 terabyte capacity (with uncompressed DLT 8000 tapes),
the Scalar 100 is an inexpensive way for a small office or workgroup to gain a lot of near-line
storage.
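The capacity figures quoted for these libraries follow from simple arithmetic: slots multiplied by the 40 GB uncompressed capacity of a DLT IV cartridge in a DLT 8000 drive. The short Python sketch below reproduces them. The function name is ours, and the sketch assumes 1 TB = 1,000 GB and that every slot holds a data cartridge (no slots reserved for cleaning tapes).

```python
DLT_8000_GB = 40  # uncompressed capacity of one cartridge, as quoted in the article


def library_capacity_tb(slots, cartridge_gb=DLT_8000_GB):
    """Uncompressed library capacity in terabytes (assuming 1 TB = 1,000 GB)."""
    return slots * cartridge_gb / 1000


# Slot counts taken from the libraries described above.
for label, slots in [("100-slot library", 100),
                     ("60-slot library", 60),
                     ("700-slot library", 700)]:
    print(f"{label}: {library_capacity_tb(slots):.1f} TB uncompressed")
```

Passing a different cartridge_gb value gives a rough figure for other tape formats in the same number of slots.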
Larger libraries can take on multiple roles. After reserving enough tape cartridges for a month's
worth of daily backups, the rest of the library can be used for a near-line archive. This saves money
by making more efficient use of the library. A backup-only solution may only be used for a few hours
at night. A library used for both backup and archiving makes the investment in hardware and software
work 24 hours a day.
The P3000 library from ATL (www.atl.com) takes a page from RAID and other high-availability devices.
It comes with hot-swappable fans, power supplies and DLT drives, making it easy to maintain. According
to Frank Berry, vice president of marketing at ATL, data warehousing is the driving force behind the
use of tape libraries for near-line archival storage.
"Companies use a large library like the P3000 for their data warehouse and they [use RAID for]
smaller 'data marts' at the department level," Berry explains. Since a department only
needs to keep information specific to its mission, it can use a RAID system locally. The data
warehouse is then updated periodically from the department's computers.
So far, all the libraries described here use DLT tapes. DLT tape drives are favored by library
manufacturers and users because the cartridges have a large uncompressed capacity, are reliable and
are fast when streaming data. But DLT has limitations when used in archival applications. In
particular, DLT tapes have a long average file access time of about a minute, which isn't nearly
as fast as some rival tape formats.
Two 8 millimeter formats, Sony's AIT and Exabyte's Mammoth, can store more data than DLT
while providing faster access speed (15 seconds and 30 seconds, respectively). AIT has the added
advantage of its memory-in-cartridge technology. The memory chip mounted in the cartridge stores the
tape's identity and directory information. When supported by software, this lets an AIT tape
mount without the drive having to read the tape.
Both Mammoth-2 and AIT-2 tapes are two-reel cartridges that don't have to be rewound. When the
tapes are mounted, archived files are less than half the length of the tape away from the read head.
This is called midpoint loading, and it speeds performance in archive applications.
The only drawback to these formats, when considering archival applications, is that the only large
libraries available are those from Exabyte and ADIC's Scalar 1000. This may change, according to
Ray Heineman, vice president of technical services with Breece Hill (www.breecehill.com). Heineman
says his company plans to adapt its existing DLT libraries to AIT and Mammoth tapes and drives later
this year.
"We actually developed the library robots a couple of years ago," says Heineman, "but
only now are users becoming interested in 8 millimeter tape libraries."
The Breece Hill DLT libraries support as many as 12 drives and 420 tape cartridges. With smaller 8
millimeter tapes, the same libraries could contain half again as many tapes.
The largest libraries of all are made by ADIC and StorageTek. ADIC's big libraries are the AML/J,
AML/E and AML/2. These models differ in combining mobile tape pickers, mobile tape racks or both. They
can also mix media, so if you have a large archive and want to move from one tape technology to
another, these libraries allow you to keep the old tapes near line.
For archiving applications, ADIC offers their "Infinite File Life" technology. This software
takes advantage of some tape drives' ability to analyze blank tapes for accumulated errors. When
the error rate reaches a predetermined threshold, the system automatically backs up the files onto
fresh media and the old tape is discarded.
Like Exabyte, StorageTek (www.storagetek.com) makes both tape drives and libraries. StorageTek has
carved a niche in the high-end enterprise and mainframe market. If you choose StorageTek's
high-end 9840 tape drive, which boasts an ultra-fast eight-second average time to file, you can
populate their largest library, the Powderhorn, with up to 6,000 slots. You can then combine library
storage modules to create enormous libraries. The company boasts that archives of more than 1 billion
terabytes can be built (that's 1 million petabytes). If you have more modest needs, the company
makes large DLT libraries as well.
StorageTek makes an add-on for their libraries called the Virtual Storage Manager ($390,000). The VSM
is a massive disk cache that works with any StorageTek library. It compresses and caches data and acts
as a buffer between your server and the tape library. The VSM's cache can be as large as 930 GB.
It performs in hardware and firmware the caching functions most users are used to doing in software on
a server. In high-end systems, the VSM absorbs processor-intensive functions and speeds
performance.
Tape vs. MO, CD & DVD: Comparisons & Strategies
In the past, imaging and document management archives have most commonly resided in magneto-optical
jukeboxes. MO is effective for archival use because it is fast, allowing direct access to the entire
archive within seconds. Optical media don't suffer physical wear as tape does, and they have a
longer shelf life. MO technology also offers WORM (write once, read many) media alternatives that
cannot be erased.
If you retain archives that are accessed in a random and unpredictable way across the entire archive,
MO jukeboxes or the latest DVD alternatives will give you the speedy access you'll require. But
the fact is, many archives aren't used that way. Generally, only a small portion of the archive
is used at any one time. If you are archiving check images, for example, only the last few months'
worth of images might be accessed regularly. The rest, which can be several years' worth, will sit
unused most of the time. And yet, you still have to have them near-line. On those occasions when an
older check image is required, users expect to be able to retrieve it fairly quickly.
For archives such as this, optical media like MO may not be required. Gigabyte for gigabyte, MO is
nearly eight times as expensive as tape. A DLT tape costs about $100 for an uncompressed capacity of 40
gigabytes. This translates to $2.50 per gigabyte. In comparison, the cost for MO is about $19 per GB
and DVD-R costs about $10 per GB (for double sided media). CDs are inexpensive, but when your
archive starts approaching the terabyte range, CD becomes logistically impractical. CDs hold just 650
megabytes each. It would take more than 60 CDs to store as much as a single DLT tape.
If you are going to use tape for a near-line archive, you need a strategy for serving files quickly.
Tape is a fast medium for streaming data, but it is less well suited to locating and serving
individual files within an archive. One caching method is suggested by Lee Payne, senior product
marketing manager at Overland Data. "You can store small chunks of each file to cache while the
entire file still resides on tape," Payne explains. "When a file is requested by a user, the server
starts getting the data from the cache while the tape is mounted and the file located." This
technique is best applied to large files that would be too big to cache on hard disk.
Another approach is to cache the contents of entire tapes to hard disk. For this, you can use a
striped RAID 0 array for extra speed and to make many small magnetic disks into a single cache
volume. This works best when archive queries are made to just a few tapes.
A third method is to use a tape drive technology with the fastest time-to-data specification
available, such as the 9840 drive from StorageTek. The 9840 can shuttle and locate files anywhere
along a tape's length, and it has an average data access time of just eight seconds. The downside
is that the tapes store just 20 GB uncompressed, and the drives are expensive at $34,400. But if you
need to store a very large archive - and we're talking Pentagon-like proportions here -
libraries using 9840 drives can reach 120 terabytes.
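The access times quoted for the different drives also suggest how much of each file Payne's head-of-file cache would have to hold: enough to keep data flowing to the user until the drive reaches the file on tape. The Python sketch below is a rough sizing aid only. The 1 Mbyte/sec delivery rate and the optional robot load time are hypothetical values chosen for illustration; the access times are the averages given in the article.

```python
# Average time to reach a file, in seconds, as quoted in the article.
AVG_ACCESS_S = {"DLT": 60, "Mammoth": 30, "AIT": 15, "9840": 8}


def prefix_mb(avg_access_s, delivery_mb_s, robot_load_s=0.0):
    """Megabytes of a file's head to keep on disk so delivery can start at
    once and continue until the tape drive has located the file."""
    return (robot_load_s + avg_access_s) * delivery_mb_s


# Hypothetical delivery rate to the user: 1 Mbyte/sec, robot time ignored.
for fmt, seconds in AVG_ACCESS_S.items():
    print(f"{fmt}: cache roughly the first {prefix_mb(seconds, 1.0):.0f} MB of each file")
```

The faster the drive's time to data, the smaller the slice of each file that has to live on disk, which is one way to weigh an expensive fast-access drive against a larger disk cache.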
Most library vendors agree that tape is best used for either less frequently accessed data or data
that can be downloaded in chunks large enough to take advantage of tape's streaming speed.
"If you can use tape to read and write large sequential blocks of data, you get a speed advantage
over other storage media," says Payne. "MO in particular has poor write latency [because it
requires two passes to save a file]. If you're saving and downloading large amounts of data, tape
can give you a definite performance boost."
How big does your archive have to be for tape to be practical? Bryce Hein, product marketing manager
at ADIC, says tape libraries are practical for archives larger than 300-400 gigabytes.
"Don't get a library that exactly fits your archive," Hein says. "You need room
for expansion. If you have an archive that is already 300 gigabytes, your library should have at least
a terabyte capacity."
Michael Fulps, industry marketing manager at StorageTek, contends the practicality threshold is
higher, about one terabyte. "A terabyte sounds like a lot, but is not hard to reach," he
explains. "When you collect all the decentralized storage in an enterprise and put it in one
place, it's not uncommon to have a terabyte of data."
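Hein's advice to buy room for expansion can be turned into a quick slot count. The Python sketch below assumes DLT IV cartridges at the 40 GB and roughly $100 figures quoted earlier. The function names are ours, and the result ignores any slots you set aside for backup rotation or cleaning cartridges.

```python
import math


def slots_needed(target_capacity_gb, cartridge_gb=40):
    """Smallest number of cartridge slots that reaches the target capacity."""
    return math.ceil(target_capacity_gb / cartridge_gb)


def media_cost(slots, cartridge_price=100):
    """Cost in dollars of filling every slot with a cartridge."""
    return slots * cartridge_price


# Hein's example: a 300 GB archive calls for at least 1 TB of library capacity.
slots = slots_needed(1000)
print(f"{slots} slots, about ${media_cost(slots):,} in DLT media")
```

By this arithmetic, a library in the 20- to 30-slot class already covers the 1 TB target, with the tapes themselves a small part of the total cost.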
Tape Formats and Their Tradeoffs
There are at least half a dozen different tape cartridges that can be used for archiving. Some,
such as the high-end StorageTek 9840 tape drives (10 Mbytes/sec), are used in high-performance
mainframe solutions. More familiar to Imaging & Document Solutions readers will be DLT, AIT,
Mammoth-2 and SLR.
DLT is currently the most popular high-end tape drive technology. DLT tape drives are manufactured by
Quantum (www.quantum.com) and Tandberg (www.tandberg.com). The transfer rate of a DLT 8000 drive is 5
Mbytes/sec, and the uncompressed capacity (all tape storage capacities given here do not include
compression) is 40 GB for a DLT IV tape recorded on a DLT 8000 drive.
AIT-2 is the second generation of Sony's 8-mm tape format. A cartridge holds 50 GB of data and
has a transfer rate of 6 Mbytes/sec. Exabyte's rival 8-mm tape technology, Mammoth-2, stores 60
gigabytes on a tape and has a transfer rate of 12 Mbytes/sec.
In addition to DLT, Tandberg makes SLR tape drives. SLR drives use quarter-inch tape (QIC). The latest
version, the SLR 100, stores 50 gigabytes per tape and transfers data at 5 Mbytes/sec.
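One way to compare the formats above is to ask how long a drive takes to stream an entire cartridge at its rated speed. The Python sketch below uses only the capacities and transfer rates given in this sidebar. It assumes 1 GB = 1,000 Mbytes and a drive that streams continuously at its rated uncompressed speed, which real workloads may not sustain.

```python
# (uncompressed capacity in GB, transfer rate in Mbytes/sec), from the sidebar
FORMATS = {
    "DLT 8000": (40, 5),
    "AIT-2": (50, 6),
    "Mammoth-2": (60, 12),
    "SLR 100": (50, 5),
}


def hours_to_stream(capacity_gb, rate_mb_s):
    """Hours needed to read or write a full cartridge at the rated speed."""
    return capacity_gb * 1000 / rate_mb_s / 3600


for name, (capacity_gb, rate_mb_s) in FORMATS.items():
    print(f"{name}: {hours_to_stream(capacity_gb, rate_mb_s):.1f} hours per full cartridge")
```

The same figure gives a feel for how long it takes to migrate an archive from one cartridge generation to the next.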
On the horizon are two potential replacements for DLT tape. Ultrium, a single-reel linear tape based
on the Linear Tape Open (LTO) standard developed by IBM, Seagate and Hewlett Packard, is expected to
store 100 gigabytes per tape and have a transfer rate of 10 Mbytes/sec. Unfortunately, Ultrium will
not be backward compatible with existing DLT tape.
Quantum's Super DLT (SDLT) is expected to have full backward compatibility with existing DLT
tapes. SDLT tapes are expected to store more than 100 gigabytes with a transfer rate of more than 10
Mbytes/sec. However, neither Ultrium nor SDLT is available yet. Developers of both products expect to
start shipping drives in mid-2000.
There are a number of alternatives to these tape technologies. Low-end Travan and DDS tape drives are
primarily designed for desktop and workstation backup. Benchmark (www.benchtape.com) makes a DLT1
tape drive ($1,500) that uses some of the same technology found in DLT 8000 tape drives, including the
same DLT IV tapes, but with a transfer rate of 3 Mbytes/sec.
Lowell Rapaport
Software Eases Tape Management
To use a tape library as an archive, you need some means to make the library appear as a
logical volume on your computer. Software on your tape library server takes care of file access, media
management and hard disk caching needed to make it practical to interact with a tape library.
Jukebox management software can help here. According to Jerry Held, vice president of the storage
management division at OTG Software (www.otg.com), the company's Disk Xtender software can manage
tape archives as large as 30 terabytes. An example of a typical application is cancelled check
archiving.
"We have customers who switched from keeping cancelled checks to keeping digitized images in a
tape archive," says Held. "It would take as long as a week to find a paper check, but by
keeping check images on a tape archive, the time it takes to bring one up is reduced to just a few
minutes."
When tape archives follow the 90/10 rule, that is, when 10 percent of the archive is used
90 percent of the time, tape archive software can cache the most frequently used parts of the library
on hard disk or a striped RAID array. Sutmyn Storage (www.sutmyn.com) makes Scimitar Virtual Tape
Server, a tape archive system where entire tape volumes can be cached on magnetic disk storage. The
caching is transparent to application software except for the hard disk speeds.
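The idea of keeping the busiest tape volumes on disk can be illustrated with a small least-recently-used cache. The Python sketch below is our own illustration of the general technique, not a description of how Disk Xtender or Scimitar work internally. The class name and the load_from_tape callback are hypothetical.

```python
from collections import OrderedDict


class TapeVolumeCache:
    """Keeps the most recently used tape volumes on disk, up to a fixed count."""

    def __init__(self, max_volumes):
        self.max_volumes = max_volumes
        self._volumes = OrderedDict()  # tape_id -> cached contents, oldest first

    def read(self, tape_id, load_from_tape):
        """Return (contents, was_cached). load_from_tape is called only on a miss."""
        if tape_id in self._volumes:
            self._volumes.move_to_end(tape_id)  # mark as most recently used
            return self._volumes[tape_id], True
        contents = load_from_tape(tape_id)  # slow path: mount and read the tape
        self._volumes[tape_id] = contents
        if len(self._volumes) > self.max_volumes:
            self._volumes.popitem(last=False)  # drop the least recently used volume
        return contents, False


cache = TapeVolumeCache(max_volumes=2)
for tape in ["A", "B", "A", "C", "B"]:
    _, hit = cache.read(tape, lambda t: f"contents of tape {t}")
    print(tape, "served from disk" if hit else "read from tape")
```

When requests really do concentrate on a small part of the archive, most reads take the fast path and the library robot is called on only occasionally.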
An opposite tactic is taken by FileTek (www.filetek.com). Their StorHouse software doesn't cache
tapes to hard disk. Instead, it uses intelligent queue management to group file requests together
first by tape to prevent thrashing, and then by position on the tape. By grouping multiple file
requests made to a tape this way, files can be read sequentially, taking advantage of
tape's high streaming speed and reducing shuttling. Shuttling is where the tape drive repositions
a tape while locating individual files. On a small scale, it is similar to thrashing media in a
library.
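The grouping described above, first by tape and then by position on the tape, amounts to a two-level sort of the pending requests. The Python sketch below shows the general idea only. It is not FileTek's actual algorithm, and the request format of (tape ID, position, file name) is an assumption made for the example.

```python
from collections import defaultdict


def plan_reads(requests):
    """Group pending requests by tape, then order each group by tape position.

    requests: iterable of (tape_id, position, file_name) tuples.
    Returns a dict mapping each tape_id to its file names in read order.
    """
    by_tape = defaultdict(list)
    for tape_id, position, file_name in requests:
        by_tape[tape_id].append((position, file_name))
    # Each tape is mounted once; its files are then read in a single forward pass.
    return {tape_id: [name for _, name in sorted(items)]
            for tape_id, items in sorted(by_tape.items())}


pending = [("T2", 900, "check_0412.tif"), ("T1", 500, "check_0007.tif"),
           ("T2", 100, "check_0398.tif"), ("T1", 20, "check_0001.tif")]
for tape_id, files in plan_reads(pending).items():
    print(tape_id, files)
```

In this example each tape is mounted once and read front to back, instead of being swapped and repositioned for every request in arrival order.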
StorHouse is not solely a tape archiving solution. It is a complete HSM system that migrates the
least-used files to tape. By definition, this means that all the files in the tape archive are likely
to be the ones best served by serially streaming media.
Lowell Rapaport