Intelligent Enterprise featuring Transform

March 2000

Store More, Pay Less with Tape Archive Solutions

By Lowell Rapaport

The number-one tape library application may be backup, but this storage choice is also popular for archive applications like document management, batch processing and data warehousing. Banks, insurance companies and other organizations that regularly download large blocks of data often discover that tape archive solutions can be a better choice than optical storage.

Almost any kind of data that is kept for long periods of time can be kept in a tape archive. Common examples are images of cancelled checks or medical records. If you have data that can be kept off-line, tape is an excellent medium. Kept in storage, a tape can last for 10 years or longer. There is a greater danger of a tape becoming obsolete than of it going bad. Most companies will move their data to more modern media long before a tape loses its data.

Since off-line archives don't need to be kept in a large library, you can use a smaller library for just the most current files. With modern, high-capacity tape drives, even a small library can have a huge storage capacity. As an example, Overland Data (www.overlanddata.com) makes the Enterprise Xpress, a system with two or four drives and 26 or 52 tape cartridge slots. In the base configuration, the Enterprise Xpress is a high-end backup system. With add-on modules, it can be built into a library with up to ten tape drives and 100 slots. Using the current DLT 8000 drives, this gives you a maximum uncompressed capacity of 4 terabytes (TB).
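The capacity figures quoted here follow from multiplying the slot count by the 40 GB uncompressed capacity of a DLT IV cartridge written on a DLT 8000 drive. A minimal Python sketch of that arithmetic, assuming decimal units (1 TB = 1,000 GB) and no compression; the helper name is illustrative:

```python
DLT_IV_GB = 40  # uncompressed capacity of a DLT IV cartridge on a DLT 8000 drive

def library_capacity_tb(slots: int, gb_per_cartridge: float = DLT_IV_GB) -> float:
    """Raw library capacity in terabytes, using decimal units (1 TB = 1,000 GB)."""
    return slots * gb_per_cartridge / 1000

# Slot counts quoted in the text.
configs = {
    "Enterprise Xpress, fully expanded": 100,
    "HP modular library, six drives": 60,
    "HP large library, ~700 slots": 700,
}
for name, slots in configs.items():
    print(f"{name}: {library_capacity_tb(slots):.1f} TB")
```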

The new SureStore DLT Tape Libraries from Hewlett-Packard (www.hp.com/go/automated.com) also use a modular design. The base unit is a two-drive, 20-slot library. Up to two identically sized modules can be added to reach a maximum of six drives and 60 slots (2.4 terabytes with DLT 8000 drives). If 60 slots isn't enough, Hewlett-Packard also makes the SureStore E libraries, which can house nearly 700 tapes (for a whopping 28 terabytes).

Another small library of note is the Scalar 100 from ADIC (www.adic.com). The Scalar 100 is a six-drive, 60-slot library. What sets it apart from the comparably sized SureStore 6/60 is a thin server that lets you connect it directly to a network. Network connections are generally too slow for backup (100Base-T connections max out at 12.5 Mbytes/sec. and are usually much slower), so a network-attached Scalar 100 is almost certain to be used as an archive. At $26,000 for a unit with 2.4 terabytes of capacity (uncompressed, with DLT 8000 drives), the Scalar 100 is an inexpensive way for a small office or workgroup to gain a lot of near-line storage.
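A rough way to see why the network link, rather than the drives, sets the pace is to compare the link's theoretical ceiling with the drives' combined streaming rate. This sketch assumes all six drives are DLT 8000s at 5 Mbytes/sec each (the figure given in the sidebar on tape formats) and ignores protocol overhead, so real network throughput would be lower still:

```python
ETHERNET_MBIT_S = 100   # 100Base-T line rate, megabits per second
DLT8000_MB_S = 5        # DLT 8000 native transfer rate, megabytes per second

def link_ceiling_mb_s(mbit_s: float) -> float:
    """Best-case payload rate: 8 bits per byte, ignoring protocol overhead."""
    return mbit_s / 8

def drives_aggregate_mb_s(drives: int, per_drive_mb_s: float = DLT8000_MB_S) -> float:
    """Combined rate if every drive streams at its native speed."""
    return drives * per_drive_mb_s

link = link_ceiling_mb_s(ETHERNET_MBIT_S)
tape = drives_aggregate_mb_s(6)
hours_per_tb = 1_000_000 / link / 3600   # 1 TB = 1,000,000 MB, moved at the link ceiling

print(f"Link ceiling: {link} MB/s; six drives: {tape} MB/s")
print(f"Drives one link can keep streaming: {link / DLT8000_MB_S:.1f}")
print(f"Hours to move 1 TB over the link: {hours_per_tb:.1f}")
```

Even at the theoretical ceiling, a single 100Base-T link keeps only two or three of the six drives busy, and moving a terabyte takes most of a day.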

Larger libraries can take on multiple roles. After reserving enough tape cartridges for a month's worth of daily backups, the rest of the library can be used for a near-line archive. This saves money by making more efficient use of the library. A backup-only solution may only be used for a few hours at night. A library used for both backup and archiving makes the investment in hardware and software work 24 hours a day.

The P3000 library from ATL (www.atl.com) takes a page from RAID and other high-availability devices. It comes with hot-swappable fans, power supplies and DLT drives, making it easy to maintain. According to Frank Berry, vice president of marketing at ATL, data warehousing is the driving force behind the use of tape libraries for near-line archival storage.

"Companies use a large library like the P3000 for their data warehouse and they [use RAID for] smaller 'data marts' at the department level," Berry explains. Since a department only needs to keep information specific to its mission, it can use a RAID system locally. The data warehouse is then updated periodically from the department's computers.

So far, all the libraries described here use DLT tapes. DLT tape drives are favored by library manufacturers and users because the cartridges have a large uncompressed capacity, are reliable and are fast when streaming data. But DLT has limitations when used in archival applications. In particular, DLT tapes have a long average file access time of about a minute, which isn't nearly as fast as some rival tape formats.

Two 8-millimeter formats, Sony's AIT and Exabyte's Mammoth, can store more data than DLT while providing faster access times (15 seconds and 30 seconds, respectively). AIT has the added advantage of its memory-in-cartridge technology. The memory chip mounted in the cartridge stores the tape's identity and directory information. When supported by software, this lets an AIT tape mount without the drive having to read the tape.

Both Mammoth-2 and AIT-2 tapes are two-reel cartridges that don't have to be rewound. When the tapes are mounted, archived files are no more than half the length of the tape away from the read head. This is called midpoint loading, and it speeds performance in archive applications.
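The benefit of midpoint loading can be estimated with a simple model. The sketch below is a hypothetical illustration, not a vendor specification: it assumes files are spread evenly along the tape and that positioning time is roughly proportional to the distance the tape travels, then compares loading at the beginning of the tape with loading at its midpoint:

```python
def mean_and_max_distance(load_point: float, samples: int = 10_000) -> tuple[float, float]:
    """Average and worst-case distance (as a fraction of tape length) from the
    load point to a file, assuming files are spread uniformly along the tape."""
    # Centers of `samples` equal slices of the tape, positions in [0, 1].
    positions = [(i + 0.5) / samples for i in range(samples)]
    distances = [abs(p - load_point) for p in positions]
    return sum(distances) / samples, max(distances)

begin = mean_and_max_distance(0.0)    # load at the beginning of the tape
middle = mean_and_max_distance(0.5)   # midpoint loading

print(f"Load at start:    mean {begin[0]:.2f}, worst {begin[1]:.2f} of tape length")
print(f"Load at midpoint: mean {middle[0]:.2f}, worst {middle[1]:.2f} of tape length")
```

Under these assumptions, the average distance falls from half the tape to a quarter, and the worst case from the full length to half.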

The only drawback to these formats, when considering archival applications, is that the only large libraries available are those from Exabyte and ADIC's Scalar 1000. This may change, according to Ray Heineman, vice president of technical services with Breece Hill (www.breecehill.com). Heineman says his company plans to adapt its existing DLT libraries to AIT and Mammoth tapes and drives later this year.

"We actually developed the library robots a couple of years ago," says Heineman, "but only now are users becoming interested in 8 millimeter tape libraries."

The Breece Hill DLT libraries support as many as 12 drives and 420 tape cartridges. With smaller 8-millimeter tapes, the same libraries could hold half again as many cartridges, or about 630.

The largest libraries of all are made by ADIC and StorageTek. ADIC's big libraries are the AML/J, AML/E and AML/2. These models differ in whether they use mobile tape pickers, mobile tape racks or both. They can also mix media, so if you have a large archive and want to move from one tape technology to another, these libraries let you keep the old tapes near-line.

For archiving applications, ADIC offers its "Infinite File Life" technology. This software takes advantage of some tape drives' ability to analyze blank tapes for accumulated errors. When the error rate reaches a predetermined threshold, the system automatically copies the files onto fresh media and the old tape is discarded.
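The threshold logic described above can be sketched as follows. This is not ADIC's implementation; the `Tape` record, the `refresh_if_degraded` function, the scratch-pool list and the error-rate units are all hypothetical, chosen only to show the idea of copying data to fresh media before a cartridge degrades too far:

```python
from dataclasses import dataclass

@dataclass
class Tape:
    label: str
    files: list[str]
    error_rate: float = 0.0   # hypothetical unit, e.g. corrected errors per GB read
    retired: bool = False

def refresh_if_degraded(tape: Tape, threshold: float, blank_pool: list[Tape]) -> Tape:
    """Copy a tape's files to fresh media once its error rate reaches `threshold`.

    Returns whichever cartridge now holds the files (the original or its replacement).
    """
    if tape.error_rate < threshold:
        return tape                    # still healthy: leave it in service
    fresh = blank_pool.pop()           # take a new cartridge from the scratch pool
    fresh.files = list(tape.files)     # copy every file before retiring the old tape
    tape.retired = True                # old cartridge is taken out of service
    return fresh

old = Tape("A00001", ["checks_1999_01.img"], error_rate=3.2)
pool = [Tape("B00001", [])]
current = refresh_if_degraded(old, threshold=2.0, blank_pool=pool)
print(current.label, current.files, old.retired)
```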

Like Exabyte, StorageTek (www.storagetek.com) makes both tape drives and libraries. StorageTek has carved a niche in the high-end enterprise and mainframe market. If you choose StorageTek's high-end 9840 tape drive, which boasts an ultra-fast eight-second average time to file, you can populate its largest library, the Powderhorn, with up to 6,000 slots. You can then combine library storage modules to create enormous libraries. The company boasts that archives of more than 1 billion terabytes can be built (that's 1 million petabytes). If you have more modest needs, the company makes large DLT libraries as well.

StorageTek makes an add-on for its libraries called the Virtual Storage Manager ($390,000). The VSM is a massive disk cache that works with any StorageTek library. It compresses and caches data and acts as a buffer between your server and the tape library. The VSM's cache can be as large as 930 GB. It performs in hardware and firmware the caching functions that most users handle in software on a server. In high-end systems, the VSM absorbs processor-intensive functions and speeds performance.

Tape vs. MO, CD & DVD: Comparisons & Strategies

In the past, imaging and document management archives have most commonly resided in magneto-optical jukeboxes. MO is effective for archival use because it is fast, allowing direct access to the entire archive within seconds. Optical media don't suffer physical wear as tape does, and they have a longer shelf life. MO technology also offers WORM (write once, read many) media alternatives that cannot be erased.

If you retain archives that are accessed in a random and unpredictable way across the entire archive, MO jukeboxes or the latest DVD alternatives will give you the speedy access you'll require. But the fact is, many archives aren't used that way. Generally, only a small portion of the archive is used at any one time.

If you are archiving check images, for example, only the last few months' worth of images might be accessed regularly. The rest, which can be several years' worth, will sit unused most of the time. And yet, you still have to have them near-line. On those occasions when an older check image is required, users expect to be able to retrieve it fairly quickly.

For archives such as this, optical media like MO may not be required. Gigabyte for gigabyte, MO costs roughly eight times as much as tape. A DLT tape costs about $100 for an uncompressed capacity of 40 gigabytes, which translates to $2.50 per gigabyte. In comparison, MO costs about $19 per GB and DVD-R about $10 per GB (for double-sided media).
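The per-gigabyte figures can be checked with a few lines of arithmetic. This sketch uses only the media prices quoted above and ignores drives, jukeboxes and software, so it compares media cost alone; `archive_media_cost` is an illustrative helper:

```python
# Approximate media prices per gigabyte quoted in the text (uncompressed capacity).
dlt_cost_per_gb = 100 / 40   # $100 DLT IV cartridge holding 40 GB
media_cost_per_gb = {"DLT tape": dlt_cost_per_gb, "MO": 19.0, "DVD-R (double-sided)": 10.0}

def archive_media_cost(gigabytes: float, cost_per_gb: float) -> float:
    """Media-only cost of an archive; excludes drives, libraries and software."""
    return gigabytes * cost_per_gb

for medium, per_gb in media_cost_per_gb.items():
    ratio = per_gb / dlt_cost_per_gb
    print(f"{medium}: ${per_gb:.2f}/GB, {ratio:.1f}x tape, "
          f"${archive_media_cost(1000, per_gb):,.0f} per TB")
```

By these numbers MO works out to about 7.6 times the cost of tape, which rounds to the eightfold figure above, and double-sided DVD-R to about four times.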

CDs are inexpensive, but when your archive starts approaching the terabyte range, CD becomes logistically impractical. CDs hold just 650 megabytes each. It would take more than 60 CDs to store as much as a single DLT tape.

If you are going to use tape for a near-line archive, you need a strategy for serving files quickly. Tape is a fast medium for streaming data, but it is less well suited to locating and serving individual files within an archive. One caching method is suggested by Lee Payne, senior product marketing manager at Overland Data.

"You can store small chunks of each file to cache while the entire file still resides on tape," Payne explains. "When a file is requested by a user, the server starts getting the data from the cache while the tape is mounted and the file located." This technique is best applied to large files that would be too big to cache on hard disk.
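Payne's technique might look something like the following minimal sketch. The `HeadCache` class and the `read_from_tape` callback are hypothetical stand-ins for whatever the archive server actually uses; the point is only that the leading chunk of each file is kept on disk and served first, with the remainder coming from tape:

```python
from typing import Callable, Iterator

class HeadCache:
    """Keep the first `head_bytes` of each archived file on disk; the rest stays on tape.

    `read_from_tape(name, offset)` is a hypothetical callback for the slow path:
    mount the cartridge, locate the file and return its bytes from `offset` on.
    """

    def __init__(self, head_bytes: int, read_from_tape: Callable[[str, int], bytes]):
        self.head_bytes = head_bytes
        self.read_from_tape = read_from_tape
        self.heads: dict[str, bytes] = {}

    def archive(self, name: str, data: bytes) -> None:
        # Only the leading chunk is cached; the full file is assumed to be on tape.
        self.heads[name] = data[: self.head_bytes]

    def stream(self, name: str) -> Iterator[bytes]:
        head = self.heads.get(name, b"")
        if head:
            yield head                                # delivered at disk speed
        yield self.read_from_tape(name, len(head))    # remainder fetched from tape

tape = {"scan.tif": b"0123456789"}
cache = HeadCache(4, lambda name, offset: tape[name][offset:])
cache.archive("scan.tif", tape["scan.tif"])
print(b"".join(cache.stream("scan.tif")))
```

A production server would start the tape mount as soon as the request arrives, so that the mount overlaps with delivery of the cached chunk; the generator above serializes the two steps to keep the example short.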

Another approach is to cache the contents of entire tapes to hard disk. For this, you can use a striped RAID 0 array for extra speed and to make many small magnetic disks into a single cache volume. This works best when archive queries are made to just a few tapes.
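Caching whole tapes raises the question of which tape images to keep on disk once the cache fills. The article does not name a policy; the sketch below assumes a least-recently-used (LRU) policy, counts cache capacity in tapes rather than bytes, and uses a hypothetical `load_tape` function to stand in for mounting a cartridge and copying it to the striped volume:

```python
from collections import OrderedDict
from typing import Callable

class TapeImageCache:
    """LRU cache of whole-tape images held on disk (e.g. a RAID 0 volume)."""

    def __init__(self, max_tapes: int, load_tape: Callable[[str], dict[str, bytes]]):
        self.max_tapes = max_tapes
        self.load_tape = load_tape            # hypothetical: mount and copy a whole tape
        self.images: OrderedDict[str, dict[str, bytes]] = OrderedDict()
        self.tape_mounts = 0

    def read(self, tape_id: str, filename: str) -> bytes:
        if tape_id in self.images:
            self.images.move_to_end(tape_id)          # hit: mark as most recently used
        else:
            self.tape_mounts += 1                     # miss: copy the whole tape to disk
            self.images[tape_id] = self.load_tape(tape_id)
            if len(self.images) > self.max_tapes:
                self.images.popitem(last=False)       # evict the least recently used tape
        return self.images[tape_id][filename]

tapes = {"T1": {"a": b"1"}, "T2": {"b": b"2"}, "T3": {"c": b"3"}}
cache = TapeImageCache(2, lambda t: tapes[t])
cache.read("T1", "a"); cache.read("T2", "b"); cache.read("T1", "a")
```

The `tape_mounts` counter shows why the approach works best when queries concentrate on a few tapes: as long as those tapes fit in the cache, repeat requests never touch the library.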

A third method is to use a tape drive technology with the fastest time-to-data specification available, such as the 9840 drive from StorageTek. The 9840 can shuttle and locate files anywhere along a tape's length, and it has an average data access time of just eight seconds. The downside is that the tapes store just 20 GB uncompressed, and the drives are expensive at $34,400. But if you need to store a very large archive (and we're talking Pentagon-like proportions here), libraries using 9840 drives can reach 120 terabytes.

Most library vendors agree that tape is best used for either less frequently accessed data or data that can be downloaded in chunks large enough to take advantage of tape's streaming speed.

"If you can use tape to read and write large sequential blocks of data, you get a speed advantage over other storage media," says Payne. "MO in particular has poor write latency [because it requires two passes to save a file]. If you're saving and downloading large amounts of data, tape can give you a definite performance boost."
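Payne's point about large sequential transfers can be quantified by charging one positioning delay per request and then streaming the rest. A sketch using the article's approximate DLT figures (about a minute to locate a file, 5 Mbytes/sec streaming); the model ignores mount time and compression:

```python
def effective_rate_mb_s(request_mb: float, access_s: float, stream_mb_s: float) -> float:
    """Throughput seen by a reader who pays one positioning delay per request."""
    return request_mb / (access_s + request_mb / stream_mb_s)

# Approximate DLT figures from the article: ~60 s to locate a file, 5 MB/s streaming.
for size_mb in (1, 100, 10_000):
    rate = effective_rate_mb_s(size_mb, 60, 5)
    print(f"{size_mb:>6} MB request: {rate:.2f} MB/s effective")
```

Small requests spend almost all their time positioning, while multi-gigabyte transfers approach the drive's full streaming rate.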

How big does your archive have to be for tape to be practical? Bryce Hein, product marketing manager at ADIC, says tape libraries are practical for archives larger than 300 to 400 gigabytes.

"Don't get a library that exactly fits your archive," Hein says. "You need room for expansion. If you have an archive that is already 300 gigabytes, your library should have at least a terabyte capacity."

Michael Fulps, industry marketing manager at StorageTek, contends the practicality threshold is higher, about one terabyte. "A terabyte sounds like a lot, but is not hard to reach," he explains. "When you collect all the decentralized storage in an enterprise and put it in one place, it's not uncommon to have a terabyte of data."


Tape Formats and Their Tradeoffs

There are at least half a dozen different tape cartridges that can be used for archiving. Some, such as the high-end StorageTek 9840 tape drives (10 Mbytes/sec), are used in high-performance mainframe solutions. More familiar to Imaging & Document Solutions readers will be DLT, AIT, Mammoth-2 and SLR.

DLT is currently the most popular high-end tape drive technology. DLT tape drives are manufactured by Quantum (www.quantum.com) and Tandberg (www.tandberg.com). The transfer rate of a DLT 8000 drive is 5 Mbytes/sec, and the uncompressed capacity (all tape storage capacities given here do not include compression) is 40 GB for a DLT IV tape recorded on a DLT 8000 drive.

AIT-2 is the second generation of Sony's 8-mm tape format. A cartridge holds 50 GB of data, and the drive has a transfer rate of 6 Mbytes/sec. Exabyte's rival 8-mm tape technology, Mammoth-2, stores 60 gigabytes per tape and has a transfer rate of 12 Mbytes/sec.

In addition to DLT, Tandberg makes SLR tape drives. SLR drives use quarter-inch tape (QIC). The latest version, the SLR 100, stores 50 gigabytes per tape and transfers data at 5 Mbytes/sec.

On the horizon are two potential replacements for DLT tape. Ultrium, a single-reel linear tape based on the Linear Tape Open (LTO) standard developed by IBM, Seagate and Hewlett-Packard, is expected to store 100 gigabytes per tape and have a transfer rate of 10 Mbytes/sec. Unfortunately, Ultrium will not be backward compatible with existing DLT tape.

Quantum's Super DLT (SDLT) is expected to have full backward compatibility with existing DLT tapes. SDLT tapes are expected to store more than 100 gigabytes with a transfer rate of more than 10 Mbytes/sec. However, neither Ultrium nor SDLT is available yet. Developers of both products expect to start shipping drives in mid-2000.

There are a number of alternatives to these tape technologies. Low-end Travan and DDS tape drives are primarily designed for desktop and workstation backup. Benchmark (www.benchtape.com) makes a DLT1 tape drive ($1,500) that uses some of the same technology found in DLT 8000 tape drives, including the same DLT IV tapes, but with a transfer rate of 3 Mbytes/sec.
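One practical consequence of these specifications is how long it takes to fill a cartridge at the native transfer rate. The sketch below tabulates the capacities and rates listed in this sidebar (the Ultrium figures are the expected ones) and assumes decimal units and sustained streaming without compression:

```python
from typing import NamedTuple

class TapeFormat(NamedTuple):
    name: str
    native_gb: float   # uncompressed capacity per cartridge
    mb_per_s: float    # native transfer rate, megabytes per second

FORMATS = [
    TapeFormat("DLT 8000 (DLT IV tape)", 40, 5),
    TapeFormat("AIT-2", 50, 6),
    TapeFormat("Mammoth-2", 60, 12),
    TapeFormat("SLR 100", 50, 5),
    TapeFormat("Ultrium (expected)", 100, 10),
]

def hours_to_fill(fmt: TapeFormat) -> float:
    """Time to write a full cartridge at the native rate (1 GB = 1,000 MB)."""
    return fmt.native_gb * 1000 / fmt.mb_per_s / 3600

for fmt in FORMATS:
    print(f"{fmt.name:<24} {fmt.native_gb:>5.0f} GB {fmt.mb_per_s:>4.0f} MB/s "
          f"{hours_to_fill(fmt):4.1f} h to fill")
```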

—Lowell Rapaport


Software Eases Tape Management

To use a tape library as an archive, you need some means to make the library appear as a logical volume on your computer. Software on your tape library server takes care of file access, media management and hard disk caching needed to make it practical to interact with a tape library.

Jukebox management software can help here. According to Jerry Held, vice president of the storage management division at OTG Software (www.otg.com), the company's Disk Xtender software can manage tape archives as large as 30 terabytes. An example of a typical application is cancelled check archiving.

"We have customers who switched from keeping cancelled checks to keeping digitized images in a tape archive," says Held. "It would take as long as a week to find a paper check, but by keeping check images on a tape archive, the time it takes to bring one up is reduced to just a few minutes."

When tape archives follow the 90/10 rule, that is, when 10 percent of the archive is used 90 percent of the time, tape archive software can cache the most frequently used parts of the library on hard disk or a striped RAID array. Sutmyn Storage (www.sutmyn.com) makes Scimitar Virtual Tape Server, a tape archive system in which entire tape volumes can be cached on magnetic disk storage. The caching is transparent to application software, which simply sees hard disk access speeds for cached volumes.
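The payoff from the 90/10 rule can be estimated with a weighted average of disk and tape access times. The sketch assumes, hypothetically, a 10 ms disk access and uses the roughly one-minute DLT file access time mentioned earlier; it also assumes the cache holds exactly the 10 percent of data that receives 90 percent of requests:

```python
def mean_access_s(hit_ratio: float, disk_s: float, tape_s: float) -> float:
    """Expected time to first byte when `hit_ratio` of requests are served from disk."""
    return hit_ratio * disk_s + (1 - hit_ratio) * tape_s

DISK_S = 0.010   # hypothetical disk access time
TAPE_S = 60.0    # approximate DLT average file access time from the article

with_cache = mean_access_s(0.9, DISK_S, TAPE_S)   # 90/10 rule: 90% of requests hit the cache
tape_only = mean_access_s(0.0, DISK_S, TAPE_S)
print(f"{with_cache:.2f} s with cache vs. {tape_only:.0f} s from tape alone")
```

Because the 10 percent of requests that miss still wait a full minute, they dominate the average; caching cuts the expected wait roughly tenfold rather than to disk speed.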

An opposite tactic is taken by FileTek (www.filetek.com). Its StorHouse software doesn't cache tapes to hard disk. Instead, it uses intelligent queue management to group file requests first by tape, to prevent thrashing, and then by position on the tape. Grouping the requests made to each tape this way lets them be read sequentially, taking advantage of tape's high streaming speed and reducing shuttling. Shuttling occurs when the tape drive repositions a tape back and forth while locating individual files; on a small scale, it is similar to thrashing media in a library.
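The queue ordering described above, grouping by tape and then by position, can be sketched in a few lines. This is not FileTek's code; the `Request` record and `schedule` function are hypothetical, and tapes are visited in label order purely for simplicity (a real scheduler might instead start with the tape already in a drive or the oldest pending request):

```python
from collections import defaultdict
from typing import NamedTuple

class Request(NamedTuple):
    tape: str        # cartridge holding the file
    position: int    # e.g. block offset of the file on that tape
    filename: str

def schedule(requests: list[Request]) -> list[tuple[str, list[str]]]:
    """Group queued requests by tape, then order each group by position on the tape.

    Each tape is mounted once and read in one direction, so the drive streams
    instead of shuttling back and forth between files.
    """
    by_tape: dict[str, list[Request]] = defaultdict(list)
    for req in requests:
        by_tape[req.tape].append(req)
    plan = []
    for tape in sorted(by_tape):                               # one mount per tape
        ordered = sorted(by_tape[tape], key=lambda r: r.position)
        plan.append((tape, [r.filename for r in ordered]))
    return plan

queue = [Request("T2", 500, "c"), Request("T1", 900, "b"),
         Request("T2", 100, "d"), Request("T1", 10, "a")]
print(schedule(queue))
```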

StorHouse is not solely a tape archiving solution. It is a complete HSM (hierarchical storage management) system that migrates the least-used files to tape. By definition, this means that all the files in the tape archive are likely to be the ones best served by serial streaming media.

—Lowell Rapaport

 




A Publication of the Network Computing Enterprise Architecture Group
Brought to you by CMP Media LLC, Copyright © 2005