July 2002
ON STORAGE
Innovative Storage Tames 'Fixed' Content
by Lowell Rapaport
According to storage giant EMC of Hopkinton, MA, up to three quarters of all digital data is
"fixed" content. Once saved, fixed content never changes; it is only retrieved and viewed. Examples
include archived e-mails, documents, content posted to Web sites, product manuals and
specifications, and scientific content such as medical images and geologic data. EMC developed a new
storage technology, called Centera, specifically to address the needs of fixed, unstructured
content. "Storage for fixed content needs to guarantee authenticity, online access, long-term
retention, scalability, location-independent access and low cost," explains Ken Steinhardt, EMC's
director of technology analysis.
To satisfy all these needs, EMC developed what the company calls a new storage category, content
virtualization. Content virtualization essentially replaces the mounted volumes, drive letters,
directories and subdirectories found in existing storage systems. Instead, Centera uses an
application programming interface (API) to let developers send files to a self-managed storage
device instead of creating a path through a complex directory structure. The storage device returns
an identification number to the application based on the content of the file. Every distinct file
gets a unique identifier.
"The only way for two files to have identical ID numbers is for the content of the files to be
identical," says Steinhardt.
Basing file identifiers on content makes Centera inherently reliable. If a file's contents don't
match its ID number, then Centera knows that the content has been corrupted. If Centera discovers
two files with the same ID number, then it knows the contents of the two files are identical. This
helps the system avoid file duplication. The ID number scheme also supports versioning. When a file
is changed, the modified file gets a new ID number based on the modified content. Centera inherently
protects and preserves multiple versions of content under development and prevents tampering with
old content.
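The identifier scheme described above can be sketched in a few lines of Python. This is a minimal illustration, not Centera's API: the `ContentStore` class and its methods are hypothetical, and SHA-256 is assumed as a stand-in for whatever identifier function Centera actually uses, which the article does not specify.

```python
import hashlib


class ContentStore:
    """Toy in-memory content-addressed store (hypothetical, not the Centera API)."""

    def __init__(self):
        self._blobs = {}  # content ID -> stored bytes

    @staticmethod
    def content_id(data: bytes) -> str:
        # Assumption: SHA-256 stands in for Centera's real identifier scheme.
        return hashlib.sha256(data).hexdigest()

    def put(self, data: bytes) -> str:
        cid = self.content_id(data)
        # Identical content yields the same ID, so a duplicate is stored only once.
        self._blobs.setdefault(cid, data)
        return cid

    def get(self, cid: str) -> bytes:
        data = self._blobs[cid]
        # Integrity check: the stored bytes must still hash to their own ID.
        if self.content_id(data) != cid:
            raise ValueError(f"content for {cid} is corrupted")
        return data

    def __len__(self) -> int:
        return len(self._blobs)


store = ContentStore()
v1 = store.put(b"Product manual, revision 1")
dup = store.put(b"Product manual, revision 1")  # same content, same ID
v2 = store.put(b"Product manual, revision 2")   # changed content, new ID
```

In this sketch the application keeps only the returned ID. Saving the same bytes twice adds nothing to the store, and saving a revised file leaves the earlier version untouched under its original ID.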
Synopsis
Vendor: EMC, Hopkinton, MA
www.emc.com
Product: Centera
Description: Network-attached storage optimized for fixed content.
Strengths: Strong redundancy and scalability. Platform independent.
Weaknesses: Requires specific support from application developers.
Price: $210,000 for five terabytes.
Another advantage of content virtualization is that it makes storage independent of the host
system. Centera is a network-attached storage device, so applications interact with it over Internet
Protocol (IP). This approach spares application developers from the messy business of keeping track of drive
volumes and paths. Content repositories store only a file's ID number along with metadata
identifying the content.
"Before content virtualization, archives had to store both metadata and a file's complete path,"
says Steinhardt. "Because the path was open to the operating system, there was always the
possibility of a file getting lost. Users and applications don't interact directly with Centera's
internal file structure, so there is less risk of losing a file."
Rounding out the Centera package is redundancy based on large arrays of clustered servers.
Entry-level systems use a cluster of 16 servers. Each server holds up to four drives of up to 160 GB
each. To keep costs down, Centera relies on standard hardware: Pentium III processors, IDE hard
drives and Linux for its underlying operating system. Clustering the servers gives Centera load
balancing and automatic failover. In addition, Centera's internal software mirrors data across
drives and servers. More complex than RAID 1, which simply mirrors drives, Centera's mirroring puts
a file and its copy on different servers.
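One way to picture mirroring across servers rather than across drives is a placement rule that never puts both copies on the same machine. The sketch below is purely illustrative: the `place_mirrored` function, the node names and the ring-style rule are assumptions, and the article does not describe how Centera actually chooses servers.

```python
import hashlib


def place_mirrored(content_id: str, servers: list[str]) -> tuple[str, str]:
    """Pick two distinct servers for a file and its mirror copy."""
    if len(servers) < 2:
        raise ValueError("mirroring across servers needs at least two servers")
    # Hypothetical rule: derive a starting index from the content ID,
    # then place the copy on the next server in the ring.
    start = int(hashlib.sha256(content_id.encode()).hexdigest(), 16) % len(servers)
    primary = servers[start]
    mirror = servers[(start + 1) % len(servers)]
    return primary, mirror


# An entry-level cluster of 16 servers, with made-up node names.
servers = [f"node-{i:02d}" for i in range(16)]
primary, mirror = place_mirrored("example-content-id", servers)
```

Because the two copies always land on different servers, losing a whole server (not just one drive) still leaves one copy of every file reachable.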
Centera systems scale from five terabytes across 16 servers to more than a petabyte across nearly
3,600 individual clustered servers.
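These capacity figures are consistent with the hardware described earlier if every file is stored twice. The following back-of-the-envelope check assumes fully populated servers (four 160 GB drives each), two copies of all data, decimal terabytes and a round 3,600 servers for the largest configuration; the `usable_tb` helper is hypothetical and none of these assumptions is confirmed by EMC.

```python
def usable_tb(servers: int, drives_per_server: int = 4,
              drive_gb: int = 160, copies: int = 2) -> float:
    """Usable capacity in decimal terabytes when every file is stored `copies` times."""
    raw_gb = servers * drives_per_server * drive_gb
    return raw_gb / copies / 1000


entry = usable_tb(16)     # 16 * 4 * 160 = 10,240 GB raw, about 5.12 TB usable
large = usable_tb(3600)   # 2,304,000 GB raw, about 1,152 TB (1.15 PB) usable
```

Under those assumptions the entry-level cluster works out to roughly five terabytes and the largest configuration to a little more than a petabyte.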
"It's a pretty impressive solution," says Galen Schreck, an analyst with Forrester Research in
Cambridge, MA. "Content virtualization software has been around for some time, but EMC is one of the
few companies with the clout to attract developers and become a standard."
E-mail archiving and document management integrations with Centera are already available. OTG,
Tower Technology and Artesia Technologies are among the many content management software vendors
that have announced or completed integrations with Centera.
As for customers, one of the early adopters of Centera is Framingham, MA-based Connected Corp., a
backup software vendor with an application service provider business protecting 102 TB of data for
its online customers.
"We specialize in software that backs up every computer our clients own, including desktop PCs,
workstations and servers," says Tom Hickman, Connected's engineering operations manager. "An
enterprise backup is the result of multiple incremental backups made on a daily basis." Hickman says
a large backup customer for Connected can have as many as 80,000 individual computers, resulting in
80,000 incremental backups every day. Prior to deploying Centera, Connected used a two-stage backup
solution. Computers were first backed up to a RAID array, and the backups were then moved from the array to tape.
"Centera's built-in mirroring makes it reliable enough for backup," says Hickman. "The clustering
makes it fast enough for an incremental enterprise backup. Plus, since it's online storage,
restoration of files is quick."
Hickman says Connected expects to save money with Centera. "It's too early to determine total
cost of ownership, [but] we currently need an administrator for every 25 TB of tape storage. Centera
should let a single administrator manage up to 100 TB."