September 2001
ON STORAGE
Content Conundrum: Storage Management
by Lowell Rapaport
Robust storage management is essential for any content management system. These systems rely on complex databases and must store metadata, usage statistics, user data and the content itself. Each type of data has different storage requirements.
Legacy enterprise document management systems (EDMSs) include storage management. Many of these systems were developed when storage cost about $1 per MB and network bandwidth was in short supply. Using hierarchical storage management, they had to balance the cost of keeping files in fast, expensive local storage against the cost of consuming scarce network bandwidth to fetch them from remote storage.
In contrast to EDMSs, many of the newer Web content management systems do not offer storage management. This is partly because storage is much cheaper and bandwidth much more plentiful today. There are also more storage options, including storage management software, storage area networks (SANs), DVD jukeboxes and network attached storage (NAS).
According to Pat Turocy, principal analyst with Doculabs, Chicago, storage options within content management systems can be redundant. "If a company already has storage management [as part of a SAN or archiving system] then there is no need to double buy," she says. "Plus, an independent storage management system can be used for any application, not just content management."
Storage is not a core competency for content management vendors, points out Erik Ottem, senior marketing executive at SAN equipment provider Gadzoox Networks, San Jose, CA. "Storage is becoming a utility, letting application vendors concentrate on content," Ottem says. "The cost of storage is so low that it is no longer necessary to micromanage to gain scalability."
However, there are potential pitfalls if storage systems aren't carefully matched to the needs of content management. "It is possible to lose the links between content and metadata in a database when content is migrated out of primary storage," Turocy says.
Storage management systems can only see file attributes. Without specific instructions, these systems have no idea which files are essential and which are obsolete, and they can't update a database after changing the location of files that the database points to.
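The broken-link problem Turocy describes can be illustrated with a small simulation. This is a minimal Python sketch under stated assumptions: the `content` table, its `path` column and the helper functions are hypothetical, and an in-memory SQLite database stands in for a content management repository. A "blind" migration moves a file the way a content-unaware storage manager would, while an "aware" migration also updates the metadata row.

```python
import os
import shutil
import sqlite3
import tempfile

def make_repository(root):
    """Create a toy repository (hypothetical schema): one file plus a metadata row pointing to it."""
    primary = os.path.join(root, "primary")
    os.makedirs(primary)
    path = os.path.join(primary, "report.pdf")
    with open(path, "wb") as f:
        f.write(b"%PDF-1.4 example")
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE content (id INTEGER PRIMARY KEY, title TEXT, path TEXT)")
    db.execute("INSERT INTO content (title, path) VALUES (?, ?)", ("Q2 report", path))
    db.commit()
    return db

def migrate_blind(path, archive_dir):
    """Move a file as a storage manager that sees only file attributes would; the database is untouched."""
    os.makedirs(archive_dir, exist_ok=True)
    return shutil.move(path, archive_dir)

def migrate_aware(db, content_id, archive_dir):
    """Move a file and update its metadata row so the link to the content survives."""
    (old_path,) = db.execute("SELECT path FROM content WHERE id = ?", (content_id,)).fetchone()
    os.makedirs(archive_dir, exist_ok=True)
    new_path = shutil.move(old_path, archive_dir)
    db.execute("UPDATE content SET path = ? WHERE id = ?", (new_path, content_id))
    db.commit()
    return new_path

def link_is_valid(db, content_id):
    """Check whether the path recorded in the database still points at an existing file."""
    (path,) = db.execute("SELECT path FROM content WHERE id = ?", (content_id,)).fetchone()
    return os.path.exists(path)

def demo():
    """Run both migrations in a scratch directory and report whether each link survived."""
    with tempfile.TemporaryDirectory() as root:
        blind_db = make_repository(os.path.join(root, "blind"))
        (path,) = blind_db.execute("SELECT path FROM content WHERE id = 1").fetchone()
        migrate_blind(path, os.path.join(root, "blind", "archive"))

        aware_db = make_repository(os.path.join(root, "aware"))
        migrate_aware(aware_db, 1, os.path.join(root, "aware", "archive"))

        return link_is_valid(blind_db, 1), link_is_valid(aware_db, 1)

if __name__ == "__main__":
    print(demo())  # prints (False, True)
```

The only difference between the two migrations is the `UPDATE` statement, which is exactly the step a storage manager working from file attributes alone has no way of performing.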
The most obvious advantages of building storage management functions into content management are the potential performance gains from fine-tuning storage. If, for example, content comes in many different sizes, large and small files can be directed to separate RAID systems optimized for each. Content that is the subject of active searches, such as metadata, can be kept on fast storage, while dated and inactive content is placed on near-line storage. Scalability is gained not by adding raw storage capacity but by using available storage more efficiently.
Some vertical applications may also require a content management system that can control storage. For regulatory reasons, financial or medical data may need to be kept on optical storage, so the content management system must be able to identify regulated content and put it in archival storage.
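The placement rules described in the last two paragraphs amount to a policy that a storage-aware content management system could apply to each item. The following is a minimal Python sketch, not any vendor's implementation: the tier names, the 10 MB size cutoff, the 180-day inactivity window and the `ContentItem` fields are all hypothetical assumptions chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date, timedelta

# Hypothetical tier names; a real system would map these to actual devices.
FAST_DISK = "fast disk"
RAID_SMALL = "RAID tuned for small files"
RAID_LARGE = "RAID tuned for large files"
NEAR_LINE = "near-line storage"
OPTICAL_ARCHIVE = "optical archive"

LARGE_FILE_BYTES = 10 * 1024 * 1024   # assumed cutoff between "small" and "large" files
INACTIVE_AFTER = timedelta(days=180)  # assumed age at which content counts as inactive

@dataclass
class ContentItem:
    name: str
    size_bytes: int
    last_accessed: date
    is_metadata: bool = False
    regulated: bool = False  # e.g. financial or medical records

def choose_tier(item: ContentItem, today: date) -> str:
    """Pick a storage tier using only what the content system knows about the item."""
    # Regulated content goes to archival storage regardless of how active it is.
    if item.regulated:
        return OPTICAL_ARCHIVE
    # Actively searched data such as metadata stays on the fastest storage.
    if item.is_metadata:
        return FAST_DISK
    # Dated, inactive content moves to cheaper near-line storage.
    if today - item.last_accessed > INACTIVE_AFTER:
        return NEAR_LINE
    # Remaining active content is split by size between RAID sets tuned for each workload.
    return RAID_LARGE if item.size_bytes >= LARGE_FILE_BYTES else RAID_SMALL

if __name__ == "__main__":
    today = date(2001, 9, 1)
    items = [
        ContentItem("patient-chart.tif", 400_000, date(2001, 8, 30), regulated=True),
        ContentItem("search-index.db", 2_000_000, date(2001, 8, 31), is_metadata=True),
        ContentItem("1999-press-kit.zip", 5_000_000, date(1999, 6, 1)),
        ContentItem("product-demo.mpg", 80_000_000, date(2001, 8, 28)),
        ContentItem("faq.html", 12_000, date(2001, 8, 31)),
    ]
    for item in items:
        print(f"{item.name} -> {choose_tier(item, today)}")
```

The order of the checks is a design choice: regulatory placement is tested first so that a frequently accessed medical record is still archived, and size is consulted last because it only matters for content that stays on primary storage.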
When a content management system directly controls storage, the system is aware of the available storage devices and their capabilities. Administrators have the advantage of a single console through which they can manage content and storage at the same time.
Content management systems can also control storage indirectly through a separate storage management system, but this requires careful integration and customization. The advantage of the indirect model is that the storage management system is also available to other applications, lowering storage costs and giving administrators more flexibility.
No matter which approach you choose, you must think about storage infrastructure when you set up any content management system. If you expect your system to remain relatively small, then you may get by with separate storage management. If, however, you need the content management system to scale up while maintaining optimal performance, then consider the advantages of tighter integration.