December, 1996
Easing the Growing Pains of an Enterprise System
In a world governed by mortality, everything has its life cycle, even your enterprise imaging and document management system. It's your job to make sure the enterprise system you employ today can grow up and play with all the other enterprise system components tomorrow.
Not long ago -- a mere decade or so -- there weren't any enterprise systems in existence. That's changed radically. Today, nearly every company connects LANs and WANs across departments or across applications. They share buckets of information in one large structure or between many geographical sites.
Existing systems range from four to 11 years old, and technology keeps advancing. Since an enterprise system affects just about every aspect of your network and desktops, it needs to grow with the technology used to upgrade those devices.
How you connect your system throughout your enterprise depends on your needs. Often, a single department uses an imaging system to capture, process and share information across many seats. The department loves the system so much that it catches on, and before you know it, the entire company and its many sites are connected. Voilà... An Enterprise is Born.
Well, that's one way. Another path starts with a need to share information throughout a company. There may have been one application that had to be shared among 100 users at one site. Over time, that application expands to multiple applications, seats and sites. This scenario involves more than just letting departments talk to each other -- it's about letting everyone on your LAN/WAN communicate.
An enterprise system's conception and birth are one aspect of a purchasing decision. Picking the features you want is another. You have a little more control than you might think over how your system will develop and survive. Not only does technology change, but so do business needs. You need a system with chameleon-like characteristics, one that blends into anything new you impose on it.
"Cost of ownership is a hidden cost that a lot of companies ignore when they implement enterprise software," says Larry Warnock, director of market development at Documentum (Pleasanton, CA 510-463-6800). "You need to define what you want it to do and how you're going to judge if it did it. Then make your business decision on that criterion only."
There are many issues to consider, such as scalability and the system's ability to support a corporate infrastructure. To evaluate this, find out which platforms the software supports (NT, Unix, etc.). Chances are your enterprise runs several different networks and operating systems, and you need something that will support all of them.
The life expectancy of your enterprise system isn't in your hands alone. Responsibility also lies with the company that sells you the software. If the vendor doesn't have a strategy for backward compatibility and a direction for growth, neither do you. What good is a system from a software company that won't be around in three years to upgrade it?
Vendor support is like a child's doctor. Your child can go to school, ride his bike down to the park, but what if the kid falls off the bike? OK, you can fix a scratched knee, but can you set a broken leg? The doctor can. Your enterprise vendor should be able to keep you up and running.
This includes on-site support, updates, bug fixes from the developer and new releases. A system plagued by the same problems over and over, like a child far from a hospital or doctor who keeps breaking bones, will grow up deformed.
The Doctor's In
Trying to predict the life of your enterprise system, like that of your child, is hard. You can nurture it, take care of it, but if it's got defects, you're in for heartache. At the same time, how long it will live isn't important if it doesn't do what you need. (The enterprise system, of course, not the child.)
Think of a library and the books, magazines, newspapers, videos, CDs and LPs it holds in one building. All that information has to be organized and easily accessible. If you saw something there once, it had better still be there when you need it. The same thing applies to an enterprise system and the many documents, images and other information it contains. Where everything is and how it's being used is recorded in an index.
Want more info? Try the Internet. It has oodles of information just waiting to be discovered, and that world of information is making tracks into enterprise systems through intranets. Some companies say that the Web, which makes Internet information accessible through a browser, is the key to an enterprising future. The actual impact of the Internet remains to be seen.
A company's infrastructure is only as strong as its weakest link. In a WAN, a good number of people have access to a lot of information. Securing the data, regardless of where it sits in the enterprise and who sees it, is key. The security these enterprise systems provide will make all the difference.
It's one thing to make sure your data is secure, but making sure your data is there is quite another. Attaching jukeboxes to the enterprise system has been a tried-and-true method, but falling prices and rising capacities for magnetic disk may be giving jukes a run for their money. That trend applies to active data; storage decisions for inactive data also need to be considered when you're looking at an enterprise system.
We take a look at how companies prescribe solutions for these integral parts of an enterprise system.
Check Out the Enterprise Library
The index, sometimes called the catalog, is the heart of an enterprise system. It organizes all the documents and images in a system. Indexing is how each item is classified and stored.
Sometimes the nature of the information defines how it should be indexed. General correspondence is often best suited to full-text indexing, while accounting transactions, invoices and the like are better indexed using standard index fields such as account number.
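To make the distinction concrete, here is a minimal Python sketch of a catalog that keeps both kinds of index over the same collection: a field index for structured documents such as invoices and a full-text (inverted) index for correspondence. The class and method names are hypothetical and do not represent any vendor's API.

```python
import re
from collections import defaultdict

class Catalog:
    """Hypothetical catalog holding a field index and a full-text index."""

    def __init__(self):
        self.field_index = defaultdict(set)  # (field, value) -> document IDs
        self.text_index = defaultdict(set)   # word -> document IDs

    def add_field_document(self, doc_id, fields):
        # Structured documents (invoices, transactions): index named fields only.
        for name, value in fields.items():
            self.field_index[(name, str(value))].add(doc_id)

    def add_text_document(self, doc_id, text):
        # Free-form correspondence: index every word.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self.text_index[word].add(doc_id)

    def find_by_field(self, name, value):
        return sorted(self.field_index.get((name, str(value)), set()))

    def find_by_words(self, *words):
        # A document matches only if it contains every query word.
        sets = [self.text_index.get(w.lower(), set()) for w in words]
        return sorted(set.intersection(*sets)) if sets else []

catalog = Catalog()
catalog.add_field_document("INV-001", {"account": 4471, "type": "invoice"})
catalog.add_text_document("LTR-007", "Thank you for your prompt payment of invoice 535.")
print(catalog.find_by_field("account", 4471))  # ['INV-001']
print(catalog.find_by_words("invoice", "535"))  # ['LTR-007']
```

A field lookup is a single exact-match probe, while a full-text query intersects the document sets for each word, which is why free-form text suits the second kind of index.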
Another way to approach indexing is to decide who needs access to what type of information in the enterprise system, and how often. Some solutions offer lots of options.
Which brings us to replication. Replication supports a much-needed feature: the ability for many people at many sites to look at one document simultaneously. But if many people are looking at the same document at the same time, they may all have additions they want to make to it. That spells potential danger. You might end up with numerous conflicting versions of a single document running around your enterprise.
One way to deal with this hazard is to maintain a separate annotation layer. Changes are never made to the original image. Instead, someone who accesses a document and wants to add or delete something makes annotations that are attached to the image. Another option is to copy the document, save it as a new document and index it. Whichever option you choose, there's always a trail showing what happened to the document and its data.
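The annotation-layer idea can be sketched in a few lines of Python, assuming a document object whose scanned image is stored as immutable bytes and whose changes accumulate as a separate list of annotation records. The class names and the "add"/"delete" action vocabulary are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone

@dataclass(frozen=True)
class Annotation:
    author: str
    action: str      # hypothetical vocabulary, e.g. "add" or "delete"
    note: str
    when: datetime

@dataclass
class ImageDocument:
    doc_id: str
    image: bytes                                  # original scan, never modified
    annotations: list = field(default_factory=list)

    def annotate(self, author, action, note):
        # Record the change as a layer on top of the image instead of editing it.
        entry = Annotation(author, action, note, datetime.now(timezone.utc))
        self.annotations.append(entry)
        return entry

    def audit_trail(self):
        # Who did what, in the order it happened.
        return [(a.author, a.action, a.note) for a in self.annotations]

doc = ImageDocument("CLM-42", b"...scanned bytes...")
doc.annotate("alice", "add", "Approved for payment")
doc.annotate("bob", "delete", "Strike duplicate line item 3")
```

Because the original bytes are never touched, every viewer sees the same base image, and the annotation list doubles as the audit trail.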
In either case, you need some form of document locking, which the enterprise system should provide. The first person to access a document locks it, and when that person makes changes, the image is updated in the original repository. Other people can still view the document, but they can't make any changes to it. Once the first user closes the document, it unlocks and the process starts again.
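A check-out lock like the one just described might look like this minimal Python sketch. `LockManager` is a hypothetical name, and a real system would keep lock state on the server rather than in one process's memory.

```python
import threading

class LockManager:
    """Hypothetical check-out lock: the first user to open a document may edit it;
    everyone else gets read-only access until that user closes it."""

    def __init__(self):
        self._owners = {}               # doc_id -> user holding the edit lock
        self._guard = threading.Lock()  # protects _owners across threads

    def open(self, doc_id, user):
        """Return "edit" if user now holds (or already held) the lock, else "read-only"."""
        with self._guard:
            owner = self._owners.setdefault(doc_id, user)
            return "edit" if owner == user else "read-only"

    def close(self, doc_id, user):
        with self._guard:
            # Only the lock holder can release the lock.
            if self._owners.get(doc_id) == user:
                del self._owners[doc_id]

locks = LockManager()
print(locks.open("CLM-42", "alice"))  # edit
print(locks.open("CLM-42", "bob"))    # read-only
locks.close("CLM-42", "alice")
print(locks.open("CLM-42", "bob"))    # edit
```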
FileNet (Costa Mesa, CA 714-966-3400) handles indexing in what it calls the document entry subsystem, which offers many ways to index documents. That flexibility matters because indexing needs change, often because of cyclical factors and new procedures. These are just a few of FileNet's extensive indexing options:
During scanning. Documents are indexed as they are scanned, using barcodes or OCR. If you use OCR, a verification step makes sure the OCR was done properly.
Asynchronously from scanning. Someone in another area can access the batch of scanned documents and index them. This person can be anywhere in the enterprise.
Multiple phases based on document type. A document can go into one of several queues (for example, one queue for checks, another for invoices and so on), where it gets indexed even further. This approach is usually driven by time constraints and the importance of the document.
For example, a lot of mail may get dumped on your desk. Some of it may be time-critical. Other pieces may be important (like your tax return check). Then, of course, there's the junk mail. So you divide the mail into separate piles and scan each pile into the system. Once a batch is scanned, it can be separated even further into different queues (a queue is a list of documents or work items), where each document becomes available for different processes.
Divided indexing. In this approach, one person scans while many people index the documents. You pick how many people do the indexing.
Assisted indexing. Typing in a certain value (a word, letter, number, etc.) triggers another application that may hold related information, so the index keeps building on itself. Let's say you have a letter that refers to invoice 535. The imaging system indexes the letter by the invoice number, then calls the invoice application for further indexing. This provides many routes to one document, in this case the letter referring to invoice 535. You can reach it through either the imaging system or the invoice system.
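The multiple-phase and assisted-indexing options above can be sketched together in Python. This is an illustration under stated assumptions, not FileNet's implementation: the queue names are hypothetical, and `INVOICE_APP` is a dictionary standing in for the separate invoice application that would, in practice, be called through whatever interface it exposes.

```python
from collections import deque

# Hypothetical per-type queues, as in the mail-sorting example.
QUEUES = {"check": deque(), "invoice": deque(), "correspondence": deque(), "other": deque()}

# Stand-in for the separate invoice application (an assumption for this sketch).
INVOICE_APP = {"535": {"vendor": "Acme Supply", "amount": "1,200.00"}}

def route(batch):
    """Phase 1: sort a scanned batch into per-type queues."""
    for doc in batch:
        QUEUES.get(doc["type"], QUEUES["other"]).append(doc)

def assisted_index(doc):
    """Phase 2: an indexer keys in an invoice number, and the invoice
    application supplies the remaining index fields."""
    number = doc["index"].get("invoice")
    if number in INVOICE_APP:
        doc["index"].update(INVOICE_APP[number])
    return doc

route([
    {"id": "D1", "type": "check", "index": {}},
    {"id": "D2", "type": "correspondence", "index": {"invoice": "535"}},
    {"id": "D3", "type": "flyer", "index": {}},   # unknown type -> "other" queue
])
letter = assisted_index(QUEUES["correspondence"][0])
print(letter["index"])  # {'invoice': '535', 'vendor': 'Acme Supply', 'amount': '1,200.00'}
```

After the lookup, the letter can be found either by its own index entry or by the vendor and amount fields borrowed from the invoice system, which is the "many routes to one document" effect.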
ViewStar (Alameda, CA 510-337-2000) has a flexible index engine for customers using Oracle or Sybase. It includes partitioning, which defines the physical location and shape of the data on the source document. This resembles an index card in a library's card catalog. Just as you can tell different things about a book by looking at different regions of the card, partitioning extracts index data from defined regions of the document so that it can be managed in a database.
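Zone-based extraction of this kind can be sketched in Python as follows. The sketch assumes the OCR engine returns each recognized word with a pixel bounding box, and the `INVOICE_ZONES` layout is hypothetical; ViewStar's actual partition format is not described here.

```python
from typing import NamedTuple

class Word(NamedTuple):
    text: str
    x: int   # left edge in pixels (assumed OCR output format)
    y: int   # top edge in pixels
    w: int
    h: int

# Hypothetical partition definition for one form layout:
# field name -> (left, top, right, bottom) in pixels.
INVOICE_ZONES = {
    "invoice_no": (1200, 100, 1600, 160),
    "account": (100, 300, 500, 360),
}

def extract_fields(words, zones):
    """Collect the OCR'd words whose centers fall inside each zone."""
    fields = {}
    for name, (left, top, right, bottom) in zones.items():
        hits = [wd for wd in words
                if left <= wd.x + wd.w / 2 <= right and top <= wd.y + wd.h / 2 <= bottom]
        hits.sort(key=lambda wd: (wd.y, wd.x))   # reading order: top-to-bottom, left-to-right
        fields[name] = " ".join(wd.text for wd in hits)
    return fields

page = [Word("535", 1300, 110, 80, 40), Word("ACCT-4471", 150, 310, 200, 40),
        Word("Dear", 100, 600, 90, 40)]
print(extract_fields(page, INVOICE_ZONES))  # {'invoice_no': '535', 'account': 'ACCT-4471'}
```

Words outside every zone (like "Dear" in the body text) are simply ignored, so only the regions that correspond to index fields reach the database.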
The index is stored chronologically and can be taken offline to storage media such as tape or optical disk and kept somewhere else. ViewStar's enterprise system also lets you keep multiple copies of the index at multiple sites, which is particularly helpful in a WAN where people need quick access to remote documents.
IBM (Research Triangle Park, NC 919-543-5221) also handles multiple indices, permitting a primary index and a secondary index. The information they hold also resembles a card catalog entry. This works well in applications where people need to know whether something has been processed. A client (a customer service representative, say) can query the index server and retrieve information about a document without actually viewing it, which speeds up simple searches.
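The "has it been processed?" query can be answered from index data alone, as this minimal Python sketch shows. It assumes a primary index keyed by document ID and a secondary index keyed by customer number; the field names and `image_ref` values are hypothetical, not IBM's schema.

```python
# Hypothetical primary index: document ID -> index record.
PRIMARY = {
    "D100": {"customer": "C-77", "type": "claim", "status": "processed", "image_ref": "vol3/0001"},
    "D101": {"customer": "C-77", "type": "letter", "status": "pending", "image_ref": "vol3/0002"},
}

# Secondary index built from the primary: customer number -> document IDs.
SECONDARY = {}
for doc_id, entry in PRIMARY.items():
    SECONDARY.setdefault(entry["customer"], []).append(doc_id)

def status_for_customer(customer):
    """Answer from index records only; the image at image_ref is never fetched."""
    return {doc_id: PRIMARY[doc_id]["status"] for doc_id in SECONDARY.get(customer, [])}

print(status_for_customer("C-77"))  # {'D100': 'processed', 'D101': 'pending'}
```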
An interactive approach to indexing comes from Zylab (Gaithersburg, MD 301-590-0900). It evolved from looking at how information enters an enterprise. Using Zylab's indexing, you can do updates either in batches or interactively. Batch indexing forces you to lock users out while the updates run and let them back on afterward. Interactive indexing adds information to the full-text system while users continue searching and accessing it, giving them real-time access to information.
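The practical difference between the two update styles comes down to how long searches are shut out. In this hypothetical Python sketch, a single lock guards the index: the batch update holds it for the whole run, while the interactive update holds it for one document at a time, so searches can slip in between. The locking scheme is an assumption for illustration, not Zylab's design.

```python
import re
import threading
from collections import defaultdict

class FullTextIndex:
    """Hypothetical index contrasting batch and interactive updates."""

    def __init__(self):
        self._postings = defaultdict(set)   # word -> document IDs
        self._lock = threading.Lock()       # searches and updates take turns

    def _add(self, doc_id, text):
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            self._postings[word].add(doc_id)

    def batch_update(self, docs):
        # Batch: hold the lock for the entire run, so every search waits
        # until all documents have been indexed.
        with self._lock:
            for doc_id, text in docs:
                self._add(doc_id, text)

    def add_interactive(self, doc_id, text):
        # Interactive: lock for one document only; searches run in between.
        with self._lock:
            self._add(doc_id, text)

    def search(self, word):
        with self._lock:
            return sorted(self._postings.get(word.lower(), set()))

idx = FullTextIndex()
idx.batch_update([("A1", "Quarterly report"), ("A2", "Annual report")])
idx.add_interactive("A3", "Report on claims received today")
print(idx.search("report"))  # ['A1', 'A2', 'A3']
```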
Another option is setting up automated indexing for convenience. This way, the end user doesn't have to intervene to initiate indexing. You can even set up a calendar of indexing events on a desktop client or a client dedicated to the task.
If you handle indexing at the desktop level, the machine had better have plenty of power: enough temporary disk space, CPU horsepower and memory that other applications running on the machine don't suffer during updates. The other options are to use a dedicated client or to schedule indexing so it's done at night. "This is a time-critical decision based on how real-time updates have to be done," said Chesleigh Jonker, Zylab's director of toolkit sales.
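Scheduling indexing for off-hours reduces to computing the next run time. This Python sketch assumes a hypothetical nightly slot at 2:00 a.m.; a real calendar of indexing events would hold several such entries.

```python
from datetime import datetime, time, timedelta

def next_index_run(now, run_at=time(2, 0)):
    """Return when the (hypothetical) nightly indexing job should next start.
    The 2:00 a.m. default is an assumed off-hours slot."""
    candidate = datetime.combine(now.date(), run_at)
    if candidate <= now:
        candidate += timedelta(days=1)   # today's slot has passed; use tomorrow's
    return candidate

print(next_index_run(datetime(1996, 12, 2, 15, 30)))  # 1996-12-03 02:00:00
print(next_index_run(datetime(1996, 12, 2, 1, 15)))   # 1996-12-02 02:00:00
```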
For forms processing and automatic indexing using OCR, ICR and barcode technology, look at Data General's (Westboro, MA 508-366-8911) solution. Express-Track is a batch-scanning and indexing subsystem that interacts with the server component of the AV Image system (another Data General product). It supports up to 200 index fields per document. Point-and-shoot OCR lets you draw a box around any area of a scanned document and OCR it into a field. Any fields can be exported to an external database that resides outside the image database.
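Exporting selected index fields to a database outside the image store might look like this Python sketch, which uses an in-memory SQLite database as a stand-in for the external database. The table layout and the default list of exported fields are hypothetical; only the 200-field limit comes from the description above.

```python
import sqlite3

MAX_FIELDS = 200   # Express-Track's stated per-document limit

def export_fields(conn, doc_id, fields, export=("account", "invoice_no")):
    """Copy selected index fields of one document into an external table
    (hypothetical schema) that lives outside the image database."""
    if len(fields) > MAX_FIELDS:
        raise ValueError(f"{doc_id}: {len(fields)} fields exceeds {MAX_FIELDS}")
    conn.execute("CREATE TABLE IF NOT EXISTS doc_fields "
                 "(doc_id TEXT, name TEXT, value TEXT, PRIMARY KEY (doc_id, name))")
    conn.executemany(
        "INSERT OR REPLACE INTO doc_fields VALUES (?, ?, ?)",
        [(doc_id, name, str(fields[name])) for name in export if name in fields])
    conn.commit()

conn = sqlite3.connect(":memory:")
export_fields(conn, "F-9", {"account": "4471", "invoice_no": "535", "notes": "rush"})
print(conn.execute("SELECT name, value FROM doc_fields ORDER BY name").fetchall())
# [('account', '4471'), ('invoice_no', '535')]
```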
Surfing Through the Fire
The new millennium is fast approaching. It won't be a shock if we're all riding the Internet wave on a long board. The big kahunas in that ocean will be the enterprise systems that take their seats along for the ride.
Feith (Fort Washington, PA 215-646-8000) and Documentum predict this trend will engulf the future. Their enterprise systems are already integrated with Web technology. They index information by field (card catalog) as other non-Internet systems do, and by full text, which seems to be most popular in imaging systems that use Web technology.
Don Feith, president of Feith, says, "Document management in an enterprise should be done through Web technology. It's something that's available to everyone in an enterprise with very little support cost per user. You don't have to give any software to a client other than the Web browser they already have."
Full-text systems index every word in a document. Feith does this with UTR (universal text recognition), which reads all the words in a document entering the system using OCR, barcode or binary text recognition. Feith then loads the text into its full-text retrieval engine, called TREE.
The actual document (or object) is stored on Feith's mass storage server and indexed in an industry-standard database such as Oracle, Informix or Sybase. You can get to the document through two routes: full-text retrieval using TREE or the database index.
Documentum's approach is similar. All documents are controlled by a server (DocPage Server) that stores, controls and indexes documents and Web pages. The field index is stored in a relational database (such as Oracle or Sybase), and the content is indexed with Verity's full-text indexer. Documentum also indexes each document by where it fits in the business process, that is, by its relationship to workflow. The workflow engine is included.
Documentum's newest add-on product, RightSite, extends an enterprise system's reach to the Internet. It brings in existing Web sites and lets you build custom applications on them. A spider traces the hyperlinks of an existing Web site, retrieving all the linked pages and their contents, and downloads the entire site into Documentum's server. In effect, the Web content moves off its old file system and into Documentum's repository, giving the system total control over the Web site. You can work with the site from your desktop and keep track of and manage the documents in it.
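The spider's job is a breadth-first walk of one site's hyperlinks. This Python sketch uses the standard library's HTML parser and URL helpers. It assumes a `fetch(url)` function is passed in (here, a dictionary's `get` method standing in for HTTP) so the example runs without a network. It is a generic crawler, not RightSite's implementation.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urljoin, urldefrag, urlparse

class LinkParser(HTMLParser):
    """Collect href targets from <a> tags."""
    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)

def crawl(start_url, fetch):
    """Breadth-first crawl of one site. fetch(url) returns HTML or None;
    it is injected so the sketch runs without a network (an assumption)."""
    site = urlparse(start_url).netloc
    seen, queue, pages = {start_url}, deque([start_url]), {}
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:
            continue
        pages[url] = html                         # content to load into the repository
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            link, _ = urldefrag(urljoin(url, href))   # absolute URL, #fragment dropped
            if urlparse(link).netloc == site and link not in seen:   # stay on this site
                seen.add(link)
                queue.append(link)
    return pages

SITE = {
    "http://intranet.example/": '<a href="about.html">About</a> <a href="http://other.example/">x</a>',
    "http://intranet.example/about.html": '<a href="/">Home</a> <a href="specs.html#top">Specs</a>',
    "http://intranet.example/specs.html": "<p>Specs</p>",
}
pages = crawl("http://intranet.example/", SITE.get)
print(sorted(pages))
```

The `seen` set keeps the spider from looping on pages that link back to each other, and the host check keeps it from wandering off to other sites.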
A Blanket of Protection
When you have a lot of information available to a lot of people, there's a potential for trouble. What kind of trouble? You don't even want to think about it. Instead, figure out how to prevent it. In an enterprise system, that solution is security.
Often, a company will already have a security system in place, even if they haven't yet migrated to an enterprise level of connectivity. Enterprise software should conform to the security already in place. It shouldn't be the only security mechanism in an enterprise.
Each enterprise system has its own built-in security. Some systems handle security updates (like password changes and user additions and deletions) automatically. Others delegate these tasks through a series of pop-up windows; for example, when you start an application, a window may first ask for your name and password. If you pass, you get in. We talked to different companies to find out how they handle security. This is what we found.
There are different levels of security, defined by the U.S. government's Trusted Computer System Evaluation Criteria (the "Orange Book"). C2 is one of the lower levels; the B levels are more secure. Data General uses B2, which ensures that a specific user can do only particular things. It covers all the users in the enterprise, even the system administrators, and controls what they can and can't do. This level works with Unix. According to Data General, NT can offer only C2 as a security option. Beware!
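The B levels add mandatory access control based on sensitivity labels, which is why even administrators are bound by them. This minimal Python sketch shows the label "dominance" check that such systems apply to reads (the Bell-LaPadula simple security property). It is not Data General's implementation, and the levels and category names are hypothetical.

```python
from enum import IntEnum

class Level(IntEnum):
    # Hypothetical sensitivity levels; real labels are site-defined.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2

def can_read(user_level, user_categories, doc_level, doc_categories):
    """Mandatory check: the user's label must dominate the document's label,
    i.e. an equal or higher level AND every category on the document.
    Ownership or administrator status plays no part in the decision."""
    return user_level >= doc_level and set(doc_categories) <= set(user_categories)

print(can_read(Level.CONFIDENTIAL, {"payroll"}, Level.INTERNAL, {"payroll"}))  # True
print(can_read(Level.CONFIDENTIAL, {"payroll"}, Level.INTERNAL, {"legal"}))    # False
print(can_read(Level.INTERNAL, {"legal"}, Level.CONFIDENTIAL, {"legal"}))      # False
```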
The system administrator in FileNet's solution establishes a profile for every user on the enterprise. The user profile defines an ID, a password and the document classes (types of documents) in the system that the user can work with. When you add a document to the system, you assign it a document class, which defines the access rights to that document. You can also set security parameters on a document's annotations, either manually through dialog boxes or automatically.
ViewStar has a comprehensive security method. All information in a ViewStar enterprise system is held in databases that have different security levels based on geography, user, network and partition. You can segment the database on whatever criteria make sense to you, for example for access security, which limits who can get to which documents. ViewStar's security sits on top of the company's existing security system.
A five-tier security option comes from Feith: Read, Write, Modify, Delete and Broadcast. When documents enter the system, they go into the various groups set up by the Feith system that have certain security attached. This includes things like knowing a document exists but not being able to view it. Or a user being able to see it but not modify it.
It's broken down even further. You log in with a password. Once in, you can join a group if you know the group's name and password. Each group has an administrator who controls who is in the group and who can join, and who also defines what you can do in the group. Being accepted isn't enough.
For example, the group administrator gives you permission to broadcast a document to the group. They also have the ability to withhold that permission. They can specify that certain documents may not be shared with the group.
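Feith's five rights combine naturally as bit flags. This Python sketch models a group with a join password and an administrator who grants and withholds rights per member. The `Group` class and its methods are hypothetical, and the plain-text password comparison is kept only for brevity.

```python
from enum import Flag, auto

class Perm(Flag):
    NONE = 0
    READ = auto()
    WRITE = auto()
    MODIFY = auto()
    DELETE = auto()
    BROADCAST = auto()

class Group:
    """Hypothetical group: a password to join, plus per-member rights
    granted (and withheld) by the group administrator."""

    def __init__(self, name, password, admin):
        self.name, self._password, self.admin = name, password, admin
        self.rights = {}                          # member -> Perm

    def join(self, user, password):
        if password != self._password:            # plain comparison, for brevity only
            return False
        self.rights.setdefault(user, Perm.NONE)    # accepted, but with no rights yet
        return True

    def grant(self, by, user, perm):
        if by != self.admin or user not in self.rights:
            raise PermissionError("only the administrator can grant rights to members")
        self.rights[user] |= perm

    def revoke(self, by, user, perm):
        if by != self.admin or user not in self.rights:
            raise PermissionError("only the administrator can revoke rights from members")
        self.rights[user] &= ~perm

    def allowed(self, user, perm):
        return perm in self.rights.get(user, Perm.NONE)

g = Group("claims", "s3cret", admin="dana")
g.join("eve", "s3cret")
print(g.allowed("eve", Perm.READ))  # False: accepted into the group, but nothing granted yet
g.grant("dana", "eve", Perm.READ | Perm.BROADCAST)
g.revoke("dana", "eve", Perm.BROADCAST)   # the administrator withholds broadcast
print(g.allowed("eve", Perm.READ), g.allowed("eve", Perm.BROADCAST))  # True False
```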
The Data Stops Here
Securing data is pointless unless you can store it somewhere. In an enterprise system, storage (online, near-line, off-line) is essential. Regardless of how you decide to store your data, you have to be able to migrate it. Making sure data is around as long as you want it isn't about deciding how long that will be. It's about deciding the best way to keep it so that it will be there when you want it.
Zylab's Jonker notes, "If you don't adequately examine the needs of the data you're trying to preserve, you're going to make the wrong decision. When storage issues of your enterprise system boil down to a budgeting question, there's going to be a bad decision. Data needs to be available for many, many years."
IBM's Schwartz agrees: "The biggest investment is in preserving the security and integrity of the data. What's put on an object library has to last a long time. The way you get at it may change over time, but the data has to be protected."
Storage should be transparent and irrelevant to the user. It's the technology department's job to figure out the best choice for the enterprise. When employing a storage system, remember that users shouldn't be concerned with where the information comes from, just that they can get it QUICKLY.
Typically, jukeboxes reside somewhere on an enterprise system. Though the jukebox is still a popular attachment to a server, it's not the only choice. Remember, a large population of users needs access to the information. Provide that access through careful system design and a plan to ensure it.
Don't buy a jukebox just because it's inexpensive. Look at how the software controls and manages it. Examine the configuration utilities and the online/off-line diagnostics.
Magnetic disk is another option. Prices have dropped. Capacity has increased. RAID is becoming more visible in these environments.
Disk arrays have plenty of benefits. They're fault tolerant and constantly being refreshed. They're great for frequently accessed information because they're faster. Everyone hates waiting (or latency, in storage lingo). Lack of latency is hard to beat. RAID arrays are being attached to the enterprise right off the server. What's more, they've gotten relatively cheap. RAID pays.
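One common way disk arrays achieve fault tolerance is parity, as used in RAID levels 3, 4 and 5: an extra block holds the byte-wise XOR of the data blocks, so any single lost block can be rebuilt from the survivors. This Python sketch uses hypothetical 8-byte blocks and ignores striping layout; it shows only the arithmetic.

```python
from functools import reduce

def xor_blocks(blocks):
    """Byte-wise XOR of equal-length blocks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*blocks))

data = [b"INVOICE1", b"LETTER02", b"CHECK003"]   # blocks on three data disks
parity = xor_blocks(data)                         # stored on a fourth disk

# Disk 2 fails: XOR the surviving data blocks with parity to rebuild it.
rebuilt = xor_blocks([data[0], data[2], parity])
print(rebuilt)  # b'LETTER02'
```

The trick works because XOR-ing a value with itself cancels out, so every surviving block drops out of the parity and leaves only the missing one.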
Some vendors see customers using hybrid solutions that combine magnetic, optical and microfilm. The possibilities are endless. The hybrid approach is a viable solution to examine in a large enterprise system, because of the variety of documents filtering through the system. The solution boils down to high availability.
Besides picking a hardware solution that's right for you, you have to implement a backup and restore strategy. We're not talking about a department's storage needs. We're talking about an entire enterprise, where lots of users are affected. Examine the economics of what you're storing and the access to it you need to provide.
HSM (Hierarchical Storage Management) should be integrated into enterprise systems as well. A multi-level storage hierarchy lets you move data transparently among storage tiers across the entire enterprise.
Wang's OPEN/image is designed to use jukeboxes as permanent archival storage. Now, OPEN/stor adds a whole new dimension of storage capabilities that sit alongside it. Companies are leaning toward centralized data storage because of security concerns, so they're implementing fully secure glass houses.
OPEN/stor sits underneath the server, invisible to network users. There, it manages data by taking it off the server and moving it someplace secure (the glass house). In the data's place, it leaves a marker of where the data is instead of the actual data. If the server is ever stolen, the thief only gets the data placemarks. No stolen data nightmares.
Besides moving data based on when it was accessed last, the system can also do forced migrations. If you want to move the data to the glass house after you use it, you can.
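Both migration styles, along with the placemark left behind, can be sketched in Python. `HSMVolume`, its dictionary-based archive and the `archive://` placemark format are all hypothetical. The sketch shows the general stub-and-recall technique, not OPEN/stor's internals.

```python
import time

class HSMVolume:
    """Hypothetical stub-based HSM volume: idle files move to archive storage
    and leave a placemark behind; reading a placemark recalls the data."""

    def __init__(self, archive, idle_seconds):
        self.files = {}                  # name -> {"data", "stub", "atime"}
        self.archive = archive           # stand-in for the secure central store
        self.idle_seconds = idle_seconds

    @staticmethod
    def _now(now):
        return time.time() if now is None else now

    def write(self, name, data, now=None):
        self.files[name] = {"data": data, "stub": None, "atime": self._now(now)}

    def migrate(self, name):
        # Forced migration: move the data now, regardless of last access time.
        entry = self.files[name]
        if entry["stub"] is None:
            key = f"archive://{name}"
            self.archive[key] = entry["data"]
            entry["data"], entry["stub"] = None, key   # only the placemark stays

    def sweep(self, now=None):
        # Policy migration: move anything not accessed within idle_seconds.
        now = self._now(now)
        for name, entry in self.files.items():
            if entry["stub"] is None and now - entry["atime"] > self.idle_seconds:
                self.migrate(name)

    def read(self, name, now=None):
        entry = self.files[name]
        if entry["stub"] is not None:                  # transparent recall
            entry["data"], entry["stub"] = self.archive[entry["stub"]], None
        entry["atime"] = self._now(now)
        return entry["data"]

DAY = 86400
vol = HSMVolume(archive={}, idle_seconds=30 * DAY)
vol.write("q1.tif", b"scan-1", now=0)
vol.write("q4.tif", b"scan-4", now=40 * DAY)
vol.sweep(now=45 * DAY)    # q1 idle 45 days -> migrated; q4 idle 5 days -> stays
vol.migrate("q4.tif")      # forced migration after use
```

A thief who walks off with the volume at this point finds only placemarks, while a user who opens `q1.tif` gets the data back without knowing it ever moved.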
OPEN/stor manages NTFS (NT File System) volumes on the network. A single OPEN/stor machine can manage multiple NTFS volumes on potentially multiple NT systems. It also works with IBM's ADSM (ADSTAR Distributed Storage Manager) for mainframes. ADSM provides links to storage devices for the AIX platform. In this scenario, ADSM looks like a tape library: data from all the servers is funneled into ADSM, which stores it.