December 1998
Prime Recognition Excels at Voting
Much has been made of so-called OCR "voting," a process in which multiple OCR engines are used to recognize the same strings of text. When the engines disagree on a character classification, an arbitration algorithm is invoked that compares the results and confidence levels of each engine and then "votes" on the best choice.
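The arbitration step can be sketched in a few lines of Python. This is a minimal illustration, not Prime Recognition's algorithm, which the article does not describe. It assumes that each engine returns one (character, confidence) pair per character position, that confidences are comparable across engines on a 0-to-1 scale, and that the engines' outputs are already aligned position by position. The names `vote` and `vote_string` are hypothetical.

```python
from collections import defaultdict


def vote(readings):
    """Choose one character from several engines' guesses for one position.

    readings: list of (character, confidence) pairs, one per engine.
    Assumption: confidences lie in [0, 1] and are comparable across engines.
    """
    candidates = {ch for ch, _ in readings}
    if len(candidates) == 1:
        # All engines agree, so no arbitration is needed.
        return readings[0][0]

    # Engines disagree: sum the confidence behind each candidate character.
    totals = defaultdict(float)
    for ch, conf in readings:
        totals[ch] += conf

    # Pick the candidate with the highest summed confidence.
    # Sorting first makes ties resolve deterministically.
    return max(sorted(totals), key=lambda ch: totals[ch])


def vote_string(engine_outputs):
    """Vote position by position across engines.

    engine_outputs: one list of (character, confidence) pairs per engine,
    all of the same length (alignment is assumed, not computed).
    """
    return "".join(vote(list(position)) for position in zip(*engine_outputs))


# Hypothetical readings of the word "Prime" from three engines.
engine_a = [("P", 0.9), ("r", 0.8), ("i", 0.6), ("m", 0.9), ("e", 0.9)]
engine_b = [("P", 0.9), ("r", 0.7), ("l", 0.5), ("m", 0.8), ("e", 0.9)]
engine_c = [("P", 0.8), ("r", 0.9), ("i", 0.7), ("m", 0.9), ("c", 0.4)]

print(vote_string([engine_a, engine_b, engine_c]))  # prints "Prime"
```

In this example each engine's mistake ("l" for "i", "c" for "e") is outvoted because the other two engines do not make the same mistake at the same position. A production system would also have to align outputs of different lengths and calibrate each engine's confidence scale, both of which this sketch leaves out.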
The trick in voting is to pick engines that are orthogonal to each other. In other words, each recognition engine makes different kinds of mistakes from the others. The use of complementary recognition models means that one OCR engine's strengths cover another's weaknesses.
Voting can produce outstanding recognition results -- if it is set up properly. For example, Prime Recognition (San Carlos, 650-631-9800), which is known for pioneering OCR voting, claims increased accuracy of between 60% and 85% when the right algorithms are combined with the right OCR engines.
Since approximately 66% of the money consumed by the "Imaging OCR life cycle" is spent on editing recognition errors, a low substitution error rate translates into much higher productivity -- especially when the task at hand involves recognizing thousands of pages of text or more a day.
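A back-of-the-envelope calculation shows why the editing share matters. The sketch below rests on two assumptions that the article does not state: that editing cost scales linearly with the number of recognition errors, and that the claimed accuracy gain can be read as the fraction of errors eliminated. Only the 66% figure comes from the text; the function name `total_cost_saving` is hypothetical.

```python
EDITING_SHARE = 0.66  # share of life-cycle cost spent on editing errors (from the article)


def total_cost_saving(error_reduction, editing_share=EDITING_SHARE):
    """Fraction of total life-cycle cost saved for a given error reduction.

    Assumption: editing cost is proportional to the number of errors,
    so removing a fraction of the errors removes the same fraction of
    the editing cost and leaves all other costs unchanged.
    """
    if not 0.0 <= error_reduction <= 1.0:
        raise ValueError("error_reduction must be between 0 and 1")
    return editing_share * error_reduction


# Illustrative reductions, reading the claimed 60% to 85% as errors eliminated.
for reduction in (0.60, 0.85):
    saving = total_cost_saving(reduction)
    print(f"{reduction:.0%} fewer errors -> {saving:.1%} lower total cost")
```

Under these assumptions, eliminating 60% to 85% of errors would cut total cost by roughly 40% to 56%. The real figure depends on how editing effort actually varies with error count.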
Prime Recognition's voting system is the one most widely used in high-production OCR installations. For example, both the U.S. Patent Office and the U.S. Trademark Office are currently using Prime's voting scheme on 24 servers to OCR millions of free-text pages a day. If a third-party vendor offers voting, chances are that it is Prime's.