Steganography is the area of data security dedicated to hiding secrets, key data if you will, in plain sight. The information is there, often staring the reader in the face, but without knowing how to extract it the information has no meaning and remains a secret. It’s the same with the key data in an invoice or the text within other structured documents.
An ancient story of steganography involves a rich Greek who shaved the head of his slave, tattooed a message onto his scalp and then allowed the hair to grow back. Once the slave had a full head of hair again, he was sent to one of the master’s friends with instructions to shave his head upon arrival.
inSTREAM™ automatically extracts key data from documents and delivers guaranteed perfect input to a customer’s line-of-business application such as an ERP, CRM or finance system. We’ve developed inSTREAM™ to do this with the minimum amount of human involvement possible. But inSTREAM™ has to carry out code breaking for every invoice from every supplier, every type of mobile phone bill, whether sent by post, email or fax, and every format of timesheet, survey or other document. On a daily basis it has to handle some very poor quality documents, including some printed on dot matrix printers whose ribbons have passed their best, leaving only the imprints to read. Thankfully we haven’t had a tattooed head as yet!
Templates don’t work
One of the most common examples of hiding messages in plain sight is the “Cardan grille”. It works by cutting holes in a page to create a template. The template is placed over the original document and the holes show only the data that should be seen. Many invoice processing systems work using this technique, but the approach is fundamentally flawed: it cannot cope with key data that moves up or down the page or onto a different page, or with other subtle changes to the layout of a document.
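To make the weakness concrete, here is a minimal sketch of a grille-style template in Python. It is an illustration only, not how inSTREAM™ or any particular product works. The page is assumed to be a list of text lines, and the names TEMPLATE and apply_grille, the field positions and the sample invoices are all hypothetical.

```python
# Hypothetical fixed-position template: each field is a "hole" cut at a
# fixed row and column range, as (row, start column, end column).
TEMPLATE = {"total": (3, 7, 12)}


def apply_grille(page_lines, template):
    """Read every field from its fixed position, ignoring what the text says."""
    fields = {}
    for name, (row, start, end) in template.items():
        line = page_lines[row] if row < len(page_lines) else ""
        fields[name] = line[start:end].strip()
    return fields


# The invoice the template was designed for: the total sits on row 3.
invoice_a = [
    "Invoice 1001",
    "Widget      10.00",
    "Gadget       5.00",
    "Total: 15.00",
]

# Same supplier, one extra line item: the total has moved down to row 4.
invoice_b = [
    "Invoice 1002",
    "Widget      10.00",
    "Gadget       5.00",
    "Sprocket     2.50",
    "Total: 17.50",
]

result_a = apply_grille(invoice_a, TEMPLATE)
result_b = apply_grille(invoice_b, TEMPLATE)

print(result_a["total"])  # the real total of the first invoice
print(result_b["total"])  # a fragment of the Sprocket line, not the total
```

A single extra line item is enough to push the total out from under the hole, so the template returns part of an unrelated line without any warning that something went wrong.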
We all like ‘human friendly’
In reality, most “human friendly” documents simply don’t adhere to a fixed template. Instead, the formats change to adapt to the data they contain, such as the number of line items on an invoice. Other systems require an IT professional to spend days defining rules for each and every invoice, specifying how to locate each field by its position relative to other text on the page. The outcome may be satisfactory but the setup and configuration time is prohibitive.
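A rule of the kind described above might look like the following Python sketch. It is an assumption-laden illustration rather than any vendor’s actual rule format: the function name find_after_label, the labels and the sample pages are hypothetical, and a page is again treated as a list of text lines.

```python
import re


def find_after_label(page_lines, label):
    """Return the first amount that follows `label` on any line, or None."""
    # The amount is located relative to the label text, not by page position.
    pattern = re.compile(re.escape(label) + r"\s*:?\s*([0-9]+(?:\.[0-9]{2})?)")
    for line in page_lines:
        match = pattern.search(line)
        if match:
            return match.group(1)
    return None


# Two invoices from the same hypothetical supplier, with different lengths.
short_invoice = ["Invoice 1001", "Widget      10.00", "Total: 15.00"]
long_invoice = [
    "Invoice 1002",
    "Widget      10.00",
    "Gadget       5.00",
    "Sprocket     2.50",
    "Total: 17.50",
]

# A different hypothetical supplier that words the same field differently.
other_supplier = ["Invoice 77", "Amount due 42.00"]

print(find_after_label(short_invoice, "Total"))        # 15.00
print(find_after_label(long_invoice, "Total"))         # 17.50
print(find_after_label(other_supplier, "Total"))       # None
print(find_after_label(other_supplier, "Amount due"))  # 42.00
```

The rule copes with the total moving down the page, but it only works for suppliers who print the word “Total”. Every new wording or layout needs another hand-written rule, which is where the setup time goes.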
inSTREAM™ is different. Even when people are involved in the process, it lets those who index documents work in the same way as they normally do. Before a document is shown to an operator, inSTREAM™ builds contextual information about it. It then uses this awareness, or artificial intelligence, to automatically create a ‘signature’ of the document, which it can use for all future documents from the same source. This ability to learn sets inSTREAM™ apart and means that human intervention is minimised, reducing costs and improving performance and accuracy.
Author: Richard Hill, Technical Director