DIGI-Xtract

A fully automated data extraction solution that can eliminate the need for human intervention.

A document processing solution built on Machine Learning and Deep Learning technologies that performs document classification, data extraction, and quality control across various document types.

DIGI-Xtract supports multiple languages and can also be customized for special document types to match the client’s business languages and requirements. The service can be hosted securely and remotely in DIGI-TEXX’s Data Center or deployed on the client’s premises using state-of-the-art technologies.

Supported form types:

Structured forms: Application, ID/passport, birth certificate, bank statement, financial statement, payslip, invoice, receipt, tax invoice, bill, purchase order, quotation, delivery note, etc.
Semi-structured forms: Patient medical histories, field inspection notes, labor contracts, incident reports, work permits, confirmation letters, etc.
Unstructured forms: Handwritten letters, personal notes, personal journals, handwritten research, handwritten historical documents, examination sheets, catalogs, land register books, construction and architectural drawings, etc.

DIGI-Xtract COMPONENTS

Each of the following components can be used on its own, without deploying the entire solution.

AUTO CLASSIFY
AUTO EXTRACT
AUTO QC

AUTO CLASSIFY

Using the Auto Classify component, DIGI-Xtract automatically recognizes and classifies documents by type. Detection is based on vectorization, that is, on numerical vector representations of the documents. The system then routes each document to the Auto Extract component, so that data is extracted in the way that best suits its type and accuracy is optimized.
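As a minimal sketch of vectorization-based classification in general (not DIGI-Xtract’s actual model), the code below turns text into bag-of-words vectors and picks the document type whose reference vector has the highest cosine similarity. The reference texts in `REFERENCES` and all function names are hypothetical; a production classifier would more likely use learned embeddings of the page image and text rather than raw word counts.

```python
import math
from collections import Counter


def vectorize(text: str) -> Counter:
    """Turn text into a sparse bag-of-words vector of lowercase token counts."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors (0.0 if either is empty)."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def classify(text: str, references: dict[str, str]) -> str:
    """Return the document type whose reference text is most similar to `text`."""
    doc = vectorize(text)
    return max(references, key=lambda label: cosine(doc, vectorize(references[label])))


# Hypothetical reference texts, one per document type.
REFERENCES = {
    "invoice": "invoice number date due amount total tax bill to",
    "payslip": "payslip employee salary gross net pay period deductions",
    "bank_statement": "bank statement account balance opening closing transactions",
}

print(classify("Invoice total amount due with tax", REFERENCES))
```

Once the type is known, it can be used to choose which field definitions and extraction engines apply in the next step.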

Auto Classify also assesses the quality of the input images; DIGI-Xtract then routes the classified images to the Image Quality Enhancement function.

AUTO EXTRACT

Auto Extract includes Field Detection & Text Extraction, using our Optical Character Recognition (OCR) technology.

Using predefined data fields, the Field Detection component locates each field in the image and snips (crops) it out, so that extraction is performed securely on the snipped image only.

After Field Detection, a data extraction engine matched to the field’s type extracts the text from the snipped image.

Because extraction works on snipped fields rather than whole pages, the full content of a client document is never seen or shared by any third party.

Auto Extract produces a confidence score for each data field. Auto QC’s rules then use this score to assess the quality of the extraction.

AUTO QC

Auto QC performs quality control using a combined score built from:

Common rules, such as format checks for IBANs, ID card numbers, postal codes, age, gender, date/time, etc.
Business rules based on the client’s business domain
Data Field Relationships such as [age, gender, disease], [title, salary, business], [hospital, treatment, age, gender], etc.
Image Quality Analytics: clear/unclear, blurred, skewed, flipped, distorted, low resolution
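As an example of a common format rule, the sketch below validates an IBAN with the standard ISO 13616 mod-97 checksum: move the first four characters to the end, convert letters to numbers (A = 10 through Z = 35), and require the resulting integer modulo 97 to equal 1. The function name is hypothetical, and this check covers only the general structure and the checksum, not country-specific IBAN lengths.

```python
import re

# Two-letter country code, two check digits, then 11 to 30 alphanumerics
# (15 to 34 characters in total).
IBAN_PATTERN = re.compile(r"[A-Z]{2}[0-9]{2}[A-Z0-9]{11,30}")


def is_valid_iban(iban: str) -> bool:
    """Check IBAN structure and mod-97 checksum (spaces allowed, case-insensitive)."""
    s = iban.replace(" ", "").upper()
    if not IBAN_PATTERN.fullmatch(s):
        return False
    # Move country code and check digits to the end.
    rearranged = s[4:] + s[:4]
    # int(ch, 36) maps '0'-'9' to 0-9 and 'A'-'Z' to 10-35.
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


print(is_valid_iban("GB82 WEST 1234 5698 7654 32"))
```

A field that fails a rule like this lowers its QC score even when the OCR confidence for the characters is high.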

Traditional quality control relies on various methodologies that involve human reviewers. Auto QC instead breaks the process down to the data level and tracks it with metadata at each step. It checks 100 percent of the processed data and flags potential errors.

Using the score, Auto QC detects potential errors and controls the Straight-Through Rate (STR): the system decides whether each data item goes straight through or is transferred to the data correction step for quality enhancement.
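A minimal sketch of this routing decision, assuming each field carries a confidence score between 0 and 1 and a pass/fail result from the QC rules. The 0.90 threshold, the `FieldResult` type, and the route names are hypothetical illustrations, not DIGI-Xtract settings.

```python
from dataclasses import dataclass


@dataclass
class FieldResult:
    name: str
    value: str
    confidence: float   # extraction confidence, assumed to be in [0.0, 1.0]
    rules_passed: bool  # True if every QC rule for this field passed


def route(field: FieldResult, threshold: float = 0.90) -> str:
    """Send a field straight through only if it is confident AND rule-compliant."""
    if field.rules_passed and field.confidence >= threshold:
        return "straight_through"
    return "data_correction"


def straight_through_rate(fields: list[FieldResult], threshold: float = 0.90) -> float:
    """Share of fields that need no correction (0.0 for an empty batch)."""
    if not fields:
        return 0.0
    passed = sum(1 for f in fields if route(f, threshold) == "straight_through")
    return passed / len(fields)


batch = [
    FieldResult("invoice_no", "INV-1042", 0.97, True),
    FieldResult("total", "1,250.00", 0.85, True),        # low confidence
    FieldResult("iban", "GB82 WEST 1234", 0.99, False),  # failed a format rule
]
print(straight_through_rate(batch))        # 1 of 3 fields passes
print(straight_through_rate(batch, 0.80))  # 2 of 3 fields pass
```

The threshold is the lever behind the STR: raising it sends more data to correction and lowers the STR, while lowering it raises the STR at the cost of letting more uncertain values through.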

DIGI-Xtract FEATURES

Automated data extraction from structured, semi-structured, and unstructured forms and documents

Image Quality Enhancement at the preprocessing step

API gateway integration

Elimination of manual data entry

Web Monitoring Services for real-time tracking and automatic reporting

High performance and quality

High availability of back-end processing systems

PRODUCT PROCESSES

STRAIGHT-THROUGH PROCESS (STP)/AUTOMATION PROCESS

GUARANTEED PROCESS/AUTOMATION PROCESS WITH HUMAN TOUCH

ACCURACY RATE

The accuracy rate is based on a confidence score that measures how certain the system is that the extracted data matches the original image. A higher accuracy rate, which depends on the quality of the source document, means better data quality and better support for analytics.

To ensure high quality, DIGI-Xtract uses intelligent engines that automatically check and confirm extracted data by comparing it with other databases or external sources, which further raises accuracy.

The accuracy rate can be measured at different levels: character, word, field, and line.
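One common way to compute accuracy at these levels, shown here as an assumption rather than DIGI-Xtract’s exact formula, is 1 minus the edit (Levenshtein) distance divided by the length of the ground truth, where the units compared are characters, words, or lines; field accuracy is the share of fields whose value matches exactly. All function names are hypothetical.

```python
def edit_distance(a, b) -> int:
    """Levenshtein distance between two sequences (strings or lists)."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, start=1):
        curr = [i]
        for j, y in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]


def sequence_accuracy(truth, predicted) -> float:
    """1 - edit distance / ground-truth length, clipped at 0."""
    if not truth:
        return 1.0 if not predicted else 0.0
    return max(0.0, 1.0 - edit_distance(truth, predicted) / len(truth))


def character_accuracy(truth: str, predicted: str) -> float:
    return sequence_accuracy(truth, predicted)


def word_accuracy(truth: str, predicted: str) -> float:
    return sequence_accuracy(truth.split(), predicted.split())


def line_accuracy(truth: str, predicted: str) -> float:
    return sequence_accuracy(truth.splitlines(), predicted.splitlines())


def field_accuracy(truth: dict[str, str], predicted: dict[str, str]) -> float:
    """Share of ground-truth fields whose extracted value matches exactly."""
    if not truth:
        return 1.0
    hits = sum(1 for k, v in truth.items() if predicted.get(k) == v)
    return hits / len(truth)
```

For a ground truth of "John Smith" read as "John Smlth", character accuracy is 0.9 (one substitution in ten characters) while word accuracy is 0.5 (one of two words wrong), so accuracy figures are only comparable when they use the same unit.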

CLIENT SUPPORT

DIGI-Xtract is delivered and supported by a dedicated onboarding team in partnership with our service management team.

All projects are monitored 24/7 by our Network Operations Center to ensure optimal service availability.

DIGI-TEXX provides an end-to-end client experience, from initial analysis through implementation and enhancement. In addition, the service management team accompanies clients throughout the operation phase to ensure a smooth transition and successful delivery.

WHAT MAKES US DIFFERENT?

AUTOMATION WITH 24/7 MONITORING

A fully automated solution with no human intervention, and a transparent process: Web Monitoring Services show the status of the data at each step.

EASY INTEGRATION AND QUICK SETUP

Transfer methods (secure transfer protocols, API, email) can be customized to fit the client’s system. Setup takes 2 to 4 weeks.

FLEXIBLE PRICING MODELS

We offer several pricing options to suit different client segments and their specific requirements, including subscriptions, pay-as-you-go, and bundling.