BUSINESS CHALLENGES
Our Client
Our client is a leading international insurance and financial services company with over 1.5 million customers and operations in Asia, Canada, and the United States. In the Vietnam market, their network of 80 offices provides financial advice, insurance, wealth management, and asset management services for individuals, groups, and institutions.
Project Challenges
Limited OCR Capture Capacity
The client’s current OCR engine was built to capture ID cards only, yet the onboarding process now accepts various identity document types (ID cards, passports, birth certificates, military ID cards, etc.). The engine can therefore process only part of the incoming documents, and more staff are needed for manual verification.
Project Objective
- Shorten the document and data processing time to under 1 minute per document.
- Facilitate an end-to-end automatic approval process while ensuring data accuracy at the highest level.
Project Scope
Build a straight-through process for customer and agency onboarding by enhancing the OCR engine’s extraction capacity:
- Document types:
  - Identity documents (ID cards, passports, birth certificates, military ID cards, etc.)
  - Application forms
- Languages: English and Vietnamese
- Service time: 24/7
- Committed accuracy rate: 95%
SOLUTION
Data Extraction Solution
Because the quality of the input data largely determines the quality of the output, DIGI-TEXX has developed a three-step data extraction process (pre-processing, extraction, and automated quality control) that requires no human verification.
DIGI-TEXX applies Image Quality Enhancement technology in the pre-processing step to transform the images and make them more OCR-friendly in later processing stages.
This technology identifies the key features and details of each image, then adjusts them using digital image processing techniques such as:
- Removing image background noise
- Correcting skew and rotation
- Cropping excess areas
- Tuning brightness, sharpness, and other color settings
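To make some of the adjustments listed above concrete, here is a minimal sketch in plain Python. It is not DIGI-TEXX's implementation: it assumes a grayscale image stored as a list of rows of integers (0 = black, 255 = white), and the function names `median_denoise`, `crop_to_content`, and `stretch_contrast` are hypothetical. Skew correction is omitted, and a production pipeline would normally rely on an image processing library.

```python
from statistics import median

def median_denoise(img):
    """Replace each interior pixel with the median of its 3x3 neighborhood.

    Border pixels are copied unchanged. This removes isolated specks.
    """
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx] for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = int(median(window))
    return out

def crop_to_content(img, background=255, tolerance=10):
    """Crop to the bounding box of pixels that differ from the background.

    `background` and `tolerance` are illustrative defaults (assumptions).
    """
    rows = [y for y, row in enumerate(img)
            if any(abs(p - background) > tolerance for p in row)]
    cols = [x for x in range(len(img[0]))
            if any(abs(row[x] - background) > tolerance for row in img)]
    if not rows or not cols:
        return [row[:] for row in img]  # nothing but background: keep as-is
    return [row[cols[0]:cols[-1] + 1] for row in img[rows[0]:rows[-1] + 1]]

def stretch_contrast(img):
    """Linearly rescale pixel values so they span the full 0..255 range."""
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    if hi == lo:
        return [row[:] for row in img]  # flat image: nothing to stretch
    return [[round((p - lo) * 255 / (hi - lo)) for p in row] for row in img]

# Usage: a tiny "scan" with a white margin around darker content.
scan = [
    [255, 255, 255, 255, 255],
    [255,  40,  60, 255, 255],
    [255,  50, 255, 255, 255],
    [255, 255, 255, 255, 255],
]
cropped = crop_to_content(scan)
enhanced = stretch_contrast(cropped)
```

Each step takes an image and returns a new one, so the steps can be chained in whichever order suits the document type.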
The pre-processed documents are then handled by DIGI-XTRACT, a document processing service built by DIGI-TEXX’s software development team.
DIGI-XTRACT uses Machine Learning (ML) and Deep Learning (DL) technology to extend high-quality data extraction to more document types, such as birth certificates, passports, military IDs, and bank statements.
Auto QC performs quality control based on a confidence level, a composite score that combines the following checks to ensure the highest output quality:
- Common rules, such as the expected format of ID card numbers, postal codes, age, gender, and date/time values
- Business rules based on the client’s business domain
- Data Field Relationships
- Image Quality Analytics: clear/unclear, blurred, skewed, flipped, distorted, low resolution.
When the confidence level of the extracted data falls below a predefined threshold, a notification is sent to the client for further steps.
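The rule checks and confidence threshold described above can be sketched as follows. This is an illustration under stated assumptions, not the actual Auto QC scoring: the field names, the 9- or 12-digit ID number format, the DD/MM/YYYY date format, the "weakest field" scoring, and the 0.90 threshold are all hypothetical choices made for the example.

```python
import re
from datetime import date

THRESHOLD = 0.90  # hypothetical confidence threshold

def parse_date(text):
    """Parse a DD/MM/YYYY string; return None if it is not a valid date."""
    match = re.fullmatch(r"(\d{2})/(\d{2})/(\d{4})", text)
    if not match:
        return None
    day, month, year = map(int, match.groups())
    try:
        return date(year, month, day)
    except ValueError:
        return None

def rule_failures(fields):
    """Return a list of failed checks for one extracted document."""
    failures = []
    # Common rule: assumed ID number format (9 or 12 digits).
    if not re.fullmatch(r"(?:\d{9}|\d{12})", fields.get("id_number", "")):
        failures.append("id_number format")
    # Common rule: gender must be one of the expected values (assumed set).
    if fields.get("gender") not in {"Male", "Female"}:
        failures.append("gender value")
    born = parse_date(fields.get("date_of_birth", ""))
    issued = parse_date(fields.get("issue_date", ""))
    if born is None:
        failures.append("date_of_birth format")
    if issued is None:
        failures.append("issue_date format")
    # Data field relationship: a document cannot be issued before birth.
    if born and issued and issued <= born:
        failures.append("issue_date must follow date_of_birth")
    return failures

def review(fields, ocr_confidence):
    """Combine rule results and per-field OCR confidence into one decision."""
    failures = rule_failures(fields)
    # Any failed rule zeroes the score; otherwise the weakest field decides.
    score = 0.0 if failures else min(ocr_confidence.values())
    status = "auto_approved" if score >= THRESHOLD else "notify_client"
    return {"score": score, "status": status, "failures": failures}

# Usage with made-up sample values.
fields = {
    "id_number": "012345678901",
    "gender": "Female",
    "date_of_birth": "02/09/1990",
    "issue_date": "15/06/2021",
}
confidence = {"id_number": 0.99, "gender": 0.97,
              "date_of_birth": 0.96, "issue_date": 0.98}
result = review(fields, confidence)
```

Taking the minimum confidence across fields is a deliberately conservative choice: a single uncertain field is enough to route the document to the client instead of approving it automatically.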
BUSINESS OUTCOME
- Processing time per document shortened from 3 minutes to 5 seconds.
- Field-level accuracy rate increased from 60% to 97%.
- The client’s document processing capacity increased from 95,000 pages/month to 3 million pages/month.
- Output data quality no longer depends on human verification.
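For a sense of scale, the improvement factors implied by these figures can be worked out directly. The short calculation below uses only the numbers reported above; the variable names are illustrative.

```python
# Processing time: 3 minutes before, 5 seconds after.
before_seconds = 3 * 60
after_seconds = 5
speedup = before_seconds / after_seconds

# Monthly capacity: 95,000 pages before, 3 million pages after.
capacity_gain = 3_000_000 / 95_000

# Field-level accuracy: 60% before, 97% after (percentage points gained).
accuracy_gain_points = 97 - 60

print(f"Processing is {speedup:.0f}x faster per document")
print(f"Capacity grew about {capacity_gain:.1f}x")
print(f"Accuracy improved by {accuracy_gain_points} percentage points")
```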