Digitalisation of documents: desktop app (server/client) -- 2
Paid on delivery
What we have:
Several different kinds of documents:
- Some documents come from a scanner (the scanner puts PDF/image files in a specific folder on the local network).
- Other documents come by email, as PDF or image, and so are digital from the start.
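As a rough illustration of the scanner input, the sketch below lists the files in the scanner folder that have not been processed yet. The function name `find_new_documents`, the set of accepted file extensions, and the idea of tracking processed files by name are assumptions made for this example, not requirements of the brief.

```python
from pathlib import Path
from typing import List, Set

# Assumed list of accepted file types; the brief only says "pdf/img".
SUPPORTED_SUFFIXES = {".pdf", ".png", ".jpg", ".jpeg", ".tif", ".tiff"}


def find_new_documents(folder: Path, already_seen: Set[str]) -> List[Path]:
    """Return supported files in `folder` whose names are not in `already_seen`."""
    new_files = []
    for path in sorted(folder.iterdir()):
        if not path.is_file():
            continue  # skip sub-folders
        if path.suffix.lower() not in SUPPORTED_SUFFIXES:
            continue  # skip file types the system does not handle
        if path.name in already_seen:
            continue  # skip documents that were already processed
        new_files.append(path)
    return new_files
```

Documents received by email could be saved into the same folder so that both sources follow one path into the system.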
What we need:
Final purpose of this system: classification of various documents (PDF, image, etc.) and management of the data they contain.
Please keep open the possibility of re-processing this data with specific procedures to be defined later. Each of these procedures will have its own link in the app menu.
- User: can create a template and tag it (category). The user can choose how many data fields to extract from the document. That means each document template (probably) has a different table of data to fill in.
- System: pre-analyses the document, compares it with the templates, and then proposes a category with the relevant data.
- User: checks the system's proposal and corrects what is wrong.
- System: learns from the corrections and stores the data in the database.
A simple outline of what we have in mind:
1. Input documents (PDF, images with text, ...)
2. Compare the document with the existing templates
3. If no template matches -> go into learning mode
   a. Add a new document template
   b. The user can select some areas in the image and for EACH of them should define at least:
      i. Tag (choose an existing category name or add a new one)
      ii. Position in the document (the selected area)
      iii. Whether the value found there is fixed or variable
   c. All these features will be stored in the DB per template
   d. The values of these features will be stored in the DB per processed document
4. If a template matches (= all fixed values of at least one of the templates match the document)
   a. The software uses the template to read all the feature values that need to be stored
   b. Using a match-accuracy criterion (the level can be set by the user in a general settings board/window), the system distinguishes 2 levels of comparison result:
      i. All fields match with high accuracy: GREEN situation, the system stores the data automatically
      ii. Not all fields match with high accuracy: YELLOW situation, the system stores the data in the database after user approval
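The steps above could be sketched roughly as follows. This is only an illustration under several assumptions: the names `Field`, `Template`, `template_matches` and `classify` are invented for the example, `read_area` is a hypothetical hook that stands in for whatever OCR method is chosen and returns the text of an area together with a confidence between 0 and 1, and the accuracy of a fixed field is taken to be its text similarity to the expected value.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple

Area = Tuple[int, int, int, int]                # (x, y, width, height) of a selected area
ReadArea = Callable[[Area], Tuple[str, float]]  # hypothetical OCR hook: area -> (text, confidence)


@dataclass
class Field:
    tag: str                            # tag chosen by the user
    area: Area                          # position in the document
    fixed_value: Optional[str] = None   # expected text if fixed, None if variable


@dataclass
class Template:
    category: str
    fields: List[Field]


def similarity(a: str, b: str) -> float:
    """Case-insensitive text similarity between 0 and 1."""
    return SequenceMatcher(None, a.strip().lower(), b.strip().lower()).ratio()


def field_score(field: Field, read_area: ReadArea) -> Tuple[str, float]:
    """Read one field and rate it (assumed accuracy measure, see lead-in)."""
    text, confidence = read_area(field.area)
    if field.fixed_value is not None:
        return text, similarity(text, field.fixed_value)
    return text, confidence


def template_matches(template: Template, read_area: ReadArea, threshold: float) -> bool:
    """Step 4: the template matches when all of its fixed values are found."""
    fixed = [f for f in template.fields if f.fixed_value is not None]
    return bool(fixed) and all(field_score(f, read_area)[1] >= threshold for f in fixed)


def classify(template: Template, read_area: ReadArea, threshold: float) -> Tuple[str, Dict[str, str]]:
    """Step 4b: GREEN when every field reaches the threshold, otherwise YELLOW."""
    values: Dict[str, str] = {}
    all_high = True
    for f in template.fields:
        text, score = field_score(f, read_area)
        values[f.tag] = text
        all_high = all_high and score >= threshold
    return ("GREEN" if all_high else "YELLOW"), values
```

In this sketch a GREEN result would be stored directly, a YELLOW result would be shown to the user for approval first, and a document for which no template passes `template_matches` would go to learning mode. The `threshold` argument corresponds to the accuracy level set in the general settings window.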
At the end we will have a database containing every document, with all the data we need, categorized and searchable by tag.
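One possible shape for such a database is sketched below with SQLite. The table names, column names and helper functions are assumptions made for illustration; the real schema would also need to hold the templates and their areas.

```python
import sqlite3
from typing import Dict, List, Tuple

# Assumed minimal schema: one row per document, one row per extracted value.
SCHEMA = """
CREATE TABLE IF NOT EXISTS documents (
    id INTEGER PRIMARY KEY,
    path TEXT NOT NULL,
    category TEXT NOT NULL
);
CREATE TABLE IF NOT EXISTS field_values (
    document_id INTEGER NOT NULL REFERENCES documents(id),
    tag TEXT NOT NULL,
    value TEXT NOT NULL
);
CREATE INDEX IF NOT EXISTS idx_field_values_tag ON field_values(tag);
"""


def open_db(path: str = ":memory:") -> sqlite3.Connection:
    """Open the database and create the tables if they do not exist yet."""
    conn = sqlite3.connect(path)
    conn.executescript(SCHEMA)
    return conn


def store_document(conn: sqlite3.Connection, path: str, category: str,
                   values: Dict[str, str]) -> int:
    """Store one processed document with its extracted values; return its id."""
    cur = conn.execute(
        "INSERT INTO documents (path, category) VALUES (?, ?)", (path, category)
    )
    doc_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO field_values (document_id, tag, value) VALUES (?, ?, ?)",
        [(doc_id, tag, value) for tag, value in values.items()],
    )
    conn.commit()
    return doc_id


def search_by_tag(conn: sqlite3.Connection, tag: str) -> List[Tuple[str, str]]:
    """Return (document path, value) pairs for every document that has `tag`."""
    rows = conn.execute(
        "SELECT d.path, v.value FROM field_values v "
        "JOIN documents d ON d.id = v.document_id "
        "WHERE v.tag = ? ORDER BY d.id",
        (tag,),
    )
    return rows.fetchall()
```

For example, after `store_document(conn, "scan1.pdf", "invoice", {"total": "120.00"})`, a call to `search_by_tag(conn, "total")` would return the path of that scan together with the stored value.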
To do this you could use AI OCR/object detection and so on, standard OCR text recognition, zoning OCR, or a mix of these, including a double check with different methods on the same document. The important thing, of course, is to reach the final purpose.
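The double check mentioned above could look something like the sketch below, which compares the text that two different methods read from the same area and accepts it only when they agree closely. The function name `cross_check` and the default agreement level of 0.9 are assumptions for this example; the two input strings would come from whichever OCR methods are chosen.

```python
from difflib import SequenceMatcher
from typing import Tuple


def cross_check(first: str, second: str, min_agreement: float = 0.9) -> Tuple[str, bool]:
    """Compare two readings of the same area.

    Returns the first reading (with whitespace normalised) and a flag that
    is True when the two readings agree at least `min_agreement`.
    """
    a = " ".join(first.split())   # collapse runs of whitespace
    b = " ".join(second.split())
    agreement = SequenceMatcher(None, a, b).ratio()
    return a, agreement >= min_agreement
```

A reading that fails the check could be treated as a low-accuracy field, so that the document is shown to the user for approval instead of being stored automatically.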
Project ID: #23560675