There are many off-the-shelf products I'm already looking at that charge per page converted, but I'm wondering whether someone here has experience building something custom from known building blocks that would enable us to:
1. Extract data from a PDF or image document via OCR tools and PDF parsing. We do not want to set up extraction for each template; that is where the neural net training by classification needs to happen, and there would be 5 classifications.
2. Structure the captured data into a JSON or XML file (we would use that file to create records).
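To make the two steps concrete, here is a rough sketch of the shape I have in mind, not a finished design. It assumes the OCR or PDF parsing step has already produced plain text. The class names, keyword rules, and field patterns are all hypothetical placeholders; the keyword lookup only stands in for the trained classifier, and the regexes stand in for per-class field extraction.

```python
import json
import re
from dataclasses import dataclass, asdict, field

# Hypothetical keyword rules standing in for a trained classifier.
# The five class names are assumptions: four keyword classes plus "other".
KEYWORDS = {
    "invoice": ["invoice"],
    "receipt": ["receipt"],
    "purchase_order": ["purchase order"],
    "contract": ["agreement", "contract"],
}


@dataclass
class ExtractedDocument:
    doc_class: str
    fields: dict = field(default_factory=dict)
    raw_text: str = ""


def classify(text: str) -> str:
    """Return the first class whose keyword appears in the text, else 'other'."""
    lowered = text.lower()
    for label, words in KEYWORDS.items():
        if any(word in lowered for word in words):
            return label
    return "other"


def extract_fields(text: str) -> dict:
    """Pull a couple of illustrative fields out of the text with regexes."""
    fields = {}
    date = re.search(r"\b(\d{4}-\d{2}-\d{2})\b", text)
    if date:
        fields["date"] = date.group(1)
    total = re.search(r"total[:\s]*\$?([\d,]+\.\d{2})", text, re.IGNORECASE)
    if total:
        fields["total"] = total.group(1).replace(",", "")
    return fields


def to_record(text: str) -> ExtractedDocument:
    """Classify the text and bundle it with its extracted fields."""
    return ExtractedDocument(classify(text), extract_fields(text), text)


def to_json(text: str) -> str:
    """Serialize the structured record as a JSON string."""
    return json.dumps(asdict(to_record(text)), indent=2)


if __name__ == "__main__":
    sample = "INVOICE #123\nDate: 2024-01-15\nTotal: $1,234.50"
    print(to_json(sample))
```

In a real build the text would come from an OCR engine or PDF parser running on our own server, and `classify` would be replaced by a model trained on the 5 document classifications.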
The goal is to pay for a custom solution that we can host entirely on our own server, without paying annual or per-document fees to an off-the-shelf software company. I also prefer to stay away from any solution that requires all the data to be sent out for processing via an API.
Requirements: Python NLP, video meeting.