Build workflows in KNIME
ONLY MAKE AN OFFER IF YOU HAVE ALREADY DONE KNIME DEVELOPMENT PROJECTS!!
ALL OTHER OFFERS WILL BE IGNORED.
1. Extract content
Extract content from PDF documents in a folder
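A minimal Python sketch of this step follows. It is one possible approach, not a prescribed one. It collects every PDF file directly inside a folder and passes each file to a pluggable extractor. The function names `extract_folder` and `pypdf_extract` are hypothetical. The example extractor assumes the third-party pypdf package is installed. Text extraction from PDFs never preserves layout perfectly, so the extractor choice affects the later steps.

```python
from pathlib import Path
from typing import Callable, Dict


def extract_folder(folder: str, extract_text: Callable[[Path], str]) -> Dict[str, str]:
    """Return {file name: extracted text} for every PDF file directly in `folder`.

    `extract_text` is a hypothetical, pluggable extractor function.
    """
    results: Dict[str, str] = {}
    # Sort for a deterministic processing order; match ".pdf" case-insensitively
    # and skip directories even if their name ends in ".pdf".
    for path in sorted(Path(folder).iterdir()):
        if path.is_file() and path.suffix.lower() == ".pdf":
            results[path.name] = extract_text(path)
    return results


def pypdf_extract(path: Path) -> str:
    """Example extractor; assumes pypdf is installed (pip install pypdf)."""
    # Imported lazily so the rest of this sketch runs without pypdf.
    from pypdf import PdfReader

    reader = PdfReader(path)
    # extract_text() may return an empty string for image-only pages.
    return "\n".join(page.extract_text() or "" for page in reader.pages)
```

Usage, under the same assumptions: `texts = extract_folder("/path/to/pdfs", pypdf_extract)`.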
2. Isolate textblocks
- All textblocks in the document are identified (isolated)
- The start and end locations of each textblock in the document are identified
- A textblock can contain several paragraphs, sentences or words, or a single word.
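The posting does not define what separates one textblock from the next. The Python sketch below assumes that a blank line marks a boundary, which is a common convention but not a stated requirement. Under that assumption, a textblock is a maximal run of non-blank lines. Each block keeps its start and end character offsets in the extracted document text, with `end` exclusive as in Python slicing. The names `TextBlock` and `isolate_textblocks` are hypothetical.

```python
import re
from dataclasses import dataclass
from typing import List

# A maximal run of lines that each contain at least one non-whitespace character.
NON_BLANK_RUN = re.compile(r"[^\n]*\S[^\n]*(?:\n[^\n]*\S[^\n]*)*")


@dataclass
class TextBlock:
    text: str
    start: int  # offset of the block's first character in the document text
    end: int    # offset one past its last character (document[start:end] == text)


def isolate_textblocks(document: str) -> List[TextBlock]:
    """Split text into blocks, assuming blank lines separate textblocks."""
    blocks: List[TextBlock] = []
    for m in NON_BLANK_RUN.finditer(document):
        raw = m.group()
        lead = len(raw) - len(raw.lstrip())  # leading whitespace to skip
        text = raw.strip()
        start = m.start() + lead
        blocks.append(TextBlock(text, start, start + len(text)))
    return blocks
```

A block produced this way may hold one word or several paragraphs' worth of lines. If the PDF extractor does not emit blank lines between blocks, a different boundary rule is needed, for example one based on layout coordinates.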
3. Label textblocks
- Automatically assign a label to each textblock, based on the specific keywords defined for that label.
- The specific keywords are retrieved from a table in a MySQL database and compared with the words in the textblock. If a textblock contains fewer than 4 words, there must be a 100% match. If the textblock contains 4 words or more, the match can be partial, and a threshold value (x%) is used. For example, if more than 85% of the textblock matches the keywords of a label in the MySQL table, the match is successful and that label is assigned to the textblock.
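The following Python sketch shows one way to read the matching rule. It makes three assumptions that the posting does not settle:

- The "match percentage" is the share of the textblock's words that appear in a label's keyword set.
- Multi-word keywords are split into single words.
- When several labels pass, the label with the highest score wins, and ties go to the label listed first.

The keyword rows are assumed to come from a query such as the hypothetical `SELECT label, keyword FROM label_keywords`. The table, columns and function names are illustrative only.

```python
import re
from typing import Dict, Iterable, List, Optional, Set, Tuple

WORD = re.compile(r"\w+")


def load_keywords(rows: Iterable[Tuple[str, str]]) -> Dict[str, Set[str]]:
    """Group (label, keyword) rows into {label: set of lower-case keyword words}."""
    keywords: Dict[str, Set[str]] = {}
    for label, keyword in rows:
        # Assumption: a multi-word keyword contributes each of its words.
        keywords.setdefault(label, set()).update(w.lower() for w in WORD.findall(keyword))
    return keywords


def match_ratio(words: List[str], keywords: Set[str]) -> float:
    """Share of the block's words that are keywords of one label."""
    return sum(w in keywords for w in words) / len(words)


def assign_label(block: str, keywords: Dict[str, Set[str]], threshold: float = 0.85) -> Optional[str]:
    """Return the best-matching label, or None if no label qualifies."""
    words = [w.lower() for w in WORD.findall(block)]
    if not words:
        return None
    best_label, best_score = None, 0.0
    for label, label_words in keywords.items():
        score = match_ratio(words, label_words)
        # Fewer than 4 words: 100% match required; otherwise "more than" x%.
        passed = score == 1.0 if len(words) < 4 else score > threshold
        if passed and score > best_score:  # strict ">" keeps the first label on ties
            best_label, best_score = label, score
    return best_label
```

For example, with keywords `invoice`, `total amount` and `due date` for a label `invoice`, the block "Total amount" gets that label. The block "Invoice total amount due tomorrow" does not, because 4 of 5 words (80%) is not more than 85%.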
4. Store textblocks
Each textblock is stored in a table in a MySQL database with these values: