I have a Python script (Scrapy or Selenium, I am up to suggestions) to extract information within some specific websites - daily(auto) or manually.
The page is in portuguese, but I will provide a video with the scrapping instructions and the arguments for the function.
1. User arguments:
- start date.
- final date.
- Keywords(can be a list of words) to look for.
- User chooses the local path to the files to be downloaded.
- Access the page.
- Searches the tabs that can have useful information(I will provide the specific parts for each webpage to make the queries) links inside that domain.
- Download(if the page serves files in doc, html or pdf) and look for the keywords.
- Extract all the related content (PDF files or the text in html).
- Go around Captchas(if the page has captcha)
- All the extracted content must have the URL which the information/file is available in the webpage - can be done by logs.
- All the extracted content must have the DATE which the scrapping has been made - can be done by logs.
- All key-fields (like CSSSelector for a date field) should be configurable for each spider.
- The URL to start scrapping each webpage should be configurable.
- If page contains Authentication(Login/Password), user will fill the configuration for it.
1. I already have a script working for a webpage. The goal for this project is adapt this script to work similary on another page.
2. I WILL RELEASE THE MILESTONES ONLY AFTER YOU SEND ME THE CODE AND I AM TOTALLY SATISFIED (I WILL RUN TESTS TO CHECK FUNCTIONALITY).
3. MUST BE PYTHON CODE.
I have many projects at hand and would be great to stablish a good relation with you, since I constantly need someone to work with me.
9 фрилансеров(-а) в среднем готовы выполнить эту работу за $94
Hi there, I am a software engineer by profession and I can do this work for you. Please let me know if you want me to do this work for you. Regards, Raghav