I have a proxy supported python scraper hosted on Heroku using Django framework that does the following:
1) user input a list of keywords
2) scraper goes to [login to view URL] and searches for each keyword and returns product information for the first page of search results for each keyword (usually 60 products per keyword). Product information scraped includes title, description, features, review rating, price, seller, etc. It also use [login to view URL] api to return google keyword planner search volume data for each search term.
3) when complete it exports the data to a spreadsheet
The scraper is fully functional and no issues there.
The issue i am having is my scraper is using too much memory and will abruptly stop with it hits the usage limit on my heroku account (1GB). I am looking for someone with experience troubleshooting and optimizing python scraping code that could make my scraper run more efficiently. There may be a smoking gun that is the main cause of the memory leak.. im not sure. See attached image of the heroku metrics page showing the memory spike.
If it cannot be optimized further than it already is, my last resort I can think of is to restrict the number of scraped results for a given keyword (for example 25 scraped product results per keyword) so the export file doesn't return 60 results per page, thereby reducing the time it needs to run and memory used. This would also require altering the code that restricts the number of keywords the user can search on each run.
The deliverables for this project include:
1) optimizing the source code (currently on Heroku) so i can scrape a minimum of 10 keywords & 25 results/keyword per run without the scraper stopping.
2) I also have a couple really simple tweaks i need made to fix a couple fields in the export data.
I'm hoping to find someone with strong experience with python scraping, Django, and proxies that can fix my problem simply and quickly. I have a lot of future work possibilities for someone highly skilled in python web scraping. Please message me with further questions.
Hello, there. I have good experience with multi-thread in python. I want to check your scraper script. I can't sure without checking the code. Please message me. Thanks.
22 фрилансеров(-а) в среднем готовы выполнить эту работу за $199
Hi There, I can do it very quickly & effectively. I'm having more than 18 years of web development experience. Looking forward to work with you! Thanks!
Hi there . I am a Python expert, and after reading your requirements caredully, I am sure I can write an Python application for you thatbcan scrape the Amazon sites
I don't know which scraping technique the previous developer has used. You have to show us the code. Then we can bring you ideas to optimize your code.