I am a linguist building Engle, a Google-based desktop search engine that processes and organizes Google search results as reference material for English writing and English learning. Google is currently widely used to understand and check English expressions, and Engle makes this process much more convenient and powerful by turning Google into an engine dedicated to that single purpose. Through many years of research and user testing, I have developed what I consider an ideal design, with detailed algorithms written in English. I am now looking for a developer to implement them in code as an MVP.
Engle has the following modules (see attached file):
* Input interface: takes user entries from the search box, with convenient drop-down menus, and generates search keywords from them.
* Scraper: accesses [login to view URL] (not Custom Search) from the user's IP address at rates that minimize Google captchas, and scrapes the results for the keywords.
* Result processor: sorts and ranks the results by URLs and keyword matches.
* Output interface: displays results by type and rank, and supports various user controls over the results (multi-tab windows; nested iframes; moving, deleting, and adding items) as well as secondary searches.
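To make the result processor's "sort and rank by URLs and keyword matches" step concrete, here is a minimal Python sketch under stated assumptions. It is not the blueprint's algorithm: the `Result` fields, the `PREFERRED_DOMAINS` weights, and the scoring rule (keyword-match count multiplied by a per-domain weight) are all hypothetical placeholders chosen for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from urllib.parse import urlparse

# Hypothetical per-domain weights; the real blueprint may rank URLs differently.
PREFERRED_DOMAINS = {"bbc.co.uk": 2.0, "theguardian.com": 2.0}


@dataclass
class Result:
    """One scraped search result (hypothetical fields)."""
    url: str
    title: str
    snippet: str


def keyword_score(result: Result, keywords: list[str]) -> int:
    """Count how many keywords occur, case-insensitively, in the title or snippet."""
    text = f"{result.title} {result.snippet}".lower()
    return sum(1 for kw in keywords if kw.lower() in text)


def domain_weight(url: str) -> float:
    """Look up a weight for the result's host, ignoring a leading 'www.'."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    return PREFERRED_DOMAINS.get(host, 1.0)


def rank_results(results: list[Result], keywords: list[str]) -> list[Result]:
    """Sort by weighted keyword score, highest first; ties keep scrape order."""
    # Python's sort is stable even with reverse=True, so equal scores
    # stay in the order the scraper returned them.
    return sorted(
        results,
        key=lambda r: keyword_score(r, keywords) * domain_weight(r.url),
        reverse=True,
    )


if __name__ == "__main__":
    hits = [
        Result("https://example.com/a", "Take a decision", "to take a decision"),
        Result("https://www.bbc.co.uk/b", "Make a decision", "had to make a decision"),
        Result("https://example.org/c", "Unrelated", "nothing relevant"),
    ]
    for r in rank_results(hits, ["make a decision", "decision"]):
        print(r.url)
```

Multiplying by a domain weight is only one possible way to combine "URLs" and "keyword matches"; the blueprint's own URL rules would replace the placeholder `PREFERRED_DOMAINS` table.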
I already have a scraper and have run many experiments with captchas, but it needs repair and revision. It would be better to build a new one, but I hope the current one will still be useful as a resource. It is a .NET Framework 4.5.2 Windows application written in C#, with detailed notes.
I am looking for a developer or a team to build the entire system, but I will also consider developers with strong expertise in one part of it. Importantly, I need someone who will stay with me for the long term, through completion, debugging, and new iterations. I see the project as a process of building a partnership, so please apply only if you are interested in a long-term partnership.
As for how to proceed: if I shortlist you, I will send you the Engle MVP blueprint so you can assess feasibility, price, and delivery time. The document is about 6,500 to 7,000 words long and includes several drawings. Because this will take a good deal of your time, I will pay for your assessment and feedback by creating a small private project for you. I would also welcome comments and advice on clarity, problems, alternative approaches, and so on. Based on your feedback, I will make the final choice, and we will then build the system milestone by milestone.
My LinkedIn profile: [login to view URL]
Please answer the following questions when you apply:
1. Will you build the whole system or only part of it? If only part, which part?
2. If you build only part of it, will you be able to handle system integration and make sure the system works properly as a whole, including debugging? What issues do you anticipate in this process?
3. If you build the scraper, will you always be available to help when Google changes its HTML markup? Can you suggest a system that ensures a prompt response? If so, how much would each fix, or such a system, cost?
4. Your relevant background, why you are interested in this project, and why you are a good fit
5. Your public profiles, such as LinkedIn and GitHub, if available
6. The programming language(s) and technical platform you plan to use
7. Your price for assessment and feedback on the blueprint
8. Test question if you propose to build the scraper:
In tests with my scraper, I found strong evidence that Google can tell manual requests from bot requests, even though the scraper made requests at humanly feasible rates (5 to 15 seconds apart): (i) even when the scraper is blocked with a captcha, manual searches are still allowed; and (ii) captchas triggered by a manual search are much easier to solve (usually just a click) than captchas triggered by the scraper (a long series of images).
8a. Does Google really know the difference?
8b. If so, how does it detect the difference?
8c. If so, why doesn’t it block the scraper from the beginning?
8d. What is your strategy to build a human-like scraper, and how far can you succeed?