Enable a Python application to run in the [Amazon Elastic MapReduce] environment by modifying well-documented, well-structured source code.
The original application was developed to retrieve Wikimapia information and designed for proto-parallel processing: it can subdivide one task, run the pieces in parallel on multiple computers, and then collate the results. However, it was not developed to take advantage of [Hadoop].
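That subdivide/run-in-parallel/collate pattern is essentially a hand-rolled map step followed by a merge. A minimal local sketch of the idea (the `fetch_region` worker and the bounding-box record format are hypothetical placeholders, not taken from the original code):

```python
from concurrent.futures import ThreadPoolExecutor

def fetch_region(bbox):
    # Hypothetical worker: in the real application this would query the
    # Wikimapia API for one bounding box; here we just echo the input.
    return [("place", bbox)]

def run_proto_parallel(bboxes, workers=4):
    # Subdivide the task (one bounding box per worker call), run the
    # calls in parallel, then collate the per-chunk results into one list.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        chunks = list(pool.map(fetch_region, bboxes))
    results = []
    for chunk in chunks:
        results.extend(chunk)
    return results
```

Porting to Hadoop means letting the framework do the subdividing and shuffling instead of managing the workers by hand.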
# Preliminary Analysis:
Amazon provides an extended example of how to distribute Python processes; check [Finding Similar Items with Amazon Elastic MapReduce, Python, and Hadoop Streaming] to get an idea of the desired result.
The application to adapt has fewer than 900 lines. See the attached [[url removed, login to view]] for the original project requirements, which were achieved superbly, and for the signatures of the two components that make up the working application.
# Required Knowledge:
Familiarity with Hadoop and Amazon AWS is essential. Although Amazon Elastic MapReduce has only recently been released, anyone with Hadoop experience will pick it up quickly. Experience with [Hadoop Streaming] is a plus.
At ease with Python: some modifications to the original code will be needed, but the work mostly consists of reorganizing the code for Hadoop processing.
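Concretely, Hadoop Streaming runs the mapper and reducer as plain processes that read records on stdin and write tab-separated key/value pairs on stdout, so "reorganizing for Hadoop" mostly means splitting the existing logic into two such filters. A minimal sketch, with a hypothetical `category<TAB>name` record format assumed purely for illustration:

```python
from itertools import groupby

def map_record(line):
    # Hypothetical mapper step: parse one "category<TAB>name" record
    # and emit a (category, 1) pair.
    category, _name = line.rstrip("\n").split("\t", 1)
    yield category, 1

def reduce_group(key, values):
    # Hypothetical reducer step: collate all values seen for one key.
    return key, sum(values)

def simulate_streaming(lines):
    # Local stand-in for Hadoop's map -> sort/shuffle -> reduce pipeline,
    # useful for testing the scripts before launching a job flow.
    pairs = sorted(p for line in lines for p in map_record(line))
    return [reduce_group(k, [v for _, v in grp])
            for k, grp in groupby(pairs, key=lambda p: p[0])]
```

In the actual deliverable, the mapper and reducer would live in separate scripts that loop over `sys.stdin` and print their pairs, since that is the contract Hadoop Streaming expects.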
# Deliverables:
1. Modified Python scripts that run in Amazon Elastic MapReduce;
2. Documentation and working examples showing how to use the scripts in Amazon Elastic MapReduce.