Create a pyspark cluster and execute a job


We have 1000 gzip files containing roughly 700M-1B JSON objects in total (about 700K-1M per file), each object around 6KB. The average gzip file is 250MB compressed and 2.5GB uncompressed. All of these objects (approx. 700M-1B in total) have to be moved to S3 and MongoDB.

We need to set up a PySpark processing pipeline that will process this data and move it.

Skills: Python, PySpark, ETL, Data Processing, Data Extraction


About the employer:
( 41 reviews ) Mumbai, India

Project ID: #32291933

1 freelancer is bidding an average of ₹2500 for this job


I've worked for 2 years as part of the team in charge of developing, deploying, and supporting a prospecting-solution project, working entirely on Amazon EMR clusters written in Python/PySpark. The pipeline had a scheduled ETL

₹2500 INR in 7 days
(0 reviews)