Some Spark and Hive queries

Job Description:

Spark Use Case (Movie Review Analysis)

IMBD is an online database of movie-related information. IMBD users rate the movies and provide reviews.

They rate the movies on a scale of 1 to 5; 1 being the worst and 5 being the best. The dataset also has additional information, such as the release year of the movie. You have to analyze the data collected and answer the following questions.

You need to find:

1) The total number of movies

2) The maximum rating of movies

3) The number of movies that have maximum rating

4) The movies with ratings 1 and 2

5) The list of years and number of movies released each year

6) The number of movies that have a runtime of two hours
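
As a rough illustration of the six questions above, the sketch below expresses each one as a plain SQL statement and runs it against a tiny in-memory SQLite table. SQLite is used here only so the example runs without a cluster; the same statements should in principle be accepted by `spark.sql(...)` against the Hive table. The table name `movies`, the column names (`title`, `rating`, `release_year`, `runtime_minutes`), and the sample rows are all assumptions, since the post does not show the dataset's schema, and "two hours" is assumed to mean a runtime of exactly 120 minutes.

```python
import sqlite3

# Hypothetical sample rows: (title, rating, release_year, runtime_minutes).
# The real dataset's columns are not given in the post.
SAMPLE_MOVIES = [
    ("Movie A", 5, 2001, 120),
    ("Movie B", 3, 2001, 95),
    ("Movie C", 1, 2005, 120),
    ("Movie D", 2, 2005, 110),
    ("Movie E", 5, 2010, 130),
]

# One query per question in the list above, in the same order.
QUERIES = {
    "total_movies": "SELECT COUNT(*) FROM movies",
    "max_rating": "SELECT MAX(rating) FROM movies",
    "movies_with_max_rating": (
        "SELECT COUNT(*) FROM movies "
        "WHERE rating = (SELECT MAX(rating) FROM movies)"
    ),
    "movies_rated_1_or_2": (
        "SELECT title, rating FROM movies "
        "WHERE rating IN (1, 2) ORDER BY title"
    ),
    "movies_per_year": (
        "SELECT release_year, COUNT(*) FROM movies "
        "GROUP BY release_year ORDER BY release_year"
    ),
    # Assumes runtime is stored in minutes.
    "two_hour_movies": "SELECT COUNT(*) FROM movies WHERE runtime_minutes = 120",
}


def run_queries(rows):
    """Load rows into an in-memory table and return each query's result rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies ("
        "title TEXT, rating INTEGER, release_year INTEGER, runtime_minutes INTEGER)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", rows)
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, result in run_queries(SAMPLE_MOVIES).items():
        print(name, result)
```

Keeping the queries as plain strings makes it easy to try them locally first and then pass the same text to Spark once the Hive table exists.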

Steps to follow:

1. Create a table in an RDBMS (MySQL, MS SQL Server, Oracle) and load the data into the table (using bulk insert).

2. Ingest the data into an HDFS location using Sqoop.

3. Create a Hive external table.

4. Read the external table using a PySpark session.

5. Perform the Spark POC query and save the result in Parquet format.

6. After saving the file, create another external table in Hive and load the Parquet data.

7. Optional: create a BI report using Tableau, Power BI, or Kibana.
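
Steps 4 to 6 could look roughly like the following PySpark sketch. It is only an outline under stated assumptions: the table names, the Parquet directory, and the `release_year` column are hypothetical, and the "POC query" shown is just the movies-per-year count. `external_table_ddl` is a helper invented for this example, not part of any library. PySpark is imported inside `run_pipeline` so that the DDL helper can be tried without a Spark installation.

```python
def external_table_ddl(table, columns, location, stored_as="PARQUET"):
    """Build a HiveQL CREATE EXTERNAL TABLE statement.

    `columns` is a list of (name, hive_type) pairs. All names passed in
    are placeholders chosen by the caller.
    """
    cols = ", ".join(f"{name} {dtype}" for name, dtype in columns)
    return (
        f"CREATE EXTERNAL TABLE IF NOT EXISTS {table} ({cols}) "
        f"STORED AS {stored_as} LOCATION '{location}'"
    )


def run_pipeline(source_table, result_table, parquet_dir):
    """Read a Hive table, aggregate it, write Parquet, register the result."""
    # Imported lazily; requires a PySpark installation with Hive support.
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.appName("movie-review-poc")
        .enableHiveSupport()
        .getOrCreate()
    )

    # Step 4: read the Hive external table created over the Sqoop output.
    movies = spark.table(source_table)

    # Step 5: example query (movies per year); "release_year" is an assumed column.
    per_year = (
        movies.select(movies["release_year"].cast("int").alias("release_year"))
        .groupBy("release_year")
        .count()
        .withColumnRenamed("count", "movie_count")
    )
    per_year.write.mode("overwrite").parquet(parquet_dir)

    # Step 6: expose the Parquet files as a second Hive external table.
    spark.sql(
        external_table_ddl(
            result_table,
            [("release_year", "INT"), ("movie_count", "BIGINT")],
            parquet_dir,
        )
    )
    spark.stop()
```

A call such as `run_pipeline("movies_raw", "movies_by_year", "/user/hive/poc/movies_by_year")` would tie the steps together; all three arguments here are made-up names.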

Note: I'm sharing the bulk insert query for your reference (MSSQL).

CREATE TABLE customers
(
    Customer_id int, Cust_name varchar(100), City varchar(20),
    Grade nvarchar(10), Salesman_id int
);

BULK INSERT customers
FROM 'C:\Users\Ramkrishna\Desktop\SQL\MYSQL\Qerry\[login to view URL]' -- location with filename
WITH
(
    ...
);

The data file required for the above can be downloaded from Myeclass, in the Project section, under the name:

[login to view URL]

Skills: MySQL, SQL, Oracle, Hadoop, Spark

About the client:
(0 reviews) B 5 Block, India

Project ID: #35296358

9 freelancers are ready to do this job for an average of ₹14044


Hello... I am interested

₹57000 INR in 7 days
(128 reviews)

Hi, I'm an experienced data scientist with over 7 years of active development experience building machine learning and AI systems using multiple tools and technologies including R, Python and PySpark. I hold a Masters...

₹12000 INR in 7 days
(50 reviews)

Hello, I have read your project description, "Spark and Hive queries". I am an expert in database systems and I have developed many database systems including but not limited to a database system for a School, County Bur...

₹9900 INR in 5 days
(32 reviews)

PYSPARK EXPERT HERE!!! "Satisfy the client with my ability and passion" This is my slogan here. I hope you will be interested in me. Thanks.

₹10000 INR in 3 days
(1 review)

I can guarantee a good product. Hey, I'm interested in your project; I have read your requirements. We have 5+ years of experience. We have worked on projects similar to what you are looking for. We have a variety of...

₹4500 INR in 7 days
(0 reviews)

* Extensive experience in working with structured data using HiveQL, join operations, and optimizing Hive queries
* Experience in importing and exporting data using Sqoop between HDFS and relational databases

₹4000 INR in 7 days
(0 reviews)

Hi, I can analyze and visualize this data as per your requirements. I will also provide a description of each step to help you understand the project.

₹6000 INR in 7 days
(0 reviews)

I have 4+ years of experience as a Data Engineer. I have hands-on experience with Python, SQL, Hadoop, AWS services, and visualization tools such as Power BI. I have worked on different databases and files like SQL, SAP HANA, Parquet, ...

₹11000 INR in 7 days
(0 reviews)

I can do the work following the steps you mentioned. I can create a Python and Spark script that you can run on the data at any time; just change the location of the data file.

₹12000 INR in 7 days
(0 reviews)