Some Spark and Hive queries
Budget ₹1500-12500 INR
Job Description:
Spark Use Case (Movie Review Analysis)
IMDb is an online database of movie-related information. IMDb users rate movies and write reviews.
They rate movies on a scale of 1 to 5, where 1 is the worst and 5 is the best. The dataset also includes additional
information, such as each movie's release year. You have to analyze the collected data and answer the following questions.
You need to find:
1) The total number of movies
2) The maximum movie rating
3) The number of movies that have the maximum rating
4) The movies with a rating of 1 or 2
5) The list of years and the number of movies released in each year
6) The number of movies that have a runtime of two hours
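Each of the six questions maps onto a simple aggregate query. The sketch below is one possible way to express them, not the required solution. It assumes a table named `movies` with hypothetical columns `title`, `rating` (1 to 5), `release_year` and `runtime_min` (runtime in minutes), and it treats "two hours" as exactly 120 minutes. Python's built-in `sqlite3` module stands in for Hive/Spark SQL only so the queries can be tried locally, and the sample rows are made up for illustration.

```python
import sqlite3

# Hypothetical sample rows: (title, rating 1-5, release_year, runtime in minutes).
SAMPLE_MOVIES = [
    ("Movie A", 5, 1999, 120),
    ("Movie B", 2, 1999, 95),
    ("Movie C", 5, 2001, 135),
    ("Movie D", 1, 2001, 120),
    ("Movie E", 3, 2003, 110),
]

# One query per question; sqlite3 is only a local stand-in for Hive / Spark SQL.
QUERIES = {
    "total_movies": "SELECT COUNT(*) FROM movies",
    "max_rating": "SELECT MAX(rating) FROM movies",
    "movies_with_max_rating":
        "SELECT COUNT(*) FROM movies WHERE rating = (SELECT MAX(rating) FROM movies)",
    "movies_rated_1_or_2":
        "SELECT title, rating FROM movies WHERE rating IN (1, 2) ORDER BY title",
    "movies_per_year":
        "SELECT release_year, COUNT(*) FROM movies "
        "GROUP BY release_year ORDER BY release_year",
    # Assumption: "two hours" means a runtime of exactly 120 minutes.
    "two_hour_movies": "SELECT COUNT(*) FROM movies WHERE runtime_min = 120",
}


def analyze(rows):
    """Load rows into an in-memory movies table and run each question's query."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE movies (title TEXT, rating INTEGER, "
        "release_year INTEGER, runtime_min INTEGER)"
    )
    conn.executemany("INSERT INTO movies VALUES (?, ?, ?, ?)", rows)
    results = {name: conn.execute(sql).fetchall() for name, sql in QUERIES.items()}
    conn.close()
    return results


if __name__ == "__main__":
    for name, result in analyze(SAMPLE_MOVIES).items():
        print(name, result)
```

Keeping each question as a standalone SQL string means the same statements could, in principle, be passed to `spark.sql(...)` once the Hive external table is registered, and each answer can be checked on its own.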
Steps to follow:
1. Create a table in an RDBMS (MySQL, MS SQL Server, or Oracle) and load the data into it using bulk insert.
2. Ingest the data into an HDFS location using Sqoop.
3. Create a Hive external table.
4. Read the external table using a PySpark session.
5. Perform the Spark POC queries and save the results in Parquet format.
6. After saving the files, create another external table in Hive and load the Parquet data into it.
7. Optional: create a BI report using Tableau, Power BI, or Kibana.
Note: I'm sharing the bulk insert query for your reference (MSSQL):
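create table customers
(
    Customer_id int,
    Cust_name   varchar(100),
    City        varchar(20),
    Grade       nvarchar(10),
    Salesman_id int
)
GO
-- Load a comma-separated file, one record per line, into the table above.
-- WITH must not sit after the "--" comment, or it is commented out.
BULK INSERT customers
FROM 'C:\Users\Ramkrishna\Desktop\SQL\MYSQL\Qerry\[login to view URL]' -- location with filename
WITH
(
    FIELDTERMINATOR = ',',
    -- SQL Server reads '\n' as CRLF; use '0x0a' for LF-only (Unix) files.
    ROWTERMINATOR = '\n'
)
GO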
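For readers without SQL Server, the effect of the BULK INSERT (split the file into rows on the row terminator, split each row into fields on the field terminator, insert into `customers`) can be approximated in Python. This is a minimal sketch under stated assumptions: the file is plain comma-separated text with no header row and no quoted fields, rows end with a bare line feed, `sqlite3` stands in for the target RDBMS, and the sample contents are made up.

```python
import sqlite3

# Hypothetical file contents: one customer per line, fields separated by commas,
# in the same column order as the customers table.
SAMPLE_FILE = """3002,Alice Smith,New York,100,5001
3007,Bob Jones,New York,200,5001
3005,Chen Wei,California,200,5002
"""

FIELD_TERMINATOR = ","  # mirrors FIELDTERMINATOR = ','
ROW_TERMINATOR = "\n"   # this sketch splits on a bare line feed


def bulk_insert(conn, text):
    """Split text into rows and fields, insert them into customers, return the row count."""
    rows = []
    for line in text.split(ROW_TERMINATOR):
        if not line:  # skip the empty string after the final terminator
            continue
        cust_id, name, city, grade, salesman_id = line.split(FIELD_TERMINATOR)
        rows.append((int(cust_id), name, city, grade, int(salesman_id)))
    conn.executemany("INSERT INTO customers VALUES (?, ?, ?, ?, ?)", rows)
    return len(rows)


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customers (Customer_id INTEGER, Cust_name VARCHAR(100), "
    "City VARCHAR(20), Grade VARCHAR(10), Salesman_id INTEGER)"
)
loaded = bulk_insert(conn, SAMPLE_FILE)
```

A real loader would use the `csv` module if fields could contain quoted commas; the plain `split` is kept here only to mirror the two terminator settings one-to-one.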
The data file required for the above can be downloaded from Myeclass, in the Project section, under the name:
[login to view URL]
9 freelancers are ready to do this job for an average of ₹14044
Hello, I have read your project description for "Spark and Hive queries". I am an expert in database systems, and I have developed many database systems, including but not limited to a database system for a School, County Bur…
PYSPARK EXPERT HERE!!! "Satisfy the client with my ability and passion": this is my slogan here. I hope you will be interested in me. Thanks.
* Extensive experience working with structured data using HiveQL, join operations, and optimizing Hive queries
* Experience importing and exporting data between HDFS and relational databases using Sqoop
Hi, I can analyze and visualize this data as per your requirements, and I will also provide a description of each step to help you understand the project.
I have 4+ years of experience as a Data Engineer, with hands-on experience in Python, SQL, Hadoop, AWS services, and visualization tools such as Power BI. I have worked with different databases and file formats such as SQL, SAP HANA, parquet, …
I can do the work following the steps you mentioned. I can write a Python and Spark script that you can run on the data at any time; just change the location of the data file.