
mini project HDFS-HIVE
€30-250 EUR
Paid on delivery
The deliverable is a report containing the list of commands, screenshots of the commands and their results, and the NiFi development, plus an export of the NiFi template.
Work to do:
HDFS:
Using the HDFS command line (hdfs dfs -??????), create the following tree structure: /data/common/raw/DATABASE_M1/ETUDIANT_M1
Using the HDFS command line, create a file [login to view URL] in this directory (with 3 columns: firstName, lastName, email, filled with your own data)
Using the HDFS command line, display the contents of the directory
Using the HDFS command line, display the contents of the file
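The four HDFS steps above could be sketched roughly as follows. This is only an illustration: the file name etudiant.csv and the sample rows are hypothetical (the posting hides the real file name), no header row is written, and the script defaults to a dry run that prints each command instead of executing it, so it can be read without a cluster.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints the commands; set DRY_RUN=0 to run them.
DRY_RUN="${DRY_RUN:-1}"
HDFS_DIR="/data/common/raw/DATABASE_M1/ETUDIANT_M1"
LOCAL_FILE="etudiant.csv"   # hypothetical name, not taken from the posting

run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "$*"
  else
    "$@"
  fi
}

# Build the three-column file locally (firstName,lastName,email; placeholder rows).
printf '%s\n' \
  'Jane,Doe,jane.doe@example.com' \
  'John,Smith,john.smith@example.com' > "$LOCAL_FILE"

run hdfs dfs -mkdir -p "$HDFS_DIR"              # create the tree structure
run hdfs dfs -put -f "$LOCAL_FILE" "$HDFS_DIR/" # upload the file
run hdfs dfs -ls "$HDFS_DIR"                    # display the directory contents
run hdfs dfs -cat "$HDFS_DIR/$LOCAL_FILE"       # display the file contents
```

Uploading a locally built file with -put is one option among several; the file could also be written directly from standard input.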
HIVE:
Create a database DATABASE_M1
With HQL, create a database DATABASE_M2
With HQL, create a Hive table ETUDIANT_M1 in the DATABASE_M1 database pointing to the /data/common/raw/DATABASE_M1/ETUDIANT_M1 directory
With HQL, display the contents of the ETUDIANT_M1 table
With HQL, create an ETUDIANT_M1_PART table in the DATABASE_M1 database, partitioned on the DateRecep field (in year, month, day, hour, minute format: YYYYMMDDHHmm) and pointing to the /common/raw/DATABASE_M1/ETUDIANT_M1_PART directory
Create an external table ETUDIANT_M2 in the DATABASE_M2 database
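A possible shape for the Hive part is sketched below. Several things here are assumptions rather than requirements from the posting: the STRING column types, the comma delimiter, Avro storage for the partitioned table and for the M2 table (chosen because the NiFi step converts CSV to Avro), the table name ETUDIANT_M2 (the name used in the final NiFi step), the leading slash on the ETUDIANT_M1 location, and the JDBC URL. The script writes the HQL to a file and, in its default dry-run mode, only prints the beeline command.

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints the beeline command instead of running it.
DRY_RUN="${DRY_RUN:-1}"
JDBC_URL="${JDBC_URL:-jdbc:hive2://localhost:10000}"   # hypothetical HiveServer2 URL
HQL_FILE="create_tables.hql"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

cat > "$HQL_FILE" <<'EOF'
CREATE DATABASE IF NOT EXISTS DATABASE_M1;
CREATE DATABASE IF NOT EXISTS DATABASE_M2;

-- Table over the CSV file uploaded in the HDFS step (assumed comma-separated, no header).
CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1 (
  firstName STRING, lastName STRING, email STRING)
ROW FORMAT DELIMITED FIELDS TERMINATED BY ','
STORED AS TEXTFILE
LOCATION '/data/common/raw/DATABASE_M1/ETUDIANT_M1';

SELECT * FROM DATABASE_M1.ETUDIANT_M1;

-- Partitioned table; the partition value is a yyyyMMddHHmm string.
CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M1.ETUDIANT_M1_PART (
  firstName STRING, lastName STRING, email STRING)
PARTITIONED BY (DateRecep STRING)
STORED AS AVRO
LOCATION '/common/raw/DATABASE_M1/ETUDIANT_M1_PART';

-- External table in the second database (no location given in the posting).
CREATE EXTERNAL TABLE IF NOT EXISTS DATABASE_M2.ETUDIANT_M2 (
  firstName STRING, lastName STRING, email STRING)
STORED AS AVRO;
EOF

run beeline -u "$JDBC_URL" -f "$HQL_FILE"
```

The same file could be run with the hive CLI instead of beeline, depending on what the cluster provides.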
NIFI:
Expose a NiFi API to receive external file data (use the two processors HandleHttpRequest and HandleHttpResponse)
Send the data [login to view URL] (attached to the course) to the NiFi API 10 times.
Convert the received data from CSV format to Avro format
Write the data (using the PutHDFS processor) to the HDFS directory /common/raw/DATABASE_M1/ETUDIANT_M1_PART/DateRecep=202210ddHHmm. This value must be generated dynamically by NiFi: use a flowfile attribute holding a date in the requested format, e.g. Variable_DateRecep with the value DateRecep=${now():format('yyyyMMddHHmm')}
Run a SELECT on the table. What do you notice?
Run the following SQL command: MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;
Copy the data (via an HQL query executed by NiFi) from the ETUDIANT_M1_PART table to the ETUDIANT_M2 table so as to keep only the latest version of the file sent (use the OVERWRITE keyword and, in the WHERE clause of the SELECT, use the value of the latest partition).
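The parts of the NiFi exercise that happen outside the NiFi canvas might look like the sketch below: posting the file 10 times, then the repair and copy queries. The endpoint URL, the file name, and the LAST_PART value are hypothetical placeholders; in the real flow the partition value would come from a flowfile attribute, and the sketch assumes the most recent DateRecep value is what the WHERE clause should filter on. Partition directories written straight to HDFS are typically not visible to Hive until they are registered in the metastore, which is presumably what the SELECT before MSCK REPAIR is meant to reveal. As written, the script only prints the curl commands (dry run).

```shell
#!/bin/sh
# Sketch only. DRY_RUN=1 (the default) prints the commands instead of running them.
DRY_RUN="${DRY_RUN:-1}"
NIFI_URL="${NIFI_URL:-http://localhost:8081/etudiant}"  # hypothetical HandleHttpRequest endpoint
DATA_FILE="${DATA_FILE:-etudiant.csv}"                  # hypothetical file name
LAST_PART="${LAST_PART:-202210010000}"                  # placeholder yyyyMMddHHmm value
HQL_FILE="copy_latest.hql"

run() {
  if [ "$DRY_RUN" = "1" ]; then echo "$*"; else "$@"; fi
}

# POST the CSV file to the NiFi endpoint 10 times.
send_all() {
  i=1
  while [ "$i" -le 10 ]; do
    run curl -sS -X POST -H 'Content-Type: text/csv' \
      --data-binary "@$DATA_FILE" "$NIFI_URL"
    i=$((i + 1))
  done
}
send_all

# HQL that a NiFi processor could execute once the files have landed.
cat > "$HQL_FILE" <<EOF
MSCK REPAIR TABLE DATABASE_M1.ETUDIANT_M1_PART;

INSERT OVERWRITE TABLE DATABASE_M2.ETUDIANT_M2
SELECT firstName, lastName, email
FROM DATABASE_M1.ETUDIANT_M1_PART
WHERE DateRecep = '$LAST_PART';
EOF
```

Because requests sent within the same minute share one DateRecep value, the 10 uploads may land in fewer than 10 partitions.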
Project ID: #36234284
About the project
7 freelancers are ready to do this job for an average of €171
Greetings I'm a data engineer with extensive experience in hadoop hdfs , Hive, and big data solutions. I'm confident that I can deliver high-quality work within your budget and timeframe. Let's discuss further. Mounir
Hi, how are you? I went through the description and read it carefully, and I know exactly what you are looking for. I have 5+ years' experience in these skills: Big Data Sales, Apache Kafka, Hadoop, Spark and Hive. I have so
Hi, I have already worked a project very similar to yours and I believe I can make this work in 7 days maximum due to my knowledge of the big data ecosystem. We can talk in details if this interests you.
I am a 6+ years experienced data engineer. I can do the development for you in 1 week with professionalism.
Hi, I can do this effectively as i have expertise in Hadoop, hive , nifi... Plz visit my profile for more info. Thanks
Hi there, I have been working on big data Hadoop projects, and I excel at Hadoop and Hive. Let me know if I can help you with this. Thanks