Completed

Python script for data cleaning using Pandas DataFrame

Please read everything before bidding.

I need a Python script that cleans data from an Excel file and then saves the cleaned data to another Excel file.

I have written the structure of the script, exactly what I expect, and how it should be written. I also have code for reading and saving the files. What is missing is the data-cleaning part.

Detailed request:

#fill the Regex column of the config file based on Type (manually)

#read confFile into config_DF
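A minimal sketch of reading the config into config_DF, assuming a CSV config with hypothetical "Column", "Type" and "Regex" columns (swap in pd.read_excel for an Excel config). REGEX_BY_TYPE is an illustrative Type-to-pattern table that mirrors the manual Regex step; the real patterns are the author's choice.

```python
import io
import pandas as pd

# Hypothetical Type -> Regex table; the real patterns are filled in manually.
REGEX_BY_TYPE = {
    "int": r"-?\d+",
    "float": r"-?\d+(\.\d+)?",
    "date": r"\d{4}-\d{2}-\d{2}",
}

def read_config(conf_source):
    """Read the configuration file into config_DF.

    conf_source: path or file-like object of a CSV config (assumption).
    Returns config_DF on success, 1 if the file does not exist,
    3 on any other error.
    """
    try:
        config_DF = pd.read_csv(conf_source)
        # Fill empty Regex cells from the Type column (mirrors the manual step).
        config_DF["Regex"] = config_DF["Regex"].fillna(
            config_DF["Type"].map(REGEX_BY_TYPE))
        return config_DF
    except FileNotFoundError:
        return 1
    except Exception:
        return 3

# Usage with an in-memory CSV (illustration only).
sample = io.StringIO("Column,Type,Regex\nid,int,\nprice,float,\\d+\\.\\d{2}\n")
config_DF = read_config(sample)
```

Regex cells that are already filled are kept; only empty ones fall back to the Type table.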

#store the input DataFrame in modified_DF

#drop the rows listed in drop_rows in the job file
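The request does not specify the job file format, so this sketch assumes drop_rows has already been parsed into a list of row labels. The function name and argument names are hypothetical.

```python
import pandas as pd

def drop_rows(modified_DF, drop_rows_list):
    """Drop the rows named in the job file's drop_rows entry.

    modified_DF:    DataFrame being cleaned.
    drop_rows_list: list of row labels to drop (assumed already parsed).
    Returns the reduced DataFrame, or 3 on error.
    """
    try:
        # errors="ignore" skips labels that are not present instead of raising.
        return modified_DF.drop(index=drop_rows_list, errors="ignore")
    except Exception:
        return 3
```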

#check that the column names exactly match those in the conf file (order is not important)
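Comparing sets is one way to ignore column order. This sketch assumes config_DF lists the expected names in a hypothetical "Column" column and uses 2 as an illustrative "mismatch" code.

```python
import pandas as pd

def check_columns(modified_DF, config_DF):
    """Check that modified_DF has exactly the columns listed in config_DF.

    Order is ignored; names must match exactly (no stripping, same case).
    Returns modified_DF unchanged if the names match, 2 if they differ,
    3 on any other error (e.g. config_DF has no "Column" column).
    """
    try:
        expected = set(config_DF["Column"])
        actual = set(modified_DF.columns)
        return modified_DF if expected == actual else 2
    except Exception:
        return 3
```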

#remove leading and trailing spaces from values
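A sketch using only apply and lambda, as the request asks. Only string cells are stripped, so numeric and NaN cells pass through unchanged; note that str.strip() removes all surrounding whitespace (tabs, newlines), not just spaces.

```python
import pandas as pd

def strip_spaces(modified_DF):
    """Remove leading and trailing whitespace from every string value.

    Non-string cells (numbers, NaN) are returned untouched.
    Returns the new DataFrame, or 3 on error.
    """
    try:
        # Outer apply: once per column; inner apply: once per cell.
        return modified_DF.apply(
            lambda col: col.apply(lambda v: v.strip() if isinstance(v, str) else v))
    except Exception:
        return 3
```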

#replace cells with the wrong format with an empty string (check the Regex column in the configuration file)
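A sketch of the format check, assuming hypothetical "Column" and "Regex" columns in config_DF. Each cell is converted with str() and must match the whole pattern (re.fullmatch); because of the str() conversion, a float such as 1.0 becomes "1.0", so the check is best run while values are still raw text.

```python
import re
import pandas as pd

def replace_wrong_format(modified_DF, config_DF):
    """Replace cells that do not fully match their column's Regex with "".

    Empty strings and NaN are left as they are.
    Returns the new DataFrame, or 3 on error (e.g. a column with no
    config row, or an empty Regex cell).
    """
    try:
        regex_by_col = dict(zip(config_DF["Column"], config_DF["Regex"]))
        return modified_DF.apply(lambda col: col.apply(
            lambda v: v
            if pd.isna(v) or v == "" or re.fullmatch(regex_by_col[col.name], str(v))
            else ""))
    except Exception:
        return 3
```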

#replace cells with wrong values (below min, above max, zero, ...) with an empty string
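A sketch of the range check, assuming hypothetical "Column", "Min", "Max" and "Zero Allowed" columns in config_DF. Values are parsed with pd.to_numeric(errors="coerce"), so text and "" become NaN and are never flagged; an empty Min or Max means "no bound" because comparisons with NaN are False.

```python
import pandas as pd

def replace_wrong_values(modified_DF, config_DF):
    """Replace numeric cells below Min, above Max, or equal to zero with "".

    Zero is only rejected when the column's "Zero Allowed" flag is False.
    Columns without a config row are returned unchanged.
    Returns the new DataFrame, or 3 on error.
    """
    try:
        cfg = config_DF.set_index("Column")

        def flag_bad(col):
            if col.name not in cfg.index:
                return col
            nums = pd.to_numeric(col, errors="coerce")
            lo, hi = cfg.at[col.name, "Min"], cfg.at[col.name, "Max"]
            bad = (nums < lo) | (nums > hi)
            if not cfg.at[col.name, "Zero Allowed"]:
                bad = bad | (nums == 0)
            # mask() puts "" where bad is True and keeps the original value elsewhere.
            return col.mask(bad, "")

        return modified_DF.apply(lambda col: flag_bad(col))
    except Exception:
        return 3
```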

#fill missing values based on the rules in the "Missing Value Fill Method" column of config_DF [using pandas, numpy and scipy only]
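A sketch that dispatches on the fill method named in config_DF, using pandas only. The method names in FILL_METHODS ("mean", "median", "zero", "ffill", "bfill", "interpolate") are hypothetical placeholders, and columns are assumed to be numeric; "" left by the earlier steps counts as missing. Methods not in the table (such as 'ma', 'lr', 'knn') leave the column unchanged for a later step.

```python
import pandas as pd

# Hypothetical method names; each function takes and returns a numeric Series.
FILL_METHODS = {
    "mean": lambda s: s.fillna(s.mean()),
    "median": lambda s: s.fillna(s.median()),
    "zero": lambda s: s.fillna(0),
    "ffill": lambda s: s.ffill(),
    "bfill": lambda s: s.bfill(),
    "interpolate": lambda s: s.interpolate(),  # linear between neighbours
}

def fill_missing(modified_DF, config_DF):
    """Fill missing values column by column.

    config_DF is assumed to have "Column" and "Missing Value Fill Method".
    Returns the new DataFrame, or 3 on error.
    """
    try:
        method_by_col = dict(zip(config_DF["Column"],
                                 config_DF["Missing Value Fill Method"]))
        return modified_DF.apply(lambda col: (
            # to_numeric(errors="coerce") turns "" and unparsable text into NaN.
            FILL_METHODS[method_by_col[col.name]](pd.to_numeric(col, errors="coerce"))
            if method_by_col.get(col.name) in FILL_METHODS else col))
    except Exception:
        return 3
```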

#implement the methods 'ma' (moving average of the previous and next available values in the same column; a missing first or last value takes the closest available value), 'lr' (linear regression using sklearn) and 'knn' (using fancyimpute)
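One reading of the 'ma' rule, sketched as a column-level helper meant to be called through apply, e.g. modified_DF.apply(lambda col: fill_ma(col)). Forward fill gives the previous available value and backward fill the next one; their mean fills interior gaps, and at the edges whichever neighbour exists is used. With this reading, consecutive gaps all receive the same average.

```python
import pandas as pd

def fill_ma(col):
    """'ma' fill for one column.

    col: Series; "" and unparsable text are treated as missing.
    Returns a float Series where each missing value is the mean of the
    previous and next available values, and a missing first/last value
    equals the closest available value.
    """
    s = pd.to_numeric(col, errors="coerce")
    prev_val = s.ffill()  # last available value at or before each position
    next_val = s.bfill()  # next available value at or after each position
    # NaN at the edges: fall back to whichever neighbour exists.
    avg = ((prev_val + next_val) / 2).fillna(prev_val).fillna(next_val)
    return s.fillna(avg)
```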

#apply sigma filtering to all columns based on the multiplier in the conf file. This should generate a boolean DataFrame, inSigmaDF, where each value is True if it lies within mean ± (multiplier × std) for its column, and False otherwise. Then delete every row in modified_DF that contains at least one False in inSigmaDF.
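A sketch of the sigma filter, assuming a hypothetical "Sigma Multiplier" column in config_DF keyed by "Column". It uses pandas' default sample standard deviation (ddof=1); a value that cannot be parsed as a number compares as False and therefore removes its row.

```python
import pandas as pd

def sigma_filter(modified_DF, config_DF):
    """Drop rows with any value outside mean ± multiplier * std of its column.

    Returns the filtered DataFrame, or 3 on error (e.g. a column with no
    multiplier in config_DF).
    """
    try:
        mult = config_DF.set_index("Column")["Sigma Multiplier"]
        num = modified_DF.apply(lambda col: pd.to_numeric(col, errors="coerce"))
        mean, std = num.mean(), num.std()
        # True where the value lies inside the band for its column.
        inSigmaDF = num.apply(
            lambda col: (col - mean[col.name]).abs() <= mult[col.name] * std[col.name])
        # Keep only rows that are True in every column.
        return modified_DF[inSigmaDF.all(axis=1)]
    except Exception:
        return 3
```

For a = [1, 2, 3, 4, 100] the mean is 22 and the sample std is about 43.6, so with a multiplier of 1 only the row holding 100 falls outside the band.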

#save modified_DF to outputFile
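The author already has saving code; this sketch only shows how a save step could follow the "0 on success, non-zero code on failure" rule. The mapping of OSError to 1 and a missing Excel engine (for example openpyxl) to 2 is an illustrative assumption.

```python
import pandas as pd

def save_output(modified_DF, outputFile):
    """Write modified_DF to the Excel file outputFile.

    Returns 0 on success, 1 if the path cannot be written, 2 if no Excel
    writer engine is installed, 3 on any other error.
    """
    try:
        modified_DF.to_excel(outputFile, index=False)
        return 0
    except OSError:
        return 1
    except ImportError:
        return 2
    except Exception:
        return 3
```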

#each function above should return a DataFrame (or 0) on success and a non-zero code (1, 2, 3, ...) on failure/exception.

If this works with the pipeline, modified_DF should be checked after each call; if it is an int, return that int.
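A loop-free sketch of the "check after each call" rule using functools.reduce: once any step returns an int, the remaining steps are skipped and that int is passed through. run_steps is a hypothetical name, and steps are assumed to take a single DataFrame argument (wrap extra arguments with functools.partial). Intermediate steps should return DataFrames, since even a 0 stops the chain.

```python
from functools import reduce
import pandas as pd

def run_steps(df, steps):
    """Run steps in order, stopping at the first int result.

    df:    input DataFrame.
    steps: list of callables, each DataFrame -> DataFrame or int code.
    Returns the final DataFrame, or the first int code produced.
    """
    return reduce(
        lambda acc, step: acc if isinstance(acc, int) else step(acc), steps, df)
```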

#use try/except. These functions should not raise exceptions to the calling function; instead they should return a non-zero code (1, 2, 3, ...) for different errors.
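Instead of repeating try/except in every function, a decorator can turn exceptions into codes. The ERROR_CODES mapping (KeyError to 1, ValueError to 2, anything else to 3) and the select_column step are hypothetical examples; the request leaves the exact codes open.

```python
import functools
import pandas as pd

# Illustrative exception -> code mapping; anything not listed maps to 3.
ERROR_CODES = {KeyError: 1, ValueError: 2}

def returns_error_code(func):
    """Wrap func so it never raises; exceptions become non-zero int codes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # First matching exception class wins; default is 3.
            return next((code for exc_type, code in ERROR_CODES.items()
                         if isinstance(exc, exc_type)), 3)
    return wrapper

@returns_error_code
def select_column(df, name):
    """Hypothetical step: return a one-column DataFrame."""
    return df[[name]]
```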

#USE 'apply' and 'lambda', never loops, to operate on columns, e.g. data['date'] = data['date'].apply(lambda x: somefct(x))
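The two apply forms the rule relies on, sketched with made-up sample data: Series.apply calls the lambda once per cell, while DataFrame.apply calls it once per column and receives the whole column as a Series.

```python
import pandas as pd

data = pd.DataFrame({"date": [" 2020-01-05", "2020-02-10 "],
                     "qty": ["3", "4"]})

# Element-wise on one column (the form given in the request).
data["date"] = data["date"].apply(lambda x: pd.to_datetime(x.strip()))

# Column-wise on a frame: the lambda receives each column as a Series.
nums = data[["qty"]].apply(lambda col: pd.to_numeric(col))
```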

#when calling the functions, use pipes (pdpipe) in the form:

pipeline = [login to view URL](modified_DF)

pipeline+= [login to view URL](modified_DF, config_DF)

pipeline+=[login to view URL](modified_DF, config_DF)

...

outDF = pipeline(df)
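pdpipe itself is not used here. This is a hypothetical stand-in class that only mimics the `pipeline += stage` / `pipeline(df)` shape above and adds the stop-on-int check; stages needing config_DF are bound with functools.partial. strip_step and check_step are illustrative stubs.

```python
import functools
import pandas as pd

class Pipeline:
    """Minimal pipeline: stages are callables DataFrame -> DataFrame or int."""

    def __init__(self, *stages):
        self.stages = list(stages)

    def __iadd__(self, stage):
        # Supports `pipeline += stage`.
        self.stages.append(stage)
        return self

    def __call__(self, df):
        result = df
        for stage in self.stages:
            if isinstance(result, int):  # a previous stage failed: stop
                break
            result = stage(result)
        return result

def strip_step(df):
    """Stub stage: strip whitespace from string cells."""
    return df.apply(lambda col: col.apply(lambda v: v.strip() if isinstance(v, str) else v))

def check_step(df, config_DF):
    """Stub stage: return 2 if the columns differ from config_DF["Column"]."""
    return df if set(df.columns) == set(config_DF["Column"]) else 2

config_DF = pd.DataFrame({"Column": ["name"]})
pipeline = Pipeline(strip_step)
pipeline += functools.partial(check_step, config_DF=config_DF)
outDF = pipeline(pd.DataFrame({"name": [" Ann "]}))
```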

Mainly, the job consists of writing one main function (clean) and many small functions that clean the data.

Notes:

Other details and required information will be discussed as needed.

All code should be documented (functions should have comments explaining all variables and return values, and so should the main part of the code).

Python 3.6+ should be used

Create an environment to run the code in

All Python code should have a requirements file generated using pipreqs

Needed skills: Python, pandas, numpy, SciPy, sklearn

Extra skills: pdpipe, fancyimpute

Skills: Python, Data Processing, NumPy

About the employer:
(3 reviews) Beirut, Lebanon

Project ID: #22637191

Awarded to:

Misha100top

Hi, Mr Alex [login to view URL] you are going well! I checked your project carefully. I have rich experience with Python development; Python is one of my top skills. If you give me all the data for the project, I will start working immediately…

$50 USD in 1 day
(0 reviews)
0.0

9 freelancers are bidding on average $46 for this job

michaels225

It's a piece of cake for me. Hi, sir. Thank you for your job posting. I have enough experience with Python, VBA and data processing, so I am very confident I can satisfy you completely. Let's achieve success together. Thank…

$50 USD in 1 day
(16 reviews)
4.6

akshaynagpal1995

Data scientist. I have vast experience in an array of fields and I accept new challenges. I am available for hire to work on projects. Statistics, Machine Learning, Deep Learning, Computer Vision, Natural Language Processing…

$45 USD in 1 day
(4 reviews)
2.3

inhe121

Dear sir! ⭐ I am very interested in your project and I am excited. ⭐ I read your project details carefully and I think I am the best-fit developer for your project. ⭐ I have rich experience relevant to your project, so I…

$60 USD in 1 day
(1 review)
0.8

akiramatsui0305

Hi Client. I have read and understood your request. I am interested in your project and can do it very well. I have wide experience in Python development, and I look forward to you contacting me. I want to consult…

$45 USD in 7 days
(1 review)
0.8

kamaluddin97

Hi potential client, I have read and understood your job description. I am excited to work with you. Do contact me if you are interested in my service.

$45 USD in 7 days
(0 reviews)
0.0

reachteja2

Hi there, I have 6+ years of IT experience in data science, Python and machine learning. I have experience with the pandas and NumPy libraries for data analytics. I have experience with Matplotlib, Seaborn, Plotly…

$50 USD in 7 days
(0 reviews)
0.0

ahshanul

Hi, I can do this.

$35 USD in 8 days
(0 reviews)
0.0

siddhartha1995

Hello, I have been working with Python and pandas for a long time now and have delivered various projects with them. I believe that I can successfully deliver your project.

$35 USD in 1 day
(0 reviews)
0.0