Please read all before bidding.
I need a Python script that cleans data from an Excel file and then saves the cleaned data to another Excel file.
I have written the structure of the script, which describes exactly what I expect and how it should be written. I also have code for reading and saving the files. What is missing is the data-cleaning part.
#fill Regex in config file based on Type (manually)
#read confFile into config_DF
#save dataframe to modified_DF
# drop rows based on drop_rows in job file
#check the column names are exactly as in the conf file. order not important
#remove trailing and leading spaces in values
#replace cells with wrong format by empty string (check Regex column in configuration file)
# replace cells with wrong values (below min, over max, is zero ...) by empty string
#Fill missing values based on rules in conf df "Missing Value Fill Method" [using pandas, numpy and scipy only]
#implement methods for 'ma' (moving average of the previous and next available values in the same column; a missing first or last value is set to the closest available value), 'lr' (linear regression using sklearn) and 'knn' (using fancyimpute, [login to view URL])
#apply sigma filtering on all columns based on the multiplier in the conf file. This should generate a df of bool, inSigmaDF, where each value is True if it lies within +- (multiplier x std) of the column mean (after calculating the mean and std of each column), False otherwise. Then delete all rows in modified_DF that contain at least one False in inSigmaDF
#save modified_DF to outputFile
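As a rough illustration of three of the steps above (format check, 'ma' fill, sigma filtering), here is a minimal sketch. The function names and the way the regex and multipliers are passed in are assumptions, since the layout of the conf file is not shown here. The sigma step uses the pandas default sample standard deviation (ddof=1), which is also an assumption.

```python
import re

import pandas as pd


def blank_bad_format(col: pd.Series, pattern: str) -> pd.Series:
    """Trim each value and replace values that do not fully match `pattern` with ''.

    `pattern` stands in for the Regex entry of the conf file (hypothetical wiring).
    """
    def _clean(value):
        text = "" if pd.isna(value) else str(value).strip()
        return text if re.fullmatch(pattern, text) else ""

    return col.apply(lambda v: _clean(v))


def fill_moving_average(col: pd.Series) -> pd.Series:
    """Fill gaps with the mean of the previous and next available values.

    Leading and trailing gaps take the closest available value.
    """
    prev_val = col.ffill()   # previous available value (NaN before the first one)
    next_val = col.bfill()   # next available value (NaN after the last one)
    return col.fillna((prev_val + next_val) / 2).fillna(prev_val).fillna(next_val)


def sigma_filter(df: pd.DataFrame, multipliers: dict) -> pd.DataFrame:
    """Keep rows where every listed column lies within mean +- multiplier * std.

    `multipliers` maps column name to sigma multiplier (hypothetical format).
    Cells that are still NaN compare as False, so their rows are dropped.
    """
    cols = list(multipliers)
    mean = df[cols].mean()
    bound = (pd.Series(multipliers) * df[cols].std()).reindex(cols)
    in_sigma_df = (df[cols] - mean).abs().le(bound, axis="columns")
    return df[in_sigma_df.all(axis=1)]
```

Each helper works on whole columns through pandas operations or `apply`, without explicit loops over the data.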
#each function above should return a dataframe or zero on success and a non-zero code 1,2,3 on failure/exception.
#if it works with pipeline, modified_DF should be verified after each call; if it is an int, return that int
#use try/except. This function should not throw exceptions to the calling function; instead it should return a non-zero code (1, 2, 3, ...) for the different errors
#USE 'apply' and 'lambda', never loops, to operate on columns, e.g. data['date'] = data['date'].apply(lambda x: somefct(x))
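The return-code and apply/lambda rules above could look roughly like this. It is only a sketch: the decorator, the code values and the `strip_column` helper are hypothetical names, and the mapping of exception types to codes is an assumption (the posting only asks for codes 1, 2, 3, ...).

```python
import functools

import pandas as pd

# Hypothetical error codes; the actual numbering is to be agreed.
ERR_MISSING_COLUMN = 1
ERR_BAD_VALUE = 2
ERR_UNEXPECTED = 3


def returns_error_code(func):
    """Wrap a cleaning step so that exceptions become integer codes."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        try:
            return func(*args, **kwargs)
        except KeyError:
            return ERR_MISSING_COLUMN
        except (TypeError, ValueError):
            return ERR_BAD_VALUE
        except Exception:
            return ERR_UNEXPECTED
    return wrapper


@returns_error_code
def strip_column(data: pd.DataFrame, column: str) -> pd.DataFrame:
    """Return a copy of `data` with leading/trailing spaces removed in `column`."""
    out = data.copy()
    # apply + lambda on the column, no explicit loop over rows
    out[column] = out[column].apply(lambda x: x.strip() if isinstance(x, str) else x)
    return out
```

With this shape the caller never sees an exception: it gets either a DataFrame or a small integer it can test with `isinstance(result, int)`.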
# when calling functions, use pipes (pdpipe, [login to view URL]) in the form:
pipeline = [login to view URL](modified_DF)
pipeline+= [login to view URL](modified_DF, config_DF)
pipeline+=[login to view URL](modified_DF, config_DF)
outDF = pipeline(df)
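Because the exact pdpipe calls are hidden in the lines above, the following sketch does not use pdpipe. It only illustrates the intended control flow with plain Python: run the stages in order and stop as soon as a stage returns an int. The stage functions, `run_pipeline`, and the "Column" field of config_DF are all hypothetical.

```python
import pandas as pd


def check_columns(df, config_df):
    """Return df if its column names match the conf file (order ignored), else 1."""
    expected = set(config_df["Column"])  # "Column" is an assumed conf-file field
    return df if set(df.columns) == expected else 1


def strip_spaces(df, config_df):
    """Trim leading/trailing spaces in every string cell; return 2 on failure."""
    try:
        return df.apply(
            lambda col: col.apply(lambda v: v.strip() if isinstance(v, str) else v)
        )
    except Exception:
        return 2


def run_pipeline(df, config_df, stages):
    """Apply each stage in turn; return the first int error code encountered."""
    modified_df = df
    # This loop iterates over pipeline stages, not over the data itself.
    for stage in stages:
        modified_df = stage(modified_df, config_df)
        if isinstance(modified_df, int):
            return modified_df
    return modified_df
```

The same check (is the result an int?) would be needed after each stage if the stages are wrapped as pdpipe pipeline stages instead.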
Mainly, the job consists of writing one main function (clean) and many small functions to clean the data.
Other details and info required will be discussed as needed
All code should be documented (functions should have comments explaining all variables and return values, and the main part of the code should be commented as well).
Python 3.6+ should be used
Create an environment (env) to run the code in.
All python code should have [login to view URL] using pipreqs
Needed skills: Python, pandas, numpy, SciPy, sklearn
Extra skills: pdpipe, fancyimpute
9 freelancers are bidding an average of $46 for this job
Hi potential client, I have read and understood your job description. I am excited to work with you. Please contact me if you are interested in my services.
Hello, I have been working with python and pandas for a long time now and have delivered various projects on the same. I believe that I can successfully deliver your project.