data analysis (MAX $30) WILL NOT FUND MORE
Paid on delivery
Hi, I have a large multi-year dataset of crop variables (e.g. A, B, C and D). Each of these target variables needs to be predicted from a set of 30 independent variables.
I want to implement both statistical models (e.g. Huber regression, multiple linear regression) and machine learning models (e.g. random forest regression, KNN regressor, XGBoost) to predict the target variables. I will hire someone with extensive experience implementing machine learning code in Python. You may choose the 4 best statistical models and the 4 best machine learning models.
You will also be required to:
1. Identify the best train:test split for each target variable (90:10, 80:20, 70:30, 60:40 or 50:50?). For example, you might find that 70:30 works best.
2. Use this split to train the model (e.g. 70%) and test it (e.g. 30%)
3. Perform cross-validation
4. Tune hyperparameters
5. Minimize overfitting and other issues
6. Generate a table like Table 4 in [login to view URL] showing the performance of all statistical and machine learning models using evaluation metrics (R2, RMSE, MAPE, MAE or other metrics)
7. Produce a graph of predicted versus observed values, like Figure 6 in [login to view URL]
8. Finally, produce a predictor importance figure like Figure 7 in [login to view URL]
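For bidders, steps 1 to 6 above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the existing project code: the data is synthetic (make_regression stands in for the real 30-predictor dataset), only three of the candidate models are shown, and the hyperparameter grids are deliberately tiny. MAPE is left out because the synthetic target crosses zero. Note that picking a split ratio by comparing held-out scores is itself a form of tuning on the test data, so the cross-validated RMSE is reported alongside.

```python
import numpy as np
from sklearn.datasets import make_regression
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import HuberRegressor, LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# ASSUMPTION: synthetic stand-in for the real data (30 predictors, one target).
X, y = make_regression(n_samples=300, n_features=30, n_informative=10,
                       noise=10.0, random_state=0)

# Candidate models with small, illustrative hyperparameter grids.
candidates = {
    "linear": (make_pipeline(StandardScaler(), LinearRegression()), {}),
    "huber": (make_pipeline(StandardScaler(), HuberRegressor(max_iter=1000)),
              {"huberregressor__epsilon": [1.35, 1.75]}),
    "random_forest": (RandomForestRegressor(n_estimators=50, random_state=0),
                      {"max_depth": [5, None]}),
}


def evaluate(test_size):
    """Tune each model with 5-fold CV on the training part, then score it on the held-out part."""
    X_tr, X_te, y_tr, y_te = train_test_split(
        X, y, test_size=test_size, random_state=0)
    rows = []
    for name, (model, grid) in candidates.items():
        search = GridSearchCV(model, grid, cv=5,
                              scoring="neg_root_mean_squared_error")
        search.fit(X_tr, y_tr)
        pred = search.predict(X_te)
        rows.append({
            "model": name,
            "test_size": test_size,
            "cv_rmse": float(-search.best_score_),
            "r2": float(r2_score(y_te, pred)),
            "rmse": float(np.sqrt(mean_squared_error(y_te, pred))),
            "mae": float(mean_absolute_error(y_te, pred)),
        })
    return rows


# One row per (model, split ratio): the raw material for the performance table.
results = [row for ts in (0.1, 0.2, 0.3, 0.4, 0.5) for row in evaluate(ts)]
for row in results:
    print(f"{row['model']:<14}{row['test_size']:<5}"
          f"R2={row['r2']:.3f}  RMSE={row['rmse']:.2f}  MAE={row['mae']:.2f}")
```

The same loop would be repeated once per target variable (A, B, C, D), and the held-out predictions can be reused for the predicted-versus-observed graph.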
Deadline: 1 month from now (5 July 2023)
Useful reference articles are:
1. [login to view URL]
2. [login to view URL]
PLEASE NOTE: THE PYTHON CODE IS READY AND YOU ONLY HAVE TO RUN IT AND GENERATE THE GRAPHS AND TABLES (EXPORT DATA TO WORD AND BUILD SIMPLE TABLES). ALSO IMPLEMENT SIMPLE CODE FOR STATISTICAL ANALYSIS (COMPARING TREATMENTS, ETC.) AND DESCRIPTIVE STATISTICS.
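As a rough idea of what the descriptive statistics and treatment comparison could look like, here is a minimal sketch. Everything in it is an assumption for illustration: the column name "treatment", the treatment labels T1 to T3, and the values for variable A are made up, and a one-way ANOVA is just one possible way to compare treatments.

```python
import pandas as pd
from scipy import stats

# ASSUMPTION: hypothetical data; the real dataset and column names will differ.
df = pd.DataFrame({
    "treatment": ["T1"] * 4 + ["T2"] * 4 + ["T3"] * 4,
    "A": [4.1, 3.9, 4.3, 4.0, 4.8, 5.1, 4.9, 5.0, 3.5, 3.7, 3.4, 3.6],
})

# Descriptive statistics of variable A for each treatment.
summary = df.groupby("treatment")["A"].agg(["count", "mean", "std", "min", "max"])

# One-way ANOVA: do the treatment means of A differ?
groups = [g["A"].to_numpy() for _, g in df.groupby("treatment")]
f_stat, p_value = stats.f_oneway(*groups)

print(summary.round(2))
print(f"F = {f_stat:.2f}, p = {p_value:.4g}")
```

The summary table can be saved with summary.to_csv(...) and pasted into Word as a simple table.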
CONFIDENTIALITY SHOULD BE MAINTAINED AND THE DATASET SHOULD BE DELETED AFTER COMPLETION OF THE PROJECT. AN NDA AND IP AGREEMENT WILL BE SIGNED.
Project ID: #36685321
About the project
12 freelancers are bidding on average $25 for this job
Data Scientist: Expert in statistical & ML models. Implementing Python code, evaluating performance, generating tables/graphs. Deadline: 5 July 2023. Confidentiality assured.