Estimated Job duration: 1 - 2 weeks
The project consists of estimating probabilities on a horse-race data set,
using the R XGBoost algorithm (or CatBoost or AdaBoost, if XGBoost fails).
The worker must have good knowledge of boosting algorithms in R.
I only need the raw code, without any user interface.
To be provided:
- One piece of R code that builds a model from a training set
and adds several result columns to this set:
- the probability provided by the model
- the normalized probability within each race, and the odds deduced from it
- the ranking within each race.
- VERY IMPORTANT: the created model will then be applied to the test set, in order to check that
the results obtained are close to those on the training set (avoiding overfitting).
With mlogit from R, I could get 24% success(*) on the first 8000 races and 23% on the remaining races,
so the test success rate must exceed the mlogit result for the new algorithm to be worth using.
- Another piece of R code that applies the previously saved model to a file of unseen data.
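For bidders, the two requested scripts could look roughly like the sketch below. It is not a reference implementation: the data is synthetic, the column names (race_id, f1, f2, won) and the file name are hypothetical, and the objective binary:logistic is an assumption that scores each horse independently rather than modelling the competition within a race.

```r
# Sketch only: synthetic data and hypothetical column names (race_id, f1, f2, won).
library(xgboost)

set.seed(1)
n_races <- 200
n_horses <- 8
train <- data.frame(
  race_id = rep(seq_len(n_races), each = n_horses),
  f1 = rnorm(n_races * n_horses),
  f2 = rnorm(n_races * n_horses)
)
# Synthetic outcome: in each race, the horse with the highest noisy score wins
score <- train$f1 + rnorm(nrow(train))
train$won <- as.integer(ave(score, train$race_id, FUN = function(s) s == max(s)))

features <- c("f1", "f2")
dtrain <- xgb.DMatrix(data = as.matrix(train[, features]), label = train$won)

# Script 1: fit a "won / did not win" model, score the training set, save the model
model <- xgb.train(
  params = list(objective = "binary:logistic", max_depth = 3, eta = 0.1),
  data = dtrain,
  nrounds = 50
)
train$prob <- predict(model, dtrain)
model_file <- file.path(tempdir(), "horse_model.json")  # hypothetical file name
xgb.save(model, model_file)

# Script 2: reload the saved model and score a file of unseen data
unseen <- data.frame(f1 = rnorm(16), f2 = rnorm(16))
loaded <- xgb.load(model_file)
unseen$prob <- predict(loaded, xgb.DMatrix(as.matrix(unseen[, features])))
```

In the real project the unseen data frame would be read from the provided file, and the same feature columns would have to be present in the same order as at training time.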
A sample of the data set is attached. On request, I will provide the whole data set.
(*) Success means the ratio (number of horses that actually finished first / number of horses with the highest predicted probability in their race), i.e. the share of top-predicted horses that actually won.
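The result columns listed above and the success ratio (*) can be computed in base R. The following is a minimal sketch on toy data. The column names (race_id, prob, won) are hypothetical, and taking the odds as decimal odds 1 / p is only one possible convention, assumed here.

```r
# Sketch only: toy data with hypothetical column names (race_id, prob, won).
results <- data.frame(
  race_id = c(1, 1, 1, 2, 2, 2),
  prob    = c(0.30, 0.10, 0.20, 0.05, 0.25, 0.20),  # raw model output
  won     = c(1, 0, 0, 0, 0, 1)                     # 1 = actually finished first
)

# Normalized probability: raw probabilities rescaled to sum to 1 within each race
results$norm_prob <- results$prob / ave(results$prob, results$race_id, FUN = sum)

# Deduced odds, taken here as decimal odds 1 / p (assumed convention)
results$odds <- 1 / results$norm_prob

# Rank within each race: 1 = highest predicted probability
results$rank <- ave(-results$prob, results$race_id,
                    FUN = function(p) rank(p, ties.method = "first"))

# Success (*): among the top-ranked horses (one per race), the share that won
top <- results[results$rank == 1, ]
success <- mean(top$won == 1)
```

On this toy data the top pick wins race 1 and loses race 2, so success is 0.5. Running the same computation on the training set and on the test set gives the two percentages to compare against the mlogit figures.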
29 freelancers are bidding an average of $461 for this job
Dear customer, I have delivered 100+ projects in the field of data analysis during the last few years. I would love to start working together on this awesome project. Enjoy your day, Joerg.
Hello, I am an R language and statistics expert and I am very interested in your task. I have read your job description and I think I can help you. I will do my best. Thank you.
Hello, dear sir. I am an expert in MATLAB, R, math, and physics. I have a lot of experience in R, statistics, and probability projects. I will do your project well. Please contact me. Thanks.