Predict Soccer Match Total Goals With Machine Learning

  • Status: Closed
  • Prize: $20
  • Entries Received: 13
  • Winner: Gozienkwocha

Contest Brief

Soccer is the world's most popular sport.
**This contest will test whether you're among the best Machine Learning engineers on Freelancer.com**
Your challenge is to use ML & Deep Learning to build a model that can best classify the TOTAL number of goals scored in a soccer match given publicly available data.

The data provided includes details on a team's recent performance, probability of winning, match location, date, recent performance against the opposing team & other recent info. In all, there are close to 100 input variables provided.
You can find a definition of each input variable here: http://bit.ly/Column_definitions

For each soccer match/ fixture:
If the total goals scored by both teams is greater than 2.5, its outcome is recorded as Over.
If the total goals scored is less than 2.5, its outcome is recorded as Under.
This data is recorded under each dataset’s last column called “outcome”.
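
As a minimal sketch of the labelling rule above, assuming goal counts are whole numbers (so a total can never equal exactly 2.5). The function name `label_outcome` is hypothetical and does not come from the contest's helper script:

```python
def label_outcome(home_goals: int, away_goals: int) -> str:
    """Return "Over" if the match total exceeds 2.5 goals, else "Under"."""
    total = home_goals + away_goals
    return "Over" if total > 2.5 else "Under"


print(label_outcome(2, 1))  # 3 goals in total
print(label_outcome(1, 1))  # 2 goals in total
```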

A leaderboard of the top 10 performing models will be posted daily in the contest's comments section.
The competition will run for 8 days.
A payout has been guaranteed & will be provided to the winner of the contest.

The data & other material:
There are 3 datasets provided (found in the “Data CSVs.zip” file), plus a helper notebook.
1. training_data.csv - This contains 100 000 matches & their outcomes that you will use to train your model(s).
2. validation_data.csv - This contains 50 000 matches & their outcomes that you will use to test/validate your model(s) performance.
3. testing_data.csv - This contains 500 matches (without outcomes) that you will need to predict with your model & submit their results as a list of 0 or 1 as part of your submission.
When predicting, if you predict less than 2.5 total goals, you will need to label that outcome as 0, if you predict more than 2.5 total goals, label that as 1.
4. A helper_script.ipynb python notebook has been provided. This script contains prebuilt functions that will help with data cleaning, encoding, imputing & model training. You may use this script to transform the data & train your model.

Performance Criteria:
- The F1 Score (https://en.wikipedia.org/wiki/F1_score) will be used to determine your model's performance against other contestants.
- This F1 Score will be based on the predictions you make for the data in point 3 above (testing_data.csv).
For the leaderboard, F1 Scores will be rounded off to 3 decimal places.
- Should there be a tie, all of the top positioned contestants will each get the guaranteed payout.
- ** You may only post 2 submissions per day **
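
For reference, the F1 score is the harmonic mean of precision and recall. The sketch below computes it for binary labels in plain Python. Which class the contest holder treats as the positive class is not stated in the brief, so `positive=1` is an assumption, and `f1_score_binary` is a hypothetical name:

```python
def f1_score_binary(y_true, y_pred, positive=1):
    """Binary F1 score; `positive` is the label counted as the positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    if tp == 0:
        # No true positives: precision and recall are both 0 (or undefined).
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


example_true = [1, 0, 1, 1, 0]
example_pred = [1, 1, 1, 0, 0]
example_f1 = f1_score_binary(example_true, example_pred)
print(round(example_f1, 3))
```

In this toy example there are 2 true positives, 1 false positive and 1 false negative, so precision and recall are both 2/3 and so is the F1 score.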

Programming Language:
1. You are encouraged to use Python for model construction.
2. You may use any classification technique as you see fit (Deep Learning, Machine Learning)

Submission:
Your submission must contain 3 things.
1. A list of your model's predictions for the first 250 matches on the testing_data.csv file. This must be posted as a comment under your submission. The comment must be of the form: First 250 entries: [0,1,0,1,0,0...,0,0]
2. A list of your model's predictions for the second 250 matches on the testing_data.csv file. This must be posted as a comment under your submission. The comment must be of the form: Second 250 entries: [0,1,0,1,0,0...,0,0]
3. A picture of your validation data F1 Score (calculated on 'validation_data.csv').
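
One way to produce the two comments in the required form is sketched below. It assumes the 500 predictions are already held in a Python list of 0/1 values in the same order as the rows of testing_data.csv; `format_submission` is a hypothetical helper, not part of the provided notebook:

```python
def format_submission(predictions):
    """Split 500 binary predictions into the two comment strings."""
    if len(predictions) != 500:
        raise ValueError("expected exactly 500 predictions")
    if any(p not in (0, 1) for p in predictions):
        raise ValueError("predictions must be 0 or 1")

    def as_comment(label, chunk):
        return f"{label} 250 entries: [" + ",".join(str(p) for p in chunk) + "]"

    first = as_comment("First", predictions[:250])
    second = as_comment("Second", predictions[250:])
    return first, second


demo_first, demo_second = format_submission([0, 1] * 250)
print(demo_first[:40])
print(demo_second[:40])
```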

You are welcome to post any questions that you have on the contest's chat board.

Are you among the best of the best in Machine Learning?
PROVE IT by winning this contest.

Employer Feedback

“Chigozie's solution was cutting edge, easy to understand & showed deep understanding of the problem. I would highly recommend him for any Data Science/Machine Learning tasks & plan to work with him in future.”

LuyandaD, South Africa.

Public Clarification Board

  • Gozienkwocha
    • 1 month ago

    Many thanks to the contest holder. It was a really enjoyable time working on this project.

    • 1 month ago
    1. Gitesh98
      • 1 month ago

      Can you please share your file?

      • 1 month ago
    2. Gozienkwocha
      • 1 month ago

      Hi Gitesh, I would have shared my file, but the contest holder hasn't given me permission to do so. I have left a message for him in that regard and have yet to receive a response. I need his permission because during the handover I signed an agreement transferring all code rights to him, so sharing this file might violate the agreement. Once I get his green light, I'll share it with you. Thank you for your understanding.

      • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    I have verified the results in entry #15.

    Without disclosing the contestant's exact methods, their solution involved the following:
    1. Data cleaning & removing of duplicates.
    2. Handling missing values & feature engineering on columns with dates.
    3. Using the median values to fill in missing data.
    4. Using 2 ensemble models to model the outcome & blending the predictions from these to arrive at a final outcome.

    This entry has been awarded the contest's prize.
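
The winning notebook was not published, so the following is only an illustrative sketch of two of the steps listed above (median imputation and blending), not the contestant's actual code. It assumes missing values are represented as `None`, that each of the two models outputs a probability for class 1, and that blending means a weighted average followed by a 0.5 threshold. All function names and parameters are hypothetical:

```python
from statistics import median


def fill_missing_with_median(column):
    """Replace None entries with the median of the observed values."""
    observed = [v for v in column if v is not None]
    fill_value = median(observed)
    return [fill_value if v is None else v for v in column]


def blend_predictions(probs_a, probs_b, weight_a=0.5, threshold=0.5):
    """Weighted average of two models' class-1 probabilities, then threshold."""
    blended = [weight_a * a + (1 - weight_a) * b for a, b in zip(probs_a, probs_b)]
    return [1 if p >= threshold else 0 for p in blended]


filled = fill_missing_with_median([1.0, None, 3.0, 5.0])
labels = blend_predictions([0.9, 0.2, 0.6], [0.7, 0.4, 0.3])
print(filled)
print(labels)
```

In practice the median would be computed on the training data only and then reused for the validation and testing data, so that no information leaks from the held-out sets.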

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    The contest has now closed.
    A huge thank you to all contestants for participating.

    I will now evaluate the best performing entries & award the prize.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 12/03/2021:

    1. Entry #15 . F1_Score: 66.161
    2. Entry #11 . F1_Score: 66.048
    3. Entry #5 . F1_Score: 65.144
    4. Entry #7 . F1_Score: 64.588
    5. Entry #8 . F1_Score: 64.437
    6. Entry #3 . F1_Score: 63.889
    7. Entry #2 . F1_Score: 63.887
    8. Entry #10 . F1_Score: 63.809
    9. Entry #9 . F1_Score: 61.504
    10. Entry #4 . F1_Score: 60.429

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 11/03/2021:

    1. Entry #11 . F1_Score: 66.048
    2. Entry #5 . F1_Score: 65.144
    3. Entry #7 . F1_Score: 64.588
    4. Entry #8 . F1_Score: 64.437
    5. Entry #3 . F1_Score: 63.889
    6. Entry #2 . F1_Score: 63.887
    7. Entry #10 . F1_Score: 63.809
    8. Entry #9 . F1_Score: 61.504
    9. Entry #4 . F1_Score: 60.429
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Attention to all contestants

    1. I have now updated the leaderboard in a comment below.
    2. The contest will close in 18 hours; you may still submit entries until the contest has closed.
    3. Once the contest has closed, new entries will be evaluated & the leaderboard will be updated.
    4. The contestants with the top 3 entries will be asked in private chat to submit the notebooks used to generate their predictions.
    5. These notebooks will be used to reproduce results & verify that a winning entry has not been faked.
    6. The list of correct outcomes will be shared with you so that you may verify the results calculated for your own entry & that of others.

    7. Once a winning entry has been verified, the prize amount will be awarded.

    Thank you for your participation so far :)

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 10/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #7 . F1_Score: 64.588
    3. Entry #8 . F1_Score: 64.437
    4. Entry #3 . F1_Score: 63.889
    5. Entry #2 . F1_Score: 63.887
    6. Entry #10 . F1_Score: 63.809
    7. Entry #9 . F1_Score: 61.504
    8. Entry #4 . F1_Score: 60.429
    9.
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 09/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #7 . F1_Score: 64.588
    3. Entry #8 . F1_Score: 64.437
    4. Entry #3 . F1_Score: 63.889
    5. Entry #2 . F1_Score: 63.887
    6. Entry #9 . F1_Score: 61.504
    7. Entry #4 . F1_Score: 60.429
    8.
    9.
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 08/03/2021:

    1. Entry #5 . F1_Score: 65.144
    2. Entry #3 . F1_Score: 63.889
    3. Entry #2 . F1_Score: 63.887
    4. Entry #4 . F1_Score: 60.429
    5.
    6.
    7.
    8.
    9.
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 07/03/2021:

    1. Entry #3 . F1_Score: 63.889
    2. Entry #2 . F1_Score: 63.887
    3. Entry #4 . F1_Score: 60.429
    4.
    5.
    6.
    7.
    8.
    9.
    10.

    • 1 month ago
  • rawatpankaj9876
    • 1 month ago

    What does the negative sign in the goal_home and goal_away columns indicate?

    • 1 month ago
    1. LuyandaD
      Contest Holder
      • 1 month ago

      It expresses the maximum predicted goals that each team is expected to get.
      e.g. -3.5 means this team is expected to score 3 goals or less.

      • 1 month ago
  • Gozienkwocha
    • 1 month ago

    Hello. I would like to ask the contest holder whether the values in the match winner column have any particular meaning. For example, does '1 N' mean the home team won and 'N 2' mean the away team won, and so on? Or do they signify the score of the match, so that '1 N' means the match ended 1-0 in favour of the home team, 'N 2' means it ended 0-2 in favour of the away team, and 1 and 2 mean that there was a draw? Thank you.

    • 1 month ago
    1. Gozienkwocha
      • 1 month ago

      Does it also imply that 1 and 2 mean an outright win for the home and away side, respectively? I mean, does 1 mean the public is predicting that the home side will win, and 2 the away side?

      • 1 month ago
    2. LuyandaD
      Contest Holder
      • 1 month ago

      Yes, 1 means the home team is predicted to win outright.
      2 means the away team is predicted to win outright.

      • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 06/03/2021:

    1. Entry #3 . F1_Score: 63.889
    2. Entry #2 . F1_Score: 63.887
    3.
    4.
    5.
    6.
    7.
    8.
    9.
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Edit to instructions:
    Please ignore this instruction: "When predicting, if you predict less than 2.5 total goals, you will need to label that outcome as 0, if you predict more than 2.5 total goals, label that as 1."

    If you use encoding on the outcome column, the value of Under will become 1 & the value of Over will become 0.
    Your entries will be graded according to this rule going forward.
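
The mapping above (Over to 0, Under to 1) is what results from assigning integer codes to class names in sorted, alphabetical order, which is how scikit-learn's `LabelEncoder` behaves. Whether the helper notebook uses that encoder is an assumption. A dependency-free sketch of the same ordering, with the hypothetical name `encode_labels`:

```python
def encode_labels(labels):
    """Assign integer codes to class names in sorted (alphabetical) order."""
    classes = sorted(set(labels))
    mapping = {name: code for code, name in enumerate(classes)}
    return [mapping[name] for name in labels], mapping


codes, mapping = encode_labels(["Over", "Under", "Under", "Over"])
print(mapping)
print(codes)
```

Checking the mapping before submitting helps avoid posting predictions with the two classes swapped.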

    • 1 month ago
  • dataexpert18
    • 1 month ago

    #increaseprize #increaseprize #increaseprize #increaseprize

    • 1 month ago
    1. LuyandaD
      Contest Holder
      • 1 month ago

      Hi, which number is your entry?

      • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Leaderboard 05/03/2021:

    1. Entry #2 . F1_Score: 63.89
    2.
    3.
    4.
    5.
    6.
    7.
    8.
    9.
    10.

    • 1 month ago
  • LuyandaD
    Contest Holder
    • 1 month ago

    Hi there.
    I am the contest's holder.
    You are welcome to post any questions here.

    I will be updating the scoreboard once every day.
    I will also post the F1 score of each entry in its comments section.
    Reminder, only 2 entries per contestant per day.

    • 1 month ago
