Data cleansing using Java code

This contest is currently locked as the contest holder did not choose a winning entry within 14 days of the contest closing.


Contest Brief

We have a system that uses Machine Learning (WEKA) to cleanse missing and incorrect data.

The goal of this project is to improve and expand the Java program that cleans up Street Address, Latitude, and Longitude. Currently only Street Address cleansing is implemented, and it leaves 852 items uncleansed. Latitude and Longitude cleansing has to be added, and the overall results improved. The Latitude and Longitude values present in the data file are 99% correct; the goal of the cleansing program is to select a canonical Lat/Lon for each address and to detect and fix outliers.
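One way to approach the Lat/Lon part of the brief is sketched below. It is only an illustration under stated assumptions, not the attached program's method: it assumes street addresses have already been cleansed, takes the coordinate-wise median of all readings for an address as the canonical point, and treats any reading farther than a caller-chosen threshold (in degrees) from that point as an outlier. The class name `LatLonCleanser`, the `Reading` type, and the threshold are hypothetical names introduced for this example.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class LatLonCleanser {

    /** Hypothetical input row: a cleansed street address plus its recorded coordinates. */
    public static final class Reading {
        public final String address;
        public final double lat;
        public final double lon;

        public Reading(String address, double lat, double lon) {
            this.address = address;
            this.lat = lat;
            this.lon = lon;
        }
    }

    /** Median of a non-empty list of values. */
    static double median(List<Double> values) {
        double[] sorted = values.stream().mapToDouble(Double::doubleValue).sorted().toArray();
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    /** Canonical {lat, lon} per address: the coordinate-wise median of its readings. */
    public static Map<String, double[]> canonical(List<Reading> readings) {
        Map<String, List<Double>> lats = new LinkedHashMap<>();
        Map<String, List<Double>> lons = new LinkedHashMap<>();
        for (Reading r : readings) {
            lats.computeIfAbsent(r.address, k -> new ArrayList<>()).add(r.lat);
            lons.computeIfAbsent(r.address, k -> new ArrayList<>()).add(r.lon);
        }
        Map<String, double[]> result = new LinkedHashMap<>();
        for (String address : lats.keySet()) {
            result.put(address, new double[] {median(lats.get(address)), median(lons.get(address))});
        }
        return result;
    }

    /**
     * Returns a copy of the readings in which every reading whose latitude or longitude
     * differs from the canonical point by more than maxDegrees is replaced by that point.
     * The threshold is an assumption of this sketch and would need tuning on the real data.
     */
    public static List<Reading> fixOutliers(List<Reading> readings, double maxDegrees) {
        Map<String, double[]> canon = canonical(readings);
        List<Reading> cleaned = new ArrayList<>();
        for (Reading r : readings) {
            double[] c = canon.get(r.address);
            boolean outlier = Math.abs(r.lat - c[0]) > maxDegrees
                    || Math.abs(r.lon - c[1]) > maxDegrees;
            cleaned.add(outlier ? new Reading(r.address, c[0], c[1]) : r);
        }
        return cleaned;
    }
}
```

The median is used here because it is not pulled toward a bad value the way a mean is, which suits data that is mostly correct with occasional outliers. A distance in meters (for example, haversine) would be a more principled threshold than raw degrees; degrees are used only to keep the sketch short.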

This solution is for a commercial application. Using external services such as Google Maps in the solution is not permitted. The current version already outperforms Google Maps in accuracy when correcting street names (for example, Google incorrectly classifies "South C Ave" as "S AVE E", when the correct answer is "S AVE C"). So please do not insist on using Google Maps or other services; they are inadequate.

You may use external services to compare against your results, provided you include some analysis of the comparison.

Contest Entries will be judged on three measures: 1) Correctness of results, 2) Elegance of Solution and Code, 3) Memory and Run-Time Requirements.

The contest duration is 14 days. I will monitor the project and post clarifications when asked.

ATTACHMENTS: Java program and database dump (if you cannot open the Zip file, try using 7-Zip). You need to download the Weka jar file yourself.

You may remove WEKA and use a different open-source library if you wish.
