Completed

Perl: Email address cleaner and email domain extractor

This project was successfully completed by idleswell for $185 USD in 5 days.

Project Budget: $30-$250 USD
Completed In: 5 days
Total Bids: 12
Project Description

Perl and regex skills required. The task is to write a parser that accepts a CSV file plus a command-line option giving the number of the column that holds the email address, and then:
* Creates a new column with a clean copy of the email address. Any cruft (illegal characters, spaces, invalid format, etc.) is removed, and every reasonable effort is made to salvage the address from a field that may have additional text around it.
* Creates a column with the email domain.
* Creates a column with an error code indicating one of:
  a) Passed: no modification was required.
  b) Modified: the address required changes, but it has been standardized and now looks good.
  c) Hopeless: something is there, but it could not be salvaged.
  d) Empty: nothing is there (null record).
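A minimal sketch of the per-field logic, assuming the three new columns are computed from one raw field at a time. The helper name classify_email and the simplified address pattern are assumptions, not the final deliverable. The pattern is far looser than RFC 5322 and is only meant to show how codes a) to d) could be assigned.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (clean_address, domain, code) for one raw field.
# Codes follow the posting: a = passed untouched, b = salvaged/modified,
# c = something there but hopeless, d = empty.
sub classify_email {
    my ($raw) = @_;

    # d) Nothing there: undefined or whitespace-only field.
    return ('', '', 'd') if !defined $raw || $raw !~ /\S/;

    # Simplified pattern (assumption): common local-part characters, '@',
    # then dot-separated labels ending in an alphabetic top-level domain.
    my $addr_re = qr/[A-Za-z0-9._%+-]+\@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/;

    # a) The whole field is already an acceptable address.
    if ($raw =~ /\A$addr_re\z/) {
        my ($domain) = $raw =~ /\@(.+)\z/;
        return ($raw, lc $domain, 'a');    # domain column is lowercased
    }

    # b) Salvage the first address embedded in surrounding cruft,
    #    e.g. "John <JOHN@Example.com>" or an address padded with spaces.
    if ($raw =~ /($addr_re)/) {
        my $clean = $1;
        my ($domain) = $clean =~ /\@(.+)\z/;
        return ($clean, lc $domain, 'b');
    }

    # c) Something is there, but no usable address was found.
    return ('', '', 'c');
}
```

In a full script this would run once per CSV row, with the three return values appended as new columns. Any correction rules would be applied to the raw field before this check.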

CPAN modules will be evaluated and recommended to help with this process. After development, the output will go through rounds of error analysis and correction until, for the vast majority of records, the process either outputs a clean address or classifies the email as invalid.
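One way to drive the rounds of error analysis is a small report that counts rows per error code and lists the hopeless (code c) values, most frequent first, so that the next correction rules target the commonest failures. This is a minimal sketch. The summarize_codes helper and its input of [raw value, code] pairs are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical reporting helper for the error-analysis rounds.
# Input: a list of [raw_value, code] pairs, where code is one of a/b/c/d.
# Returns a hashref of per-code counts and an arrayref of the distinct raw
# values classified as hopeless (code 'c'), most frequent first.
sub summarize_codes {
    my @rows = @_;
    my %count = map { $_ => 0 } qw(a b c d);
    my %hopeless;
    for my $row (@rows) {
        my ($raw, $code) = @$row;
        $count{$code}++;
        $hopeless{$raw}++ if $code eq 'c';
    }
    # Sort by frequency, then alphabetically so the report order is stable.
    my @review = sort { $hopeless{$b} <=> $hopeless{$a} || $a cmp $b }
                 keys %hopeless;
    return (\%count, \@review);
}
```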

Correction rules will be added, such as converting gmail..com to [url removed, login to view], converting '@@' to '@', and other fixes that the data reveals.
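The correction rules could be kept as an ordered table, so each new fix found during error analysis becomes a one-line addition. This is a minimal sketch. The rule list and the apply_rules helper are hypothetical, and the patterns are illustrative, not the final rule set. Each rule reports whether it fired, which is one way to decide between code a) and code b) for a row.

```perl
use strict;
use warnings;

# Hypothetical ordered rule table: each entry is [description, code-ref].
# Each code-ref edits its argument in place (via the @_ alias) and returns
# a true value if it changed something.
my @RULES = (
    [ 'trim surrounding whitespace', sub { $_[0] =~ s/\A\s+|\s+\z//g } ],
    [ 'strip mailto: prefix',        sub { $_[0] =~ s/\Amailto://i } ],
    [ 'collapse repeated @',         sub { $_[0] =~ s/\@{2,}/\@/g } ],
    [ 'collapse repeated dots',      sub { $_[0] =~ s/\.{2,}/./g } ],
    [ 'drop trailing punctuation',   sub { $_[0] =~ s/[.,;:]+\z// } ],
);

# Apply every rule in order. Returns the corrected string followed by the
# names of the rules that fired (empty list means the value was untouched).
sub apply_rules {
    my ($value) = @_;
    my @fired;
    for my $rule (@RULES) {
        my ($name, $fix) = @$rule;
        push @fired, $name if $fix->($value);
    }
    return ($value, @fired);
}
```

For example, apply_rules('  bob@@example..com ') would yield 'bob@example.com' and report three fired rules. Keeping the rules in one list makes their order explicit, which matters when one rule's output feeds the next.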

For the record: this project has absolutely nothing to do with spam and everything to do with building tools to help my customers keep their data clean.
