Perl and regex skills are required. The task is to write a parser that accepts a CSV file and a command-line option specifying the number of the column with the email address, and:
* Create a new column with a clean copy of the email address. Any cruft (illegal characters, spaces, invalid format, etc.) will be removed. Every reasonable effort will be made to salvage the email address from a field that may have additional text around it.
* A column with the email domain will also be created.
* A column with an error code will be created. The error code will indicate one of four outcomes: a) the address did not require any modification and passed; b) the address required modification, but has been standardized and looks good now; c) something was there, but it was hopeless; d) nothing was there (null record).
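
To make the three output columns concrete, here is a rough sketch of how one field might be processed. The sub name `process_field`, the `%STATUS` hash, and the deliberately simple pattern are all assumptions for illustration (the pattern is not a full RFC 5322 check); the real rules would come out of the error analysis.

```perl
use strict;
use warnings;

# Hypothetical codes matching the four outcomes a) to d) described above.
my %STATUS = (
    passed   => 'a',   # valid as given
    fixed    => 'b',   # valid after cleanup
    hopeless => 'c',   # something there, but not salvageable
    empty    => 'd',   # null record
);

# Simplified address pattern; an assumption, not a complete validator.
my $EMAIL_RE = qr/[A-Za-z0-9._%+-]+\@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/;

# Returns (clean_address, domain, status_code) for one raw field.
sub process_field {
    my ($raw) = @_;
    return ('', '', $STATUS{empty}) if !defined $raw || $raw !~ /\S/;

    # Try to salvage an address from whatever surrounds it.
    if ($raw =~ /($EMAIL_RE)/) {
        my $clean = $1;
        my ($domain) = $clean =~ /\@(.+)\z/;
        my $code = $clean eq $raw ? $STATUS{passed} : $STATUS{fixed};
        return ($clean, $domain, $code);
    }
    return ('', '', $STATUS{hopeless});
}
```

For example, a field containing `John Smith <john@example.com>` would yield the address `john@example.com`, the domain `example.com`, and code `b` under these assumptions.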
CPAN modules will be evaluated and recommended to help with this process. After development, a series of error analysis and correction passes will be made until the process, the vast majority of the time, either outputs clean data or classifies the email as invalid.
Certain rules will be added, such as converting 'gmail..com' to 'gmail.com' and '@@' to '@', along with other corrections that the data shows are needed.
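
One possible way to keep such rules easy to extend is an ordered table of pattern and replacement pairs. This is only a sketch: the names `@RULES` and `apply_rules` are hypothetical, and the dot rule is generalized to collapse any run of dots, which is an assumption beyond the 'gmail..com' case.

```perl
use strict;
use warnings;

# Ordered list of [pattern, replacement] corrections.
# New rules found during error analysis can be appended here.
my @RULES = (
    [ qr/\@{2,}/, '@' ],   # '@@' -> '@'
    [ qr/\.{2,}/, '.' ],   # 'gmail..com' -> 'gmail.com' (assumed: any run of dots)
);

# Applies every rule in order and returns the corrected string.
sub apply_rules {
    my ($addr) = @_;
    for my $rule (@RULES) {
        my ($pattern, $replacement) = @$rule;
        $addr =~ s/$pattern/$replacement/g;
    }
    return $addr;
}

print apply_rules('jane@@gmail..com'), "\n";   # prints jane@gmail.com
```

Keeping the rules as data rather than hard-coded substitutions means each correction step only adds a line to the table.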
For the record: this project has absolutely nothing to do with spam and everything to do with building tools to help my customers keep their data clean.