Perl and regex required. The task is to write a parser that accepts a CSV file and a command-line option giving the number (#) of the column that holds the email address, with the following behavior:
* Create a new column with a clean copy of the email address. Any cruft (illegal characters, spaces, invalid format, etc.) will be removed. Every reasonable effort will be made to salvage the email address from a field that may have additional text around it.
* A column with the email domain will also be created.
* A column with an error code will be created. The error code will indicate one of the following:
  a) Did not require any modification; it passed.
  b) Required modification, but has been standardized and now looks good.
  c) Something was there, but it was hopeless.
  d) Nothing there (null record).
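A minimal sketch of the clean-and-classify step, using core Perl only. The function name `clean_email`, the status letters A to D (standing for options a) to d)), and the deliberately simple address pattern are all assumptions for illustration; a CPAN validator such as Email::Valid could replace the hand-written pattern if stricter checking is wanted.

```perl
use strict;
use warnings;

# Hypothetical status letters standing for options a) to d).
use constant {
    PASSED   => 'A',    # a) no modification required
    FIXED    => 'B',    # b) modified, now standardized
    HOPELESS => 'C',    # c) something there, but unsalvageable
    EMPTY    => 'D',    # d) null record
};

# Deliberately simple "looks like an address" pattern (an assumption);
# full RFC 5322 syntax is far more permissive than this.
my $ADDR = qr/[A-Za-z0-9._%+-]+\@(?:[A-Za-z0-9-]+\.)+[A-Za-z]{2,}/;

# Returns (clean_email, domain, status) for one raw field.
sub clean_email {
    my ($raw) = @_;
    return ('', '', EMPTY) if !defined($raw) || $raw =~ /\A\s*\z/;

    # Already a bare, well-formed address: pass it through unchanged.
    if ($raw =~ /\A$ADDR\z/) {
        my ($domain) = $raw =~ /\@(.+)\z/;
        return ($raw, lc $domain, PASSED);
    }

    # Salvage attempt 1: address embedded in other text, e.g. "Name <x@y.com>".
    my ($found) = $raw =~ /($ADDR)/;

    # Salvage attempt 2: spaces around '@' or '.', e.g. "jo @ example.com".
    # Only those spaces are removed, so neighbouring words are not glued on.
    if (!defined $found) {
        (my $squeezed = $raw) =~ s/\s*([\@.])\s*/$1/g;
        ($found) = $squeezed =~ /($ADDR)/;
    }
    return ('', '', HOPELESS) unless defined $found;

    # Lowercase only the domain; the local part may be case-sensitive.
    my ($local, $domain) = $found =~ /\A(.+)\@(.+)\z/;
    return ($local . '@' . lc($domain), lc($domain), FIXED);
}
```

For example, `clean_email('Jane Doe <Jane@Example.COM>')` returns `('Jane@example.com', 'example.com', 'B')`. Typo corrections such as collapsing `gmail..com` would need to run before this step, since the pattern above rejects doubled dots in the domain.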
CPAN modules will be evaluated and recommended to help with this process. A series of post-development error-analysis and correction passes will follow until the process outputs clean data the vast majority of the time, or otherwise classifies the email as invalid.
Specific rules will be added, such as converting gmail..com to gmail.com, converting '@@' to '@', and other corrections that the data reveals.
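One way to keep these rules maintainable is an ordered list of pattern/replacement pairs, so each new fix found in the data is a one-line addition. A sketch, where `@FIX_RULES` and `apply_fixes` are hypothetical names and the trailing-punctuation rule is an assumed example rather than a stated requirement:

```perl
use strict;
use warnings;

# Hypothetical ordered rule list: [ pattern, replacement ]. New fixes
# uncovered during error analysis would be appended here.
my @FIX_RULES = (
    [ qr/\@{2,}/,  '@' ],   # "@@" -> "@" (from the job description)
    [ qr/\.{2,}/,  '.' ],   # "gmail..com" -> "gmail.com" (from the job description)
    [ qr/[,;]+\z/, ''  ],   # trailing commas/semicolons (assumed example)
);

# Applies every rule in order. Returns the fixed string and a flag
# saying whether anything changed (status b) rather than a)).
sub apply_fixes {
    my ($email) = @_;
    my $fixed = $email;
    for my $rule (@FIX_RULES) {
        my ($pattern, $replacement) = @$rule;
        $fixed =~ s/$pattern/$replacement/g;
    }
    return ($fixed, $fixed ne $email ? 1 : 0);
}
```

Here `apply_fixes('bob@@gmail..com')` returns `('bob@gmail.com', 1)`. The dot rule also applies to the local part, which seems acceptable because unquoted local parts may not contain consecutive dots anyway.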
For the record: this project has absolutely nothing to do with spam and everything to do with building tools to help my customers keep their data clean.
12 freelancers are bidding an average of $113 for this job.
I have written hundreds of Perl scripts over 15+ years, a few of them with similar requirements to extract and correct email addresses, so this job is nothing new to me.
Seasoned web scraper. I have worked on many similar projects and have extensive experience with data-mining projects. I can finish this task quickly and to a high standard.
Project Description:
1. Write a Perl script that takes three command-line options: --inputfile, --outputfile, and --rawmailcolumn.
2. Syntax would look like: [url removed] -i [url removed] -o [url removed] -r 4 or just …
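The three options in this bid map directly onto the core Getopt::Long module. A sketch, assuming `-r` is a 1-based column number and using a hypothetical `parse_args` helper; `GetOptionsFromArray` is used instead of `GetOptions` so the parsing can be exercised without touching `@ARGV`:

```perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Long names come from the bid; the short aliases (-i, -o, -r)
# follow its example syntax. Returns a hashref of option values.
sub parse_args {
    my @args = @_;
    my %opt;
    GetOptionsFromArray(
        \@args, \%opt,
        'inputfile|i=s',
        'outputfile|o=s',
        'rawmailcolumn|r=i',
    ) or die "Invalid options\n";

    # All three options are treated as required (an assumption).
    for my $name (qw(inputfile outputfile rawmailcolumn)) {
        die "Missing --$name\n" unless defined $opt{$name};
    }
    die "--rawmailcolumn must be 1 or greater\n" if $opt{rawmailcolumn} < 1;
    return \%opt;
}
```

A real script would call `parse_args(@ARGV)` and then read and write rows with a CSV module such as Text::CSV, which handles quoted fields containing commas; splitting lines on commas by hand would not.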