Enhance Bash Script: CSV parser

Status: IN PROGRESS
Bids: 11
Avg Bid (USD): $72
Project Budget (USD): $30 - $250

Project Description:
I have a bash script that accepts an input CSV file and several command-line options (such as the delimiter), and rolls up the level of duplication of values in each column into a report. It can handle files with millions of rows by pulling one column of data into memory at a time and writing a temporary file of the most duplicated values for that column. For example, it does something similar to this for each field in the input file: cut -d\, -f1 filename.csv | sort | uniq -ci | sort -nr | head -20
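
The per-column pipeline quoted above could be wrapped in a loop along these lines. This is only a sketch, not the script's actual code: the function name top_dupes and its arguments are made up for illustration, and it assumes a plain single-character delimiter with no quoted fields. It also uses sort -f so that values differing only in case end up adjacent before uniq -ci counts them (the -i flag is supported by GNU uniq but is not required by POSIX).

```bash
# Hypothetical helper (not from the original script):
# print the most duplicated values in every column of a delimited file.
#   top_dupes FILE [DELIM] [TOP_N]
top_dupes() {
  local file=$1 delim=${2:-,} top=${3:-20}
  local ncols col

  # Assumption: the first line has the same number of columns as the rest.
  ncols=$(head -n 1 "$file" | awk -F"$delim" '{ print NF }')

  for (( col = 1; col <= ncols; col = col + 1 )); do
    echo "== column $col =="
    # One column at a time, so only that column is sorted and counted.
    cut -d"$delim" -f"$col" "$file" \
      | sort -f \
      | uniq -ci \
      | sort -nr \
      | head -n "$top"
  done
}
```

A call such as top_dupes filename.csv , 20 would print one ranked block per column.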

The script requires some modifications and enhancements including:
* Better parsing of CSV files. It handles rows like foo,bar,baz or "foo","bar","baz", but it has trouble with rows that mix quoted and unquoted fields, such as "foo",123,"bar".
* Special parsing for certain fields. For example, I would like the option to treat www.foo.com, http://foo.com and all its variants as the same URL, so that they are counted as dupes.
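
To make the two requests above more concrete, here is a rough sketch of what they might look like in pure bash. Everything in it is an assumption for discussion: csv_field and normalize_url are hypothetical names, the quoting rules follow the common convention that "" inside a quoted field is a literal quote, and "variants" of a URL is taken to mean differences in scheme, a leading www., letter case and a trailing slash. A character-by-character loop like this would likely be too slow for millions of rows, so it shows the logic rather than a production approach.

```bash
# Hypothetical: print field N (1-based) of one CSV line.
# Quoted and unquoted fields may be mixed, e.g. "foo",123,"bar".
#   csv_field LINE N [DELIM]
csv_field() {
  local line=$1 want=$2 delim=${3:-,}
  local i ch field='' idx=1 in_quotes=0
  for (( i = 0; i < ${#line}; i = i + 1 )); do
    ch=${line:i:1}
    if (( in_quotes )); then
      if [[ $ch == '"' ]]; then
        if [[ ${line:i+1:1} == '"' ]]; then
          field+='"'            # doubled quote inside quotes = literal quote
          i=$(( i + 1 ))
        else
          in_quotes=0           # closing quote
        fi
      else
        field+=$ch              # delimiters inside quotes are plain text
      fi
    elif [[ $ch == '"' ]]; then
      in_quotes=1               # opening quote
    elif [[ $ch == "$delim" ]]; then
      if (( idx == want )); then break; fi
      idx=$(( idx + 1 ))
      field=''
    else
      field+=$ch
    fi
  done
  if (( idx == want )); then
    printf '%s\n' "$field"
  else
    return 1                    # the line has fewer than N fields
  fi
}

# Hypothetical: reduce URL variants to one comparable key.
# Assumption: lowercasing the whole URL (path included) is acceptable.
normalize_url() {
  local url
  url=$(printf '%s' "$1" | tr '[:upper:]' '[:lower:]')
  url=${url#http://}
  url=${url#https://}
  url=${url#www.}
  url=${url%/}
  printf '%s\n' "$url"
}
```

With these definitions, csv_field '"foo",123,"bar"' 2 prints 123, and both normalize_url www.foo.com and normalize_url http://foo.com print foo.com, so the two URLs would be counted as the same value.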

There are a few other tweaks of a similar nature that I would like to have incorporated into the script, which we can discuss. Looking forward to hearing from a bash guru.

Thanks.

Skills required:
Shell Script, UNIX
Project posted by:
galaxywatcher United States
Verified