I need a tool that will let me search about 4000 CSV files, each about 30 MB (about 100 gigabytes of data in total), for keywords. I would like to be able to use wildcards as well. I need a GUI with a drag-and-drop interface. I am attaching a sample CSV file. The search should be FAST: at minimum, fast enough to search all the files for a single keyword within 90 minutes.
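To put the speed target in numbers, here is a rough calculation of the sustained throughput the tool would need. It uses only the figures above (about 100 GB, 90 minutes) and assumes 1 GB = 1000 MB; the function name is just for illustration.

```python
# Back-of-the-envelope throughput needed to meet the 90-minute target.
TOTAL_GB = 100        # total data size stated in the brief (approximate)
TIME_LIMIT_MIN = 90   # time budget stated in the brief


def required_mb_per_sec(total_gb: float, minutes: float) -> float:
    """Sustained read-and-scan rate in MB/s, assuming 1 GB = 1000 MB."""
    return total_gb * 1000 / (minutes * 60)


rate = required_mb_per_sec(TOTAL_GB, TIME_LIMIT_MIN)
print(f"{rate:.1f} MB/s")  # prints "18.5 MB/s"
```

So the reading and matching together need to sustain roughly 18.5 MB per second across the whole run.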
I want to have searches like
("red barron" or "green curry" or ("hot potato" and "cold soup")) and "sour dough" and NOT "french fried potatoes"
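To make the expected query behaviour concrete, here is a minimal sketch of how such an expression could be parsed and evaluated. It is one possible approach, not a required design: the function names `tokenize` and `parse` are made up for illustration, operator precedence is assumed to be NOT, then AND, then OR, and phrases are matched as plain case-insensitive substrings for brevity (whole-word matching and wildcards are not handled here).

```python
import re

# A quoted phrase, a parenthesis, or a bare word (and/or/not).
TOKEN = re.compile(r'"([^"]*)"|([()])|(\w+)')


def tokenize(query):
    tokens = []
    for phrase, paren, word in TOKEN.findall(query):
        if paren:
            tokens.append((paren, None))
        elif word:
            op = word.lower()
            if op not in ("and", "or", "not"):
                raise ValueError(f"unquoted term: {word!r}")
            tokens.append((op, None))
        else:
            tokens.append(("phrase", phrase))
    return tokens


def parse(query):
    """Turn a query string into a function text -> bool."""
    tokens = tokenize(query)
    pos = 0

    def peek():
        return tokens[pos][0] if pos < len(tokens) else None

    def advance():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def parse_or():
        left = parse_and()
        while peek() == "or":
            advance()
            right = parse_and()
            left = (lambda a, b: lambda t: a(t) or b(t))(left, right)
        return left

    def parse_and():
        left = parse_not()
        while peek() == "and":
            advance()
            right = parse_not()
            left = (lambda a, b: lambda t: a(t) and b(t))(left, right)
        return left

    def parse_not():
        if peek() == "not":
            advance()
            inner = parse_not()
            return lambda t: not inner(t)
        if peek() == "(":
            advance()
            inner = parse_or()
            if peek() != ")":
                raise ValueError("missing ')'")
            advance()
            return inner
        if peek() == "phrase":
            phrase = advance()[1].lower()
            return lambda t: phrase in t.lower()
        raise ValueError("unexpected token")

    matcher = parse_or()
    if pos != len(tokens):
        raise ValueError("trailing tokens")
    return matcher


query = ('("red barron" or "green curry" or ("hot potato" and "cold soup")) '
         'and "sour dough" and NOT "french fried potatoes"')
matches = parse(query)
print(matches("green curry with sour dough"))  # True
```

Parsing the query once and reusing the resulting function for every row keeps the per-row cost down to the phrase checks themselves.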
I would like to have wildcards for prefixes, suffixes, and within words or phrases.
Please tell me in your PM bid what wildcard functionality you can provide.
Note: If I search for the word 'wine', I don't want a word like 'swine' returned unless I ask for it with some sort of wildcard.
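As an illustration of the 'wine' versus 'swine' behaviour, here is a minimal sketch that turns a wildcard keyword into a whole-word regular expression. The wildcard syntax is my assumption for the example only (`*` for any run of word characters, `?` for exactly one), and `wildcard_to_regex` is a made-up name. Note that languages written without spaces, such as Japanese or Chinese, would probably need the whole-word check relaxed, since a keyword there is normally surrounded by other word characters.

```python
import re


def wildcard_to_regex(pattern: str) -> re.Pattern:
    """Compile a wildcard keyword into a case-insensitive whole-word regex.

    Assumed syntax: '*' = zero or more word characters, '?' = one word character.
    """
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(r"\w*")
        elif ch == "?":
            parts.append(r"\w")
        else:
            parts.append(re.escape(ch))
    body = "".join(parts)
    # The lookarounds reject matches that start or end in the middle of a word.
    return re.compile(rf"(?<!\w){body}(?!\w)", re.IGNORECASE)


print(bool(wildcard_to_regex("wine").search("a herd of swine")))   # False
print(bool(wildcard_to_regex("*wine").search("a herd of swine")))  # True
```

With this approach a plain keyword only matches as a complete word, and a prefix, suffix, or infix wildcard has to be asked for explicitly.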
THIS TOOL MUST BE ABLE TO SEARCH 2-byte languages like Japanese, Korean, and Chinese. NOTICE that the sample input file (attached) contains both two-byte and single-byte text. This must be handled properly. The tool can have an option box to say whether the character set is Western or not, if that will speed up Western character searches.
Of course, there should be an option for giving a name and location to the resulting output file.
THE OUTPUT FILE MUST MEET THESE REQUIREMENTS (see attached sample file):
It must return the same fields as the input file.
The Date, Username, Text, and Location fields must all be quoted by default, but there should be an option for each field to choose whether it is quoted.
All commas except field delimiters must be removed by default, but there should be an option for each field to leave them in.
The key to whether the output file is in the correct format is whether it can be opened in Excel, saved by Excel, and then reopened and still have the correct number of fields. Also, Excel should read the date field as a date.
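The output rules above could be sketched roughly as follows. Python's `csv` module sets the quoting style for the whole writer rather than per field, so this example formats each row by hand. `format_row` and its parameters are hypothetical names, and quoting every field by default is my assumption, since the brief only names four of the fields.

```python
def format_row(fields, quote=None, keep_commas=None):
    """Format one output row as a CSV line.

    quote[i]       -- wrap field i in double quotes (assumed default: all True)
    keep_commas[i] -- leave commas inside field i (default: all False)
    """
    n = len(fields)
    quote = quote if quote is not None else [True] * n
    keep_commas = keep_commas if keep_commas is not None else [False] * n
    out = []
    for value, wants_quotes, keep in zip(fields, quote, keep_commas):
        if not keep:
            value = value.replace(",", "")
        # A field that still holds a comma, quote, or line break has to be
        # quoted anyway, or the field count changes when the file is reopened.
        if wants_quotes or any(c in value for c in ',"\r\n'):
            value = '"' + value.replace('"', '""') + '"'
        out.append(value)
    return ",".join(out)


print(format_row(["a,b", 'say "hi"']))  # "ab","say ""hi"""
```

Doubling embedded quote characters is the convention Excel uses when it reads and writes CSV, which is what keeps the number of fields stable across a save and reopen.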
There should be some sort of progress bar to show how many files have been searched.
In the attached file there are 7 fields. The main field to be searched is field number 4, but the other fields should be searchable as well. Any field in the file should be searchable by selecting that field number (the default should be column 4, the "Text" field).
If I have a new format for the CSV files with additional columns, the tool should be flexible enough that those columns are searchable too.
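Selecting the field by number, rather than hard-coding a layout, could look roughly like the sketch below. It streams rows so a 30 MB file never has to be held in memory at once. `search_rows` is a made-up name, the column number is 1-based to match the brief, and the UTF-8 encoding in the usage comment is an assumption about the input files.

```python
import csv


def search_rows(lines, predicate, column=4):
    """Yield rows whose 1-based `column` satisfies `predicate`.

    `lines` is any iterable of text lines, e.g. an open file. Rows with
    fewer fields than `column` are skipped, so files with extra columns
    work without any change.
    """
    for row in csv.reader(lines):
        if len(row) >= column and predicate(row[column - 1]):
            yield row


# Hypothetical usage (the path and encoding are assumptions):
# with open("sample.csv", newline="", encoding="utf-8") as f:
#     for row in search_rows(f, lambda text: "wine" in text.lower()):
#         print(row)
```

Because the column is just a parameter, a new file format with additional columns only needs a different number chosen in the GUI.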
There should also be Username, Location, Tweet ID range, and Date range search boxes on the tool.
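One way these extra search boxes could be combined is sketched below. Every name here is hypothetical, and the matching rules are my assumptions for the example: usernames compare case-insensitively, location is a substring match, both ranges are inclusive, and the caller has already parsed the tweet ID into an integer and the date into a `date` object.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Filters:
    """Optional metadata filters; a value of None means 'no restriction'."""
    username: Optional[str] = None
    location: Optional[str] = None
    min_id: Optional[int] = None
    max_id: Optional[int] = None
    start: Optional[date] = None
    end: Optional[date] = None

    def accepts(self, username: str, location: str, tweet_id: int, day: date) -> bool:
        if self.username is not None and username.lower() != self.username.lower():
            return False
        if self.location is not None and self.location.lower() not in location.lower():
            return False
        if self.min_id is not None and tweet_id < self.min_id:
            return False
        if self.max_id is not None and tweet_id > self.max_id:
            return False
        if self.start is not None and day < self.start:
            return False
        if self.end is not None and day > self.end:
            return False
        return True


f = Filters(username="alice", min_id=10, max_id=20)
print(f.accepts("Alice", "Tokyo", 15, date(2012, 1, 1)))  # True
```

Checking these cheap filters before running the keyword match on the Text field would let the tool skip most rows early when a box is filled in.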