Search and Replace text in TXT documents - 1

Status: In progress
Bids: 31
Avg Bid (USD): $127
Project Budget (USD): $30 - $250

Project Description:
Text Document Search and Replace project

The purpose of this project is to create software that reads a text document, eliminates all data except what is required, and puts the remaining data elements in a specific order.

The software must be able to search a folder or several folders, including subfolders, sub-subfolders, and so on, to any depth.
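
A minimal sketch of the folder search, assuming the input documents are plain .txt files. The function name find_text_files and the extension parameter are illustrative, not part of the requirements:

```python
import os

def find_text_files(root_folders, extension=".txt"):
    """Yield the path of every matching file under the given root folders."""
    for root in root_folders:
        # os.walk visits the root and every subfolder below it, at any depth.
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                # Compare case-insensitively so REPORT.TXT is found too.
                if name.lower().endswith(extension):
                    yield os.path.join(dirpath, name)
```

Passing a list of roots covers the "folder, or folders" case with the same call.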

The software must sort the documents based on which fields it was able to match, covering every possible combination of matches:
All 5 requirements
requirements 1,2,3,4
requirements 1,2,3
etc.

The minimum goal is to match requirements 1, 2, 3, and 4. There will not be very many documents that match #5.
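
One possible way to group documents by matched combination is sketched below. It assumes each processed document is held as a dictionary whose keys (company, product, date, cas, ghs) are hypothetical names for requirements 1 to 5, with an empty value meaning "not matched":

```python
# Hypothetical field names, in the order of requirements 1-5.
FIELDS = ("company", "product", "date", "cas", "ghs")

def match_key(record):
    """Return the matched requirement numbers, e.g. (1, 2, 3, 4)."""
    return tuple(i for i, field in enumerate(FIELDS, start=1) if record.get(field))

def sort_by_matches(records):
    """Order records with the most matches first, then by which fields matched."""
    return sorted(records, key=lambda r: (-len(match_key(r)), match_key(r)))
```

The tuple from match_key could also serve as a subfolder or group name, such as 1_2_3_4, if the sorted documents need to be written to separate locations.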

I will provide a file of all CAS numbers for you to match.

There are five data elements that we will be searching for in each document:
1. Company Name
2. Product Name
3. Most current date on the document
4. CAS numbers
5. GHS compliant.

Your final document will look like this:
Company Name: ABC Supply Company
Product Name: Red Dye #5
Date: November 5, 2013
CAS: 12-223-1122
CAS: 1-223-234
CAS: 123-12-123
GHS: GHS
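
The layout above could be produced along these lines. This is only a sketch: the function name format_record and its parameters are assumptions, and it does not decide what to write when a field is missing:

```python
def format_record(company, product, date, cas_numbers, ghs):
    """Build the output text: one line per field, one CAS number per line."""
    lines = [
        f"Company Name: {company}",
        f"Product Name: {product}",
        f"Date: {date}",
    ]
    lines.extend(f"CAS: {cas}" for cas in cas_numbers)
    if ghs:
        # The GHS line is only written for GHS-compliant documents.
        lines.append("GHS: GHS")
    return "\n".join(lines)
```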

In every string, ignore non-alphanumeric characters such as ; : - . etc.
Replace all commas with a space (so "ABC Company, Inc." becomes "ABC Company Inc").
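
A sketch of this clean-up step follows. It makes two assumptions that the description does not state: ignored characters are dropped outright, and runs of whitespace are collapsed to a single space (which is what turns "ABC Company, Inc." into "ABC Company Inc" with one space). Because the sample output keeps the # in "Red Dye #5", this clean-up may be meant for matching only rather than for the final output:

```python
def normalize(text):
    """Replace commas with spaces, drop other punctuation, collapse whitespace."""
    text = text.replace(",", " ")
    # Keep letters, digits and whitespace; drop ; : - . and similar characters.
    kept = "".join(ch for ch in text if ch.isalnum() or ch.isspace())
    # split() with no arguments collapses any run of whitespace.
    return " ".join(kept.split())
```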

1. Company Name - placed in the Author field:
Look for a string that looks like a company name, meaning it contains one of the following words (with a non-alphabetic character before and after):
company
companies
corporation
enterprises
laboratories
laboratory
labs
corp
co
llp
lp
llc
industries
ind
international
intl
s.a.
Inc
incorporated
ltd
limited
pty
supply
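
The keyword rule above might be sketched as follows. It assumes the company name is the whole line on which the keyword appears, which the description does not say, and it will produce false positives on headings such as "Company Identification":

```python
import re

COMPANY_WORDS = [
    "company", "companies", "corporation", "enterprises", "laboratories",
    "laboratory", "labs", "corp", "co", "llp", "lp", "llc", "industries",
    "ind", "international", "intl", "s.a.", "inc", "incorporated", "ltd",
    "limited", "pty", "supply",
]

# Lookarounds require a non-letter (or the line edge) on both sides,
# so "co" matches in "CHEMICAL CO." but not inside "Cobalt".
_WORDS = "|".join(re.escape(w) for w in sorted(COMPANY_WORDS, key=len, reverse=True))
COMPANY_RE = re.compile(r"(?<![A-Za-z])(?:" + _WORDS + r")(?![A-Za-z])", re.IGNORECASE)

def find_company_line(text):
    """Return the first line containing a company keyword, or None."""
    for line in text.splitlines():
        if COMPANY_RE.search(line):
            return line.strip()
    return None
```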


or, if there is no match above, it is the string immediately following one of these labels:
company
company name
company id
corporate office
manufacturer
manufacture
manufacturer information
distributor
manufacturer(s) name
supplier
distributed by
company address
Responsible Party
manufactured
supplied
supplier name and address
contact
contact details
Consignee
importer
manufacturer/supplier
manufactured for
Company Identification
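
A sketch of the label rule, under the assumption that "immediately following" means either the rest of the same line (after an optional colon or dash) or, if that is empty, the next non-blank line. The helper name company_after_label is illustrative:

```python
import re

COMPANY_LABELS = [
    "company", "company name", "company id", "corporate office",
    "manufacturer", "manufacture", "manufacturer information", "distributor",
    "manufacturer(s) name", "supplier", "distributed by", "company address",
    "responsible party", "manufactured", "supplied",
    "supplier name and address", "contact", "contact details", "consignee",
    "importer", "manufacturer/supplier", "manufactured for",
    "company identification",
]

# Longest labels first, so "company name" wins over the shorter "company".
_LABELS = "|".join(re.escape(l) for l in sorted(COMPANY_LABELS, key=len, reverse=True))
LABEL_RE = re.compile(r"(?<![A-Za-z])(?:" + _LABELS + r")(?![A-Za-z])[\s:\-]*", re.IGNORECASE)

def company_after_label(text):
    """Return the text that follows the first company label, or None."""
    lines = text.splitlines()
    for i, line in enumerate(lines):
        match = LABEL_RE.search(line)
        if not match:
            continue
        rest = line[match.end():].strip()
        if rest:
            return rest
        # Label stood alone on its line: take the next non-blank line.
        for following in lines[i + 1:]:
            if following.strip():
                return following.strip()
    return None
```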


or, if there is no match above, it is part of an address sequence, as in these examples:
Revkem
PO Box 28104
Green Bay, WI 54324


AQUASOLVE CHEMICAL CO.
P.O. box 1952
Houston, Texas 77251
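
The address fallback is the least well-defined of the three rules. The sketch below only covers the shape of the two examples above (a name line, a PO Box line, then a "City, State ZIP" line) and assumes US-style addresses. Street addresses and other layouts would need additional patterns:

```python
import re

# Matches "PO Box 28104" and "P.O. box 1952".
PO_BOX = re.compile(r"^\s*P\.?\s*O\.?\s*Box\b", re.IGNORECASE)
# Matches "Green Bay, WI 54324" and "Houston, Texas 77251".
CITY_STATE_ZIP = re.compile(r"^\s*[A-Za-z .]+,\s*[A-Za-z ]+\s+\d{5}(?:-\d{4})?\s*$")

def company_from_address(text):
    """Return the line just above a PO Box / city-state-ZIP pair, or None."""
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    for i in range(1, len(lines) - 1):
        if PO_BOX.match(lines[i]) and CITY_STATE_ZIP.match(lines[i + 1]):
            return lines[i - 1]
    return None
```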





2. Product name - placed in the Title field
The product name is the string immediately following one of these labels:
product name
product id
product identity
msds name
trade name
common name
product
brand name
identity
identity (As used on label and list)
Product Identity (Name / Number)
product trade name
Product Brand Name
Trade Name and Synonyms
material name
chemical product
commercial product name
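
For the product name, a sketch that only looks at the rest of the label's own line is shown below. The main point is label ordering: trying longer labels first keeps "product trade name" from being read as the label "product" followed by the value "trade name ...". The function name find_product_name is an assumption:

```python
import re

PRODUCT_LABELS = [
    "product name", "product id", "product identity", "msds name",
    "trade name", "common name", "product", "brand name", "identity",
    "identity (as used on label and list)", "product identity (name / number)",
    "product trade name", "product brand name", "trade name and synonyms",
    "material name", "chemical product", "commercial product name",
]

# Longest labels first; an optional ":" or "-" may separate label and value.
_LABELS = "|".join(re.escape(l) for l in sorted(PRODUCT_LABELS, key=len, reverse=True))
PRODUCT_RE = re.compile(
    r"(?<![A-Za-z])(?:" + _LABELS + r")(?![A-Za-z])\s*[:\-]?\s*(\S.*)",
    re.IGNORECASE,
)

def find_product_name(text):
    """Return the value after the first product label on its line, or None."""
    for line in text.splitlines():
        match = PRODUCT_RE.search(line)
        if match:
            return match.group(1).strip()
    return None
```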


3. Most current date in the document - placed in the Date field. Find all of the dates in the document, compare them, and keep the most recent.
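
The date formats used in the documents are not specified, so the sketch below assumes three common ones (November 5, 2013; 11/05/2013 read as month/day/year; 2013-11-05). The pattern list would have to be extended after checking the sample files. Month names are parsed with %B, which depends on the process locale being English:

```python
import re
from datetime import datetime

# (regex, strptime format) pairs; these formats are assumptions.
DATE_PATTERNS = [
    (re.compile(r"\b[A-Za-z]+ \d{1,2}, \d{4}\b"), "%B %d, %Y"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
]

def latest_date(text):
    """Return the most recent date found in the text, or None."""
    found = []
    for pattern, fmt in DATE_PATTERNS:
        for match in pattern.finditer(text):
            try:
                found.append(datetime.strptime(match.group(), fmt).date())
            except ValueError:
                # Looked like a date but is not one (e.g. "Section 12, 2013").
                pass
    return max(found) if found else None
```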

4. CAS numbers - place in the CAS field, one per line
All CAS numbers will be in a format like x-x-x, where each x is one or more digits (e.g. 23-1122-221 or 113-23-1223). I will provide a list of CAS numbers to compare against. If a number sequence found in the document matches a CAS number on my list, place that CAS number in the CAS field. There may be 0, 1, or many CAS numbers in each document.
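
A sketch of the CAS step, using the x-x-x format exactly as described above. The known_cas argument stands for the provided list loaded into a set. The file format of that list is not specified, so loading it is left out:

```python
import re

# One or more digits, hyphen, digits, hyphen, digits; the lookarounds
# stop the match from starting or ending inside a longer digit/hyphen run.
CAS_RE = re.compile(r"(?<![\d-])\d+-\d+-\d+(?![\d-])")

def find_cas_numbers(text, known_cas):
    """Return the listed CAS numbers found in the text, in order, no repeats."""
    result = []
    for match in CAS_RE.finditer(text):
        candidate = match.group()
        # Dates and phone numbers can have the same shape; the list filters them.
        if candidate in known_cas and candidate not in result:
            result.append(candidate)
    return result
```

Checking against the list is what keeps look-alikes such as 2013-11-05 out of the CAS field.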

5. GHS version MSDS - place the word “GHS” in the Keyword field. Search the document for one or more strings of text; if those strings match exactly, put the word GHS in the Keyword field.
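
The GHS search strings are not given here, so the sketch below takes them as a parameter. "Match exactly" is read as a case-sensitive substring test, and the require_all flag is an assumption added because it is unclear whether one matching string is enough or all of them must appear:

```python
def is_ghs(text, ghs_phrases, require_all=False):
    """Return True if the text contains the GHS phrases (exact, case-sensitive)."""
    hits = [phrase in text for phrase in ghs_phrases]
    if not hits:
        return False
    return all(hits) if require_all else any(hits)

def keyword_field(text, ghs_phrases):
    """Return the value for the Keyword field: "GHS" or an empty string."""
    return "GHS" if is_ghs(text, ghs_phrases) else ""
```

For example, with the placeholder phrases ["Signal word", "Hazard statement"] (not taken from this description), a document containing only "Signal word: Danger" is tagged GHS by default but not when require_all is set.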

USE THE SAMPLE FILES FOR EXAMPLES.

Skills required:
.NET, C Programming, C# Programming, C++ Programming, Visual Basic
Additional Files: PDFtoTEXT.zip
About the employer:
Verified