Completed

Search and Replace text in TXT documents - 1

This project was successfully completed by BSSTechnology for $66 USD in 2 days.

Project Budget: $30 - $250 USD
Completed In: 2 days
Total Bids: 31
Project Description

Text Document Search and Replace project

The purpose of this project is to create software that reads a text document, eliminates all data except what is required, and puts the remaining data elements in a specific order.

The software must be able to search a folder or folders, including subfolders, sub-subfolders, etc.

You must sort the documents based on which fields you were able to match, covering any possible combination of matches:
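A minimal sketch of the recursive folder search, assuming the documents are identified by a `.txt` extension (the function name `find_text_files` and the `extension` parameter are illustrative, not part of the specification):

```python
import os


def find_text_files(roots, extension=".txt"):
    """Yield the path of every file with the given extension under each root folder.

    os.walk descends into subfolders at any depth, so sub-subfolders are covered.
    The extension check is case-insensitive (an assumption, so .TXT also counts).
    """
    for root in roots:
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in sorted(filenames):
                if name.lower().endswith(extension):
                    yield os.path.join(dirpath, name)
```

Passing a list of roots lets the same call cover one folder or several.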

All 5 requirements

requirements 1,2,3,4

requirements 1,2,3

etc.

The minimum goal is to match requirements 1, 2, 3, and 4. There will not be very many documents that match #5.
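One possible way to group documents by match combination is to derive a bucket name from the requirements that matched. In this sketch the record is assumed to be a dictionary keyed by the hypothetical field names `company`, `product`, `date`, `cas`, and `ghs`; the bucket naming scheme is also an assumption.

```python
# Field keys in requirement order 1..5 (names are illustrative).
FIELDS = ("company", "product", "date", "cas", "ghs")


def match_bucket(record):
    """Return a bucket name listing which requirements were matched.

    A field counts as matched when its value is non-empty.
    """
    matched = [str(number) for number, field in enumerate(FIELDS, start=1)
               if record.get(field)]
    if not matched:
        return "requirements_none"
    return "requirements_" + "_".join(matched)
```

The returned name could serve as an output subfolder, so that documents matching requirements 1, 2, 3, and 4 end up together.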

I will provide a file of all CAS numbers for you to match.

There are five data elements that we will be searching for in each document:

1. Company Name

2. Product Name

3. Most current date on the document

4. CAS numbers

5. GHS compliant.

Your final document will look like this:

Company Name: ABC Supply Company

Product Name: Red Dye #5

Date: November 5, 2013

CAS: 12-223-1122

CAS: 1-223-234

CAS: 123-12-123

GHS: GHS
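The layout above could be produced by a small formatter such as the following sketch. The function name and parameters are illustrative, and it assumes one field per line with the GHS line omitted when the document is not GHS compliant.

```python
def format_record(company, product, date_text, cas_numbers, is_ghs):
    """Build the final document text, with one CAS line per matched number."""
    lines = [
        f"Company Name: {company}",
        f"Product Name: {product}",
        f"Date: {date_text}",
    ]
    lines.extend(f"CAS: {cas}" for cas in cas_numbers)
    if is_ghs:
        lines.append("GHS: GHS")
    return "\n".join(lines) + "\n"
```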

For every string, ignore non-alphanumeric characters, such as ; : - . etc.

Replace all commas with a space (so ABC Company, Inc. will become ABC Company Inc)
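The two cleanup rules could be sketched as follows. The function name `normalize` is illustrative, and collapsing repeated spaces is an added assumption so that the example in the text comes out as shown. Since this would also strip the hyphens from CAS numbers, it is presumably meant for name strings rather than for every field.

```python
import re


def normalize(text):
    """Replace commas with spaces, drop other non-alphanumerics, tidy whitespace."""
    text = text.replace(",", " ")
    # Keep letters, digits, and whitespace; remove ; : - . and similar characters.
    text = re.sub(r"[^A-Za-z0-9\s]", "", text)
    # Collapse the double space left where ", " used to be.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `normalize("ABC Company, Inc.")` returns `"ABC Company Inc"`.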

1. Company Name - placed in the Author field:

Look for a string that looks like a company name, meaning it contains one of the following (with a non-alphabetic character before and after):

company

companies

corporation

enterprises

laboratories

laboratory

labs

corp

co

llp

lp

llc

industries

ind

international

intl

s.a.

Inc

incorporated

ltd

limited

pty

supply
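A rough sketch of the keyword test, assuming matching is case-insensitive and that the start or end of a line also counts as a non-alphabetic boundary (both are assumptions; the names `COMPANY_WORDS` and `looks_like_company` are illustrative):

```python
import re

# Keywords taken from the list above.
COMPANY_WORDS = [
    "company", "companies", "corporation", "enterprises", "laboratories",
    "laboratory", "labs", "corp", "co", "llp", "lp", "llc", "industries",
    "ind", "international", "intl", "s.a.", "inc", "incorporated", "ltd",
    "limited", "pty", "supply",
]

# Longest keywords first; the lookarounds reject a letter on either side.
_COMPANY_RE = re.compile(
    r"(?<![A-Za-z])(?:"
    + "|".join(re.escape(word) for word in sorted(COMPANY_WORDS, key=len, reverse=True))
    + r")(?![A-Za-z])",
    re.IGNORECASE,
)


def looks_like_company(line):
    """Return True when the line contains a company keyword as a whole word."""
    return bool(_COMPANY_RE.search(line))
```

Short keywords such as `co` and `ind` are the likeliest source of false positives, so the sample files are worth checking against this test.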

or, if there is no match above, it is the string immediately following one of these labels:

company

company name

company id

corporate office

manufacturer

manufacture

manufacturer information

distributor

manufacturer(s) name

supplier

distributed by

company address

Responsible Party

manufactured

supplied

supplier name and address

contact

contact details

Consignee

importer

manufacturer/supplier

manufactured for

Company Identification

or, if there is no match above, it is part of an address sequence:

Revkem

PO Box 28104

Green Bay, WI 54324

AQUASOLVE CHEMICAL CO.

P.O. box 1952

Houston, Texas 77251
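For the address fallback, one simple heuristic is to look for a PO Box line and take the nearest non-empty line above it as the company name. This sketch covers only addresses shaped like the two examples above; street addresses without a PO Box would need a different rule. The function name is illustrative.

```python
import re

# Matches "PO Box 28104", "P.O. box 1952", and similar spellings.
_PO_BOX_RE = re.compile(r"^\s*p\.?\s*o\.?\s*box\s+\d+", re.IGNORECASE)


def company_from_address(lines):
    """Return the non-empty line just before the first PO Box line, or None."""
    previous = None
    for line in lines:
        stripped = line.strip()
        if not stripped:
            continue  # blank lines between address lines are skipped
        if _PO_BOX_RE.match(stripped) and previous:
            return previous
        previous = stripped
    return None
```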

2. Product name - placed in the Title field

It is the string immediately following one of these strings:

product name

product id

product identity

msds name

trade name

common name

product

brand name

identity

identity (As used on label and list)

Product Identity (Name / Number)

product trade name

Product Brand Name

Trade Name and Synonyms

material name

chemical product

commercial product name
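A sketch of label-based extraction for the product name. It assumes the label starts a line and is followed either by a colon and the value, or by nothing, in which case the value is taken from the next non-empty line. Requiring the colon is an assumption meant to keep short labels such as `product` from matching section headings; the sample files may call for a looser rule. The same approach could be applied to the company labels listed earlier.

```python
import re

# Labels taken from the list above.
PRODUCT_LABELS = [
    "product name", "product id", "product identity", "msds name",
    "trade name", "common name", "product", "brand name", "identity",
    "identity (As used on label and list)", "Product Identity (Name / Number)",
    "product trade name", "Product Brand Name", "Trade Name and Synonyms",
    "material name", "chemical product", "commercial product name",
]

# Longest labels first so "product name" wins over "product".
_LABEL_RE = re.compile(
    r"^\s*(?:"
    + "|".join(re.escape(label) for label in sorted(PRODUCT_LABELS, key=len, reverse=True))
    + r")\s*(?::|$)\s*(.*)$",
    re.IGNORECASE,
)


def extract_after_label(lines):
    """Return the text following the first recognised label, or None."""
    for index, line in enumerate(lines):
        match = _LABEL_RE.match(line)
        if not match:
            continue
        value = match.group(1).strip()
        if value:
            return value
        # Label stood alone: use the next non-empty line as the value.
        for following in lines[index + 1:]:
            if following.strip():
                return following.strip()
    return None
```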

3. Most current date in the document - place in the Date field. Search for all of the dates in the document, compare them, and keep the most recent.
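The date step might be sketched like this. The specification does not say which date formats occur, so the three handled here (`November 5, 2013`, `11/05/2013` read as month/day/year, and `2013-11-05`) are assumptions, as are the function names. Month names are parsed with `%B`, which assumes an English locale.

```python
import re
from datetime import datetime

_MONTHS = ("January|February|March|April|May|June|July|August|"
           "September|October|November|December")

# Each entry pairs a pattern for finding candidates with a strptime format.
_DATE_PATTERNS = [
    (re.compile(rf"\b(?:{_MONTHS})\s+\d{{1,2}},?\s+\d{{4}}\b", re.IGNORECASE), "%B %d %Y"),
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), "%m/%d/%Y"),
    (re.compile(r"\b\d{4}-\d{2}-\d{2}\b"), "%Y-%m-%d"),
]


def most_recent_date(text):
    """Return the latest date found in the text as a datetime.date, or None."""
    found = []
    for pattern, fmt in _DATE_PATTERNS:
        for raw in pattern.findall(text):
            cleaned = " ".join(raw.replace(",", " ").split())
            try:
                found.append(datetime.strptime(cleaned, fmt).date())
            except ValueError:
                continue  # looked like a date but was not valid, e.g. 13/45/2013
    return max(found) if found else None


def format_date(value):
    """Format a date in the style of the sample output, e.g. November 5, 2013."""
    return f"{value:%B} {value.day}, {value.year}"
```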

4. CAS numbers - place in the CAS field, one per line

All CAS numbers will be in a format like this: x-x-x, where each x is one or more digits (ex. 23-1122-221, or 113-23-1223). I will provide a list of CAS numbers to compare against the numbers found. If a number sequence found in the document matches a CAS number on my list, then place that CAS number in the CAS field. There may be 0, 1, or many CAS numbers in each document.

5. GHS version MSDS - Place the word “GHS” in the Keyword field. You will search the document for one or more strings of text, and if those strings match exactly, you will put the word GHS in the Keyword field.
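A minimal sketch of the CAS step: find every digit sequence shaped like x-x-x, then keep only those present in the supplied list. It assumes the list has been loaded into a set of strings named `known_cas` (hypothetical), and it removes duplicates while keeping the order of first appearance.

```python
import re

# Digits-hyphen-digits-hyphen-digits, not embedded in a longer digit/hyphen run.
_CAS_LIKE_RE = re.compile(r"(?<![\d-])\d+-\d+-\d+(?![\d-])")


def find_cas_numbers(text, known_cas):
    """Return the CAS numbers in text that also appear in known_cas."""
    matches = []
    for candidate in _CAS_LIKE_RE.findall(text):
        if candidate in known_cas and candidate not in matches:
            matches.append(candidate)
    return matches
```

Filtering against the list is what rejects look-alikes such as phone numbers, which the pattern alone would accept.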
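The GHS check might look like the sketch below. The marker strings are not given in this description, so `marker_strings` is a hypothetical parameter to be filled in from the sample files. Treating a single matching marker as sufficient is an assumption; the comparison is exact and case-sensitive.

```python
def ghs_keyword(text, marker_strings):
    """Return "GHS" when any marker string occurs verbatim in the text, else ""."""
    for marker in marker_strings:
        if marker in text:
            return "GHS"
    return ""
```

If every marker must be present instead, the loop can be replaced with `all(marker in text for marker in marker_strings)`.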

USE THE SAMPLE FILES FOR EXAMPLES.
