MSDS PDF Document Properties project with matching text files
The purpose of this project is to create software that reads a text document and puts specific data elements from it into the document properties of the matching PDF (both files share the same base name, such as file1.pdf and file1.txt).
You must be able to search and process documents in a folder or in multiple folders, including all subfolders at any depth (sub-subfolders, etc.).
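A minimal sketch of the folder search and file pairing, assuming each text file sits in the same folder as its PDF and uses a lowercase `.txt` extension. The function name `find_pairs` is hypothetical.

```python
from pathlib import Path

def find_pairs(root):
    """Return (pdf, txt) path pairs for every PDF under root, at any depth,
    that has a same-named .txt file beside it (file1.pdf -> file1.txt)."""
    pairs = []
    # rglob("*") walks root and every subfolder below it.
    for path in sorted(Path(root).rglob("*")):
        # Accept .pdf and .PDF; the text file is assumed to use a lowercase .txt.
        if path.is_file() and path.suffix.lower() == ".pdf":
            txt = path.with_suffix(".txt")
            if txt.is_file():
                pairs.append((path, txt))
    return pairs
```

PDFs without a matching text file are skipped here; a real tool might list them in a report instead.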
You must sort the documents according to which fields you were able to match, covering every possible combination of matches:
All 5 requirements
The minimum goal is to match requirements 1, 2, 3, and 4. There will not be very many documents that match #5.
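One way to sort by match combination is to key each document on the tuple of field numbers it matched. This is a sketch only: the shape of the per-document results dict and the helper names are hypothetical, and how the groups are written out (folders, a report) is left open.

```python
from collections import defaultdict

# Field numbers follow the five data elements listed in this spec.
FIELDS = {1: "company", 2: "product", 3: "date", 4: "cas", 5: "ghs"}

def match_key(result):
    """Tuple of the field numbers that were matched, e.g. (1, 2, 3, 4)."""
    return tuple(n for n, name in FIELDS.items() if result.get(name))

def bucket_name(key):
    """Hypothetical folder/report label for a combination, e.g. '1-2-3-4'."""
    return "-".join(str(n) for n in key) or "none"

def meets_minimum(key):
    """True when requirements 1, 2, 3, and 4 were all matched."""
    return {1, 2, 3, 4} <= set(key)

def group_by_matches(results):
    """Group document names by the exact combination of fields matched.

    `results` (hypothetical structure) maps a document name to a dict of
    extracted values; empty or missing values count as not matched."""
    groups = defaultdict(list)
    for doc, result in results.items():
        groups[match_key(result)].append(doc)
    return dict(groups)
```

Keying on the full tuple gives one group per combination, so all 32 possible combinations are handled without listing them by hand.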
I will provide a file of all CAS numbers for you to match.
There are five data elements that we will be searching for in each document:
1. Company Name
2. Product Name
3. Most current date on the document
4. CAS numbers
5. GHS compliance
For every string, ignore non-alphanumeric characters, such as ; : - . etc.
Replace all commas with a space (so “ABC Company, Inc.” becomes “ABC Company Inc”).
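A minimal sketch of these string rules. It assumes spaces are kept (otherwise the ABC Company example would not hold) and that the doubled space left by the comma is collapsed, as the example implies. Because hyphens are dropped, CAS numbers would need to be found in the raw text before this step. The name `normalize` is hypothetical.

```python
def normalize(s):
    """Apply the string rules: commas become spaces, other non-alphanumerics are dropped."""
    s = s.replace(",", " ")
    # Keep letters, digits, and whitespace; drop ; : - . and the like.
    s = "".join(ch for ch in s if ch.isalnum() or ch.isspace())
    # Collapse the doubled spaces left behind (inferred from the ABC Company example).
    return " ".join(s.split())
```

For example, `normalize("ABC Company, Inc.")` returns `"ABC Company Inc"`.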
1. Company Name - placed in the Author field:
Look for a string, on the first page of the PDF document only, that looks like a company name, i.e., one that contains one of the following (with a non-alphabetic character before and after it):
or, if there is no match above, use the string immediately following this string:
supplier name and address
or, if there is still no match, look for a name that is part of an address sequence, such as:
PO Box 28104
Green Bay, WI 54324
AQUASOLVE CHEMICAL CO.
P.O. box 1952
Houston, Texas 77251
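The three company-name rules can be tried in order as a fallback chain. This sketch makes several assumptions: the suffix list is a hypothetical stand-in for the spec's own list, page breaks in the text file are assumed to be form feeds (as `pdftotext` writes them), and for the address rule the name is taken to be the line just above the PO Box line, as in the AQUASOLVE example. All names are hypothetical.

```python
import re

# HYPOTHETICAL suffixes: substitute the company-name list from this spec.
SUFFIXES = ["Inc", "Co", "Corp", "Corporation", "LLC", "Ltd"]
# A suffix with a non-alphabetic character (or line edge) before and after it.
SUFFIX_RE = re.compile(
    r"(?<![A-Za-z])(?:" + "|".join(SUFFIXES) + r")(?![A-Za-z])", re.IGNORECASE)
LABEL_RE = re.compile(r"supplier\s+name\s+and\s+address", re.IGNORECASE)
PO_BOX_RE = re.compile(r"P\.?\s*O\.?\s*Box\b", re.IGNORECASE)

def first_page(text):
    # Assumption: page breaks are form feeds. Split before splitlines(),
    # which would otherwise treat the form feed as an ordinary line break.
    return text.split("\f", 1)[0]

def company_name(text):
    lines = [ln.strip() for ln in first_page(text).splitlines() if ln.strip()]
    # Rule 1: a line that contains a company suffix.
    for ln in lines:
        if SUFFIX_RE.search(ln):
            return ln
    # Rule 2: the text right after "supplier name and address",
    # or the next line if the label ends its line.
    for i, ln in enumerate(lines):
        m = LABEL_RE.search(ln)
        if m:
            rest = ln[m.end():].strip(" :")
            if rest:
                return rest
            if i + 1 < len(lines):
                return lines[i + 1]
    # Rule 3: in an address sequence, the line just above the PO Box line.
    for i, ln in enumerate(lines):
        if i > 0 and PO_BOX_RE.match(ln):
            return lines[i - 1]
    return None
```

The returned string would then go through the normalization rules before being written to the Author field.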
2. Product Name - placed in the Title field
Use the string immediately following any of these strings:
identity (As used on label and list)
Product Identity (Name / Number)
product trade name
Product Brand Name
Trade Name and Synonyms
commercial product name
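A sketch of the label search, using the labels listed above matched case-insensitively. Two assumptions: the spacing between words may vary (so "Name/Number" also matches), and when a label ends its line, the product name is on the next non-empty line. The helper names are hypothetical.

```python
import re

# The labels listed in this spec, matched case-insensitively.
PRODUCT_LABELS = [
    "identity (As used on label and list)",
    "Product Identity (Name / Number)",
    "product trade name",
    "Product Brand Name",
    "Trade Name and Synonyms",
    "commercial product name",
]

def label_pattern(label):
    # Let the whitespace between words vary, so "Name/Number" also matches.
    return re.compile(r"\s*".join(re.escape(w) for w in label.split()), re.IGNORECASE)

PATTERNS = [label_pattern(lbl) for lbl in PRODUCT_LABELS]

def product_name(text):
    lines = [ln.strip() for ln in text.splitlines()]
    for pattern in PATTERNS:
        for i, ln in enumerate(lines):
            m = pattern.search(ln)
            if not m:
                continue
            rest = ln[m.end():].strip(" :\t")
            if rest:
                return rest
            # Assumption: a label that ends its line has the value on the next non-empty line.
            for nxt in lines[i + 1:]:
                if nxt:
                    return nxt
    return None
```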
3. Most current date in the document - placed in the Subject field. Search for all of the dates in the document using the date formatting criteria from the date changer program, compare the dates found, and keep the most recent.
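The date changer program's formatting criteria are not reproduced here, so the formats below are hypothetical placeholders (US month-first order assumed for numeric dates). The sketch collects every parseable date and keeps the latest. A plausible-year window is added because some CAS numbers, such as 7440-09-7, also parse as year-month-day.

```python
import re
from datetime import datetime

# HYPOTHETICAL formats standing in for the date changer program's criteria.
DATE_PATTERNS = [
    (re.compile(r"\b\d{1,2}/\d{1,2}/\d{4}\b"), ["%m/%d/%Y"]),
    (re.compile(r"\b\d{1,2}-\d{1,2}-\d{4}\b"), ["%m-%d-%Y"]),
    (re.compile(r"\b\d{4}-\d{1,2}-\d{1,2}\b"), ["%Y-%m-%d"]),
    (re.compile(r"\b[A-Za-z]{3,9} \d{1,2},? \d{4}\b"),
     ["%B %d, %Y", "%B %d %Y", "%b %d, %Y", "%b %d %Y"]),
]
# Plausible-year window, so strings like the CAS number 7440-09-7 are not read as dates.
MIN_YEAR, MAX_YEAR = 1950, 2100

def all_dates(text):
    found = []
    for pattern, formats in DATE_PATTERNS:
        for m in pattern.finditer(text):
            for fmt in formats:
                try:
                    d = datetime.strptime(m.group(), fmt).date()
                except ValueError:
                    continue  # try the next format
                if MIN_YEAR <= d.year <= MAX_YEAR:
                    found.append(d)
                break  # parsed; stop trying formats for this match
    return found

def most_recent_date(text):
    dates = all_dates(text)
    return max(dates) if dates else None
```

How the resulting date is formatted for the Subject field is not specified here and would follow the date changer program.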
4. CAS numbers - placed in the Keywords field, one per line
All CAS numbers will be in the format x-x-x, where each x is one or more digits (e.g., 23-1122-221 or 113-23-1223). I will provide a list of CAS numbers to compare the numbers found against. If a CAS number found in the document matches one on my list, place that CAS number in the Keywords field. There may be 0, 1, or many CAS numbers on each MSDS.
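A sketch of the CAS step: find every x-x-x digit sequence, then keep only those on the provided list. The regex runs on the raw text, since the normalization rules would strip the hyphens. The one-number-per-line layout of the CAS list file and the function names are assumptions.

```python
import re

# Three hyphen-separated groups of one or more digits (x-x-x), not embedded
# in a longer digit/hyphen run.
CAS_RE = re.compile(r"(?<![\d-])\d+-\d+-\d+(?![\d-])")

def load_cas_list(path):
    """Load the provided CAS list; one number per line is an assumed file layout."""
    with open(path, encoding="utf-8") as f:
        return {line.strip() for line in f if line.strip()}

def cas_numbers(text, known):
    """CAS-shaped strings that are on the known list, in document order, no repeats."""
    found = []
    for m in CAS_RE.finditer(text):
        cas = m.group()
        # The list check also discards look-alikes such as 2012-05-14 or phone numbers.
        if cas in known and cas not in found:
            found.append(cas)
    return found
```

For the Keywords field, `"\n".join(cas_numbers(text, known))` gives one CAS number per line.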
5. GHS version MSDS - the word “GHS” placed in the Keywords field. Search the document for one or more specified strings of text; if those strings match exactly, put the word “GHS” in the Keywords field.
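A sketch of the GHS check and of how it might combine with the CAS numbers in the Keywords field. The marker strings are hypothetical because the actual search text is not given here, and whether any one marker or all of them must match is an assumption, so it is exposed as a parameter. Placing “GHS” after the CAS numbers is also an assumption.

```python
# HYPOTHETICAL marker strings: the exact search text is to be supplied.
GHS_MARKERS = ["Globally Harmonized System"]

def is_ghs(text, markers=GHS_MARKERS, require_all=False):
    """Exact, case-sensitive substring test. Whether any one marker or all of
    them must match is an assumption, so it is a parameter."""
    hits = [marker in text for marker in markers]
    return all(hits) if require_all else any(hits)

def keywords_field(cas_list, ghs):
    """Keywords value: CAS numbers one per line, then GHS if the document qualifies."""
    lines = list(cas_list)
    if ghs:
        lines.append("GHS")
    return "\n".join(lines)
```

The finished values could then be written with a PDF library; for example, pypdf's `PdfWriter.add_metadata` accepts a dict with keys such as "/Author", "/Title", "/Subject", and "/Keywords".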
PLEASE LOOK AT THE ATTACHED FILE FOR SAMPLE DOCUMENTS THAT YOUR SOFTWARE WILL BE SEARCHING THROUGH.
We will only consider bids under $500 USD
Funds will be placed in escrow for the project.