I need a rather large XML (text) file parsed. I believe the file is about 55 MB uncompressed. I think one of several techniques could be used to parse it:

1. Use XSLT to transform the file into its components.
2. Use Perl or some other scripting language to rewrite the file.

There is one serious problem with using XSLT to translate the file: I think the XML file is NOT well-formed.

The entire file is composed of 23,000+ records. A simplified representation of a record looks like this:

- 4609
- American Medical Association
- Medicine
- Medical Care
- Jama: Journal Of The American Medical Association
- Jama; Journal Of The American Medical Association
- Jama

Of course, each real record is much more detailed. I am looking to remove the extraneous information.

I need these records parsed in two ways.

The first way is to turn each individual record into several records, one for each Preferredtitle value. The format would look like this:

Preferredtitle, id jakeid, serialinfo issn, Title

An example of the records derived from the record above would be:

- Jama: Journal Of The American Medical Association, 4609, 0098-7484, Journal Of The American Medical Association
- Jama; Journal Of The American Medical Association, 4609, 0098-7484, Journal Of The American Medical Association
- Jama, 4609, 0098-7484, Journal Of The American Medical Association

I need the fields in these records delimited with a "|".

The second way is to create several records from each record, one for each subject field. The format would be:

Jakeid, subject class="lcsh" source="lc"

An example of the records derived from the record above would be:

- 4609, American Medical Association
- 4609, Medicine
- 4609, Medical Care

I have attached a file that shows a real record.
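To make the two requested outputs concrete, here is a minimal sketch in Python, standing in for "some other scripting language" rather than the VB, Perl, or XSLT options named under Platform. Because the file may not be well-formed, it pulls fields out with tolerant regular expressions instead of an XML parser. The real markup is not shown in this request, so every element and attribute name in the patterns (`record`, `id jakeid`, `serialinfo issn`, `title`, `preferredtitle`, `subject`) is an assumption guessed from the field names above, and `SAMPLE` is a hypothetical record built from the example values.

```python
import re

# ASSUMPTION: all element/attribute names below are guesses based on the
# field names in the request; adjust the patterns to the real file.
FLAGS = re.S | re.I
RECORD_RE = re.compile(r"<record\b.*?</record>", FLAGS)
JAKEID_RE = re.compile(r'<id\b[^>]*\bjakeid="([^"]*)"', FLAGS)
ISSN_RE = re.compile(r'<serialinfo\b[^>]*\bissn="([^"]*)"', FLAGS)
TITLE_RE = re.compile(r"<title\b[^>]*>(.*?)</title>", FLAGS)
PREFERRED_RE = re.compile(r"<preferredtitle\b[^>]*>(.*?)</preferredtitle>", FLAGS)
SUBJECT_RE = re.compile(r"<subject\b([^>]*)>(.*?)</subject>", FLAGS)


def clean(value):
    """Collapse runs of whitespace so each field stays on one line."""
    return " ".join(value.split())


def first(pattern, text):
    """Return the first captured group of pattern in text, or ''."""
    match = pattern.search(text)
    return clean(match.group(1)) if match else ""


def title_rows(record):
    """One 'preferredtitle|jakeid|issn|title' row per preferred title."""
    jakeid = first(JAKEID_RE, record)
    issn = first(ISSN_RE, record)
    title = first(TITLE_RE, record)
    return ["|".join((clean(p), jakeid, issn, title))
            for p in PREFERRED_RE.findall(record)]


def subject_rows(record):
    """One 'jakeid|subject' row per subject with class=lcsh, source=lc."""
    jakeid = first(JAKEID_RE, record)
    rows = []
    for attrs, text in SUBJECT_RE.findall(record):
        attrs = attrs.lower()
        if 'class="lcsh"' in attrs and 'source="lc"' in attrs:
            rows.append("|".join((jakeid, clean(text))))
    return rows


def parse(text):
    """Return (title_rows, subject_rows) for every record found in text."""
    titles, subjects = [], []
    for match in RECORD_RE.finditer(text):
        titles.extend(title_rows(match.group(0)))
        subjects.extend(subject_rows(match.group(0)))
    return titles, subjects


# Hypothetical record; the unclosed <id> and <serialinfo> tags mimic
# markup that is not well-formed.
SAMPLE = """
<record>
<id jakeid="4609">
<serialinfo issn="0098-7484">
<title>Journal Of The American Medical Association</title>
<preferredtitle>Jama: Journal Of The American Medical Association</preferredtitle>
<preferredtitle>Jama</preferredtitle>
<subject class="lcsh" source="lc">Medicine</subject>
<subject class="lcsh" source="lc">Medical Care</subject>
</record>
"""

if __name__ == "__main__":
    titles, subjects = parse(SAMPLE)
    print("\n".join(titles))
    print("\n".join(subjects))
```

A 55 MB file fits comfortably in memory on most machines, so reading it whole and calling `parse` once is probably the simplest approach. The two returned lists would then be written out as the two parsed text files.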
## Deliverables
1. Complete and fully functional working program(s) in executable form, as well as complete source code of all work done.
2. Installation package that will install the software (in ready-to-run condition) on the platform(s) specified in this bid request.
3. Complete ownership and distribution copyrights to all work purchased.
4. Two parsed text files and the code that parsed them.
## Platform
VB, Perl, or XSLT (your choice)