Closed

Basic XML Parser skeleton (simple if you know DOM/SAX)

This project was awarded to muskad202 for $127.5 USD.

Project Budget: $100 - $150 USD
Total Bids: 6
Project Description

The purpose of this project is to create an XML parser that uses a combination of DOM and SAX to parse a specially formatted XML file, execute formatting functions for each field, and then add the results to a database. An example of the XML format will be provided. The purpose of this assignment is to create a well-formatted skeleton class that reads the provided XML input file and gives me a place to later add special logic for processing the XML content.

## Deliverables

**XML Parser Skeleton**

**Overview:**

The purpose of this project is to create an XML parser that uses a combination of DOM and SAX to parse a specially formatted XML file, execute formatting functions for each field, and then add the results to a database. An example of the XML format is provided below. The purpose of this assignment is to create a well-formatted skeleton class that reads the provided XML input file and gives me a place to later add special logic for processing the XML content.

**Input:**

Here is an example input file:

    <xml>
    <record num="0">
        <Description> Some text here <br/>
            May include any amount <p> html </p> code, etc.
        </Description>
        <Name>
            Some text here as well
        </Name>
        <Date>
            5/5/5 something <a href="...">something</a>
        </Date>
    </record>
    <record num="1">
        <Date> 7/7/7 5:5:5 </Date>
        <Name> <div><a href="..."> Some name </a> </div>
        </Name>
        <Extra_Info> Lots of text <hr/> May include any amount <p font="font"> html </p> code, etc.
        </Extra_Info>
    </record>
    </xml>

About the format of the XML file:

1) The file is always split into `<record>` entries. The file can be very large (thousands or even tens of thousands of records), so a SAX parser must be used to read the individual records. The records themselves, however, are never very large, so each one can be loaded via DOM (more on this later in the document).

2) Within each record are "fields". Each field is a top-level XML element within the record, identified by its tag name. For example, the first record has the fields `<Description>`, `<Name>` and `<Date>`. Each field must be associated with a method that processes it, and the contents of the field must be passed to that method as a DOM tree.

3) Within each field there can be any number of HTML/XML tags. They all need to be loaded into memory as DOM, but only the method that handles the field will ever process them. Very often the field-processing methods will just use `asText()` to get all of the text content, but sometimes they will need to use the DOM elements as well.
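The SAX-then-DOM approach described in points 1 to 3 can be sketched with the JDK's built-in JAXP APIs: a SAX handler that rebuilds each `<record>` as a small DOM tree and hands the finished tree to a callback, so only one record is held in memory at a time. This is a minimal sketch, not the required implementation; the class name `RecordSplitter`, the static `parse` helper and the `Consumer<Element>` callback are illustrative assumptions.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Consumer;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.helpers.DefaultHandler;

/** Hypothetical helper: streams the file with SAX and rebuilds each <record> as DOM. */
public class RecordSplitter extends DefaultHandler {
    private final Consumer<Element> onRecord;   // receives each finished <record> element
    private final DocumentBuilderFactory domFactory = DocumentBuilderFactory.newInstance();
    private Document doc;                        // DOM of the record currently being built
    private final Deque<Element> open = new ArrayDeque<>(); // elements opened but not yet closed

    public RecordSplitter(Consumer<Element> onRecord) {
        this.onRecord = onRecord;
    }

    /** Parses the whole input, invoking onRecord once per <record>. */
    public static void parse(InputSource in, Consumer<Element> onRecord) throws Exception {
        SAXParserFactory.newInstance().newSAXParser().parse(in, new RecordSplitter(onRecord));
    }

    @Override
    public void startElement(String uri, String localName, String qName, Attributes atts) {
        if (doc == null) {
            if (!"record".equals(qName)) {
                return; // outside any record (e.g. the <xml> root): skip
            }
            try {
                doc = domFactory.newDocumentBuilder().newDocument(); // fresh DOM per record
            } catch (ParserConfigurationException e) {
                throw new IllegalStateException(e);
            }
        }
        Element el = doc.createElement(qName);
        for (int i = 0; i < atts.getLength(); i++) {
            el.setAttribute(atts.getQName(i), atts.getValue(i));
        }
        if (open.isEmpty()) {
            doc.appendChild(el);          // <record> becomes the document root
        } else {
            open.peek().appendChild(el);  // nested field or HTML tag
        }
        open.push(el);
    }

    @Override
    public void characters(char[] ch, int start, int length) {
        if (doc != null && !open.isEmpty()) {
            open.peek().appendChild(doc.createTextNode(new String(ch, start, length)));
        }
    }

    @Override
    public void endElement(String uri, String localName, String qName) {
        if (doc == null) {
            return; // closing tag outside any record
        }
        Element closed = open.pop();
        if (open.isEmpty()) {      // the <record> element itself just closed
            onRecord.accept(closed);
            doc = null;            // drop the DOM before the next record starts
        }
    }
}
```

For a real file, `SAXParser.parse(File, DefaultHandler)` can be used in place of the `InputSource` overload.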

**Output:**

The output of the program is the processed contents of the records, added to a MySQL database. The database connection settings can be hard-coded. On startup the program needs to open a database connection; at the end of each record, the formatted contents are added to the database.
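The database side could be as small as the plain JDBC sketch below: one connection opened at startup and one parameterized `INSERT` per record. The connection URL, credentials, the table name `records` and its `name`/`description` columns are placeholder assumptions, and `open()` needs the MySQL Connector/J driver on the classpath.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical writer for processed records; table and column names are assumptions. */
public class RecordWriter {
    // Hard-coded connection settings, as the spec allows (placeholder values).
    public static final String URL = "jdbc:mysql://localhost:3306/wk";
    public static final String USER = "wk_user";
    public static final String PASSWORD = "secret";

    public static final String INSERT_SQL =
        "INSERT INTO records (name, description) VALUES (?, ?)";

    /** Opened once when the program starts. */
    public static Connection open() throws SQLException {
        return DriverManager.getConnection(URL, USER, PASSWORD);
    }

    /** Called at the end of each record; placeholders avoid SQL injection from field text. */
    public static void insert(Connection conn, String name, String description) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            ps.setString(1, name);
            ps.setString(2, description);
            ps.executeUpdate();
        }
    }
}
```

Reusing a single `PreparedStatement` across records would also work; a per-call statement keeps the skeleton simpler.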

**Program Requirements:**

1) The program needs to be a simple stand-alone command-line executable.

2) The name of the input file should be passed as a command-line argument.

3) The code needs to be very neatly spaced and commented. Remember that you are writing a skeleton into which someone else will be adding logic, so it needs to be easy to work with.

4) Please provide an Ant build file (it will be short, but we still need one).

5) Please configure log4j to log errors and warnings to standard output.

**Code Requirements**

Please split up the code into the following two classes:

- **WkXMLParser**
    - We will create one instance of this class.
    - Before parsing the file, we need to map field names to handling methods:
        - `addFieldHandler(fieldName, handlerMethod)`
            - Maps a method to the field name it handles.
            - It is very important that the handlers are called in the order in which they were mapped, not in the order in which the fields occur in the XML file.
        - `addRecordStartHandler(handlerMethod)`
        - `addRecordEndHandler(handlerMethod)`
    - The final method is `importXML(inputFile)`. If the record start handler or end handler is not defined, it should throw an exception.
    - If during the import you encounter a field name with no associated method, log a warning but continue.
- **WkImport**
    - This is the class with `main()`, the one we run.
    - Creates the database connection.
    - Instantiates `WkXMLParser`.
    - Contains the field processing methods.
    - Adds the field processing methods to `WkXMLParser`.
    - Contains `startRecord()` and `endRecord()`.
    - For the purpose of the skeleton, provide the following methods:
        - `startRecord()`: clears any previous data stored in the member variables (`name` and `description`).
        - `fieldName(…)`: gets the text value of the DOM tree using `asText()` and saves it to the member variable `name`.
        - `fieldDescription(…)`: gets the text value of the DOM tree using `asText()` and saves it to the member variable `description`.
        - `endRecord()`: creates a new record in the database with `name` and `description`.
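The registration side of the two classes might be sketched as follows. This assumes handler methods are passed as Java 8 method references (`Consumer<Element>` for field handlers, `Runnable` for record start/end), and that the spec's `asText()` stands for the W3C DOM `Node.getTextContent()`. The `fieldNames()` accessor and the `WkImportSketch` class are hypothetical names added for illustration; `main()` and the database code are left out.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Consumer;
import org.w3c.dom.Element;

/** Registration side of WkXMLParser (sketch; handler types are assumptions). */
public class WkXMLParser {
    // LinkedHashMap preserves insertion order, so handlers can later be
    // called in the order they were mapped rather than in document order.
    private final Map<String, Consumer<Element>> fieldHandlers = new LinkedHashMap<>();
    private Runnable recordStartHandler;
    private Runnable recordEndHandler;

    public void addFieldHandler(String fieldName, Consumer<Element> handlerMethod) {
        fieldHandlers.put(fieldName, handlerMethod);
    }

    public void addRecordStartHandler(Runnable handlerMethod) {
        recordStartHandler = handlerMethod;
    }

    public void addRecordEndHandler(Runnable handlerMethod) {
        recordEndHandler = handlerMethod;
    }

    /** Hypothetical accessor: registered field names in registration order. */
    public Iterable<String> fieldNames() {
        return fieldHandlers.keySet();
    }

    public void importXML(String inputFile) {
        if (recordStartHandler == null || recordEndHandler == null) {
            throw new IllegalStateException(
                "Record start and end handlers must be set before importXML()");
        }
        // TODO: stream inputFile with SAX, load each <record> into DOM and
        // dispatch it to the registered handlers.
    }
}

/** Hypothetical stand-in for WkImport's handler methods. */
class WkImportSketch {
    private String name;
    private String description;

    void startRecord() {
        name = null;          // clear data left over from the previous record
        description = null;
    }

    void fieldName(Element field) {
        name = field.getTextContent();        // the spec's asText()
    }

    void fieldDescription(Element field) {
        description = field.getTextContent(); // the spec's asText()
    }

    void endRecord() {
        // Insert name and description into the database here.
    }

    void register(WkXMLParser parser) {
        parser.addRecordStartHandler(this::startRecord);
        parser.addFieldHandler("Name", this::fieldName);
        parser.addFieldHandler("Description", this::fieldDescription);
        parser.addRecordEndHandler(this::endRecord);
    }
}
```

Keying the handlers by field name in a `LinkedHashMap` is what lets the parser honour registration order regardless of where a field appears in the record.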

**General Specification Points**

- The `importXML` method should read the input XML file using SAX, then load the contents of every record into memory using DOM.
- At the start of each record, it should first call the record start handler.
- Within each record you need to call the individual field handlers. Again, it is important that you call the field handlers in the order in which they were added with `addFieldHandler`, and NOT in the order in which they appear in the XML file. This is one reason why each record should be loaded into DOM.
- Keep track of which fields were used; if any fields remain that were not associated with a field handler, log a warning for each of them.
- Keep track of the number of the record being processed and include it in every warning.
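Once a record has been loaded into DOM, the per-record steps above might look like the sketch below. `RecordDispatcher`, `processRecord` and the `warn` callback are hypothetical names; warnings go to a `Consumer<String>` so the sketch runs without log4j, whereas the real class would pass each message to a log4j logger's `warn` method. It also assumes each field name appears at most once per record (a repeated field would keep only its last occurrence).

```java
import java.util.HashSet;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;
import java.util.function.Consumer;
import org.w3c.dom.Element;
import org.w3c.dom.Node;

/** Hypothetical per-record dispatch step of importXML. */
public class RecordDispatcher {
    private final Map<String, Consumer<Element>> fieldHandlers = new LinkedHashMap<>();
    private final Runnable onStart;
    private final Runnable onEnd;
    private final Consumer<String> warn; // e.g. a log4j logger's warn in the real class

    public RecordDispatcher(Runnable onStart, Runnable onEnd, Consumer<String> warn) {
        this.onStart = onStart;
        this.onEnd = onEnd;
        this.warn = warn;
    }

    public void addFieldHandler(String fieldName, Consumer<Element> handler) {
        fieldHandlers.put(fieldName, handler);
    }

    /** Processes one <record> already loaded into DOM; recordNum appears in warnings. */
    public void processRecord(Element record, int recordNum) {
        onStart.run();

        // Index the record's top-level child elements (the "fields") by tag name.
        Map<String, Element> fields = new LinkedHashMap<>();
        for (Node n = record.getFirstChild(); n != null; n = n.getNextSibling()) {
            if (n.getNodeType() == Node.ELEMENT_NODE) {
                fields.put(n.getNodeName(), (Element) n);
            }
        }

        // Call handlers in registration order, not document order.
        Set<String> used = new HashSet<>();
        for (Map.Entry<String, Consumer<Element>> e : fieldHandlers.entrySet()) {
            Element field = fields.get(e.getKey());
            if (field != null) {
                e.getValue().accept(field);
                used.add(e.getKey());
            }
        }

        // Warn once for every field that had no handler, then continue.
        for (String name : fields.keySet()) {
            if (!used.contains(name)) {
                warn.accept("Record " + recordNum + ": no handler for field <" + name + ">");
            }
        }

        onEnd.run();
    }
}
```

With handlers registered for `Name` and then `Date`, a record containing `<Date>`, `<Name>` and `<Extra_Info>` calls the `Name` handler first, then `Date`, and logs one warning for `Extra_Info`.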
