We are looking to develop a web application for resume parsing written in Java/Spring. If you want to use something else please let us know.
This parser will be used to parse thousands of UNSTRUCTURED resumes in HTML, Word (DOC, DOCX), RTF, plain text and PDF formats.
Input: Resume files in the following formats: Word, PDF, text, TIF, HTML
Output: One XML file per resume, in which every word from the resume is placed under the correct XML tag.
The parser needs to be able to extract the following data from the resumes:
. first name
. last name
. zip code
. citizenship/immigration status
. email address
. resume job category
. resume title
. career objective or background
. years of professional experience
. employment history
. education history
. licenses and certifications
. foreign languages
. skills keywords
. security clearances
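To give bidders a feel for the simplest of these fields, here is a minimal sketch of pulling an email address and a zip code out of plain resume text with regular expressions. The class name SimpleFieldExtractor, the loose email pattern and the assumption of US-style ZIP codes are ours for illustration only; fields such as employment history will clearly need much more than a regex.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, not part of any existing code base.
final class SimpleFieldExtractor {
    // Deliberately loose email pattern; an assumption, not a full address validator.
    private static final Pattern EMAIL =
            Pattern.compile("[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\\.[A-Za-z]{2,}");

    // Assumes US ZIP codes: five digits with an optional four-digit extension.
    private static final Pattern ZIP = Pattern.compile("\\b\\d{5}(?:-\\d{4})?\\b");

    static Optional<String> firstEmail(String text) {
        return firstMatch(EMAIL, text);
    }

    static Optional<String> firstZip(String text) {
        return firstMatch(ZIP, text);
    }

    // Returns the first match of the pattern in the text, if there is one.
    private static Optional<String> firstMatch(Pattern pattern, String text) {
        Matcher m = pattern.matcher(text);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```

An empty Optional signals that the field was not found, which the parser could later use when reporting how well a resume was parsed.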
The output of the parser should be an XML-tagged file, one XML file for each parsed resume. The output file name should be the same as the input file name with only the extension changed, e.g. resumefile.xxx becomes resumefile.xml.
All of the parsed fields will be uploaded into a MySQL database. The parser is required to do the database insertion as part of the parsing process.
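The naming rule could look roughly like the sketch below. The class and method names (OutputNames, toXmlPath) are placeholders of ours, and writing the XML file next to the input file is an assumption; the output directory is open for discussion.

```java
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper illustrating the resumefile.xxx -> resumefile.xml rule.
final class OutputNames {
    static Path toXmlPath(Path input) {
        String name = input.getFileName().toString();
        int dot = name.lastIndexOf('.');
        // Strip only the last extension; names without one are used as-is.
        String base = dot > 0 ? name.substring(0, dot) : name;
        // Assumption: the XML file is placed in the same directory as the input.
        return input.resolveSibling(base + ".xml");
    }
}
```

Note that two inputs differing only by extension (for example cv.doc and cv.pdf) would map to the same output name under this rule, so please tell us how you would handle that case.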
We will supply a sample set of resumes, as many as you need to be successful.
Resumes are unstructured, so formats and content vary widely. The ability to score the parsing performance would be beneficial. It would be helpful to have a parsing report (i.e. the application should write a log file) that indicates which resumes the parser thinks it did poorly on, so we can manually revisit the parsed resumes that have the highest probability of parsing errors.
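One simple way to approach scoring, offered only as a starting point, is to count how many of the requested fields the parser managed to fill and flag resumes that fall below a threshold. In the sketch below the class name ParseScore, the field identifiers and the idea of a fill-rate threshold are all our assumptions; a better heuristic from the developer is welcome.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

// Hypothetical fill-rate score: the share of requested fields that were extracted.
final class ParseScore {
    // Identifiers are illustrative; they mirror the field list in this posting.
    static final List<String> EXPECTED = Arrays.asList(
            "firstName", "lastName", "zipCode", "citizenship", "email",
            "jobCategory", "title", "objective", "yearsExperience",
            "employmentHistory", "educationHistory", "licenses",
            "languages", "skills", "clearances");

    // Returns a value between 0.0 (nothing extracted) and 1.0 (every field filled).
    static double score(Map<String, String> fields) {
        long filled = EXPECTED.stream()
                .filter(key -> {
                    String value = fields.get(key);
                    return value != null && !value.trim().isEmpty();
                })
                .count();
        return (double) filled / EXPECTED.size();
    }

    // Resumes scoring below the threshold would be listed in the log for manual review.
    static boolean needsReview(Map<String, String> fields, double threshold) {
        return score(fields) < threshold;
    }
}
```

A fill rate does not prove that the extracted values are correct, only that something was found, so it should be read as a rough signal for which resumes to revisit first.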
We need to be able to integrate the web application parser with our existing PHP website.
The application should contain at least two main modules:
1. File converter: translates each input file format into plain text
2. Parsing engine: receives a text file and returns an XML file
The separation is needed in order to allow additional file formats in the future.
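To illustrate the kind of separation we have in mind, here is a rough sketch in which each file format is handled by its own converter and the parsing engine only ever sees text. All names (FileConverter, ParsingEngine, PlainTextConverter, ResumePipeline) are hypothetical, and only a plain text converter is shown; converters for the other formats would plug in the same way.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Module 1: one implementation per input format.
interface FileConverter {
    boolean supports(String extension);
    String toText(Path file) throws IOException;
}

// Module 2: knows nothing about file formats, only about text.
interface ParsingEngine {
    String toXml(String resumeText);
}

// Simplest converter; assumes the text file is UTF-8 encoded.
final class PlainTextConverter implements FileConverter {
    public boolean supports(String extension) {
        return "txt".equals(extension);
    }

    public String toText(Path file) throws IOException {
        return new String(Files.readAllBytes(file), StandardCharsets.UTF_8);
    }
}

// Picks a converter by file extension and hands its text to the engine.
final class ResumePipeline {
    private final List<FileConverter> converters = new ArrayList<>();
    private final ParsingEngine engine;

    ResumePipeline(ParsingEngine engine) {
        this.engine = engine;
    }

    // Supporting a new format later means registering one more converter.
    ResumePipeline register(FileConverter converter) {
        converters.add(converter);
        return this;
    }

    String process(Path file) throws IOException {
        String name = file.getFileName().toString();
        int dot = name.lastIndexOf('.');
        String ext = dot < 0 ? "" : name.substring(dot + 1).toLowerCase(Locale.ROOT);
        for (FileConverter converter : converters) {
            if (converter.supports(ext)) {
                return engine.toXml(converter.toText(file));
            }
        }
        throw new IllegalArgumentException("No converter registered for: " + name);
    }
}
```

With this shape, adding a new format touches only the converter side, and the parsing engine can be tested with plain strings.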
Passing acceptance testing with several resumes will be required at project completion.
I expect there will be a lot more questions so feel free to ask.