Part A) Extract information from a given set of URLs (BID URLs), each of which lists many PDFs in Spanish, and extract text from those PDFs using regular expressions.
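Before any PDF can be parsed, the links on each BID page have to be collected. The following is a minimal sketch, assuming the page HTML has already been downloaded (for example with java.net.http.HttpClient, available since Java 11) and that PDF links appear as plain href attributes ending in .pdf. The class name PdfLinkCollector is hypothetical, and a regular expression is only a rough stand-in for a real HTML parser.

```java
import java.net.URI;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: collects absolute PDF links from the HTML of one BID page. */
public class PdfLinkCollector {

    // Matches href="..." or href='...' whose value ends in .pdf (any case),
    // optionally followed by a query string such as ?v=2.
    private static final Pattern PDF_HREF = Pattern.compile(
            "href\\s*=\\s*[\"']([^\"']+?\\.pdf(?:\\?[^\"']*)?)[\"']",
            Pattern.CASE_INSENSITIVE);

    /**
     * Returns the PDF links found in html, resolved against pageUrl,
     * in page order and without duplicates.
     */
    public static Set<String> collect(String pageUrl, String html) {
        URI base = URI.create(pageUrl);
        Set<String> links = new LinkedHashSet<>();
        Matcher m = PDF_HREF.matcher(html);
        while (m.find()) {
            // Relative links ("docs/a.pdf", "/files/b.pdf") become absolute URLs.
            links.add(base.resolve(m.group(1).trim()).toString());
        }
        return links;
    }
}
```

This sketch does not decode HTML entities such as &amp; inside an href, and URI.resolve throws IllegalArgumentException for a link containing unescaped spaces, so real pages may need extra cleanup.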
The URL [url removed] should produce the following: Gerente de proyecto, Desarrollador Java, Desarrollador PHP, Desarrollador Forms, Desarrollador .NET, Arquitecto de Software (project manager, Java developer, PHP developer, Forms developer, .NET developer, software architect). This text appears on page 47 of one of the files listed at that URL. Keep in mind that every document listed at the URL must be parsed.
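Once a PDF's text is available as a String (Apache PDFBox's PDFTextStripper can produce it, and its setStartPage and setEndPage methods can limit extraction to a single page such as 47), the role titles can be pulled out with a regular expression. A minimal sketch, assuming each role begins with one of the hypothetical keywords Gerente, Desarrollador or Arquitecto, followed by an optional "de" and one qualifier word; the real documents may well need a different pattern, since a phrase like "el Desarrollador deberá" would also match.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper: finds role titles in text already extracted from a PDF. */
public class RoleExtractor {

    // Keyword, whitespace, optional "de", then one qualifier word.
    // "Desarr?ollador" also accepts the misspelling "Desarollador".
    // The qualifier may start with '.' (".NET") and may contain '#' or '+'
    // ("C#", "C++"); a trailing sentence period is not captured.
    private static final Pattern ROLE = Pattern.compile(
            "\\b(?:Gerente|Desarr?ollador|Arquitecto)\\s+(?:de\\s+)?\\.?[\\p{L}#+]+");

    /** Returns distinct role titles in the order they first appear. */
    public static List<String> extract(String text) {
        LinkedHashSet<String> roles = new LinkedHashSet<>();
        Matcher m = ROLE.matcher(text);
        while (m.find()) {
            // Collapse line breaks and repeated spaces left by PDF text extraction.
            roles.add(m.group().replaceAll("\\s+", " "));
        }
        return new ArrayList<>(roles);
    }
}
```

Keeping the keyword list in one pattern makes it easy to extend once more of the BID documents have been inspected.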
Part B) After the text is extracted, store the portions that match certain criteria in a relational database (MySQL). For the example above, each match would be stored in a table with three fields:
| [url removed] | Gerente de Proyecto | Ingeniero de Sistemas; Un (1) año en Gerencia de proyectos informáticos | 1 |
(In English: Project Manager | Systems Engineer; one (1) year in IT project management | 1)
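A minimal JDBC sketch of the storage step follows. The table name bid_role and the columns source_url, role_name and requirements are hypothetical, chosen to match the three fields described above; the trailing 1 in the example row is not mapped because the posting does not say what it means. The sketch needs Java 16+ (for the record) and the MySQL Connector/J driver on the classpath at run time.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Hypothetical repository for rows extracted in Part A. */
public class RoleRepository {

    /** One extracted row: source document URL, role title, and profile/requirements text. */
    public record RoleRow(String sourceUrl, String role, String requirements) {}

    // Hypothetical schema; utf8mb4 keeps Spanish characters such as "ñ" and accents intact.
    public static final String CREATE_TABLE_SQL =
            "CREATE TABLE IF NOT EXISTS bid_role ("
            + " source_url VARCHAR(2048) NOT NULL,"
            + " role_name VARCHAR(255) NOT NULL,"
            + " requirements TEXT"
            + ") CHARACTER SET utf8mb4";

    public static final String INSERT_SQL =
            "INSERT INTO bid_role (source_url, role_name, requirements) VALUES (?, ?, ?)";

    /** Inserts all rows as one JDBC batch; the caller owns and closes the connection. */
    public static int[] save(Connection conn, Iterable<RoleRow> rows) throws SQLException {
        try (PreparedStatement ps = conn.prepareStatement(INSERT_SQL)) {
            for (RoleRow r : rows) {
                ps.setString(1, r.sourceUrl());
                ps.setString(2, r.role());
                ps.setString(3, r.requirements());
                ps.addBatch();
            }
            return ps.executeBatch();
        }
    }
}
```

A caller would obtain the connection with DriverManager.getConnection("jdbc:mysql://localhost:3306/bid", user, password) (the database name bid is also hypothetical), execute CREATE_TABLE_SQL once, and then pass the extracted rows to save. Binding values through PreparedStatement parameters rather than concatenating strings keeps quotes in the extracted Spanish text from breaking the SQL.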
1. Automatic replies that do not ask for specific information will be discarded.
2. The deliverable MUST be a working Java Maven project; it does NOT have to be a web application.
3. A single payment will be made once the deliverables work and have been fully tested.
4. The project will be awarded to the first programmer who submits a working prototype of Part A.
17 freelancers are bidding an average of $710 for this job.
Hi, I am an expert developer of web-scraping applications and am also very comfortable extracting text from PDFs and working with regular expressions. Please see my private message for more details. Thanks