Web Scraping Software

Status: CLOSED
Bids: 11
Avg Bid (USD): $1096
Project Budget (USD): $250 - $750

Project Description:
Hi,
I need software to scrape, parse and aggregate job postings, mainly from career pages (like the ones major companies have; please visit sites like ibm.com, adobe.com, microsoft.com, inditex.com and danone.com), but also from newspapers or job boards.

“Must have” features of the project:

Spider jobs from websites (HTML or XML), ATS or via FTP.
Jobs taxonomy for categorization based on job titles & keywords.
Convert the job ads into the correct format for my job board, using semantic analysis to ensure the content is accurately mapped onto my job board’s category schemes.
Capture as much data as can be parsed from a job site, including the entire job advertisement.
Incremental scraping feature only downloads new jobs.
Synchronize jobs to guarantee that only open jobs remain posted.
Filter jobs by keywords so only relevant jobs are posted.
Auto-replace keywords in content & clean up job formatting.
Schedule regular spidering / posting sessions.
Auto-post via XML, EXCEL or CSV to single or multiple websites or forward to a list of emails.
Post to HTTP interface, via API, SOAP, to FTP or email.
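
To make a few of the features above more concrete (keyword filtering, title-based categorization, incremental scraping, synchronization and CSV/XML export), here is a minimal sketch in Python using only the standard library. It is an illustration, not a specification: the `TAXONOMY` table, the field names and every function name are hypothetical, and it assumes that a job's URL identifies it uniquely. Real semantic analysis would need much more than substring matching.

```python
import csv
import hashlib
import io
import xml.etree.ElementTree as ET

# Hypothetical taxonomy: category -> keywords searched for in the job title.
TAXONOMY = {
    "Engineering": ["engineer", "developer"],
    "Sales": ["sales", "account manager"],
}
FIELDS = ["title", "url", "category"]


def job_id(job):
    """Stable ID for a posting; assumes the job URL identifies it uniquely."""
    return hashlib.sha256(job["url"].encode("utf-8")).hexdigest()


def categorize(title):
    """Return the first category whose keyword appears in the title."""
    lowered = title.lower()
    for category, keywords in TAXONOMY.items():
        if any(keyword in lowered for keyword in keywords):
            return category
    return "Other"


def is_relevant(job, keywords):
    """Keyword filter: keep a job if any keyword occurs in its text."""
    text = (job["title"] + " " + job["description"]).lower()
    return any(keyword.lower() in text for keyword in keywords)


def diff_jobs(scraped, known_ids):
    """Compare one crawl with the IDs already posted.

    Returns the jobs that are new (incremental scraping) and the IDs
    that are no longer on the site and should be unpublished (sync).
    """
    current = {job_id(job): job for job in scraped}
    new_jobs = [job for jid, job in current.items() if jid not in known_ids]
    closed_ids = set(known_ids) - set(current)
    return new_jobs, closed_ids


def to_rows(jobs):
    """Map scraped jobs onto the (assumed) export fields."""
    return [
        {"title": j["title"], "url": j["url"], "category": categorize(j["title"])}
        for j in jobs
    ]


def to_csv(jobs):
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(to_rows(jobs))
    return buffer.getvalue()


def to_xml(jobs):
    root = ET.Element("jobs")
    for row in to_rows(jobs):
        node = ET.SubElement(root, "job")
        for field in FIELDS:
            ET.SubElement(node, field).text = row[field]
    return ET.tostring(root, encoding="unicode")
```

In such a design, each scheduled session would call `diff_jobs` with the IDs stored from the previous run, post only `new_jobs`, and remove the postings listed in `closed_ids`.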

The way I want it to work: I will enter the address of the main career page of the hiring company
(for example only: https://rexfront.com/psc/REXUPRD/APPLICANT2/CUST/c/HRS_HRS.HRS_APP_SCHJOB.GBL?Page=HRS_APP_SCHJOB&Action=U&var=
or
https://careers.microsoft.com/search.aspx?gl=gbl
or
http://find.ibm.jobs/
or
https://www.joinfashioninditex.com/joinfashion/en/vacancy-search)

and the software will scrape the entire posting of every job opening found on the career pages of the website. The software product should be built so that I only have to give the web address of the career section of the website to be parsed, and the software does all the rest. This is really a must-have. If needed, I will provide all the addresses of the career sections/pages to be parsed, or at least most of them.
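
As a rough illustration of the "give only the career page address" idea, the sketch below takes the HTML of a career page and its address, and returns the absolute URLs of links that look like job postings. It is only a starting point under stated assumptions: the `JOB_HINTS` heuristic and all names are hypothetical, fetching the page is left out, and career pages that build their listings with JavaScript may need a different approach.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Assumed heuristic: a link is a job posting if its href or its
# anchor text contains one of these hints.
JOB_HINTS = ("job", "vacancy", "career", "position")


class LinkCollector(HTMLParser):
    """Collects (href, anchor text) pairs for every <a href=...> element."""

    def __init__(self):
        super().__init__()
        self.links = []
        self._href = None
        self._text = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            text = " ".join("".join(self._text).split())
            self.links.append((self._href, text))
            self._href = None


def find_job_links(html, base_url):
    """Return absolute URLs of links that match the job heuristic."""
    collector = LinkCollector()
    collector.feed(html)
    collector.close()
    found = []
    for href, text in collector.links:
        haystack = (href + " " + text).lower()
        if any(hint in haystack for hint in JOB_HINTS):
            url = urljoin(base_url, href)  # resolve relative links
            if url not in found:           # drop duplicates, keep order
                found.append(url)
    return found
```

Each URL returned this way would then be fetched and parsed as an individual job advertisement.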

Very important: I need perfect accuracy of the scraped content: text, URLs, images. The engine has to be able to scrape at least 600,000 career pages per day (coming from around 30,000 websites per day), convert them into XML, Excel or similar files, and post them on my job board website or distribute them to other job boards.
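
To put the volume requirement in perspective, the figures above can be turned into a sustained request rate. The short calculation below is plain arithmetic on the two numbers in this posting; the 2 seconds per page used for the concurrency estimate is purely an assumed value for illustration, and `required_concurrency` is a hypothetical helper.

```python
import math

PAGES_PER_DAY = 600_000      # from the requirement above
SITES_PER_DAY = 30_000       # from the requirement above
SECONDS_PER_DAY = 24 * 60 * 60

# Sustained rate if the crawl is spread evenly over 24 hours.
pages_per_second = PAGES_PER_DAY / SECONDS_PER_DAY

# Average number of career pages fetched per website per day.
pages_per_site = PAGES_PER_DAY / SITES_PER_DAY


def required_concurrency(pages_per_day, seconds_per_page):
    """Parallel fetches needed to keep up, given an average time per page."""
    return math.ceil(pages_per_day / SECONDS_PER_DAY * seconds_per_page)


print(f"{pages_per_second:.2f} pages/second")       # about 6.94
print(f"{pages_per_site:.0f} pages per site")       # 20
print(required_concurrency(PAGES_PER_DAY, 2.0))     # 14, assuming 2 s per page
```

So the engine would need to sustain roughly 7 pages per second around the clock, or proportionally more if scraping only runs during part of the day.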

The software should be able to do at least what the following two products do:
http://www.jobboardmount.com/cm/job_spider
http://www.madgex.com/services/job-scraper/


If you do not wish to build an entirely new software product, you may use visualwebripper.com or mozenda.com, but I would prefer visualwebripper as it is a lot cheaper. Other web scraping engines/software are welcome as long as they prove to work just as well as visualwebripper or mozenda and are not too expensive compared to visualwebripper. Build a visualwebripper solution with the “must have” features above and it will do.

Another very important thing: before paying for your work I have to make sure that the product meets all the above requirements. A free trial of 30 days or demo is compulsory.


All of the above statements are open to questions and discussion. Feel free to ask or comment on any of them.

Kind regards
Charlie

Skills required: Web Scraping
About the employer: Verified


$250 in 7 days
$2105 in 60 days
Hire phpXpertbd
$526 in 3 days
$789 in 21 days
Hire raul27868
$277 in 10 days
$750 in 4 days
$1546 in 45 days
Hire VtechMass
$277 in 3 days
$257 in 3 days
$833 in 21 days