Closed

Website Crawler / Scraper Manager

I'm looking for someone to code a solid website scraper / crawler. I've already coded a version; however, it is not as good as I need it to be, so I need help creating a new, better version from scratch.

In short, I need to be able to manage (create/edit/delete) scraping tasks through a robust, flexible and advanced UI. The scraping task script needs to check for work at regular intervals (ideally as an update daemon service on my Ubuntu VPS rather than a cron job), with the scraped data inserted into a MySQL database. The sites in question are generally news sites covering games and tech; the key data is headline, intro and/or full content, date published, author and URL of the full story (similar to what an RSS feed would provide, but these sites do not have RSS feeds).
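To make the "regular intervals" requirement concrete, here is a minimal sketch of how a daemon loop might decide which tasks are due. It is written in Python purely for brevity (the project itself calls for PHP), and the `ScrapeTask` fields and function names are hypothetical, not taken from the existing system.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class ScrapeTask:
    # Hypothetical task record; field names are illustrative assumptions.
    name: str
    url: str
    interval_minutes: int
    last_run: Optional[datetime] = None


def is_due(task: ScrapeTask, now: datetime) -> bool:
    """A task is due if it has never run or its interval has elapsed."""
    if task.last_run is None:
        return True
    return now - task.last_run >= timedelta(minutes=task.interval_minutes)


def due_tasks(tasks: List[ScrapeTask], now: datetime) -> List[ScrapeTask]:
    """Return the tasks the daemon should run on this pass."""
    return [task for task in tasks if is_due(task, now)]
```

A daemon would call `due_tasks` on each pass, run the returned tasks, and set `last_run` afterwards, so no external cron entry is needed.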

Beyond PHP, jQuery and Ajax, I expect you to use something like Simple HTML DOM (which I used; if you prefer another library, we can discuss it) and DataTables for all tables (alternatively, Bootstrap tables).

Also note that I use the Metronic admin dashboard theme for my general UI design; I can provide a default template and a link for it.

Features that will be required

Advanced create/edit/delete task UI, so that as far as possible everything needed to get a page scraped for data can be configured through the UI.

A smart way to manage multiple page scrapes from the same website, e.g. when there is no way to fetch news, reviews and features from a single page.

List of tasks with relevant status; search, filter, sort and manage options

Update daemon that can run as a background process on an Ubuntu 14.04 VPS. It manages all the tasks, fetching data according to each task's settings and interval criteria.

Error handling: able to recover from failed fetches and interruptions, reschedule tasks, etc., with logging of what is going on and any errors that occurred.

Error management: a warning system that flags tasks that might have issues, e.g. we're no longer scraping a headline or an author because the site changed its code.
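As an illustration of the warning system, the sketch below flags a task when a key field comes back empty in most of the scraped items, which is a typical symptom of a site changing its markup. It is written in Python for brevity (the project itself calls for PHP); the field names, the `threshold` parameter and the function names are assumptions for the example.

```python
from collections import Counter
from typing import Dict, List

# Assumed key fields, based on the data listed in the brief.
REQUIRED_FIELDS = ("headline", "author", "published", "url")


def missing_fields(item: Dict[str, object]) -> List[str]:
    """Return the required fields that are absent or blank in one item."""
    return [f for f in REQUIRED_FIELDS if not str(item.get(f) or "").strip()]


def task_warnings(items: List[Dict[str, object]], threshold: float = 0.5) -> List[str]:
    """Warn about fields missing in more than `threshold` of the items."""
    if not items:
        return ["no items scraped"]
    counts = Counter()
    for item in items:
        counts.update(missing_fields(item))
    total = len(items)
    return [
        f"{field} missing in {counts[field]}/{total} items"
        for field in REQUIRED_FIELDS
        if counts[field] / total > threshold
    ]
```

The returned messages could be stored against the task so the task list can show a warning status next to it.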

Happy to answer any further questions, just ask.

IMPORTANT

Timeline/deadlines: while I would have loved to have this done yesterday, do let me know how much time you estimate the project will take. A high level of English is also required. Offers that do not provide this information will not be considered.

See attached images for a view of my current system.

Updated with two missing attachments that were intended to be included.

Skills: AJAX, jQuery / Prototype, Linux, MySQL, PHP


About the Employer:
( 5 reviews ) Brighton, United Kingdom

Project ID: #11323928

6 freelancers are bidding on average £176 for this job

malviyamanish

Dear Client, we have gone through the project description and are really interested in working with you. We are a web and software development company with more than 8 years of experience in software development. We totally commit...

£263 GBP in 3 days
(201 Reviews)
7.6

yogeshssanwal

Hi there, my name is Jhalak. I’ve read your [login to view URL] team has 4 years of experience designing and developing mobile apps and websites. I would approach your project by starting with wireframes and getting the site complet...

£150 GBP in 3 days
(192 Reviews)
6.8

sparximer

You will receive EXCELLENT results from my work. My reviews speak of my excellent attention to detail and my great customer service! Please review my profile and read my client reviews (101 reviews - 5 stars). I...

£142 GBP in 3 days
(9 Reviews)
5.0

marcocatania

Python programmer since 2012. I have created many web scraping/crawling applications with BeautifulSoup, Selenium, PhantomJS and lxml. I love computers and hacking, and I'd like to share my passion by helping someone with c...

£150 GBP in 3 days
(10 Reviews)
4.7

Zimalab

Hello, we specialize in PHP, MySQL and AJAX and would like to help you with your project. Visit our website to see the portfolio and our customers’ feedback: [login to view URL] As we’ve understood from the proj...

£150 GBP in 3 days
(1 Review)
3.8

imuimran92

Hi, I am Imran, a web developer with expertise in WordPress, PHP, Laravel, jQuery and Ajax. I have working experience on several successful projects. For example: 1. [login to view URL] 2. [login to view URL] 3. [login to view URL]...

£200 GBP in 7 days
(4 Reviews)
2.5