
Website crawler for HTML content

Bids: 6
Avg Bid: $177 USD
CLOSED
  • Project ID: 556542
  • Project Type: Fixed
  • Budget: $30-$250 USD

Project Description:

I need a crawler that identifies phrases in the HTML of websites, for example "google analytics".
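A minimal sketch of the matching step, written in Python rather than the PHP listed under "Skills required". It assumes a phrase counts as found when it appears anywhere in a page's raw HTML source (markup and scripts included), ignoring case and differences in whitespace. The names `normalize` and `find_phrases` are hypothetical; the phrase list is passed in, so it stays an input the user controls.

```python
import re
from typing import Dict, List


def normalize(text: str) -> str:
    # Lower-case and collapse runs of whitespace (spaces, tabs, newlines)
    # so "Google\n  Analytics" still matches "google analytics".
    return re.sub(r"\s+", " ", text).lower()


def find_phrases(html: str, phrases: List[str]) -> Dict[str, bool]:
    """Return {phrase: found?} for each phrase, searched in the raw HTML source."""
    page = normalize(html)
    return {phrase: normalize(phrase) in page for phrase in phrases}


if __name__ == "__main__":
    html = '<script>/* Google\n  Analytics */ ga("create", "UA-1");</script>'
    print(find_phrases(html, ["google analytics", "facebook pixel"]))
    # {'google analytics': True, 'facebook pixel': False}
```

Searching the raw source rather than only the visible text is deliberate here: a phrase such as "google analytics" can appear only inside a `<script>` tag.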

There will be about 5 phrases in total, and I want these to be an input that I can control. I also want to be able to control the depth of the crawl, i.e., how many levels "deep" the crawler goes into a website (e.g., home page --> about us --> management would be 3 levels deep).

I also want to be able to control the total number of pages crawled per site, e.g., cut off the search after 100 pages have been crawled.
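Both controls fit naturally into a breadth-first crawl of a single site. This Python sketch assumes that the home page counts as depth 1 (so home page --> about us --> management is depth 3, as in the post), that only links on the same host are followed, that a page whose download fails does not count toward the page cap, and that downloading is delegated to a caller-supplied `fetch` function, so no particular HTTP library is implied. `crawl_site`, `fetch`, and `LinkExtractor` are hypothetical names.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List, Optional
from urllib.parse import urldefrag, urljoin, urlparse


class LinkExtractor(HTMLParser):
    """Collects the href values of <a> tags in one HTML page."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def crawl_site(start_url: str,
               fetch: Callable[[str], Optional[str]],
               max_depth: int = 3,
               max_pages: int = 100) -> Dict[str, str]:
    """Breadth-first crawl of one site; returns {url: html}.

    The start page is depth 1, so home --> about us --> management is depth 3.
    The crawl stops once max_pages pages have been fetched successfully.
    Only links on the same host as start_url are followed.
    """
    host = urlparse(start_url).netloc
    queue = deque([(start_url, 1)])  # (url, depth) pairs waiting to be fetched
    seen = {start_url}               # URLs already queued, to avoid revisits
    pages: Dict[str, str] = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        html = fetch(url)
        if html is None:             # download failed: skip it, not counted
            continue
        pages[url] = html
        if depth >= max_depth:       # links from here would exceed the limit
            continue
        parser = LinkExtractor()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop "#fragment" parts.
            link, _fragment = urldefrag(urljoin(url, href))
            if urlparse(link).netloc == host and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages


if __name__ == "__main__":
    # A fake four-level site stands in for real HTTP requests.
    site = {
        "http://example.com/": '<a href="/about">About us</a>',
        "http://example.com/about": '<a href="/about/management">Management</a>',
        "http://example.com/about/management": '<a href="/about/management/board">Board</a>',
        "http://example.com/about/management/board": "",
    }
    print(sorted(crawl_site("http://example.com/", site.get, max_depth=3)))
    # ['http://example.com/', 'http://example.com/about', 'http://example.com/about/management']
```

Breadth-first order means that when the page cap cuts a crawl short, the pages kept are the ones closest to the home page.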

In addition, the crawler needs to be able to crawl 20,000 sites in about a week. Therefore, the winning bidder needs to be able to build a "fast" crawler (e.g., one that uses multi-threading). I will also need to be able to upload the URLs of the websites I want to crawl.
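The volume requirement translates into a target rate. A week is 7 × 24 × 3600 = 604,800 seconds; if every one of the 20,000 sites reaches the example cap of 100 pages, that is 2,000,000 pages, or about 3.3 pages per second sustained (roughly 2,857 sites per day). Because each fetch mostly waits on the network, that rate is reached by running many fetches at once. This Python sketch (not PHP) computes the rate, parses an uploaded URL list (assumed to be a plain-text file with one URL per line), and scans sites on a thread pool. The 2-second fetch time, the worker counts, and the names `read_url_list`, `run_all`, and `scan_site` are assumptions for illustration.

```python
import math
from concurrent.futures import ThreadPoolExecutor, as_completed
from typing import Callable, Dict, Iterable, List

SECONDS_PER_WEEK = 7 * 24 * 60 * 60  # 604,800


def required_pages_per_second(sites: int, max_pages: int,
                              seconds: int = SECONDS_PER_WEEK) -> float:
    """Worst-case sustained fetch rate if every site hits the page cap."""
    return sites * max_pages / seconds


def workers_needed(pages_per_second: float, seconds_per_fetch: float) -> int:
    """Concurrent fetches needed when each fetch blocks for seconds_per_fetch."""
    return math.ceil(pages_per_second * seconds_per_fetch)


def read_url_list(lines: Iterable[str]) -> List[str]:
    """Parse an uploaded URL list: one URL per line; blanks and '#' comments skipped."""
    return [s for s in (line.strip() for line in lines) if s and not s.startswith("#")]


def run_all(urls: List[str], scan_site: Callable[[str], Dict],
            workers: int = 32) -> Dict[str, Dict]:
    """Scan many sites concurrently; threads suit this network-bound work."""
    results: Dict[str, Dict] = {}
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = {pool.submit(scan_site, url): url for url in urls}
        for future in as_completed(futures):
            url = futures[future]
            try:
                results[url] = future.result()
            except Exception as exc:  # one failing site must not stop the run
                results[url] = {"error": str(exc)}
    return results


def fake_scan(url: str) -> Dict[str, bool]:
    # Stand-in scanner; a real one would crawl the site and search for the phrases.
    return {"google analytics": url.endswith(".nl")}


if __name__ == "__main__":
    rate = required_pages_per_second(20_000, 100)
    print(f"{rate:.2f} pages/s")  # 3.31 pages/s
    # Hypothetical: assume each page fetch takes about 2 seconds.
    print(workers_needed(rate, 2.0), "concurrent fetches")  # 7 concurrent fetches
    urls = read_url_list(["http://example.nl", "", "# skipped", "http://example.com"])
    print(run_all(urls, fake_scan, workers=4))
```

The worker count computed this way is a lower bound for an even flow of pages; a real run would use more workers to absorb slow or timing-out sites.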

Finally, this crawler needs to be completed within a couple of days.

Somebody else already asked for this a couple of months ago, but now I need it as well.

Skills required:

PHP

Project posted by:

tompoes Netherlands
(0 Reviews)


Public Clarification Board: 1 message




All Bids (6)

wildlily980, China, Gold Member
$150 in 7 days, over 2 years ago
5.0 | 5.9 | 39 Reviews | 87% Completion Rate
I'm interested in it. Check PMB for details.

numatido, Viet Nam
$150 in 2 days, over 2 years ago
5.0 | 2.8 | 2 Reviews | 72% Completion Rate
Hi, please check your PM. Thanks.

alphacoms, Russian Federation
$180 in 7 days, over 2 years ago
0.0 | 0.0 | 0 Reviews | 0% Completion Rate
I can do this in PHP. It will be a multi-threaded script, so to speak: PHP doesn't support multi-threading natively, but there are some tricks to implement it. I have similar experience.

nzpiknik, New Zealand (Aotearoa)
$200 in 7 days, over 2 years ago
Hello,

Thank you for your clear specification and requirements. I wish all jobs on getafreelance.com were as clear and concise as your post.

I suggest having a screen where you would enter:
  • (a) the phrases to search for
  • (b) the search depth
  • (c) the maximum number of pages to search per site
  • (d) the file path for the websites to process
  • (e) the file path for the output
  • (f) any other control information that may help with the performance of the tool, such as a "restart from last site processed" checkbox

The data entered above would be stored in the registry, so you would not have to re-enter it when you start the program again. You would press the 'crawl' button and away it would go.

I propose building you a stand-alone program in Microsoft VB.NET to do this work, not PHP as indicated in the job type. The reasons are performance and ease of use. You will get a much higher processing rate with VB.NET than with PHP. With PHP you have to spend time working with a web server, which adds another layer of complexity and extra things to do; a stand-alone VB.NET program you simply run from your PC with an internet connection.

I'm a seasoned programmer/developer with 30 years of experience building and supporting IT systems, and I live in Wellington, New Zealand. I am, however, very new to getafreelance.com; in fact, this is my first bid ever. I pride myself on producing high-quality software, and I'm sure you won't be disappointed with my work. If I win the bid, I will start immediately and have a first-cut program for you to look at within 3 days, then complete fine-tuning and adjustments as required.

Thanks again for considering my bid.

Kind regards,
Nik
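The "restart from last site processed" option mentioned in this bid can be sketched in Python as a progress file that records each finished site, so a restarted run skips those sites. Since sites are processed in parallel there is no single "last" site, so this sketch records every completed URL instead. The file format and the names `load_done`, `mark_done`, and `remaining` are assumptions; storing the settings in the Windows registry, as the bid proposes, is not covered.

```python
import threading
from pathlib import Path
from typing import List, Set

_lock = threading.Lock()  # several worker threads may finish sites at once


def load_done(progress_file: Path) -> Set[str]:
    """URLs finished in an earlier run (empty set on a fresh start)."""
    if not progress_file.exists():
        return set()
    return set(progress_file.read_text(encoding="utf-8").split())


def mark_done(progress_file: Path, url: str) -> None:
    """Append one finished URL; appending keeps progress if the program is killed."""
    with _lock, progress_file.open("a", encoding="utf-8") as f:
        f.write(url + "\n")


def remaining(urls: List[str], progress_file: Path, resume: bool) -> List[str]:
    """Sites still to process; with resume=False the progress file is cleared first."""
    if not resume:
        progress_file.unlink(missing_ok=True)
        return list(urls)
    done = load_done(progress_file)
    return [u for u in urls if u not in done]
```

A worker would call `mark_done` only after a site's results have been written, so a crash between the two steps causes that site to be scanned again rather than lost; the run itself would start from `remaining(urls, path, resume=checkbox_value)`.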

mrtuannm, Viet Nam
$230 in 3 days, over 2 years ago
Hi, please see some websites we've developed: http://yamaha-motor.com.vn/, http://megashares.vn/ ... At http://vngia.com/ we created a price search engine website with several crawler modules that gather information from across the internet, so I believe we can do your project well. You can email these webmasters to confirm that my name is Nguyen Minh Tuan. We look forward to your reply. Thank you.

svetlinb, Bulgaria
$150 in 2 days, over 2 years ago
Contact me to clarify details on the project