Web Page Downloader/Parser

$100-300 USD

Closed
Posted almost 19 years ago

Paid on delivery
First of all: this should be programmed in ANSI C that compiles with GCC, and it should be cross-platform.

We need a function that takes a web URL and downloads the page's HTML contents. (It should not download any pictures or other external files.) It should then come up with a title, description and keywords based on the meta tags. If there are no meta tags, the title, keywords and description should be worked out the way Google or Yahoo do, in that it will ignore common words like 'a', 'the', and many others. It should also drop words that have been repeated too many times (more than 7, I think).

It should also attempt to figure out the last time the page was modified. If it can't, it should compare it with an internal date in the database, and store it in the database only if newer. The URL, title, description and keywords should be saved in a database called "sites.dat" using a database function we have had developed for us.

Whenever it receives a 301 (or any other kind of redirect), it should follow the link and then update the URL that was passed in. If there is a 404 or any other error preventing the page from being downloaded, it should return all blank values. Any links that it finds should be stored using a database function that we are having developed, using the filename "links.dat".

This function should obey all ROBOT tags, as well as [login to view URL] files. When coding this, be aware that not all sites have perfect HTML, and some tags will be wrong or full of errors. Count on this function looking at badly formed HTML. In most cases it should act no differently from Googlebot, though when downloading a page it should identify itself as 'dCrawler'.
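For bidders, the keyword fallback described above (ignore common words, drop words repeated too many times) could be sketched in ANSI C roughly as follows. This is only an illustration under assumptions: the function name `extract_keywords`, the tiny stop-word list, and the size limits are hypothetical and not part of the posting, and the cutoff of 7 is the posting's own tentative figure. It works on plain text that has already been stripped of HTML tags.

```c
#include <ctype.h>
#include <string.h>

#define MAX_WORDS    256 /* assumed cap on distinct words per page */
#define MAX_WORD_LEN 32  /* longer words are truncated */
#define MAX_REPEATS  7   /* tentative cutoff from the posting */

struct word_count {
    char word[MAX_WORD_LEN];
    int count;
};

/* Illustrative stop-word list; a real one would be much longer. */
static const char *const STOP_WORDS[] = {
    "a", "an", "and", "the", "of", "to", "in", "is", "it", NULL
};

static int is_stop_word(const char *w)
{
    int i;
    for (i = 0; STOP_WORDS[i] != NULL; i++) {
        if (strcmp(w, STOP_WORDS[i]) == 0) return 1;
    }
    return 0;
}

/* Record one occurrence of w; new words are ignored once the table is full. */
static void add_word(struct word_count *table, int *n, const char *w)
{
    int i;
    for (i = 0; i < *n; i++) {
        if (strcmp(table[i].word, w) == 0) {
            table[i].count++;
            return;
        }
    }
    if (*n < MAX_WORDS) {
        strcpy(table[*n].word, w);
        table[*n].count = 1;
        (*n)++;
    }
}

/*
 * Write a comma-separated keyword list for `text` into `out`, in order of
 * first appearance. Stop words and words seen more than MAX_REPEATS times
 * are left out. Returns the number of keywords written.
 */
int extract_keywords(const char *text, char *out, size_t out_size)
{
    struct word_count table[MAX_WORDS];
    char word[MAX_WORD_LEN];
    const char *p;
    size_t used = 0;
    int len = 0, n = 0, written = 0, i;

    if (out_size == 0) return 0;
    out[0] = '\0';

    /* Split on anything that is not a letter or digit, lowercasing as we go. */
    for (p = text; ; p++) {
        unsigned char c = (unsigned char)*p;
        if (isalnum(c)) {
            if (len < MAX_WORD_LEN - 1) word[len++] = (char)tolower(c);
        } else {
            if (len > 0) {
                word[len] = '\0';
                if (!is_stop_word(word)) add_word(table, &n, word);
                len = 0;
            }
            if (c == '\0') break;
        }
    }

    /* Emit the surviving words, stopping if the output buffer is full. */
    for (i = 0; i < n; i++) {
        size_t wlen, need;
        if (table[i].count > MAX_REPEATS) continue;
        wlen = strlen(table[i].word);
        need = wlen + (written > 0 ? 2 : 0);
        if (used + need + 1 > out_size) break;
        if (written > 0) {
            strcpy(out + used, ", ");
            used += 2;
        }
        strcpy(out + used, table[i].word);
        used += wlen;
        written++;
    }
    return written;
}
```

Called with the text "The cat and the hat: a CAT in the hat!", this sketch yields the keyword string "cat, hat". Ranking by frequency, the real stop-word list, and the hand-off to the "sites.dat" database function are left out, since the posting does not specify them.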
Project ID: 15679

About the project

11 proposals
Remote project
Active 17 yrs ago

11 freelancers are bidding on average $251 USD for this job
I have similar code now - see PM
$250 USD in 5 days
1.2 (4 reviews)
6.0
Hi, crawling is our first choice. We have developed many crawlers in PHP/MySQL, and we are confident that we can also develop a crawler in C/C++ in a GNU/Linux environment. For a demo and discussion please see PMB. Regards ... ccpplinux
$150 USD in 15 days
5.0 (6 reviews)
3.6
Check pm for more details pls
$300 USD in 21 days
0.5 (1 review)
4.2
I have already worked on a similar project (downloading/smart parsing). I may have to tune my code, since it worked under Windows and in C++. Still, I put 10 days in order to have time to test the app completely and carefully. I also need further info about the database used. Best regards!
$250 USD in 10 days
0.0 (0 reviews)
0.0
we could do it.
$300 USD in 5 days
0.0 (0 reviews)
0.0
Dear Sir/Madam, If you are looking for top quality and quick turnaround then we will be delighted to work up the required downloader for you. We are an IT company specializing in web technologies and programming. Our abundant experience in this field always helps us find outstanding solutions, providing 100% satisfaction for our clients. The price stated is a placeholder. Please feel free to open PMB for a more detailed discussion (as we have several questions for you) and we'll provide you with an accurate calculation. Best Regards, Nidle Inc.
$300 USD in 15 days
0.0 (0 reviews)
4.6
Hi there, Niftysoft Solution is a leading IT services company providing solutions across the globe. Niftysoft Solution is staffed by a large team of professionals with a strong background in the IT field and extensive experience in various modules, able to meet rigid standards of quality. Each member of the team understands the importance of quality services and rigorously follows the QA processes laid down by the company. We will be happy to design the program for you using ANSI C that compiles in GCC and is cross-platform. We will develop the function that will take a web URL and download the page's HTML contents (it will not download any pictures or any other external files). We assure you that the project will meet all the specifications given by you and will meet your expectations. We look forward to working with you. We request you to open PMB for further discussions. Regards, Niftysoft Team.
$290 USD in 15 days
0.0 (0 reviews)
0.0
Dear sir, I will complete this program within 15 days to suit all your requirements. Thank you.
$125 USD in 15 days
0.0 (0 reviews)
3.8
We are a group of software professionals from India with expertise in ASP, ASPx, HTML, XML, Java, C, C++, VB, Oracle, SQL Server, PHP and MySQL. Our professionals range from 1 to 20 years of experience. We are sure to deliver your project with perfection and in minimum time. Cost and time estimates are preliminary; we will be able to deliver an exact quote after discussions.
$250 USD in 15 days
0.0 (0 reviews)
0.0
Dear Sir/Madam, We are a group of software engineers with expertise in web technology, Windows desktop application development, security and mobile technologies. Recently we developed a project in which we parse web pages to find keywords. Please feel free to contact us for more information or with any queries.
$250 USD in 15 days
0.0 (0 reviews)
0.0

About the client

Brantford, Canada
5.0
3
Member since Apr 6, 2005
