Crawling

$100-300 USD

In Progress
Posted about 20 years ago

Paid on delivery
Web crawling utility that crawls a specific site, parses the data according to a template, and then inserts the extracted data into a database.

Inputs (with examples given):

- URL Root: [login to view URL]
- Starting Integer: **10000**
- Ending Integer: **10100**
- Wait Interval: **2**
- Template name: **[login to view URL]**
- Destination database: **testdb**
- Destination stored procedure: **testrun**

The user is responsible for setting up a stored procedure that receives the data.

In the above example, the program retrieves 101 web pages, as specified by the range, starting with [login to view URL]. It reads the [login to view URL] file, which contains pattern information defining (1) field name (2) type of data (integer, date, or character) (3) max field length (4) whether null values are allowed (5) starting pattern to match (6) ending pattern.

The template might look something like this, although you can define your own "template language":

- PROFILETOID type:**integer** nulls:**no** start:**[login to view URL]** end:**">**
- ALIASTO type:**char** nulls:**no** start: end:**</a>**
- PROFILEFROMID type:**integer** nulls:**no** start:**[login to view URL]** end:**">**

and so on for the fields **ALIASFROM, DATEPOSTED, MSGNUMBER, MAXMESSAGE, MSGBODY**.

The program starts at the beginning of the file and searches for the string **[login to view URL]**, then takes what immediately follows, up to but not including **">**, and extracts it as the **PROFILETOID** field. For the **ALIASTO** field, since there is no start pattern, the program begins reading that field immediately after the previous match and ends at **</a>**. It continues this way to the end of the template file.

## Deliverables (More info that wouldn't fit above)

As the page is being read, the results are validated to make sure that every field is filled with the correct type of data. If nothing is found between the start and end patterns, the field is valid only if NULLs are allowed. If one of the fields is not valid, an error dialog box pops up and says which URL failed and why.
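The extraction and validation rules above can be illustrated with a minimal sketch. It is written in Python only for brevity (the posting requires VB, C#, or Java), and several details are assumptions rather than part of the posting: the `FieldSpec` and `ParseError` names, the `profile?to=` start pattern standing in for the hidden URL, the `%m/%d/%Y` date format, and the wording of every error except the two quoted in the posting.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class FieldSpec:
    """One template entry (hypothetical structure, not from the posting)."""
    name: str
    type: str        # "integer", "date", or "char"
    max_length: int
    nulls: bool      # are empty values allowed?
    start: str       # "" means: begin reading at the current position
    end: str


class ParseError(ValueError):
    """Carries the reason a field failed, e.g. '[ALIAS] start pattern not found.'"""


def validate(spec, raw, date_format):
    """Check one raw string against its spec and return the typed value."""
    if raw == "":
        if spec.nulls:
            return None
        raise ParseError(f"[{spec.name}] is empty but nulls are not allowed")
    if len(raw) > spec.max_length:
        raise ParseError(
            f"[{spec.name}] exceeded the maximum length of {spec.max_length}")
    if spec.type == "integer":
        try:
            return int(raw)
        except ValueError:
            raise ParseError(f"[{spec.name}] is not an integer") from None
    if spec.type == "date":
        try:
            return datetime.strptime(raw, date_format).date()
        except ValueError:
            raise ParseError(f"[{spec.name}] is not a valid date") from None
    return raw


def extract_fields(html, specs, date_format="%m/%d/%Y"):
    """Walk the page once, extracting each field in template order."""
    pos = 0
    values = {}
    for spec in specs:
        if spec.start:
            found = html.find(spec.start, pos)
            if found == -1:
                raise ParseError(f"[{spec.name}] start pattern not found.")
            pos = found + len(spec.start)
        end = html.find(spec.end, pos)
        if end == -1:
            raise ParseError(f"[{spec.name}] end pattern not found.")
        raw = html[pos:end]
        pos = end + len(spec.end)  # the next field starts right after this match
        values[spec.name] = validate(spec, raw, date_format)
    return values
```

For example, with a start pattern of `profile?to=` and an end pattern of `">`, the fragment `<a href="profile?to=123">Bob</a>` would give `PROFILETOID` the integer 123, and an `ALIASTO` spec with an empty start pattern would then read `Bob` up to `</a>`.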
Example error messages:

"Failed to parse #10002 because [ALIAS] exceeded the maximum length of 20"
"Failed to parse #10002 because [ALIAS] start pattern not found."

Even if there is an error, the program continues once the dialog box is dismissed with OK. The dialog should include an "Ignore errors" checkbox so that the program can continue without interruption.

After each page is read and validated, the program calls a stored procedure with all of the template fields, along with RAWDATA, a text field containing the HTML of the retrieved page, and an ERROR variable, which indicates whether the page contained a validation error.

The wait interval of "2" is the number of seconds to wait between page requests (so as not to overload the target site). If set to "0", there is no wait between requests.

Deliverables:

1) Complete and fully functional working program(s) in executable form, as well as complete source code of all work done. Must run on Visual Studio .NET in VB, C#, or Java. **Please configure your code to work with SQL Server using (localhost) as the server, "sa" as the userid, and "crawler" as the password.**

2) Sample database named testdb, containing the testrun stored procedure and 1 table with the following 10 fields: **ERROR, RAWDATA, PROFILETO, ALIASTO, PROFILEFROM, ALIASFROM, DATEPOSTED, MSGNUMBER, MAXMESSAGE, MSGBODY**. When I run the program without alteration, the above table should be filled in.

3) Installation instructions

## Platform

Windows XP, SQL Server 2000, [login to view URL]
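The per-page loop described in the posting (integer range, wait interval, ERROR flag, RAWDATA) might be sketched as follows. This is a Python illustration only, not the required VB, C#, or Java deliverable. The `fetch`, `parse`, and `store` callables are hypothetical stand-ins for the HTTP request, the template parser, and the stored procedure call. Appending the integer directly to the URL root is also an assumption, since the actual URL is hidden in the posting.

```python
import time
from typing import Iterator, Tuple


def page_urls(url_root: str, start: int, end: int) -> Iterator[Tuple[int, str]]:
    """Yield (number, url) for every integer in the inclusive range."""
    for n in range(start, end + 1):
        # Assumption: the page URL is the root with the integer appended.
        yield n, f"{url_root}{n}"


def crawl(url_root, start, end, wait_seconds, fetch, parse, store,
          sleep=time.sleep):
    """Fetch, parse, and store each page; return the error messages collected.

    fetch(url) -> html, parse(html) -> dict of fields (raises ValueError on a
    validation failure), store(fields, rawdata, error) -> None.
    """
    errors = []
    for i, (n, url) in enumerate(page_urls(url_root, start, end)):
        if i > 0 and wait_seconds > 0:
            sleep(wait_seconds)  # pause between requests, not before the first
        html = fetch(url)
        try:
            fields, error = parse(html), False
        except ValueError as exc:
            # Keep going after a failure; the page is still stored with ERROR set.
            fields, error = {}, True
            errors.append(f"Failed to parse #{n} because {exc}")
        store(fields, html, error)
    return errors
```

Passing `sleep` as a parameter keeps the loop testable without real delays. In this sketch the collected messages are returned rather than shown in a dialog, which corresponds to running with "Ignore errors" checked.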
Project ID: 3101217

About the project

2 proposals
Remote project
Active 20 yrs ago

Awarded to:
See private message.
$170 USD in 14 days
5.0 (3 reviews)
3.5
2 freelancers are bidding on average $213 USD for this job
See private message.
$255 USD in 14 days
5.0 (5 reviews)
5.9

About the client

United States
5.0
45
Member since Nov 4, 2003
