Closed

Need a fast script to parse HTML using wget or curl

This project received 17 bids from talented freelancers with an average bid price of $154 USD.

Employer working
Project Budget: N/A
Total Bids: 17
Project Description

Hello,
The script should:
1- Crawl the given web page.
2- Extract all URLs on the page that match a set of regular expressions. The URLs do not have to appear in an a href attribute or even start with http.
For example: extract all URLs with rar, zip, mp3, etc. extensions, and all mediafire, rapidshare, etc. URLs.
3- Be able to log in, or load cookies to log in, to specific sites such as forums in order to get the links.
4- Be as fast and as stable as possible :).
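
The matching in step 2 could be sketched roughly as below. This is only an illustration under assumptions: the function name extract_links, the variable names, and the short extension and host lists are placeholders, not part of the requirements. It splits the page into tokens on whitespace, quotes, and angle brackets, so a URL does not need an href or an http prefix to be found.

```shell
#!/bin/sh
# Sketch only: the lists below are illustrative and meant to be extended.
EXTS='rar|zip|mp3'
HOSTS='mediafire\.com|rapidshare\.com'

# Characters treated as token separators (tr interprets the \t, \r, \n escapes).
DELIMS=" \t\r\n\"'<>()"

# extract_links: read HTML on stdin, print matching URLs once each, in page order.
extract_links() {
    # Put every candidate token on its own line.
    tr -s "$DELIMS" '[\n*]' |
    # Keep tokens that end in a wanted extension (optionally followed by a
    # query string or fragment) or that point at a wanted host.
    grep -E -i "(\.(${EXTS})([?#]|\$))|((^|[/.@])(${HOSTS})([/:?]|\$))" |
    # Drop duplicates without sorting, so the original order is preserved.
    awk '!seen[$0]++'
}
```

It reads from standard input, so it can sit at the end of any fetch command, for example `wget -q -O - "$url" | extract_links`, where `$url` stands for the page to crawl.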

It can be a shell script, Perl, C, etc. The important part is that it is fast and does not use many resources. Advice about platforms or techniques is welcome.

Below is an example of how far I have gotten; it still needs many improvements:

wget -q -U "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:1.9.0.2) Gecko/20121223 Ubuntu/[url removed, login to view] (jaunty) Firefox/3.8" [url removed, login to view] -e robots=off -O - | tr "\t\r\n'" ' "' | grep -i -o '"\(ht\|f\)tps\?:[^"]\+\.\(gif\|apk\|rar\|mkv\)"' | sed -e 's/^.*"\([^"]\+\)".*$/\1/g' | sort -u
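
For the login requirement (step 3), one possible approach is a curl cookie jar. The sketch below rests on assumptions: the function names, the user agent string, and the form field names username and password are hypothetical, and real forums often use different field names or extra hidden fields.

```shell
#!/bin/sh
# Sketch only: URLs, field names and file names are placeholders.
UA='Mozilla/5.0 (X11; Linux x86_64)'

# login_and_save_cookies LOGIN_URL USER PASS JAR
# Posts a login form and stores the session cookies in the file JAR.
# The field names "username" and "password" are assumptions.
login_and_save_cookies() {
    curl -s -L -A "$UA" -c "$4" \
         --data-urlencode "username=$2" \
         --data-urlencode "password=$3" \
         -o /dev/null "$1"
}

# fetch_with_cookies JAR URL
# Fetches URL using the cookies in JAR (updating the jar with any new
# cookies) and writes the page to standard output.
fetch_with_cookies() {
    curl -s -L --compressed -A "$UA" -b "$1" -c "$1" "$2"
}
```

After one call to login_and_save_cookies, every later page can be fetched with fetch_with_cookies and piped into the same tr, grep and sed filters as above. A cookie file exported from a browser in Netscape format should also work as the jar, which skips the login step.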


Thanks in advance.
