Need a fast script to parse HTML using wget or curl
The script should:
1- Crawl the given web page.
2- Extract all URLs in the page that match a set of regular expressions (they do not have to start with "a href" or even "http").
For example: all URLs with rar, zip, mp3, etc. extensions, and all mediafire, rapidshare, etc. URLs.
3- Be able to log in, or load cookies, for sites that require a login (forums and the like) in order to get the links.
4- Be as fast and as stable as possible :).
It can be a shell script, Perl, C, etc. The important part is that it is fast and does not use many resources. Advice about platforms or techniques is welcome.
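To make requirement 2 concrete, here is a rough sketch of the kind of filter I have in mind. The function name extract_links and the EXTS and HOSTS lists are only illustrative, and it assumes a grep that supports -E and -o (GNU grep does). It reads HTML on stdin, so it does no crawling itself.

```shell
# Illustrative lists only: extend them with whatever extensions/hosts are needed.
EXTS='rar|zip|mp3'
HOSTS='mediafire\.com|rapidshare\.com'

# Hypothetical helper: read HTML on stdin, print each matching URL once.
extract_links() {
    # Flatten tabs and line breaks to spaces so a URL never spans two lines.
    tr '\t\r\n' '   ' |
    # First alternative: http/https/ftp URLs ending in a wanted extension.
    # Second alternative: URLs on a wanted host, with or without a scheme.
    grep -E -i -o "(https?|ftp)://[^\"' <>]+\.(${EXTS})|(https?://)?(www\.)?(${HOSTS})/[^\"' <>]+" |
    # C locale keeps the sort fast and byte-wise; -u drops duplicates.
    LC_ALL=C sort -u
}
```

It would be used as `wget -q -O - "$URL" | extract_links`, where $URL stands for the page to scan. Note that it matches on the extension text only, so a link ending in ".zipper" would be reported up to ".zip"; a real solution should handle that.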
Below is an example of how far I have got on my own; it still needs many improvements:
wget -q -U "Mozilla/5.0 (X11; U; Linux i686; pl-PL; rv:18.104.22.168) Gecko/20121223 Ubuntu/[url removed, login to view] (jaunty) Firefox/3.8" [url removed, login to view] -e robots=off -O - | tr "\t\r\n'" '   "' | grep -i -o '"\(ht\|f\)tps\?:[^"]\+\.\(gif\|apk\|rar\|mkv\)"' | sed -e 's/^.*"\([^"]\+\)".*$/\1/g' | sort -u
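The command above does not cover requirement 3 (login and cookies). A minimal sketch of that part with curl could look like the following. LOGIN_URL, the form field names username and password, and the function names login and fetch are all placeholders I made up; a real forum will use its own login URL and field names, and may also need hidden form tokens.

```shell
# Placeholders (assumptions): adjust to the target site's actual login form.
LOGIN_URL='https://forum.example.com/login.php'
COOKIE_JAR="${COOKIE_JAR:-./cookies.txt}"
UA='Mozilla/5.0 (X11; Linux x86_64)'

# Log in once and store the session cookies in $COOKIE_JAR.
# $1 = user name, $2 = password
login() {
    curl -s -L -A "$UA" -c "$COOKIE_JAR" \
         --data-urlencode "username=$1" \
         --data-urlencode "password=$2" \
         -o /dev/null "$LOGIN_URL"
}

# Fetch a page using the stored cookies and write the HTML to stdout.
# $1 = URL of the page
fetch() {
    curl -s -L -A "$UA" -b "$COOKIE_JAR" -c "$COOKIE_JAR" "$1"
}
```

The output of fetch can be piped straight into the same tr/grep filter. With wget, the comparable options are --save-cookies, --load-cookies, --keep-session-cookies and --post-data.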
Thanks in advance.