Must be written in Python and installed on server. NDA may be required.
I need a program that will scrape the pages of a 3-5 websites.
The scraper will look for specific information that is sometimes obvious and sometimes not too obvious.
Information is in standard html format, i think non-dynamic. But not sure.
Show me examples of websites you have successfully scraped and what type of information was outputted and cleaned.
The output will be entered into a database and then exported to a csv file on demand or every week.
A simple text log will be maintained to show results of output, errors, etc.
A configuration file will also be involved, to configure the speed of the program at every step of the process, and other configurations.
A time stamp and other unique identifiers must be appended to each record.
It must be able to run on windows and linux box as an executable. Has to be able to run in the OS scheduler.
UI not needed. But on screen it should print what's going on for each step.
This must be able to sit on a dedicated server and on a shared hosting server.
Must be willing to be a bit flexible on scope of project to account for unforseen issues or changes.
* * *This broadcast message was sent to all bidders on Monday Feb 7, 2011 9:02:11 PM:
Update: -This must be written in python. No exceptions. -You will be scraping from [url removed, login to view] and 2-3 other sites for modeling. Please submit your bid for consideration. Thank you.