Please bid only if you have read what I'm writing and really think you can do it; don't waste your time and mine by bidding without reading. I'm OK with giving a chance to someone who is just starting out. Please be as professional with me as you want me to be with you, because I am going to be professional with you. Please don't bid the maximum value and only read afterwards, because I will also look at the final price when awarding the project. OK, with that clear, let's start.
If your bid is chosen, then before work starts, so that you clearly know what is expected, we will have everything specified in a requirements document that both of us have to accept. I will write a draft of the requirements and you can adjust it, but both of us have to agree before the project is awarded.
The main objective is clear: crawl part of a website built mainly in Flash, scrape and collect specific data, and store it in an XML file.
I need that to be done recursively, so you will need to create a bot. The objective is to read the website and constantly keep updating the XML. Entries that are older than 90 minutes (1.5 hours) should be deleted.
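To illustrate the 90-minute rule, here is a minimal sketch in Python. The real XML and XSD will be provided later, so the element name `entry` and the `timestamp` attribute (ISO 8601 with a UTC offset) are only assumptions made for this example.

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

# Entries older than this are deleted (90 minutes, as stated above).
MAX_AGE = timedelta(minutes=90)


def prune_old_entries(root, now, max_age=MAX_AGE):
    """Remove <entry> children of root that are older than max_age.

    Assumption: each <entry> carries a 'timestamp' attribute such as
    '2024-01-01T10:00:00+00:00'. The real schema may differ.
    Returns the number of entries removed.
    """
    removed = 0
    # Iterate over a copy of the list, because we remove while looping.
    for entry in list(root.findall("entry")):
        scraped_at = datetime.fromisoformat(entry.get("timestamp"))
        if now - scraped_at > max_age:
            root.remove(entry)
            removed += 1
    return removed
```

The bot could call `prune_old_entries(root, datetime.now(timezone.utc))` after each crawl, just before writing the XML file back to disk.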
Just to give a general idea, another program (not part of this project) will read the XML and use the data for some purpose.
I will create an XML and XSD for you to use. I'm open to listening to your ideas: if you propose some other structure that can store all the data in an easy-to-index way and I accept it, that can also be changed.
As data on the website is constantly changing, the whole process of crawling, scraping, and storing the data is expected to take just a few seconds; otherwise the data will not be up to date.
The time between each access of the bot to the website should be randomized between two values.
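As a rough sketch of the randomized wait, assuming the two values are given in seconds and that any value between them is equally acceptable (both are my assumptions for the example). The names `next_delay`, `run_bot`, and `crawl_once` are hypothetical.

```python
import random
import time


def next_delay(min_seconds, max_seconds):
    """Pick a random wait, in seconds, between the two configured values."""
    if min_seconds < 0 or max_seconds < min_seconds:
        raise ValueError("need 0 <= min_seconds <= max_seconds")
    return random.uniform(min_seconds, max_seconds)


def run_bot(crawl_once, min_seconds, max_seconds, max_cycles=None):
    """Call crawl_once() repeatedly, sleeping a random time after each access.

    crawl_once is a placeholder for the crawl/scrape/store step.
    max_cycles=None means run forever; a number is useful for testing.
    """
    cycles = 0
    while max_cycles is None or cycles < max_cycles:
        crawl_once()
        cycles += 1
        time.sleep(next_delay(min_seconds, max_seconds))
```

For example, `run_bot(crawl_once, 20, 40)` would access the website with a different pause of 20 to 40 seconds each time.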
The crawl starts at one fixed webpage built mainly in Flash. Each access is supposed to crawl just the dynamic links located in a column of that Flash webpage. All crawled webpages are also built mainly in Flash, and all pages have a very similar structure.
Each access is expected to crawl between 10 and at most 250 pages. Each page can have from zero to 500 records to store in the XML.
One access (for instance right now, when traffic is not too high) should crawl 22 pages and scrape around 100 records per page.
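To give a feeling for the volumes, the numbers above can be worked out as follows. The page and record counts come from this description; the 5-second budget is only an assumed stand-in for "a few seconds" and is not a requirement.

```python
# Hypothetical stand-in for "a few seconds"; not a fixed requirement.
TIME_BUDGET_SECONDS = 5.0


def records_per_access(pages, records_per_page):
    """Total records scraped in one access of the bot."""
    return pages * records_per_page


def seconds_per_page(time_budget_seconds, pages):
    """Average time available per page if pages were fetched one by one."""
    if pages <= 0:
        raise ValueError("pages must be positive")
    return time_budget_seconds / pages


typical = records_per_access(22, 100)    # 2200 records
worst = records_per_access(250, 500)     # 125000 records
```

Under that assumed budget, `seconds_per_page(TIME_BUDGET_SECONDS, 250)` gives about 0.02 seconds per page in the worst case, so fetching pages one at a time is unlikely to be fast enough.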
More specific information, such as the webpage and the data required, will be given to the bidders.