I am reposting this because a couple of elements were unclear in my previous posting; this version also answers some frequently asked questions.
The ask is to deliver a script that scrapes data from the following website:
To access the page click on the purple "Guest Access" button and then on the next page click on the purple "Guest Access 90 Day Delay" button. I will call the screen you are on now the “Main Screen” in the description below.
I prefer a script in VBA (for use alongside Excel), but I am open to other suggestions as long as Excel is kept in mind as the tool where I will ultimately analyze the data.
The script that needs to be delivered will have to scrape the following:
- on the main (LHS) panel you can see the investment ideas in the box called "Latest ideas..."
- on the LHS panel I want to scrape all the ideas up to date [X] – where [X] needs to be an input to the script
- the script needs to download the following fields of the idea from the Main Screen:
1) Date & Time posted
2) Company name
3) Company ticker/symbol
4) Price of stock at the time the idea was posted
5) Market capitalization of the stock at the time the idea was posted
7) Number of users that the rating was based upon
- the script then needs to click on the idea link and download the following items from the individual idea page:
8) User (this is in the top header where at the end it says "by..")
9) "Description..." text box (stored as a string)
10) "Catalyst..." text box (stored as a string)
11) "Messages..." text box - if there are any messages then the time / date of the last message should be scraped
- moving on to the RHS panel you will see three boxes: "Functions...", "Reports..." and "Topics..."
- of these only the Reports need to be downloaded and clicked through
- the reports are “Most Active”, “Highest Overall Rated”, “Highest Quality Rated” and “Highest Performance Rated”
- each of these reports needs to be fully downloaded (i.e. click through as far as possible) and the fields that are required are 1) Company name, 2) Company ticker and 3) HTTP link to idea
- the last element of scraping that the script needs to perform is when you are on the Main Screen you see a search box
- the script needs to search for an empty string and click “search”
- the results table then needs to be downloaded up to date [Y], where [Y] needs to be an input to the script
- the fields that need to be downloaded are: Date, Symbol, Company Name, Market Cap, Return, Member and Description.
- the script also needs to accept a username and password as inputs, given that an actual account might ultimately be used instead of guest access
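The inputs described above (dates [X] and [Y], plus an optional username and password) and the "scrape until the cutoff date is reached" rule could be sketched roughly as follows. This is only an illustration in Python, not a finished scraper: the `Idea` record, the flag names and the helper `take_until_cutoff` are hypothetical, and it assumes the listing is shown newest first and that each row has already been parsed from the page.

```python
import argparse
from dataclasses import dataclass
from datetime import date, datetime
from itertools import takewhile
from typing import Iterable, List, Optional


@dataclass
class Idea:
    """One row from the "Latest ideas..." box (hypothetical field names)."""
    posted: datetime      # 1) date & time posted
    company: str          # 2) company name
    ticker: str           # 3) ticker/symbol
    price: str            # 4) price at posting, kept as displayed
    market_cap: str       # 5) market cap at posting, kept as displayed
    rating_users: int     # 7) number of users the rating is based on
    link: str             # link to the individual idea page


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    """Read the script inputs; dates are given as YYYY-MM-DD (an assumption)."""
    parser = argparse.ArgumentParser(description="Scraper inputs (sketch)")
    parser.add_argument("--ideas-cutoff", required=True, type=date.fromisoformat,
                        help="date [X] for the latest ideas panel")
    parser.add_argument("--search-cutoff", required=True, type=date.fromisoformat,
                        help="date [Y] for the empty-string search results")
    parser.add_argument("--username", default=None,
                        help="optional; guest access is assumed when omitted")
    parser.add_argument("--password", default=None)
    return parser.parse_args(argv)


def take_until_cutoff(rows: Iterable[Idea], cutoff: date) -> List[Idea]:
    """Keep rows from the top of a newest-first listing until one is older than cutoff."""
    return list(takewhile(lambda row: row.posted.date() >= cutoff, rows))
```

Stopping at the first row older than the cutoff, rather than filtering the whole listing, means the script does not have to page through the full history on every run. The same helper could be reused for the search results with date [Y].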
Additional Project Description:
12/29/2012 at 1:13 SAST
Attached is an example Excel sheet showing how the output should look, based on the website as it looks now.
Each run of the script should produce three tabs: one for Investment Ideas, one for the reports and one for the "zero string search".
I have also attached PDF snapshots of the website that correspond to the data that needs to be scraped.
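As a rough illustration of the three-tab output, here is a minimal Python sketch that writes one CSV file per tab for each run. Writing a real .xlsx workbook would need a third-party library, so CSV files (which Excel opens directly) stand in for the tabs here. The column names are taken from the fields listed in this posting, except for the "Report" column, which is an assumption to record which of the four reports a row came from; the attached example sheet remains the reference for the actual layout. `write_tabs` and `run_label` are hypothetical names.

```python
import csv
from pathlib import Path

# Tab name -> header row. Column names follow the fields in the posting;
# "Report" is an assumed extra column naming the source report.
TABS = {
    "Investment Ideas": ["Date & Time", "Company", "Ticker", "Price", "Market Cap",
                         "Rating Users", "User", "Last Message", "Link"],
    "Reports": ["Report", "Company", "Ticker", "Link"],
    "Zero String Search": ["Date", "Symbol", "Company Name", "Market Cap",
                           "Return", "Member", "Description"],
}


def write_tabs(out_dir, run_label, rows_by_tab):
    """Write one CSV per tab, prefixed with run_label, and return the paths."""
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for tab, header in TABS.items():
        path = out_dir / f"{run_label}_{tab.replace(' ', '_')}.csv"
        with path.open("w", newline="", encoding="utf-8") as handle:
            writer = csv.writer(handle)
            writer.writerow(header)
            # A tab with no scraped rows still gets its header row.
            writer.writerows(rows_by_tab.get(tab, []))
        written.append(path)
    return written
```

Prefixing each file with a run label keeps the output of separate runs apart, which matches the "three tabs for each time the script is run" requirement.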
12/29/2012 at 21:21 SAST
Given that the description of an investment idea might be more than 32,000 characters, items no. 9) "Description..." and no. 10) "Catalyst..." of the extraction should be replaced in the Excel sheet with just the website link to the idea.
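The replacement of the long text fields by the idea link could look roughly like this in Python. It is a sketch under assumptions: each scraped idea is held as a dictionary, and the key names `description`, `catalyst` and `link` are hypothetical. The limit used is Excel's maximum of 32,767 characters per cell.

```python
EXCEL_CELL_LIMIT = 32_767  # maximum number of characters Excel stores in one cell
LONG_TEXT_FIELDS = ("description", "catalyst")  # hypothetical key names


def to_excel_row(idea: dict) -> dict:
    """Drop the long free-text fields and keep the idea link in their place."""
    if "link" not in idea:
        raise KeyError("idea has no 'link' to stand in for the dropped text")
    row = {key: value for key, value in idea.items() if key not in LONG_TEXT_FIELDS}
    # Guard against any other text field that would not fit in a cell.
    too_long = [key for key, value in row.items()
                if isinstance(value, str) and len(value) > EXCEL_CELL_LIMIT]
    if too_long:
        raise ValueError(f"fields exceed the Excel cell limit: {too_long}")
    return row
```

For example, an idea with a 40,000-character description comes back with only its short fields and the link, so every remaining value fits in a single cell.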