I need a scraper built to collect financial information from two different websites. This information should be exported into a single XML file per company stock symbol.
The scraper will read a list of company stock symbols (e.g. Google = GOOG) from a txt file that sits in the same folder as the scraper’s executable file. The contents of this list will change over time; an example has been attached. These symbols will be used to build the page links.
The first website that I want data scraped from is nasdaq.com. Specifically, I want data scraped from the following page: http://www.nasdaq.com/aspx/infoquotes.aspx?symbol=GOOG
The stock symbol at the end of this URL (GOOG) changes for each company and is taken from the company stock symbol text file I have provided.
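As a rough illustration of the link building described above, here is a minimal Python sketch. It assumes the symbol file holds one symbol per line (the attached example may differ), and the function names `read_symbols` and `nasdaq_url` are made up for this example.

```python
from pathlib import Path
from urllib.parse import quote

# URL pattern taken from the brief; only the symbol at the end changes.
NASDAQ_URL = "http://www.nasdaq.com/aspx/infoquotes.aspx?symbol={symbol}"


def read_symbols(path):
    """Return the symbols found in a text file.

    ASSUMPTION: one symbol per line. Blank lines are skipped and
    symbols are upper-cased so "goog" and "GOOG" give the same link.
    """
    lines = Path(path).read_text(encoding="utf-8").splitlines()
    return [line.strip().upper() for line in lines if line.strip()]


def nasdaq_url(symbol):
    """Build the nasdaq.com page URL for a single stock symbol."""
    # quote() escapes any character that is not safe inside a URL.
    return NASDAQ_URL.format(symbol=quote(symbol))
```

Calling `nasdaq_url(s)` for each `s` in `read_symbols("symbols.txt")` would give one link per company; the file name `symbols.txt` is only a placeholder. The same approach should work for the other website by swapping the URL pattern.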
The list of data I need from this page is:
• Last Sale
• Share Volume
• 52 Week High
• Earnings Per Share (EPS)
• Previous Close
• 52 Week Low
• P/E Ratio
• Date of Open Price
• Date of Close Price
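One detail that may matter for the XML export: several of the labels above cannot be used as XML element names as they stand, because an element name cannot contain spaces or "/" and cannot start with a digit (e.g. "52 Week High", "P/E Ratio"). The sketch below shows one possible way to convert a label into a valid name. The naming scheme and the function name `xml_tag` are assumptions for illustration, not a requirement.

```python
import re


def xml_tag(label):
    """Turn a field label into a valid XML element name.

    ASSUMPTION: this underscore-based naming scheme is only one
    possible convention; any consistent scheme would do.
    """
    # Drop parenthesised abbreviations, e.g. "Earnings Per Share (EPS)".
    label = re.sub(r"\(.*?\)", "", label)
    # Replace every run of non-alphanumeric characters with one underscore.
    tag = re.sub(r"[^A-Za-z0-9]+", "_", label).strip("_")
    # XML element names may not start with a digit, so prefix those.
    if tag[:1].isdigit():
        tag = "_" + tag
    return tag
```

With this scheme "52 Week High" becomes `_52_Week_High` and "P/E Ratio" becomes `P_E_Ratio`. An alternative would be a fixed element name such as `field` with the original label stored in an attribute.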
The second set of data that I would like scraped is from the Wall Street Journal. Again, this uses a dynamic link that will be updated using the company stock symbol text file. The link is: http://quotes.wsj.com/GOOG/company-people
The data I want scraped from this website is:
• Stock code (e.g. GOOG)
• Listed market (at the top of the page: e.g. (U.S.: NASDAQ))
• Sales or Revenue
• 1Y Sales Change
• Fiscal Year Ends
• Board of Directors (specifically: Name, Age, Title, Current Board Memberships)
• All Company Executives (specifically: Name, Title)
• Average Growth Rates, Past Five Years (specifically: Ending date/year, Revenue, Net Income, Earnings Per Share, Capital Spending, Gross Margin, Cash Flow)
This scraper must be scheduled to run daily. It will export one XML file per company stock symbol, per day. The output file should be named datesymbol.filetype (e.g. 01012013GOOG.xml).
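To make the expected output more concrete, here is a minimal Python sketch of the file naming and XML export using the standard library. It assumes the date in the example name is day, month, year (01012013 reads the same either way, so this needs confirming). The element names `company`, `board_of_directors` and `director`, and both function names, are placeholders rather than a required layout.

```python
import datetime
import xml.etree.ElementTree as ET


def output_filename(symbol, day):
    """Build the datesymbol.xml file name, e.g. 01012013GOOG.xml."""
    # ASSUMPTION: DDMMYYYY order; use '%m%d%Y' if month comes first.
    return f"{day.strftime('%d%m%Y')}{symbol}.xml"


def build_xml(symbol, fields, directors):
    """Build one XML tree for one company.

    fields:    dict of element name -> text value (single-value items)
    directors: list of dicts, one per board member (repeated items)
    """
    root = ET.Element("company", symbol=symbol)
    for tag, value in fields.items():
        ET.SubElement(root, tag).text = value
    # Repeated records get their own parent element.
    board = ET.SubElement(root, "board_of_directors")
    for person in directors:
        node = ET.SubElement(board, "director")
        for tag, value in person.items():
            ET.SubElement(node, tag).text = value
    return ET.ElementTree(root)
```

The tree returned by `build_xml` could then be saved with `tree.write(output_filename(symbol, datetime.date.today()), encoding="utf-8", xml_declaration=True)`. Company executives and the growth-rate figures could be added as further repeated groups in the same way as the board members.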
Please contact me if you have any questions.