Web scraping to Excel: Scrape data from [url removed, login to view] (DEF 14A filings)

Closed

I would like to scrape data from the [url removed, login to view] website. I am interested in scraping the DEF 14A filings. I want to scrape data from at least 5000 reports, preferably more.

I want to extract just two fields from each report: the name of the company and the percentage of the company owned by the board members.

I would like this information to be scraped and sent to me in Excel format.
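As a rough illustration of the delivery format, the two fields could be written to a CSV file, which Excel opens directly. This is only a sketch: the function name `write_results`, the column headings, and the choice of CSV over a native .xlsx workbook are assumptions, not part of the request.

```python
import csv


def write_results(rows, path):
    """Write (company, percentage) pairs to a CSV file that Excel can open."""
    # newline="" stops the csv module from emitting blank rows on Windows;
    # utf-8-sig adds a byte-order mark so Excel detects the encoding.
    with open(path, "w", newline="", encoding="utf-8-sig") as handle:
        writer = csv.writer(handle)
        writer.writerow(["Company", "Board ownership (%)"])  # assumed headings
        writer.writerows(rows)
```

Calling `write_results([("Microsoft", 1.5)], "ownership.csv")` with a placeholder percentage would produce a two-row sheet. A native Excel workbook would need a third-party library instead.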

Scraping this information will be fairly challenging because the HTML pages are unstructured. The target text does not appear at the same predictable location on each page.
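Because the layout cannot be relied on, one possible first step is to discard the markup and search the plain text instead. The sketch below uses only Python's standard `html.parser`. The names `TextExtractor` and `html_to_text` are hypothetical, and real filings may need more careful handling than this.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text, skipping the contents of <script> and <style>."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self._chunks.append(data)

    def text(self):
        # Collapse every run of whitespace (including non-breaking spaces)
        # so that a table row reads as one line of space-separated text.
        return " ".join(" ".join(self._chunks).split())


def html_to_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()
```

Flattening the page this way loses the table structure, but it turns each ownership row into a single run of text that a pattern search can scan.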

The only way to locate the relevant text is to make use of some kind of advanced Boolean proximity search.

The target text is normally preceded by a number of recognizable terms.

The target text is normally followed by a percentage symbol.

Here is an example of a DEF 14A filing.

[url removed, login to view] The relevant table appears on page 11.

The table lists the names of the board members and the percentage of the company that they personally own. Collectively the board members own [url removed, login to view] percent of the company. In this example I would be looking to extract the word "Microsoft" and the number [url removed, login to view] and place this in Excel format. In this example I would be looking to extract data from the 2013 Microsoft filing and all previous Microsoft filings on record.

The big problem is that not all the companies use the same language. Here is a list of the different phrases that various companies use. In each case, the percentage figure at the very end is what I would be aiming to extract:

"All directors and current executive officers as a group (12 persons) 3,991,056 6,348,957 10,340,013 2.6%"

"Executive officers and directors as a group (13 persons)(19) 1,490,847 6.7%"

"All directors and executive officers as a group (18 persons) 661,671 1,440,299 269,802,371,776 4.3%"

"All Company directors and executive officers as a group (19 persons) 433,960 596,312 1,030,272 1.5%"

"All nominees, continuing directors and executive officers as a group (20 persons) 5,944,103 16,824,264 139,82 8,03,926,234 (4) 23%"

"All directors, director nominees and executive officers as a group (12 persons) 13,412,40 17.0%"

"All current executive officers and directors as a group (10 persons) (7)........ 19,059,809 1,275,405 52.1%"

"All directors and executive officers as of November 13, 2012 as a group (13 persons) 17,011,477 624,969 17,636,446 54.8%"

A program could be built that recognizes the term "as a group (?? persons)". A wildcard search would have to be used because the number of persons varies, though it is always a two-digit figure. The distance between the term and the relevant percentage also varies, but it is normally less than 30 characters.

The relevant percentage figure is normally preceded by the term "as a group (?? persons)" and is normally followed by a % symbol.
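The search described here could be sketched with two regular expressions: one for the "as a group (?? persons)" marker and one for the first percentage that follows it. The function name `extract_group_percentage` and the 120-character default window are assumptions. The window is wider than the 30 characters mentioned above because some of the sample rows place the percentage further from the marker than that.

```python
import re

# "as a group (NN persons)": one or two digits, with flexible spacing.
GROUP_RE = re.compile(r"as\s+a\s+group\s*\(\s*\d{1,2}\s+persons?\s*\)", re.IGNORECASE)
# A percentage such as "2.6%", "23%" or "54.8 %".
PERCENT_RE = re.compile(r"(\d{1,3}(?:\.\d+)?)\s*%")


def extract_group_percentage(text, window=120):
    """Return the first percentage within `window` characters after the marker, or None."""
    marker = GROUP_RE.search(text)
    if marker is None:
        return None
    tail = text[marker.end(): marker.end() + window]
    pct = PERCENT_RE.search(tail)
    return float(pct.group(1)) if pct else None


sample = ("All directors and current executive officers as a group "
          "(12 persons) 3,991,056 6,348,957 10,340,013 2.6%")
print(extract_group_percentage(sample))  # prints 2.6
```

Share counts such as "3,991,056" are skipped because the pattern only accepts digits that are immediately followed by a % sign. A page with no marker, or with no percentage inside the window, returns `None`, so it can be set aside for manual review.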

The program would not be 100% accurate, but that does not matter in my case.

Skills: Web Scraping


Project ID: #4250082

Awarded to:

martin421

Hi, I have looked at your requirements, retrieved the data and applied the described algorithm. Please see the PM with a sample and more details. Thanks.

£250 GBP in 1 day
(0 Reviews)
0.0

31 freelancers are bidding on average £295 for this job

SigmaVisual

I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.

£250 GBP in 5 days
(24 Reviews)
6.1
srinichal

I am an expert in scraping and can deliver the project.

£320 GBP in 5 days
(24 Reviews)
6.1
mantislin

Hi sir, please check PM, thx Kimi.

£375 GBP in 6 days
(60 Reviews)
6.0
fareastern

Hi Ned, please check my private message.

£325 GBP in 10 days
(28 Reviews)
5.8
phpXpertbd

I have worked on many similar projects and have extensive experience in data mining. I can finish this task in a short time, with the best quality.

£250 GBP in 5 days
(16 Reviews)
5.6
mostacholoco

Hi, I'm ready to start right now. Please, check your PMB. Thanks in advance. ##### 5-star freelancer + 100% completion rate = 100% client satisfaction #####

£275 GBP in 2 days
(11 Reviews)
4.8
proauthor

Hi, ready to start your work. Eagerly awaiting your positive reply. Please check your inbox for further details. Thanks, Shaik.

£250 GBP in 3 days
(20 Reviews)
4.8
webscrapinggurus

Hi I have over 8 years of experience in writing web scraping code, I also specialize exclusively in web scraping which allows me to create higher quality scripts than most freelancers. I would Imagine that looking f More

£250 GBP in 7 days
(9 Reviews)
4.7
NiceDeveloper

Hi, My offer is a desktop application. Please refer to my message in your inbox for samples and demo. Thanks

£250 GBP in 7 days
(12 Reviews)
4.6
vbexcel

Web scraping pro, I have experience with SEC.gov site.

£250 GBP in 3 days
(6 Reviews)
4.3
jitendraparmar07

Automation expert here. I can easily write such a bot/scraper. Please check your PMB.

£400 GBP in 5 days
(11 Reviews)
4.4
zephercode

Hi, please find my solution attached to your inbox.

£250 GBP in 4 days
(3 Reviews)
4.2
sonarkaushik

Sir, I can do the project. Refer PMB. Looking for further discussions in this matter. with thanks and regards

£250 GBP in 9 days
(11 Reviews)
4.2
chirgeo

Hi, I have experience with extracting data from sources with an irregular, non-fixed structure. Check PM, I have some questions.

£300 GBP in 4 days
(2 Reviews)
3.5
msadoch

Hello, I'm an expert in web scraping with over 8 years of experience. I'm new at freelancer.com, but I'm not newbie in this kind of work. Please take a look at my private message to check my full proposal.

£350 GBP in 2 days
(1 Review)
3.4
mhmhz

Kindly check my PMB

£500 GBP in 3 days
(3 Reviews)
3.3
dreamci

Hello, bot expert ready! Please check PM.

£500 GBP in 2 days
(4 Reviews)
2.5
fhasanbd

Hi, I am ready to start. Please check PMB for details.

£250 GBP in 5 days
(5 Reviews)
2.7
anyaservices

Hello, I have read your bid request and am interested in doing the project. I have done quite a few crawling projects recently, but perhaps not all through freelancer. I am confident I can do your project, so here More

£400 GBP in 14 days
(1 Review)
2.4
NEEMISH

sir please see pm and reply.

£250 GBP in 10 days
(5 Reviews)
2.3