Find URLs from websites

Awarded Posted Dec 23, 2012 Paid on delivery

Populate an Excel sheet with the URLs of staff pages from a list of University websites.
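As a rough illustration of the deliverable, the sketch below writes one row per university to a CSV file, which Excel can open directly. The column headings, the example row, and the function name write_url_sheet are assumptions made for illustration; the posting does not specify a sheet layout, and Python is used here only as an example language.

```python
import csv
import io

def write_url_sheet(rows, stream):
    """Write (university, staff page URL) pairs as CSV rows to an open text stream."""
    writer = csv.writer(stream)
    # Hypothetical column headings; the posting does not define them.
    writer.writerow(["University", "Staff page URL"])
    writer.writerows(rows)

# Invented example data.
rows = [("Example University", "http://www.example.edu/physics/staff/")]

if __name__ == "__main__":
    buffer = io.StringIO()
    write_url_sheet(rows, buffer)
    print(buffer.getvalue())
```

To produce a file instead of printing, the same function could be given a stream from open("staff_urls.csv", "w", newline="", encoding="utf-8-sig"); the utf-8-sig encoding generally helps Excel detect non-ASCII characters.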

One tool that can be used to identify the XPath to various elements in a page is the XPathChecker plugin in Firefox.

The first step in creating a template is to identify the start page for each institute/organization. This start URL is added to the StartURL field in the institutes table. In most cases the list of staff members' names is either a table or a list. The XPath that identifies this table or list is added to the TableXPath field in the corresponding record, and the XPath that identifies each staff member's profile page link is added to the URLXPath field. Since most profile pages are linked using a relative URL, the link found by URLXPath must be combined with a URL prefix (the institute's web server address and path) to form a full URL. This prefix is added to the URLPrefix field.
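The way the four template fields work together can be sketched as follows. This is a minimal illustration, not the project's actual script: the field names come from the description above, but the field values, the sample page, and the function name profile_urls are invented. It uses Python's standard-library ElementTree, which supports only a subset of XPath and requires well-formed markup; real university pages would most likely need a tolerant HTML parser.

```python
from urllib.parse import urljoin
import xml.etree.ElementTree as ET

# Hypothetical template record: field names follow the posting, values are invented.
TEMPLATE = {
    "StartURL": "http://www.example.edu/physics/staff/index.html",
    "TableXPath": ".//table[@id='staff']",
    "URLXPath": ".//td/a",
    "URLPrefix": "http://www.example.edu/physics/staff/",
}

# Stand-in for the page at StartURL (well-formed so ElementTree can parse it).
SAMPLE_PAGE = """
<html><body>
  <table id="staff">
    <tr><td><a href="profiles/jsmith.html">J. Smith</a></td></tr>
    <tr><td><a href="profiles/akhan.html">A. Khan</a></td></tr>
  </table>
  <p><a href="contact.html">Contact us</a></p>
</body></html>
"""

def profile_urls(page_source, template):
    """Return full profile URLs found inside the staff table of a page."""
    root = ET.fromstring(page_source)
    # TableXPath narrows the search to the staff table or list.
    table = root.find(template["TableXPath"])
    if table is None:
        return []
    urls = []
    # URLXPath is evaluated relative to the table, so unrelated links are skipped.
    for link in table.findall(template["URLXPath"]):
        href = link.get("href")
        if href:
            # URLPrefix turns the relative link into a full URL.
            urls.append(urljoin(template["URLPrefix"], href))
    return urls

if __name__ == "__main__":
    for url in profile_urls(SAMPLE_PAGE, TEMPLATE):
        print(url)
```

Using urljoin rather than plain string concatenation means a link that is already absolute is left unchanged, which may matter for sites that mix relative and absolute profile links.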

Once the StartURL, TableXPath, URLXPath and URLPrefix fields are populated, the script should be able to read the individual profile pages one by one. This can be verified by running the script and checking the output of the script on the screen to see whether the URLs are actually being retrieved.
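One possible way to make that on-screen check easier to read is to print a status line per profile URL. The sketch below assumes the profile URLs have already been built; the function names fetch_status and report are hypothetical, and treating only HTTP 200 as success is an assumption rather than a stated requirement.

```python
from urllib.error import URLError
from urllib.request import urlopen

def fetch_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails."""
    try:
        with urlopen(url, timeout=timeout) as response:
            return response.status
    except (URLError, ValueError, OSError):
        # Covers HTTP errors, malformed URLs, and network failures.
        return None

def report(urls, fetch=fetch_status):
    """Print one line per URL and return (url, outcome) pairs."""
    results = []
    for url in urls:
        status = fetch(url)
        outcome = "OK" if status == 200 else "FAILED"
        print(f"{outcome:6} {status} {url}")
        results.append((url, outcome))
    return results
```

The fetch parameter lets the report be tried against canned responses before it is pointed at a live site.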

Once the profile pages can be retrieved, the template XPaths for the profile details need to be populated. The details being captured are:

• Name

• Title

• Email

• Phone

• Fax

• Address

• Biography

• Qualifications

• Research Interests

• Publications

Each of these details requires a separate XPath in the template, with an optional regular expression to eliminate unwanted formatting and HTML tags. Please note that not all organizational units/staff members will have all of these details. A few trial runs will be needed to find the XPath that captures the majority of the details. For each detail, there are two ways of using the XPath: one returns the value as a list of XPath nodes (‘V’), the other returns the values found by the XPath as a string (‘S’). The return type needs to be added to the corresponding type field in the table. If a regular expression is needed, the type would usually be ‘S’.
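The difference between the two return types can be sketched as below. This is an illustration under assumptions, not the project's script: the function name extract, the sample profile markup, and the interpretation of the regular expression as "delete whatever matches" are all invented for the example. It again uses the standard-library ElementTree, which handles only a subset of XPath on well-formed markup.

```python
import re
import xml.etree.ElementTree as ET

# Invented sample profile page fragment.
SAMPLE_PROFILE = """
<div id="profile">
  <h1>Dr Jane Smith</h1>
  <p class="email">Email: <b>j.smith@example.edu</b></p>
  <ul class="pubs"><li>Paper A</li><li>Paper B</li></ul>
</div>
"""

def extract(root, xpath, return_type, pattern=None):
    """Apply xpath and return nodes ('V') or a cleaned string ('S')."""
    nodes = root.findall(xpath)
    if return_type == "V":
        # 'V': hand back the matching nodes for further processing.
        return nodes
    # 'S': flatten each node to its text content, dropping the tags.
    text = " ".join("".join(node.itertext()).strip() for node in nodes)
    if pattern:
        # Assumption: the optional regex removes unwanted text.
        text = re.sub(pattern, "", text)
    return text.strip()

if __name__ == "__main__":
    profile = ET.fromstring(SAMPLE_PROFILE)
    print(extract(profile, ".//h1", "S"))
    print(extract(profile, ".//p[@class='email']", "S", r"^Email:\s*"))
    for item in extract(profile, ".//ul[@class='pubs']/li", "V"):
        print(item.text)
```

A detail that is missing from a page simply yields an empty string for ‘S’ or an empty list for ‘V’, which fits the note that not every staff member will have every detail.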

More details will be posted in the coming weeks.

Data Entry, Excel, Perl, Web Scraping, Web Search

Project ID: #4066847

About the project

16 proposals Remote project Active Dec 25, 2012

16 freelancers are bidding on average $157 for this job

SigmaVisual

I can help in your project, please check PMB and our ratings/reviews to get idea of our experience. Please let me know if you have any queries.

$199 AUD in 7 days
(68 Reviews)
6.8
appwiz

Good day, please see my message

$150 AUD in 7 days
(9 Reviews)
4.5
ebrainindia

Can be done very well. Have done this many times. Please see private message for proposal.

$250 AUD in 3 days
(16 Reviews)
4.2
SoftSandila

I have done this work many times; it's quite an easy task for me. Regards: R1

$130 AUD in 7 days
(9 Reviews)
3.9
hesama110

hi please check your PMB

$110 AUD in 3 days
(1 Review)
1.9
vyastik

Dear sir, I'm an experienced Web researcher and am eager to complete this job properly and in time.

$140 AUD in 9 days
(1 Review)
1.1
usmanfaisal3

Hello Sir, I'm Faisal. I will do my best for your project and will deliver your completed project within a very short period. I have a great team to get your task done ahead of time. I am proficient in ms-word, ms-excel, dat...

$200 AUD in 3 days
(0 Reviews)
0.0
ksharpvw

Some details?

$150 AUD in 2 days
(0 Reviews)
0.0
toKandarp

Can be done very well. Have done this many times.

$150 AUD in 10 days
(0 Reviews)
0.0
ramjay03

I know this process well. Give me the job and I will do it successfully with good quality.

$150 AUD in 20 days
(0 Reviews)
0.0
dawnconsultancy

Lets get started.

$250 AUD in 3 days
(0 Reviews)
0.0
ITjobs76

ready for your work.

$30 AUD in 10 days
(0 Reviews)
0.0
Uma829

i am interested in working with you

$100 AUD in 1 day
(0 Reviews)
0.0
Obxide

Ho, please engage me, give me this chance. Thanks

$200 AUD in 5 days
(0 Reviews)
0.0
tarasprystavskyj

Simple HTML DOM is better than XPath for direct search of info on an HTML page.

$160 AUD in 3 days
(0 Reviews)
0.0