Cancelled

Script to Analyze Wikipedia and Build People Graphs

We are looking for a talented developer to create a script that parses relationships between people on Wikipedia. This is the first phase of a multi-step project. If the first phase proves successful, more work is likely to follow.

OVERVIEW

* Develop an online form to crawl Wikipedia for relationships between people

* Parse and store direct relationships found within a submitted Wikipedia article

* Parse and store indirect relationships found by connecting people via linked articles in a submitted Wikipedia article

USER FLOW

* User is presented with a single text box and a submit button

* User is asked to input a Wikipedia article URL in the text box

* After clicking Submit, script examines the article at that URL, identifies all links to Wikipedia articles about living people, and separates them from links to articles NOT about people

* For article links about living people, script stores those relationships as “direct” relationships

* For article links NOT about people, script examines those linked articles, identifies all links within them to articles about living people, and stores those relationships as “indirect” relationships

* For any article about a living person identified during this process, script stores all URLs in the article's “External Links” section

* After processing, the script outputs two CSV files: “Relationships” and “Links”
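One way the user flow above could be organized is sketched below in Python, one of the languages listed under Qualifications. It is a minimal sketch under stated assumptions, not a prescribed design: get_links, is_living and is_person are hypothetical callables that the real script would back with Wikipedia lookups, the crawl goes exactly one hop into articles NOT about people, and for direct relationships it assumes the connection column holds the submitted URL itself.

```python
from typing import Callable, Iterable

Row = dict[str, str]


def find_relationships(
    source: str,
    get_links: Callable[[str], Iterable[str]],  # hypothetical: linked article URLs
    is_living: Callable[[str], bool],  # hypothetical: article about a living person?
    is_person: Callable[[str], bool],  # hypothetical: article about any person?
) -> list[Row]:
    """Collect direct and one-hop indirect relationships for `source`."""
    rows: list[Row] = []
    seen: set[tuple[str, str]] = set()  # (target, connection) pairs already stored

    def add(target: str, connection: str, kind: str) -> None:
        # Skip self-references and duplicates found through the same connection.
        if target == source or (target, connection) in seen:
            return
        seen.add((target, connection))
        rows.append({
            "wikipedia_url_source": source,
            "wikipedia_url_target": target,
            "wikipedia_url_connection": connection,
            "relationship_type": kind,
        })

    for link in get_links(source):
        if is_living(link):
            # Assumption: a direct relationship is "found" in the submitted article.
            add(link, source, "direct")
        elif not is_person(link):
            # Articles NOT about people are opened once; living people inside are indirect.
            for child in get_links(link):
                if is_living(child):
                    add(child, link, "indirect")
        # Articles about people who are not living are neither stored nor expanded.
    return rows
```

Passing the lookups in as functions keeps the traversal testable without network access. Deduplicating on (target, connection) means a person linked from both the submitted article and a child article is recorded once per connection, which is one possible reading of the spec.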

Table: Relationships

* Column 1: wikipedia_url_source, which stores the submitted URL

* Column 2: wikipedia_url_target, which stores the URL to the related article about a living person

* Column 3: wikipedia_url_connection, which stores the URL of the article in which the relationship was found

* Column 4: relationship_type, which stores whether the relationship is “direct” or “indirect”
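A minimal sketch of writing this table with Python's standard csv module follows. The column order matches the list above; rejecting any relationship_type other than “direct” or “indirect” is an added safety check, not something the spec requires, and relationships_to_csv_text is a hypothetical helper name.

```python
import csv
import io
from typing import Iterable, Mapping, TextIO

# Column order taken from the Relationships table specification.
RELATIONSHIP_COLUMNS = [
    "wikipedia_url_source",
    "wikipedia_url_target",
    "wikipedia_url_connection",
    "relationship_type",
]


def write_relationships(rows: Iterable[Mapping[str, str]], out: TextIO) -> int:
    """Write a header plus one CSV line per relationship; return the row count."""
    # DictWriter's default extrasaction="raise" fails loudly on unexpected keys.
    writer = csv.DictWriter(out, fieldnames=RELATIONSHIP_COLUMNS)
    writer.writeheader()
    count = 0
    for row in rows:
        if row.get("relationship_type") not in ("direct", "indirect"):
            raise ValueError(f"unexpected relationship_type: {row!r}")
        writer.writerow(row)
        count += 1
    return count


def relationships_to_csv_text(rows: Iterable[Mapping[str, str]]) -> str:
    """Convenience wrapper that returns the CSV as a string."""
    buf = io.StringIO()
    write_relationships(rows, buf)
    return buf.getvalue()
```

When writing to disk, the file would be opened with something like open("relationships.csv", "w", newline="", encoding="utf-8") (file name hypothetical); the csv documentation recommends newline="" so the module controls line endings itself.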

Table: Links

* Column 1: wikipedia_url, which stores the URL of a Wikipedia article about a living person

* Column 2: external_link, which stores a URL found in the “External Links” section of the wikipedia_url article
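The “External Links” section has to be cut out of each article before its URLs can be stored. The sketch below assumes the article's raw wikitext has already been fetched (for example through the MediaWiki API's action=parse with prop=wikitext) and that the section heading is spelled “External links”. URLs supplied only as template IDs, such as {{IMDb name|...}}, are not expanded, so a real implementation might parse the rendered HTML instead; links_rows is a hypothetical helper name.

```python
import re

# A wikitext heading line such as "== External links ==" (level = number of "=").
HEADING = re.compile(r"^(={1,6})[ \t]*(.+?)[ \t]*\1[ \t]*$", re.MULTILINE)
# An http(s) URL, stopping at whitespace or wikitext delimiters.
URL = re.compile(r'https?://[^\s\]|}<>"]+')


def external_links_section(wikitext: str) -> str:
    """Return the body of the "External links" section, or "" if absent."""
    start, level = None, 0
    for m in HEADING.finditer(wikitext):
        depth = len(m.group(1))
        if start is None:
            if m.group(2).strip().lower() == "external links":
                start, level = m.end(), depth
        elif depth <= level:
            # The next heading of the same or higher level ends the section.
            return wikitext[start:m.start()]
    return wikitext[start:] if start is not None else ""


def links_rows(article_url: str, wikitext: str) -> list[dict[str, str]]:
    """Build Links-table rows, one per distinct URL in the section (order kept)."""
    urls = dict.fromkeys(URL.findall(external_links_section(wikitext)))
    return [{"wikipedia_url": article_url, "external_link": u} for u in urls]
```

Subsections such as "=== Profiles ===" stay inside the section because only a heading of the same or higher level ends it.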

EXAMPLE

Based on user input, script would first go to the article on Boris Becker, the tennis player ([url removed])

Next, it would pull the names of living people from this parent article, such as:

* Ivan Lendl

* Michael Chang

* John McEnroe

Then it would go to child articles NOT about people, such as:

* Grand Slam

* Wimbledon Championships

* ATP World Tour Finals

Then it would pull names of living people from those child articles to find:

* Steffi Graf (found via the “Grand Slam” article)

* Andre Agassi (found via the “Wimbledon Championships” article)

* Roger Federer (found via the “ATP World Tour Finals” article)

Example of the Relationships CSV: [url removed]

Example of the Links CSV: [url removed]

CORE REQUIREMENTS

* Examine all Wikipedia article links within a submitted Wikipedia URL

* Detect whether an article is about a living person or not

* Store “direct” relationships between living people

* Store “indirect” relationships between living people

* Store URLs in the “External Links” section of each article about a living person
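For the requirement to detect whether an article is about a living person, one practical signal, assumed here rather than taken from the spec, is Wikipedia's “Category:Living people”, which the MediaWiki API can be asked about directly. The sketch below converts a submitted article URL into a title, builds a query URL using prop=categories filtered with clcategories, and reads the result assuming the JSON layout returned with formatversion=2. The English Wikipedia endpoint is an assumption, and redirects, title normalization and the API's per-request title limit are left out.

```python
from urllib.parse import unquote, urlencode, urlparse

API_ENDPOINT = "https://en.wikipedia.org/w/api.php"  # assumption: English Wikipedia
LIVING_CATEGORY = "Category:Living people"


def title_from_url(url: str) -> str:
    """Turn https://en.wikipedia.org/wiki/Some_Title into "Some Title"."""
    parts = urlparse(url)
    host = parts.netloc.lower()
    if not (host == "wikipedia.org" or host.endswith(".wikipedia.org")):
        raise ValueError(f"not a Wikipedia URL: {url}")
    if not parts.path.startswith("/wiki/"):
        raise ValueError(f"not a Wikipedia article path: {url}")
    return unquote(parts.path[len("/wiki/"):]).replace("_", " ")


def living_query_url(titles: list[str]) -> str:
    """Build an API URL asking which of `titles` are in the living-people category."""
    params = {
        "action": "query",
        "prop": "categories",
        "titles": "|".join(titles),
        "clcategories": LIVING_CATEGORY,  # only report this one category
        "format": "json",
        "formatversion": "2",
    }
    return f"{API_ENDPOINT}?{urlencode(params)}"


def living_titles(response: dict) -> set[str]:
    """Titles whose page lists the living-people category in the API response."""
    pages = response.get("query", {}).get("pages", [])
    return {
        page["title"]
        for page in pages
        if any(c.get("title") == LIVING_CATEGORY for c in page.get("categories", []))
    }
```

A caller would fetch living_query_url(...) with any HTTP client, decode the JSON, and pass it to living_titles. Comparing against titles produced by title_from_url only works as long as the API does not normalize or redirect them.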

POTENTIAL FOLLOW-ON

This is a multi-milestone project. If the work is of good quality and the generated data proves useful, we are considering the following follow-on projects:

* Change script to store additional fields

* Change script to analyze entire Wikipedia portals

* Change script to accept batches of Wikipedia URLs or full names instead of URLs

DELIVERABLES

* Full source code

* Hosted on a development web server for testing, and kept there for 15 days after completion

QUALIFICATIONS

* Experience in PHP, Python, or Ruby

* Experience in extracting data from Wikipedia

* Great debugging and reasoning skills

* Strong communication skills

IMPORTANT NOTE: TO ENSURE YOUR BID IS RECEIVED, PLEASE INCLUDE THE FOLLOWING PHRASE IN THE 1ST LINE OF YOUR BID: “SIGNAL EXPERIMENT”

Skills: PHP, Python, Ruby on Rails


About the Employer: San Francisco, United States (0 reviews)

Project ID: #1286114

5 freelancers are bidding on average $21/hour for this job

* Etcherator ($15 USD / hour; 4 reviews; rating 3.9): "SIGNAL EXPERIMENT See PMB"

* emef0 ($20 USD / hour; 1 review; rating 2.4): "SIGNAL EXPERIMENT I wrote a text-classification program about a year ago that could most likely be useful in the sorting of living people when applied to wikipedia articles. It is a very interesting machine learnin…"

* martinmok ($40 USD / hour; 0 reviews; rating 0.0): "PLEASE CHECK PMB"

* makrusak ($15 USD / hour; 0 reviews; rating 0.0): "SIGNAL EXPERIMENT! I want to work with you and I suppose that I can help you. My experience is enough but now I try to develop my own projects and have no portfolio and also my price is cheap. ( 10-12 $/hour)"

* BruceDing ($15 USD / hour; 0 reviews; rating 0.0): "SIGNAL EXPERIMENT I have the experience of crawling wiki."