Parse Wikipedia Database Dump

Closed

Description

Wikipedia Database Dump project:

1. Parse [url removed, login to view] files and extract only unique domain names. Domains under wikipedia.org should be excluded.

2. Script should work with a file from [url removed, login to view]

3. Two params (filename -> [url removed, login to view]; database settings -> SQL table to insert the URLs: id, domain).

All params should be at the beginning of the file so we can customize them ourselves.

4. Software should run on Linux and use regex or another parsing technique.
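A rough sketch of how the requirements above might fit together in Python, one of the listed skills. Several things here are assumptions and not part of the brief: the input is treated as a plain-text file read line by line, SQLite (from the standard library) stands in for the unspecified database, and the names INPUT_FILE, DB_PATH, TABLE, unique_domains and store_domains are hypothetical. The regex only looks for http/https URLs.

```python
import re
import sqlite3

# ---- Parameters: edit these (all values are hypothetical placeholders) ----
INPUT_FILE = "dump-file.txt"   # path to the dump file to parse
DB_PATH = "domains.db"         # SQLite file, standing in for the real database settings
TABLE = "domains"              # table that receives (id, domain) rows

# Captures the host part of http/https URLs.
HOST_RE = re.compile(r"https?://([a-z0-9.-]+)", re.IGNORECASE)


def unique_domains(lines):
    """Return the set of non-Wikipedia hosts found in an iterable of lines."""
    found = set()
    for line in lines:
        for host in HOST_RE.findall(line):
            host = host.lower().strip(".")
            if host and host != "wikipedia.org" and not host.endswith(".wikipedia.org"):
                found.add(host)
    return found


def store_domains(conn, table, domains):
    """Create the table if needed and insert each domain at most once."""
    # The table name comes from the trusted parameter block above, not from user input.
    conn.execute(
        f"CREATE TABLE IF NOT EXISTS {table} "
        "(id INTEGER PRIMARY KEY AUTOINCREMENT, domain TEXT UNIQUE)"
    )
    conn.executemany(
        f"INSERT OR IGNORE INTO {table} (domain) VALUES (?)",
        [(d,) for d in sorted(domains)],
    )
    conn.commit()


def main():
    """Read INPUT_FILE line by line and store its unique domains in TABLE."""
    with open(INPUT_FILE, encoding="utf-8", errors="replace") as fh:
        domains = unique_domains(fh)
    conn = sqlite3.connect(DB_PATH)
    try:
        store_domains(conn, TABLE, domains)
    finally:
        conn.close()
```

Calling main() runs the whole pipeline against INPUT_FILE. The UNIQUE constraint together with INSERT OR IGNORE keeps domains unique even if the script is run again on another file.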

Example Domain Extraction:

[url removed, login to view] -> extract [url removed, login to view]

[url removed, login to view] -> extract [url removed, login to view]
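Since the URLs in the two examples above are hidden, the exact expected output is not known. The following is a minimal sketch that assumes "domain" means the host name of the URL, lowercased and without port, path or query. The function name extract_domain and the example.com URL in the usage note are hypothetical, and whether a leading "www." should be stripped is left open because the brief does not say.

```python
from urllib.parse import urlsplit


def extract_domain(url):
    """Return the lowercased host of a URL, or None if it has none or is Wikipedia's."""
    try:
        host = urlsplit(url.strip()).hostname
    except ValueError:
        # Malformed URLs (for example a broken IPv6 literal) are skipped.
        return None
    if not host:
        return None
    host = host.rstrip(".")
    if host == "wikipedia.org" or host.endswith(".wikipedia.org"):
        return None
    return host
```

For instance, extract_domain("http://www.Example.com:8080/path?x=1") gives "www.example.com", while any URL on wikipedia.org or one of its subdomains gives None.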

Skills: Perl, PHP, Python, XML


Project ID: #213474

Awarded to:

marchent

pls check PMB.

$100 USD in 5 days
(165 Reviews)
6.3

3 freelancers are bidding on average $133 for this job

mistersoft

Will be done fast and by requirements. Thanks!

$200 USD in 2 days
(399 Reviews)
7.5

spx2

Hi, I have previous experience with this, I can write the program you need. Regards, Stefan

$100 USD in 5 days
(2 Reviews)
2.0