In Progress

Parse Wikipedia Database Dump

Wikipedia Database Dump project:

1. Parse the [url removed, login to view] files and extract only unique domain names. Domains should not be duplicated.

2. Script should work with a file from [url removed, login to view]

3. Two params (filename -> [url removed, login to view], database settings -> SQL table to insert the URLs: id, domain).

All params should be at the beginning of the file so we can customize them ourselves.

4. Software should run on Linux and use regex or another parsing technique.

Example Domain Extraction:

[url removed, login to view] -> extract [url removed, login to view]

[url removed, login to view] -> extract [url removed, login to view]
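The requirements above can be sketched in Python (one of the listed skills). This is a minimal, illustrative sketch only: the file and table names (`INPUT_FILE`, `DB_PATH`, `urls`) are assumptions, and SQLite stands in for whatever database settings the project actually uses. Configurable params sit at the top of the file, as the brief asks.

```python
#!/usr/bin/env python3
"""Sketch: extract unique domains from a Wikipedia dump file and store them."""
import os
import re
import sqlite3
from urllib.parse import urlparse

# --- Configurable params (kept at the top of the file, per the brief) ---
INPUT_FILE = "dump.xml"   # path to the dump file (assumed name)
DB_PATH = "domains.db"    # SQLite stands in for the real database settings
TABLE = "urls"            # target table with (id, domain) columns

# Rough URL matcher; stops at whitespace, quotes, and wiki-markup delimiters.
URL_RE = re.compile(r"https?://[^\s\"'<>\]|]+")

def extract_domains(text):
    """Return the set of unique domain names found in `text`."""
    domains = set()
    for url in URL_RE.findall(text):
        host = urlparse(url).netloc.lower()
        if host:
            domains.add(host)
    return domains

def store_domains(domains, db_path=DB_PATH, table=TABLE):
    """Insert each domain into an (id, domain) table; id autoincrements."""
    conn = sqlite3.connect(db_path)
    conn.execute(f"CREATE TABLE IF NOT EXISTS {table} "
                 "(id INTEGER PRIMARY KEY AUTOINCREMENT, domain TEXT UNIQUE)")
    conn.executemany(f"INSERT OR IGNORE INTO {table} (domain) VALUES (?)",
                     [(d,) for d in sorted(domains)])
    conn.commit()
    conn.close()

if __name__ == "__main__" and os.path.exists(INPUT_FILE):
    with open(INPUT_FILE, encoding="utf-8", errors="replace") as fh:
        store_domains(extract_domains(fh.read()))
```

For a full Wikipedia dump, the real script would stream the file line by line rather than reading it into memory, but the extraction and insert logic would be the same.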

Skills: Perl, PHP, Python, XML


About the Employer:
( 1016 reviews ) Mahe, Bulgaria

Project ID: #213474

Awarded to:


Please check PMB.

$100 USD in 5 days
(165 Reviews)

3 freelancers are bidding on average $133 for this job


Will be done fast and by requirements. Thanks!

$200 USD in 2 days
(399 Reviews)

Hi, I have previous experience with this, I can write the program you need. Regards, Stefan

$100 USD in 5 days
(2 Reviews)