script / software to parse file and extract data

Bids: 35
Avg Bid: $115 USD
CLOSED
  • Project ID: 463062
  • Project Type: Fixed
  • Budget: $30-$250 USD

Project Description:

I need a script or program to parse a huge "web crawler index file", extract some information, and save it into my MySQL database.

Data to extract:
- Link relations: linking site (from) and linked site (to)
- Link info: text link or image link, link text, alt text, title text
- Link type: nofollow, meta-tag nofollow, or follow link
- Some page info: content encoding, title, domain name...
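
As a rough illustration of the fields listed above, here is a minimal Python sketch that pulls them out of a single HTML page using only the standard library. It assumes the page's URL and raw HTML have already been read out of the index file; the index format itself is not modelled here. The class name `LinkExtractor`, the function `extract_links`, and all dictionary keys are made up for this example and would need to be mapped onto the real database layout.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlsplit


class LinkExtractor(HTMLParser):
    """Collects one dict per <a href=...> in a page (hypothetical field names)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []
        self.title = ""
        self.meta_nofollow = False
        self._current = None    # the link being parsed, if we are inside <a>
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "title":
            self._in_title = True
        elif tag == "meta" and (a.get("name") or "").lower() == "robots":
            # Page-level nofollow declared via <meta name="robots">.
            if "nofollow" in (a.get("content") or "").lower():
                self.meta_nofollow = True
        elif tag == "a" and a.get("href"):
            target = urljoin(self.base_url, a["href"])
            self._current = {
                "from_host": urlsplit(self.base_url).hostname,
                "to_host": urlsplit(target).hostname,
                "to_url": target,
                "kind": "text",
                "text": "",
                "alt": None,
                "title": a.get("title"),
                "rel_nofollow": "nofollow" in (a.get("rel") or "").lower().split(),
            }
        elif tag == "img" and self._current is not None:
            # An image inside an anchor makes it an image link.
            self._current["kind"] = "image"
            self._current["alt"] = a.get("alt")

    def handle_data(self, data):
        if self._in_title:
            self.title += data
        elif self._current is not None:
            self._current["text"] += data

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False
        elif tag == "a" and self._current is not None:
            self._current["text"] = self._current["text"].strip()
            self.links.append(self._current)
            self._current = None


def extract_links(base_url, html):
    """Return (page_title, links); each link gets a 'link_type' label."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    parser.close()
    for link in parser.links:
        if link.pop("rel_nofollow"):
            link["link_type"] = "nofollow"
        elif parser.meta_nofollow:
            link["link_type"] = "meta-nofollow"
        else:
            link["link_type"] = "follow"
    return parser.title.strip(), parser.links
```

Calling `extract_links("http://site.example/page", html)` returns the page title and a list of link dictionaries. The link type is assigned after the whole page has been parsed, so a robots meta tag is taken into account wherever it appears in the document.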

The index files are between 15 GB and 100 GB in size, so please make sure your script can handle this volume.
The script or program should run on our Linux root server.
I'll give you a MySQL database layout.
You can find all the information about the index file here: http://www.dotnetdotcom.org/
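
With input files of this size, one possible approach is to stream records and write them to the database in fixed-size batches, so memory use stays flat regardless of file size. The sketch below is only an illustration of that idea: it uses Python's built-in `sqlite3` module as a stand-in for MySQL so that it runs anywhere, and the `links` table, its columns, and the helper names `batched` and `store_links` are hypothetical, since the real database layout is to be supplied separately.

```python
import sqlite3
from itertools import islice


def batched(rows, size):
    """Yield lists of at most `size` rows without loading the whole input."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def store_links(conn, rows, batch_size=1000):
    """Insert (from_host, to_host, link_type) tuples in batches.

    `rows` may be any iterable, including a generator that reads the
    index file lazily. Returns the number of rows written.
    """
    # Hypothetical table layout, for illustration only.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS links ("
        "from_host TEXT, to_host TEXT, link_type TEXT)"
    )
    total = 0
    for batch in batched(rows, batch_size):
        conn.executemany("INSERT INTO links VALUES (?, ?, ?)", batch)
        conn.commit()   # one commit per batch rather than per row
        total += len(batch)
    return total
```

Because `rows` is consumed lazily, a generator that yields one tuple per parsed link can be passed straight in. With a MySQL driver the parameter placeholder would typically be `%s` rather than `?`, and the batch size would need tuning against the actual server.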

Please send me a PM if you have any questions.

Skills required:

C Programming, Java, Linux, Perl, Python

Project posted by:

crilla Germany
5.0 (3 Reviews)

Last seen: Apr 04, 2012 8:37 AM EDT

Public Clarification Board

1 message

  • jesuse1998

    Why do you want to place the records in a DBMS when the authors of the web crawler decided to use flat files, most likely for performance reasons? Just wondering...

    over 2 years ago




Awarded Bids

jimcrow
Russian Federation        Offline
 Accepted
$40 in 7 days 
0
over 2 years ago
5.0

5.8

3 Reviews
71% Completion Rate
I can do it with python.

All Bids

cliver
Ukraine        Offline
  Foundation EUFreelance.com Member
$180 in 2 days 
0
over 2 years ago
5.0

6.5

23 Reviews
92% Completion Rate
Hello, Please look at the PMB. Regards, Sergey
srinichal
India    Standard Membership     Offline
  Freelancer Orientation (90%, 100th percentile)
  General Orientation (85%, 95th percentile)
  Participated in the 2012 Freelancer.com Scavenger Hunt
$120 in 4 days 
0
over 2 years ago
4.8

6.4

78 Reviews
58% Completion Rate
I can write a bash script for the same
SigmaVisual
Pakistan    Standard Membership     Online
  General Orientation (80%, 90th percentile)
  Foundation EUFreelance.com Member
$250 in 4 days 
0
over 2 years ago
5.0

6.0

30 Reviews
73% Completion Rate
We can help in your project, please check PMB to see our related experience.
gangabass
Russian Federation    Standard Membership     Online
  Foundation EUFreelance.com Member
$70 in 3 days 
0
over 2 years ago
4.9

5.8

128 Reviews
59% Completion Rate
I can do this job for you. See PM for details.
sureshdevi
India    Standard Membership     Offline
  Freelancer Orientation (80%, 97th percentile)
  Employer Orientation (75%, 97th percentile)
  General Orientation (90%, 98th percentile)
  Participated in the 2012 Freelancer.com Scavenger Hunt
$200 in 5 days 
0
over 2 years ago
4.9

5.5

49 Reviews
86% Completion Rate
I can do this work. Thanks, Suresh
ancosys
Pakistan    Standard Membership     Offline
  Foundation LimeExchange Member
  Participated in the 2012 Freelancer.com Scavenger Hunt
$130 in 3 days 
0
over 2 years ago
4.9

5.5

39 Reviews
78% Completion Rate
Hi, please check PM. Thanks
pawel100
Poland        Offline
  General Orientation (90%, 98th percentile)
  Foundation EUFreelance.com Member
$60 in 3 days 
0
over 2 years ago
4.9

4.6

26 Reviews
73% Completion Rate
Hello, I'm interested in your project, Please check PMB for more details.
edatawiz
India        Offline
  General Orientation (80%, 90th percentile)
  Foundation EUFreelance.com Member
$150 in 7 days 
0
over 2 years ago
5.0

3.7

5 Reviews
83% Completion Rate
Hi - Please check PM for details.
KelvinChen
China        Offline
  Foundation EUFreelance.com Member
$110 in 3 days 
0
over 2 years ago
5.0

3.6

5 Reviews
66% Completion Rate
Please check PM for details.
ulkas
Slovak Republic        Offline
  General Orientation (95%, 100th percentile)
  Foundation EUFreelance.com Member
$100 in 2 days 
0
over 2 years ago
5.0

2.9

2 Reviews
55% Completion Rate
Easy task. I don't need any more info; just get me an example file and I can start ASAP, and after that you can try it with your own big file.
yaroslavm
Russian Federation        Offline
  Foundation EUFreelance.com Member
$100 in 3 days 
0
over 2 years ago
5.0

2.9

2 Reviews
100% Completion Rate
Hello, I can do that quickly and at a low price.
Ellemer
Hungary        Offline
$111 in 3 days 
0
over 2 years ago
5.0

2.8

4 Reviews
100% Completion Rate
Please check the PM, thanks
IstvanAntal
Romania        Offline
$100 in 2 days 
0
over 2 years ago
5.0

1.2

1 Review
81% Completion Rate
Hello, I understand what needs to be done, and I can start right away. See PM for details.
jporwal
India        Offline
$30 in 2 days 
0
over 2 years ago
Dear sir, I have experience developing parsers for VHDL and C++ using lex/yacc and ANTLR. I can develop a very efficient, cache-optimized parser for you in 2 days. It is an interesting and simple task for me. Regards, Janak
Kamerer
Ukraine        Offline
$150 in 7 days 
0
over 2 years ago
I am experienced in Python. I can write such a script for you.
xeNorthwest
India        Offline
$50 in 2 days 
0
over 2 years ago
I can do this easily with Perl.
zub1uk
United Kingdom        Offline
$100 in 2 days 
0
over 2 years ago
0.0

1.4

0 Reviews
100% Completion Rate
Hi, I am an information extraction specialist and would be happy to help you with this project. All I require is a small sample of the file to be parsed to proceed.
Fernando444
Sri Lanka        Offline
  Foundation EUFreelance.com Member
$200 in 2 days 
0
over 2 years ago
0.0

0.0

1 Review
22% Completion Rate
Please see the PM
nusch
Poland        Offline
  Foundation EUFreelance.com Member
$99 in 2 days 
0
over 2 years ago
I can do it fast with Python, I have experience with crawlers and other automated systems.
ibatica
Serbia and Montenegro        Offline
$100 in 4 days 
0
over 2 years ago
0.0

0.0

0 Reviews
0% Completion Rate
I can do this for you.
ishantoraskar
India        Offline
$35 in 3 days 
0
over 2 years ago
Please see PM.
jcgasser
United States        Offline
$140 in 5 days 
0
over 2 years ago
0.0

0.0

0 Reviews
100% Completion Rate
Please see my PM for details. Thanks
TipofIceberg
India        Offline
$80 in 4 days 
0
over 2 years ago
0.0

0.0

0 Reviews
25% Completion Rate
Hello. I can develop this software using Java. I can start the project right now. Regards.
maheshpmahadevan
India        Offline
  Foundation LimeExchange Member
$100 in 3 days 
0
over 2 years ago
0.0

0.0

0 Reviews
0% Completion Rate
Hello, I can do this job in Python or C/C++.
ThePilgrim
Romania        Offline
  General Orientation (85%, 95th percentile)
  Foundation EUFreelance.com Member
$200 in 10 days 
0
over 2 years ago
0.0

0.0

0 Reviews
50% Completion Rate
I'm proficient in ANSI C, C++, and Java. The task at hand might or might not be efficient to complete. In any case, please post the database structure in the PMB, with a description of how the extracted data should be handled.
mkedar
India        Offline
$100 in 3 days 
0
over 2 years ago
I have expertise in Java and SQL. Since your files are very large, reading them needs efficient I/O. I can do this for you ...
demigraff
United Kingdom        Offline
  General Orientation (90%, 98th percentile)
$120 in 9 days 
0
over 2 years ago
I have looked at the file format and believe this should be a relatively simple project in Perl. Thank you for your consideration.
bvfalcon
Russian Federation        Offline
  Foundation LimeExchange Member
$100 in 3 days 
0
over 2 years ago
I have experience in creating web crawlers and in text data mining.
subhabrataban07
India        Offline
  Foundation EUFreelance.com Member
$100 in 3 days 
0
over 2 years ago
I can do it in Perl/PHP/Python.
serjant2600
Ukraine        Offline
$150 in 3 days 
0
over 2 years ago
0.0

1.8

0 Reviews
71% Completion Rate
I have done a similar project. Please see the PM.
rupaliT
India        Offline
$35 in 3 days 
0
over 2 years ago
0.0

1.0

1 Review
33% Completion Rate
Hi, I will do this work. I am very good at C/C++, and I am currently working with it as well.
ricky867
India        Offline
  Foundation EUFreelance.com Member
$150 in 20 days 
0
over 2 years ago
Recently I wrote Perl scripts for Intel, Oregon, converting XML files, templates, plugging in register values, etc., for huge files. I will be glad to assist you.
periwebindia
India        Offline
  Foundation EUFreelance.com Member
$150 in 5 days 
0
over 2 years ago
hello, I can help you out.
xjx922
China        Offline
  Foundation EUFreelance.com Member
$120 in 5 days 
0
over 2 years ago
0.0

0.0

0 Reviews
0% Completion Rate
I can help you.