Cancelled

Mass Data Extraction Project - over 15 million pages

We need a very experienced data extraction expert with "underground" skills to extract the entire content of a well-known social network that contains over 10 million profiles. This includes extracting all profiles, all publicly available profile data, networks of friends, communities, guestbook messages, pictures, and all information that a user has made publicly available in their profile, including their entire network of friends. You must provide all of this info in a MySQL database file and also be able to create an exact, navigable clone of the social network.

Note that this social network is not Myspace and the pages contain no multimedia, video, or personalized html.

You should be very familiar with the top social networks, understand their structures well, and know why they currently prevent spiders from grabbing their data.

Successful completion of this project would require an in-depth understanding of the social network and an IP detection and blocking strategy that allows terabytes of data to be extracted without being blocked. You may need to set up a massive network of computers with rotating IP addresses to complete this.

The deliverables of this project are all of the extracted data in a predefined database format and a web-based clone of the entire social network site. We would provide all the hard drive space necessary to store this. We would need a requirements list from you of all hardware, software, and other tools necessary to complete this job.

Because of the size of this job and the closed nature of this site, which prevents traditional spiders, this job will require some very creative thinking and knowledge.

You will need to custom-build scripts and programs to complete this job.

Finally, after the initial extraction of all current content, we need the capability to repeat this process in the future using the same scripts and strategy.

Please demonstrate that you clearly understand the issues involved in completing this. Demonstrate that you understand the challenges of logging in to the social network, extracting data, dealing with IP issues, and all other challenges that will come up.

This is a massive job that requires a well-coordinated, carefully planned effort and a bit of "underground" skills and resources.

In summary, the final deliverable should be an exact, navigable clone of the social network that allows navigating through all profiles and viewing pictures, networks of friends, communities, etc.

To a casual observer, it would appear as if you were navigating through the actual social network site. All of this information will be served from a database containing all 10,000,000+ records of the publicly available information we can extract.

Please take a look at the top social networks (hi5, Orkut, Facebook, and MySpace) and provide your insights into the differences between these social networks and how you would approach data extraction on each of them. Your ability to intelligently discuss the differences between these sites will clearly demonstrate your ability to pull off this project.

Please be very creative in your thinking and in the resources that could be made available to complete this job.

Finally, we expect several follow-on jobs and potential long-term ongoing work as a result of this project. So please take the time to demonstrate your ability to complete this, and also let us know the limitations that might exist in completing this job.

Skills: .NET, Java, PHP, Python, Script Install


About the Employer:
(0 reviews) Berkeley, United States

Project ID: #52122

13 freelancers are bidding on average $3423 for this job

mspl

Hello, I can provide this service to you. Let me know your budget so we can discuss and negotiate further! Please contact me on PMB so we can discuss further! Please visit www.webnflashdev.com for more info abou…

$4000 USD in 50 days
(176 Reviews)
7.4
justinsylas

Sir, I can do this.

$4000 USD in 45 days
(32 Reviews)
6.6
tfs

Hello sir! We're a professional Russian team. We have six years of experience in Delphi and extensive experience in data extraction from different sites (we have written more than 100 different spiders). Familiar with t…

$3000 USD in 30 days
(29 Reviews)
6.3
MaxPowers

Welcome to the spider-man :) I have built many spiders and can very closely emulate a browser when spidering. More info in the PMB...

$1500 USD in 0 days
(32 Reviews)
5.8
givemeahell

Hi, I am interested in doing this. I am basically a software engineer with 5 years of experience in VB, ASP, Java, and .NET, so you can trust me. I will deliver the project on time or even before, with the quality you expect. Thanks, gmh

$2000 USD in 10 days
(4 Reviews)
4.6
mekoolt

I have gone through your requirements and understand them well. What's more, I have experience in this kind of project and I'm really interested in your project. Please see PM for details. I have sent yo…

$1500 USD in 10 days
(1 Review)
1.8
radiantindia

Hi, please see the PMB.

$5000 USD in 45 days
(0 Reviews)
0.0
dotnetcoders

I have experience extracting data from large social networking sites.

$3000 USD in 45 days
(0 Reviews)
0.0
govipattu

Our regular work consists of extracting data and formatting it properly, so we can easily carry out this job. Please see PMB.

$4000 USD in 30 days
(0 Reviews)
0.0
vikrantmohite

Please visit me at http://www.24creative.com for more details about us. There you can check out our products page for previous work. If you are satisfied with the information given, come back to us.

$5000 USD in 45 days
(0 Reviews)
3.8
epammanager

Please, check the PMB for clarifications

$3000 USD in 30 days
(0 Reviews)
0.0
activenext2006

Dear Sir/Madam, our company, ActiveNext http://www.activenext.com, is based in Silicon Valley with offices in India. Through our ActiveStaff program, you can hire a qualified candidate to work for you on a ded…

$4000 USD in 60 days
(0 Reviews)
0.0
hiensys

Dear Sir, Hiensys IT Consulting is an emerging global technology services company delivering business solutions to its clients. We deliver the full range of application outsourcing, business process outsourcing, con…

$4500 USD in 90 days
(0 Reviews)
0.0