Cluster Analysis (using existing code) / MySQL database

$30-250 USD

Closed

Posted

over 11 years ago

Paid on delivery

I have .[login to view URL] that I think contains all the pieces you need; if you think something is missing, please let me know. Ideally I would like to leverage my existing database on GoDaddy, but I am open to other suggestions. The rough workflow is as follows.

[login to view URL] will set up a database for clustering in the correct format. You will need to change the "data" table at the very bottom so that it is a view across your actual page data; it is expected to expose the page id (a unique identifier) and a hash of the DOM. When you run the script you can specify a database schema, and all of the tables will go in that schema. Compile qfp.c with "gcc -o qfp qfp.c". Then run [login to view URL]; this script takes a lot of options and lets you customize where the database and all the tables are.

If you have done all this, congratulations, you have clusters in your database! The [login to view URL] table contains the actual clusters: for each page it holds a (rep_id, page_id) pair, where rep_id is essentially the cluster id (it is actually just the lowest page id in the cluster). Depending on what you want to do with the clusters, this may be all you need.

You can compile [login to view URL] with "javac -cp [login to view URL]:. web_clustering/[login to view URL]". You may want to make a copy of this file before modifying it, so that you can refer back to the original if you delete too much and break something. If you compile and run web_clustering/[login to view URL], it will generate a web site that shows your clusters, gives screenshots of common pages in the clusters (assuming you have screenshots enabled on Neha's crawler), and lets you look at their DOMs pretty easily. You have to compile and run it from the main folder, not from within web_clustering, because it is part of the web_clustering Java package.
Unfortunately, you will need to dig into the Java file to change things like table names, the output location, and the location of your screenshot and DOM files. These are all hard-coded and spread through multiple files, so this part will be a little time consuming. Run it with the "-M" flag and just delete any code that does not follow this execution path (there is a lot of it; the original author added many different options to this code as time went on). Then you will probably need to modify the SQL queries to grab the page data correctly; I am not certain how much work this will be.

If you can get this to compile and run, you should be left with an output directory that contains a bunch of folders and files, one of which is "[login to view URL]". Opening this file in a web browser gives you a main page that shows your 225 most common clusters and the most common screenshots of the pages in those clusters, with links to more information about the clusters, DOMs, etc.

Let me know if you have any questions about any of this; I would be happy to answer them. Good luck! Happy Bidding!
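For anyone who wants to sanity-check the generated main page against the database, the "225 most common clusters" can be approximated by counting pages per rep_id. The sketch below is an assumption about how that ranking works, not the Java program's actual logic; the function name and the tie-breaking rule (lower rep_id first) are hypothetical.

```python
from collections import Counter


def most_common_clusters(pairs, limit=225):
    """Rank clusters by size given (rep_id, page_id) pairs.

    Returns up to `limit` (rep_id, page_count) tuples, largest first.
    Ties are broken by the lower rep_id, which is an arbitrary choice.
    """
    sizes = Counter(rep_id for rep_id, _page_id in pairs)
    ranked = sorted(sizes.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:limit]
```

The pairs can come straight from a "SELECT rep_id, page_id" query on the cluster table.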

Project ID: 3996385

About the project

4 proposals

Remote project

Active 11 yrs ago

4 freelancers are bidding on average $208 USD for this job

@samitXI

I am a Java expert and I want to help you here. Please check your personal inbox for more details. I will wait for your reply. Thanks, AMit

$250 USD in 7 days

4.7

(100 reviews)

6.3

@Butterfly718

Hello sir. I have read all your requirements, and I am good at all of that. Please check the attached doc for my previous work. Hope to hear from you soon. Thanks!!

$200 USD in 10 days

5.0

(41 reviews)

4.8

@coolbuddy19

Hi, I am confident I can handle this and will work until you are satisfied. I am keenly interested in this project. Thanks, with regards.

$195 USD in 4 days

5.0

(16 reviews)

4.0

@petracompany

Petra is a developer group with 5 years of experience in web development, desktop programming, and database design and programming. We have excellent expertise in web development languages and tools (PHP, Joomla, Drupal, Magento, HTML, CSS, AJAX, JavaScript, SEO, WordPress, etc.), programming languages (Java, C#), and database design (Oracle SQL, MySQL, MS SQL Server).

$185 USD in 7 days