Cluster Analysis (using existing code) / MySQL database


I have [url removed, login to view], which I think contains all the pieces you need. If you think something is missing, please let me know. The rough workflow is as follows:

[url removed, login to view] will set up a database for clustering in the correct format. Ideally I would like to use my existing database on GoDaddy, but I am open to other suggestions. You will need to change the "data" table at the very bottom so that it is a view across your actual page data. The view is expected to show the page id (a unique identifier) and a hash of the DOM. When you run the script you can specify a database schema, and all of the tables will go in that schema.
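To make the expected shape of the "data" view concrete, here is a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in for the MySQL database, and the `pages` table, the `page_id` and `dom_hash` column names, and the choice of SHA-1 are all assumptions for illustration, not what the setup script actually requires.

```python
import hashlib
import sqlite3


def dom_hash(dom_html: str) -> str:
    """Return a hex SHA-1 digest of the serialized DOM (hash choice is an assumption)."""
    return hashlib.sha1(dom_html.encode("utf-8")).hexdigest()


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical pages table and a 'data' view."""
    conn = sqlite3.connect(":memory:")
    # Hypothetical table standing in for the real page data.
    conn.execute(
        "CREATE TABLE pages (id INTEGER PRIMARY KEY, url TEXT, dom_hash TEXT)"
    )
    sample = [
        (1, "http://example.com/a", "<html><body>login</body></html>"),
        (2, "http://example.com/b", "<html><body>login</body></html>"),
        (3, "http://example.com/c", "<html><body>search</body></html>"),
    ]
    conn.executemany(
        "INSERT INTO pages (id, url, dom_hash) VALUES (?, ?, ?)",
        [(pid, url, dom_hash(dom)) for pid, url, dom in sample],
    )
    # The view exposes only the two columns described above:
    # a unique page id and a hash of the DOM.
    conn.execute("CREATE VIEW data AS SELECT id AS page_id, dom_hash FROM pages")
    return conn


if __name__ == "__main__":
    conn = build_demo_db()
    for row in conn.execute("SELECT page_id, dom_hash FROM data ORDER BY page_id"):
        print(row)
```

In the real project the view would be defined in MySQL over whatever table already holds the crawled pages. The point is only that the clustering step sees one row per page with those two values.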

Compile qfp.c with "gcc -o qfp qfp.c".

Run [url removed, login to view]. This script takes many options that let you customize where the database and all the tables are.

If you have done all this, congratulations, you have clusters in your database! The [url removed, login to view] table contains the actual clusters. For each page it will have a (rep_id, page_id) pair, where the rep_id is essentially the cluster id (it is actually just the id of the lowest-numbered page in the cluster). Depending on what you want to do with the clusters, this may be all you need.
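As an illustration of how the (rep_id, page_id) pairs can be consumed, the sketch below groups pages by their representative and checks that each rep_id really is the lowest page id in its cluster. The sample pairs are invented, and the function names are hypothetical helpers, not part of the existing code.

```python
from collections import defaultdict


def group_clusters(pairs):
    """Map each rep_id to the sorted list of page ids in its cluster."""
    clusters = defaultdict(list)
    for rep_id, page_id in pairs:
        clusters[rep_id].append(page_id)
    return {rep: sorted(pages) for rep, pages in clusters.items()}


def bad_representatives(clusters):
    """Return rep_ids that are not the lowest page id of their own cluster."""
    return [rep for rep, pages in clusters.items() if rep != min(pages)]


if __name__ == "__main__":
    # Invented rows, shaped like the output of the clustering step.
    pairs = [(1, 1), (1, 2), (3, 3), (1, 5), (3, 4)]
    clusters = group_clusters(pairs)
    print(clusters)
    print(bad_representatives(clusters))
```

With real data, the pairs would come from a SELECT on the clusters table. An empty list from `bad_representatives` is a quick sanity check that the table matches the description above.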

You can compile [url removed, login to view] with "javac -cp [url removed, login to view]:. web_clustering/[url removed, login to view]". You may want to make a copy of this file for your modifications, so that you can refer back to the original if you delete too much and break something.

If you compile and run web_clustering/[url removed, login to view], it will generate a web site that shows your clusters, gives screenshots of common pages in the clusters (assuming you have screenshots enabled on Neha's crawler), and lets you look at their DOMs pretty easily. You have to compile and run it from the main folder, not from within web_clustering, because it is part of the web_clustering Java package.

Unfortunately, you will need to dig into the Java file to change things like table names, the output location, and the location of your screenshot and DOM files. These are all hard-coded and spread through multiple files, so this part will be a little time consuming. Run it with the "-M" flag and just delete any code that does not follow this execution path (there is a lot of it, because many different options were added to this code over time). Then you will probably need to modify the SQL queries to grab the page data correctly. I am not certain how much work this will be, though.

If you can get this to compile and run, you should be left with an output directory that contains a bunch of folders and files, one of which is "[url removed, login to view]". Opening this file in a web browser will give you a main page that shows your 225 most common clusters and the most common screenshots of the pages in those clusters, and has links for more information about the clusters, DOMs, etc.
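For reference, picking the most common clusters can be sketched as a simple count over the (rep_id, page_id) pairs. This assumes "most common" means the clusters with the most pages. The function name and sample data are hypothetical, and the Java code may rank clusters differently.

```python
from collections import Counter


def top_clusters(pairs, limit=225):
    """Return up to `limit` (rep_id, page_count) tuples, largest clusters first."""
    sizes = Counter(rep_id for rep_id, _page_id in pairs)
    return sizes.most_common(limit)


if __name__ == "__main__":
    # Invented rows, shaped like the (rep_id, page_id) pairs in the clusters table.
    pairs = [(1, 1), (1, 2), (1, 5), (3, 3), (3, 4), (7, 7)]
    for rep_id, count in top_clusters(pairs, limit=2):
        print(rep_id, count)
```

The default limit of 225 mirrors the number of clusters shown on the main page. Passing a smaller limit is handy when checking the ranking by hand.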

Let me know if you have any questions about any of this. I would be happy to answer them. Good luck! Happy Bidding!

Skills: C Programming, HTML, Java, MySQL, PHP


Project ID: #3996385

4 freelancers are bidding on average $208 for this job


I am a Java expert. I want to help you here. Please check your personal inbox for more details. I will wait for you. Thanks, AMit

$250 USD in 7 days
(100 Reviews)

Hello sir. I read all your requirements, and I am good at all of that. Please check the attached doc for my previous work. Hope to hear from you soon. Thanks!!

$200 USD in 10 days
(41 Reviews)

Hi, I am confident I can handle this and will work until you are satisfied. I am keenly interested in this project. Thanks, with regards

$195 USD in 4 days
(16 Reviews)

Petra is a developer group with 5 years of experience in web development, desktop programming, and database design and programming. We have excellent expertise in web development languages and tools (PHP, JOOMLA, DRUPAL, Mag

$185 USD in 7 days
(1 Review)