Develop a Photo Clustering System

In Progress

The requirement is to build a process/pipeline that can take a table (literally a database table) of information about geographically located photos and place them into meaningful but subjective groups or clusters.

There are many 'dimensions' to the data that could be used to perform the clustering, including geographical coordinates, locality (town, country, etc.), date taken, textual tags (a folksonomy), and photographer. There is also a freeform title and description, but we've already extracted automated terms from these, so there is no need to process the freeform text.

All of these could, and ideally should, be used to perform the clustering; e.g. "taken by Joe Bloggs in April 2012" could be an arbitrary cluster. Clustering should ideally make use of the geographical coordinates to create clusters of nearby photos that share some other theme (such as being taken by a particular user), but it should not be limited to them: where possible, multiple dimensions should be used. The photographer is a good candidate for clustering because a given photographer will often take similar photos in the same geographical area on any given day.
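As a rough illustration of combining dimensions, the sketch below first groups photos by photographer and date taken, then splits each group geographically. The field names (`user`, `taken`, `lat`, `lon`), the 5 km threshold and the label wording are assumptions made for the example, not part of the supplied data set.

```python
from collections import defaultdict
from math import radians, sin, cos, asin, sqrt

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometres between two lat/lon points."""
    lat1, lon1, lat2, lon2 = map(radians, (lat1, lon1, lat2, lon2))
    a = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(a))

def cluster_by_user_day(photos, max_km=5.0):
    """Group by (photographer, date), then split each group by distance.

    Within a group, a photo joins the first sub-group whose seed (first
    photo) lies within max_km; otherwise it starts a new sub-group.
    """
    groups = defaultdict(list)
    for p in photos:
        groups[(p["user"], p["taken"])].append(p)

    clusters = []
    for (user, day), members in groups.items():
        subgroups = []
        for p in members:
            for sg in subgroups:
                seed = sg[0]
                if haversine_km(p["lat"], p["lon"], seed["lat"], seed["lon"]) <= max_km:
                    sg.append(p)
                    break
            else:
                subgroups.append([p])  # nothing close enough: new sub-group
        for sg in subgroups:
            clusters.append({
                "label": f"taken by {user} on {day}",  # hypothetical label format
                "photo_ids": [p["id"] for p in sg],
            })
    return clusters

# Made-up sample rows: two photos in Reading, one in Manchester, one by another user.
photos = [
    {"id": 1, "user": "joe", "taken": "2012-04-01", "lat": 51.454, "lon": -0.973},
    {"id": 2, "user": "joe", "taken": "2012-04-01", "lat": 51.456, "lon": -0.970},
    {"id": 3, "user": "joe", "taken": "2012-04-01", "lat": 53.480, "lon": -2.242},
    {"id": 4, "user": "ann", "taken": "2012-04-01", "lat": 51.455, "lon": -0.972},
]
clusters = cluster_by_user_day(photos)
```

Grouping on the cheap categorical dimensions first keeps the distance comparisons confined to small groups, which matters at the 3 million photo scale.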

It will require two modes: 1) 'priming', where a large number of photos (over 3 million ultimately!) are taken and put into clusters, and 2) 'updates', where batches of images are added (about 1000 at a time) and need to be placed into the existing clusters or into new ones.

The 'update' mode should, where possible, add to current clusters. It could delete and then recreate some clusters if that gives a better fit, but it also needs to be able to create new clusters where needed. In particular, it should be differential: most clusters will remain the same, with only a few changing; it shouldn't just delete all the clusters and start again. The two modes are closely related and will probably be largely similar (e.g. priming could just be lots of 'updates' with initially no clusters), but there could be some optimization possible to tailor for the two modes.
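One possible shape for a differential update, sketched under assumptions: clusters are held as a dict of centroid plus member ids, each new photo joins the nearest cluster within a hypothetical 5 km radius or starts a new one, and only the ids of touched clusters are returned so that just those rows need rewriting. Calling the same function with an empty dict is the 'priming as lots of updates' case.

```python
from math import radians, sin, cos, asin, sqrt

def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) tuples."""
    lat1, lon1, lat2, lon2 = map(radians, (a[0], a[1], b[0], b[1]))
    h = sin((lat2 - lat1) / 2) ** 2 + cos(lat1) * cos(lat2) * sin((lon2 - lon1) / 2) ** 2
    return 2 * 6371.0 * asin(sqrt(h))

def apply_batch(clusters, batch, max_km=5.0):
    """Place each (photo_id, lat, lon) in the nearest cluster within max_km,
    creating a new cluster when none qualifies.

    clusters maps cluster id -> {"centroid": (lat, lon), "photo_ids": [...]}
    and is updated in place. Returns the set of cluster ids that changed.
    """
    touched = set()
    next_id = max(clusters, default=0) + 1
    for photo_id, lat, lon in batch:
        best_id, best_d = None, max_km
        for cid, c in clusters.items():
            d = haversine_km(c["centroid"], (lat, lon))
            if d <= best_d:
                best_id, best_d = cid, d
        if best_id is None:
            best_id = next_id
            next_id += 1
            clusters[best_id] = {"centroid": (lat, lon), "photo_ids": []}
        c = clusters[best_id]
        n = len(c["photo_ids"])
        # Running mean of lat/lon; a simplification that ignores the antimeridian.
        c["centroid"] = ((c["centroid"][0] * n + lat) / (n + 1),
                         (c["centroid"][1] * n + lon) / (n + 1))
        c["photo_ids"].append(photo_id)
        touched.add(best_id)
    return touched

# Made-up existing state: one cluster near Reading, one near Manchester.
clusters = {
    1: {"centroid": (51.454, -0.973), "photo_ids": [10, 11]},
    2: {"centroid": (53.480, -2.242), "photo_ids": [12]},
}
batch = [(20, 51.455, -0.972), (21, 55.953, -3.188)]
touched = apply_batch(clusters, batch)

# Priming as a special case: the same call starting from no clusters.
primed = {}
primed_touched = apply_batch(primed, batch)
```

The linear scan over all clusters is only for clarity; with millions of photos a spatial index or a bounding-box query in the database would be the natural replacement.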

The aim would be to have every photo placed in one or more clusters, and ideally clusters should be somewhere on the order of 5-200 images. If a cluster grows much beyond 200, it should be a candidate for splitting. Ideally each cluster should have a label that describes it, e.g. "photos near Reading".
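A minimal sketch of the size check and one naive way to split an oversized cluster. The 5 and 200 limits come from the brief; the function names and the median-split rule are assumptions chosen for illustration only.

```python
def size_status(n, min_size=5, max_size=200):
    """Classify a cluster by member count: 'small', 'ok' or 'split'."""
    if n < min_size:
        return "small"
    if n > max_size:
        return "split"
    return "ok"

def split_in_two(points):
    """Split (photo_id, lat, lon) tuples at the median of the wider axis.

    Spread is compared in raw degrees, which is crude because a degree of
    longitude shrinks with latitude; adequate for a sketch only.
    """
    lats = [p[1] for p in points]
    lons = [p[2] for p in points]
    axis = 1 if (max(lats) - min(lats)) >= (max(lons) - min(lons)) else 2
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return ordered[:mid], ordered[mid:]

# Made-up cluster of 250 photos strung out along a north-south line.
points = [(i, 51.0 + i * 0.001, -1.0) for i in range(250)]
status = size_status(len(points))
south, north = split_in_two(points)
```

In practice the split would more likely fall on another dimension first (photographer, date, tag), with the geographic split as a fallback.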

If K-means or similar is used to cluster geographically, it should be an adaptive algorithm that does not require K to be specified, i.e. it works out a good number of clusters to create rather than aiming to create, say, 30 clusters. [url removed, login to view]~wilkinson/Applets/[url removed, login to view]
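One way to avoid fixing K up front is a bisecting approach: start with a single cluster and keep splitting any cluster whose radius exceeds a threshold, using a 2-means step for each split. This is a sketch of that idea, not necessarily the method the linked applet demonstrates. It assumes the points are already planar coordinates in kilometres (e.g. grid eastings/northings), and `max_radius` is a hypothetical tuning parameter.

```python
from math import dist

def centroid(points):
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)

def radius(points):
    """Distance from the centroid to the farthest member."""
    c = centroid(points)
    return max(dist(c, p) for p in points)

def two_means(points, iterations=10):
    """Split points into two groups with a deterministic 2-means."""
    c = centroid(points)
    a = max(points, key=lambda p: dist(c, p))  # seed 1: farthest from centroid
    b = max(points, key=lambda p: dist(a, p))  # seed 2: farthest from seed 1
    left, right = [], []
    for _ in range(iterations):
        new_left = [p for p in points if dist(p, a) <= dist(p, b)]
        new_right = [p for p in points if dist(p, a) > dist(p, b)]
        if not new_left or not new_right:
            break  # keep the last valid split
        left, right = new_left, new_right
        a, b = centroid(left), centroid(right)
    return left, right

def adaptive_clusters(points, max_radius):
    """Bisect until every cluster has radius <= max_radius; K is an output."""
    pending, done = [list(points)], []
    while pending:
        group = pending.pop()
        if len(group) < 2 or radius(group) <= max_radius:
            done.append(group)
            continue
        left, right = two_means(group)
        if not left or not right:
            done.append(group)  # could not be split further
        else:
            pending.extend([left, right])
    return done

# Made-up points forming three well-separated blobs.
points = [(0, 0), (0.5, 0.2), (0.3, 0.4), (10, 10), (10.4, 10.1), (30, 0), (30.2, 0.3)]
groups = adaptive_clusters(points, max_radius=2.0)
```

Here the number of clusters falls out of the radius threshold, so tuning means adjusting a distance rather than guessing a count.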

A sample dataset can be supplied (say a table of 120,000 images), but the 'full' data set of [url removed, login to view] images could be used too. For a tiny sample, showing the range of columns available, see [url removed, login to view]

It can be written in any language (PHP, Python, Java, etc.), but needs to be able to run fairly self-contained on a Linux server. MySQL would be the ideal backing database (downloading the data from MySQL and creating the clusters in a MySQL table), but others can be considered if they offer a tangible benefit (e.g. PostgreSQL/PostGIS).
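To make the read-from-table, write-to-table flow concrete, here is a small sketch of a possible cluster storage layout. It uses Python's built-in `sqlite3` purely as a stand-in so that it runs without a server; the brief asks for MySQL, where the driver and placeholder style would differ. All table and column names are hypothetical. The link table allows a photo to sit in more than one cluster.

```python
import sqlite3

# In-memory SQLite database as a stand-in for the MySQL server.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE photo (
    photo_id INTEGER PRIMARY KEY,
    photographer TEXT,
    taken TEXT,
    lat REAL,
    lon REAL
);
CREATE TABLE cluster (
    cluster_id INTEGER PRIMARY KEY,
    label TEXT
);
-- Many-to-many link: one photo may belong to several clusters.
CREATE TABLE cluster_photo (
    cluster_id INTEGER,
    photo_id INTEGER,
    PRIMARY KEY (cluster_id, photo_id)
);
""")

# Made-up sample rows.
conn.executemany(
    "INSERT INTO photo VALUES (?, ?, ?, ?, ?)",
    [
        (1, "joe", "2012-04-01", 51.454, -0.973),
        (2, "joe", "2012-04-02", 51.456, -0.970),
        (3, "ann", "2012-04-01", 51.455, -0.972),
    ],
)

def save_cluster(conn, label, photo_ids):
    """Insert one cluster row plus its membership rows; return the new id."""
    cur = conn.execute("INSERT INTO cluster (label) VALUES (?)", (label,))
    cluster_id = cur.lastrowid
    conn.executemany(
        "INSERT INTO cluster_photo VALUES (?, ?)",
        [(cluster_id, pid) for pid in photo_ids],
    )
    return cluster_id

rows = conn.execute(
    "SELECT photo_id FROM photo WHERE photographer = ? ORDER BY photo_id", ("joe",)
).fetchall()
joe_cluster = save_cluster(conn, "taken by joe", [r[0] for r in rows])
member_count = conn.execute(
    "SELECT COUNT(*) FROM cluster_photo WHERE cluster_id = ?", (joe_cluster,)
).fetchone()[0]
```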

The full source code, and the means to compile/run it, will be required. The eventual aim would be to release the source as open source (keep the credit yourself, or assign it to us).

To be clear, the requirement is not to come up with the perfect clustering system; as noted, the clusters are subjective. The requirement is to build the framework, with a working clustering method, in such a way that the exact parameters can be tweaked as required.

Skills: Algorithm, Big Data, Data Mining, Data Processing, Software Architecture

Project ID: #4405177

Awarded to:

diegoforteza

Hi, I'm a statistician from Uruguay and I have plenty of experience in data mining and data analysis.

£935 GBP in 30 days
(1 Review)
2.1

10 freelancers are bidding on average £734 for this job

hsndehghan

Hi, I can help you.

£735 GBP in 3 days
(1 Review)
2.7
luchenggang

Hello, I am greatly interested in working for you; any difficulties in image clustering will be no problem for me.

£750 GBP in 13 days
(1 Review)
2.8
rameshrajan29

Hello, We specialize in Image clustering and will be able to complete the task as per your specifications. Please find complete details over PM. We hope to hear from you at the earliest.

£770 GBP in 20 days
(1 Review)
2.4
romanuwa

I can do this task for you, please check PMB.

£525 GBP in 15 days
(3 Reviews)
2.1
hiddenboy

I can help you in your project.

£400 GBP in 25 days
(1 Review)
1.0
sandyshankar

Hi Barry, I run a company specializing in Machine Learning and Social Media Analysis. I have submitted a detailed approach in my private message. Thank you.

£1100 GBP in 30 days
(0 Reviews)
0.0
mephistopheies

Hi, we are a team of data scientists: http://neuroximator.com/. There are several ways to do it: Kohonen neural networks, hierarchical clustering and so on. Testing can show the best way.

£825 GBP in 21 days
(0 Reviews)
0.0
MujtabaAwan

We have developers with skills required to do this project and can provide you best solution in php or python.

£550 GBP in 25 days
(0 Reviews)
0.0
kumarkapil343

Hire me! I have a master's in CS and have been working at HCL for the past 5 years. I can do this for you in 30 days.

£750 GBP in 30 days
(0 Reviews)
0.0