Statistical Web Intelligence

The aim is to evaluate one or more ways of encoding unstructured text so that sensible reasoning can be done about page contents or the relationships between different web pages.

A growing number of "web intelligence" research ideas (as well as existing applications) depend on being able to reason about the content of web pages based purely on the statistics of the words they contain. Understanding pages through natural language processing is extraordinarily difficult and has so far had only minor success on unstructured free text. However, deciding whether two web pages are about similar topics *can* be done, based on "bag of words" statistics. There are many research issues and unanswered questions here, and projects in this line will address them. In all cases the student will likely need to write a simple parser that finds all the words in a page and their frequencies. Any given web page can then be converted to a real-number vector.
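As a minimal sketch of the idea above (not a prescribed implementation), the Python below counts the words in a page, turns the counts into a vector over a shared vocabulary, and compares pages with cosine similarity. The function names, the crude regex tag stripping, and the three sample pages are all assumptions made for illustration; a real project would likely use a proper HTML parser and a weighting scheme of its own choosing.

```python
import math
import re
from collections import Counter


def word_counts(html):
    """Crudely strip tags, then count lower-cased alphabetic words."""
    text = re.sub(r"<[^>]+>", " ", html)  # assumption: regex stripping is good enough here
    return Counter(re.findall(r"[a-z]+", text.lower()))


def to_vector(counts, vocabulary):
    """Map word counts onto a fixed vocabulary order, giving a real-number vector."""
    return [float(counts[word]) for word in vocabulary]


def cosine(u, v):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


# Hypothetical sample pages, invented for this example.
page_a = "<html><body><h1>Cats</h1><p>Cats chase mice.</p></body></html>"
page_b = "<p>Mice fear cats.</p>"
page_c = "<p>Stock prices fell sharply.</p>"

counts = [word_counts(page) for page in (page_a, page_b, page_c)]
vocabulary = sorted(set().union(*counts))  # every word seen in any page
vec_a, vec_b, vec_c = (to_vector(c, vocabulary) for c in counts)

print(cosine(vec_a, vec_b))  # pages sharing words score above zero
print(cosine(vec_a, vec_c))  # pages with no words in common score zero
```

Raw counts are used here only to keep the sketch short; the choice of weighting is one of the open questions the project is meant to explore.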

(1) find good ways to visualise a set of pages in two dimensions, using a Self-Organising Map or a genetic algorithm to optimise the clustering of the pages.

(2) build vectors based on *pairs* of words rather than single words, which may lead to better clustering of pages.

(3) investigate the accuracy of a variety of machine learning methods (e.g. decision trees, and/or other methods that are readily available in Weka) for classifying pages into categories based on their vector encodings.

(4--10) many other possibilities.
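To illustrate option (2), here is a small sketch of counting word pairs instead of single words. It assumes "pairs" means adjacent words, which the brief does not specify; pairs co-occurring anywhere in a page would be another reading. The helper names and the two sample texts are hypothetical.

```python
import re
from collections import Counter


def words(text):
    """Lower-cased alphabetic tokens, in page order."""
    return re.findall(r"[a-z]+", text.lower())


def pair_counts(text):
    """Count adjacent word pairs (assumption: 'pairs' means neighbouring words)."""
    tokens = words(text)
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical texts: they share two single words but no adjacent pair.
doc_a = "machine learning for web pages"
doc_b = "learning machine repair"

single_a, single_b = Counter(words(doc_a)), Counter(words(doc_b))
pairs_a, pairs_b = pair_counts(doc_a), pair_counts(doc_b)

shared_words = set(single_a) & set(single_b)
shared_pairs = set(pairs_a) & set(pairs_b)

print(shared_words)  # the single-word view sees overlap between the texts
print(shared_pairs)  # the pair view sees none, since word order differs
```

Pair vectors keep some word-order information that single-word vectors discard, at the cost of a much larger and sparser vocabulary. Whether that trade-off improves clustering is the question the option poses.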

For test data we will use categorised sets of pages from [url removed, login to view] and/or similar sources.

Skills: Graphic Design, HTML, PHP, Scientific Research, Website Design


About the Employer:
( 0 reviews ) Dubai, United Arab Emirates

Project ID: #4146313

13 freelancers are bidding on average $669 for this job


Hi, just check PM for our portfolio, and read our reviews. They will be your guidance (why we are best for your project).

$750 USD in 12 days
(1162 Reviews)

Freshwebsites is a design and web development company, technology services and branding company. Combining unparalleled experience, comprehensive capabilities across the design and development industry and business fun…

$1000 USD in 15 days
(26 Reviews)

We have all the required designers and developers you are looking for and can allocate dedicated resources to your project. You will be very happy and satisfied with our services.

$1000 USD in 18 days
(70 Reviews)

Dear Customer! I am an expert PHP/MySQL developer with over 6 years of experience and very interested in this project. Available to start immediately and finish as soon as possible. My bid is for fast professional s…

$750 USD in 7 days
(146 Reviews)

Hi, Experts team of SEO/Wordpress/PHP/joomla/Drupal developers and designers. Please check PM for detail. Thanks, Gaurav

$700 USD in 10 days
(108 Reviews)

Hello Sir, I carefully read your requirements. Please check your PMB.

$700 USD in 12 days
(20 Reviews)

please check inbox

$600 USD in 40 days
(5 Reviews)

Hello, please check your PMB. Thanks, have a lovely day.

$700 USD in 5 days
(15 Reviews)

Hello, Please go through the PMB.

$600 USD in 12 days
(10 Reviews)

hi let's start to complete the work, thanks

$550 USD in 10 days
(7 Reviews)

Respected Sir! I've read your requirements and I'm ready to work with you. Although we're new to freelancer, we've worked a lot in local markets, so all I want is for you to trust me, and I won't let you down... for…

$450 USD in 10 days
(0 Reviews)

Hi, we are a team of data scientists [url removed, login to view]. We can try to solve your problem with Restricted Boltzmann Machines and neural networks. We have our own data mining library with implementations of state-of-…

$750 USD in 14 days
(0 Reviews)

Hello, I am an experienced statistician who is also fluent in PHP and HTML. Please see your personal message board for more details. Regards, Steve Long

$750 USD in 28 days
(0 Reviews)

Hi, I have knowledge of econometrics and time-series analysis and I think it will be useful for this project. PM sent.

$250 USD in 15 days
(0 Reviews)

Dear Sir, I am a PhD researcher working in an area very close to your project. I can try various ML techniques such as TF/IDF (Cosine Similarity Metric and others) to cluster or even rank the pages using other techniques.

$750 USD in 20 days
(0 Reviews)