Closed

PHP script to build database of most common words on internet

This project was awarded to hiddenpearls for $247 USD.

Project Budget: $30 - $250 USD
Total Bids: 9
Project Description

Implement a PHP class that takes a single web page address (URL) as input. When called, it must download the page, extract its text content (i.e. strip all HTML tags, JavaScript, etc.), and update a database table with information about the most common words found, in relation to the web page's TLD. For example, if the system is called with the input URL "[url removed, login to view]", the TLD would be ".com", and if it is called with "[url removed, login to view]", the TLD would be ".[url removed, login to view]". The list of possible TLDs can be found, for example, at: [url removed, login to view]
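The text-extraction and TLD steps described above could be sketched roughly as follows. This is only an illustration under stated assumptions: the function names `extractTld` and `htmlToText` are hypothetical, the TLD is taken naively as the last label of the host name and returned without the leading dot (a real implementation might check it against the official TLD list), and downloading the page is left out.

```php
<?php
/**
 * Return the last label of the URL's host in lower case, or null if none can be found.
 * Assumption: the naive "last label" rule is good enough for this sketch.
 */
function extractTld(string $url): ?string
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return null;
    }
    $host = rtrim(strtolower($host), '.');
    $pos = strrpos($host, '.');
    if ($pos === false) {
        return null;
    }
    $tld = substr($host, $pos + 1);
    // Reject IP addresses and anything that is not purely alphabetic.
    return preg_match('/^[a-z]{2,}$/', $tld) === 1 ? $tld : null;
}

/**
 * Reduce an HTML document to plain text.
 */
function htmlToText(string $html): string
{
    // Drop <script> and <style> elements together with their contents.
    $html = preg_replace('#<(script|style)\b[^>]*>.*?</\1\s*>#is', ' ', $html) ?? $html;
    // Put a space before every tag so adjacent blocks do not merge into one word.
    $html = str_replace('<', ' <', $html);
    $text = strip_tags($html);
    $text = html_entity_decode($text, ENT_QUOTES | ENT_HTML5, 'UTF-8');
    // Collapse runs of whitespace.
    return trim(preg_replace('/\s+/', ' ', $text) ?? $text);
}
```

With the placeholder address `http://www.example.fi/` (not taken from the posting), `extractTld` would return `fi`. Regex-based stripping is a simplification; a DOM parser would cope better with malformed markup.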

In other words, the system will build up a large database table that records the most common words found on web pages under different TLDs. Word matching is case-insensitive. A word is defined as a string of 2-100 characters consisting only of the letters A to Z.
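One way to read the word definition above is sketched below. The function name `extractUniqueWords` is hypothetical, and the sketch assumes that every character outside A to Z (digits, punctuation, accented letters) acts as a separator, which the description does not state explicitly.

```php
<?php
/**
 * Return the distinct words of a page, lower-cased.
 * A word is a run of 2-100 letters a-z; anything else separates words.
 */
function extractUniqueWords(string $text): array
{
    $text = strtolower($text);
    // Split on every run of characters that are not a-z.
    $tokens = preg_split('/[^a-z]+/', $text, -1, PREG_SPLIT_NO_EMPTY);
    $words = [];
    foreach ($tokens as $token) {
        $length = strlen($token);
        if ($length >= 2 && $length <= 100) {
            // Using the word as an array key removes duplicates.
            $words[$token] = true;
        }
    }
    return array_keys($words);
}
```

Splitting first and checking the length afterwards means that a run of more than 100 letters is discarded as a whole instead of being cut into a 100-letter "word" plus a remainder.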

The table must record how many times each word has been found and the date on which it was last found.

So, the fields of the table could be:
id - integer - auto increment
word - varchar(100)
hits - integer (number of times this word has been found in different web pages)
last_hit - date (the date when this word was last found)
country - varchar(2) (the two-letter ISO code of the country of the web page on which this word was found)
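The field list above might translate into a MySQL table definition along these lines. The table name `common_words`, the helper name `commonWordsTableSql`, the `UNSIGNED` and `NOT NULL` modifiers, and the unique key over (word, country) are all assumptions added for this sketch; the description does not say how a generic TLD such as ".com" should be stored in the two-letter country column, so that is left open here.

```php
<?php
/**
 * Return a CREATE TABLE statement matching the suggested fields.
 * The unique key is an assumption: it lets each (word, country) pair
 * occupy exactly one row, so its hit counter can be updated in place.
 */
function commonWordsTableSql(): string
{
    return <<<'SQL'
CREATE TABLE IF NOT EXISTS common_words (
    id       INT UNSIGNED NOT NULL AUTO_INCREMENT,
    word     VARCHAR(100) NOT NULL,
    hits     INT UNSIGNED NOT NULL DEFAULT 0,
    last_hit DATE         NOT NULL,
    country  VARCHAR(2)   NOT NULL,
    PRIMARY KEY (id),
    UNIQUE KEY uq_word_country (word, country)
) ENGINE=InnoDB
SQL;
}
```

The statement could be run once against the development database, for example with `$pdo->exec(commonWordsTableSql());` on an existing PDO connection.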

The hits value is increased by one each time the word is found on a different page. In other words, if a web page contains the word "foobar" 10 times, it is still added to the table with a hit count of 1. When the word "foobar" is then found on another web page from the same country, the hits counter is increased by one.
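The counting rule above can be illustrated without a database. In this sketch the function `applyPageHits`, the array layout, and the constant `UPSERT_SQL` are hypothetical; the SQL string shows how the same rule might be expressed in MySQL, assuming a table named `common_words` with a unique key over (word, country).

```php
<?php
// Possible MySQL form of one hit, run once per distinct word on a page.
// Assumes a UNIQUE key over (word, country); not part of the original description.
const UPSERT_SQL = 'INSERT INTO common_words (word, hits, last_hit, country)
    VALUES (:word, 1, CURDATE(), :country)
    ON DUPLICATE KEY UPDATE hits = hits + 1, last_hit = CURDATE()';

/**
 * In-memory model of the hit counter.
 * $table maps "country|word" to a row; the updated table is returned.
 */
function applyPageHits(array $table, array $pageWords, string $country, string $date): array
{
    // Repeats within one page count only once.
    foreach (array_unique($pageWords) as $word) {
        $key = $country . '|' . $word;
        if (!isset($table[$key])) {
            $table[$key] = ['word' => $word, 'country' => $country, 'hits' => 0, 'last_hit' => $date];
        }
        $table[$key]['hits']++;
        $table[$key]['last_hit'] = $date;
    }
    return $table;
}
```

Feeding the function one page that contains "foobar" ten times leaves the word with a hit count of 1; a second page from the same country that also contains it raises the count to 2, while a page from another country starts a separate row.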

I will use the hits and last_hit data to prune the table so that it does not grow too big. I want to build a table of the most common words found online, not of all words. The job must be implemented with an object-oriented design using PHP classes. Use MySQL for the database. You must develop the script on your own server; you will not be given any access to the production server.
