In Progress

popular property keywords

FAO gangabass

We would like to analyse which words are most popular in estate agents' property descriptions. Please can you implement a Perl application [url removed, login to view]

The data will be stored in a MySQL table, site_house, which will include two columns: first_house_loader_id and full_desc. I’ll supply some sample data later.

The DBI/DBD::mysql library should be used to connect to the database. Please implement a function connecttodb() (returns $dbh) which I can later override with our existing function. The MySQL user/password/db should be hard-coded within connecttodb().

The application should iterate through all the first_house_loader_ids and choose the longest available full_desc for each one. In some cases no full_desc is set for a first_house_loader_id, in which case that first_house_loader_id should be ignored and not counted in any stats.
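The selection rule above can be sketched as follows. This is only an illustration in Python of the logic, not the requested Perl deliverable. The function name longest_descriptions and the sample rows are hypothetical, and two details are assumptions the brief does not settle: length is measured on the raw full_desc (before any HTML conversion), and on a tie the first description seen is kept.

```python
def longest_descriptions(rows):
    """Map each first_house_loader_id to its longest non-empty full_desc.

    rows: iterable of (first_house_loader_id, full_desc) pairs;
          full_desc may be None or empty.
    Returns a dict; ids with no usable full_desc are left out entirely.
    """
    longest = {}
    for loader_id, full_desc in rows:
        if not full_desc or not full_desc.strip():
            continue  # unset description: ignored, never counted in stats
        # Assumption: compare raw lengths; strict ">" keeps the first on a tie.
        if len(full_desc) > len(longest.get(loader_id, "")):
            longest[loader_id] = full_desc
    return longest


# Hypothetical sample data standing in for the site_house table.
sample_rows = [
    (1, "Two bed flat."),
    (1, "Spacious two bedroom flat close to the station."),
    (2, None),
    (2, ""),
    (3, "Detached house."),
]
chosen = longest_descriptions(sample_rows)
```

Here id 2 has no usable description, so it does not appear in chosen and would not count towards the number of descriptions tested.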

full_desc may contain HTML. We need to ensure we convert from HTML to text, including converting HTML special characters (entities such as &amp;) to text. Please implement a function htmltotext() which I can later override with our existing function.
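A minimal sketch of what htmltotext() needs to do, written in Python for illustration only (the deliverable is Perl, and the employer's existing function may behave differently). It uses the standard library's html.parser; the helper class _TextExtractor is hypothetical. Replacing each tag with a space, so that words in adjacent elements do not run together, is an assumption. This sketch does not treat script or style contents specially.

```python
from html.parser import HTMLParser


class _TextExtractor(HTMLParser):
    """Collects text content; each tag becomes a space (assumption)."""

    def __init__(self):
        # convert_charrefs=True turns entities such as &amp; into text.
        super().__init__(convert_charrefs=True)
        self.parts = []

    def handle_starttag(self, tag, attrs):
        self.parts.append(" ")

    def handle_endtag(self, tag):
        self.parts.append(" ")

    def handle_data(self, data):
        self.parts.append(data)


def htmltotext(html):
    """Input: an HTML fragment. Output: plain text, whitespace collapsed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()  # flushes any text still buffered by the parser
    return " ".join("".join(parser.parts).split())


plain = htmltotext("<p>Large garden &amp; garage</p><p>Fish &amp; chips</p>")
```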

The app should remove:

1) characters that aren't part of words - but may "connect" words together without a space character, e.g. (),.^!:;*+-/"@_\?

2) all 1 and 2 letter words (e.g. a, in, an, etc.) - and all digits only words (e.g. phone numbers).

3) apostrophes
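The three removal rules above might be combined along these lines. This is a Python illustration of the logic rather than the Perl implementation, and the name clean_words is hypothetical. It makes three assumptions beyond the brief: connecting characters are replaced by a space (so "well-presented" becomes two words), apostrophes are deleted outright (so "agent's" becomes "agents"), and words are lower-cased before counting.

```python
import re

# Rule 1: characters that may join words without a space.
CONNECTORS = re.compile(r'[(),.^!:;*+\-/"@_\\?]')
# Rule 3: straight and typographic apostrophes.
APOSTROPHES = re.compile("['\u2019]")


def clean_words(text):
    """Input: plain text. Output: list of words that survive rules 1-3."""
    text = APOSTROPHES.sub("", text.lower())  # assumption: case-insensitive
    text = CONNECTORS.sub(" ", text)          # assumption: split, not join
    # Rule 2: drop 1 and 2 letter words and digits-only words.
    return [w for w in text.split() if len(w) > 2 and not w.isdigit()]


words = clean_words(
    "A well-presented 3 bed house, in St. John's Road (call 01223 123456)!"
)
```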

The application should output the results to ~/logs/top_description_words/[url removed, login to view]

The application should report on the top 1000 [configurable] most common words. For each word it should report on the number of full_descs (tested) in which the word appears. The application should report on how many full_descs were tested.
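The reporting requirement counts descriptions, not occurrences: a word repeated within one full_desc still counts once. A sketch of that counting in Python, for illustration only; the name top_words is hypothetical, and breaking ties alphabetically is an assumption made so the output is repeatable.

```python
from collections import Counter


def top_words(word_lists, limit=1000):
    """Input: one list of cleaned words per tested full_desc.

    Output: (number of descriptions tested, [(word, descriptions containing
    it), ...]) limited to the `limit` most common words.
    """
    counts = Counter()
    tested = 0
    for words in word_lists:
        tested += 1
        counts.update(set(words))  # set(): count each word once per description
    # Assumption: ties are ordered alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return tested, ranked[:limit]


tested, ranked = top_words(
    [["garden", "garden", "kitchen"], ["garden"], ["kitchen", "garage"]],
    limit=2,
)
```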

After the first run, we may find that we want to group together singulars/plurals or synonyms. So the application needs a feature where we can hard-code synonyms in a hash, count them as one word, and report the grouped values together at the end. For example, the first run's output might include:

property 66

properties 54

Having tied these together, the second run's output might include:

property/properties 92

(92 is less than 66+54 because some descriptions will have contained both words)
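The reason the grouped figure is lower than the sum can be seen in a small sketch. It is written in Python for illustration (the deliverable would use a Perl hash), and the SYNONYMS table and function names are hypothetical. Each variant is mapped to one canonical word before the per-description set is built, so a description containing both forms is counted once.

```python
from collections import Counter

# Hypothetical hard-coded table: variant -> canonical word.
SYNONYMS = {"properties": "property"}


def count_with_synonyms(word_lists, synonyms):
    """Count descriptions per word, treating synonyms as a single word."""
    counts = Counter()
    for words in word_lists:
        # Map first, then de-duplicate, so both forms count once together.
        counts.update({synonyms.get(w, w) for w in words})
    return counts


def label(canonical, synonyms):
    """Report label such as 'property/properties' for a grouped word."""
    variants = sorted(v for v, c in synonyms.items() if c == canonical)
    return "/".join([canonical] + variants)


docs = [["property"], ["properties"], ["property", "properties"]]
grouped = count_with_synonyms(docs, SYNONYMS)
ungrouped = count_with_synonyms(docs, {})
```

With these three sample descriptions, the ungrouped counts are 2 and 2, while the grouped count is 3, not 4, because the third description contains both words.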

The code needs to be well commented and well laid out to demonstrate that the coder is skilful enough to help with further projects. The app should include header comments explaining the aim and outline mechanism of the application. Each subroutine should come with comments explaining inputs and outputs.

Thanks,

Ben Horton

Skills: Perl


About the Employer:
( 5 reviews ) Cambridge, United Kingdom

Project ID: #1236786

Awarded to:

gangabass

As we discussed...

£60 GBP in 2 days
(144 Reviews)
5.9

2 freelancers are bidding on average £65 for this job

ananya098

Hi, I can do this and I'm sure you will not be disappointed with the quality of work. Looking forward to working with you. Thanks, Ananya

£70 GBP in 15 days
(0 Reviews)
0.0