I have dataset of 500 000 over rows of data. Each row has the following columns of data:
1) a person's name (first name OR last name),
2) this person's ethnicity. There are a total 18 different ethcnicity groups in the data. Such as "english", "chinese", "russian", "indian", "german", "latino" etc.
3) popularity of this name among this ethnicity. That is, how many times our system has detected this name of a person and this person was from this ethnicity group.
4) popularity of this name among other ethnicity. That is, how many times our system has detected this name of a person and this person was from other ethnicity group.
A sample of the dataset is attached to this task as a CSV file.
Your job is to create a program or a script that will take as an input a name of a person (first name and last name, or only one name element) and it will output its guess as to what ethnicity group this person belongs to based on the training of the dataset AND a confidence or probability number which tells how sure the system is of this ethnicity being the correct answer.
For example, should the program receive input "john smith", it should output ethnicity class "english" and a confidence number as to how sure the system is "john smith" is "english".
Thus, this is basically a kind of classic string classification problem.
The code must be implemented in a way it can guess the ethnicity of person whose name does not exist in the dataset. In other word, the code must be some sort of learning system (such as artificial neural network system which has been trained using the sample dataset), OR it uses other ways to extract traits from the names which hint as to which ethnicity a person most likely is, for example that of n-grams, bayesian analysis or something else.
The code must not be simply a search algorithm which searches the dataset against hits and in case there are no hits (e.g. if name 'john' does not exist in the dataset but user input is 'john', the system cannot produce any guess that 'john' sounds like "english" name).
The code should be done in either PHP 7.x, or in a way it can be called from PHP script (e.g. Perl or Python script, for example).
In your bid, please tell me what kind of method you would use.
Hello, I have worked with NLTK problems before, so I believe the job won't be a problem for me. Please check my profile and feel free to ask me anything! I would use Neural network for predictor. Regards Žiga
20 freelancers are bidding on average $209 for this job
hii sir How are you doing I have good experience in this field and i can do your work in best possible way, kindly text me so that we can discuss the work in more details thanks ...........
hi sir i am computer engineer an as well a certified labview developer so i am intersted to do that in Labview and in general i am intersted in this type of work so if you accept we can cooperate