Script identifier for Unicode strings in Perl - Linux

CLOSED
Bids: 10
Avg Bid (USD): $55
Project Budget (USD): $10 - $30

Project Description:
I want to build a Perl script to identify the "script" that a particular UTF-8 string is written in. For example, given the strings:

دجنبر --> Arabic
децембар --> Cyrillic
João --> Latin
נאָװעמבער --> Hebrew
กันยายน --> Thai
цембJair --> Mixed

The script should do this by looking at the "Script" property of each character and checking whether they all belong to the same script, in which case it reports that script's name. If the string mixes two or more scripts, it should return "Mixed".
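
The per-character check described above could be sketched in pure Perl as follows. This is only an illustration under stated assumptions, not the requested deliverable: it uses charscript() from the core Unicode::UCD module rather than an external program, the helper name script_of is hypothetical, and it assumes the input has already been decoded from UTF-8 into a Perl character string. Skipping characters whose script is "Common" or "Inherited" (spaces, punctuation, combining marks) and returning "Unknown" for a string with no script-specific characters are both assumptions that the description does not spell out.

```perl
use strict;
use warnings;
use utf8;                           # string literals in this file are UTF-8
use Unicode::UCD qw(charscript);    # core module, no external program needed

# Hypothetical helper: returns the one script shared by all characters,
# or "Mixed" when more than one script is present.
sub script_of {
    my ($text) = @_;                # expects a decoded character string
    my %seen;
    for my $ch (split //, $text) {
        my $script = charscript(ord $ch);
        next if !defined $script;
        # Assumption: Common/Inherited characters do not count as a script.
        next if $script eq 'Common' || $script eq 'Inherited';
        $seen{$script} = 1;
    }
    my @scripts = keys %seen;
    return 'Unknown' if !@scripts;  # assumption: nothing script-specific found
    return @scripts == 1 ? $scripts[0] : 'Mixed';
}

print script_of("децембар"), "\n";  # Cyrillic
print script_of("цембJair"), "\n";  # Mixed
```

Input read from the command line or STDIN would first need decoding, for example with Encode::decode('UTF-8', $bytes), before being passed to the helper.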

The best way to get there would be to use the program "uniname" and echo the string into it:

echo กันยายน | uniname -b -g -c -e -r -u -n

and then process the output:

range
Thai
Thai
Thai
Thai
Thai
Thai
Thai
Basic Latin

The processing should eliminate the first line (a header) and the last line (it corresponds to the LINE FEED at the end of the word). If all characters belong to the same range, then report that range. If not, return the word "Mixed".
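
The output processing described above might look like the sketch below. It takes the captured uniname output as a string, so it can be tried without the program installed. The helper name range_from_output and the choice to return undef when no character lines remain are assumptions. One point worth checking before relying on it: the range names in the sample look like Unicode block names ("Basic Latin"), so a word such as João may be reported under more than one range and come back as "Mixed".

```perl
use strict;
use warnings;

# Hypothetical helper: reduce captured uniname output to one range name,
# or "Mixed" when the characters fall into different ranges.
sub range_from_output {
    my ($output) = @_;
    my @lines = grep { length } split /\n/, $output;
    shift @lines;                   # drop the header line ("range")
    pop @lines;                     # drop the entry for the trailing LINE FEED
    return undef if !@lines;        # assumption: no characters, no answer
    my %seen = map { $_ => 1 } @lines;
    return keys(%seen) == 1 ? $lines[0] : 'Mixed';
}

# The output could be captured with backticks, using the flags quoted above:
#   my $out = `echo "$word" | uniname -b -g -c -e -r -u -n`;
my $sample = "range\nThai\nThai\nThai\nBasic Latin\n";
print range_from_output($sample), "\n";   # Thai
```

Interpolating the word into a shell command needs care with quoting; passing it through a file or a pipe opened from Perl would avoid that.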

The program is available from here:

http://billposer.org/Software/unidesc.html

Paulo Ney

Skills required:
Perl