I want to build a Perl script to identify the "script" that a particular UTF-8 string is written in. For example, it should map the following strings to these scripts:
دجنبر --> Arabic
децембар --> Cyrillic
João --> Latin
נאָװעמבער --> Hebrew
กันยายน --> Thai
цембJair --> Mixed
The script should look at the "Script" property of each character and check whether they all belong to the same script; if so, it reports the name of that script. If the string is a mix of two or more scripts, it should return "Mixed".
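The check described above can be sketched without any external tool. What follows is a minimal sketch in Python rather than Perl, and it rests on an assumption: Python's standard library does not expose the Unicode Script property, so the first word of each character's Unicode name (ARABIC, CYRILLIC, LATIN and so on) stands in for it. That works for the example strings but not for every script, for instance scripts with multi-word names. The function name detect_script and the "Unknown" result for input without letters are illustrative choices, not part of the request. In Perl, the real Script property is available, for example through charscript in Unicode::UCD.

```python
import unicodedata


def detect_script(text):
    """Return the one script used by the letters in text, or "Mixed".

    Assumption: the first word of a character's Unicode name is used as
    a stand-in for its Script property (a heuristic, not the real property).
    """
    scripts = set()
    for ch in text:
        # Only letters (L*) and marks (M*) carry a script of their own;
        # digits, punctuation and whitespace are skipped.
        if unicodedata.category(ch)[0] not in ("L", "M"):
            continue
        name = unicodedata.name(ch, "")
        # Skip unnamed characters and generic combining marks, which
        # take on the script of the letter they attach to.
        if not name or name.startswith("COMBINING"):
            continue
        scripts.add(name.split()[0].capitalize())
    if not scripts:
        return "Unknown"  # illustrative choice for input with no letters
    return scripts.pop() if len(scripts) == 1 else "Mixed"


if __name__ == "__main__":
    for word in ["دجنبر", "децембар", "João", "נאָװעמבער", "กันยายน", "цембJair"]:
        print(word, "-->", detect_script(word))
```

Collecting the labels in a set keeps the decision simple: one element means a single script, more than one means "Mixed".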
The best way to get there would be to use the program "uniname", echoing the string into it:
echo กันยายน | uniname -b -g -c -e -r -u -n
and then process the output to eliminate the first line (a header) and the last line (which corresponds to the LINE FEED at the end of the word). If all characters belong to the same range, then report that range. If not, return the word "Mixed".
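The post-processing step could look roughly like the sketch below, again in Python for illustration. It assumes that uniname is installed and that, with the flags shown above, each remaining output line holds nothing but the range name of one character; that format is an assumption and should be checked against the real output. The names run_uniname and range_from_listing are hypothetical helpers.

```python
import subprocess


def run_uniname(word):
    """Pipe word (plus a trailing newline, as echo does) into uniname.

    Uses the flags from the command line quoted in the posting.
    """
    result = subprocess.run(
        ["uniname", "-b", "-g", "-c", "-e", "-r", "-u", "-n"],
        input=(word + "\n").encode("utf-8"),
        capture_output=True,
        check=True,
    )
    return result.stdout.decode("utf-8")


def range_from_listing(listing):
    """Return the single range named in a uniname listing, or "Mixed".

    Assumption: one range name per line, the first line is the header
    and the last line belongs to the trailing LINE FEED.
    """
    lines = listing.splitlines()
    body = lines[1:-1]  # drop the header and the LINE FEED line
    ranges = {line.strip() for line in body if line.strip()}
    if not ranges:
        return "Mixed"  # nothing left to classify; illustrative choice
    return ranges.pop() if len(ranges) == 1 else "Mixed"
```

A call such as range_from_listing(run_uniname("กันยายน")) would then give either one range name or "Mixed". Keeping the parsing separate from the subprocess call makes it possible to test the parsing on saved output.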
The program is part of the uniutils package.
Interesting problem :) I'd just like to work it out. I suppose the uniutils you're referring to are available on the platform where you want to run the code, so this project can use them as they are? Thank you.
Can help... I am an Expert... Please check the past projects I have handled and check my reviews for what employers have to say about my work... Can start right now...