Saturday, December 3, 2016

Unicode method of writing Marathi Characters


About Unicode and UTF-8 Encoding for Devanagari (Marathi)

Unicode is a character set. UTF-8 is one encoding of that character set.

The first 128 characters of Unicode (which correspond one-to-one with ASCII) are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well.
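The ASCII compatibility described above can be checked with a short Python sketch. The helper name utf8_bytes is only illustrative and is not part of any library.

```python
def utf8_bytes(text: str) -> bytes:
    """Encode text as UTF-8 and return the raw bytes."""
    return text.encode("utf-8")

# Plain ASCII text produces exactly the same bytes in ASCII and in UTF-8.
ascii_text = "Marathi"
same_as_ascii = utf8_bytes(ascii_text) == ascii_text.encode("ascii")

# A Devanagari letter (here U+0915) needs three bytes in UTF-8.
ka_bytes = utf8_bytes("\u0915")

print(same_as_ascii, len(ka_bytes))
```

An ASCII letter takes one octet, while each character of the Devanagari block takes three.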

The Devanagari Unicode block runs from U+0900 to U+097F, a total of 128 code points (hexadecimal offsets 00 to 7F added to 0900).
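The size of the block and a simple membership test can be sketched as follows. The constant and function names are assumptions made for this example.

```python
# Bounds of the Devanagari block as stated in the text.
DEVANAGARI_START = 0x0900
DEVANAGARI_END = 0x097F

def is_devanagari(ch: str) -> bool:
    """Return True if a single character lies inside the Devanagari block."""
    return DEVANAGARI_START <= ord(ch) <= DEVANAGARI_END

# Both ends are inclusive, so add 1 to the difference.
block_size = DEVANAGARI_END - DEVANAGARI_START + 1

print(block_size, is_devanagari("\u0915"), is_devanagari("k"))
```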

HTML 4 supports UTF-8. HTML 5 supports both UTF-8 and UTF-16. The code point U+0915, written as the numeric character reference &#x0915; in an HTML page that begins with the <!DOCTYPE html> declaration, will display as क.
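How a browser resolves a numeric character reference for U+0915 can be imitated with Python's standard html module. The two reference strings below are my assumption of the usual hexadecimal and decimal forms; they are not quoted from the page itself.

```python
import html

# Hexadecimal and decimal numeric character references for U+0915.
hex_ref = "&#x0915;"
dec_ref = "&#2325;"

# html.unescape resolves character references much as an HTML parser would.
decoded_hex = html.unescape(hex_ref)
decoded_dec = html.unescape(dec_ref)

print(decoded_hex == decoded_dec == "\u0915")
```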

A Devanagari character can be written in a web page in any of three ways:

Unicode code point (from U+0900 to U+097F)

OR

UTF-8 bytes in literal format

(from \xe0\xa4\x80 to \xe0\xa5\xbf)

OR

Equivalent decimal values

(from 2304 to 2431)
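The three notations listed above can be derived from one another. Here is a minimal sketch; the function name representations and the dictionary keys are made up for the example.

```python
def representations(ch: str) -> dict:
    """Return the three notations for a single character."""
    code_point = ord(ch)
    return {
        # Unicode code point, e.g. U+0915
        "code_point": f"U+{code_point:04X}",
        # UTF-8 bytes written as \x escapes
        "utf8_literal": "".join(f"\\x{b:02x}" for b in ch.encode("utf-8")),
        # Decimal value of the code point
        "decimal": code_point,
    }

first = representations("\u0900")
last = representations("\u097f")
print(first)
print(last)
```

Running it on the first and last code points of the block gives the ranges quoted above.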



As shown in my earlier blog post, each Unicode character for Marathi is identified by a single code point in the range U+0900 to U+097F.

However, a Marathi character is formed by adding a vowel to a consonant, and the Unicode consonant letters already include the vowel अ. To display only the consonant, we have to add the half-character sign ( ् ), known as halant or virama (U+094D), after the letter.

Thus we need two Unicode characters in sequence to display a bare consonant.

Normal method

Consonant + Vowel = Character
क् + अ  = क      

Unicode method 

Character - क represented by U+0915
Consonant - क + ् = क् represented by U+0915 U+094D
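The two-code-point form of a bare consonant can be inspected in Python. This is only a sketch; the constant names KA and VIRAMA are my own, while the character names come from the standard unicodedata module.

```python
import unicodedata

KA = "\u0915"       # the letter क, which carries the inherent vowel अ
VIRAMA = "\u094D"   # the half-character sign ्

# The bare consonant is the letter followed by the sign.
half_ka = KA + VIRAMA

# It looks like one unit on screen but is stored as two code points.
code_points = [f"U+{ord(c):04X}" for c in half_ka]
names = [unicodedata.name(c) for c in half_ka]

print(code_points)
print(names)
```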

We could write all characters of बाराखडी (Barakhadi) in this form, but it is not needed: in Unicode we simply append the respective vowel sign to the consonant character.

Normal method        Unicode method
क् + आ = का          क + ा = का
क् + इ = कि          क + ि = कि
क् + ई = की          क + ी = की
क् + उ = कु          क + ु = कु
क् + ऊ = कू          क + ू = कू
क् + ऋ = कृ          क + ृ = कृ
क् + ए = के          क + े = के
क् + ऐ = कै          क + ै = कै
क् + ओ = को          क + ो = को
क् + औ = कौ          क + ौ = कौ
क् + अं = कं          क + ं = कं
क् + अः = कः          क + ः = कः
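The Unicode method in the right-hand column amounts to appending one sign to the consonant. Here is a small sketch that builds the forms for any consonant; the list VOWEL_SIGNS and the function barakhadi are names chosen for this example, and the signs are the dependent vowel signs, anusvara and visarga from the Devanagari block.

```python
# Dependent vowel signs, then anusvara and visarga.
VOWEL_SIGNS = [
    "\u093E",  # aa
    "\u093F",  # i
    "\u0940",  # ii
    "\u0941",  # u
    "\u0942",  # uu
    "\u0943",  # vocalic r
    "\u0947",  # e
    "\u0948",  # ai
    "\u094B",  # o
    "\u094C",  # au
    "\u0902",  # anusvara
    "\u0903",  # visarga
]

def barakhadi(consonant: str) -> list:
    """Return the consonant alone, then the consonant with each sign appended."""
    return [consonant] + [consonant + sign for sign in VOWEL_SIGNS]

forms = barakhadi("\u0915")
print(" ".join(forms))
```

Passing another consonant, for example "\u0916", gives the same series for that letter.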

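To write a complex (conjunct) character in the Unicode method, the first character is converted to a bare consonant with the half-character sign and the next character is added after it.

क + ् + क = क्क
The same method is used when a complex character is formed from two or more consonants: each consonant except the last is followed by the half-character sign.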
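The rule for complex characters can be sketched in Python by joining consonants with the half-character sign. The function name conjunct is hypothetical.

```python
VIRAMA = "\u094D"  # the half-character sign ्

def conjunct(*consonants: str) -> str:
    """Join consonants so that every one except the last is followed by the sign."""
    return VIRAMA.join(consonants)

# Two consonants: U+0915, U+094D, U+0915
kka = conjunct("\u0915", "\u0915")

# Three consonants (sa, ta, ra) take two signs, five code points in all.
stra = conjunct("\u0938", "\u0924", "\u0930")

print(kka, len(kka))
print(stra, len(stra))
```

How the result is drawn depends on the font, but the stored sequence always follows this pattern.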
