Unicode Conversion from/to String

In Java, a String represents text as Unicode (a sequence of UTF-16 code units). If input or output uses a different encoding, a conversion is required.
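As a rough illustration of why a conversion step is needed, the sketch below encodes the same three-character string in several charsets and reports how many bytes each one produces. The class name `EncodingLengths` and the helper `encodedLength` are made up for this illustration; only standard `java.nio.charset` calls are used.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class EncodingLengths {

    // Hypothetical helper: number of bytes the text occupies in the given charset.
    public static int encodedLength(String text, Charset charset) {
        return text.getBytes(charset).length;
    }

    public static void main(String[] args) {
        String text = "\u00C6\u00D8\u00C5"; // the characters Æ, Ø, Å

        // length() counts UTF-16 code units, not bytes.
        System.out.println("chars      = " + text.length());
        System.out.println("ISO-8859-1 = " + encodedLength(text, StandardCharsets.ISO_8859_1));
        System.out.println("UTF-8      = " + encodedLength(text, StandardCharsets.UTF_8));
        System.out.println("UTF-16BE   = " + encodedLength(text, StandardCharsets.UTF_16BE));
    }
}
```

For this string the counts are 3 chars, 3 bytes in ISO-8859-1, 6 bytes in UTF-8 and 6 bytes in UTF-16BE, which is why the target encoding should always be named explicitly.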

Conversion

The following example encodes a Unicode String as a UTF-8 byte[], decodes those bytes back into a String, and then converts that String to a byte[] using the platform default charset.

Example

import java.nio.charset.StandardCharsets;

public class I18NTester {
   public static void main(String[] args) {
      String unicodeString = "\u00C6\u00D8\u00C5";

      // Encode the Unicode String as UTF-8 bytes
      byte[] utf8Bytes = unicodeString.getBytes(StandardCharsets.UTF_8);
      printBytes(utf8Bytes, "UTF 8 Bytes");

      // Decode the UTF-8 bytes back into a Unicode String
      String converted = new String(utf8Bytes, StandardCharsets.UTF_8);

      // getBytes() without an argument uses the platform default charset
      byte[] unicodeBytes = converted.getBytes();
      printBytes(unicodeBytes, "Unicode Bytes");
   }

   // Print each byte of the array as a signed decimal value
   public static void printBytes(byte[] array, String name) {
      for (int k = 0; k < array.length; k++) {
         System.out.println(name + "[" + k + "] = " + array[k]);
      }
   }
}

Output

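It prints the following result. The three "Unicode Bytes" lines assume a single-byte platform default charset such as ISO-8859-1 or windows-1252; where the default is UTF-8 (the default since Java 18), the program instead prints six "Unicode Bytes" lines with the same values as the UTF-8 bytes.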

UTF 8 Bytes[0] = -61
UTF 8 Bytes[1] = -122
UTF 8 Bytes[2] = -61
UTF 8 Bytes[3] = -104
UTF 8 Bytes[4] = -61
UTF 8 Bytes[5] = -123
Unicode Bytes[0] = -58
Unicode Bytes[1] = -40
Unicode Bytes[2] = -59
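
The values are negative because Java's `byte` type is signed (-128 to 127). Masking a byte with `0xFF` gives its unsigned value, which is easier to compare against an encoding table. The sketch below prints the UTF-8 bytes of the same string in hexadecimal; `HexBytes` and `toHex` are illustrative names, not part of the example above.

```java
import java.nio.charset.StandardCharsets;

public class HexBytes {

    // Illustrative helper: format each byte as two uppercase hex digits.
    public static String toHex(byte[] bytes) {
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            if (sb.length() > 0) {
                sb.append(' ');
            }
            // b & 0xFF widens the signed byte to an int in the range 0..255
            sb.append(String.format("%02X", b & 0xFF));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] utf8Bytes = "\u00C6\u00D8\u00C5".getBytes(StandardCharsets.UTF_8);
        System.out.println(toHex(utf8Bytes)); // prints: C3 86 C3 98 C3 85
    }
}
```

Here -61 corresponds to 0xC3 and -122 to 0x86, so the first two bytes are the UTF-8 encoding of U+00C6.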
