
-----------------------------------
Hikaru79
Tue Aug 02, 2005 7:24 pm

Unicode in Java
-----------------------------------
I'm trying to get Unicode working in Java, and it's not going smoothly. Rizzix, particularly, I'm hoping you can help  :oops: 

Now, a regular FileReader always decodes with the platform default encoding, so I'm using the following object instead:

    in = new BufferedReader(new InputStreamReader(new FileInputStream(confFile), "UTF-8"));

I try to read from a file with that. I have it output both to the console (System.out.println) and to Swing (JOptionPane.showMessageDialog). In both cases, the Unicode segment of the input appears as '????'. I am compiling the code with the following parameters:

    $ javac -encoding UTF-8 Test.java

Nothing seems to work =/ Any ideas? Isn't Java supposed to have Unicode support "from the ground up"?  :?
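
For anyone reading along: the -encoding flag only tells javac how the source file itself is encoded; it has no effect on file or console I/O at run time. One possible cause of '????' on the console side is that System.out encodes with the platform default charset and substitutes '?' for anything it cannot represent. The sketch below reads a stream as UTF-8 and prints through a PrintStream that is explicitly UTF-8. It assumes the input really is UTF-8 and that the terminal can display UTF-8; the class name Utf8Echo and the helper readUtf8 are made up for illustration, and StandardCharsets needs Java 7 or later.

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.PrintStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper class, not from the original post.
class Utf8Echo {
    // Decodes the whole stream as UTF-8; every line is terminated with '\n'.
    static String readUtf8(InputStream raw) {
        StringBuilder sb = new StringBuilder();
        try (BufferedReader in = new BufferedReader(
                new InputStreamReader(raw, StandardCharsets.UTF_8))) {
            String line;
            while ((line = in.readLine()) != null) {
                sb.append(line).append('\n');
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Echo <file>");
            return;
        }
        // System.out uses the platform default charset; wrap it to force UTF-8.
        PrintStream out = new PrintStream(System.out, true, "UTF-8");
        out.print(readUtf8(new FileInputStream(args[0])));
    }
}
```

If the dialog shows '????' as well, the console encoding cannot be the whole story, and the decoding step (that is, the file's actual encoding) is worth checking too.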

-----------------------------------
rizzix
Tue Aug 02, 2005 7:43 pm


-----------------------------------
May I have that file? The one you're trying to read... attach it, please.

-----------------------------------
Hikaru79
Tue Aug 02, 2005 8:22 pm


-----------------------------------
The one I'm trying to read? Sure. Here it is.

Thanks in advance :)

EDIT: Again, this is with UNIX line breaks, so opening it in Notepad will make it look strange.

-----------------------------------
rizzix
Tue Aug 02, 2005 10:12 pm


-----------------------------------
OK, I tried opening your file in a UTF-8 editor. Same results. Your file is not encoded as UTF-8.

Try UTF-16, it should work. Oh, and don't mix ASCII and UTF-16; stick to one format.
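
A rough way to run the same check from Java, instead of eyeballing the file in an editor, is to decode the bytes strictly and see whether the decoder complains. This is only a sketch: the class name Utf8Check and the method isValidUtf8 are invented for this example. Keep in mind that passing the check does not prove the file was meant as UTF-8 (plain ASCII passes too); failing it does show the file is something else.

```java
import java.nio.ByteBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.CharsetDecoder;
import java.nio.charset.CodingErrorAction;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class for illustration.
class Utf8Check {
    // Returns true if the bytes are well-formed UTF-8.
    static boolean isValidUtf8(byte[] data) {
        // REPORT makes the decoder throw instead of silently substituting U+FFFD.
        CharsetDecoder decoder = StandardCharsets.UTF_8.newDecoder()
                .onMalformedInput(CodingErrorAction.REPORT)
                .onUnmappableCharacter(CodingErrorAction.REPORT);
        try {
            decoder.decode(ByteBuffer.wrap(data));
            return true;
        } catch (CharacterCodingException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: java Utf8Check <file>");
            return;
        }
        byte[] data = Files.readAllBytes(Paths.get(args[0]));
        System.out.println(isValidUtf8(data) ? "well-formed UTF-8" : "not UTF-8");
    }
}
```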

-----------------------------------
Hikaru79
Wed Aug 03, 2005 5:52 am


-----------------------------------
Okay, hmm... I'll try that.
rizzix wrote: "Oh, and don't mix ASCII and UTF-16; stick to one format."
Isn't ASCII a subset of UTF16?

-----------------------------------
rizzix
Wed Aug 03, 2005 10:43 am


-----------------------------------
No, it's a subset of UTF-8.
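
To make the distinction concrete: ASCII characters have the same code points (U+0000 to U+007F) in every Unicode encoding, but only UTF-8 stores them as the same single bytes that ASCII does. UTF-16 spends two bytes on each of them. A small sketch, with AsciiSubset and its methods being names invented for this example:

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Hypothetical demo class for illustration.
class AsciiSubset {
    // True if the string's ASCII bytes equal its UTF-8 bytes.
    static boolean asciiMatchesUtf8(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_8));
    }

    // True if the string's ASCII bytes equal its UTF-16 (big-endian, no BOM) bytes.
    static boolean asciiMatchesUtf16(String s) {
        return Arrays.equals(s.getBytes(StandardCharsets.US_ASCII),
                             s.getBytes(StandardCharsets.UTF_16BE));
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_8)));    // [65]
        System.out.println(Arrays.toString("A".getBytes(StandardCharsets.UTF_16BE))); // [0, 65]
        System.out.println(asciiMatchesUtf8("Hello"));  // true
        System.out.println(asciiMatchesUtf16("Hello")); // false
    }
}
```

So an ASCII file can be read as UTF-8 unchanged, but reading it as UTF-16 (or the other way round) produces garbage.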

-----------------------------------
Hikaru79
Wed Aug 03, 2005 5:28 pm


-----------------------------------
rizzix wrote: "No, it's a subset of UTF-8."
And isn't UTF8 a subset of UTF16?  :? Man, I'm confused, since I've never even bothered dealing with internationalization until now. And this time it has to happen right =/

Maybe if I re-ask the question. I'm trying to achieve a model whereby the program can deal with (input/output) any of the following four languages and scripts: English, Chinese, Japanese (kana and kanji), and Korean. Will this be ridiculously difficult? If not, how can it be done? If so, where can I go to find out how it can be done? ^_^;
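
For what it's worth, a Java String can hold all four scripts at once; the part that needs care is naming the charset explicitly whenever text crosses into bytes. Here is a minimal round-trip sketch under that assumption, using an in-memory buffer in place of a real file. CjkRoundTrip, encodeUtf8, and decodeUtf8 are names made up for this example, and the sample text is written with \u escapes so the source compiles whatever the editor's encoding is.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.io.Writer;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
class CjkRoundTrip {
    // Writes the text through a Writer that is explicitly UTF-8.
    static byte[] encodeUtf8(String text) {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        try (Writer w = new OutputStreamWriter(buf, StandardCharsets.UTF_8)) {
            w.write(text);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buf.toByteArray();
    }

    // Turns UTF-8 bytes back into a String.
    static String decodeUtf8(byte[] bytes) {
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        // English, Chinese, Japanese (kana), and Korean in one string.
        String text = "Hello \u4f60\u597d \u3053\u3093\u306b\u3061\u306f \uc548\ub155";
        String back = decodeUtf8(encodeUtf8(text));
        System.out.println(back.equals(text)); // true
    }
}
```

Swapping the buffer for a FileOutputStream / FileInputStream pair gives the same behaviour on disk, as long as both sides name the same charset.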

-----------------------------------
rizzix
Wed Aug 03, 2005 7:14 pm


-----------------------------------
UTF-16 is built on 2-byte units: most characters take 2 bytes, and the rarer ones take 4. UTF-8 is a variable-width format that uses 1 to 4 bytes per character. ASCII is 1 byte as well, so all ASCII characters can be represented in UTF-8, and UTF-8 also ensures that the ASCII characters keep their same old ASCII codes. The rest of the characters are stored as multi-byte sequences of 2 to 4 bytes.
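
The widths are easy to confirm by counting encoded bytes. A small sketch follows; EncodedWidths and its two helpers are names invented for this example, and UTF_16BE is used so that no byte-order mark is counted.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical demo class for illustration.
class EncodedWidths {
    // Number of bytes the string occupies in UTF-8.
    static int utf8Length(String s) {
        return s.getBytes(StandardCharsets.UTF_8).length;
    }

    // Number of bytes the string occupies in UTF-16 (big-endian, no BOM).
    static int utf16Length(String s) {
        return s.getBytes(StandardCharsets.UTF_16BE).length;
    }

    public static void main(String[] args) {
        String[] samples = {
            "A",            // U+0041, ASCII letter
            "\u00e9",       // U+00E9, Latin small e with acute
            "\u65e5",       // U+65E5, a kanji/hanzi character
            "\ud55c",       // U+D55C, a hangul syllable
            "\uD834\uDD1E"  // U+1D11E, musical G clef (outside the 16-bit range)
        };
        for (String s : samples) {
            System.out.println(utf8Length(s) + " bytes in UTF-8, "
                    + utf16Length(s) + " bytes in UTF-16");
        }
        // Prints 1/2, 2/2, 3/2, 3/2, 4/4 (UTF-8/UTF-16) for the five samples.
    }
}
```

For text that is mostly Chinese, Japanese, or Korean, UTF-16 is therefore usually the more compact of the two, while UTF-8 wins for mostly-English text.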

Java has great internationalization support, so it should be easy. Since internationalization is not a critical issue for the common developer, you rarely see any good tutorials on it. Companies like IBM, Apple, Microsoft, etc. do have tutorials, but they usually require you to register or something first. Some of them are not free.

I would suggest you take a look at these articles hosted by Sun:
http://java.sun.com/developer/technicalArticles/Intl/index.html

And then there's this: http://java.sun.com/docs/books/tutorial/i18n/index.html

Hopefully they are of some use to you.

-----------------------------------
Hikaru79
Wed Aug 03, 2005 8:51 pm


-----------------------------------
As always, Rizzix, you have been of great help! ^__^ I looked through those and they look mighty helpful. They're being sent off to the printer now ;) Thanks!

-----------------------------------
Hikaru79
Sat Aug 27, 2005 10:26 am


-----------------------------------
A-ha! Finally, success! :) Thanks, rizzix, problem solved!
http://thegoban.com/images/java_utf8.png

-----------------------------------
rizzix
Sat Aug 27, 2005 1:03 pm


-----------------------------------
cool. maybe you could write a tutorial to share that knowledge.. hmm!  :wink:

-----------------------------------
Hikaru79
Sat Aug 27, 2005 10:34 pm


-----------------------------------
rizzix wrote: "cool. maybe you could write a tutorial to share that knowledge.. hmm!  :wink:"
Deal :D I'll get to it tonight, hopefully.
