Java employs Unicode in the following sense (submitted by Dave Forster):
- The "char" data type is defined to be a Unicode type.
- Strings, since they are composed of char data, are therefore also Unicode-based.
- Java identifiers can contain Unicode characters. You can specify Unicode characters using the \u escape sequence.
Thus, if you read and write data using the typed methods in the DataInputStream and DataOutputStream classes, you will be able to store and retrieve Unicode data (and other Java data types as well).
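As a rough illustration of the point above, here is a minimal round-trip sketch. It assumes an in-memory byte array instead of a file so that it is self-contained; the class name TypedRoundTrip and the method roundTrip are made up for this example, and the length prefix written with writeInt() is just one possible way to know how many chars to read back.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class TypedRoundTrip {
    // Writes the text with writeChars() and reads it back with readChar().
    static String roundTrip(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeInt(text.length());   // record how many chars follow
        out.writeChars(text);          // two bytes per char, high byte first
        out.close();

        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        int count = in.readInt();
        StringBuilder result = new StringBuilder();
        for (int i = 0; i < count; i++) {
            result.append(in.readChar()); // reassembles one char from two bytes
        }
        in.close();
        return result.toString();
    }

    public static void main(String[] args) throws IOException {
        System.out.println(roundTrip("ABCDE").equals("ABCDE")); // prints true
    }
}
```

The same pattern should work with a FileOutputStream and FileInputStream in place of the byte-array streams.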
To write ASCII data you may use the FileOutputStream.write() and DataOutputStream.writeBytes() methods (and similar methods for reading). writeBytes() writes only the low byte of each char and discards the high byte. Note that to create a DataOutputStream, you must first create a FileOutputStream, and then construct the DataOutputStream from that.
    // These fragments assume "import java.io.*;" and an enclosing
    // method that declares "throws IOException".

    // write unicode data as unicode
    FileOutputStream ufos = new FileOutputStream("test.ucd");
    DataOutputStream udos = new DataOutputStream(ufos);
    udos.writeChars("ABCDE");   // writes Unicode (two bytes per char)
    udos.close();

    // write unicode data as ascii
    FileOutputStream xfos = new FileOutputStream("test.xxx");
    DataOutputStream xdos = new DataOutputStream(xfos);
    xdos.writeBytes("ABCDE");   // writes ASCII (low byte of each char)
    xdos.close();

    // write ascii data
    FileOutputStream afos = new FileOutputStream("test.asc");
    byte abytes[] = {65, 66, 67, 68, 69};   // "ABCDE"
    afos.write(abytes);         // writes bytes - in this case ASCII
    afos.close();

Also, the class StreamTokenizer assumes ASCII input. Thus,
    // tokenize an ascii file
    FileInputStream fis = new FileInputStream("test.asc");
    StreamTokenizer tokenizer = new StreamTokenizer(fis);
    int token = tokenizer.nextToken();
    System.out.println("token=" + tokenizer.sval);   // prints "ABCDE"
    fis.close();

    // tokenize an ascii file
    // This also works on a DataInputStream
    fis = new FileInputStream("test.asc");
    DataInputStream dis = new DataInputStream(fis);
    tokenizer = new StreamTokenizer(dis);
    token = tokenizer.nextToken();
    System.out.println("token=" + tokenizer.sval);   // prints "ABCDE"
    dis.close();

    // attempt to tokenize a Unicode file
    fis = new FileInputStream("test.ucd");
    tokenizer = new StreamTokenizer(fis);
    token = tokenizer.nextToken();
    System.out.println("token=" + tokenizer.sval);
    // prints "A" - the tokenizer interprets the high byte of
    // "B" as whitespace
    fis.close();

    // attempt to tokenize a Unicode file by creating a DataInputStream
    fis = new FileInputStream("test.ucd");
    dis = new DataInputStream(fis);
    tokenizer = new StreamTokenizer(dis);
    token = tokenizer.nextToken();
    System.out.println("token=" + tokenizer.sval);
    // prints "A" - the tokenizer interprets the high byte of
    // "B" as whitespace
    dis.close();
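One possible way around the byte-oriented behaviour shown above (not part of the original answer) is to give StreamTokenizer a Reader that decodes the bytes into chars first. The sketch below assumes the data was produced by writeChars(), that is, two bytes per char with the high byte first and no byte-order mark, so it decodes as UTF-16BE. The class name TokenizeUnicode and the method firstToken are made up for this example, and an in-memory buffer stands in for the test.ucd file.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.Reader;
import java.io.StreamTokenizer;
import java.nio.charset.StandardCharsets;

public class TokenizeUnicode {
    // Writes text with writeChars(), then tokenizes it through a Reader
    // so the tokenizer sees whole chars instead of single bytes.
    static String firstToken(String text) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeChars(text);          // same layout as the test.ucd example
        out.close();

        Reader reader = new InputStreamReader(
                new ByteArrayInputStream(buffer.toByteArray()),
                StandardCharsets.UTF_16BE);   // assumed encoding of the data
        StreamTokenizer tokenizer = new StreamTokenizer(reader);
        tokenizer.nextToken();
        reader.close();
        return tokenizer.sval;         // null if the token was not a word
    }

    public static void main(String[] args) throws IOException {
        System.out.println("token=" + firstToken("ABCDE")); // prints token=ABCDE
    }
}
```

For a real file, wrapping a FileInputStream in the same InputStreamReader should behave the same way.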
You can certainly enter Unicode in your Java program, and the safest way to do it is with ASCII characters, using the \u escape. The following should work, for example:

    public class fiddle {
        public static void main(String arg[]) {
            int \u1261 = 1;
            System.out.println("\\" + "u1261 is " + \u1261);
        }
    }

(U+1261 is a character from the Ethiopic block, not a Japanese or Chinese one.)