Date: Wed, 17 Feb 1999 02:42:01 GMT
From: JDCTechTips@sun.com
Subject: JDC Tech Tips Vol. 2 No. 7
To: JDCMember@sun.com

J  D  C    T  E  C  H    T  I  P  S

                      TIPS, TECHNIQUES, AND SAMPLE CODE
                     
   WELCOME to the Java Developer Connection(sm) Tech Tips, Vol. 2 No. 7.  
   
   This issue covers:
                      * Converting Pathnames to URLs
                      * Using Vector in the Collection Framework
                      * Reading/Writing Unicode Using I/O Stream Encodings


- - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - -

T I P S ,  T E C H N I Q U E S ,   A N D   S A M P L E   C O D E



CONVERTING PATHNAMES TO URLS

A new feature of the Java(tm) 2 Platform is the File.toURL method, which is
used to convert a pathname specification to a URL (Uniform Resource
Locator, used on the Web).

A simple example that illustrates this method is:

        import java.io.*;
        import java.net.*;
        
        public class url {
                public static void main(String args[])
                {
                        if (args.length != 1) {
                                System.err.println("missing filename");
                                System.exit(1);
                        }
                        File f = new File(args[0]);
                        try {
                                URL u = f.toURL();
                                System.out.println(u);
                        }
                        catch (MalformedURLException e) {
                                System.err.println(e);
                        }
                }
        }

For input of:

        $ java url paper.txt    (current directory is t:\tmp)

output is:

        file:/T:/tmp/paper.txt

This URL can be entered in a Netscape or Microsoft web browser to view
the local file.

Such a method is useful in applications that have to treat local
pathnames and web-based resources in a uniform way.
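
One caveat worth illustrating: File.toURL does not escape characters that
are illegal in URLs (a space, for instance), and later releases of the
platform deprecated it in favor of converting through a URI first.  The
following is a minimal sketch of that route, assuming a runtime that has
File.toURI (it is not part of the original Java 2 release).  The class
name PathToUrl and the sample filename are made up for illustration.

```java
import java.io.File;
import java.net.MalformedURLException;

public class PathToUrl {
    // Converts a pathname to a file: URL string.  Going through
    // toURI() percent-encodes characters such as spaces.
    static String toUrl(String pathname) {
        try {
            return new File(pathname).toURI().toURL().toString();
        } catch (MalformedURLException e) {
            throw new IllegalArgumentException(e);
        }
    }

    public static void main(String[] args) {
        // "my paper.txt" is a hypothetical default; the file need not exist
        String name = args.length > 0 ? args[0] : "my paper.txt";
        System.out.println(toUrl(name));
    }
}
```

With no argument, the printed URL ends in "my%20paper.txt"; the leading
part depends on the current directory.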


USING VECTOR IN THE COLLECTION FRAMEWORK

Collections are a new feature of the Java 2 Platform, and are described in
detail in various articles available on the Java Developer Connection.
Collections are used to organize and operate on groups of data elements.
For example, ArrayList is an unsynchronized replacement for Vector, and
HashMap plays the same role for Hashtable.

The old classes such as Vector are still available, but the new ones are
preferred.  So an obvious question is how to convert between old and new.
You might, say, have a Vector object in an application, and you want to
call a method that takes an ArrayList argument.  One way of doing such a
conversion is illustrated by the following example:

        import java.util.*;
        
        public class convert {
                public static void process(ArrayList al)
                {
                        for (int i = 0; i < al.size(); i++)
                                System.out.println(al.get(i));
                }
        
                public static void main(String args[])
                {
                        Vector vec = new Vector();
        
                        vec.addElement("123");
                        vec.addElement(new Integer(456));
                        vec.addElement(new Double(789));
        
                        process(new ArrayList(vec));
                }
        } 

A Vector is created and several elements are added to it.  The process
method is then called with an ArrayList built from that Vector.  More
precisely, ArrayList has a constructor that takes a "Collection" interface
argument, and Vector has been retrofitted to implement the Collection
interface, so an ArrayList can be created from a Vector via this
constructor.  The new list is a copy: later changes to the Vector are not
reflected in the ArrayList.

There are a number of other conversion mechanisms available in the
collection framework, for hooking together old and new code.
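
As a rough sketch of two such mechanisms, the example below copies a List
into a Vector for code that still expects the old class, and uses
Collections.enumeration and Collections.list to move between Enumeration
and the new classes.  It assumes a later runtime than the one this tip was
written for (generics, and Collections.list, arrived in later releases);
the class name Bridge and its helper methods are made up for illustration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.Enumeration;
import java.util.List;
import java.util.Vector;

public class Bridge {
    // New to old: copy any List into a Vector for legacy callers.
    static <T> Vector<T> toVector(List<T> list) {
        return new Vector<T>(list);
    }

    // Old to new: drain an Enumeration into an ArrayList.
    static <T> ArrayList<T> toList(Enumeration<T> e) {
        return Collections.list(e);
    }

    public static void main(String[] args) {
        List<String> al = new ArrayList<String>();
        al.add("123");
        al.add("456");

        // Vector built from a List via the Collection constructor
        Vector<String> vec = toVector(al);
        System.out.println(vec);

        // Enumeration view of a new-style collection, then back to a list
        Enumeration<String> en = Collections.enumeration(al);
        System.out.println(toList(en));
    }
}
```

Both println calls print [123, 456].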


READING/WRITING UNICODE USING I/O STREAM ENCODINGS

The Java programming language uses two-byte (16-bit) Unicode characters,
while one-byte characters are common in other languages such as C (which
typically uses ASCII).  This raises an obvious question:  how are Java
characters stored in disk files, and how can Java programs make use of the
huge quantity of existing data that is encoded in ASCII?

In early releases of the JDK(tm) software, such as version 1.0.2, this
problem hadn't been solved.  For example, DataInputStream.readLine is a
method for reading lines of input, but it fails to properly convert bytes
to characters, and is now deprecated.  You won't necessarily notice this
failure until you start to use more of the Unicode character set.

This problem has been solved by means of the Reader and Writer I/O
classes, introduced in JDK 1.1.  These sit on top of a byte stream (such
as FileInputStream) and apply an encoding to convert bytes to characters
or characters to bytes.
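
To make the layering concrete, here is a minimal sketch of reading a line
through a Reader instead of DataInputStream.readLine.  It assumes a later
runtime (StandardCharsets, try-with-resources, and UncheckedIOException
are not in Java 2), and it reads from an in-memory byte array rather than
a file so that it is self-contained.  The class name ReadLines and the
sample text are made up for illustration.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class ReadLines {
    // Reads the first line from a byte stream, decoding it as UTF-8.
    static String firstLine(InputStream in) {
        try (BufferedReader br = new BufferedReader(
                new InputStreamReader(in, StandardCharsets.UTF_8))) {
            return br.readLine();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // The accented letter occupies two bytes in the encoded data
        byte[] data = "caf\u00e9\nsecond line\n"
            .getBytes(StandardCharsets.UTF_8);
        String line = firstLine(new ByteArrayInputStream(data));
        // Four characters, although the line was five bytes on input
        System.out.println(line.length());
    }
}
```

The InputStreamReader does the byte-to-character conversion, and the
BufferedReader on top of it supplies readLine.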

There's an encoding that is applied by default, and you can determine
its name via a small program:

        public class encode {
                public static void main(String args[])
                {
                        String p = System.getProperty("file.encoding");
                        System.out.println(p);
                }
        }

On my machine, running Java 2 software, this prints out "Cp1252", which is
a code for:

        Windows Western Europe / Latin-1

A table of encodings can be found at:

        http://java.sun.com/products/jdk/1.1/intl/html/intlspec.doc7.html
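
On later runtimes you can also ask the platform itself which encodings it
supports.  The sketch below assumes the java.nio.charset package, which
was added after the Java 2 release this tip describes; the class name
ListEncodings is made up for illustration, and the list printed will vary
from one installation to another.

```java
import java.nio.charset.Charset;

public class ListEncodings {
    // True if this runtime recognizes the named charset.
    static boolean supported(String name) {
        return Charset.isSupported(name);
    }

    public static void main(String[] args) {
        // The default used when no encoding is specified
        System.out.println("default: " + Charset.defaultCharset());

        // Canonical names of every charset this runtime provides
        for (String name : Charset.availableCharsets().keySet())
            System.out.println(name);
    }
}
```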

If you want to specify an encoding directly, one way of doing so is
illustrated by the following program, which writes all the lowercase
letters in the Unicode character set to a file. Some of these characters
have a non-zero high byte (that is, they are greater in value than
'\u00ff'), and preserving both bytes of the character is therefore
important. The encoding used is UTF-8 (named "UTF8" in the code), which
has the property of representing ASCII text as itself (one byte per
character), and other two-byte characters as two or three bytes.

        import java.io.*;
        
        public class enc1 {
                public static void main(String args[])
                {
                        try {
                                FileOutputStream fos =
                                    new FileOutputStream("out");
                                OutputStreamWriter osw =
                                    new OutputStreamWriter(fos, "UTF8");
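                                // c is an int so that the loop can count
                                // past 0xffff and stop
                                for (int c = '\u0000'; c <= '\uffff'; c++) {
                                        if (!Character.isLowerCase((char)c))
                                                continue;
                                        // the writer encodes each character
                                        // as one to three UTF-8 bytes
                                        osw.write(c);
                                }
                                osw.close();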
                        }
                        catch (IOException e) {
                                System.err.println(e);
                        }
                }
        }
        
This program reverses the process:

        import java.io.*;
        
        public class enc2 {
                public static void main(String args[])
                {
                        try {
                                FileInputStream fis =
                                    new FileInputStream("out");
                                InputStreamReader isr =
                                    new InputStreamReader(fis, "UTF8");
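                                // expect the same characters, in the same
                                // order, that enc1 wrote
                                for (int c = '\u0000'; c <= '\uffff'; c++) {
                                        if (!Character.isLowerCase((char)c))
                                                continue;
                                        // read returns one decoded
                                        // character, or -1 at end of file
                                        int ch = isr.read();
                                        if (c != ch)
                                                System.err.println("error");
                                }
                                isr.close();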
                        }
                        catch (IOException e) {
                                System.err.println(e);
                        }
                }
        }

InputStreamReader and OutputStreamWriter are the classes where byte
streams are converted to character streams and vice versa.
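
The one, two, and three byte cases can be seen directly with a small
in-memory round trip.  This is only a sketch: it assumes a later runtime
(StandardCharsets and UncheckedIOException are not in Java 2), and the
class name Utf8Sizes and the three sample characters are chosen for
illustration.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class Utf8Sizes {
    // Encodes text through an OutputStreamWriter and returns the bytes.
    static byte[] encode(String text) {
        try {
            ByteArrayOutputStream bos = new ByteArrayOutputStream();
            OutputStreamWriter osw =
                new OutputStreamWriter(bos, StandardCharsets.UTF_8);
            osw.write(text);
            osw.close(); // flushes the encoded bytes into bos
            return bos.toByteArray();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Decodes bytes through an InputStreamReader and returns the text.
    static String decode(byte[] bytes) {
        try {
            InputStreamReader isr = new InputStreamReader(
                new ByteArrayInputStream(bytes), StandardCharsets.UTF_8);
            StringBuilder sb = new StringBuilder();
            int ch;
            while ((ch = isr.read()) != -1)
                sb.append((char) ch);
            isr.close();
            return sb.toString();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // ASCII letter, accented Latin letter, euro sign
        String[] samples = { "a", "\u00e9", "\u20ac" };
        for (String s : samples) {
            byte[] b = encode(s);
            System.out.println(b.length + " byte(s), round trip ok: "
                + decode(b).equals(s));
        }
    }
}
```

The three samples encode to 1, 2, and 3 bytes respectively, and each one
decodes back to the original character.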

Encodings are an important issue if you are writing applications that
operate in an international context.

.  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .  .

-- ARCHIVES
You'll find the JDC Tech Tips archives at:

http://developer.java.sun.com/developer/javaInDepth/TechTips/index.html


-- COPYRIGHT 
Copyright 1999 Sun Microsystems, Inc. All rights reserved.
901 San Antonio Road, Palo Alto, California 94303 USA.

This document is protected by copyright.  For more information, see:

http://developer.java.sun.com/developer/copyright.html


The JDC Tech Tips are written by Glen McCluskey.

JDC Tech Tips Vol. 2 No. 7
February 16, 1999