Binary vs Text Files
Java IO supports two distinct kinds of files: text files and binary files. To understand the difference between the two, let’s look at an example. Say we have a text file encoded using UTF-8 that contains the text “Привет мир!” (“Hello world!” in Russian). When written to disk, this file consists of the following bytes:
```
$ xxd -g1 hello_ru.txt
0000000: d0 9f d1 80 d0 b8 d0 b2 d0 b5 d1 82 20 d0 bc d0  ............ ..
0000010: b8 d1 80 21 0a                                   ...!.
```
Let’s say that we want to print the contents of this file to standard output. We can read the bytes contained in the file using the following code:
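A minimal sketch of such a program (the file name `hello_ru.txt` comes from the `xxd` listing above; the class name and hex formatting are my own choices):

```java
import java.io.FileInputStream;
import java.io.IOException;

public class ReadBytes {
    public static void main(String[] args) throws IOException {
        FileInputStream input = new FileInputStream("hello_ru.txt");
        try {
            int b;
            // read() returns the next byte as a value 0-255, or -1 at end of file
            while ((b = input.read()) != -1) {
                System.out.printf("%02x ", b);
            }
            System.out.println();
        } finally {
            input.close();
        }
    }
}
```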
When run, this program prints:
Pretty much the same output as returned by `xxd`. This is because `FileInputStream` is a byte stream - a stream that treats the file contents as an array of bytes. When we call `read()` we tell the stream to read the next single byte from the file. But wait, our file contains text in Russian, not just some bytes. To read text we must use another kind of stream, called a character stream. Here is the next program, which correctly prints “Привет мир!” to standard output:
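A sketch of that program, assuming the same file name as above (the encoding name `"UTF8"` matches the one mentioned in the text below):

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class ReadChars {
    public static void main(String[] args) throws IOException {
        // a character stream that decodes UTF-8 bytes into characters
        InputStreamReader reader =
            new InputStreamReader(new FileInputStream("hello_ru.txt"), "UTF8");
        try {
            int c;
            // read() returns the next character, or -1 at end of file
            while ((c = reader.read()) != -1) {
                System.out.print((char) c);
            }
        } finally {
            reader.close();
        }
    }
}
```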
This time we are using `InputStreamReader` to read the file contents. When we call `read()` we get the next character from the file (which may consist of one or more bytes). For `InputStreamReader` to work, it must know the encoding used to create the file; in our case we pass the `"UTF8"` encoding name as a constructor parameter.
To sum up, the table below presents the classes used to read and write text and binary files in Java:

| | Read | Write |
|--|--|--|
| Binary files | `FileInputStream` | `FileOutputStream` |
| Text files | `InputStreamReader`, `FileReader` | `OutputStreamWriter`, `FileWriter` |
The last two classes, `FileReader` and `FileWriter`, are equivalent to using `InputStreamReader` and `OutputStreamWriter` with `Charset.defaultCharset().name()` as the encoding. Since the default encoding (charset) may differ between various JVMs, it is always better to use `InputStreamReader`/`OutputStreamWriter` with an explicit encoding than to rely on the default.
Finally, let’s see a class diagram that connects the classes described above with the abstract classes `InputStream`, `OutputStream`, `Reader` and `Writer`:
Buffered streams

So far we have been using only unbuffered streams: every time we call `read()`, the JVM invokes some OS function that actually reads the byte(s) from the file. Since calling OS functions is slow, this approach is inefficient. We could do better by reading bytes in big chunks, say 8KB at a time, storing them in an internal array, and then serving the next 8192 `read()` calls from that internal array without any further OS calls.
Fortunately, we don’t need to write such buffered streams ourselves - Java comes with already tested and documented implementations. As with unbuffered streams, there are two flavours of buffered streams: one for text and one for binary files. Let’s see an example that demonstrates how to use buffered streams:
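A sketch of such an example, reusing the `hello_ru.txt` file from earlier (the class name is mine):

```java
import java.io.BufferedReader;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStreamReader;

public class BufferedExample {
    public static void main(String[] args) throws IOException {
        // wrap the unbuffered character stream in a BufferedReader;
        // reads are now served from an in-memory buffer
        BufferedReader reader = new BufferedReader(
            new InputStreamReader(new FileInputStream("hello_ru.txt"), "UTF8"));
        try {
            int c;
            while ((c = reader.read()) != -1) {
                System.out.print((char) c);
            }
        } finally {
            reader.close();
        }
    }
}
```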
As you can see, all you need to do is wrap an unbuffered stream in its buffered counterpart. The table below presents the buffered stream for each of the unbuffered streams we already know:

| Unbuffered stream | Buffered stream |
|--|--|
| `FileInputStream` | `BufferedInputStream` |
| `FileOutputStream` | `BufferedOutputStream` |
| `InputStreamReader`, `FileReader` | `BufferedReader` |
| `OutputStreamWriter`, `FileWriter` | `BufferedWriter` |
Let’s finish by presenting some benchmarks that show how much we can gain by using buffered streams:
| | Unbuffered | Buffered |
|--|--|--|
| Read 1MB* | 161 773 ms | 18 568 ms |
| Write 1MB* | 236 892 ms | 22 280 ms |

\* Reading from `/dev/zero` and writing to `/dev/null` on Ubuntu 14 LTS with Oracle Java 8.
As we can see, there is almost a 10x speed gain, so if you find yourself reading or writing huge files, try to use buffered streams!
Line-oriented IO
When we work with text files, we are usually more interested in reading/writing lines of text than single characters. Let’s start with an example that demonstrates how to read a text file line by line:
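A sketch of such a program (the input file name `input.txt` is my own placeholder):

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;

public class ReadLines {
    public static void main(String[] args) throws IOException {
        BufferedReader reader = new BufferedReader(new FileReader("input.txt"));
        try {
            String line;
            // readLine() returns null at end of file
            while ((line = reader.readLine()) != null) {
                System.out.println(line);
            }
        } finally {
            reader.close();
        }
    }
}
```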
The `BufferedReader` class contains a useful `readLine()` method that allows us to read an entire line from a file into a `String` object. Unfortunately, it doesn’t provide any methods that would allow us to read e.g. numbers from a file. If you find yourself trying to parse lines of text contained in some text file, you may want to check the `Scanner` class. `Scanner` is a huge class that deserves a blog post of its own - I will not describe how to use it here.
When writing content to text files, we are in a much better situation thanks to the `PrintWriter` class. `PrintWriter` allows us to write not only lines of text but also numbers, booleans and even custom formatted strings:
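A sketch of such a program (the file name `output.txt` comes from the text below; the values written are my own examples):

```java
import java.io.IOException;
import java.io.PrintWriter;

public class WriteText {
    public static void main(String[] args) throws IOException {
        PrintWriter output = new PrintWriter("output.txt", "UTF-8");
        try {
            output.println("a line of text");             // a plain line
            output.println(42);                           // a number
            output.println(true);                         // a boolean
            output.printf("%d + %d = %d%n", 2, 2, 2 + 2); // a formatted string
        } finally {
            output.close();
        }
    }
}
```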
The `output.txt` file created by this program looks like this:
When you’re working with `PrintWriter`, you definitely want to familiarize yourself with its `printf` method. There are already plenty of blog posts/articles describing how formatting works, e.g. printf cheatsheet and introduction to printf.
Closing streams after use
In all the examples presented so far, we always wrapped code reading from or writing to a stream in a `try` block, and we always closed the stream inside a `finally` block. Closing streams is very important because a program can have only a limited number of files open at any given time. For example, on my machine a Java program can have at most 4080 open files; after that, any attempt to open another file throws an exception complaining about “Too many open files”:
Generally, when dealing with wrapping streams - like buffered streams - that use other streams to read/write data, we should close only the wrapper. This is particularly important with `BufferedWriter`, because closing only the inner stream may not save the data still sitting in the `BufferedWriter` buffer. The following program illustrates this problem:
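A sketch of the problematic program (the file name `test.txt` comes from the text below; the variable names are my own):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class BrokenClose {
    public static void main(String[] args) throws IOException {
        FileWriter inner = new FileWriter("test.txt");
        BufferedWriter buffered = new BufferedWriter(inner);
        buffered.write("hello, world!");
        // BUG: this closes only the inner stream; the text is still
        // sitting in BufferedWriter's buffer and never reaches the file
        inner.close();
    }
}
```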
Running this program will create an empty `test.txt` file. Changing the last line to `buffered.close()` will fix the issue.
Try with resources (Java 7+)
After working with files for a while, writing `try ... finally` blocks and manually closing streams becomes tiring. Java 7 introduced a new statement called try-with-resources that greatly reduces the amount of code needed to properly handle streams. Let’s look at an example:
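A sketch of such an example, using the variable names `output` and `buffered` that the discussion below refers to (the file name is my own placeholder):

```java
import java.io.BufferedWriter;
import java.io.FileWriter;
import java.io.IOException;

public class TryWithResources {
    public static void main(String[] args) throws IOException {
        // both resources are closed automatically,
        // in reverse order of declaration
        try (FileWriter output = new FileWriter("test.txt");
             BufferedWriter buffered = new BufferedWriter(output)) {
            buffered.write("hello, world!");
        }
    }
}
```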
This code will be translated by the Java compiler into the following (you may use the CFR decompiler to see the actual generated code):
As we can see, all the boring code like the `if (stream == null)` checks will be generated by the compiler for us.

NOTE: When we call `buffered.close()`, it will write all buffered changes to the `output` stream and then call `output.close()`. Notice that in the second `finally` block the compiler generated code that tries to close the `output` stream a second time. This is perfectly valid - calling `close()` on an already closed stream is a no-op.
Try-with-resources can also be used with any class that implements the `AutoCloseable` interface:
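A sketch demonstrating this with a custom `AutoCloseable` (an illustrative class of my own, not from the original post):

```java
public class AutoCloseableDemo {
    // any class implementing AutoCloseable can be used in try-with-resources
    static class Resource implements AutoCloseable {
        private final String name;

        Resource(String name) {
            this.name = name;
        }

        @Override
        public void close() {
            System.out.println("closing " + name);
        }
    }

    public static void main(String[] args) {
        try (Resource a = new Resource("a");
             Resource b = new Resource("b")) {
            System.out.println("working");
        }
        // prints "working", then "closing b", then "closing a" -
        // resources are closed in reverse order of declaration
    }
}
```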
This ends our tour of classic Java IO. In the next blog post I will present the `Files` and `Path` classes - they are part of Java NIO and can further simplify working with files.