
DataOutputStream: encoded string too long

posted Apr 20, 2010, 4:35 AM by Szabolcs Szádeczky-Kardoss   [ updated Apr 22, 2010, 1:42 AM by István Soós ]
As I'm preparing to release OKTECH Profiler 1.1, I have checked the performance benchmarks on the profiler itself. It became apparent that the UTF-8 conversion consumes a lot of time, so I started to investigate what happens behind the scenes. I got a little shock at the DataOutputStream class: it has a serious limitation, as its writeUTF method doesn't allow writing strings larger than 64k. I thought those times(*) were over and that only the Java reference source code had this limitation, so I wrote a small program to double-check it:
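    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;

    public class WriteUtfTest {
        public static void main(String[] args) throws Exception {
            // Build a 100,000-character string, well above 64k
            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < 10000; i++)
                sb.append("1234567890");
            String s = sb.toString();

            ByteArrayOutputStream baos = new ByteArrayOutputStream();
            DataOutputStream dos = new DataOutputStream(baos);
            dos.writeUTF(s); // the call under test
            dos.close();
        }
    }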
(*) Java has made me lazy, and I assumed that it just works.

To my great dismay, it fails on my Mac with the "encoded string too long" exception: DataOutputStream really does have this 64k limitation. On a second look at the Javadoc, the limitation is documented there. At the moment this shouldn't affect OKTECH Profiler, as I cannot imagine any stack trace containing a method or class name longer than that.
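
To see where exactly the limit sits, here is a minimal sketch. The class and method names are made up for illustration and are not part of OKTECH Profiler. It relies on two documented facts about writeUTF: the length prefix is an unsigned 16-bit value, so at most 65535 encoded bytes fit, and a longer string is rejected with a UTFDataFormatException. Note that the limit counts encoded bytes, not characters, so non-ASCII text hits it sooner.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UTFDataFormatException;

// Illustrative demo class; the name is an assumption, not profiler code.
class WriteUtfLimitDemo {

    // Returns true if writeUTF accepts the string, false if it rejects it as too long.
    static boolean fitsInWriteUTF(String s) throws IOException {
        DataOutputStream dos = new DataOutputStream(new ByteArrayOutputStream());
        try {
            dos.writeUTF(s);
            return true;
        } catch (UTFDataFormatException e) {
            return false;
        } finally {
            dos.close();
        }
    }

    // Builds a string consisting of count copies of c.
    static String repeat(char c, int count) {
        StringBuilder sb = new StringBuilder(count);
        for (int i = 0; i < count; i++) {
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        // ASCII characters take 1 byte each in the encoded form.
        System.out.println(fitsInWriteUTF(repeat('a', 65535)));      // true
        System.out.println(fitsInWriteUTF(repeat('a', 65536)));      // false
        // U+00E9 takes 2 bytes, so 32768 characters already need 65536 bytes.
        System.out.println(fitsInWriteUTF(repeat('\u00e9', 32767))); // true
        System.out.println(fitsInWriteUTF(repeat('\u00e9', 32768))); // false
    }
}
```

For the profiler this means that a check on String.length() alone would not be a safe guard if names ever contained non-ASCII characters.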

On the other hand, we are now working on making the dumps more flexible, so that 3rd party plugins can be contributed more easily to the profiler runtime and analysis. After this experiment, I'm considering a more XML-like dump format, e.g. Fast Infoset, a pretty good binary XML format. (Update: no, it won't be binary XML; it is much slower than pure DataOutputStream.)
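
If a dump ever needed strings above the limit while staying with plain DataOutputStream, one common approach is to write the length as a 4-byte int followed by the raw UTF-8 bytes. The sketch below is only that, a sketch: LongStringIO and its methods are hypothetical names, not the profiler's actual dump format. It assumes Java 7 or later for StandardCharsets, and it uses standard UTF-8 rather than the modified UTF-8 of writeUTF, so the result cannot be read back with readUTF.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper; names are illustrative and not taken from OKTECH Profiler.
class LongStringIO {

    // Writes a 4-byte length followed by the standard UTF-8 bytes of the string.
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    // Reads back a string written by writeLongString.
    static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // Same 100,000-character string that writeUTF rejects.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("1234567890");
        }
        String s = sb.toString();

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        writeLongString(dos, s);
        dos.close();

        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
        System.out.println(s.equals(readLongString(dis))); // true
    }
}
```

This removes the 64k ceiling but does nothing for the conversion cost itself, which was the original reason for looking into the class.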


published:2009-09-26, a:István, y:2009, l:java, l:profiling