Our knowledge base contains our former blog entries (from 2009), technical problems and solutions, ideas that we think are worth preserving. We will add new entries from time to time in the hope they will be useful for you. |
posted Apr 20, 2010 5:28 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:50 AM by István Soós
]
A few weeks ago a friend asked me about a problem with XMLStreamReader. We quickly concluded that it is no error at all: it is in the nature of XML processing tools, but when you encounter it for the first time it can seem strange. The point is that XML text nodes are not necessarily delivered in one piece, and while you read the XML you might receive only fragments.
For example, if you have the text "Q&A", which in XML is escaped to "Q&amp;A", you might end up reading first the string "Q", then the "&" and finally the "A", instead of reading it as a whole string. Take the following code:
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamReader;

public class TestTextNode {
    public static void main(String[] args) throws Exception {
        // The ampersand must be escaped, otherwise the XML is not well-formed
        String xml = "<?xml version=\"1.0\" ?><test>Q&amp;A</test>";
        XMLInputFactory factory = XMLInputFactory.newInstance();
        XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
        reader.next(); // the <test> start tag
        reader.next(); // the first text fragment
        System.out.println(reader.getText());
    }
}
On Sun's Java 6 JVM you will receive just "Q" in the first round. On the subsequent reads you will receive the other characters, but for the unprepared it is just strange. So why does this happen?
XML allows you to have very large files. If you look, for example, at the wikipedia.org XML dumps, it is not unusual to have XML files larger than a few GB. There is no limit on how big a text node can be, so it is the responsibility of the tool to process it in reasonable chunks. If you load the file into a DOM, you will get a large tree in memory - if you have much more memory than the size of the XML itself, there is a good chance that it will fit. However, for large XML files or for some kinds of processing, you just stream through the data and do not build a DOM tree.
As in the example above, while you stream through the XML, you will receive text nodes. These are usually delimited by:
- a closing or other opening tag
- the buffer size of the streamer (if it is full, the stream reader will receive the text read so far)
- special escaped characters (as above, the escaped & resulted in a new fragment)
While the first one is trivial, the second and third are lesser-known internals of XML parsers, but from the memory consumption perspective there seems to be a good reason behind them.
Now the question remains: can you parse the XML and receive all consecutive text fragments merged into one? It depends on the parser, but in Java you can: just put the following code after the factory initialization:
factory.setProperty(XMLInputFactory.IS_COALESCING, true);
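If you cannot or do not want to rely on coalescing, the other option is to collect the fragments yourself. The following is a minimal sketch of both approaches; the class and method names (CoalescingDemo, readAllText) are made up for this illustration, and the checked XMLStreamException is wrapped only to keep the example short:

```java
import java.io.StringReader;
import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamException;
import javax.xml.stream.XMLStreamReader;

// Hypothetical helper class, not part of the original example
class CoalescingDemo {

    /**
     * Appends the text of every CHARACTERS event in document order.
     * Accumulating in a loop works whether or not the parser coalesces.
     */
    static String readAllText(String xml, boolean coalescing) {
        try {
            XMLInputFactory factory = XMLInputFactory.newInstance();
            factory.setProperty(XMLInputFactory.IS_COALESCING, coalescing);
            XMLStreamReader reader = factory.createXMLStreamReader(new StringReader(xml));
            StringBuilder text = new StringBuilder();
            try {
                while (reader.hasNext()) {
                    if (reader.next() == XMLStreamConstants.CHARACTERS) {
                        text.append(reader.getText());
                    }
                }
            } finally {
                reader.close();
            }
            return text.toString();
        } catch (XMLStreamException e) {
            // Wrapped for brevity; real code may prefer to propagate it
            throw new IllegalStateException("Could not parse XML", e);
        }
    }

    public static void main(String[] args) {
        String xml = "<?xml version=\"1.0\" ?><test>Q&amp;A</test>";
        System.out.println(readAllText(xml, false));
        System.out.println(readAllText(xml, true));
    }
}
```

Both calls should give the same complete text; the difference is only in how many fragments the parser hands over on the way.";
if (!CoalescingDemo.readAllText(xmlUnderTest, true).equals("Q&A")) throw new AssertionError("coalescing read failed");
if (!CoalescingDemo.readAllText(xmlUnderTest, false).equals("Q&A")) throw new AssertionError("fragment accumulation failed");
</test>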
So it takes no magic to change the behavior, although with recent hardware and software it might be better to have coalescing on by default, with an option to turn it off - although the current default is definitely the fail-safe choice.
published: 2009-08-29, a:István, y:2009, l:java, l:xml
|
posted Apr 20, 2010 5:26 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:57 AM by István Soós
]
When profiling applications, it is always important to measure time as precisely as possible. The old way was to read the system clock at an ever finer granularity, while in the meantime we have received access to a more precise, thread-specific clock.
In the old-fashioned way, we have two methods to read the system clock:
- System.currentTimeMillis()
- System.nanoTime()
Both methods depend on the operating system's internal clock API, so there is no guarantee on the granularity of the clock; however, nanoTime tends to be more precise - as you would expect. The difference between the two is that nanoTime is not related to date information in any way (no counting of milliseconds since 1970 or so). Measuring a single-threaded application with not much multitasking in the background (e.g. "nobody shall touch the machine while I do the benchmark") produces good results.
However, if the thread or process context is switched by the operating system (because the user moves the mouse, or something else steals CPU time, or an IO interrupt arrives), measuring with the system clock will always yield a larger number than the time the code actually consumed. Of course, if you repeat the measurements and take the minimum of the numbers, it will give you a good estimate, but that will always be just an estimate.
On the other hand, the Java platform now provides easy access to another clock: the thread's own CPU time counter, which reports the CPU time spent in the current thread. This is more precise in the sense that thread and process context switches are not measured, only the time when the thread is active. Although the usual Thread and System classes don't provide such methods, Sun's JVM provides a simple way to get it:
ThreadMXBean threadMXBean = ManagementFactory.getThreadMXBean();
long time = threadMXBean.getCurrentThreadCpuTime();
Simple, isn't it? Of course you can check
other methods on this MXBean, but it gives the basic idea behind precise
CPU time measurements.
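To see the difference between the two clocks, here is a small sketch that measures the same task with both. The CpuTimeDemo class and its measure method are made up for this illustration, and the lambda assumes Java 8 or later. Not every JVM supports the thread CPU clock, so the sketch checks for it first:

```java
import java.lang.management.ManagementFactory;
import java.lang.management.ThreadMXBean;

// Hypothetical helper class for illustration only
class CpuTimeDemo {

    /**
     * Runs the task on the current thread and returns {wallNanos, cpuNanos}.
     * cpuNanos is -1 when the JVM cannot report per-thread CPU time.
     */
    static long[] measure(Runnable task) {
        ThreadMXBean bean = ManagementFactory.getThreadMXBean();
        boolean cpuAvailable = bean.isCurrentThreadCpuTimeSupported()
                && bean.isThreadCpuTimeEnabled();

        long wallStart = System.nanoTime();
        long cpuStart = cpuAvailable ? bean.getCurrentThreadCpuTime() : 0L;

        task.run();

        long cpuEnd = cpuAvailable ? bean.getCurrentThreadCpuTime() : 0L;
        long wallEnd = System.nanoTime();

        long cpuNanos = cpuAvailable ? cpuEnd - cpuStart : -1L;
        return new long[] { wallEnd - wallStart, cpuNanos };
    }

    public static void main(String[] args) {
        // Sleeping consumes wall-clock time but almost no CPU time
        long[] result = measure(() -> {
            try {
                Thread.sleep(200);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        System.out.println("wall ms: " + result[0] / 1_000_000);
        System.out.println("cpu ms:  " + result[1] / 1_000_000);
    }
}
```

A sleeping thread is the extreme case: the wall clock keeps running while the CPU counter barely moves, which is the same effect a context switch has on a benchmark.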
published: 2009-08-24, a:István, y:2009, l:java, l:profiling
|
posted Apr 20, 2010 5:24 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 2:10 AM by István Soós
]
A few weeks ago there was a question
on a Hungarian Java user list about PMD and code review that made me
wonder what is the actual state of automated code review or code
analyzer tools like PMD compared to the manual process. The political
aspects of the code review process brought up some good and some bad
memories too...
On the mail list...
My quick answer on the mail list was along these lines: "I wouldn't call PMD a code review tool, it is rather a code checker. Even if the PMD rules pass (especially the rules that actually make sense), we have had projects where our manual code review improved the quality of the code significantly."
Okay, this is not a big surprise so far. But the reply made me think a little bit: "Our current development methodology doesn't have any place for code review. I am looking for a way to include it without disrupting the current development timelines. For the first 2-3 projects it will be feedback only, to get some routine in it, and later on we can fine-tune the process. However, we have to start somewhere, and PMD looks good for that."
Good luck with such an initiative! :)
On the dark side...
I have some doubts about the long-term success if it remains feedback only and is not enforced even in the actual projects. Why am I that skeptical? I think if something (a) is not mandatory, (b) is not in the essential culture of the group and (c) can be skipped to save some time for the hard deadline - it will essentially be skipped to save that crucial time... Been there, done that, forgot that... :)
One of our clients had a similar scenario a year ago: a multinational corporation that works mostly with outsourced developers (read: cheaper than everything = not always as qualified as you would like), while we were hired as architects for project assistance. Our first shock came when we realized that nobody (read: not a single person) had ever required an internal code review of the shipped products. Of course, officially they had it: if department A of the contractor company developed a product, they requested a code review from department B. Guess what: it passed. Even if they contracted a third party, it was as cheap as or even cheaper than the original one, so it was no surprise that they hadn't produced any reasonable report. By the way: have you noticed that code review reports make no sense at all for product managers who only know Excel sheets? :)
Anyway, as architects we were the first to introduce this requirement in the business process, and to be honest, we made progress in the technical parts, but we slightly failed in the political aspects. At least it reinforced the belief: if it is not mandatory, it will be skipped - period.
In technical terms, we were quite good: we introduced PMD as an IDE plugin, so every developer was able to check the rules for themselves. Unfortunately PMD contains a lot of rules that just make no sense or are irrelevant in the actual project context, so we had to fine-tune the ruleset and eliminate some annoying ones. We created a few new rules to describe some critical scenarios, but generally we left PMD to do the job.
On the political side, we - as architects - were a bit of outsiders: we had no right to push the project deadline or block the handover process. Even if the project was buggy (in the PMD sense and in the common sense, like comparing Strings with == instead of .equals(...)), we had no impact on the process. The PMD reports were sent around, but the management didn't give a sh*t (they didn't understand them, and the deadline was enough worry anyway), so nothing happened with the results. The only thing that successfully resonated was when I periodically sent a report about the number of errors in the project - with the historical numbers, showing a clearly rising tendency :[. Finally we were allocated a budget to do some manual code review as well, and it was a clearly better story - well, at least for us, architects :].
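For those who have not been bitten by the String == bug mentioned above, here is a tiny illustration of why both a tool and a human reviewer flag it. The StringEqualityDemo class is made up for this example, it is not code from that project:

```java
// Hypothetical demo class, not taken from the reviewed project
class StringEqualityDemo {

    /** Compares object identity, which is what == does for Strings. */
    static boolean sameReference(String a, String b) {
        return a == b;
    }

    /** Compares the characters, which is what the code usually means. */
    static boolean sameContent(String a, String b) {
        return a != null && a.equals(b);
    }

    public static void main(String[] args) {
        String literal = "review";
        // new String(...) always creates a distinct object with equal content
        String built = new String("review");

        System.out.println("==      : " + sameReference(literal, built));
        System.out.println("equals(): " + sameContent(literal, built));
    }
}
```

The nasty part is that == often appears to work in tests, because equal string literals are shared by the JVM, and then it fails on values that come from user input or a database.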
And the brighter side...
Speaking of good experiences, I had a job at an investment bank, where we had a clever and responsible team of people around infrastructure libraries. These libraries were crucial, many other departments used them in production environments - so essentially a solid code review process was in place. There was practically no code that wasn't peer reviewed before making it into a new release - with the exception of a few emergency releases, but they were in a different branch.
Speaking of our internal developments, we have the luxury of doing such code reviews irregularly, because instead of such a process, we rely on:
- unit testing
- test coverage tools
- performance profiling that monitors the critical parts
By the way, this resonates with the "automate everything" motto of successful businesses :)
And some supporting tools...
Cutting a long story short (oh, am I late with that?), my view on the technological part:
- PMD is a nice tool, but it has limitations and obsolete rulesets (some of the problems are detected by the Java compiler anyway). If you need similar tools, even open-source ones, you might check these code analyzers too.
- Manual code review is always better than such tools. I don't think that static code review for 'feedback' purposes makes any sense without some policies.
- The weakest link is - as always - the timeline and the budget. If those don't allow the code review process to allocate more time for better quality, you could have the best toolkits in the world, but you might just as well drop them in the trash.
As I'm typing this entry, I've started looking into tools that allow better formalization and documentation of the manual code review process, basically better team collaboration. As one example, Atlassian Crucible looks interesting. Of course there are open source code review tools out there, and it might be easy to pick one of those too.
Most of these tools are a mixture of version control repository viewer, issue tracker and commenting, so if you have these installed, you might just use them for the code review process too. Our preferred tool is Redmine, and it is pretty easy to use it for such a purpose:
- When committing code, do not only update the originating issue: create a new one requesting a code review and optionally assign it to someone.
- The review process will produce questions and comments; these can be new issues as well.
- The discussions around these comments can be just as organized or as loosely defined as you like - without many technological restrictions...
And of course: none of these tools can beat the performance of developers who are sitting at the same desk and checking the code on the monitor in real time, while discussing their ideas :).
published: 2009-08-15, a:István, y:2009, l:automation, l:codereview, l:governance, l:pmd, l:redmine, l:testing, l:unittesting
|
posted Apr 20, 2010 5:09 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:45 AM by István Soós
]
As our regular readers already
know, Wicket is our favorite web
framework and we use it actively in our projects. Wicket is an
easy-to-use, well-designed framework and is able to incorporate Ajax in a
very nice and easy way. I personally am not a big fan of using Ajax in
every corner of the application, however at some points it can make your
app much nicer. Let's look at such a case!
Imagine a form where you have to enter possible answers for a survey question, with the number of answers being dynamic. It would be possible to give the user a fixed number of text fields, let's say 8 should be enough, but what happens if the user wants more possible answers? Or what if (s)he only wants 2, and is not really happy about another 6 empty text fields taking up half of the screen? So let's go dynamic and "ajaxify"!
Let's give the user only 2 possible answers to begin with, plus a link with the possibility to add more answer rows, and another to remove unused ones. The form can contain a lot of other fields which we are not interested in; however, adding or removing a row should not have any effect on the other fields. What's more, the app should keep the values that the user has already entered. Of course the links should use Ajax and only refresh the dynamic list part of our form page, and they should refrain from submitting and/or validating the whole form. If you look up a tutorial or a good book on Wicket, it gives you a similar solution:
<!-- DynamicRows.html -->
<form wicket:id="form">
  <!-- Other non-repeating fields in the form -->
  ...
  <div wicket:id="rowPanel">
    <span wicket:id="rows">
      <span wicket:id="index">1.</span>
      <input type="text" wicket:id="text"/>
    </span>
    <a href="#" wicket:id="addRow">Add row</a>
  </div>
  ...
</form>
And
here comes the associated Java code:
// Relevant constructor code in DynamicRows.java
...
// Create a panel within the form, to enable AJAX action
final MarkupContainer rowPanel = new WebMarkupContainer("rowPanel");
rowPanel.setOutputMarkupId(true);
form.add(rowPanel);

// List all rows
ArrayList rows = new ArrayList(2);
rows.add(new String());
rows.add(new String());
final ListView lv = new ListView("rows", rows) {
    @Override
    protected void populateItem(ListItem item) {
        int index = item.getIndex() + 1;
        item.add(new Label("index", index + "."));

        // Each text field edits the list element behind its row
        TextField text = new TextField("text", item.getModel());
        item.add(text);
    }
};
rowPanel.add(lv);

AjaxSubmitLink addLink = new AjaxSubmitLink("addRow", form) {
    @Override
    public void onSubmit(AjaxRequestTarget target, Form form) {
        lv.getModelObject().add(new String());
        if (target != null)
            target.addComponent(rowPanel);
    }
};
// Skip validation and model update of the whole form
addLink.setDefaultFormProcessing(false);
rowPanel.add(addLink);
...
You may notice that we have used AjaxSubmitLink for adding a new row to the list. This is needed because the user might already have entered some values in some of the fields and we don't want those to be lost, so the form values have to be submitted. However, we would like to avoid getting a validation error in some of the other fields when all we want is to add a new row, so we use addLink.setDefaultFormProcessing(false).
This is a nice solution, however if you try it you'll see that the values entered in the repeating rows get lost when a new row is added. The reason is that after pressing the "Add row" Ajax link the TextFields don't update their backing model (since we have turned off form processing), while the ListView removes and recreates all its TextFields in its onPopulate() method. And so the "old-new" TextFields will show the original model value.
So what to do now?
You can try to update the backing model of all the repeating TextFields
in the Ajax "Add row" action, however this is still not a 100% solution
in case an invalid value is entered. In case of an invalid value the
validation fails, the model doesn't get updated and after recreating the
TextFields the invalid value reverts to the last valid value entered.
So it turns out that the problem is again caused by the recreation of
all the TextFields.
You could cache and reuse all those TextFields in a custom subclass of ListView, as I did the first time. Or you could browse the source code of ListView and come across the reuseItems property, which is just the nice and clean solution to our problem:
lv.setReuseItems(true);
This
will reuse the already created TextFields and will call the
populateItem method only for the newly added row. Since the TextFields
are reused they will remember the last valid or invalid value entered,
and that's what we wanted from the beginning. All the above logic can
also be applied to a "Remove row" Ajax action in case it's needed. So,
we have made a nice "ajaxified" ListView.
published:2009-07-28, a:Szabolcs, y:2009, l:ajax, l:wicket |
posted Apr 20, 2010 5:06 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 2:17 AM by István Soós
]
Wicket provides a lot of useful features, among others a lot of out-of-the-box components. And if one doesn't suit you, you can easily create your own. Recently we ran into this with a captcha component: we required a few different features (e.g. an easier-to-read captcha), so we created our own captcha panel.
We have chosen SimpleCaptcha as the image provider: it can create really difficult captchas, however it suits our requirement for easier ones too, and it can be configured like a charm. Everything seemed to be easy and it basically worked for simple examples. However, we created an Ajax tabbed panel that contained this captcha on a few tabs. The problem hit us when the user switched between these tabs: the captcha text changed on the server side (expected) but the image did not change on the client side (if you are entering the text, it will update, but that is not very user-friendly, is it?).
What can you do in a similar scenario?
- Set the cache control directives in the response header of the image
- Add a bit of randomness to the image URL
- Obfuscate the URL of the image
Some believe that the first point is enough for most scenarios, some would go with the second option as well; however, it is always a good idea to implement the third one too - it comes almost free and effortless with Wicket.
1. Set the cache control directives
You can set these as part of the DynamicWebResource or its subclass, e.g. the image resource we have used:
BufferedDynamicImageResource bdir = new BufferedDynamicImageResource() {
    private static final long serialVersionUID = 1L;

    @Override
    protected void setHeaders(WebResponse response) {
        super.setHeaders(response);
        // Tell the browser and proxies never to reuse a cached image
        response.setHeader("Cache-Control", "no-cache, must-revalidate, max-age=0, no-store");
    }
};
2. Add a bit of randomness to the image URL
The Wicket dynamic image generates its own URL, but with a simple behavior you can modify it and add an extra item to the end of it:
Image image = new Image("captchaImage", bdir);
image.add(new AbstractBehavior() {
    private static final long serialVersionUID = 1L;

    @Override
    public void onComponentTag(Component component, ComponentTag tag) {
        // Append an ever-changing parameter so the URL differs on each render
        tag.getAttributes().put("src",
                tag.getAttributes().getString("src") + "&nanoTime=" + System.nanoTime());
    }
});
add(image);
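The behavior above always appends "&nanoTime=...", which is fine as long as the generated URL already contains a query string, as it did in our case. If you are not sure about that, a slightly more careful variant picks the separator first. This is only a plain Java sketch of the idea, independent of Wicket; the CacheBuster class and the appendParam method are made-up names, and URL fragments (#...) are not handled:

```java
// Hypothetical helper, independent of Wicket
class CacheBuster {

    /**
     * Appends name=value to the URL, using '?' when the URL has no query
     * string yet and '&' when it already has one.
     */
    static String appendParam(String url, String name, long value) {
        char separator = url.indexOf('?') >= 0 ? '&' : '?';
        return url + separator + name + "=" + value;
    }

    public static void main(String[] args) {
        long token = System.nanoTime();
        System.out.println(appendParam("captcha.png", "nanoTime", token));
        System.out.println(appendParam("captcha.png?size=small", "nanoTime", token));
    }
}
```

Inside the behavior you would then pass the current src attribute through such a helper instead of concatenating the strings by hand.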
3. Obfuscate the Wicket URLs
This method will obfuscate every non-bookmarkable URL in your application. This is not only for the captcha images: it helps you to hide most of your internals, and prevents search engines from indexing them as part of the URL. This guide can help you to achieve more, but basically you need to add these lines to your Application class:
@Override
protected IRequestCycleProcessor newRequestCycleProcessor() {
    return new WebRequestCycleProcessor() {
        @Override
        protected IRequestCodingStrategy newRequestCodingStrategy() {
            return new CryptedUrlWebRequestCodingStrategy(new WebRequestCodingStrategy());
        }
    };
}
published: 2009-07-22, a:István, y:2009, l:captcha, l:encryption, l:wicket
|
posted Apr 20, 2010 5:00 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 2:56 AM by István Soós
]
On the company website, we have indicated the intention to share some of our work as open source. This blog seems to be a good opportunity to start that process, especially when the toolkit is as simple as our GWT integration solution for the Spring framework. (Have you noticed that springframework.org is redirected to springsource.org? I wonder if the packages will be renamed too, just to break backwards compatibility - *evil*)
There has always been a fuss about how GWT and Spring can be integrated. There are solutions that go the hard way: defining the services as servlet paths with some hacking to access the application context, or following Spring MVC, as separate controllers. A year ago, at the time when the Spring annotation support gained more awareness, there were only a few solutions to use that with GWT. At that time, we developed a GWT application that required Spring integration, so we took the ideas from Chris Lee's blog and extended them to our needs. Now it is our turn to share that with the open source community, hoping that we can say something new or at least less-known...
Chris and Martin (check the blog's comments), however, started a much more interesting way, which was near perfect for us: define your service interface in GWT, implement it as a normal class on the server side, and let Spring do the magic with the binding and such. You need to have the following:
- GwtServiceHandlerMapping - for mapping the service to a given servlet path
- GwtServiceHandlerAdapter - for processing the request on that URL and delegating it to the invoked method
- RpcProxyWrapper - that handles the client side magic
On top of these, we had a requirement to pass a bunch of GWT-serialized objects to the HTML content (through Freemarker), so we have specified a GwtObjectSerializer interface that helps us do just that. And why would this be important? In case you are targeting search engines with your GWT-enabled application, it is definitely important to have some mixed content model with HTML content and GWT too. It is not hard at all, so we share that source here too. You can find these sources and a compiled version as an attachment on this page (temporarily, as we had no better idea where to place it).
It is pretty simple, not too well documented, and sometimes it feels like it hasn't been cleaned up (some System.out.println(s) are left in there by some lazy developer who checked in his debug code) - which is true, as the project received a slightly different version of it. We might clean it up later...
Anyway,
how can you use it?
Add the following lines in your
servlet config xml:
<bean class="org.squaredframework.gwt.rpc.server.GwtServiceHandlerAdapter"/>
<bean class="org.squaredframework.gwt.rpc.server.GwtServiceHandlerMapping"/>
Suppose
you have a Service interface:
public interface SimpleSearchService extends RemoteService {

    public SearchResult search(String text);

    /**
     * Utility class for simplifying access to the instance of async service.
     */
    public static class Util {
        private static SimpleSearchServiceAsync instance;

        public static SimpleSearchServiceAsync getInstance() {
            if (instance == null) {
                // gwt client calls are single-threaded
                instance = GWT.create(SimpleSearchService.class);
                RpcProxyWrapper.get().wrapProxy(instance);
            }
            return instance;
        }
    }
}
With
the async pair:
public interface SimpleSearchServiceAsync {
    public void search(String text, AsyncCallback<SearchResult> callback);
}
Just
implement the service (and observe that nothing special is added in the
implementation):
@Controller public class SimpleGwtSearchServiceImpl implements SimpleSearchService { ... }
And
that is it. On the client, you can use the following code to invoke the
service:
SimpleSearchService.Util.getInstance().search(searchText, new AsyncCallback<SearchResult>() {
    public void onFailure(Throwable caught) {
        // display error message
    }

    public void onSuccess(SearchResult result) {
        // display result
    }
});
And we are done. Pretty simple, isn't it? Once the Spring magic is in place, both the service implementation and the client code could hardly be simpler. (If they can, please let me know!)
And what happens inside?
The services are mapped in a special way, which is automatically known to both the server and the client: with some prefix and postfix, we just transform my.server.Service to the /gwt-rpc/my/server/Service URL. It is as simple as that; the details are in the RpcProxyWrapper and GwtServiceHandlerMapping classes. As a bonus, you are now refactoring-safe.
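The naming convention itself fits in a few lines. The following is only a sketch of the idea, not the actual code of RpcProxyWrapper or GwtServiceHandlerMapping; the GwtRpcPath class, the toPath method and the empty postfix used in the example are assumptions made for this illustration:

```java
// Hypothetical illustration of the naming convention, not the shipped classes
class GwtRpcPath {

    /**
     * Turns a fully qualified interface name into a servlet path by
     * replacing the dots with slashes and adding a prefix and a postfix.
     */
    static String toPath(String prefix, String interfaceName, String postfix) {
        return prefix + interfaceName.replace('.', '/') + postfix;
    }

    public static void main(String[] args) {
        // The postfix is left empty here to match the URL quoted above
        System.out.println(toPath("/gwt-rpc/", "my.server.Service", ""));
    }
}
```

Because both sides derive the path from the interface name, renaming or moving the interface changes the URL on the client and the server at the same time, which is where the refactoring safety comes from.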
Oh, and for those of you who are wondering what this squaredframework.org is: this is our registered domain that was meant to host our open source initiatives. We are not sure how we will proceed with that, but at least we name our shared classes that way.
Update (on July 19): I've just recently encountered Dustin's
project: spring4gwt.
He had very similar ideas, it might be reasonable to merge these, so if
he takes the initiative, expect some merges on his page...
published: 2009-07-13, a:István, y:2009, l:gwt, l:rpc, l:spring
|
posted Apr 20, 2010 4:58 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 2:48 AM by István Soós
]
The company made a good decision in recent weeks: the target is the sky, or at least the cloud. Amazon AWS offerings are hard to beat, so we started with that one, played around with different configurations a bit, and finally decided that first we shall migrate the company Subversion repository to the cloud, with ZFS mirrors and encryption.
I'm a long-time fan of the ZFS filesystem and Sun's OpenSolaris offering around it, basically because this is the best easily accessible filesystem that provides drive mirroring with checksums, enabling automatic recovery from failures of the underlying storage. So it became a natural plan to run OpenSolaris on EC2, with ZFS on mirrored EBS volumes. Although EBS is meant to be very robust, there are always failures in every system, and we have seen a few blog entries where EBS actually did fail, so better be prepared...
We know that we can achieve absolute secrecy only if we unplug the server, dump it in a big hole in a deserted location and forget about it, but it seemed reasonable to have some encryption. The plan was that when the instance starts, we log in and attach the encrypted ZFS pool by typing the password. Okay, the running instance may be monitored and the content might be extracted if the infrastructure allows such a move, but we hope this is a much harder and more classified job to do than sniffing around a volume snapshot.
I've mailed the Sun OpenSolaris EC2 team, and they were very kind in giving the initial pointers on where to look. I can recommend the following sites on this topic:
Basically the last one describes most of the important parts, but there are a few differences on EC2. First, the Web Console doesn't allow you to attach the EBS volumes directly, because it only offers /dev/sdf-like mount points, and this is not what you are looking for, as the OpenSolaris AMI requires the device number instead. So go to the command line or use ElasticFox to attach these drives properly. In our test drive, I've attached two 1GB volumes as the 2nd and 3rd drives of the EC2 instance; they became c7d2 and c7d3 respectively.
To cut a long story short, I've used the sun-opensolaris-2009-06/opensolaris_2009.06_32_6.0.img.manifest.xml AMI, and here are the commands that were required to complete the process:
# zpool create rawstorage mirror c7d2 c7d3
# zpool status
# zfs create rawstorage/block
# dd if=/dev/zero of=/rawstorage/block/subversion bs=1024 count=0 seek=$[1024*512]
# ls -lh /rawstorage/block/subversion
# lofiadm -a /rawstorage/block/subversion -c aes-256-cbc
# zpool create subversion /dev/lofi/1
# zpool status
# pkg install SUNWsvn
# svnadmin create /subversion/research/
So what does it give me?
- I have mirrored storage over EBS (the rawstorage pool)
- I have a ZFS filesystem (/rawstorage/block) on that pool, so I can turn on compression if I'd like, create snapshots, extend it or anything like that
- I've created a block file (/rawstorage/block/subversion) on this storage with a reasonable starting size. Okay, I haven't checked the size of our ivy repository, so this might not be enough for real use. Is there a more robust (or at least extendable) solution for this?
- I've attached it as an encrypted loopback device (/dev/lofi/1 appeared) and set the password
- Created a new zpool on top of this device (the subversion pool)
- Installed SVN and used it...
This works from this point on, but what
happens if I shut down the instance and start a new one? Well, let's
attach the EBS volumes again, and follow these commands:
# zpool import
# zpool import rawstorage
# lofiadm -a /rawstorage/block/subversion -c aes-256-cbc
# zpool import -d /dev/lofi
# zpool import -d /dev/lofi subversion
# ls -lh /subversion/research/
Cool,
it works again! You just need to import the rawstorage pool first,
attach the lofi driver (get the proper password here), import the second
pool, and use it as you like.
But what happens if the password is wrong? First of all, the lofi driver is unable to tell. That seems bad at first, but actually it doesn't matter, as we are not going to write any data if we are not able to import the subversion pool. So the worst case is that you type a bad password, zpool import won't import the subversion pool, and that is it. In such a case, you detach the lofi device and retype the password until the pool can be imported.
Simple? Seems to be, but before you put all your crucial data on top of it, you might want to play around a bit with OpenSolaris and EC2 first. Many thanks to the Sun and Amazon teams for enabling such a marvelous technology combination.
Update on 2009-07-16
Last week we made a little proof of concept about the encrypted Subversion on Amazon EC2. This week, we decided to move forward and migrate most of our development-related stuff to the EC2 cloud, and now here goes our little success story.
The ZFS encryption works mostly as described in the previous blog entry, although there is a little difference after we have rebundled the OpenSolaris image. (Make sure you follow this guide!) The difference is that on the rebundled image you shall do something like this (supposing that 'storage' is the normal pool and 'safe' is the encrypted pool):
zfs mount storage
lofiadm -a /storage/block/encrypted -c aes-256-cbc
zfs mount safe
Except for that, everything works as expected. We have made the following setup on EC2:
- The OpenSolaris image handles two EBS volumes in a ZFS mirrored pool. This 'storage' pool has compression turned on to decrease the number of IO operations a bit.
- On the storage pool, we have stored some downloadable stuff, but most of our data is on the encrypted volume ('safe' pool).
- Our issue tracker is Redmine, and although it is hard to set up the first time, and it has some limitations in the project identifier handling (20 characters for an id is not really long), it is good enough to use for issue and time tracking (+ wiki, + documents, + subversion access control + ...).
- We are using a PostgreSQL database to store the Redmine stuff.
- Our Subversion repository is exported over WebDAV, and the access control is delegated to Redmine. One single entry point for the administration gives less overhead...
If we ever need larger storage, we just attach a new drive, ZFS handles the hard stuff, and we detach the old one. We have all the development stuff on a remote server that is reliable (okay, we need to do some regular backups even on Amazon), and we are paying much less than to our previous server hosting provider. And our public company page can be hosted on a cheap host, as it is 100% static content.
So far so good.
Update on 2009-08-07
We started evaluating and using Amazon EC2 almost a month ago. Here are our 'lessons learned' items.
Be prepared...
We have evaluated and used encryption with OpenSolaris and ZFS on EBS, and we have successfully rebundled the instance to migrate our Subversion repository to this server. Although we have always typed the encryption password correctly since this migration, we finally decided to check some other scenarios, e.g. what happens when we type it wrong: can we lose data in some way? Just in case something did go wrong, we created EBS snapshots of the volumes first. After some testing, we consider the data loss scenario unlikely, because if we type the password wrong, we receive something like the following:
Initial state:
  pool: safe
 state: FAULTED
status: The pool metadata is corrupted and the pool cannot be opened.
action: Destroy and re-create the pool from a backup source.
   see: http://www.sun.com/msg/ZFS-8000-72
 scrub: none requested
config:

        NAME           STATE     READ WRITE CKSUM
        safe           FAULTED      0     0     0  corrupted data
          /dev/lofi/1  ONLINE       0     0     0

  pool: storage
 state: ONLINE
 scrub: resilver completed after 0h0m with 0 errors on Fri Jul 24 12:42:15 2009
config:

        NAME        STATE     READ WRITE CKSUM
        storage     ONLINE       0     0     0
          mirror    ONLINE       0     0     0
            c7d2    ONLINE       0     0     0
            c7d3    ONLINE       0     0     0  43.5K resilvered

errors: No known data errors

# zfs mount safe
cannot open 'safe': I/O error
So we just need to remove the lofi device with lofiadm and set it up and mount it again, which solves the whole problem.
Automate...
It is always a good idea to document things, and this is especially true with a sometimes transient service like Amazon EC2. It turned out that there was a startup bug in the official OpenSolaris bundle, and you need to rebundle your server with the new version if you would like to have the fixed version. We did, as we had run into this bug a few times, and the documentation became very handy: we just had to copy-paste the commands into the console and wait for the output, as most of our documentation was like a shell script.
The next level of automation will be to create expect scripts to automatically set up and bundle full images. I'd suggest that anyone starting with EC2 write the setup scripts in this latter fashion from the beginning. For hard-core Java people like myself, ExpectJ or Enchanter are viable options too, but the ultimate solution is to use something like JSch and Groovy to control every aspect of the communication.
Automate, automate...
When we start an instance, we attach the drives and the elastic IP, then execute a few commands to mount the encrypted storage and start the services. This is a very boring process, and fortunately you can automate it too:
- Use the Amazon EC2 command line tools to query information about your available resources.
- Tag your resources according to your service needs (e.g. if you have a redmine server, put the redmine tag on the EBS volume and on the elastic IP of that instance).
- Write scripts that process these tags with the help of the above-mentioned command line tools, and attach the drives and the IP automatically.
- Execute other scripts (e.g. the encryption) on the running instance to fire up everything.
Even if you are using encryption, late service starting or other exotic requirements, you can reduce the number of required steps to a very small number (1-5, including the password specification).
Automate, automate, automate...
Sometimes it is not known before the server setup how often you would like to run backups or report processes. Rebundling the server just to add a new crontab entry is a very unpleasant task for anyone involved. It is better to prepare the bundle image with a few cron jobs that might never be used; if we do need them later, we do not have to rebundle the image. For example, the following commands help to define an hourly report script:
export EDITOR=nano
crontab -e
# 58 * * * * [ -x /safe/home/root/hourly-report.sh ] && /safe/home/root/hourly-report.sh
As you can see, this script is placed in the '/safe' directory, which is on the encrypted volume. If for some reason the encryption / mount fails, or if there is no such file at that place, there will be no error: the [ -x ... ] test ensures that the script is executed if and only if it is present and executable. Placing it on the encrypted volume also gives us the opportunity to keep a few more confidential items there, e.g. our script can encrypt the report mail, or use some sftp mechanism to access a remote site for such a report.
Of course, the type and variety of the scripts you define in your crontab is entirely up to you.
Be patient...
With the ElasticFox plugin we have encountered some strange problems, e.g. sometimes it takes a very long time to get the list of KeyPairs. One impatient member clicked on the 'create' button and typed the same name we had used previously, which silently removed our old key and put a new one in its place. The KeyPair was distributed internally again, but this is a silly mistake that is better avoided.
published: 2009-07-09, a:István, y:2009, l:aws, l:cloud, l:ebs, l:ec2, l:encryption, l:opensolaris, l:subversion, l:zfs
|
posted Apr 20, 2010 4:39 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:40 AM by István Soós
]
I would like to share some thoughts with you about an untold tale in software engineering. Most probably you have already read or seen quite a few marketing-oriented documents or events with a similar message:
- Use configuration instead of coding
- Change system behavior or business rules without recompiling
- Increased developer productivity...
All of these (let's call them dynamics) are nice; however, I've never seen (not even in a footer with a tiny font) what price you have to pay for these features. And there surely is a price to pay compared to the "old" approach, where you didn't put much of your program in configuration and had to recompile it for even the smallest changes. Let's have a closer look at one very common case, the usage of JavaBeans.
A JavaBean is primarily meant to be a "data-holder" (or business object, if you like that terminology better), with no or very little application logic included. Its main purpose is to hold data that is set through its setter methods and returned when needed via its getter methods. With some simple rules defined (for an attribute named attribute there must be a getAttribute and a setAttribute method) it is possible to dynamically explore and use any JavaBean with the help of the Java Reflection API (java.lang.reflect package). In most of the cases mentioned at the beginning this is the backing concept, and the one that makes a developer's life easier (at least in the development phase). Let's have a look at a simple JavaBean:
public static class Person {
    private String name;
    private int birthyear;

    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getBirthyear() { return birthyear; }
    public void setBirthyear(int birthyear) { this.birthyear = birthyear; }
}
This
is a very simple JavaBean. If you want to work with it the "old" way,
you call those getter and setter methods directly, but then you have to
explicitly write those at compile time, making your app not very
dynamic. The "new" way is this:
java.lang.reflect.Method setNameMethod = Person.class.getDeclaredMethod("setName", String.class);
setNameMethod.invoke(person, "James Bond");
Even
better is if you explore the methods and their parameters beginning
with "set" and "get" dynamically, and store them in a Map for later use.
More or less this is the way all of today's frameworks function, from
JSP's Expression Language to Wicket's PropertyModel binding. Since this -
not very nice - code can be put in the framework, the developer will
see only the nice and easy configuration of ${person.name}="James Bond" or something similar. Ok,
but what will the developer see at runtime (with an emphasis on time)?
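To make the "explore and store in a Map" idea concrete, here is a minimal sketch, assuming Java 8 or later. The BeanAccessors class and its discover method are hypothetical names made up for this illustration; they are not taken from any of the frameworks mentioned, which typically do more (caching, type conversion, java.beans support).

```java
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper for illustration only, not a framework API.
class BeanAccessors {

    /** The same simple JavaBean as in the text. */
    public static class Person {
        private String name;
        private int birthyear;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getBirthyear() { return birthyear; }
        public void setBirthyear(int birthyear) { this.birthyear = birthyear; }
    }

    /**
     * Collects the public instance methods whose name starts with the given
     * prefix ("get" or "set") and that take paramCount parameters,
     * keyed by the property name derived from the method name.
     */
    static Map<String, Method> discover(Class<?> type, String prefix, int paramCount) {
        Map<String, Method> result = new HashMap<>();
        for (Method m : type.getMethods()) {
            String n = m.getName();
            if (n.length() > prefix.length()
                    && n.startsWith(prefix)
                    && m.getParameterCount() == paramCount
                    && !Modifier.isStatic(m.getModifiers())
                    && m.getDeclaringClass() != Object.class) { // skips getClass()
                // e.g. "setBirthyear" becomes "birthyear"
                String property = Character.toLowerCase(n.charAt(prefix.length()))
                        + n.substring(prefix.length() + 1);
                result.put(property, m);
            }
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        // Explore once, then reuse the maps for every access.
        Map<String, Method> getters = discover(Person.class, "get", 0);
        Map<String, Method> setters = discover(Person.class, "set", 1);

        Person person = new Person();
        setters.get("name").invoke(person, "James Bond");
        setters.get("birthyear").invoke(person, 1920);

        System.out.println(getters.get("name").invoke(person));
        System.out.println(getters.get("birthyear").invoke(person));
    }
}
```

The lookup is done only once per class here; every later read or write still goes through Method.invoke, which is the part whose cost is in question.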
Well, attached below is a small tester app that shows the runtime difference between direct and indirect (reflective) calls of a JavaBean's setter and getter methods. The test performs a typical scenario in which some dynamically configured behavior of a framework reads and/or updates all the fields of a JavaBean. Please note that the solution used is probably the simplest one for indirect method invocation; most frameworks use something more complex, for example via the java.beans package, but in the end it always comes down to java.lang.reflect.Method.invoke(...)! You can easily run the app yourself, so I only show some interesting results:
| Test cycles: | 10 | 50 | 200 | 1000 | 5000 | 20000 | 100000 | 500000 |
| Direct calls (average in ns) | 5700 | 5800 | 5700 | 5700 | 5800 | 5100 | 4500 | 4300 |
| Reflective calls (average in ns) | 32000 | 60000 | 32000 | 36000 | 18000 | 12000 | 6300 | 5100 |
We can see that there are situations where calling the same method(s) reflectively is about 10 times slower than calling them directly (60000 ns versus 5800 ns at 50 cycles). This is a huge difference! And the average doesn't even contain the discarded maximum values, which would make the gap even bigger. By increasing the number of test cycles well above 1000, the difference melts down to about 20% (5100 ns versus 4300 ns at 500000 cycles). However, I think the reason for this is the HotSpot optimization in the JVM, which kicks in when it sees that the same cycle is repeated an awful lot of times. I am sure that in a normal application, where such code is not executed in a cycle like this, JVM optimizations are not likely to be so effective. Most probably it can be said that in a normal application such developer-friendly features (dynamics) take up to 5-6 times more CPU time for the JVM to accomplish.
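For readers who want to repeat this kind of measurement, here is a minimal sketch of such a comparison. It is not the original tester app: the ReflectionTiming class, its method names and the chosen cycle counts are assumptions made for this illustration, the method lookup is included in the reflective timing, and no maximum values are discarded. It is not a rigorous benchmark either, so the absolute numbers will differ from the table above.

```java
import java.lang.reflect.Method;

// Hypothetical timing sketch, not the tester app referred to in the text.
class ReflectionTiming {

    public static class Person {
        private String name;
        private int birthyear;

        public String getName() { return name; }
        public void setName(String name) { this.name = name; }
        public int getBirthyear() { return birthyear; }
        public void setBirthyear(int birthyear) { this.birthyear = birthyear; }
    }

    /** Updates and reads all fields with direct calls; returns a checksum. */
    static long runDirect(int cycles) {
        Person p = new Person();
        long checksum = 0;
        for (int i = 0; i < cycles; i++) {
            p.setName("James Bond");
            p.setBirthyear(i);
            checksum += p.getName().length() + p.getBirthyear();
        }
        return checksum;
    }

    /** Does the same work through Method.invoke; returns the same checksum. */
    static long runReflective(int cycles) throws Exception {
        Method setName = Person.class.getMethod("setName", String.class);
        Method setBirthyear = Person.class.getMethod("setBirthyear", int.class);
        Method getName = Person.class.getMethod("getName");
        Method getBirthyear = Person.class.getMethod("getBirthyear");

        Person p = new Person();
        long checksum = 0;
        for (int i = 0; i < cycles; i++) {
            setName.invoke(p, "James Bond");
            setBirthyear.invoke(p, i);
            String name = (String) getName.invoke(p);
            int year = (Integer) getBirthyear.invoke(p);
            checksum += name.length() + year;
        }
        return checksum;
    }

    public static void main(String[] args) throws Exception {
        for (int cycles : new int[] {10, 1000, 100000}) {
            long t0 = System.nanoTime();
            long direct = runDirect(cycles);
            long t1 = System.nanoTime();
            long reflective = runReflective(cycles);
            long t2 = System.nanoTime();
            // The checksums are printed so that the JIT cannot drop the loops.
            System.out.printf("%d cycles: direct %d ns/cycle, reflective %d ns/cycle (checksums %d / %d)%n",
                    cycles, (t1 - t0) / cycles, (t2 - t1) / cycles, direct, reflective);
        }
    }
}
```

Running the small cycle counts first keeps the JVM "cold" for them, which is closer to the situation described above where the reflective code is not executed in a tight loop.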
Well, what do you think: is
this a big price to pay for developer productivity?
published: 2009-09-11, a:Szabolcs, y:2009, l:java, l:productivity, l:profiling |
posted Apr 20, 2010 4:35 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:42 AM by István Soós
]
As I'm preparing to release OKTECH Profiler 1.1, I have checked the performance benchmarks on the profiler itself. It became apparent that the UTF-8 conversion consumes a lot of time, so I've started to investigate what happens behind the scenes. I got a little shock at the DataOutputStream class: it has a serious limitation, as it doesn't allow writing strings larger than 64k (writeUTF stores the encoded length in two bytes). I thought those times(*) were over and that it is just the Java reference source code that has this limitation, so I've written a small program to double-check it:
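import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;

public class WriteUtfLimit {
    public static void main(String[] args) throws Exception {
        // Build a 100,000-character string, well above the 64k limit.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("1234567890");
        }
        String s = sb.toString();

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        dos.writeUTF(s); // throws java.io.UTFDataFormatException
        dos.close();
    }
}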
(*)
I got lazy in Java and assumed that it just works.
To my great dislike, it fails on my Mac: DataOutputStream does have this 64k limitation. Taking another look at the Javadoc, it does contain this information. At the moment this shouldn't affect OKTECH Profiler, as I cannot imagine any stacktrace that has a method or class name longer than that.
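If longer strings ever had to be written, one common workaround is to write the length as a four-byte int followed by the raw UTF-8 bytes. The following is only a sketch of that idea, assuming Java 7 or later; the LongStringIO class and its two methods are hypothetical names for this illustration and are not part of OKTECH Profiler. Note that the result is standard UTF-8, so it cannot be read back with readUTF, which expects the two-byte length and Java's modified UTF-8.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;

// Hypothetical helper for illustration only.
class LongStringIO {

    /** Writes a four-byte length followed by the UTF-8 encoded bytes. */
    static void writeLongString(DataOutputStream out, String s) throws IOException {
        byte[] bytes = s.getBytes(StandardCharsets.UTF_8);
        out.writeInt(bytes.length);
        out.write(bytes);
    }

    /** Reads back a string written by writeLongString. */
    static String readLongString(DataInputStream in) throws IOException {
        int length = in.readInt();
        byte[] bytes = new byte[length];
        in.readFully(bytes);
        return new String(bytes, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        // The same 100,000-character string that makes writeUTF fail.
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 10000; i++) {
            sb.append("1234567890");
        }
        String s = sb.toString();

        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        DataOutputStream dos = new DataOutputStream(baos);
        writeLongString(dos, s);
        dos.close();

        DataInputStream dis = new DataInputStream(new ByteArrayInputStream(baos.toByteArray()));
        System.out.println("round trip ok: " + s.equals(readLongString(dis)));
    }
}
```

The price is two extra bytes per string compared to writeUTF, plus the temporary byte array created by getBytes.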
On the other hand, we are now in a process that ensures greater flexibility in the dumps, making it easier to contribute 3rd party plugins to the profiler runtime and analysis. After this experiment, I'm considering a more XML-like dump format, e.g. Fast Infoset, a pretty good binary XML format. (Update: no, it won't be binary XML, as it is so much slower than pure DataOutputStream.)
published:2009-09-26, a:István, y:2009, l:java, l:profiling
|
posted Apr 20, 2010 4:19 AM by Szabolcs Szádeczky-Kardoss
[
updated Apr 22, 2010 1:41 AM by István Soós
]
Recently I've encountered the problem of displaying comma-separated list items on a web page. It came naturally to check whether it can be done in CSS. This page explains the concept implemented in CSS, and on this example page you can check it yourself. It works in Safari and Firefox, but does not work in IE 7: it just doesn't display any commas at all, and the items are listed with spaces between them. Too bad :(
That will leave us to implement the comma separated list inside the application, in our case: in Wicket. The following small code fragments explain the basic idea:
<wicket:container wicket:id="list">
<span wicket:id="comma"> </span>
<span wicket:id="label">some value</span>
</wicket:container>
We created a container without markup; inside the container we have defined a "comma" component and a "label" component. If we take a look at the Java code, it is pretty easy to understand how it works; the actual code fragment uses ListView:
add(new ListView<String>("list", new MyListModel(myParentModel)) {
    private static final long serialVersionUID = 1L;

    @Override
    protected void populateItem(final ListItem<String> item) {
        item.add(new Label("comma", ", ") {
            private static final long serialVersionUID = 1L;

            @Override
            public boolean isVisible() {
                return item.getIndex() != 0;
            }
        });
        item.add(new Label("label", item.getModel()));
    }
});
The first comma component is not shown, while all the others are. It is really simple to implement - after you have the idea and design. Of course you might get rid of the <span> elements too, if that makes sense in your application.
published:2009-09-27, a:István, y:2009, l:wicket |
|