Don’t Parse That XML!
I’ve talked a few times about how the best code you can write is code you never write. One of the major places I see developers writing code they don’t need to write is when parsing XML.
A word of caution before I explain how to avoid parsing XML by hand:
What I am going to describe is not always going to be the best solution. It is an easy approach that covers simple processing of XML files. Because the approaches I am going to suggest load the entire document into an object tree in memory, they might be memory intensive and too slow for a large XML file.
Seems like everyone is doing it
I’ve walked into so many software development shops and seen code to parse XML files. It seems to be one of those really common things that not enough developers realize can be completely automated.
I have started to wonder if the practice is self-propagating: developers see XML being parsed by hand in one place, assume there is no better way, and then carry that manual parsing to the next place they go.
Why is it so bad?
First of all, parsing XML is not an easy task. Even with an XML parsing library, you have to write a large amount of code, especially for a complex XSD.
XML parsing code is also very fragile. If the structure of an XML file changes, the code will have to be modified, and the modification can have cascading effects.
Hand-written XML parsing code cannot simply be regenerated when the structure of the XML changes; it has to be updated by hand.
Most importantly, any code you have to write runs the risk of introducing bugs and complexity into the system.
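For contrast, here is a minimal sketch of the kind of hand-written parsing these points warn about, using the JDK’s built-in DOM API. The <book> document and the Book class are hypothetical. Even for one attribute and one child element, every name is hard-coded and every missing piece has to be checked by hand.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class ManualParse {
    // Hypothetical target type, written and maintained by hand.
    public static class Book {
        public String id;
        public String title;
    }

    public static Book parse(String xml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        Element root = doc.getDocumentElement();

        // Every element and attribute name below is hard-coded. If the XML
        // structure changes, this method must be found and edited by hand.
        if (!"book".equals(root.getTagName())) {
            throw new IllegalArgumentException("Expected <book>, got <" + root.getTagName() + ">");
        }
        Book book = new Book();
        // getAttribute returns "" (not null) when the attribute is missing,
        // so a renamed attribute fails silently.
        book.id = root.getAttribute("id");

        NodeList titles = root.getElementsByTagName("title");
        if (titles.getLength() == 0) {
            throw new IllegalArgumentException("Missing <title>");
        }
        book.title = titles.item(0).getTextContent();
        return book;
    }

    public static void main(String[] args) throws Exception {
        Book b = parse("<book id=\"42\"><title>Clean Code</title></book>");
        System.out.println(b.id + ": " + b.title);
    }
}
```

A real schema with nested elements, lists, and optional values multiplies this code, which is exactly the work the generated or annotated classes below take off your hands.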
It’s so simple you wouldn’t believe it
So, how simple is it to automatically parse XML into objects?
Very simple. First I am going to give you the basic pattern, then I am going to tell you how to do it in both C# and Java.
- Use a tool to generate an inferred XSD from your XML file. (You can skip this step if you already have an XSD file.)
- Use a code generation tool to generate your classes automatically from the XSD file.
- In your code, deserialize your XML file into an object tree using the same framework that generated the classes.
If you are doing something more complex than this, without a really good reason, you are doing it wrong!
Learning how to do this in your language of choice is a very important tool to put into your tool bag. Many times when I have needed to parse XML files, knowing how to automatically deserialize them into objects has saved me hours of development time.
There are two main ways in which XML serialization frameworks work.
- Serializers that auto-generate the classes from the XSD files.
- Serializers that use annotations or attributes on classes.
Using a serializer that auto-generates the classes from an XSD is the easiest to use and can work in most cases. If you need more control over the generation of the XML, you might want to use an attribute or annotation based framework.
One of the biggest barriers in getting started with an XML framework is knowing what to use and how to use it. I am going to cover 3 options that will get you going for C#, Java SE, and Java Android development.
C# (XSD.exe)
XML serialization is so easy in C# because it is built right into the .NET Framework.
The only real piece of magic you need to know is the XSD.exe tool, which is installed with Visual Studio. You run this one tool once to infer an XSD from your XML file, and then again to turn that XSD into classes that can be both serialized and deserialized.
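If you have an XML file named myFile.xml, you can simply go to the Visual Studio command prompt and type:
xsd myFile.xml
This will produce myFile.xsd. Then run the tool again on the schema, with the /c switch to generate classes:
xsd myFile.xsd /c
This generates a C# source file containing a set of classes that you can add to your project. You can then deserialize an XML file into those classes with a few lines of code, using the XmlSerializer class from the System.Xml.Serialization namespace.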
Java (JAXB)
The steps are slightly more complicated with JAXB, but it is still fairly easy.
First we have to generate an XSD file from an XML file. JAXB doesn’t do this itself as far as I know, but there is another tool we can use called Trang.
Download Trang, then run it like so:
java -jar trang.jar -I xml -O xsd myFile.xml myFile.xsd
You can also use the XSD.exe tool if you have Visual Studio installed, and there are a few other tools out there as well.
Once you have the XSD file (or if you already had one), generate Java classes with JAXB’s xjc tool like so:
xjc -p my.package.name myFile.xsd -d myDirectory
Running this command will produce Java files that represent the elements in your XML document.
Finally, to create your objects you can use the JAXB unmarshaller.
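Here is a minimal sketch of that step. The classes xjc generates depend on your schema, so a small hand-annotated Book class (hypothetical) stands in for them here. It assumes the JAXB API (javax.xml.bind) is available: it is bundled with Java 6 through 8, available via --add-modules java.xml.bind on Java 9 and 10, and must be added as a separate dependency on Java 11 and later.

```java
import java.io.StringReader;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Unmarshaller;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbUnmarshalExample {
    // Hypothetical stand-in for an xjc-generated class.
    @XmlRootElement(name = "book")
    public static class Book {
        @XmlAttribute
        public String id;      // maps the id="..." attribute

        @XmlElement
        public String title;   // maps the <title> child element
    }

    public static Book parse(String xml) throws JAXBException {
        // The context knows how to bind the Book class to XML.
        JAXBContext context = JAXBContext.newInstance(Book.class);
        Unmarshaller unmarshaller = context.createUnmarshaller();
        return (Book) unmarshaller.unmarshal(new StringReader(xml));
    }

    public static void main(String[] args) throws JAXBException {
        Book book = parse("<book id=\"42\"><title>Clean Code</title></book>");
        System.out.println(book.id + ": " + book.title);
    }
}
```

To read a file instead of a string, pass a java.io.File to unmarshal. If xjc generated the root class without @XmlRootElement (common when the root element uses a named complex type), unmarshal returns a JAXBElement, and you call getValue() on it to get your object.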
Not as simple as the C# example, but really quite simple. I’ve omitted the steps like downloading JAXB and adding it to your class path, but you can see that the process really is not very painful at all.
JAXB also provides some options for customizing the serialization and deserialization.
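As one small example of those options, this sketch marshals an object back to XML using two standard Marshaller properties: JAXB_FORMATTED_OUTPUT to indent the output, and JAXB_FRAGMENT, which with the JAXB reference implementation leaves out the <?xml ...?> declaration. The Book class is hypothetical, and the sketch assumes the JAXB API (javax.xml.bind) is on the classpath.

```java
import java.io.StringWriter;
import javax.xml.bind.JAXBContext;
import javax.xml.bind.JAXBException;
import javax.xml.bind.Marshaller;
import javax.xml.bind.annotation.XmlAttribute;
import javax.xml.bind.annotation.XmlElement;
import javax.xml.bind.annotation.XmlRootElement;

public class JaxbMarshalExample {
    // Hypothetical annotated class.
    @XmlRootElement(name = "book")
    public static class Book {
        @XmlAttribute
        public String id;

        @XmlElement
        public String title;
    }

    public static String toXml(Book book) throws JAXBException {
        Marshaller marshaller = JAXBContext.newInstance(Book.class).createMarshaller();
        // Indent nested elements instead of writing everything on one line.
        marshaller.setProperty(Marshaller.JAXB_FORMATTED_OUTPUT, Boolean.TRUE);
        // Treat the output as a fragment, so no XML declaration is written.
        marshaller.setProperty(Marshaller.JAXB_FRAGMENT, Boolean.TRUE);
        StringWriter out = new StringWriter();
        marshaller.marshal(book, out);
        return out.toString();
    }

    public static void main(String[] args) throws JAXBException {
        Book book = new Book();
        book.id = "42";
        book.title = "Clean Code";
        System.out.println(toXml(book));
    }
}
```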
Android (Simple XML)
You can’t use JAXB with Android. As far as I can tell, the reflection JAXB relies on doesn’t work on the Dalvik VM.
I found a pretty good and small XML framework that I am using in my Android app that seems to do the trick nicely. You have to annotate your classes and create them by hand, but it is very simple and straightforward to do so.
The tool is called “Simple XML.”
You can find lots of examples on the “tutorial” tab on the web page.
Basically, you download Simple XML, add the jars to your classpath, and create class files with annotations that specify how to serialize or deserialize your XML.
The Simple XML website includes a small example of an annotated class, along with the few lines of code needed to deserialize XML into it.
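Here is a minimal sketch along the lines of that example, assuming the Simple XML jar is on the classpath. The element and attribute names (example, text, index) are illustrative. It reads from a String to stay self-contained; Persister.read also accepts a File or an InputStream.

```java
import java.io.StringReader;
import org.simpleframework.xml.Attribute;
import org.simpleframework.xml.Element;
import org.simpleframework.xml.Root;
import org.simpleframework.xml.Serializer;
import org.simpleframework.xml.core.Persister;

public class SimpleXmlExample {
    // Maps <example index="..."><text>...</text></example>.
    @Root(name = "example")
    public static class Example {
        @Element
        private String text;   // child element <text>

        @Attribute
        private int index;     // attribute index="..."

        public String getText() { return text; }
        public int getIndex() { return index; }
    }

    public static Example parse(String xml) throws Exception {
        // Persister is Simple XML's standard Serializer implementation.
        Serializer serializer = new Persister();
        return serializer.read(Example.class, new StringReader(xml));
    }

    public static void main(String[] args) throws Exception {
        Example example = parse("<example index=\"123\"><text>Example message</text></example>");
        System.out.println(example.getIndex() + ": " + example.getText());
    }
}
```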
Very simple and straightforward. The only downside is writing the Java classes yourself, but the annotations make that easy.
So there you have it, XML serialization frameworks abound to make your life easier when dealing with XML. For most simple cases you should never handwrite XML parsing code, even when using a library to help you do it.
Now that I’ve shown you how easy it is, there really are no excuses!
As always, you can follow my posts on Elegant Code. Feel free to check out my main personal blog at http://simpleprogrammer.com, which has a wider range of posts, updated 2-3 times a week.