I Hate You More Than Ever, XML

I love Milkman Dan.

Okay, so I don’t actually hate XML.

But recently I have been working on a syndication tool, and I am beginning to agree with the many people who question the use of XML for simple data exchange. XML was originally supposed to be both machine and human readable, and when XML is used to create structured documents, like XHTML, it is. It was an offshoot of SGML but had much stricter, and therefore simpler, syntax rules. But then people started trying to use XML for any sort of communication over the network: CSV files got turned into XML (at no real gain other than it’s XML), then came protocols for method invocation over HTTP (SOAP) and for defining the interface for those method invocations (WSDL). Now it seems that, for any data exchange out there, a lot of people think you need to do it in XML, and that you should define the XML via an XSD (XML Schema Definition).

Now XSDs I hate! In defining the schema of an XML document using XML, you are taking a crude tool for exchanging data and backing it with a terrible tool for defining a schema. XSD is painful unless you have some sort of tool to help you. Don’t believe me? Here is the XSD for syndication. Maybe I am crazy, but I think that a schema definition language should be human readable, and I don’t think XSD is.

The arguments for XML are many, but they mostly seem to revolve around it being a standard and there being a lot of tools for it. So XML has evolved from a simplification of SGML for the creation of structured documents into a catch-all hammer in the toolbox of many software designers. Soon people will start suggesting that we write the programs that run on XML-based files in some sort of XML-based programming language (oh wait, they did that already with XSL and XSLT). There has to be a better way.
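To make the CSV point a bit more concrete, here is a small sketch that writes the same flat records as CSV and as XML using only Python's standard library. The sample rows, field names, and element names (`rows`, `row`) are invented for this illustration and are not from any real feed. Both forms round-trip to the same data; the XML version is simply longer.

```python
import csv
import io
import xml.etree.ElementTree as ET

# Hypothetical sample records, invented for this illustration.
FIELDS = ["id", "title", "author"]
ROWS = [
    {"id": "1", "title": "First post", "author": "alice"},
    {"id": "2", "title": "Second post", "author": "bob"},
]


def to_csv(rows):
    """Serialize the rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def from_csv(text):
    """Read CSV text back into a list of plain dicts."""
    return [dict(row) for row in csv.DictReader(io.StringIO(text))]


def to_xml(rows):
    """Serialize the rows as one <row> element per record."""
    root = ET.Element("rows")
    for row in rows:
        item = ET.SubElement(root, "row")
        for name in FIELDS:
            ET.SubElement(item, name).text = row[name]
    return ET.tostring(root, encoding="unicode")


def from_xml(text):
    """Read the XML text back into a list of plain dicts."""
    return [{child.tag: child.text for child in row} for row in ET.fromstring(text)]


csv_text = to_csv(ROWS)
xml_text = to_xml(ROWS)
print("CSV characters:", len(csv_text))
print("XML characters:", len(xml_text))
```

For flat tabular data like this, the XML wrapper adds tags around every value without carrying any extra information.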

Right now I am looking at other data exchange formats and have been focusing on JSON and YAML. Both are more human readable (YAML even more so than JSON) and carry less weight than XML for data exchange. They are standards with decent library support and can cover any structured data format that XML can. There is even a tool out there called Kwalify that validates both JSON and YAML against a schema.

I am also starting to think that there needs to be a language for defining schemas in a language- and platform-neutral way. Tools could use this language to generate things like XSD (if you have to use XML), YAML for Kwalify, SQL, etc. It would be a DSL (Domain Specific Language) for defining schemas. I know a lot of people think that creating a parser for a new language is hard, but with tools like ANTLR and yacc it’s fairly easy, and it is a powerful addition to a developer’s toolbox. As Martin Fowler says, don’t be afraid of creating parsers!

We need to start thinking about the proper use of XML as a tool. It has its place, but there are better tools out there for many of the things that XML is currently used for. Also, is the obsession with using XML for everything preventing us from creating even better tools? It’s something we need to think about.
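As a rough illustration of the neutral schema language idea, here is a minimal sketch in Python. The mini-language (`record`, `name: type required`) is entirely made up for this example, and the parser is hand-written rather than generated with ANTLR or yacc. It assumes only two field types and emits a SQL table definition and an XSD fragment from the same source; a real tool would need far more than this.

```python
from dataclasses import dataclass

# Hypothetical schema mini-language, invented here for illustration only.
SCHEMA_TEXT = """
record Entry
  id: int required
  title: string required
  summary: string
"""

# Assumed type mappings for the two targets this sketch supports.
SQL_TYPES = {"int": "INTEGER", "string": "TEXT"}
XSD_TYPES = {"int": "xs:integer", "string": "xs:string"}


@dataclass
class Field:
    name: str
    type: str
    required: bool


def parse_schema(text):
    """Parse a single record definition into (record_name, [Field, ...])."""
    lines = [ln.strip() for ln in text.splitlines() if ln.strip()]
    header = lines[0].split()
    if len(header) != 2 or header[0] != "record":
        raise ValueError("schema must start with 'record <Name>'")
    fields = []
    for line in lines[1:]:
        name, _, rest = line.partition(":")
        parts = rest.split()
        if not parts or parts[0] not in SQL_TYPES:
            raise ValueError(f"bad field definition: {line!r}")
        fields.append(Field(name.strip(), parts[0], "required" in parts[1:]))
    return header[1], fields


def to_sql(record_name, fields):
    """Emit a CREATE TABLE statement; required fields become NOT NULL."""
    cols = [
        f"  {f.name} {SQL_TYPES[f.type]}" + (" NOT NULL" if f.required else "")
        for f in fields
    ]
    return f"CREATE TABLE {record_name} (\n" + ",\n".join(cols) + "\n);"


def to_xsd(record_name, fields):
    """Emit an XSD element; optional fields get minOccurs="0"."""
    elems = []
    for f in fields:
        optional = "" if f.required else ' minOccurs="0"'
        elems.append(
            f'        <xs:element name="{f.name}" type="{XSD_TYPES[f.type]}"{optional}/>'
        )
    return "\n".join(
        [
            '<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">',
            f'  <xs:element name="{record_name}">',
            "    <xs:complexType>",
            "      <xs:sequence>",
        ]
        + elems
        + ["      </xs:sequence>", "    </xs:complexType>", "  </xs:element>", "</xs:schema>"]
    )


record_name, fields = parse_schema(SCHEMA_TEXT)
print(to_sql(record_name, fields))
print(to_xsd(record_name, fields))
```

The four-line schema source is the part a human would read and edit; the generated XSD is several times longer and is only there for the tools that insist on it.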

PS: Apologies to Max Cannon, and many thanks to folks that helped create Build Your Own Meat!

  1. I had no idea there were alternatives to XML, thanks for the info. More importantly, thanks for the roll your own comic link :p

  2. jp fielding

    i assume you’ve seen this too in your recent research http://code.google.com/apis/protocolbuffers/

  3. XML’s infinite extensibility is what makes it a blessing and a curse. Sure, XSD sucks, but so does every other generic data definition language out there. Try DTD or RELAX NG if you’re really up for a fight.

    I, too, am a big fan of JSON, but it pays for its lightweight nature at the expense of strict typing, which is a dealbreaker in most data exchange scenarios.

    When it comes to *structured* data, there really is no viable alternative at the moment, IMHO. YAML is okay, but equally lacks type-definition (AFAIK) and looks too much like a COBOL/Python mashup to be friendly to the 3rd gen language folks like myself. 🙂

    … and there’s always our friend ANTLR. I love the idea, but haven’t had much success beyond the basics. I think there’s too high a learning curve for its niche/kitsch value, and the average working stiff doesn’t have the time to conquer it.

    I love/hate XML, too. 🙂


  4. @jp —

    It’s time to coin YASF (Yet Another Serialization Format). First Facebook’s “Thrift”, now the not-very-different-but-with-a-far-lamer-name: “Protocol Buffers”.

    Each of them *does* solve the problem of datatyping and terseness for messaging formats, but why do they have to go “the extra mile” and provide a crap RPC interface? Why can’t we get out of the 90s?


  5. Andrew Tillman

    I think Protocol Buffers is interesting (the name could use some improvement). I personally think that writing a grammar has about the same difficulty as writing an XSD, but what I really want is for people to think about the tool they are using.

    As to JSON, if typing is a problem, you could just make everything a string. Then it won’t matter; the same goes for YAML. And if you use Kwalify you can write a schema if you need validation. I also know that YAML has some features that JSON doesn’t, which might allow it to be used in more complex cases.

    XML has its place, but I don’t think it’s a good idea for everything. At one of my previous companies I had to parse GB-sized files that were XML representations of emails. Why not use the mbox format? It would have been much smaller, and there are tools out there that already handle mbox.
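As one example of existing mbox tooling, Python's standard library ships a `mailbox` module that reads and writes mbox files. The sketch below is only an illustration: the addresses, subjects, and file name are made up, and it writes a tiny mbox file to a temporary directory before reading the subjects back.

```python
import mailbox
import os
import tempfile
from email.message import EmailMessage


def write_sample_mbox(path):
    """Create an mbox file holding two made-up messages."""
    box = mailbox.mbox(path)
    try:
        for i in (1, 2):
            msg = EmailMessage()
            msg["From"] = "alice@example.com"
            msg["To"] = "bob@example.com"
            msg["Subject"] = f"Message {i}"
            msg.set_content(f"Body of message {i}\n")
            box.add(msg)
        box.flush()
    finally:
        box.close()


def list_subjects(path):
    """Return the Subject header of every message in the mbox file."""
    box = mailbox.mbox(path)
    try:
        return [msg["subject"] for msg in box]
    finally:
        box.close()


with tempfile.TemporaryDirectory() as tmp:
    sample_path = os.path.join(tmp, "sample.mbox")
    write_sample_mbox(sample_path)
    SUBJECTS = list_subjects(sample_path)

print(SUBJECTS)
```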

  6. dbt

    Relax NG is an alternative to XSD, and has a non-XML grammar (Relax Compact).

    It’s a pleasant alternative to XSD when XML is the correct document standard to start with.

  7. My interest is in EDI, where CSV is stubbornly popular. After 10 years, XML still hasn’t replaced message definitions and standards that are 30 years old.


    JSON can learn from Protocol Buffers, but I’m convinced it can be better.


  8. YAML does need a schema definition language. Encoding-neutral would be ideal; it might even serve to go back and look at ASN.1. I wonder if XML encoding rules could be written for ASN.1?

    Ideally, it would be a YAML-like syntax that is encoding neutral. Looking at ASN.1 as an example, having a variety of well-defined types that are later bound to the encoding rules would seem to work.
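A very rough sketch of that separation, in Python: the types are declared once, and two different encodings (JSON and XML) are bound to them afterwards. This is not ASN.1 and does not follow any of its encoding rules; the schema, the `entry` element name, and the text conversions are all assumptions made up for the illustration.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical abstract schema: field name -> type, declared once.
ENTRY_SCHEMA = {"id": int, "title": str, "draft": bool}

# Assumed text conversions used only by the XML binding.
TO_TEXT = {int: str, str: str, bool: lambda b: "true" if b else "false"}
FROM_TEXT = {int: int, str: str, bool: lambda s: s == "true"}


def check(record, schema):
    """Raise TypeError unless every schema field is present with the right type."""
    for name, expected in schema.items():
        if not isinstance(record.get(name), expected):
            raise TypeError(f"{name} must be {expected.__name__}")
    return record


def encode_json(record, schema):
    """JSON binding: JSON's own value types carry the typing."""
    return json.dumps(check(record, schema), sort_keys=True)


def decode_json(text, schema):
    return check(json.loads(text), schema)


def encode_xml(record, schema):
    """XML binding: every value becomes element text."""
    check(record, schema)
    root = ET.Element("entry")
    for name, kind in schema.items():
        ET.SubElement(root, name).text = TO_TEXT[kind](record[name])
    return ET.tostring(root, encoding="unicode")


def decode_xml(text, schema):
    """XML binding: the schema says how to turn element text back into values."""
    root = ET.fromstring(text)
    record = {
        name: FROM_TEXT[kind](root.find(name).text or "")
        for name, kind in schema.items()
    }
    return check(record, schema)


sample = {"id": 7, "title": "Hello", "draft": False}
print(encode_json(sample, ENTRY_SCHEMA))
print(encode_xml(sample, ENTRY_SCHEMA))
```

The point of the exercise is that the schema never mentions JSON or XML; only the encode and decode functions do.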