What’s wrong with RDF/XML?
RDF/XML is a very unusual XML vocabulary. It starts from the premise that graphs are a superset of trees, tries to fit as much of the graph model into XML’s hierarchical structure as possible, and then uses hyperlinks (via rdf:ID, rdf:resource, etc.) to handle the rest. Whilst this seems a reasonable approach for humans, it doesn’t work so well for machines. It leads to a syntax so variable that its superficial resemblance to XML is more of a hazard than a benefit to human-authorability. Admittedly, human-authored RDF/XML is one of the most concise and human-readable XML syntaxes possible for RDF, but the same can’t be said about machine-readability. Whilst an XML syntax such as Atom is simple to process using an off-the-shelf XML parser and a small amount of code, an RDF/XML parser is an unbelievably complicated piece of software.
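As a rough illustration of that variability, here is a small Python sketch using only the standard library. It parses two RDF/XML documents that describe the same single triple, once with the property written as a child element and once as a property attribute, and shows that a path query written against the first form finds nothing in the second. The resource URI http://example.org/doc, the choice of dc:title, and the helper name titles_by_xpath are assumptions made up for this example.

```python
import xml.etree.ElementTree as ET

# Prefix map used by the path query below.
NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
}

# Form A: the property is a child element of rdf:Description.
FORM_A = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc">
    <dc:title>Example</dc:title>
  </rdf:Description>
</rdf:RDF>"""

# Form B: the same triple, written as a property attribute.
FORM_B = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:dc="http://purl.org/dc/elements/1.1/">
  <rdf:Description rdf:about="http://example.org/doc" dc:title="Example"/>
</rdf:RDF>"""


def titles_by_xpath(xml_text):
    """Return the text of every dc:title child element of an rdf:Description."""
    root = ET.fromstring(xml_text)
    return [el.text for el in root.findall("rdf:Description/dc:title", NS)]


if __name__ == "__main__":
    print(titles_by_xpath(FORM_A))  # the query matches the element form
    print(titles_by_xpath(FORM_B))  # the same query finds nothing here
```

Both documents carry the same graph, so any generic XML tool has to anticipate every such variant (and there are more than these two) before it can extract even one property reliably.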
Rather than using a single tag to represent a resource, with an attribute to represent the URI or id, RDF/XML embeds the URI or id in the tag name itself by (ab?)using XML Namespaces. This might be concise, but it is not particularly compatible with general-purpose tools such as XML validators. Namespaces weren’t designed as a generic URI compression technology, and it shows. Because of the restricted set of characters allowed in localNames, there exist many URIs that can’t be decomposed into an nsUri/localName pair, and there is no way to write predicates in RDF/XML other than as nsUri/localName pairs. This makes it impossible to serialise graphs containing predicates that end in ‘/’ or ‘#’ to RDF/XML, even though they are valid in the RDF model and can be serialised to other syntaxes.
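The decomposition problem can be sketched in a few lines of Python. The helper split_predicate below is hypothetical, and its character test is a simplified, ASCII-only approximation of the XML NCName rule (the real rule also admits many non-ASCII characters), so treat it as an illustration rather than a conforming implementation. It looks for a trailing run of name characters that begins with a valid start character, and gives up when there isn't one.

```python
import string

# Simplified, ASCII-only approximation of the NCName character classes.
NAME_START = set(string.ascii_letters + "_")
NAME_CHARS = NAME_START | set(string.digits + ".-")


def split_predicate(uri):
    """Split a URI into (nsUri, localName), or return None if no split exists."""
    # Walk backwards over the trailing run of name characters.
    i = len(uri)
    while i > 0 and uri[i - 1] in NAME_CHARS:
        i -= 1
    # A local name may not begin with a digit, '.' or '-', so skip forward
    # to the first character that is allowed to start a name.
    while i < len(uri) and uri[i] not in NAME_START:
        i += 1
    if i == len(uri):
        return None  # nothing left over to use as a local name
    return uri[:i], uri[i:]


if __name__ == "__main__":
    for candidate in (
        "http://purl.org/rss/1.0/title",
        "http://example.org/terms/",     # ends in '/'
        "http://example.org/terms#",     # ends in '#'
        "http://example.org/terms/123",  # last segment starts with a digit
    ):
        print(candidate, "->", split_predicate(candidate))
```

The example.org URIs are made-up inputs. The last one shows that the problem is not limited to URIs ending in ‘/’ or ‘#’: a final segment consisting only of digits cannot be used as a local name either.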
For me though, the biggest frustration with RDF/XML stems from the fact that despite being an XML syntax, it is impossible to use XML tools such as XSLT to process it. I want an XML syntax that is processable by humans, dedicated RDF tools, and generic XML technologies such as XSLT.
Do we need to base the syntax on XML?
Turtle is a great RDF syntax that isn’t XML-based. It is probably more human-readable than the syntax that I am currently working on, but I still think that there is a need for a decent, human-readable XML-based syntax.
XML isn’t without issues:
- Verbosity
- Turtle is designed for humans to read; XML is designed for machines to read. No XML syntax is ever going to approach the aesthetics of Turtle.
- Poor support for inline binary content
- However, this isn’t such a big deal for RDF, as the RDF model also has poor support for binary content.
- Problematic MIME-types
- The problematic interactions between XML, RFC 3023, the MIME RFCs, and HTTP’s almost-but-not-quite-MIME encapsulation are well known.
Despite these issues, however, XML has a massively deployed infrastructure, and there are lots of good reasons for preferring an XML-based syntax for RDF:
- Widely deployed XML parsers: SAX, DOM, StAX, etc.
- An XML base makes it easy to write parsers for simple XML-based RDF syntaxes. A variety of parser models are available.
- XSLT, XPath, XQuery, XInclude and XProc
- Technologies such as XSLT allow not only the data to be easily portable across platforms, but the code as well. And the performance of XSLT is good enough for Microsoft to use it for the RSS/Atom normalization in Windows/IE7’s Feed Platform.
- Validation
- DTDs may be pretty feeble, and XSD may be a monster, but RelaxNG is great. Validation is important, and validation technologies for XML are prevalent.
Is it worth making a new syntax?
When the subject of Binary XML gets raised, it usually gets some push-back. This is probably because all that XML standardizes is a syntax; once you diverge from the XML 1.0 serialisation, there isn’t much left (OK, there is the Infoset, but it’s not a very usable application model). With RDF, however, because all interoperability is at the RDF model layer, and because the model is so simple, it is feasible to use different syntaxes. Although RDF/XML is the only syntax blessed by the W3C Recommendations, other syntaxes such as N-Triples, N3, Turtle, TriX, RXR, and TriplesML exist without seriously fragmenting interoperability. Of course, part of the reason why these other syntaxes exist is the serious problems with RDF/XML.
Anyway, I’ve got a RelaxNG schema, some XSLTs, and documentation of the design decisions, which I will post here later. In the meantime, here is a quick teaser, the feed from the RSS 1.0 examples converted to my proposed syntax:
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf [
  <!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
  <!ENTITY rss "http://purl.org/rss/1.0/">
]>
<rdf xmlns="http://djpowell.net/schemas/treetriples/1/">
  <s id="http://www.xml.com/xml/news.rss">
    <p id="&rss;link">
      <o>http://xml.com/pub</o>
    </p>
    <p id="&rss;title">
      <o>XML.com</o>
    </p>
    <p id="&rss;items">
      <o id="_:b0" />
    </p>
    <p id="&rss;description">
      <o>XML.com features a rich mix of information and services
for the XML community.</o>
    </p>
    <p id="&rss;textinput">
      <o id="http://search.xml.com" />
    </p>
    <p id="&rdf;type">
      <o id="&rss;channel" />
    </p>
    <p id="&rss;image">
      <o id="http://xml.com/universal/images/xml_tiny.gif" />
    </p>
  </s>
  <s id="http://xml.com/pub/2000/08/09/xslt/xslt.html">
    <p id="&rss;link">
      <o>http://xml.com/pub/2000/08/09/xslt/xslt.html</o>
    </p>
    <p id="&rss;title">
      <o>Processing Inclusions with XSLT</o>
    </p>
    <p id="&rss;description">
      <o>Processing document inclusions with general XML tools can be
problematic. This article proposes a way of preserving inclusion
information through SAX-based processing.</o>
    </p>
    <p id="&rdf;type">
      <o id="&rss;item" />
    </p>
  </s>
  <s id="http://xml.com/pub/2000/08/09/rdfdb/index.html">
    <p id="&rss;link">
      <o>http://xml.com/pub/2000/08/09/rdfdb/index.html</o>
    </p>
    <p id="&rss;title">
      <o>Putting RDF to Work</o>
    </p>
    <p id="&rss;description">
      <o>Tool and API support for the Resource Description Framework
is slowly coming of age. Edd Dumbill takes a look at RDFDB,
one of the most exciting new RDF toolkits.</o>
    </p>
    <p id="&rdf;type">
      <o id="&rss;item" />
    </p>
  </s>
  <s id="http://xml.com/universal/images/xml_tiny.gif">
    <p id="&rss;link">
      <o>http://www.xml.com</o>
    </p>
    <p id="&rss;title">
      <o>XML.com</o>
    </p>
    <p id="&rss;url">
      <o>http://xml.com/universal/images/xml_tiny.gif</o>
    </p>
    <p id="&rdf;type">
      <o id="&rss;image" />
    </p>
  </s>
  <s id="http://search.xml.com">
    <p id="&rss;link">
      <o>http://search.xml.com</o>
    </p>
    <p id="&rss;title">
      <o>Search XML.com</o>
    </p>
    <p id="&rss;description">
      <o>Search XML.com's XML collection</o>
    </p>
    <p id="&rdf;type">
      <o id="&rss;textinput" />
    </p>
    <p id="&rss;name">
      <o>s</o>
    </p>
  </s>
  <d parse="seq" id="_:b0">
    <o id="http://xml.com/pub/2000/08/09/rdfdb/index.html" />
    <o id="http://xml.com/pub/2000/08/09/xslt/xslt.html" />
  </d>
</rdf>
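
To give a feel for how little code an off-the-shelf XML parser needs for a document like this, here is a minimal Python sketch using only the standard library. It rests on assumptions read off the example above rather than on any published definition of the syntax: that s, p and o stand for subject, predicate and object, that an o element with an id attribute names a resource, and that an o element with text content is a plain literal. The d element is skipped because its meaning isn't described here, and the function name triples is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Namespace of the proposed syntax, in ElementTree's {uri}name notation.
NS = "{http://djpowell.net/schemas/treetriples/1/}"

# A cut-down copy of the teaser. The parser expands the internal entities.
DOC = """<!DOCTYPE rdf [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY rss "http://purl.org/rss/1.0/">
]>
<rdf xmlns="http://djpowell.net/schemas/treetriples/1/">
<s id="http://search.xml.com">
<p id="&rss;title">
<o>Search XML.com</o>
</p>
<p id="&rdf;type">
<o id="&rss;textinput" />
</p>
</s>
</rdf>"""


def triples(xml_text):
    """Yield (subject, predicate, object) tuples from s/p/o elements.

    Assumption: an <o> with an id attribute is a resource; otherwise its
    text content is a plain literal. <d> elements are ignored.
    """
    root = ET.fromstring(xml_text)
    for s in root.findall(NS + "s"):
        for p in s.findall(NS + "p"):
            for o in p.findall(NS + "o"):
                if "id" in o.attrib:
                    obj = ("resource", o.get("id"))
                else:
                    obj = ("literal", o.text or "")
                yield s.get("id"), p.get("id"), obj


if __name__ == "__main__":
    for triple in triples(DOC):
        print(triple)
```

Because every subject, predicate and object sits at a fixed depth under a fixed element name, three nested loops are enough, and the same regularity is what should make the format approachable from XSLT or XPath.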