treetriples FAQ

What is unique about treetriples?

treetriples is designed to be processed by XPath, XSLT, and tree-based XML APIs.

treetriples offers abbreviated forms for lists, collections, statements, and XML literals, and requires producers to use them. It also requires triples with the same subject to be grouped together, and triples with the same subject and predicate to be grouped further. This creates a syntax which, while not canonical, can be reliably processed with XPaths that care about the nesting of elements, but not the precise ordering of element children.

The defaulting of identical subjects and of (subject, property) pairs works well for Turtle, where it makes the syntax more readable and more concise, and hopefully does the same for treetriples.
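
A sketch of how this grouping might look, using the element names from the example later in this FAQ (the foaf:topic property is only illustrative): two triples sharing a subject are written under a single <s> element, and two triples sharing both subject and predicate share a single <p> element.

<s id="http://www.example.com">
  <p id="http://xmlns.com/foaf/0.1/name">
    <o>Example Web Page</o>
  </p>
  <p id="http://xmlns.com/foaf/0.1/topic">
    <o id="http://www.example.com/topics/rdf" />
    <o id="http://www.example.com/topics/xslt" />
  </p>
</s>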

Why is it useful to use XSLT with RDF?

XSLT provides a single solution for both querying and document generation. Although query languages such as SPARQL can operate on RDF graphs directly, whatever their syntax, SPARQL doesn't support generating documents with the extracted data.

It might be possible to combine multiple SPARQL queries and process the SPARQL XML result-sets with XSLT, perhaps using XProc to glue everything together; but this requires a lot more technology than just an XSLT processor, and it isn't clear that it would cope as well with multiple dependent queries as direct XSLT access would.
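
As a sketch of the single-tool approach, the following XSLT 1.0 stylesheet (using the element names and namespace from the example later in this FAQ; the HTML output is purely illustrative) both queries a treetriples document for foaf:name values and generates an HTML document from them:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:tt="http://djpowell.net/schemas/treetriples/1/">
  <xsl:output method="html"/>

  <!-- list the foaf:name of every subject that has one -->
  <xsl:template match="/tt:rdf">
    <ul>
      <xsl:for-each select="tt:s[tt:p[@id='http://xmlns.com/foaf/0.1/name']]">
        <li>
          <xsl:value-of select="tt:p[@id='http://xmlns.com/foaf/0.1/name']/tt:o"/>
        </li>
      </xsl:for-each>
    </ul>
  </xsl:template>
</xsl:stylesheet>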

XSLT is poor at string processing and grouping though, isn't it?

Some operations are tricky in XSLT, but there is always the possibility of using an EXSLT-style function library to help with common operations. Or XSLT 2.0. Or a DOM-style library to process the XML; treetriples is designed to work well with XSLT and XPath, but it also works well with tree-based XML processors, such as DOM or JDOM.
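
As a small illustration, splitting a predicate URI into its local name is fiddly with core XPath 1.0 string functions alone (substring-after handles '#' namespaces, but '/' namespaces need recursion), whereas the EXSLT str:tokenize function reduces it to one expression. A sketch, assuming an XSLT 1.0 processor with EXSLT support:

<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:str="http://exslt.org/strings"
    xmlns:tt="http://djpowell.net/schemas/treetriples/1/">
  <xsl:output method="text"/>

  <!-- print the local name of every predicate URI, one per line -->
  <xsl:template match="/">
    <xsl:for-each select="//tt:p">
      <xsl:value-of select="str:tokenize(@id, '#/')[last()]"/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>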

What sort of documents might you want to transform RDF into?

What about support for contexts or named graphs?

I don't think that there is enough agreement on how to support contexts yet. In a format optimized for XPath access, there is the difficulty that separating named graphs at the top level makes querying over multiple graphs unintuitive, while the alternative, labelling triples at a fine-grained level, seems like it would add a lot of complexity. There is nothing to stop someone from inventing a higher-level format that encapsulates multiple graphs in a single XML document. Or just use multiple files.

Why are the s, p, and o elements nested?

The nesting means that when there are multiple triples with the same subject, or with the same subject and predicate, the subject, or the subject and predicate, don't have to be repeated. This reduces redundancy and visual clutter, and improves the performance of queries by reducing the scope that has to be searched to find the properties of a given subject.
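
For example, finding every name of a given subject only involves examining the children of one <s> element, rather than scanning every triple in the document. A sketch, assuming the tt prefix is bound to the treetriples namespace:

/tt:rdf/tt:s[@id='http://www.example.com']/tt:p[@id='http://xmlns.com/foaf/0.1/name']/tt:o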

Why not allow 'striping' (nesting additional predicates inside resource objects)?

Striping is harder to parse and harder to generate. Although striping can make a document more readable, it can just as easily make it less readable. Generally speaking, a document is more readable if the most "important" resources are located at the top level, but this conflicts with striping, which may nest a resource simply because a triple forms an incoming link.

How can verbosity be reduced?

Unlike RDF/XML, treetriples doesn't use Namespaces to abbreviate URIs. Instead, a suggested (but not required) alternative is to use an Internal DTD Subset to abbreviate URIs using Entities.

This can easily be implemented with a SAX ContentHandler and LexicalHandler that serializes the input, while attempting prefix matching on id attributes and replacing each matched prefix with an entity reference corresponding to the namespace prefix.

Example:

<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE rdf [
<!ENTITY rdf "http://www.w3.org/1999/02/22-rdf-syntax-ns#">
<!ENTITY foaf "http://xmlns.com/foaf/0.1/">
]>

<rdf xmlns="http://djpowell.net/schemas/treetriples/1/">
  <s id="http://www.example.com">
    <p id="&rdf;type">
      <o id="&foaf;Document" />
    </p>
    <p id="&foaf;name">
      <o>Example Web Page</o>
    </p>
  </s>
</rdf>

XML parsers are REQUIRED to implement Internal DTD Subsets.

Why overload the id attribute for both bnode identifiers and URIs?

This is done with consideration of how the format will be processed by consumers. In most cases consumers will not be interested in whether a resource is a bnode or a URI, as long as it can be used to match against other resources with an XPath. A single attribute makes it easier to write the XPaths. For cases where it is necessary to distinguish the two, it is easy to use an XPath expression such as:

starts-with(@id, '_').
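
A sketch of how that test might be used in a stylesheet, assuming the tt prefix is bound to the treetriples namespace (the HTML output is only illustrative):

<!-- render blank nodes and URI references differently -->
<xsl:template match="tt:o[@id]">
  <xsl:choose>
    <xsl:when test="starts-with(@id, '_')">
      <em>anonymous node <xsl:value-of select="@id"/></em>
    </xsl:when>
    <xsl:otherwise>
      <a href="{@id}"><xsl:value-of select="@id"/></a>
    </xsl:otherwise>
  </xsl:choose>
</xsl:template>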

Why do I have to give an identifier to all bnodes?

If a bnode is only referenced in one place, in principle it wouldn't need a bnode identifier; however, in treetriples, id attributes are required for all bnodes.

If the tt:o/@id attribute were made optional, it would be difficult to distinguish an anonymous bnode from an empty literal.

It would be possible to use a special sentinel value as a bnode id, but in either case this would complicate the job of consumers, who would need to explicitly check whether a bnode was anonymous before attempting to match it (because naively matching sentinel values would match distinct anonymous bnodes to each other). It is in keeping with the philosophy of treetriples to simplify parsing at the cost of complicating generation, so producers must generate bnode ids even if the bnode isn't referenced anywhere else in the document.
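
Because every bnode carries an id, following a reference from an object node back to the <s> element that describes it is a single step, for URIs and bnodes alike. A sketch, from the context of a tt:o element inside a template:

<!-- locate the subject entry describing the same node as this object -->
<xsl:apply-templates select="/tt:rdf/tt:s[@id = current()/@id]"/>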

Any tips for pretty-printing the XML?

What about xml:lang?

xml:lang is supported at <o> level for plain literals. It is not supported anywhere else as this would require xml:lang inheritance to be implemented by consumers, which would be difficult for the target audience of XPath processors, and not all that useful.
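
For example, a plain literal with a language tag presumably looks something like this (element names as in the example above):

<p id="http://xmlns.com/foaf/0.1/name">
  <o xml:lang="en">Example Web Page</o>
</p>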

What about xml:base?

xml:base isn't supported anywhere, and the base-uri of the document is not relevant to parsing, because relative URI-refs are not allowed. Base URIs are difficult for XPath processors to work with: firstly, a stylesheet cannot tell what the base URI is when it is set by the document URI or a Content-Location header; secondly, even once the base URI is known, resolving URIs and relative xml:base attributes against it is very difficult in XSLT.

Setting a base-uri isn't very useful anyway: entities can already be used to abbreviate URIs, and RDF uses so many different URIs that relative references could only abbreviate a small fraction of them.

XML Literals aren't affected by in-scope base-uris or language contexts either. This is the same as in RDF/XML. The exclusive canonicalization process used by parsers does not preserve these contexts.

What about XML literals?

I decided to add support for XML Literals because the target audience, people using XSLT, hardly want to write an XML parser just to access the content of the XML.

XML Literals also make it easy to convert between treetriples and RDF/XML, because both formats expect the consumer, rather than the publisher, to apply exclusive canonicalization to the input; so as long as you don't lose any white-space or comments, the resulting triple will have the same literal value.

XML Literals do make it hard to transform content to other formats such as RXR though. Because these formats don't have special support for XML Literals, they require the literal to be serialized to exclusive canonical XML by the publisher. This requires steps such as sorting attributes by their Unicode values - something that is not possible in XSLT.