Summary of Serialization Process

This process describes how to convert an RDF graph to a treetriples document. treetriples has strict serialisation requirements, which ensure that documents are uniform enough to be processed easily with XML tools.

The process serializes all of the triples in a graph by first processing structures such as collections and containers, which have special abbreviated syntactic forms, before serializing the remaining triples using the generic syntax.

1. Initialisation

Construct a <tt:rdf> element as the root of the output document. In this document, the 'tt' prefix corresponds to the namespace URI: <http://djpowell.net/schemas/treetriples/1/>. The namespace prefix is not significant; implementations MAY use a different prefix for this namespace, or make it the default namespace.

Construct an empty set called 'processed_triples'. This records the triples handled by the early stages of the process, so that they are not processed again as generic triples.
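The initialisation step can be sketched with Python's standard xml.etree.ElementTree module. This is an illustrative sketch, not a reference implementation; the function name initialise is hypothetical. Registering the 'tt' prefix only affects how the output looks, since the prefix is not significant.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://djpowell.net/schemas/treetriples/1/"

def initialise():
    """Create the <tt:rdf> root element and the empty processed_triples set."""
    # Register the 'tt' prefix so ElementTree writes tt:rdf rather than ns0:rdf.
    ET.register_namespace("tt", TT_NS)
    root = ET.Element(f"{{{TT_NS}}}rdf")
    processed_triples = set()
    return root, processed_triples

root, processed_triples = initialise()
print(ET.tostring(root, encoding="unicode"))
```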

2. Serializing well-formed collections

For the purposes of this process, a set of triples forms a well-formed collection if:

The collection is terminated with the triple ( ? , rdf:rest , rdf:nil ); and
Each node in the collection contains exactly one rdf:first property, and exactly one rdf:rest property; and
The collection does not include the same list node more than once, which would cause loops in the collection.

Find the start of each collection by working backwards from each ( ? , rdf:rest , rdf:nil ) triple, until the start of the collection is reached, or until a malformed list node is reached.
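A minimal sketch of the backwards walk, assuming triples are Python tuples of strings with prefixed names such as "rdf:first" and bnodes written as "_:…". The name find_collections is hypothetical. The text above does not say what to do when two list nodes share the same rdf:rest tail; this sketch assumes the walk stops there, leaving the earlier nodes to the generic syntax.

```python
from collections import defaultdict

def find_collections(triples):
    """Return the well-formed tail of each collection as a list of list nodes,
    ordered from the first well-formed node to the node whose rdf:rest is rdf:nil."""
    firsts = defaultdict(list)        # node -> rdf:first objects
    rests = defaultdict(list)         # node -> rdf:rest objects
    rest_parents = defaultdict(list)  # node -> nodes whose rdf:rest is this node
    for s, p, o in triples:
        if p == "rdf:first":
            firsts[s].append(o)
        elif p == "rdf:rest":
            rests[s].append(o)
            rest_parents[o].append(s)

    def well_formed(node):
        # Exactly one rdf:first and exactly one rdf:rest property.
        return len(firsts[node]) == 1 and len(rests[node]) == 1

    collections = []
    for s, p, o in triples:
        if p != "rdf:rest" or o != "rdf:nil" or not well_formed(s):
            continue
        chain, seen, node = [s], {s}, s
        # Walk backwards until the start is reached or a malformed node is met.
        while len(rest_parents[node]) == 1:
            parent = rest_parents[node][0]
            if parent in seen or not well_formed(parent):
                break  # revisiting a node would mean a loop
            chain.append(parent)
            seen.add(parent)
            node = parent
        chain.reverse()
        collections.append(chain)
    return collections
```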

Malformed collections are represented using the well-formed collection syntax to represent the well-formed tail-end of the collection (if any), and the generic syntax to represent any remaining triples that could not be represented using the well-formed syntax.

A well-formed collection is represented by adding a <tt:d parse="list"> element as a child of the root <tt:rdf> element.

For each member of the collection, append a <tt:o> element to the list of child elements of the <tt:d> element (see 'Serializing objects').

A <tt:o> element MAY have a listId attribute containing the id of the list node resource. Usually, only the first element would have a listId, which is effectively the id of the list. List nodes with URIs, or list nodes that are referenced by triples in the graph outside of the well-formed collection, will need to be assigned listId attributes (see 'Serializing ids').

To the processed_triples set, add the ( ?, rdf:first, ? ) and ( ?, rdf:rest, ? ) triples of every list node represented by the tt:d element.
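The following sketch builds the <tt:d parse="list"> element for one well-formed chain of list nodes and records the processed triples. It assumes the same tuple-of-strings triple representation. The names serialize_collection, needs_list_id and fill_object are hypothetical; fill_object stands in for 'Serializing objects', whose rule for URI objects is not fully given here, so the test below uses a placeholder that only sets text.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://djpowell.net/schemas/treetriples/1/"

def needs_list_id(node, chain, triples):
    """A list node needs a listId if it has a URI, or if it is referenced by a
    triple other than the collection's own rdf:first/rdf:rest triples."""
    if not node.startswith("_:"):
        return True
    internal = {(s, p, o) for s, p, o in triples
                if s in chain and p in ("rdf:first", "rdf:rest")}
    return any(node in (s, o) and (s, p, o) not in internal for s, p, o in triples)

def serialize_collection(root, chain, triples, processed_triples, fill_object):
    """Append <tt:d parse="list"> for one well-formed collection."""
    d = ET.SubElement(root, f"{{{TT_NS}}}d", {"parse": "list"})
    for node in chain:
        first = next(o for s, p, o in triples if s == node and p == "rdf:first")
        rest = next(o for s, p, o in triples if s == node and p == "rdf:rest")
        o_el = ET.SubElement(d, f"{{{TT_NS}}}o")
        if needs_list_id(node, chain, triples):
            o_el.set("listId", node)
        fill_object(o_el, first)  # hypothetical hook for 'Serializing objects'
        processed_triples.update({(node, "rdf:first", first), (node, "rdf:rest", rest)})
    return d
```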

Note that like RDF/XML, the treetriples collection syntax does not explicitly create a ( ?, rdf:type, rdf:List ) triple for each node.

3. Serializing well-formed containers

For the purposes of this process, a set of triples with a given subject forms a well-formed container if:

The container is a member of, at most, one of these classes: rdf:Alt, rdf:Bag, rdf:Seq; and
The container does not contain members with duplicate indexes; and
The member indexes of the container are contiguous, starting with rdf:_1; and
The container contains at least one member.
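The four conditions above, together with the 'parse' attribute table below, can be checked in one pass. This is a sketch under the same tuple-of-strings assumption; the name container_info is hypothetical, and treating only rdf:_1, rdf:_2, ... (no leading zeros) as membership properties is an assumption of the sketch.

```python
import re

CONTAINER_TYPES = {"rdf:Alt": "alt", "rdf:Bag": "bag", "rdf:Seq": "seq"}
MEMBER = re.compile(r"rdf:_([1-9][0-9]*)")  # assumed: no leading zeros

def container_info(subject, triples):
    """Return (parse_value, members_in_order) if subject is a well-formed
    container, otherwise None."""
    types = {o for s, p, o in triples
             if s == subject and p == "rdf:type" and o in CONTAINER_TYPES}
    if len(types) > 1:
        return None  # member of more than one of rdf:Alt, rdf:Bag, rdf:Seq
    members = {}
    for s, p, o in triples:
        if s != subject:
            continue
        m = MEMBER.fullmatch(p)
        if m:
            index = int(m.group(1))
            if index in members:
                return None  # duplicate index
            members[index] = o
    if not members or sorted(members) != list(range(1, len(members) + 1)):
        return None  # no members, or indexes not contiguous from rdf:_1
    parse = CONTAINER_TYPES[types.pop()] if types else "container"
    return parse, [members[i] for i in range(1, len(members) + 1)]
```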

If the container is not well-formed, the well-formed container syntax MUST NOT be used for any triples with that subject.

Containers MAY be well-formed, yet still use the generic syntax for any triples that cannot be represented using the well-formed container syntax; for example, rdf:type properties on the container whose values are not rdf:Alt, rdf:Bag, or rdf:Seq.

A well-formed container is represented by a <tt:d> element as a child of the root <tt:rdf> element, with a 'parse' attribute which indicates the type of the container:

Type Triple	'parse' attribute
( ? , rdf:type , rdf:Alt )	alt
( ? , rdf:type , rdf:Bag )	bag
( ? , rdf:type , rdf:Seq )	seq
none of the above	container

Note that the parse="container" syntax does not explicitly create a (?, rdf:type, rdfs:Container) triple.

Append a <tt:o> element to the children of the <tt:d> element for each member of the container (see 'Serializing objects').

To the processed_triples set, add the ( ?, rdf:type, ? ) triple indicated by the parse attribute, as described above, if there is one. Also add each ( ?, rdf:_n, ? ) membership triple represented by a tt:o element.
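A sketch of emitting a well-formed container and recording its processed triples, assuming a parse value and ordered member list such as a well-formedness check would produce. The text above does not say how the container's own id is recorded on <tt:d>, so the sketch leaves that out. serialize_container and fill_object are hypothetical names; fill_object stands in for 'Serializing objects'.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://djpowell.net/schemas/treetriples/1/"
PARSE_TO_TYPE = {"alt": "rdf:Alt", "bag": "rdf:Bag", "seq": "rdf:Seq"}

def serialize_container(root, subject, parse, members, processed_triples, fill_object):
    """Append <tt:d parse="..."> with one <tt:o> per member, in index order."""
    # The container's own id is not covered by this sketch (see lead-in).
    d = ET.SubElement(root, f"{{{TT_NS}}}d", {"parse": parse})
    if parse in PARSE_TO_TYPE:
        # parse="container" implies no rdf:type triple at all.
        processed_triples.add((subject, "rdf:type", PARSE_TO_TYPE[parse]))
    for index, member in enumerate(members, start=1):
        o_el = ET.SubElement(d, f"{{{TT_NS}}}o")
        fill_object(o_el, member)  # hypothetical hook for 'Serializing objects'
        processed_triples.add((subject, f"rdf:_{index}", member))
    return d
```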

Note that the well-formed containers process operates over the full set of triples, including those that were serialized as well-formed collections. It is possible, albeit unlikely, that both processes could generate the same triple; this is not a problem.

4. Serializing well-formed unasserted reified statements

The syntax has two different forms for writing reified statements depending on whether the reified statement is also asserted as a triple or not.

For the purposes of this process, a set of triples with a given subject forms a well-formed reified statement if:

The subject has exactly one rdf:subject property, exactly one rdf:predicate property, and exactly one rdf:object property; and
The subject has an rdf:type property with the value rdf:Statement.
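The two conditions above can be tested as follows. This is a sketch assuming triples are tuples of strings with prefixed names; reified_statement is a hypothetical name. It returns the reified triple so that a later step can tell whether that triple is also asserted in the graph.

```python
def reified_statement(subject, triples):
    """Return the reified (s, p, o) if subject is a well-formed reified
    statement, otherwise None."""
    def values(prop):
        return [o for s, p, o in triples if s == subject and p == prop]

    subj, pred, obj = values("rdf:subject"), values("rdf:predicate"), values("rdf:object")
    # Exactly one rdf:subject, one rdf:predicate and one rdf:object.
    if len(subj) != 1 or len(pred) != 1 or len(obj) != 1:
        return None
    # Must also be typed as rdf:Statement (other rdf:type values are allowed).
    if "rdf:Statement" not in values("rdf:type"):
        return None
    return subj[0], pred[0], obj[0]
```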

Unasserted reified statements MAY be well-formed, yet still use the generic syntax for any triples that cannot be represented using the well-formed unasserted reified statement syntax; for example, provenance properties on the statement.

All unasserted reified statements are contained in a single <tt:d parse="statement"> element as a child of the root <tt:rdf> element. There MUST NOT be more than one <tt:d parse="statement"> container per document.

Inside the <tt:d> element, add one or more <tt:s> elements (containing <tt:p> and <tt:o> descendants), representing one or more reified statements. The syntax rules for these elements are the same as the generic syntax for triples (see 'Serializing generic triples'), except that each <tt:o> element must have a 'stmtId' attribute so that it can be reified.

The <tt:d parse="statement"> element MUST NOT contain other <tt:d> elements, so unasserted reified statements cannot directly reify higher-level constructs such as containers; higher-level constructs can be reified by using the generic syntax for them within the <tt:d> element.

The semantics of the <tt:s>, <tt:p>, and <tt:o> descendants are the same as the generic syntax for triples, except that rather than being asserted in the graph, these triples will be represented as reified rdf:Statement structures.

To the processed_triples set, add the triples: (?, rdf:type, rdf:Statement), (?, rdf:subject, ?), (?, rdf:predicate, ?), and (?, rdf:object, ?) corresponding to the resource represented by the stmtId and the elements of its reified statement.
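The split between unasserted and asserted reified statements, and the bookkeeping for the unasserted ones, can be sketched as below. It assumes a dict mapping each well-formed statement's id to its reified (s, p, o) tuple; mark_unasserted and reification_triples are hypothetical names. A reified statement counts as unasserted when its (s, p, o) triple is not itself in the graph.

```python
def reification_triples(stmt_id, spo):
    """The four triples represented by a stmtId-annotated <tt:o> element."""
    s, p, o = spo
    return {
        (stmt_id, "rdf:type", "rdf:Statement"),
        (stmt_id, "rdf:subject", s),
        (stmt_id, "rdf:predicate", p),
        (stmt_id, "rdf:object", o),
    }

def mark_unasserted(triples, reifications, processed_triples):
    """Return the unasserted reified statements (for <tt:d parse="statement">)
    and record their reification triples in processed_triples."""
    unasserted = {}
    for stmt_id, spo in reifications.items():
        if spo not in triples:  # the reified triple is not asserted in the graph
            unasserted[stmt_id] = spo
            processed_triples.update(reification_triples(stmt_id, spo))
    return unasserted
```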

5. Selecting well-formed asserted reified statements

Well-formed asserted reified statements also have an abbreviated syntax, which is represented by annotating generic triples with a 'stmtId' attribute. Before the generic triples can be serialized, it is therefore necessary to identify all of the triples that have been represented so far, together with the reification triples of any remaining well-formed asserted reified statements, since those reifications will be expressed by 'stmtId' attributes. Only the remaining triples should be processed in the next step.
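A sketch of this selection step, assuming a dict from statement id to reified (s, p, o) for every well-formed reified statement. select_asserted is a hypothetical name. The text above does not cover a triple with several reifications; this sketch assumes one stmtId per triple and leaves any extra reifications to the generic syntax.

```python
def select_asserted(triples, processed_triples, reifications):
    """Return (remaining, stmt_ids): the triples left for the generic syntax,
    and a map from each asserted triple to the stmtId it should carry."""
    stmt_ids = {}
    covered = set()
    for stmt_id, (s, p, o) in sorted(reifications.items()):
        spo = (s, p, o)
        # Only an asserted triple that has not been serialized yet can carry a
        # stmtId; assumed at most one stmtId per triple.
        if spo in triples and spo not in processed_triples and spo not in stmt_ids:
            stmt_ids[spo] = stmt_id
            covered.update({
                (stmt_id, "rdf:type", "rdf:Statement"),
                (stmt_id, "rdf:subject", s),
                (stmt_id, "rdf:predicate", p),
                (stmt_id, "rdf:object", o),
            })
    remaining = set(triples) - set(processed_triples) - covered
    return remaining, stmt_ids
```

Any other properties of the statement, such as provenance, stay in the remaining set and are written with the generic syntax.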

6. Serializing generic triples

Triples that don't form part of collections, containers, or reified statements are serialized using the generic syntax. Triples are grouped by their subject, so that the subject doesn't need to be repeated when it occurs as the subject of multiple triples. Triples with the same subject are also grouped by predicate, so that the subject and predicate don't need to be repeated when several triples share the same subject and predicate.
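The two-level grouping described above maps naturally onto nested dictionaries, one <tt:s> per subject key and one <tt:p> per predicate key. A minimal sketch, assuming tuple-of-strings triples; group_triples is a hypothetical name, and sorting is only there to make the output order deterministic.

```python
from collections import defaultdict

def group_triples(triples):
    """Group triples by subject, then by predicate:
    {subject: {predicate: [objects]}}."""
    groups = defaultdict(lambda: defaultdict(list))
    for s, p, o in sorted(triples):
        groups[s][p].append(o)
    return groups
```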

Common Steps

A. Serializing ids

If a resource has a URI, then the id is simply the URI.

Otherwise, the resource is a bnode. A bnode is serialized by creating an id that consists of the characters, "_:"; followed by a character from the class, [a-zA-Z_]; followed by zero or more characters from the class, [a-zA-Z0-9.\-_].

In other words, a bnode id must match the regular expression: _:[a-zA-Z_][a-zA-Z0-9.\-_]*
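The regular expression above can be used directly to validate ids, and a simple counter is one way to generate conforming ids. In this sketch the names is_valid_bnode_id and BnodeIdAllocator, and the "_:b1", "_:b2", ... naming scheme, are illustrative assumptions; any scheme matching the expression would do.

```python
import re
from itertools import count

BNODE_ID = re.compile(r"_:[a-zA-Z_][a-zA-Z0-9.\-_]*")

def is_valid_bnode_id(s):
    """True if s matches the bnode id syntax exactly."""
    return BNODE_ID.fullmatch(s) is not None

class BnodeIdAllocator:
    """Assign stable ids _:b1, _:b2, ... to bnodes (any hashable key)."""
    def __init__(self):
        self._ids = {}
        self._counter = count(1)

    def id_for(self, bnode):
        if bnode not in self._ids:
            self._ids[bnode] = f"_:b{next(self._counter)}"
        return self._ids[bnode]
```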

Note that backslash (\) is a meta-character used to escape the following hyphen (-) in the regular expressions above, and is not allowed to appear in the id itself.

The part of a bnode id that follows the "_:" characters is a subset of the strings allowed as XML NCNames.

B. Serializing objects

If an object is an XML literal then the object is represented as a <tt:o> element, with the XML fragment as its children.

The consumer can obtain the string value of the XML literal by interpreting it as Exclusive Canonical XML with comments and an empty inclusive namespaces prefix list. These are the same rules as those used for XML literals in RDF/XML (see "Significance of white-space").

If an object is a literal then the object is represented as a <tt:o> element, with the value of the literal as the text content (see "Significance of white-space"). Additionally:

if the literal has a language tag, then the language is represented as an xml:lang attribute on the tt:o element;
if the literal is a typed literal, then the datatype is represented as a datatype attribute on the tt:o element.
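A sketch of writing a plain or typed literal object with ElementTree. It assumes the datatype attribute is unqualified (in no namespace); literal_object is a hypothetical name. xml:lang is written using the reserved XML namespace, which ElementTree serializes with the xml prefix and no declaration.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://djpowell.net/schemas/treetriples/1/"
XML_NS = "http://www.w3.org/XML/1998/namespace"

def literal_object(parent, value, lang=None, datatype=None):
    """Append a <tt:o> for a literal: text content, plus xml:lang or datatype."""
    o_el = ET.SubElement(parent, f"{{{TT_NS}}}o")
    o_el.text = value  # written verbatim; see "Significance of white-space"
    if lang is not None:
        o_el.set(f"{{{XML_NS}}}lang", lang)
    if datatype is not None:
        o_el.set("datatype", datatype)  # assumed: unqualified attribute
    return o_el
```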

If the object is a URI resource then the object is represented as a <tt:o> element, with the URI as the value of

C. Significance of white-space

It is important that a literal retains the same string value after serializing and parsing RDF to and from treetriples. For example, the string "[sp][sp]a[sp]b[sp][sp]c[sp][sp]" (where the space character is denoted by [sp]) must retain its leading, trailing, and consecutive spaces.

Therefore, white-space inside <tt:o> elements should be preserved. Care should be taken not to accidentally introduce extra white-space, or collapse leading, trailing, or consecutive white-space; for example, by attempting to pretty-print the XML.
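A quick round-trip check with ElementTree, under the assumption that the serializer writes text content verbatim. The helper round_trip_literal is hypothetical. Pretty-printers are the usual hazard: ElementTree's own ET.indent (Python 3.9+) overwrites whitespace-only text and tails inside elements that have children, which would alter an XML literal, so it should not be applied to <tt:o> content.

```python
import xml.etree.ElementTree as ET

TT_NS = "http://djpowell.net/schemas/treetriples/1/"

def round_trip_literal(value):
    """Serialize value as <tt:o> text, parse it back, and return the text."""
    o_el = ET.Element(f"{{{TT_NS}}}o")
    o_el.text = value
    xml_text = ET.tostring(o_el, encoding="unicode")
    return ET.fromstring(xml_text).text

value = "  a b  c  "
assert round_trip_literal(value) == value  # spaces survive unchanged
```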

It is not necessary to preserve white-space in elements other than <tt:o>, as these elements do not contain text.