openarchives.org

ORE User Guide - Primer

Fri Oct 03 2008

1. Why use OAI-ORE?

In the physical world we create, use, and refer to aggregations of things all the time. We collect pictures in a photo album, read journals that are collections of articles, and burn CDs of our favorite songs. In this physical world these aggregations are frequently tangible - we can hold the photo album, journal, and CD. But, we also aggregate abstract entities - for example classification schemes aggregate abstract subjects into broader abstract groups.

This practice of aggregating extends to the Web. We accumulate URLs in bookmarks or favorites lists in our browser, collect photos into sets on popular sites like Flickr, browse multi-page documents that are linked together through "prev" and "next" tags, and talk about Web sites as if they had some real existence beyond the set of pages of which they consist. Despite our frequent use of these aggregations, their existence on the Web is quite ephemeral. One reason for this is that there is no standard way to identify an aggregation. We often use the URI of one page of an aggregation to identify the whole aggregation. For example, we use the URI of the first page of a multi-page Web document to identify the whole document, or we use the URI of the HTML page that provides access to a Flickr set to identify the entire set of images. But those URIs really just identify those specific pages, and not the union of pages that makes up the whole document, or the union of all images in a Flickr set, respectively. In essence, the problem is that there is no standard way to describe the constituents or boundary of an aggregation, and this is what OAI-ORE aims to provide.

Because aggregations are not well-defined on the Web, we are limited in what we can do with them, especially in terms of the services or automated processes that make the Web useful. People who wish to save or print a multi-page document must manually click through each page and invoke the appropriate browser command. Programs that transfer multi-page documents among information systems must rely on the APIs of the individual system architectures and their definition of document boundaries. Search engines must use heuristics to group individual Web pages into logical documents so that search results have the proper granularity.

This primer describes the essence of the solution that OAI-ORE provides to deal with aggregations of Web resources; it is intended for a general audience that wants to obtain a high-level understanding of the OAI-ORE solution. The primer also provides pointers to OAI-ORE specifications and implementation guidelines, which provide detailed information for implementers. This primer is structured as follows:

Section 1 (this section) describes the motivation for using OAI-ORE.
Section 2 presents an example of an aggregation that is used throughout the rest of this document.
Section 3 introduces the conceptual foundations upon which the OAI-ORE solution is built.
Section 4 introduces the essence of the OAI-ORE solution to the aggregation problem: Aggregations and Resource Maps that describe Aggregations.
Section 5 shows simple Resource Maps using the XML-based Atom and RDF/XML syntax.
Section 6 describes the use of HTTP, the fundamental protocol of the Web, for making Resource Maps and Aggregations visible to browsers and Web agents such as search engines.
Section 7 provides a map to the remainder of the OAI-ORE documents.

2. Motivating Example

The aggregation problem that ORE addresses can be explained by means of a document in the arXiv, a well-known repository of physics, mathematics, and computer science research results. The human start page for this document is shown in Figure 1. Some aspects of the page relevant to the ORE aggregation problem are highlighted in red rectangles, each with a number. The meanings of the highlighted areas are as follows:

(1) The URI http://arxiv.org/abs/astro-ph/0601007 of the human start page.
(2) The formats in which the document is available, i.e. PostScript, PDF, etc. These are effectively the constituents of the aggregation that is the arXiv document. For the remainder of this example we will also consider this human start page, the splash page, to be a constituent of the aggregation.
(3) The title of the arXiv document.
(4) The authors of the arXiv document.
(5) The creation and last modification date of the arXiv document.
(6) Identifiers of entities that are in some manner comparable to this arXiv document. For example, a version of this document was later published as an article in a peer-reviewed journal, and the Digital Object Identifier of that article is shown.
(7) The versions of this document.
(8) Links to other arXiv documents in the same collection (i.e., astro-ph).
(9) Citations made by this arXiv document, and citations it received from other documents.

Figure 1: Human start page for an arXiv document

Figure of human start page

This rather simple example highlights the core issues that ORE addresses:

The URI of the human start page is commonly used as the URI for the entire arXiv document. As indicated in the section Why use OAI-ORE?, this is not appropriate as this URI identifies the page itself, and not the aggregation that is the arXiv document.
The human start page can readily be understood by a human viewer, who has the ability to distinguish between constituents of the document, relationships of the document to other documents, identifiers related to this document, versions of this document, etc. However, the human start page cannot be interpreted unambiguously by a machine agent without an implementation of arXiv-specific heuristics. For example, a machine agent cannot make the distinction between links in the human start page that point at constituents of the arXiv document (e.g. the PostScript, PDF, etc.), and links that point at information that is clearly outside of the document such as the navigational aids shown as (8) in Figure 1.

3. Foundations of ORE

ORE solves the aforementioned problems by introducing a URI for the aggregation that denotes the entire arXiv document, and by publishing a machine-readable document that describes that aggregation. For example, the document describes which resources are part of the aggregation, and which are merely related to it. This section briefly introduces the foundations upon which the ORE solution to the aggregation problem is built.

3.1. Architecture of the World Wide Web

The foundations of the Web as we know it are detailed in the Architecture of the World Wide Web [Web Architecture]. This architecture defines the following core notions:

Resource, an item of interest.
URI, a global identifier for a Resource.
Representation, a datastream corresponding to the state of a Resource at the time its URI is dereferenced via some protocol (e.g. HTTP).
Link, a directed connection between two Resources.

3.2. Semantic Web, Linked Data, Cool URIs for the Semantic Web

On the Web that we use on a daily basis, URIs are used primarily to identify Web documents. They are identifiers that, when dereferenced, return a human-readable Representation. However, on the Semantic Web, URIs are introduced to identify so-called real world entities, such as people or cars, or even abstract entities, such as ideas or classes. Since these things are not documents, they have no Representation to indicate what these Resources mean. The Linked Data effort [Linked Data Tutorial] describes an approach for obtaining information about those Resources despite the fact that they have no Representation. To summarize, the approach consists of:

Using HTTP URIs to identify those special Resources;
Publishing a document that provides information about the special Semantic Web Resource at an HTTP URI other than the HTTP URI of the Semantic Web Resource;
Using Cool URIs for the Semantic Web [Cool URIs] to allow discovering the HTTP URI of that document from the HTTP URI of the special Semantic Web Resource.

3.3. Resource Description Framework (RDF)

The documents that are proposed by the Linked Data effort to describe these abstract Resources are typically expressed in RDF/XML, which is an XML-based serialization for the Resource Description Framework (RDF) [RDF Concepts] that forms the foundational data model of the Semantic Web. This model consists of subject-predicate-object statements called triples. Triples express relationships pertaining to a subject Resource denoted by a URI. The predicate Resource, also denoted by a URI, indicates the nature of the relationship. The object expresses the actual value for the relationship expressed by the predicate; the object can be denoted by a URI or can be a literal value, such as a string or a number. When multiple triples are expressed, or asserted, they may share subjects and objects and, as a result, they conceptually join together in what is called a graph in mathematical terms. This graph consists of nodes that are the Resources denoted by the subject and object URIs, and edges that are the relationship predicates.

An example is shown in Figure 2. The figure shows four RDF triples, which are then depicted in the graph in which Resources are yellow circles that list their URIs. Note that because R1 is the subject of two triples, it has two outgoing edges in the graph. Similarly since R2 is the object of two triples, it has two incoming edges. The illustration also shows a triple that has a literal string (e.g. "John Doe") as its object.

Figure 2: Four RDF triples and their graph representation

Figure of RDF triples and graph representation
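
The way triples join into a graph can be sketched in a few lines of Python. The four triples below are hypothetical stand-ins: the http://example.org/ URIs and the choice of predicates are assumptions made for illustration, not the contents of Figure 2. They are arranged so that R1 is the subject of two triples, R2 is the object of two triples, and one triple has a literal object.

```python
from collections import defaultdict

# Hypothetical Resources and predicates; none of these URIs come from Figure 2.
R1 = "http://example.org/R1"
R2 = "http://example.org/R2"
R3 = "http://example.org/R3"

TRIPLES = [
    (R1, "http://example.org/terms/cites", R2),
    (R1, "http://example.org/terms/hasPart", R3),
    (R3, "http://example.org/terms/cites", R2),
    (R3, "http://purl.org/dc/elements/1.1/creator", "John Doe"),  # literal object
]


def build_graph(triples):
    """Index triples as labelled edges: outgoing per subject, incoming per object."""
    outgoing = defaultdict(list)
    incoming = defaultdict(list)
    for subject, predicate, obj in triples:
        outgoing[subject].append((predicate, obj))
        incoming[obj].append((predicate, subject))
    return outgoing, incoming


outgoing, incoming = build_graph(TRIPLES)
print(len(outgoing[R1]), "outgoing edges from R1")
print(len(incoming[R2]), "incoming edges to R2")
```

Shared subjects and objects are what make the separate statements behave as one graph: every triple is an edge, and a URI that occurs in several triples is a single node.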

4. ORE in a Nutshell

ORE leverages the foundations described above to arrive at a solution to handle aggregations of Web resources. The essence of the ORE solution can be summarized as follows (Figure 3):

In order to be able to unambiguously refer to an aggregation of Web resources, a new Resource is introduced that stands for a set or collection of other Resources. This new Resource, named an Aggregation, has a URI just like any other Resource on the Web does. And, since an Aggregation is a conceptual construct, it qualifies as one of those Semantic Web Resources that do not have a Representation.
Following the Linked Data guidelines, another Resource is introduced to make information about the Aggregation available. This new Resource, named a Resource Map, has a URI and it has a machine-readable Representation that provides details about the Aggregation. In essence, a Resource Map expresses which Aggregation it describes (the ore:describes relationship in Figure 3), and it lists the resources that are part of the Aggregation (the ore:aggregates relationship in Figure 3). But, a Resource Map can also express relationships and properties pertaining to all these Resources, as well as metadata pertaining to the Resource Map itself, e.g. who published it and when it was most recently modified (the dcterms:creator and dcterms:modified relationships in Figure 3). Resource Maps can be expressed in different formats including Atom XML, RDF/XML, RDFa, n3, turtle, and other RDF serialization formats. Serialization examples are shown in the Resource Map Serialization section, and described in detail in the implementation guidelines that show how to express Resource Maps in Atom XML and in RDF/XML and RDFa.
In order to make ORE work in the HTTP-based Web, both the Aggregation and the Resource Map are assigned HTTP URIs, and the Cool URIs for the Semantic Web guidelines are adopted to support discovery of the HTTP URI of a Resource Map given the HTTP URI of an Aggregation. The section Resource Maps and Aggregations on the Web provides some basic information in this regard, and an HTTP implementation guideline is available.
ORE also introduces the notion of a Proxy resource, which stands for an Aggregated Resource in the context of a specific Aggregation. The URI of a Proxy resource provides a mechanism for denoting a resource in context and is described in the ORE Data Model and in the Resource Map Implementation documents.

Figure 3: The Aggregation A-1 aggregates three Resources and is described by Resource Map ReM-1

ORE basic model
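
As a rough illustration of the model in Figure 3, the Python sketch below lists the triples that a minimal Resource Map asserts. The function name `resource_map_triples`, the http://example.org/ URIs, and the creator string are assumptions for illustration; only the ore: and dcterms: namespace URIs are taken from the serialization examples in this primer.

```python
ORE = "http://www.openarchives.org/ore/terms/"
DCTERMS = "http://purl.org/dc/terms/"


def resource_map_triples(rem_uri, aggregation_uri, aggregated_uris, creator, modified):
    """Return the core triples of a Resource Map as (subject, predicate, object)."""
    triples = [
        # The Resource Map states which Aggregation it describes ...
        (rem_uri, ORE + "describes", aggregation_uri),
        # ... and carries metadata about itself.
        (rem_uri, DCTERMS + "creator", creator),
        (rem_uri, DCTERMS + "modified", modified),
    ]
    # The Aggregation lists the Resources that are part of it.
    triples += [(aggregation_uri, ORE + "aggregates", uri) for uri in aggregated_uris]
    return triples


# Hypothetical URIs standing in for ReM-1, A-1 and three Aggregated Resources.
triples = resource_map_triples(
    rem_uri="http://example.org/foo.rdf",
    aggregation_uri="http://example.org/foo",
    aggregated_uris=[
        "http://example.org/foo/page.html",
        "http://example.org/foo/paper.ps",
        "http://example.org/foo/paper.pdf",
    ],
    creator="Example Repository",
    modified="2008-10-03T07:30:34Z",
)
for triple in triples:
    print(triple)
```

Note that the subject differs between the two groups of statements: metadata such as dcterms:modified is about the Resource Map, while ore:aggregates is about the Aggregation.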

5. Resource Map Serialization

ORE supports Resource Map serializations in RDF/XML, RDFa, and Atom XML. Below, examples are shown of RDF/XML and Atom XML Resource Maps that convey some essential information pertaining to the example arXiv document. Note that the URI http://arxiv.org/aggregation/astro-ph/0601007 was introduced as the HTTP URI to identify the Aggregation that denotes the arXiv document.

5.1 Resource Maps in RDF/XML

Figure 3 shows some of the core relationships introduced by the ORE Data Model, which is entirely based on RDF. Because of that, a Resource Map that describes an Aggregation can readily be expressed in RDF/XML and other RDF serialization formats such as n3 and turtle. Table 1 shows a simple RDF/XML Resource Map that describes the arXiv Aggregation http://arxiv.org/aggregation/astro-ph/0601007. The comments in the RDF/XML document explain how the various RDF statements relate to the ORE concepts introduced in the section ORE in a Nutshell.

Table 1: A simple Resource Map for the arXiv Aggregation serialized in RDF/XML

<?xml version="1.0" encoding="utf-8"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" 
    xmlns:ore="http://www.openarchives.org/ore/terms/"
    xmlns:dcterms="http://purl.org/dc/terms/"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:foaf="http://xmlns.com/foaf/0.1/" >
    <!-- About the Aggregation for the ArXiv document -->
    <rdf:Description rdf:about="http://arxiv.org/aggregation/astro-ph/0601007">
        <!-- The Resource is an ORE Aggregation  -->
        <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
        <!-- The Aggregation aggregates ... -->
        <ore:aggregates rdf:resource="http://arxiv.org/abs/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/ps/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/pdf/astro-ph/0601007"/>
        <!-- Metadata about the Aggregation: title and authors -->
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Hui Li</foaf:name>
            <foaf:mbox rdf:resource="mailto:lihui@somewhere.cn"/>
        </dcterms:creator>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Zong-Kuan Guo</foaf:name>
        </dcterms:creator>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:name>Yuan-Zhong Zhang</foaf:name>
        </dcterms:creator>
    </rdf:Description>
    <!-- About the Resource Map (this RDF/XML document) that describes the Aggregation -->
    <rdf:Description rdf:about="http://arxiv.org/rem/atom/astro-ph/0601007">
        <!-- The Resource is an ORE Resource Map  -->
        <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/ResourceMap"/>
        <!-- The Resource Map describes a specific Aggregation   -->
        <ore:describes rdf:resource="http://arxiv.org/aggregation/astro-ph/0601007"/>
        <!-- Metadata about the Resource Map: datetimes, rights, and author -->
        <dcterms:modified>2008-10-03T07:30:34Z</dcterms:modified>
        <dcterms:created>2008-10-01T18:30:02Z</dcterms:created>
        <dc:rights>This Resource Map is available under the Creative Commons Attribution-Noncommercial Generic license</dc:rights>
        <dcterms:rights rdf:resource="http://creativecommons.org/licenses/by-nc/2.5/rdf"/>
        <dcterms:creator rdf:parseType="Resource">
            <foaf:page rdf:resource="http://arxiv.org"/>
            <foaf:name>arXiv.org e-Print Repository</foaf:name>
        </dcterms:creator>
    </rdf:Description>
    <!-- About the human start page that is part of the Aggregation -->
    <rdf:Description rdf:about="http://arxiv.org/abs/astro-ph/0601007">
        <dc:format>text/html</dc:format>
        <dc:title>[astro-ph/0601007] Parametrization of K-essence and Its Kinetic Term</dc:title>
        <rdf:type rdf:resource="info:eu-repo/semantics/humanStartPage"/>
    </rdf:Description>
    <!-- About the PostScript resource that is part of the Aggregation -->
    <rdf:Description rdf:about="http://arxiv.org/ps/astro-ph/0601007">
        <dc:format>application/postscript</dc:format>
        <dc:language>en</dc:language>
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
    </rdf:Description>
    <!-- About the PDF resource that is part of the Aggregation -->
    <rdf:Description rdf:about="http://arxiv.org/pdf/astro-ph/0601007">
        <dc:format>application/pdf</dc:format>
        <dc:language>en</dc:language>
        <dc:title>Parametrization of K-essence and Its Kinetic Term</dc:title>
    </rdf:Description>
</rdf:RDF>
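
A client that receives a Resource Map like Table 1 might pull out the Aggregated Resources as sketched below. This is a minimal sketch using Python's standard xml.etree.ElementTree on a trimmed copy of Table 1; the function name `aggregated_resources` is an assumption. It only understands the rdf:Description / rdf:resource pattern used in Table 1, whereas RDF/XML allows many equivalent spellings of the same triples, so a real consumer would more likely rely on a full RDF parser.

```python
import xml.etree.ElementTree as ET

RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
ORE = "http://www.openarchives.org/ore/terms/"

# Trimmed copy of Table 1: only the Aggregation and its ore:aggregates statements.
REM_XML = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:ore="http://www.openarchives.org/ore/terms/">
    <rdf:Description rdf:about="http://arxiv.org/aggregation/astro-ph/0601007">
        <rdf:type rdf:resource="http://www.openarchives.org/ore/terms/Aggregation"/>
        <ore:aggregates rdf:resource="http://arxiv.org/abs/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/ps/astro-ph/0601007"/>
        <ore:aggregates rdf:resource="http://arxiv.org/pdf/astro-ph/0601007"/>
    </rdf:Description>
</rdf:RDF>"""


def aggregated_resources(rem_xml):
    """Map each Aggregation URI to the list of URIs it ore:aggregates."""
    root = ET.fromstring(rem_xml)
    result = {}
    for description in root.findall(f"{{{RDF}}}Description"):
        about = description.get(f"{{{RDF}}}about")
        targets = [
            element.get(f"{{{RDF}}}resource")
            for element in description.findall(f"{{{ORE}}}aggregates")
        ]
        if targets:
            result[about] = targets
    return result


aggregations = aggregated_resources(REM_XML)
for aggregation_uri, members in aggregations.items():
    print(aggregation_uri)
    for member in members:
        print("  aggregates", member)
```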

5.2 Resource Maps in Atom XML

Atom is an XML-based format that was originally designed as a mechanism for syndicating feeds from news sources, blogs, and other dynamic Web sites. In that manner it is like the many versions of RSS. The design of Atom is more recent and includes modern XML features such as namespaces and, as a result, has a flexible extensibility mechanism allowing elements and relationships from other namespaces. Because of this, in recent years the use of Atom has been extended to many purposes such as packaging descriptions of a variety of Web Resources.

This is not a primer for Atom, and the interested reader is referred to RFC 4287, which fully describes Atom. The reader of this primer only needs to be aware of the basic entities in Atom, as shown in Figure 4, including:

A feed that has a globally unique id, a number of links, and other metadata, and which contains a number of constituent entries;
One or more entries that each have a globally unique id, a number of links, and other metadata.

Figure 4: An Atom Feed with two Atom Entries

Atom Feed/Entry structure

An Atom document may either be one feed with multiple entries as shown in Figure 4, or may just be a single entry that is not contained within a feed. ORE leverages the latter, expressing a Resource Map as an Atom entry. Table 2 shows an Atom entry that describes the arXiv Aggregation http://arxiv.org/aggregation/astro-ph/0601007. The comments in the Atom XML document explain how the various Atom elements relate to the ORE concepts introduced in the section ORE in a Nutshell.

Table 2: A simple Resource Map for the arXiv Aggregation serialized in Atom XML

<?xml version="1.0" encoding="UTF-8"?>
<atom:entry 
    xmlns:dcterms="http://purl.org/dc/terms/" 
    xmlns:atom="http://www.w3.org/2005/Atom"
    xmlns:ore="http://www.openarchives.org/ore/terms/" 
    xmlns:foaf="http://xmlns.com/foaf/0.1/">
    <atom:id>tag:arxiv.org,2008:astro-ph:0601007</atom:id>
    <!-- About the Aggregation for the ArXiv document -->
    <!-- This Atom entry describes a specific ORE Aggregation -->
    <atom:link href="http://arxiv.org/aggregation/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/describes"/>
    <atom:category term="http://www.openarchives.org/ore/terms/Aggregation"
        scheme="http://www.openarchives.org/ore/terms/" label="Aggregation"/> 
    <!-- The Aggregation aggregates ... -->
    <atom:link href="http://arxiv.org/abs/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="[astro-ph/0601007] Parametrization of K-essence and Its Kinetic Term"
        type="text/html" />
    <atom:link href="http://arxiv.org/ps/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="Parametrization of K-essence and Its Kinetic Term" type="application/postscript"
        hreflang="en"/>
    <atom:link href="http://arxiv.org/pdf/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates"
        title="Parametrization of K-essence and Its Kinetic Term" type="application/pdf"
        hreflang="en"/>
    <!-- Metadata about the Aggregation: title and authors -->
    <atom:title>Parametrization of K-essence and Its Kinetic Term</atom:title>
    <atom:author>
        <atom:name>Hui Li</atom:name>
        <atom:email>lihui@somewhere.cn</atom:email>
    </atom:author>
    <atom:author>
        <atom:name>Zong-Kuan Guo</atom:name>
    </atom:author>
    <atom:author>
        <atom:name>Yuan-Zhong Zhang</atom:name>
    </atom:author>
    <!-- About the Resource Map (this Atom XML entry document) that describes the Aggregation -->
    <!-- The HTTP URI of this Resource Map -->
    <atom:link href="http://arxiv.org/rem/atom/astro-ph/0601007"
        rel="self"
        type="application/atom+xml"/>
    <!-- Metadata about the Resource Map: datetimes, rights, and author -->
    <atom:updated>2008-10-03T07:30:34Z</atom:updated>
    <atom:published>2008-10-01T18:30:02Z</atom:published>
    <atom:rights>This Resource Map is available under the Creative Commons Attribution-Noncommercial Generic license</atom:rights>
    <atom:link href="http://creativecommons.org/licenses/by-nc/2.5/rdf"
        rel="license"
        type="application/rdf+xml"/>
    <atom:source>
        <atom:author>
            <atom:name>arXiv.org e-Print Repository</atom:name>
            <atom:uri>http://arxiv.org</atom:uri>
        </atom:author>
    </atom:source>   
    <!-- About the human start page that is part of the Aggregation -->
    <atom:link href="http://arxiv.org/abs/astro-ph/0601007" rel="alternate"/>
</atom:entry>
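
In the Atom serialization the ORE relationships travel in the rel attribute of atom:link elements, so a consumer can recover them by grouping links by relation. The following is a minimal sketch on a trimmed copy of Table 2, again with Python's standard library; the helper name `links_by_rel` is an assumption. A link without a rel attribute is treated as "alternate", which is the default defined by Atom.

```python
import xml.etree.ElementTree as ET

ATOM = "{http://www.w3.org/2005/Atom}"
ORE_TERMS = "http://www.openarchives.org/ore/terms/"

# Trimmed copy of Table 2: identifier and a subset of the links.
ENTRY_XML = """<atom:entry xmlns:atom="http://www.w3.org/2005/Atom">
    <atom:id>tag:arxiv.org,2008:astro-ph:0601007</atom:id>
    <atom:link href="http://arxiv.org/aggregation/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/describes"/>
    <atom:link href="http://arxiv.org/abs/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates" type="text/html"/>
    <atom:link href="http://arxiv.org/pdf/astro-ph/0601007"
        rel="http://www.openarchives.org/ore/terms/aggregates" type="application/pdf"/>
    <atom:link href="http://arxiv.org/rem/atom/astro-ph/0601007" rel="self"
        type="application/atom+xml"/>
</atom:entry>"""


def links_by_rel(entry_xml):
    """Group the href values of an entry's atom:link elements by their rel value."""
    entry = ET.fromstring(entry_xml)
    links = {}
    for link in entry.findall(ATOM + "link"):
        rel = link.get("rel", "alternate")
        links.setdefault(rel, []).append(link.get("href"))
    return links


links = links_by_rel(ENTRY_XML)
aggregation_uri = links[ORE_TERMS + "describes"][0]
aggregated = links[ORE_TERMS + "aggregates"]
resource_map_uri = links["self"][0]
print("Resource Map:", resource_map_uri)
print("describes:   ", aggregation_uri)
for uri in aggregated:
    print("aggregates:  ", uri)
```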

6. Resource Maps and Aggregations on the Web

When a Resource Map is published on the Web, its URI can be dereferenced by an HTTP protocol request that returns an RDF/XML or Atom XML document as shown in the section Resource Map Serialization. Clients and agents can then interpret that document and provide enhanced services based on the included information. These include navigation, printing, archiving, visualizing, and transforming the Aggregation.

The reverse functionality is also important. Clients that get access to the HTTP URI of an Aggregation, via a citation or another form of linking, should be able to discover that the Resource identified by that HTTP URI is indeed an Aggregation, and to subsequently access a Resource Map that describes the Aggregation.

As noted, however, an Aggregation is one of those special Semantic Web resources: dereferencing its URI via an HTTP protocol request does not yield a Representation. This section briefly describes two methods that ORE recommends for gaining access to a Resource Map that describes an Aggregation, given the HTTP URI of that Aggregation. These two methods are based on guidelines from the Semantic Web community that are fully articulated in the Cool URIs for the Semantic Web specification [Cool URIs].

6.1 HTTP 303 Forwarding from the Aggregation URI to the Resource Map URI

This method is appropriate in applications where the party that introduces an Aggregation and a Resource Map that describes it has control over a Web server. It is also the recommended approach when Resource Maps in multiple formats, such as both Atom XML and RDF/XML, are published to describe the same Aggregation.

The mechanics of this method are as follows. When the server receives an HTTP request for the Aggregation URI A-1, it returns an HTTP 303 status code (which means "See Other") with a redirection to the Resource Map URI ReM-1. The browser or agent may then make a new HTTP request for ReM-1. Requests for URI A-1 can also employ 303 redirection with content negotiation [RFC2616, Cool URIs] to include redirection to a Resource Map in one of several formats.

Example URIs are:

Aggregation:   A-1   = http://example.org/foo
Resource Map:  ReM-1 = http://example.org/foo.rdf

and additional serializations may be added following the URI pattern:

               ReM-2 = http://example.org/foo.atom
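
The server-side decision can be sketched as a small Python function that, given the request path and Accept header, picks the status code and Location header. This is a minimal sketch under stated assumptions: the `RESOURCE_MAPS` table, the `redirect_for` name, and the choice of RDF/XML as the default format are illustrative, and the negotiation is deliberately naive (it ignores q-values and wildcards).

```python
# Hypothetical table: Aggregation path -> media type -> Resource Map URI.
RESOURCE_MAPS = {
    "/foo": {
        "application/rdf+xml": "http://example.org/foo.rdf",    # ReM-1
        "application/atom+xml": "http://example.org/foo.atom",  # ReM-2
    }
}
DEFAULT_TYPE = "application/rdf+xml"  # assumed default when nothing matches


def redirect_for(path, accept_header=""):
    """Return (status, headers) for a GET request on an Aggregation URI path."""
    formats = RESOURCE_MAPS.get(path)
    if formats is None:
        return 404, {}
    # Naive negotiation: first listed media type that we can serve wins.
    requested = [part.split(";")[0].strip() for part in accept_header.split(",")]
    chosen = next((media for media in requested if media in formats), DEFAULT_TYPE)
    # 303 See Other: the Aggregation has no Representation of its own,
    # so the client is pointed at a Resource Map that describes it.
    return 303, {"Location": formats[chosen], "Vary": "Accept"}


print(redirect_for("/foo", "application/atom+xml"))
print(redirect_for("/foo", "text/html, application/rdf+xml;q=0.9"))
print(redirect_for("/bar"))
```

The Vary header is included because the redirect target depends on the Accept header, which matters to caches sitting between the client and the server.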

6.2 The Aggregation URI is a Hash URI

This method does not require that the party that introduces an Aggregation and a Resource Map controls a Web server. The URI of the Aggregation A-1 is constructed by appending a fragment identifier #aggregation to the Resource Map URI ReM-1. Example URIs are:

Aggregation:  A-1   = http://example.org/foo.rdf#aggregation
Resource Map: ReM-1 = http://example.org/foo.rdf

As defined by HTTP [RFC2616], an agent should strip off the fragment identifier before issuing an HTTP request to the server. The result is that the server request is actually to ReM-1. But, by introducing the fragment identifier, the URIs A-1 and ReM-1 still identify different Resources as defined by the Architecture of the World Wide Web [Web Architecture].
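
The client-side step can be shown with Python's standard urllib.parse.urldefrag, which splits a URI into the part that is sent to the server and the fragment that stays with the client. The URIs are the example ones used in this section.

```python
from urllib.parse import urldefrag

aggregation_uri = "http://example.org/foo.rdf#aggregation"  # A-1

# The fragment is removed before the request is issued, leaving the URI of ReM-1.
request_uri, fragment = urldefrag(aggregation_uri)

print("Requested from server:", request_uri)
print("Kept by the client:   ", fragment)
print("Same identifier?      ", aggregation_uri == request_uri)
```

Although both URIs lead to the same HTTP request, they remain distinct strings and therefore distinct identifiers, which is what lets A-1 denote the Aggregation while ReM-1 denotes the Resource Map.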

7. What should you read now?

This primer has briefly introduced concepts and approaches used in the OAI-ORE specifications. Interested readers, especially those intending to implement these specifications, should obtain further details in the following documents.

Resource Map Implementation in Atom - Describes all features of the Atom serialization of Resource Maps.
Resource Map Implementation in RDF/XML - Describes the recommended RDF/XML syntax for encoding Resource Maps.
Resource Map Implementation in RDFa - Describes how to embed Resource Map triples in xHTML documents, such as human start pages.
HTTP Implementation and Multiple Serializations - Provides the details for implementing the HTTP URI solutions summarized in the section Resource Maps and Aggregations on the Web.
Resource Map Discovery - Describes how to provide crawlers and harvesters information on the existence of Resource Maps and the Aggregations they describe.
Abstract Data Model - Describes all features of the RDF-based ORE Data Model that was summarized in the section ORE in a Nutshell of this document.
Vocabulary - Enumerates all properties and classes used in the ORE Abstract Data Model.

8. References

[Cool URIs]

Cool URIs for the Semantic Web, Leo Sauermann, Richard Cyganiak, Max Völkel, 2007-08-09. Available at http://www.dfki.uni-kl.de/~sauermann/2006/11/cooluris/ . Also being developed into a W3C Working Draft available at http://www.w3.org/TR/cooluris/ .

[Linked Data Tutorial]

How to Publish Linked Data on the Web, Chris Bizer, Richard Cyganiak, Tom Heath, 2007-07-27. Available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/20070727/ . Latest version available at http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/ .

[RDFXML]

RDF/XML Syntax Specification (Revised), Dave Beckett and Brian McBride, Editors, W3C Recommendation, 10 February 2004. Available at http://www.w3.org/TR/2004/REC-rdf-syntax-grammar-20040210/ . Latest version available at http://www.w3.org/TR/rdf-syntax-grammar/ .

[RDF Concepts]

Resource Description Framework (RDF): Concepts and Abstract Syntax, Graham Klyne and Jeremy J. Carroll, Editors, W3C Recommendation, 10 February 2004, Available at http://www.w3.org/TR/2004/REC-rdf-concepts-20040210/ . Latest version available at http://www.w3.org/TR/rdf-concepts/ .

[RFC2616]

RFC2616: Hypertext Transfer Protocol - HTTP/1.1, R. Fielding, J. Gettys, J. Mogul, H. Frystyk, L. Masinter, P. Leach, T. Berners-Lee, IETF RFC2616, June 1999. Available at http://www.ietf.org/rfc/rfc2616.txt

[RFC3986]

RFC3986: Uniform Resource Identifier (URI): Generic Syntax, T. Berners-Lee, R. Fielding, L. Masinter, IETF RFC3986, January 2005. Available at http://www.ietf.org/rfc/rfc3986.txt

[Web Architecture]

Architecture of the World Wide Web, Volume One, I. Jacobs and N. Walsh, Editors, World Wide Web Consortium, 15 January 2004. Available at http://www.w3.org/TR/webarch/ .
