The Aardvark Speaks : essence, effervescence, obscurity. Established 2002. A weblog by Horst Prillinger. ISSN 1726-5320


January 03, 2005

Using Dublin Core in RSS feeds

Revised and updated 5 January 2005.

Warning: This is a lengthy and highly technical post. If you are not interested in RSS and/or metadata standards, you can safely ignore it.

Ever since the advent of RSS 1.0 (the RDF-based format, a.k.a. RDF Site Summary), Dublin Core elements have been used in RSS feeds as metadata descriptors. They even turned up in Movable Type's very own implementation of RSS 2.0, before dying a quick death with the arrival of Atom.

Because of Atom, much of what I am going to say here may seem obsolete. Still, it seems important to talk about the use of Dublin Core (DC) with weblogs in general and RSS feeds in particular, as it could be useful for the scalability and interchangeability of weblog content, and to point out the following facts:

  1. DC would be useful for weblog description;
  2. DC is possibly problematic to implement with weblogs;
  3. DC needs to be implemented correctly, or not at all;
  4. RSS feed readers should be able to parse DC correctly.

Much of what I'm writing here comes from the implementation of DC where I work and from my experiences in implementing DC in the RSS 1.0 feed of my other weblog The Evil Empire. Here are my observations and humble opinions:

Dave Winer's RSS 2.0 standard is very strict about what its tags denote; therefore, and for the sake of clarity, I will compare the DC terms with the respective tags of RSS 2.0. I would also like to point out that Movable Type's implementation of DC is faulty and not recommended as a model, as use of incorrect DC elements undermines the standard. If you are using the Movable Type RSS 2.0 feed, get rid of it now and use the standard instead.

Namespace

First of all, you must include the correct namespace declaration. For RSS 1.0, it looks like this:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#" xmlns:dc="http://purl.org/dc/elements/1.1/">

Or, in RSS 2.0:

<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
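
If you generate your feed from a script rather than from a template, the namespace declaration can be left to the XML library. The following is only a minimal sketch using Python's standard library; the function name build_feed is made up for illustration, and the output is not a complete feed (a real RSS 2.0 channel also needs <link> and <description>).

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # namespace URI as given in the text above
ET.register_namespace("dc", DC_NS)          # serialise with the conventional "dc" prefix


def build_feed(channel_title: str, item_title: str, creator: str) -> str:
    """Return a bare-bones RSS 2.0 document with dc:creator on one item.

    Illustrative helper only: a real channel also needs <link> and
    <description>, which are left out here for brevity.
    """
    rss = ET.Element("rss", {"version": "2.0"})
    channel = ET.SubElement(rss, "channel")
    ET.SubElement(channel, "title").text = channel_title
    item = ET.SubElement(channel, "item")
    ET.SubElement(item, "title").text = item_title
    # ElementTree addresses namespaced elements as {namespace-uri}localname.
    ET.SubElement(item, f"{{{DC_NS}}}creator").text = creator
    return ET.tostring(rss, encoding="unicode")


feed_xml = build_feed(
    "The Aardvark Speaks",
    "Using Dublin Core in RSS feeds",
    "Horst Prillinger",
)
print(feed_xml)
```

The library writes the xmlns:dc declaration itself as soon as a DC element is used, so it cannot be forgotten.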

Title

<dc:title> is really synonymous with <title> in Winer's RSS 2.0. Within <channel> it denotes the title of the weblog; within <item> it denotes the title of the posting.

<dc:title>Using Dublin Core in RSS feeds</dc:title>

Creator

<dc:creator> is much more problematic. According to the DC specs, it denotes "An entity primarily responsible for making the content of the resource". The key term here is content. In the context of a weblog, this can mean two things:

  • If you are writing original weblog entries, then you are <dc:creator>.
  • However, if you are merely linking to an article elsewhere and are not adding significant material to the link (e.g. on a linkblog), then <dc:creator> is always the author of the original article.

With some linkblogs, it may be hard to decide which of the two to pick: you have to decide whether your entry is mostly original writing or mostly referring. I also know of no weblog software that handles this correctly; usually the name of the weblog author, i.e. you, is simply inserted. (It is difficult to implement: you would need some kind of field to enter the Creator's name if it's not you, or your news aggregator would have to hand the original author on to your blogging software.) In the case of The Evil Empire, which is a very strict linkblog and only uses text from the original sources, using my name as <dc:creator> would certainly not be correct. With a hack, I managed to tweak the template so that <dc:creator> always refers to the original author.

<dc:title>Using Dublin Core in RSS feeds</dc:title>
<dc:creator>Horst Prillinger</dc:creator>

However, even though the following posting appeared on my weblog, I am not automatically its Creator, because in this case I didn't write it:

<dc:title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</dc:title>
<dc:creator>Michael Evanchik</dc:creator>

Ideally, the names used for <dc:creator> should be normative and taken from an authoritative thesaurus, such as LoC-NA or PND. Unlike <author> in Winer's RSS, the name rather than the e-mail address of the author is required.

Don't think that you are automatically <dc:creator> for all your weblog entries simply because this is your weblog. According to the DC specs, you are not. See also the notes on Publisher and Contributor below.

Subject

DC defines <dc:subject> as "A topic of the content of the resource ... [ideally] a value from a controlled vocabulary or formal classification scheme". This means that the standard practice in RSS 1.0 and MT's RSS 2.0 of using it for weblog categories is wrong, unless you organise your categories using LCSH or SWD (or even DDC, if you feel so inclined), none of which I've ever seen on any weblog so far.

This means that <dc:subject> makes sense as part of the <channel> description for topical weblogs with a strong thematic focus, such as

<dc:title>The Evil Empire</dc:title>
<dc:subject>Microsoft Corporation</dc:subject>

You would, however, need to be very systematic if you use it within <item>, although it is possible, I suppose. For example, using an LCSH heading is possible on this entry:

<dc:title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</dc:title>
<dc:subject>Microsoft Internet Explorer</dc:subject>

But do bear in mind that <dc:subject> is not at all synonymous with <category> in Winer's RSS 2.0. If you do not name your categories according to LCSH or SWD, use <category> instead.
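
The rule above can be sketched in a few lines of Python. Everything here is an assumption for illustration: KNOWN_HEADINGS is a tiny stand-in for a real lookup against LCSH or SWD, and add_topic is a made-up helper name.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
ET.register_namespace("dc", DC_NS)

# Hypothetical stand-in for a controlled vocabulary; a real implementation
# would check the label against LCSH, SWD or a similar authority file.
KNOWN_HEADINGS = {"Microsoft Corporation", "Weblogs", "Dublin Core"}


def add_topic(item: ET.Element, label: str) -> ET.Element:
    """Attach label as dc:subject if it is a known heading, else as <category>."""
    if label in KNOWN_HEADINGS:
        element = ET.SubElement(item, f"{{{DC_NS}}}subject")
    else:
        element = ET.SubElement(item, "category")
    element.text = label
    return element


item = ET.Element("item")
subject = add_topic(item, "Weblogs")        # controlled term
category = add_topic(item, "metablogging")  # free-form weblog category
print(ET.tostring(item, encoding="unicode"))
```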

Description

<dc:description> can include any free-text account of the content of the resource, ideally a summary, but there are no real restrictions here — you can also include the full text. This makes it synonymous with <description> in Winer's RSS 2.0 and with both(!) <summary> and <content> in Atom.

Publisher

<dc:publisher> is the "entity responsible for making the resource available". Notice the difference from <dc:creator>: if person A is creating a website and person B writes an article that is published on that website, then person A is <dc:publisher>, and person B is <dc:creator>. In terms of implementation, this is usually easy: since in most cases you are the one publishing your weblog, this is you.

<dc:title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</dc:title>
<dc:creator>Michael Evanchik</dc:creator>
<dc:publisher>Horst Prillinger</dc:publisher>

There is no equivalent element in Winer's RSS 2.0; <managingEditor> is perhaps the most closely related.
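
To make the Creator/Publisher split concrete, here is a rough Python sketch of what weblog software would have to do. The Entry class and its original_author field are hypothetical; as far as I know no weblog tool offers such a field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Entry:
    title: str
    blog_owner: str                        # the person who runs the weblog
    original_author: Optional[str] = None  # hypothetical field, set for pure link entries


def creator_and_publisher(entry: Entry) -> dict:
    """Pick dc:creator and dc:publisher for one weblog entry.

    The weblog owner is always the publisher, but is the creator only
    when no original author has been recorded for the entry.
    """
    creator = entry.original_author or entry.blog_owner
    return {
        "dc:title": entry.title,
        "dc:creator": creator,
        "dc:publisher": entry.blog_owner,
    }


original = creator_and_publisher(
    Entry("Using Dublin Core in RSS feeds", "Horst Prillinger")
)
linked = creator_and_publisher(
    Entry(
        "Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise",
        "Horst Prillinger",
        original_author="Michael Evanchik",
    )
)
print(original)
print(linked)
```

Whether an entry counts as "mostly referring" still has to be decided by a human; the sketch only shows where that decision would be stored.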

Contributor

<dc:contributor> is used for somebody "making contributions to the content of the resource". This is probably rarely used with weblog entries, where each article tends to have its own clear-cut author, but it can make sense for the channel description of multi-author weblogs:

<dc:title>The Aardvark Speaks</dc:title>
<dc:creator>Horst Prillinger</dc:creator>
<dc:contributor>Haldur Gislufsson</dc:contributor>

Notice the difference from <dc:creator> and <dc:publisher>: the following example is for an entry in Phil Gyford's Samuel Pepys diary weblog, which contains Pepys' text in the translation of Mynors Bright; this entry also contains several annotations by various readers.

<dc:title>Pepys' Diary: Monday 30 December 1661</dc:title>
<dc:creator>Samuel Pepys</dc:creator>
<dc:publisher>Phil Gyford</dc:publisher>
<dc:contributor>Mynors Bright</dc:contributor>
<dc:contributor>Australian Susan</dc:contributor>
<dc:contributor>Alan Bedford</dc:contributor>
<dc:contributor>Stolzi</dc:contributor>
<dc:contributor>vicenzo</dc:contributor>
<dc:contributor>Conrad</dc:contributor>

The distinction between <dc:creator> and <dc:contributor> depends on what you consider the main creative work. If person A writes an article and person B takes a few photographs to illustrate the article, then A is <dc:creator> and B is <dc:contributor>. If person B takes a photograph and person A writes a brief explanatory note for it, then B is <dc:creator> and A is <dc:contributor>.

As you can see, this element has a very broad scope — depending on the topic it can even include people who post comments. There is no similar term in Winer's RSS 2.0.

Date

<dc:date> is "associated with the creation or availability of the resource"; this can be the posting date on your weblog, which is the best and easiest way to implement it, but, if you are linking to another article elsewhere, it can also be that article's date. Best practice, and implemented correctly in the MT templates, is to use W3CDTF (a profile of ISO 8601).

<dc:date>2004-12-28T11:42:40+01:00</dc:date>

This element is somewhat broader than <pubDate> in Winer's RSS 2.0, but it can be used in the same manner. Notice, however, that <pubDate> uses the RFC 822 date format, which is not W3CDTF, whereas <dc:date> should preferably use W3CDTF, but can also use other formats.
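
Since the two date formats differ, a feed generator that offers both <dc:date> and <pubDate> has to convert between them. A small sketch with Python's standard library follows; the function names are mine, and only the complete date-and-time form of W3CDTF is handled (shorter forms such as 2004-08 are not).

```python
from datetime import datetime
from email.utils import format_datetime, parsedate_to_datetime


def w3cdtf_to_pubdate(value: str) -> str:
    """Convert a full W3CDTF timestamp into the RFC 822 style used by <pubDate>."""
    return format_datetime(datetime.fromisoformat(value))


def pubdate_to_w3cdtf(value: str) -> str:
    """Convert an RFC 822 style <pubDate> value into a W3CDTF timestamp."""
    return parsedate_to_datetime(value).isoformat()


pub_date = w3cdtf_to_pubdate("2004-12-28T11:42:40+01:00")
print(pub_date)                     # Tue, 28 Dec 2004 11:42:40 +0100
print(pubdate_to_w3cdtf(pub_date))  # 2004-12-28T11:42:40+01:00
```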

This is not to be confused with Coverage (see below).

Type

Not implemented in any template that I know of, although it could potentially be very useful: <dc:type> describes the "nature or genre of the content of the resource". DC suggests a specific Type vocabulary (DCT1). For original articles, <dc:type> will probably always be Text, but for audioblogs, it may well be Sound, for photoblogs StillImage, and for videoblogs MovingImage. For links to other online resources, almost any other Type is possible, depending on what you are linking to.

<dc:title>Using Dublin Core in RSS feeds</dc:title>
<dc:creator>Horst Prillinger</dc:creator>
<dc:type>Text</dc:type>

<dc:title>Way to Go</dc:title>
<dc:creator>Horst Prillinger</dc:creator>
<dc:type>MovingImage</dc:type>

This is not to be confused with Format (see below).

Format

<dc:format> is used for the "physical or digital manifestation of the resource"; this is simplified in the context of weblogs insofar as we are talking almost exclusively about online resources, and we can thus simply use the Internet Media Types (MIME). Again, this is not implemented in any RSS template that I know of.

<dc:title>Using Dublin Core in RSS feeds</dc:title>
<dc:creator>Horst Prillinger</dc:creator>
<dc:type>Text</dc:type>
<dc:format>text/html</dc:format>

<dc:title>Way to Go</dc:title>
<dc:creator>Horst Prillinger</dc:creator>
<dc:type>MovingImage</dc:type>
<dc:format>video/quicktime</dc:format>

As with <dc:type>, <dc:format> depends on the content of the weblog entry or the linked resource and may therefore be problematic to implement.

This is not to be confused with Type (see above).
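
If the software knows the media type of an entry, <dc:format> is simply that media type, and <dc:type> can often be guessed from it. The mapping below is my own rough assumption, not part of any DC document, and it will be wrong in places (a PDF is Text, for instance, but its media type starts with "application").

```python
from typing import Optional

# Assumed mapping from the major part of a media type to a DCMI Type term.
# Anything not listed has to be classified by hand.
DCMI_TYPE_BY_MAJOR = {
    "text": "Text",
    "audio": "Sound",
    "image": "StillImage",
    "video": "MovingImage",
}


def type_and_format(mime_type: str) -> dict:
    """Return dc:format (the media type itself) and a guessed dc:type."""
    major = mime_type.split("/", 1)[0].lower()
    guessed: Optional[str] = DCMI_TYPE_BY_MAJOR.get(major)
    return {"dc:type": guessed, "dc:format": mime_type}


print(type_and_format("text/html"))
print(type_and_format("video/quicktime"))
print(type_and_format("application/pdf"))  # dc:type is None: needs a human decision
```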

Identifier

<dc:identifier> is the "unambiguous reference to the resource within a given context". Since the "given context" is your weblog, this makes it synonymous with <guid> in Winer's RSS 2.0. This means that if your permalinks are permanent and unambiguous (i.e. each item can be found via a unique URL), it is safe to use your permalink here.

<dc:title>Using Dublin Core in RSS feeds</dc:title>
<dc:identifier>http://www.aardvark.at/blog/archives/2005/01/000922.html</dc:identifier>

See also the entry on Source below.

Source

<dc:source> comes in whenever a weblog entry is not entirely original. According to the DC definition, it is a "Reference to a resource from which the present resource is derived".

Basically, it is needed whenever your weblog entries are "derived from" some other resource (rather than being original entries, or new entries that are merely "based on" other resources). Sometimes this distinction may be hard to make; best practice is to include <dc:source> in case of doubt. With linkblogs, this is always the URL of the original article. Notice the difference between <dc:source> and <dc:identifier>:

<dc:title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</dc:title>
<dc:identifier>http://www.aardvark.at/evil-empire/archives/2004_12.html#000919</dc:identifier>
<dc:source>http://freehost07.websamba.com/greyhats/mirror/sp2rc-analysis.htm</dc:source>

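Winer's RSS 2.0 has <source>, which is similar, but its content is the name of the RSS channel the item came from, and its required url attribute points to that channel's feed rather than to the original article.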

<dc:identifier> is the local identifier of the current weblog entry; <dc:source> shows where the material in the current entry came from.

This is similar to, but somewhat stricter than <link> in Winer's RSS 2.0. Whereas <link> contains any article that you link to, <dc:source> should be used both for an article that you link to and for an article that your current article is derived from. This means that you will have to use <dc:source> more often than <link>.

Language

<dc:language> denotes the language of the content according to RFC3066, which itself is based on ISO 639. This can be done on the <channel> level if the entire weblog is in the same language (which is usually implemented correctly in most default feeds), or on the <item> level in the case of a multilingual weblog (which is, sadly, not really implemented anywhere).

<dc:title>The Aardvark Speaks</dc:title>
<dc:language>en-gb</dc:language>

This is synonymous with <language> in Winer's RSS 2.0.

Relation

<dc:relation> is used for references "to a related resource". In a weblog context, this is the perfect place for incoming Trackback URLs.

<dc:title>Why Wikipedia sucks. Big time.</dc:title>
<dc:relation>http://www.worldwideklein.de/index.php?/weblog/why-wikipedia-sucks/</dc:relation>
<dc:relation>http://revirement.de/weblog/index.php?p=1403</dc:relation>
<dc:relation>http://home.planet.nl/~nhavd/clog/2004/06/02.htm#a1125</dc:relation>
<dc:relation>http://vowe.net/archives/004590.html</dc:relation>
<dc:relation>http://www.itst.org/web/why_wikipedia_sucks_big_time.shtml</dc:relation>
<dc:relation>http://213.225.30.218/archives/000242.html</dc:relation>
<dc:relation>http://blog.schockwellenreiter.de/3791</dc:relation>
<dc:relation>http://weblog.plasticthinking.org/item/2757</dc:relation>
<dc:relation>http://www.irox.de/roxomatic/277/wikipedias-in-der-kritik</dc:relation>
<dc:relation>http://en.wikipedia.org/wiki/Wikipedia:Village_Pump</dc:relation>

In previous versions of RSS, this was implemented via the <trackback:about> model. As you can see, simple DC would have sufficed.

Of course you can also use <dc:relation> to manually add URLs to web pages that you consider of related interest. Notice that there is a difference between <dc:relation> (related content) and <dc:source> (related content that was the basis for your entry) — see the entry on Source above.
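
The division of labour between the three URL-carrying elements can be sketched like this in Python. The function and parameter names are made up; the assumption is that the software knows the permalink, optionally the URL the entry was derived from, and the list of incoming Trackback URLs.

```python
from typing import Iterable, List, Optional, Tuple


def link_elements(
    permalink: str,
    derived_from: Optional[str] = None,
    trackbacks: Iterable[str] = (),
) -> List[Tuple[str, str]]:
    """Return (element name, URL) pairs for one weblog entry.

    dc:identifier is the entry's own permalink, dc:source the resource the
    entry was derived from (if any), and dc:relation everything else that
    is merely related, such as incoming Trackbacks.
    """
    pairs = [("dc:identifier", permalink)]
    if derived_from:
        pairs.append(("dc:source", derived_from))
    seen = {permalink, derived_from}
    for url in trackbacks:
        if url not in seen:  # never repeat the identifier or the source as a relation
            pairs.append(("dc:relation", url))
            seen.add(url)
    return pairs


elements = link_elements(
    "http://www.aardvark.at/evil-empire/archives/2004_12.html#000919",
    derived_from="http://freehost07.websamba.com/greyhats/mirror/sp2rc-analysis.htm",
    trackbacks=[
        "http://vowe.net/archives/004590.html",
        "http://vowe.net/archives/004590.html",  # duplicate ping, dropped
    ],
)
for name, url in elements:
    print(f"<{name}>{url}</{name}>")
```

The Trackback URL in the usage example is borrowed from the other entry above purely as sample data.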

Coverage

<dc:coverage> is used for the spatial or temporal "extent or scope of the content of the resource", ideally using terms from a controlled vocabulary such as the TGN for places and W3CDTF for dates.

It is perhaps most useful for weblogs with a specific geographic and/or historical focus.

<dc:title>Pepys' Diary: Thursday 2 January 1661/62</dc:title>
<dc:coverage>London</dc:coverage>
<dc:coverage>1662-01-02</dc:coverage>

<dc:title>Going Underground's Blog</dc:title>
<dc:coverage>London</dc:coverage>

Notice that there is a difference between <dc:date>, which is about when the entry was made available, and <dc:coverage>, which is about the time covered by the entry. So if I publish an article about what I did on New Year's Eve a couple of days later, it looks like this:

<dc:title>What I did on New Year's Eve</dc:title>
<dc:date>2005-01-03T12:29:04+01:00</dc:date>
<dc:coverage>2004-12-31</dc:coverage>

One other use that comes to mind is for monthly or weekly weblog archives, although that would probably mostly apply to web pages, and not RSS feeds.

<meta name="DC.Title" content="The Evil Empire - August 2004 Archive" />
<meta name="DC.Coverage" content="2004-08" />

(For further details on including DC elements in meta tags of web pages see below.)

Rights

<dc:rights> is used for any "Information about rights held in and over the resource". This works on the <channel> level as well as the <item> level, but may be harder to implement on the latter if no distinction is made between <dc:creator> and <dc:publisher>, as you — the publisher — do not automatically own the rights to an article if you are not also the creator.

<dc:title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</dc:title>
<dc:creator>Michael Evanchik</dc:creator>
<dc:rights>Copyright 2004 by Michael Evanchik</dc:rights>

This is synonymous with <copyright> in Winer's RSS 2.0.
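
As for the reading side, a feed reader needs very little to pick up DC elements from an RSS 2.0 item. The following Python sketch is one possible approach, not a description of any existing reader. Falling back from <dc:creator> to <author> and from <dc:date> to <pubDate> is my own assumption, and the sample item merely combines values from the examples above.

```python
import xml.etree.ElementTree as ET

DC = "{http://purl.org/dc/elements/1.1/}"  # ElementTree's notation for the dc: prefix

SAMPLE = """<rss version="2.0" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>The Evil Empire</title>
    <item>
      <title>Microsoft Internet Explorer XP SP2 Fully Automated Remote Compromise</title>
      <dc:creator>Michael Evanchik</dc:creator>
      <dc:publisher>Horst Prillinger</dc:publisher>
      <dc:date>2004-12-28T11:42:40+01:00</dc:date>
    </item>
  </channel>
</rss>"""


def read_item(item: ET.Element) -> dict:
    """Collect the DC fields of one <item>, with plain RSS 2.0 fallbacks."""

    def texts(tag: str) -> list:
        return [el.text.strip() for el in item.findall(tag) if el.text and el.text.strip()]

    def first(tag: str):
        values = texts(tag)
        return values[0] if values else None

    return {
        "title": first(DC + "title") or first("title"),
        # Note: RSS 2.0 <author> holds an e-mail address, not a name.
        "creators": texts(DC + "creator") or texts("author"),
        "publisher": first(DC + "publisher"),
        "contributors": texts(DC + "contributor"),  # may legitimately repeat
        "date": first(DC + "date") or first("pubDate"),
    }


parsed = read_item(ET.fromstring(SAMPLE).find("channel/item"))
print(parsed)
```

The important point is that creator, publisher and contributor are kept apart instead of being collapsed into a single "author" field.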

DC in meta tags

Apart from RSS feeds, DC elements can also be included in meta tags of web pages. This is probably only useful if you are generating a separate web page for each individual weblog entry, and can be a real pain to do correctly as most weblog software will not allow you to easily create all of these meta tags without further, often complicated, hacks. So merely to show you what it could be like, here's what a full set of DC meta tags for this page, if it existed, would look like:

<meta name="DC.Title" content="Using Dublin Core in RSS feeds" />
<meta name="DC.Creator" scheme="PND" content="Prillinger, Horst" />
<meta name="DC.Subject" scheme="LCSH" content="Weblogs" />
<meta name="DC.Subject" scheme="LCSH" content="Dublin Core" />
<meta name="DC.Description" content="Ever since the advent of RSS 1.0 (a.k.a. RDF), Dublin Core elements have been used in RSS feeds as metadata descriptors. They even turned up in Movable Type's very own implementation of RSS 2.0, before dying a quick death with the arrival of Atom. Due to Atom, much of what I am going to say here may seem obsolete; still it seems important to talk about the use of Dublin Core (DC) with weblogs in general and RSS feeds in particular, as it could be useful for the scalability and interchangeability of weblog content." />
<meta name="DC.Publisher" content="Horst Prillinger" />
<meta name="DC.Date" scheme="W3CDTF" content="2005-01-03T13:37:01+01:00" />
<meta name="DC.Type" scheme="DCT1" content="Text" />
<meta name="DC.Format" scheme="IMT" content="text/html" />
<meta name="DC.Identifier" scheme="URI" content="http://www.aardvark.at/blog/archives/2005/01/000922.html" />
<meta name="DC.Source" scheme="URI" content="http://dublincore.org/documents/dces/" />
<meta name="DC.Language" scheme="RFC3066" content="en-gb" />
<meta name="DC.Relation" scheme="URI" content="http://blog.schockwellenreiter.de/7571" />
<meta name="DC.Rights" content="Copyright 2005 by Horst Prillinger" />
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" title="Dublin Core" />

Explanation: DC.Title: the title of the weblog entry. - DC.Creator: the author of the text, spelt according to the authoritative heading in PND. - DC.Subject: two subject headings according to LCSH. - DC.Description: a summary of the text. - DC.Publisher: the person who runs the weblog. - DC.Date: date of publication on the weblog, formatted according to W3CDTF. - DC.Type: resource type according to DCT1. - DC.Format: Internet Media Type (IMT) of the online resource. - DC.Identifier: the local permalink URI of the weblog entry. - DC.Source: included because this article could be seen as a reinterpretation of DC for weblogs, hence the URI of that page. - DC.Language: language of the text according to RFC3066. - DC.Relation: the URI of a website that sent a Trackback ping to this entry. - DC.Rights: Copyright notice. - DC.Contributor and DC.Coverage do not apply and were thus left out. The final <link rel> points to the DC element set for reference purposes.
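
For those who want to generate such a block rather than type it, here is a minimal Python sketch. The function name and the (element, scheme, content) tuple layout are my own invention; the important detail is that the content is escaped, since titles and descriptions frequently contain quotes and ampersands.

```python
from html import escape
from typing import Iterable, Optional, Tuple

DC_NS = "http://purl.org/dc/elements/1.1/"


def dc_meta_tags(fields: Iterable[Tuple[str, Optional[str], str]]) -> str:
    """Render (element, scheme, content) tuples as DC meta tags.

    The schema link uses the prefix "DC" so that it matches the "DC."
    prefix of the meta names.
    """
    lines = []
    for element, scheme, content in fields:
        scheme_attr = f' scheme="{escape(scheme)}"' if scheme else ""
        lines.append(
            f'<meta name="DC.{element}"{scheme_attr} content="{escape(content)}" />'
        )
    lines.append(f'<link rel="schema.DC" href="{DC_NS}" title="Dublin Core" />')
    return "\n".join(lines)


tags = dc_meta_tags([
    ("Title", None, "Using Dublin Core in RSS feeds"),
    ("Creator", "PND", "Prillinger, Horst"),
    ("Subject", "LCSH", "Weblogs"),
    ("Language", "RFC3066", "en-gb"),
    ("Description", None, 'A "lengthy" post about RSS & metadata'),
])
print(tags)
```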

To demonstrate the use of all DC elements in the description of a weblog entry, I made a sample description for an entry in Phil Gyford's Samuel Pepys diary weblog:

<meta name="DC.Title" content="Pepys' Diary: Monday 30 December 1661" />
<meta name="DC.Creator" scheme="LoC-NA" content="Pepys, Samuel (1633-1703)" />
<meta name="DC.Subject" scheme="LCSH" content="Pepys, Samuel, 1633-1703 -- Diaries" />
<meta name="DC.Subject" scheme="LCSH" content="Cabinet officers -- Great Britain -- Diaries" />
<meta name="DC.Subject" scheme="LCSH" content="Great Britain -- Social life and customs -- 17th century -- Sources" />
<meta name="DC.Subject" scheme="LCSH" content="Great Britain -- History -- Charles II, 1660-1685 -- Sources" />
<meta name="DC.Subject" scheme="DDC21" content="941.066092" />
<meta name="DC.Description" content="At the office about this estimate and so with my wife and Sir W. Pen to see our pictures, which do not much displease us, and so back again, and I staid at the Mitre, whither I had invited all my old acquaintance of the Exchequer to a good chine of beef..." />
<meta name="DC.Publisher" content="Phil Gyford" />
<meta name="DC.Contributor" scheme="LoC-NA" content="Bright, Mynors (1818-1883)" />
<meta name="DC.Date" scheme="W3CDTF" content="2004-12-30" />
<meta name="DC.Type" scheme="DCT1" content="Text" />
<meta name="DC.Format" scheme="IMT" content="text/html" />
<meta name="DC.Identifier" scheme="URI" content="http://www.pepysdiary.com/archive/1661/12/30/index.php" />
<meta name="DC.Source" scheme="URI" content="http://www.gutenberg.org/etext/4130" />
<meta name="DC.Language" scheme="RFC3066" content="en" />
<meta name="DC.Relation" scheme="URI" content="http://blogs.msdn.com/mcreasy/archive/2004/12/31/344960.aspx" />
<meta name="DC.Coverage" scheme="TGN" content="London" />
<meta name="DC.Coverage" scheme="W3CDTF" content="1661-12-30" />
<meta name="DC.Rights" content="The main diary entries, the footnotes in the right-hand sidebar, the text in the Diary Introduction section, and the main text on the People and Places pages are taken from the Project Gutenberg version of Pepys' diary and as such are free of copyright restrictions. All annotations added by users in the Diary section (attached to the diary entries and People and Places pages) and the rest of the site are available under a Creative Commons Attribution-NonCommercial-ShareAlike license. Any material posted in the annotations by users that is quoted from elsewhere retains its original copyright status." />
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" title="Dublin Core" />

Explanation: DC.Title: the title of the weblog entry. - DC.Creator: the author of the original text, spelt according to the authoritative heading in LoC-NA. - DC.Subject: several subject headings according to LCSH, one according to DDC. - DC.Description: a brief excerpt from the text. - DC.Publisher: the person who runs the weblog. - DC.Contributor: the person who translated the diary from Pepys' secret script into English, spelt according to the heading in LoC-NA. - DC.Date: date of publication on the weblog, formatted according to W3CDTF. - DC.Type: resource type according to DCT1. - DC.Format: Internet Media Type (IMT) of the online resource. - DC.Identifier: the local permalink URI of the weblog entry. - DC.Source: the URI where the original text is located. - DC.Language: language of the text according to RFC3066. - DC.Relation: the URI of a website that sent a Trackback ping to this entry. - DC.Coverage: the covered place spelt according to TGN, the covered time formatted according to W3CDTF. - DC.Rights: Copyright notice from the weblog.

The problem?

The main reason why most of the 15 DC elements have not been properly implemented in weblogs, whether in RSS feeds or in meta tags, is that there is no weblog software which offers enough fields to enter all the necessary metadata (or is intelligent enough to create at least some of it automatically). Even if there were, I cannot think of a user interface that would not confuse the average user, i.e. people who don't know that such a thing as Dublin Core even exists.

Conclusion

Why talk about Dublin Core now that everybody is using Atom anyway?

Because many people still use RSS feeds with incorrect implementations of Dublin Core. Because DC would have provided a standardised, useful vocabulary for RSS feeds if anyone had cared to listen and pay attention rather than cook their own flavours of RSS, which are now all becoming obsolete. Because Dublin Core is a widely and extensively used standard for metadata and applying it to weblogs might have been useful. Because far too few people know about it at all.

Should you implement DC in your feed(s)?

No. I'm just pointing out that it's possible and what it would look like.

Should you include DC elements in the meta tags of your weblog pages?

No. Of course, in an ideal world, every web page would use DC meta tags. But then this is no ideal world, so you don't have to use them.

Posted by Horst on January 3, 2005 01:37 PM to metablogging
Trackbacks


We received this ping from Der Schockwellenreiter on January 4, 2005 08:54 AM:

I love RSS: And Horst has written a wonderful article: Using Dublin Core in RSS feeds. After that, there really shouldn't be any questions left open. Except: RDF (and with it Dublin Core) only works with RSS 1.0, podcasting only with RSS 2.0 (berichtwe...

Comments
gibarian said on January 6, 2005 07:24 PM:

Horst, how come you've got such an in-depth knowledge of these matters? Aren't you an English lit major turned librarian? Or is knowledge about Dublin Core mandatory for a librarian? Or am I simply way off track and you're actually a webdeveloper and I should have read your whole weblog archives before posting that comment? I hope not.

Horst said on January 8, 2005 06:02 PM:

Dublin Core is more common with archives than with libraries, but yes, it is part of our training, and it is also something I'm professionally and semi-professionally interested in.

Joanne Harrington said on June 1, 2005 10:28 PM:

My project team and I are going to use RSS to enable the get and pull of government and NGO links among Web sites that are part of what is called the Collaborative Seniors' Portal Network.
This is our way of making No Wrong Door happen, so that consistent search results, related mostly to government programs and services, are presented to visitors to any member of the CSPN. It sounds like a simple thing to accomplish, but it isn't when you are working with partners across different governments and NGOs. Our consultant David Megginson, who you may have heard about, recommended RSS. And now, finally, the point of my comment: for years the federal government standard has been to employ DC elements for metadata, and we are going to be working with our partners at the provincial and community level to adopt a standard use of DC elements.
I am not an IM expert. I'm one of those odd program owners who believe that, just as any good mechanic will say that as a car owner you have to know what is going on under the hood, at least to some degree, a little bit of knowledge about IM is a GOOD thing.
Found your blog very easy to read and understandable. Thank you.

Web Design Bulgaria said on December 5, 2007 04:18 PM:

Hi Horst,
thanks for all that useful information. However, can you tell us whether an RSS feed containing DC items is useful for the average RSS user, or only for libraries?

Comments have been closed for this entry.


© Copyright 2002-2008 Horst Prillinger

Most of the stuff on this page is fiction. Everything else is my private opinion. Please read the disclaimer.
