eFoundations


legislation.gov.uk

7 hours 29 min ago

I woke up this morning to find a very excited flurry of posts in my Twitter stream pointing to the UK National Archives' launch of the legislation.gov.uk site, which provides access to all UK legislation, including revisions made over time. A post on the data.gov.uk blog provides some of the technical background and highlights the ways in which the data is made available in machine-processable forms. Full details are provided in the "Developer Zone" documents.

I don't for a second pretend to have absorbed all the detail of what is available, so I'll just highlight a couple of points.

First and foremost, this is being delivered with an eye firmly on the Linked Data principles. From the blog post I mentioned above:

For the web architecturally minded, there are three types of URI for legislation on legislation.gov.uk. These are identifier URIs, document URIs and representation URIs. Identifier URIs are of the form http://www.legislation.gov.uk/id/{type}/{year}/{number} and are used to denote the abstract concept of a piece of legislation - the notion of how it was, how it is and how it will be. These identifier URIs are designed to support the use of legislation as part of the web of Linked Data. Document URIs are for the document. Representation URIs are for the different types of possible rendition of the document, so htm, pdf or xml.

(Aside: I admit to a certain squeamishness about the notion of "representation URIs" and I kinda prefer to think in terms of URIs for Generic Documents and for Specific Documents, along the lines described by Tim Berners-Lee in his "Generic Resources" note, but that's a minor niggle of terminology on my part, and not at all a disagreement with the model.)
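To make the three URI types a little more concrete, here is a minimal Python sketch. Only the identifier-URI template comes from the quoted post. The document and representation patterns are my own assumptions, loosely inferred from example URLs like the data.rdf one below, so treat them as illustrative rather than as the site's actual rules.

```python
# Sketch of the three URI types described in the quoted post.
# identifier_uri follows the quoted template; document_uri and
# representation_uri are HYPOTHETICAL patterns, not documented rules.

BASE = "http://www.legislation.gov.uk"
FORMATS = {"htm", "pdf", "xml"}  # the renditions named in the quoted post


def identifier_uri(leg_type: str, year: int, number: int) -> str:
    """Identifier URI for the abstract legislation (template from the post)."""
    return f"{BASE}/id/{leg_type}/{year}/{number}"


def document_uri(leg_type: str, year: int, number: int) -> str:
    """HYPOTHETICAL: assumes the document URI is the identifier path minus '/id'."""
    return f"{BASE}/{leg_type}/{year}/{number}"


def representation_uri(leg_type: str, year: int, number: int, fmt: str) -> str:
    """HYPOTHETICAL: assumes a 'data.<fmt>' suffix on the document URI."""
    if fmt not in FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    return f"{document_uri(leg_type, year, number)}/data.{fmt}"
```

For example, identifier_uri("ukpga", 2010, 24) yields http://www.legislation.gov.uk/id/ukpga/2010/24, the abstract "how it was, is and will be" resource, while the other two functions only guess at how the concrete documents hang off it.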

A second aspect I wanted to highlight (given some of my (now slightly distant) past interests) is that, on looking at the RDF data (e.g. http://www.legislation.gov.uk/ukpga/2010/24/contents/data.rdf), I noticed that it appears to make use of a FRBR-based model to deal with the challenge of representing the various flavours of "versioning" relationships.

I haven't had time to look in any detail at the implementation, other than to observe that the data can get quite complex - necessarily so - when dealing with a lot of whole-part and revision-of/variant-of/format-of relationships. (There was one aspect where I wondered if the FRBR concepts were being "stretched" somewhat, but I'm writing in haste and I may well be misreading/misinterpreting the data, so I'll save that question for another day.)
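For anyone who hasn't met FRBR, a rough sketch may help show why the data gets complex. FRBR's Group 1 entities are Work, Expression, Manifestation and Item. The Python below models the first three and maps them onto legislation in a way that is purely my own guess; it is not the vocabulary the site's RDF actually uses.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class Work:
    """FRBR Work: the abstract piece of legislation (cf. the identifier URI)."""
    uri: str
    parts: list[Work] = field(default_factory=list)  # whole-part: sections, schedules...


@dataclass
class Expression:
    """FRBR Expression: one version of a Work, e.g. the text in force from a date."""
    work: Work
    valid_from: date
    revision_of: Optional[Expression] = None  # link to the previous version


@dataclass
class Manifestation:
    """FRBR Manifestation: one concrete format (htm, pdf, xml) of an Expression."""
    expression: Expression
    fmt: str


def version_history(expr: Expression) -> list[date]:
    """Follow the revision-of chain back to the original; return dates oldest first."""
    dates = []
    current: Optional[Expression] = expr
    while current is not None:
        dates.append(current.valid_from)
        current = current.revision_of
    return list(reversed(dates))
```

Even this toy version needs three kinds of link (whole-part, revision-of and format-of) before any real amendment detail is added, which goes some way to explaining the size of the real RDF.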

It's fascinating to see the FRBR approach being deployed as a practical solution to a concrete problem, outside of the library community in which it originated.

Pretty cool stuff, and congratulations to all involved in providing it. I look forward to seeing how the data is used.

Categories: LIS, foreign

Getting techie... what questions should we be asking of publishers?

Wed, 21/07/2010 - 12:31

The Licence Negotiation team here are thinking about the kinds of technical questions they should be asking publishers and other content providers as part of their negotiations with them. The aim isn't to embed the answers to those questions in contractual clauses - rather, it is to build up a knowledge base of background information that may be useful to institutions and others who are thinking about taking up a particular agreement.

My 'starter for 10' set of questions goes like this:

  • Do you make any commitment to the persistence of the URLs for your published content? If so, please give details. Do you assign DOIs to your published content? Are you members of CrossRef?
  • Do you support a search API? If so, what standard(s) do you support?
  • Do you support a metadata harvesting API? If so, what standard(s) do you support?
  • Do you expose RSS and/or Atom feeds for your content? If so, please describe the feeds you offer.
  • Do you expose any form of Linked Data about your published content? If so, please give details.
  • Do you generate OpenURLs as part of your web interface? Do you have a documented means of linking to your content based on bibliographic metadata fields? If so, please give details.
  • Do you support SAML (Service Provider) as a means of controlling access to your content? If so, which version? Are you a member of the UK Access Management Federation? If you also support other methods of access control, please give details.
  • Do you grant permission for the preservation of your content using LOCKSS, CLOCKSS and/or PORTICO? If so, please give details.
  • Do you have a statement about your support for the Web Accessibility Initiative (WAI)? If so, please give details.
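Some of these questions can be partly spot-checked before asking. For the RSS/Atom question, a common convention is feed autodiscovery: a page advertises its feeds with <link rel="alternate"> elements carrying an RSS or Atom MIME type. A minimal Python sketch, assuming you have already fetched the publisher's HTML; the function and class names are hypothetical.

```python
from __future__ import annotations

from html.parser import HTMLParser

FEED_TYPES = {"application/rss+xml", "application/atom+xml"}


class FeedLinkFinder(HTMLParser):
    """Collect <link rel="alternate"> elements that advertise RSS or Atom feeds."""

    def __init__(self) -> None:
        super().__init__()
        self.feeds: list[tuple[str, str]] = []  # (MIME type, href)

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        a = {name: (value or "") for name, value in attrs}  # value may be None
        rels = a.get("rel", "").lower().split()
        mime = a.get("type", "").lower()
        if "alternate" in rels and mime in FEED_TYPES and a.get("href"):
            self.feeds.append((mime, a["href"]))


def find_feeds(html: str) -> list[tuple[str, str]]:
    """Return the (type, href) pairs of any feeds a page advertises."""
    finder = FeedLinkFinder()
    finder.feed(html)
    return finder.feeds
```

An empty result doesn't prove there are no feeds, of course, only that the page doesn't advertise them in the conventional way, which is still a useful thing to know before the conversation.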

Does this look like a reasonable and sensible set of questions for us to be asking of publishers? What have I missed? Something about open access perhaps?


SLOODLE gets further funding from the JISC

Tue, 20/07/2010 - 11:14

I don't do much thinking about 3D virtual worlds these days but it's good to see the recent announcement by one of our early Second Life projects, SLOODLE, that they have been awarded a Learning & Teaching Innovation Grant from the JISC:

The year long project on Supporting Education in Virtual Worlds with Virtual Learning Environments will conduct pilots at each participating institution and will explore how web-based learning environments (esp. Moodle) can effectively support and enhance learning in virtual worlds.

Finding e-books - a discovery to delivery problem

Fri, 16/07/2010 - 15:43

Some of you will know that we recently ran a quick survey of academic e-book usage in the UK - I hope to be able to report on the findings here shortly. One of the things that we didn't ask about in the survey but that has come up anecdotally in our discussions with librarians is the ease (or not) with which it is possible to find out if a particular e-book title is available.

A typical scenario goes like this: "Lecturer adds an entry for a physical book to a course reading list. Librarian checks the list and wants to know if there is an e-book edition of the book, in order to offer alternatives to the students on that course." Problemo. Having briefly asked around, I get the impression (somewhat surprisingly?) that there is no easy solution to this problem.

If we assume that the librarian in question knows the ISBN of the physical book, what can be done to try and ease the situation? Note that in asking this question I'm conveniently ignoring the looming, and potentially rather massive, issue around "what the hell is an e-book anyway?" and "how are we going to assign identifiers to them once we've worked out what they are?" :-). For some discussion around this see Eric Hellman's recent piece, What IS an eBook, anyway?

But, let's ignore that for now... we know that OCLC's xISBN service allows us to navigate different editions of the same book (I'm desperately trying not to drop into FRBR-speak here). Taking a quick look at the API documentation for xISBN yesterday, I noticed that the metadata returned for each ISBN can include both the fact that something is a 'Book' and that it is 'Digital' (i.e. its form values include both 'BA' and 'DA') - that sounds like the working definition of an e-book to me (at least for the time being) - as well as listing the ISBNs for all the other editions/formats of the same book. So I knocked together a quick demonstrator. The result is e-Book Finder and you are welcome to have a play. To get you started, here are a couple of examples:
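The e-book test described above is simple to express in code. This Python sketch works on an xISBN getEditions response that has already been fetched; the JSON shape it expects ('stat', 'list', and per-edition 'isbn' and 'form' arrays) is my assumption about the format and should be checked against the API documentation before relying on it.

```python
from __future__ import annotations

import json

# Form codes as described above: 'BA' = Book, 'DA' = Digital.
BOOK, DIGITAL = "BA", "DA"


def ebook_isbns(xisbn_response: str) -> list[str]:
    """Return ISBNs of editions whose 'form' list contains both BA and DA.

    ASSUMES a payload shaped like:
    {"stat": "ok", "list": [{"isbn": [...], "form": [...]}, ...]}
    """
    data = json.loads(xisbn_response)
    if data.get("stat") != "ok":
        return []  # unknown ISBN, rate limit, etc.
    found: list[str] = []
    for edition in data.get("list", []):
        forms = set(edition.get("form", []))
        # An edition counts as an e-book only if it is *both* a Book and Digital.
        if {BOOK, DIGITAL} <= forms:
            found.extend(edition.get("isbn", []))
    return found
```

Note the subset test: checking that the form values contain both codes, rather than that a single form equals both, which is impossible.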

Of course, because e-Book Finder is based on xISBN, which is in turn based on WorldCat, you can only use it to find e-books that are listed in the catalogues of WorldCat member libraries (but I'm assuming that is a big enough set of libraries that the coverage is pretty good). Perhaps more importantly, it also only represents the first stage of the problem. It allows you to 'discover' that an e-book exists - but it doesn't get the thing 'delivered' to you.

Wouldn't it be nice if e-Book Finder could also answer questions like, "is this e-book covered by my existing institutional subscriptions?", "can I set up a new institutional subscription that would cover this e-book?" or simply "can I buy a one-off copy of this e-book?". It turns out that this is a pretty hard problem. My Licence Negotiation colleagues at Eduserv suggested doing some kind of search against myilibrary, dawsonera, Amazon, eBrary, eblib and SafariBooksOnline. The bad news is that (as far as I can tell), of those, only Amazon and SafariBooksOnline allow users to search their content before making them sign in and only Amazon offer an API. (I'm not sure why anyone would design a website that has the sole purpose of selling stuff such that people have to sign in before they can find out what is on offer, nor why that information isn't available in an openly machine-readable form but anyway...). So in this case, moving from discovery to delivery looks to be non-trivial. Shame. Even if each of these e-book 'aggregators' simply offered a list[1] of the ISBNs of all the e-books they make available, it would be a step in the right direction.
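If aggregators did publish such lists, checking a title against them would be trivial, with one wrinkle: the ISBN-10 and ISBN-13 forms of the same identifier need normalising before they can be compared. A Python sketch, where the aggregator names and their ISBN lists are hypothetical inputs:

```python
from __future__ import annotations


def to_isbn13(isbn: str) -> str:
    """Normalise an ISBN-10 or ISBN-13 (hyphens/spaces allowed) to a bare ISBN-13."""
    digits = isbn.replace("-", "").replace(" ", "").upper()
    if len(digits) == 13 and digits.isdigit():
        return digits
    if len(digits) == 10 and digits[:9].isdigit() and (digits[9].isdigit() or digits[9] == "X"):
        # Prefix 978, drop the old check digit, recompute the EAN-13 check
        # digit with alternating weights 1 and 3.
        core = "978" + digits[:9]
        total = sum(int(d) * (1 if i % 2 == 0 else 3) for i, d in enumerate(core))
        return core + str((10 - total % 10) % 10)
    raise ValueError(f"not an ISBN: {isbn!r}")


def who_sells(isbn: str, catalogues: dict[str, set[str]]) -> list[str]:
    """Names of aggregators whose (hypothetical) published ISBN list has this title."""
    target = to_isbn13(isbn)
    return sorted(name for name, isbns in catalogues.items()
                  if any(to_isbn13(i) == target for i in isbns))
```

For instance, to_isbn13("0-306-40615-2") gives "9780306406157". The sketch doesn't validate the ISBN-10 check digit; a real service would want to.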

On the other hand, maybe just pushing the question to the institutional OpenURL resolver would help answer these questions. Any suggestions for how things could be improved?
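For the OpenURL route, a book-level OpenURL 1.0 (Z39.88-2004) query in key/encoded-value form can be built from the ISBN alone. A minimal Python sketch; the resolver base URL in any call is a placeholder for an institution's own resolver, and the function name is hypothetical.

```python
from __future__ import annotations

from urllib.parse import urlencode


def book_openurl(resolver_base: str, isbn: str, title: str | None = None) -> str:
    """Build an OpenURL 1.0 KEV link describing a book by its ISBN."""
    params = [
        ("url_ver", "Z39.88-2004"),
        ("ctx_ver", "Z39.88-2004"),
        ("rft_val_fmt", "info:ofi/fmt:kev:mtx:book"),  # book metadata format
        ("rft.isbn", isbn),
    ]
    if title:
        params.append(("rft.btitle", title))
    # urlencode percent-encodes reserved characters such as ':' and '/'.
    return f"{resolver_base}?{urlencode(params)}"
```

Whether the resolver's knowledge base actually knows about e-book holdings for that ISBN is, of course, the real question.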

[1] It's a list so that means RSS or Atom, right?

Going LOCAH: a Linked Data project for JISC

Thu, 08/07/2010 - 12:24

Recently I worked with Adrian Stevenson of UKOLN and Jane Stevenson and Joy Palmer of MIMAS, University of Manchester on a bid for a project under the JISC 02/10 call, Deposit of research outputs and Exposing digital content for education and research, and I'm very pleased to be able to say that the proposal has been accepted and the project has been funded.

The project is called "Linked Open Copac Archives Hub" (LOCAH). It aims to address the "expose" section of the call, and focuses on making data from the Copac and Archives Hub services hosted by MIMAS (i.e. library catalogue data and data from archival finding aids) available in the form of Linked Data; developing some prototype applications illustrating the use of that data; and analysing some of the issues arising from that work. The main partners in the work are UKOLN and MIMAS, with contributions from Eduserv, OCLC and Talis. The Eduserv contribution will take the form of some input from me, probably mostly in the area of working with Jane on modelling some of the archival finding aid data, currently held in the form of EAD-encoded XML documents, so that it can be represented in RDF - though I imagine I'll be sticking my oar in on various other aspects along the way.
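To give a flavour of the EAD-to-RDF task, here is a deliberately naive Python sketch that pulls a few descriptive elements from an EAD document's top-level <did> and emits them as N-Triples using Dublin Core terms. The mapping, the base URI and the assumption of un-namespaced EAD are all mine, for illustration only; this is not the model the project will adopt.

```python
import xml.etree.ElementTree as ET
from urllib.parse import quote

DCTERMS = "http://purl.org/dc/terms/"

# HYPOTHETICAL mapping from EAD <did> children to Dublin Core terms.
EAD_TO_DCTERMS = {"unittitle": "title", "unitid": "identifier", "unitdate": "date"}


def nt_literal(text):
    """Escape a string for use as an N-Triples literal."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")
    return f'"{escaped}"'


def ead_to_ntriples(ead_xml, base_uri):
    """Emit N-Triples for a few top-level descriptive elements of an EAD document.

    Assumes un-namespaced EAD: <ead><archdesc><did>...</did></archdesc></ead>.
    """
    root = ET.fromstring(ead_xml)
    did = root.find("./archdesc/did")
    if did is None:
        raise ValueError("no archdesc/did element found")
    unitid = (did.findtext("unitid") or "").strip()
    if not unitid:
        raise ValueError("cannot mint a URI without a unitid")
    subject = f"<{base_uri}{quote(unitid, safe='')}>"
    triples = []
    for element, term in EAD_TO_DCTERMS.items():
        # findtext only returns text before any nested child (e.g. <emph>).
        value = (did.findtext(element) or "").strip()
        if value:
            triples.append(f"{subject} <{DCTERMS}{term}> {nt_literal(value)} .")
    return triples
```

The interesting modelling questions (the hierarchy of components, agents, places, dates as more than strings) are exactly the parts this sketch ignores.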

UKOLN is managing the project and hosting a project weblog. I'm not sure at the moment how I'll divide up thoughts between here and there; I'll probably end up with a bit of duplication along the way.


On federated access management, usability and discovery

Wed, 07/07/2010 - 13:10

A little over a week ago I attended a meeting in London organised by the JISC Collections team entitled From discovery to log-in and use: a workshop for publishers, content owners and service providers.

The meeting was targeted at academic publishers (and other service providers), of whom there were between 30 and 40 in the room. It started with presentations about two reports, the first by William Wong et al (Middlesex University), User Behaviour in Resource Discovery: Final Report, the second by Rhys Smith (Cardiff University), JISC Service Provider Interface Study. Both reports are worth reading, though, as I noted somewhat cheekily on Twitter prior to the meeting, if the JISC had paid more for the first one it might have been shorter!

Anyway... the eagle-eyed amongst you will have noticed that the two reports are somewhat different in scope and scale. Both talk about 'discovery' but the first uses that word in a very broad 'resource discovery' sense whilst the second uses it in the context of the 'discovery problem' as it applies to federated access management - i.e. the problem of how a 'service provider' knows which institutional login page to send the user to when they want to access their site. This difference in focus left me thinking that the day overall was a little out of balance.

For this blog post I don't intend to say anything more about 'resource discovery' in its wider sense, other than to note that Lorcan Dempsey has been writing some interesting stuff about this topic recently, that there are issues about SEO and how publishers of paid-for academic content can best interact with services like Google that could usefully be discussed somewhere (though they weren't discussed at this particular meeting), and that, in my humble opinion, any approach to resource discovery that assumes that institutions can dictate or control which service(s) the end-user is going to use to discover stuff is pretty much doomed from the start. On that basis, I'm not a big believer in library (or any other kind of) portals, nor in any architectural approach that assumes that a particular portal is what the user wants to use!

The two initial presentations were followed by a talk about the 'business case' for an 'EduID' brand - essentially a logo and/or button signifying to the user that they are about to undertake an 'academic federated login' (as opposed to an OpenID login, a Facebook Connect login, a Google login, or whatever else). Such a brand was one of the recommendations coming out of the Cardiff study. I fundamentally disagree with this approach (though I struggled to put my case across on the day). I'm not convinced that we have a 'branding' problem here and I'm worried that the way this work was presented makes it look as though the decision that we need a new 'brand' has already been taken.

During the ensuing discussion about the 'discovery problem' I mentioned the work of the Kantara Initiative and, in particular, the ULX group which is developing a series of recommendations about how the federated access management user experience should be presented to users. I think this group is coming up with a very sensible set of pragmatic recommendations and I think we need to collectively sit up and take some notice and/or get involved. Unfortunately, when I mentioned the initiative at the meeting, it appeared that the bulk of the publishers in the room were not aware of it.

To try and marshal my thoughts a little bit around the Kantara work I decided to try and implement a working demo based on their recommendations. I took as my starting point a fictitious academic service called EduStuff with a requirement to offer three login routes:

  • for UK university students and staff via the UK Federation,
  • for NHS staff via Athens, and
  • for other users via a local EduStuff login.

I'm assuming that this is a reasonably typical scenario for many academic publishers (with the exception of the UK-only targeting on the academic side of things, something I'll come back to later).

Note that this scenario is narrower than the scope of the Kantara ULX work, which includes things like Facebook Connect, Google, OpenID and so on, so I've had to interpret their recommendations somewhat, rather than implement them in their totality.

You can see the results on the demo site. Note that the site itself does nothing other than to provide a backdrop for demonstrating how the 'sign in' process might look - none of the other links work for example.

The process starts by clicking on the 'Sign in' link at the top right (as per the Kantara recommendations). This generates a pop-up 'sign in' box offering the three options. Institutional accounts are selected using a dynamic jQuery search interface which, once an institution has been selected, takes the user to their institutional login page. (My thanks to Mike Edwards at Eduserv for the original code for this). The NHS Athens option takes the user to an Athens login page. The EduStuff option goes to a fairly typical local login/register page, but one which also carries a warning about using one of the other two account types if that is more appropriate.
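For the curious, the lookup behind an institutional search box can be sketched in a few lines. The demo itself does this with jQuery in the browser; this Python sketch (function names hypothetical, not the demo's code) shows the equivalent step of reading identity providers from SAML 2.0 metadata, such as a federation's aggregate, and filtering them by a typed fragment.

```python
from __future__ import annotations

import xml.etree.ElementTree as ET

NS = {"md": "urn:oasis:names:tc:SAML:2.0:metadata"}


def load_idps(metadata_xml: str) -> dict[str, str]:
    """Map IdP entityID -> organisation display name from SAML 2.0 metadata."""
    root = ET.fromstring(metadata_xml)
    idps: dict[str, str] = {}
    for entity in root.iter(f"{{{NS['md']}}}EntityDescriptor"):
        if entity.find("md:IDPSSODescriptor", NS) is None:
            continue  # skip service providers; only identity providers are offered
        eid = entity.get("entityID")
        name = entity.findtext("md:Organization/md:OrganizationDisplayName",
                               default="", namespaces=NS).strip()
        idps[eid] = name or eid  # fall back to the entityID if no name is given
    return idps


def search(idps: dict[str, str], query: str, limit: int = 10) -> list[tuple[str, str]]:
    """Case-insensitive substring match on display names, as a type-ahead would do."""
    q = query.casefold().strip()
    if not q:
        return []
    hits = [(name, eid) for eid, name in idps.items() if q in name.casefold()]
    return sorted(hits)[:limit]
```

Selecting a result would then send the user to that identity provider's login page; that redirect isn't shown here.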

Whichever account type is chosen, the selection is remembered in a cookie so that future visits to the pop-up 'sign in' box can offer that as the default (again, as per Kantara).
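Remembering the choice needs nothing more than an ordinary cookie. A Python sketch using the standard library's http.cookies; the cookie name and the three choice values are hypothetical stand-ins for whatever the demo actually uses.

```python
from __future__ import annotations

from http.cookies import SimpleCookie

COOKIE_NAME = "edustuff_signin"               # hypothetical cookie name
CHOICES = {"ukfed", "nhs-athens", "local"}    # hypothetical account-type values


def remember_choice_header(choice: str, max_age_days: int = 365) -> str:
    """Build a Set-Cookie header recording which sign-in route was used."""
    if choice not in CHOICES:
        raise ValueError(f"unknown sign-in choice: {choice!r}")
    cookie = SimpleCookie()
    cookie[COOKIE_NAME] = choice
    cookie[COOKIE_NAME]["max-age"] = max_age_days * 24 * 3600
    cookie[COOKIE_NAME]["path"] = "/"  # valid on every page of the site
    return cookie.output(header="Set-Cookie:")


def default_choice(cookie_header: str | None) -> str | None:
    """Read the remembered choice from a Cookie request header, if any."""
    if not cookie_header:
        return None
    cookie = SimpleCookie()
    cookie.load(cookie_header)
    morsel = cookie.get(COOKIE_NAME)
    if morsel is None or morsel.value not in CHOICES:
        return None  # ignore missing or tampered values
    return morsel.value
```

No HttpOnly flag is set, on the assumption that the pop-up's JavaScript needs to read the value. A fuller version would also remember which institution was chosen.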

Have a play and see what you think.

Ok, some thoughts from my perspective...

  • In the more general Kantara scenario, some options (Facebook, Google, OpenID, etc.) are presented using clickable buttons/icons. I haven't done this for my scenario because the text wording felt more helpful to me. If icons were to be used, for example if a publisher wanted to offer a Google-based login, then I would probably present the NHS Athens and EduStuff choices as icons as well.
  • You'll note that the word 'Athens' only appears next to the NHS option. I think that our Athens/OpenAthens branding should become largely invisible to users in the context of the UK Federation - or, to put it another way, one of our current usability problems is that publishers are still presenting Athens as an explicit 'sign in' option when they really do not need to do so. In the context of the UK Federation, OpenAthens is just an implementation choice for SAML - users need be no more aware of it than they are of the fact that Apache is being used as the Web server. (The same can be said of Shibboleth of course). Part of our current problem is that we are highlighting the wrong brands - i.e. Shibboleth and OpenAthens/Athens rather than the institution - something that both the JISC and Eduserv have been guilty of encouraging in the past.
  • The institutional search box part of the demo is currently built on UK Federation metadata, so it only offers access to UK institutions. There is no reason why this interface couldn't deal with metadata from multiple federations. Indeed, I see no reason why it wouldn't scale to every institution in the world (with some sensible naming). So although the current demo is UK-specific, I think the approach adopted here can be expanded quite significantly.
  • On that basis, you'll note that there is no need in this interface for an EduID brand/button. Users need only concern themselves with the name of their institution - other brands become largely superficial, except where things like Google, Facebook, OpenID and so on are concerned.
  • I've presented only the front page for the EduStuff site. On the basis that we can't control how users discover stuff, i.e. we have to assume that users might arrive directly at any page of our site as the result of a Google search, the 'sign in' process has to be available on each and every page of the site.
  • Finally, the demo only deals with the usability of the first part of the process. It doesn't consider the usability of the institutional login screen, nor of what happens when the user arrives back at the publisher site after they have successfully (or otherwise) authenticated with their institution. I think there are probably significant usability issues at this point as well - for example, how to best indicate that the user is signed in - but I haven't addressed this as part of the current demo.
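On the multi-federation point: one plausible way to get 'sensible naming' is to merge each federation's list of identity providers, de-duplicate by SAML entityID, and only add a qualifier where two different institutions share a display name. A Python sketch; the federation labels and inputs are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def merge_federations(feds: dict[str, dict[str, str]]) -> dict[str, str]:
    """Combine per-federation {entityID: name} maps into one {entityID: label} map.

    An IdP registered in several federations (same entityID) appears once.
    Names shared by *different* entityIDs get the federation label appended
    so that users can tell them apart.
    """
    merged: dict[str, tuple[str, str]] = {}
    for fed, idps in feds.items():
        for eid, name in idps.items():
            merged.setdefault(eid, (name, fed))  # first federation listed wins
    counts = Counter(name for name, _ in merged.values())
    return {eid: (f"{name} ({fed})" if counts[name] > 1 else name)
            for eid, (name, fed) in merged.items()}
```

The output has the same entityID-to-name shape as a single federation's list, so the user-facing search needn't change at all as federations are added, which is rather the point.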

I'd be very interested in people's views on this work. It's at a very early stage - I haven't even presented it properly to other Eduserv staff yet - but we have some agreement (internally) that work in this area will likely be of value both to ourselves and our current customers and to the wider community. On that basis, I'm hopeful that we will do more work with this demo:

  • to make it more fully functional, i.e. to complete the round-trip back to the EduStuff site after successful authentication,
  • to make the 'sign in' pop-up into a re-usable 'widget' of some kind,
  • and to experiment with the usability of much larger lists of institutions, taken from multiple federations.

Whatever our conclusions, any results will be shared publicly.

Overall the day was very interesting. I'll leave you with my personal highlight... the point at which one of the (non-publisher) participants said (somewhat naively), "What would it take to make all this [publisher] content available for free? Then we wouldn't need to worry about authentication". Oh boy... there was a collective sharp intake of breath and you could almost hear the tumbleweed blowing for a minute there! :-)

Addendum (8 July 2010): in light of comments below I have re-worked my demo using a more icon-based approach. This is much more in line with the current Kantara ULX mockups (version 4) including the addition of a 'more options'/'less options' toggle on second and subsequent sign ins. Overall, it is, I think, rather better than my initial text-based approach. I stand by my assertion that an EduID button is not required in the 'sign in' process demonstrated here (irrespective of whether the icon-based or text-based approach is used). That said, I'd welcome views on how/where such a button would fit in.


Now don't tell me I've nothin' to do

Fri, 02/07/2010 - 09:37

Clay Shirky gave a polished performance at the Watershed in Bristol the other night for his talk, Our Cognitive Surplus: Creativity and Generosity in a Connected Age, given as part of the Bristol Festival of Ideas. One would expect nothing less of course.

The basic premise of the talk was that a combination of free time, talent, goodwill (our 'cognitive surplus') and the social Web are now allowing things to happen in ways that were previously not possible. The talk was peppered with anecdotal evidence for the kinds of changes being wrought by new technology and social media, from struggles for women's rights in India thru to changes of government policy on the environment (specifically car-sharing) in Canada and, yes, even to our use of Lolcats.

The individual examples were all new to me, though I've seen the general theme being covered several times before, using different examples of much the same thing. For me, there was a certain sense of, "Well, yes... but so what?" - perhaps I missed something? - though, oddly, that didn't detract from a very enjoyable evening.

Listening to the talk though did cause me to question my own use of social networks, something that I actually find quite hard to justify in any rational sense.

Here's an example...

For the last 574 days I have taken a photograph every day and put it on Blipfoto.com along with a few words of text. Blipfoto is a photo-blogging site - a social network, at least at the level of the number of "Wow... nice image" type comments that get exchanged, though it probably comes closer to the Lolcats end of the spectrum than the 'changing the planet' end. I probably spend somewhere between 30 minutes and an hour and a half on each photo - by the time I've taken the photo, edited it, uploaded it, written some text and so on. That probably represents something like 400 hours of my life over the last couple of years. Boggle!

To which one might sensibly ask, "Why?". And I don't think I'd be able to give you a coherent answer to such a question.

It's the closest thing I have to an artistic outlet I guess - which is certainly not a bad thing. My photography is getting better... maybe? There's a slight competitive element to it, both in the sense of forcing oneself to do something every day and in the sense of getting good comments and ratings. And there's the "Woo hoo... this is me... I'm over here" type of thing going on as well I suppose (something that is present in all social networks). But beyond that I'm not sure I can offer any rationalisation that will convince either you or me about why I am doing it. I'm certainly not making the world a better place with my time, whereas I could be. I could use that time to be a governor of a school again. Or use it to edit Wikipedia. Or to spend additional time working on my local school's website. Or to campaign on environmental issues. Or any number of other things. I could even do some private consultancy and make some money!

But I don't do any of those things... instead, I spend my time faffing around with a camera and a website in the vain hope of getting one or two positive comments from people that I've never met and who I will probably never meet.

Or as the Statler Brothers put it:

Countin' flowers on the wall
That don't bother me at all
Playin' solitaire till dawn with a deck of fifty-one
Smokin' cigarettes and watchin' Captain Kangaroo
Now don't tell me I've nothin' to do

[Photo created using Autostitch on an iPhone 3G]

Where next for resource licensing?

Thu, 17/06/2010 - 15:25

Five hours of presentations and discussion about scholarly resource licensing probably doesn't strike most people as a 'good day out' but, actually, yesterday's joint JIBS/Eduserv Where next for Resource Licensing? event was a surprisingly enjoyable and interesting experience.

My live-blogged notes of all the talks are available on eFoundations LiveWire. On that basis, I won't go into the details of any of the talks here. Rather, I'll focus on my overall impressions and thoughts (all of which is very much a personal view)...

Firstly, the academic landscape is changing, both in terms of student expectations and in terms of the nature of university 'business' practice (e.g. greater intra-UK and international collaboration around course delivery). A number of the talks provided evidence for this. Now, of course, we already knew that the landscape was changing... but it doesn't do any harm to keep reminding ourselves of how (and how much) and it was particularly pleasing (for me) to see Owen Stephens (who gave the opening keynote) quoting a couple of the speakers (Paul Golding and Chris Sexton) at our recent symposium by way of evidence.

Secondly, there is something of a tension between wanting to grow the complexity of our resource licences (to take account of newly emerging business practices and user groups for example) and the desire to consolidate, and indeed grow, our existing use of a small number of 'model' licences. (Clearly, this is an area in which the Eduserv Licence Negotiation team has had a big impact over the last 10 to 15 years). In theory, the emerging technical possibility for machine-readable licences (Mark Bide of EDItEUR gave an interesting talk about ONIX-PL for example) means that we can leave software to deal with making access decisions based on a growing collection of different licences. Yet there seemed to be little appetite for this in the room. (Indeed, I'm not even sure such a scenario is really possible or effective for a variety of reasons). As a counterpoint, my colleague Martyn Jansen put forward some suggestions in the final talk of the day to simplify the existing standard Chest Agreement, both in terms of having a smaller number of classes of users and in terms of simplifying the types of use allowed. For my part, this feels like a sensible way forward.
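To illustrate the tension: once licence terms are machine-readable, the access decision itself is trivial to code; the complexity lives in how many user classes and permitted uses the licences enumerate. A Python sketch using a drastically simplified, hypothetical licence structure (nothing like the real ONIX-PL schema):

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Licence:
    """A drastically simplified, HYPOTHETICAL licence; NOT the ONIX-PL schema."""
    name: str
    user_classes: frozenset[str]    # e.g. "student", "staff", "walk-in"
    permitted_uses: frozenset[str]  # e.g. "print", "course-pack"


def allowed(licences: list[Licence], user_class: str, use: str) -> list[str]:
    """Names of licences that permit this kind of user to make this kind of use."""
    return [lic.name for lic in licences
            if user_class in lic.user_classes and use in lic.permitted_uses]
```

Every extra user class or type of use multiplies the combinations that have to be captured, negotiated and kept in step with what institutions actually assert about their users, which is perhaps one argument for the kind of simplification Martyn suggested.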

Thirdly, the idea of allowing 'walk-in users' in the digital age was called into question. Owen Stephens referred to the whole notion as "stupid" in his opening talk, suggesting that we need to completely revisit what we are trying to achieve by it and, more importantly, talk to publishers about what we want to do. Sticking my neck out a little, my personal take on this is that in the age of the Web and widely implemented federated access management it is somewhat unreasonable of academic institutions to expect publishers to provide any access to digital resources by walk-in users. But perhaps I'm just being naive about the issues here?

Fourthly, there was some discussion around overseas students. Louise Cole of Kingston University noted, with some irony, that in some cases walk-in users with no affiliation to the institution can get a better deal in terms of access to resources than registered students of that institution who happen to be based overseas. Again, I'll stick my neck out with a personal view (quite possibly a view not shared by my colleagues here!). Geography has become irrelevant and should play no part in our licensing deals. A university with 6000 undergrads should be dealt with as a university of 6000 undergrads, irrespective of whether 3000 of them happen to be based overseas. If this gives publishers problems in terms of pricing across different geographic markets, get over it. The world is largely flat.

And finally, another personal view about something that didn't really come up during the day (at least until drinks in the pub afterwards!) but which increasingly struck me as the day progressed. We seem to be hitting something of a disconnect between theory and practice in this area - which is probably something that neither institutions nor publishers really like to acknowledge. On the one hand, we have relatively complex discussions around licensing terms and conditions, coupled potentially with relatively detailed ways of exchanging those licences in a machine-readable form. At the same time we have an over-arching emphasis on security and data protection in the way our access management federation is delivered (in a way that I've not really seen justified in terms of the risk of abuse of the resources being made available thru that federation). Meanwhile, on the other hand (err... back in the real world?), Shibboleth and OpenAthens system administrators are nearly always just setting the simplest kind of "This person is a member of the institution" attribute and passing it to the service provider, with the user gaining access to the resource as a result.
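The gap between theory and practice described above can be made concrete with a minimal sketch of a service provider's access decision. It assumes the attributes arrive as eduPersonScopedAffiliation values of the form `affiliation@scope` (the affiliation names `member`, `student` and `staff` come from the eduPerson vocabulary). The scope `example.ac.uk`, the function names and the student-only licence are hypothetical illustrations, not how any particular federation, publisher or Eduserv service actually works.

```python
from __future__ import annotations

# Hypothetical sketch: a service provider gating access on released
# eduPersonScopedAffiliation values ("affiliation@scope").
# The scope and the licence terms below are illustrative assumptions.

SUBSCRIBING_SCOPE = "example.ac.uk"  # hypothetical subscribing institution


def simple_decision(scoped_affiliations: set[str]) -> bool:
    """The common practice: any 'member' of the institution gets access."""
    return f"member@{SUBSCRIBING_SCOPE}" in scoped_affiliations


def licence_aware_decision(scoped_affiliations: set[str],
                           permitted_classes: set[str]) -> bool:
    """A finer-grained check: only user classes the licence permits get access."""
    for value in scoped_affiliations:
        affiliation, _, scope = value.partition("@")
        if scope == SUBSCRIBING_SCOPE and affiliation in permitted_classes:
            return True
    return False


# Hypothetical licence that covers students only.
LICENCE_CLASSES = {"student"}

student = {"member@example.ac.uk", "student@example.ac.uk"}
staff = {"member@example.ac.uk", "staff@example.ac.uk"}

print(simple_decision(student))                          # True
print(simple_decision(staff))                            # True
print(licence_aware_decision(student, LICENCE_CLASSES))  # True
print(licence_aware_decision(staff, LICENCE_CLASSES))    # False
```

The two functions only disagree when the licence distinguishes between classes of user, and even then the finer-grained check is only as good as the attributes the institution actually chooses to release.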

Are we routinely comparing our technology choices against a measure of the risk we are dealing with? Are we joining up our discussions about new kinds of users and usages in our licences with the same constructs in our SAML attribute sets? And finally, are we taking note of whether people on the ground are actually acting in line with our somewhat theoretical technology-centric positions?

Or is the reality that the people doing the day job are getting by with a 'just good enough' approach and that, actually, publishers are perfectly happy with that provided the university pays the subscription fee?

Categories: LIS, foreign

Brief e-Book usage survey

Tue, 15/06/2010 - 15:29

I've been tangentially involved in some discussions here at Eduserv about the future direction that our Licence Negotiation team should go in with respect to e-books. As I hinted previously, getting a good picture of where things stand and where the use of e-books is likely to go in the near future doesn't strike me as being particularly easy at the moment.

So, whilst we have already negotiated a number of Chest Agreements for e-books with Emerald, Springer, IEEE, and other suppliers, we are now seeking feedback on e-books to help inform future negotiations in line with our charitable mission of working on behalf of universities and colleges to bring savings to the community.

As a first step, we have put together a very brief survey covering current and planned future use, budgetary issues, and licensing models. The survey is only 5 screens and 16 questions long, all but 2 of which are optional. We are quite happy for you to only answer the questions that you know the answers to, skipping the rest (though preferably passing the survey URL on internally to other people). It is targeted at our Licence Negotiation contacts in the UK and Ireland (mainly in libraries), although it would be fantastic if we could also get some responses from faculty teaching staff. One of the things we are interested in is how the decision-making process works for e-book purchases. (Note that at this time we are not interested in responses from outside the UK and Ireland.)

Anyway... we will publish an anonymised summary of the results of the survey in due course. If you have an interest in the way e-books are being used in UK higher or further education institutions please encourage someone at your site to complete the survey.

Thanks.

Categories: LIS, foreign

I/AM moves up from 6 to 5, alright pop pickers? Not 'arf

Fri, 11/06/2010 - 10:21

[Title for Alan Freeman fans.]

Via a tweet from @chrisb (of the JISC) I note that Educause have published the results of their survey of the Top-Ten IT Issues, 2010 (for US HE institutions).

I/AM (identity and access management) has moved up from number 6 to number 5, about which the report says:

Critical questions for Identity/Access Management include the following:

  1. What is the institution's documented process for verifying the identity of individuals and linking physical and electronic identities?
  2. What standards, trust systems, or existing federations (e.g., InCommon) can be used to ensure that an institution can trust another institution's electronic identities?
  3. Are I/AM policies and processes adaptable and flexible to allow for changes in roles and access rights over time?
  4. How should institutions strike the balance between carefully managing identity and access and utilizing broadly distributed networked resources?
  5. Do current I/AM strategies account for federation and single sign-on with third-party hosted and cloud-based applications?
  6. How can institutions create stronger linkages between physical and electronic identities?

(Note: the bullet points were not numbered in the original.)

I think the JISC's work on the UK Access Management Federation has done much to help with these kinds of issues in the UK, so I wonder if the critical questions in the UK might be somewhat different?  For example, number 2 would probably focus more heavily on issues around inter-federation trust (i.e. trust between institutions in the UK and those elsewhere).

Numbers 3 and 4 are interesting and I expect that these kinds of issues will be touched on during next week's Where next for resource licensing? event, organised jointly by JIBS and Eduserv and from which I hope to live-blog on eFoundations LiveWire.  The explicit cross-over between resource licensing and access management seems to feature fairly low in our discussion priorities (at least as far as I'm aware) though it is clearly a topic of interest to Eduserv, since we offer services in both spaces (Licence Negotiation and Access and Identity Management).

I suspect that number 5 is of interest to us all and, for information, we have a bit of work bubbling under here at the moment to link together OpenAthens with Google Apps, though I'm not sure if there's anything more public that I can share with you yet.

Number 6 looks interesting, though I'm slightly bemused by what it actually means.

Categories: LIS, foreign

Is the e-book glass half full or half empty in UK academia?

Thu, 10/06/2010 - 10:48

There was an article about e-book uptake in the (US) university sector in the THE (Times Higher Education) the other day, re-printed from Inside Higher Ed: The E-Book Sector.

The piece suggests that uptake might be lower than the general hype around e-books indicates, except in the world of for-profit online education (I'm not sure how that applies in the UK?):

Among the respondents to a 2009 Campus Computing Project survey of 182 online programmes at non-profit universities, 9 per cent said e-textbooks were “widely used” at their institutions, while nearly half said electronic versions were “rarely used”. Even fewer brick-and-mortar institutions are deploying e-books in lieu of hard copies, with fewer than 5 per cent citing e-book deployment as a key IT priority in the short term, according to another Campus Computing Project Survey. And according to data from market research firm Student Monitor, e-textbooks accounted for only 2 per cent of all textbook sales last autumn.

In the UK, the final report from the JISC-funded National e-Books Observatory Project apparently paints a rather different picture:

E-books are now part of the academic mainstream: nearly 65% of teaching staff and students have used an e-book to support their work or study or for leisure purposes.

My initial reaction was that these two statements seem at odds with each other but on reflection I think not - "nearly half said electronic versions were 'rarely used'" isn't that different from "nearly 65% of teaching staff and students have used an e-book", it's just got a different emphasis.

As with our own snapshots of 3-D virtual world usage in UK education, carried out on our behalf by John Kirriemuir (a project whose funding from us has coincidentally just come to an end, though John plans to continue the work in other ways), stats are easy to play with. Whilst it may be technically correct to say "all UK universities are active in virtual worlds", doing so isn't particularly helpful since uptake may be extremely patchy within each institution.

Nonetheless, the 65% figure quoted by the JISC-funded study seems very high to me (based on my very limited experience of the uptake of these things). Are e-books really gaining ground in UK academia that fast?

(I note that the JISC study doesn't actually define what it means by e-book, other than to say "it refers to generic e-books available via the library, retail channels or on the web". I'm assuming that the study uses that term in line with the Wikipedia definition:

An e-book (short for electronic book and also known as a digital book, ebook, and eBook) is an e-text that forms the digital media equivalent of a conventional printed book, sometimes restricted with a digital rights management system.

but I'm not sure.)

Categories: LIS, foreign

The implications of mobile... or "carry on up the smart phone"

Tue, 25/05/2010 - 16:39

This is the second of my two posts on the Eduserv Symposium 2010: The Mobile University, this one focusing on the day's content.

I'll start by revisiting the sound-bites that I used in my brief summing up at the end of the day. I'm not totally sure how useful these are, but I wrote them down as things I would have tweeted had I been on Twitter during the day (which I wasn't, for the reasons outlined in my last post). I'm not going to analyse the talks in detail: all the material (slides and video) is now available, so you can watch or listen to it all in any case, and various other people have written their own summaries of the day - Marieke Guy, Mike Nolan, Christine Sexton, Paul Sweeney, Chris Thomson and Mike J for example. [Who have I missed?]

Paul Golding kicked the day off with a great overview of the mobile space.  He provided all sorts of facts and figures but added, "It [mobile] is not just about the tech, it's about how it changes behaviour" and I think this theme re-emerged at several points during the day. The key point, for me, is that mobile is different this time round and it is different because mobile technology now allows us to do things we couldn't do before, to work, communicate, socialise, play and relax in different ways, and that is being recognised not just by the geeks but by all sorts of ordinary people.  So what is different? Paul's key drivers for smartphone adoption are worth re-iterating:

  • "faster access,
  • rich user-interfaces,
  • sensor proliferation,
  • cloud computing,
  • social computing,
  • real-time web"

and it strikes me that one of the things that is really interesting here is the coming together of handheld devices with the social web.

Christine Sexton re-iterated the cultural-change aspects of mobile anecdotally, noting that students now turn up at university not with questions like "how do I get a username and password?" or "where are the computers?" but with "where's the Internet?". She went on to outline the implications for university support models - control, choice, innovation and hands off - before turning to the business-model drivers at play in this space... something that I'll return to in a moment. She finished with a call for universities to build 'mobile' into existing strategies and policies around delivery, infrastructure and support and to "carry on innovating".

Andy Ramsden drew an analogy between change in universities and 'soil creep' (a slow underlying process where you can't tell if much has really changed, where the changes themselves are quite variable across the landscape and where it's not clear what the underlying processes are). He was talking specifically about moves towards the greater use of mobile in teaching and learning within the HE sector, though I suspect that the analogy works just as well more generally!  Andy also characterised two kinds of mobile adoption - the first being "more of the same but on your phone" (continuing the trend from desktop to laptop) and the second being the "new learning landscapes" that the use of mobile enables.

Simon Marsden ended his lightning talk by suggesting that we (as providers within universities) need to "lighten up" - again a strong reference to the cultural changes that are happening around us but also, I suspect, echoing Christine's anti-"we don't support that" approach and hinting instead at a 'just do it' kind of mentality.

Tom Hume's talk was very pragmatic, coming from years of experience of building mobile apps on various platforms. It struck me that much of what he talked about concerned quite generic 'agile' approaches to software development ("release early, release often") rather than being specifically about mobile, but it was very interesting nonetheless. For example, I really liked his case-study where a mobile app was built around the hypothetical needs of a bunch of named but imaginary "real" people. He noted that one of the key things Apple had done with the introduction of the iPhone was not the handset itself but the fact that they managed to force mobile operators to move to flat-rate data tariffs (he used the phrases "more fragmentation, simpler tariffs" and "commoditisation of access", both of which I quite like), a fundamental part of the cultural shift we have seen happen since.

Finally, John Traxler rounded off the day with a wide-ranging keynote about the use of mobile in education. He noted that "mobile doesn't necessarily mean free[dom] - there are a new set of affordances but also a new set of tetherings" - something that we would all do well to remember the next time we are tempted/forced to make work-related use of our smart phones outside of working hours... which reminds me of Dick's regular calls back to the office in Woody Allen's Play it Again Sam (1972):

Making telephone calls from landlines wherever you are, such as "This is Mister Christie. I'm at the Hong Fat Noodle Company. That's er, 824-7996.", is probably lost on the mobile phone generation. In fact, there were barely any people like this in the early seventies, but this running gag is a classic addition to this great movie anyway.

John hinted at three problem areas:

  • "Lack of scale,
  • lack of sustainability, and
  • lack of evidence of effectiveness."

Which brings me back to those business-model issues...

Towards the end of her talk, Christine considered the financial situation. She said (words to the effect of), "the question is not, 'can we afford to support mobile?' but, 'can we afford not to?'" - not an unusual sentiment where new technologies are concerned, particularly where uptake outside education has been widespread. But it is an interesting statement and I can think of two reasons for making it: either that there will be financial penalties for not adopting/supporting it, or that universities will be failing in their mission to deliver learning and research effectively unless they do (or both). Note that this is my interpretation; Christine may have meant something completely different. However, given that the assertion was made in the context of money, I assume that the former was intended.

Which makes me wonder...

In financial terms, how significant are the drivers for universities to adopt 'mobile', or any other form of ICT for that matter? The implication is that prospective students and/or prospective staff and researchers will not bring their funding to a university that is perceived to be lagging behind others in ICT terms. Speaking as a parent of one actual (and two potential) university student(s), I'm not convinced we are at that point yet. Provision of (and use of) ICT is a factor in the overall perception of what makes one university a better choice than another, but it is only one such factor and (I suggest) still a relatively small one. Coupled with the lack of evidence for the effectiveness and sustainability of mobile in both teaching and learning and research, I'm not sure how much of a watertight business case could be made for significant investment in 'mobile' currently.

Now, of course, a similar lack of business case would have existed around the adoption of the Web at the end of the last century (I love being able to say that!) and there would have fairly rapidly come a point (though I don't recall exactly when it was) where any university that didn't have a website would have looked very out of place, probably to the point of having a negative impact on staff and student recruitment. Are we at that point yet with mobile? No, I don't think so. How quickly will we reach that point? I don't know, though I guess it will be reasonably soon. But I also think we need to understand the issues about the effectiveness and sustainability of 'mobile' and the perception and decision-making factors within our target audiences rather better than we do currently in order to be able to make more balanced decisions in this area.

At the start of the day I suggested that the symposium had two objectives from an Eduserv perspective... Firstly, to help us understand the impact that 'mobile' might have on both our current services (single sign-on, licence negotiation, web development and hosting, and the data centre) and our potential future services. Secondly, to help the HE community in thinking about how it responds to an increasingly mobile world.

I find it hard to comment on whether we succeeded in the second of these two aims, other than to note that all the talks seemed to me to be both relevant and helpful in that context. In terms of our own services, it seems clear to me that we have to take 'mobile' on board in everything we do, whether that's in the way our access management services work on smart phones, the relevance of our licence negotiation services to the mobile space, the kinds of web solutions we build for government and other clients or the kind of data centre services we offer.

Or, as Christine put it, we have to build 'mobile' into everything we do and carry on innovating.

Categories: LIS, foreign

Audiences and chairing events in a 'social media' world

Thu, 20/05/2010 - 10:11

This is the first of two blog posts about the recent Eduserv Symposium 2010: The Mobile University, which took place last Thursday at the Royal College of Physicians in London.

My next post will take a look at the content of the day, including my take on what it all meant. For this post I want to think more about mechanics - not of the "did the streaming and wifi work?" kind (actually, we did have some problems with the streaming early on in the day but Switch New Media, our streaming partner, and the venue's networking staff acted swiftly to resolve them by and large, for which I am very grateful) but thinking about my role as chair of the event.

Before doing so, let's think a little bit about the nature of conferences, and conference audiences, in the new 'social media' world (I'm using social media here as a shorthand for the use of those technologies that allow people to collaborate online in a real-time, relatively open, and social way with their peers, colleagues and friends - I'm including both the live-streaming of the event and tools like Twitter).

Let's start by partitioning delegates at conferences into three broad groups:

  • Firstly, there is the local physical audience - the people who are in the venue, watching and listening live to all the talks, asking questions, collaring speakers after their talks, and drinking the coffee at the breaks but who are, critically, not taking part in any digital activity during the event. This is what you might call the 'traditional' audience I guess.
  • Secondly, there is the local virtual audience - those people who, like the first group, are physically in the venue but who are also using their mobile devices and social networking services (such as Twitter) to discuss what is going on in the room. This discussion is typically referred to as the 'conference back-channel', though it is worth noting that it might start well before the event ("I'm on the train") and continue well after it ("presentation slides are now available"). In my experience, this group is usually smaller than the first group (often much smaller) and is often misunderstood or unrecognised by the people in the first group. It is perhaps also worth noting that this group tends to create a disproportionately large amount of the wider online buzz around an event.
  • Finally, there is the remote virtual audience - the people watching the live video stream from their office or home and who are typically also an active part of the event's back-channel.


This is not a perfect partitioning of the audience, and the names aren't quite right, but bear with me for a moment...

Increasingly, I think that event organisers need to strive to bring these three groups together, i.e. to maximise the interaction that takes place in the middle of the diagram above. That responsibility can be shared of course. For example, at the symposium this year, my colleague Mike Ellis had primary responsibility for encouraging the two virtual groups to gel effectively. However, I also think that the chair of the event increasingly has to be fully engaged with all three groups in order to properly do his or her job... and that, in my experience at least, is not an easy thing to do well. In short, it's not enough just to 'chair' what is going on in the room.

It is interesting that we use the term 'back-channel' for the virtual groups above (the right-hand side of the diagram), which implies there is also a 'front-channel' (the left-hand side). The labels 'front' and 'back' seem to me to be somewhat pejorative of what I'm labelling 'virtual' and I tend to think that, for all sorts of reasons, we need to get over this. I also think there are some barriers that currently get in the way of maximising the interaction between the three groups and it is perhaps worth outlining these briefly.

For those people physically in the room there are some very practical issues around the growth of 'virtual' activity - ownership of appropriate mobile devices, availability of power outlets (still a regular issue at events), good 3G coverage, and confidence that the wifi will be good enough spring immediately to mind. There are also problems of 'attitude' to the virtual activity. How many events still ask people to turn off their mobile devices at the start of the day? At this year's symposium we offered a quiet area for those delegates who did not want to sit next to someone who was using their laptop and, as reported previously, this was reasonably popular. My suspicion is that those people who don't use mobile devices and social networks at events see them only as a distraction, as being somewhat trivial ("oh, they're just reading email"), or perhaps even as being rude to the speakers on the day. Clearly, these views would not be shared by those people who see great value in a vibrant back-channel. There is a cultural shift going on here... and such shifts take time and happen at different rates across different parts of the population and I think we are still in the relatively early stages of this particular one.

For those people in the back-channel (both local and remote) I think there is generally a good 'coming together' of the two groups and Mike's work on the day helped this to happen at this event. Clearly though, those people who are actually in the room are able to engage directly with the speakers (they can put up their hand or interrupt or whatever) in a way that remote delegates cannot. Remote delegates can usually only engage with speakers via an intermediary. Admittedly, there are some speakers who do appear to be able to stay on Twitter even as they speak, but these are still few and far between and so, for the most part, the lack of direct engagement by remote participants remains. For our symposia, we channel questions from remote delegates thru a designated person in the room (Mike Ellis in this case), but for this to work properly the chair has to give that person special attention and I think that, by and large, I failed to do so on the day this time round. Even where such attention is given, it still feels like something of a second-class experience for those delegates that choose to make use of it.

There is also the cognitive barrier of doing two things at once (perhaps it's just me?) - i.e. listening to the speaker and engaging in the back-channel. This is partly device-dependent, I think. I can live-blog an event without difficulty using my laptop - indeed I strongly suspect that doing so actually improves the way I listen to the speaker - but I can't do the same on my iPhone (largely because the soft keyboard is too fiddly for me to use without thinking).

Finally then, there's the intersection between the local physical audience (who are not using the back-channel) and the remote virtual audience (who are). It seems to me that these two groups are least engaged in any real sense. For those people who are remote, there is some sense of shared presence with those in the room by virtue of the shots of the physical audience being shown as part of the live stream. (Incidentally, this is the main reason why I actually quite like having such shots included in the stream, though this is not a view shared by some of my colleagues here, nor by part of the audience.) On the other hand, for those people in the room, it is probably quite hard to remember that there even is a remote audience (let alone the fact that such an audience might actually be bigger than the one in the room - this year, 691 visitors from 7 countries, in 93 cities, in 153 organisations watched the live stream).


The result is something of a disconnect between the two groups.

Interestingly, I think this might currently leave the local virtual group in the role of bridging the two other groups. I don't think this is done in an explicit or intentional way but it is interesting to note it nonetheless. Of course, it is also part of the event organiser's and chair's roles to bring these two groups together in some way.

Thinking back to our 3D virtual world symposium a few years ago, we overcame the 'local audience not being aware of the remote audience' problem to a certain extent by actually showing the virtual audience to the real audience during the day. (As an aside, one of the advantages of hybrid real and virtual world events is the greater sense of presence that is generated for delegates in the virtual world.)

For this year's (non-3D virtual world) symposium, one way of highlighting the remote virtual delegates would have been to show the Twitter stream live during the talks. We took the decision (I think rightly) not to do so because of the distraction this might cause to the in-room audience. We did, however, try to achieve some of the same effect by displaying the event Twitter stream in the lunch/coffee/tea room. My suspicion is that this didn't work - the single screen which we used was probably too small and people were too busy doing other things to notice.

So... a couple of recommendations (essentially in the form of notes to self for next year!):

Event chairs should engage as much as possible with all three groups above (preferably actively - i.e. by tweeting or whatever - but at least passively). At my age, this means having a screen in front of me for most of the day, showing me what is happening in the back-channel. This doesn't have to be projected for everyone else but trying to do it on an iPhone screen is too difficult with anything less than 20:20 eyesight!

Event chairs should speak directly to the remote audience as often as possible and should explicitly acknowledge the back-channel in their communication with speakers and audience. Oddly, I felt that I had done this better in previous years than I did this year. I'm not sure why, though the time that I gave myself to introduce the day at the start of this year's event, coupled with the fact that we had some early teething problems with the streaming, meant that I wasn't properly able to introduce the remote audience and back-channel as I would have liked.

To sum up then, a chair's role in this new 'social media' world is to actively engage with the whole audience, not just with those sitting in the room in front of him or her. This is not easy to do and I suspect it requires a slight change of mindset. In my experience, the chair's role is quite complex at the best of times, and the new environment makes it more so. For this reason, I'm not convinced that it can easily be combined with other tasks (like keeping one eye on other mechanics of the event or preparing a final summing up). Such tasks are better handled by other people.

To a certain extent, the chair's role becomes rather like that of David Dimbleby hosting the BBC's Question Time. The bulk of his time is spent focusing on the local audience and speakers, but the remote audience watching on TV is the real reason why the programme is being made at all, and every so often he will speak explicitly to camera to address that audience.

Note that this post is not intended to be negative in any sense. I think this symposium was our best yet and I'm really pleased with the way it went both in terms of the coherence of the overall theme and individual speakers and in terms of the mechanics of the day itself. I also think that our decision to limit the back-channel to Twitter-only was the right one and actually resulted in less confusion about what should be discussed where - though there is a proviso that 140 characters is probably too short for asking serious questions (so this is something we will have to think about for next year). But one can always do things better and that only starts by acknowledging where there were areas of weakness. When I woke up the morning after the event I was concerned that I could, and possibly should, have done a much better job of embracing the true 'hybrid' nature of the symposium in my role as chair for the day.

And a final thought... I've written this post with a particular focus on the chair's role within an event. The reality is that embracing the hybrid nature of events is incumbent on us all. We are going thru a cultural shift that requires the development of new social norms, not just in the digital space but in the hybrid space where physical meets digital. My suspicion is that the groups above will remain for some time to come (probably for ever) and that we will all have to work to bring these groups together as best we can - chairs, speakers and delegates - even if that just means remembering that the other groups exist!

Categories: LIS, foreign

Preparing for the mobile university

Tue, 11/05/2010 - 17:21

We're in the final stages of preparing for this year's Eduserv Symposium, The Mobile University, and now that the programme-setting, speaker-inviting, venue-finding, catering-arranging, badge-making, printing, couriering, hotel-booking and the rest of it are pretty much out of the way (I hope that isn't a case of famous last words) I'm hoping that I can relax slightly and look forward to the talks by Paul Golding, Christine Sexton, Andy Ramsden, Tom Hume and John Traxler as well as the lightning talks by Nick Skelton, Wayne Barry, Simon Marsden and Tim Fernando.

In short, I think we have a great programme.

We also have our biggest audience ever this year (around 280) and we are live-streaming all the talks as usual (done by Switch New Media as per last year) so I'm hoping that we will have a big virtual audience as well.  The stream is open to anyone, so feel free to watch and contribute - check your timings if you are joining us from outside the UK.

There are also a couple of minor changes to the way we have organised things this year:

  • In response to last year's feedback, we have set aside an area of the auditorium, designated as a 'quiet area', where we will ask people not to use laptops and where we will try and avoid capturing people in photos and on the video stream.  This dual use is slightly confusing I guess, but we felt it would be even more confusing to try and segregate people into separate 'no photos' and 'quiet' areas.  We'll see how it goes.  For info... about 20% of this year's delegates indicated that they would like to sit in this area, though it isn't clear whether the preference was primarily for the quiet or the lack of photos - my guess is that it is the former.
  • Last year we used both an online chat room and Twitter to encourage a symposium back-channel (with an emphasis on "use the chat room to ask questions" for remote delegates).  The back-channel was used both in the room and by remote delegates but we felt that the choice of virtual venues caused some confusion as to what was expected to happen where.  This year, we've decided to only use Twitter.  There's a cost to this (for delegates), in that everyone has to sign up to Twitter if they want to take part in the back-channel, but we felt that the time is right to make that particular move.  Again, we'll see how things work out.  If you want to take part in the back-channel, the hash-tag for the event is #esym10.
  • Last year (as in previous years) we set up a social network for the symposium using Ning before the event so that people could introduce each other. This year we sensed that people were feeling somewhat jaded about these kinds of meeting-specific social networks and so we decided against the use of one this time around.  To be honest, such networks rarely seem to get used for anything much in any case.
  • Finally, it wouldn't have been a 'real' mobile event without some use of QR Codes, so please remember to install a QR Code reader onto your smart phone before you leave home.  More info on the day itself.

From an Eduserv perspective the symposium has two objectives... Firstly, to help us understand the impact that 'mobile' might have on both our current services (single sign-on, licence negotiation, web development and hosting, and the data centre) and our potential future services. Secondly, to help the HE community in thinking about how it responds to an increasingly mobile world.

All in all, I'm really looking forward to the event on Thursday and I hope it proves useful to people.  I'll blog again after the event with my own thoughts on how it went and what it might mean.

Categories: LIS, foreign

RDFa for the Eduserv Web site

Wed, 05/05/2010 - 17:25

Another post that I've been intermittently chiselling away at in the draft pile for a while... A few weeks ago, I was asked by Lisa Price, our Website Communications Manager, to make some suggestions of how Eduserv might make use of the RDFa in XHTML syntax to embed structured data in pages on the Eduserv Web site, which is currently in the process of being redesigned. I admit this is coming mostly from the starting point of wanting to demonstrate the use of the technology rather than from a pressing use case, but OTOH there is a growing interest in RDFa amongst some of Eduserv's public sector clients so a spot of "eating our own dogfood" would be a Good Thing, and furthermore there are signs of a gradual but significant adoption of RDFa by some major Web service providers.

It seems to me Eduserv might use RDFa to describe, or make assertions about:

  • (Perhaps rather trivially) Web pages themselves i.e. reformulating the (fairly limited) "document metadata" we supply as RDFa.
  • (Perhaps rather more interestingly) some of the "things" that Eduserv pages "are about", or that get mentioned in those pages (e.g. persons, organisations, activities, events, topics of interest, etc).

Within that category of data about "things", we need to decide which data it is most useful to expose. We could:

  • look at those classes of data that are processed by tools/services that currently make use of RDFa (typically using specified RDF vocabularies); or
  • focus on data that we know already exists in a "structured" form but is currently presented in X/HTML either only in human-readable form or using microformats (or even new data which isn't currently surfaced at all on the site).

Another consideration was the question of whether data was covered by existing models and vocabularies or required some analysis and modelling.

To be honest, there's a fairly limited amount of "structured" information on the site currently. There is some data on licence agreements for software and data, currently made available as HTML tables and Excel spreadsheets. While I think some of the more generic elements of this might be captured using a product/service ontology such as Good Relations, the license-specific aspects would require some additional modelling. For the short term at least, we've taken a somewhat "pragmatic" approach and focused mainly on that first class of data for which there are some identifiable consuming applications, based on the use of specified RDF vocabularies - and more specifically on data that Google and Yahoo make particular reference to in their documentation for creators/publishers of Web pages.

That's not to say there won't be more use of RDFa on the site in the future: at the moment, this is something of a "dipping toes in the water" exercise, I think.

The following is my best effort to summarize Google and Yahoo support for RDFa at the time of writing. Please note that this is something which is evolving - as I was writing up this post, I noticed that the Google guidelines have changed slightly since I sent my initial notes to Lisa. And I'm still not at all sure I've captured the complete picture here, so please do check their current documentation for content providers to get an idea of the current state of play.

Google and RDFa

Google's support for RDFa is part of a larger programme of support for structured data embedded in X/HTML that they call "rich snippets" (announced here), which includes support for RDFa, microformats and microdata. (The latter, I think, is a relatively recent addition).

Google functionality extends to extracting specified categories of RDFa data in (some) pages it indexes, and displaying that in search result sets (and in place pages in Google Maps). It also provides access to the data in its Custom Search platform.

Initially at least, Google required the use of its own RDF vocabularies, which attracted some criticism (see e.g. Ian Davis' response), but it appears to have fairly quietly introduced some support for other RDF vocabularies. "In addition to the Person RDFa format, we have added support for the corresponding fields from the FOAF and vCard vocabularies for all those of you who asked for it." And Martin Hepp has pointed to Google displaying data encoded using the Good Relations product/service ontology.

The nature of the RDFa syntax is such that it is often fairly straightforward to use multiple RDF vocabularies in RDFa e.g. triples using the same subject and object but different predicates can be encoded using a single RDFa attribute with multiple white-space-separated CURIEs - though things do tend to get more messy if the vocabularies are based on different models (e.g. time periods as literals v time periods as resources with properties of their own).
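To make the point about multiple CURIEs in a single attribute concrete, here is a minimal Python sketch (not taken from any real RDFa processor) that splits a whitespace-separated @rel value and emits one triple per CURIE, all sharing the same subject and object. The prefix mappings, the page URI and the choice of dc:subject plus foaf:topic are illustrative assumptions.

```python
# A minimal sketch: one RDFa attribute value holding several CURIEs
# yields several triples that share the same subject and object.
# The prefixes and URIs below are illustrative assumptions.

PREFIXES = {
    "dc": "http://purl.org/dc/terms/",
    "foaf": "http://xmlns.com/foaf/0.1/",
}


def expand(curie, prefixes):
    """Expand 'prefix:reference' by concatenating the mapped URI and the reference."""
    prefix, _, reference = curie.partition(":")
    return prefixes[prefix] + reference


def triples_for_rel(subject, rel_value, obj, prefixes):
    """Return one (subject, predicate, object) triple per CURIE in rel_value."""
    return [(subject, expand(curie, prefixes), obj) for curie in rel_value.split()]


if __name__ == "__main__":
    page = "http://example.org/review"
    topic = "http://example.org/resource/2010_FIFA_World_Cup"
    # Equivalent to: <a rel="dc:subject foaf:topic" href="..."> on the page
    for triple in triples_for_rel(page, "dc:subject  foaf:topic", topic, PREFIXES):
        print(triple)
```

The messier case mentioned above (vocabularies that model the same thing differently) is not covered here. This trick only works when the predicates share a subject and an object.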

Google provides specific recommendations to content creators on the embedding of data to describe:

Yahoo and RDFa

Yahoo's support for RDFa is through its SearchMonkey platform. Like Google, it provides a set of "standard" result set enhancements, based on the use of specified RDF vocabularies for a small set of resource types:

In addition, my understanding is that although Yahoo defines some RDF vocabularies of its own, and describes the use of specified vocabularies in the guidelines for the resource types above, it exposes any RDFa data in pages it indexes to developers on its SearchMonkey platform, to allow the building of custom search enhancements. Several existing vocabularies are discussed in the SearchMonkey guide and the FAQ in Appendix D of that document notes "You may use any RDF or OWL vocabulary".

Linked Data

The decentralised extensibility built into RDF means that a provider can choose to extend what data they expose beyond that specified in the guidelines mentioned above.

In addition, I tried to take into account some other general "good practice" points that have emerged from the work of the Linked Data community, captured in sources such as:

So in the Eduserv case, for example, URIs will (I hope!) be assigned to "things" like events, distinct from the pages describing them, with suitable redirects put in place on the HTTP server and suitable triples in the data linking those things and the corresponding pages.
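The "thing URI versus page URI" arrangement can be sketched in a few lines of Python. This is a minimal illustration, assuming a hypothetical host and a hypothetical /id/ and /doc/ path layout (not the actual Eduserv URI scheme). It assumes the common Linked Data practice of answering requests for thing URIs with 303 See Other, and it links thing and page with FOAF properties, which is my choice rather than anything the post specifies.

```python
# Minimal sketch of the "thing URI vs page URI" pattern. The host and the
# /id/ and /doc/ path layout are hypothetical, not the Eduserv site's.

BASE = "http://example.org"
FOAF = "http://xmlns.com/foaf/0.1/"


def respond(path):
    """Return (HTTP status, Location header or None) for a request path."""
    if path.startswith("/id/"):
        # A URI for the thing itself (an event, a person...): no document can
        # be returned for it, so redirect to the page that describes it.
        return 303, BASE + "/doc/" + path[len("/id/"):]
    if path.startswith("/doc/"):
        return 200, None  # the describing page is served directly
    return 404, None


def linking_triples(slug):
    """Triples linking a thing and its page (FOAF properties are an assumption)."""
    thing, page = BASE + "/id/" + slug, BASE + "/doc/" + slug
    return [(thing, FOAF + "isPrimaryTopicOf", page),
            (page, FOAF + "primaryTopic", thing)]


if __name__ == "__main__":
    print(respond("/id/event/esym10"))   # hypothetical event identifier
    print(linking_triples("event/esym10"))
```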

Summary

Anyway, on the basis of the above sources, I tried to construct some suggestions, taking into account both the Google and Yahoo guidelines, for descriptions of people, organisations and events, which I'll post here in the next few entries.

Postscript: Facebook

Even more recently, of course, has come the news of Facebook's announcement at the f8 conference of their Open Graph Protocol. This makes use of RDFa embedded in the headers of XHTML pages using meta elements to provide (pretty minimal) metadata "about" things described by those pages (films, songs, people, places, hotels, restaurants etc - see the Facebook page for a full (and I imagine, growing) list of resource types supported).
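As a rough illustration of how little a consumer needs to read this kind of header metadata, here is a Python sketch using the standard library's HTMLParser. It collects og:* values from meta elements. The page content is hypothetical and the og prefix declaration is omitted. The sketch matches the literal "og:" prefix instead of resolving it as a CURIE, so it is a simplification rather than an RDFa processor.

```python
from html.parser import HTMLParser

# Hypothetical page head marked up with Open Graph Protocol meta elements
# (the og prefix declaration on <html> is omitted for brevity).
PAGE = """<html><head>
<title>The Rock (1996)</title>
<meta property="og:title" content="The Rock" />
<meta property="og:type" content="movie" />
<meta property="og:url" content="http://example.org/films/the-rock" />
<meta property="og:image" content="http://example.org/images/the-rock.jpg" />
<meta name="description" content="An ordinary description, not OGP data" />
</head><body></body></html>"""


class OGPMetaReader(HTMLParser):
    """Collect og:* values from <meta property="..." content="..."> elements.

    Naive: matches the literal "og:" prefix rather than resolving it as a CURIE.
    """

    def __init__(self):
        super().__init__()
        self.properties = {}

    def handle_starttag(self, tag, attrs):
        # Self-closing <meta ... /> tags also arrive here, because the default
        # handle_startendtag() calls handle_starttag().
        if tag != "meta":
            return
        attributes = dict(attrs)
        prop = attributes.get("property") or ""
        if prop.startswith("og:") and attributes.get("content") is not None:
            self.properties[prop] = attributes["content"]


def read_ogp(html):
    reader = OGPMetaReader()
    reader.feed(html)
    reader.close()
    return reader.properties


if __name__ == "__main__":
    print(read_ogp(PAGE))
```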

Facebook makes use of the data to drive its "Like" application: a "button" can be embedded in the page to allow a Facebook user to post the data to their Fb account to signal an "I like this" relationship with the thing described. Or as Dare Obasanjo expresses it, an Fb user can add a node for the thing to their Fb social graph, making it into a "social object". This results in the data being displayed at appropriate points in their Fb stream, while the button displays, as a minimum, a count of the "likers" of the resource on the source page itself; logged-in Fb users would, I think, see information about whether any of their "friends" had liked it.

My reporting of these details of the interface is somewhat "second-hand" as I no longer use Facebook - I deleted my account some time ago because I was concerned about their approaches to the privacy of personal information (see these three recent posts by Tony Hirst for some thoughts on the most recent round of changes in that sphere).

Perhaps unsurprisingly given the popularity of Fb and its huge user base, the OGP announcement seems to have attracted a very large amount of attention within a very short period of time, and it may turn out to be a significant milestone for the use of XHTML-embedded metadata in general and of RDFa in particular. The substantial "carrot" of supporting the Fb "Like" application and attracting traffic from Fb users is likely to be the primary driver for many providers to generate this data, and indeed some commentators (see e.g. this BBC article) have gone as far as to suggest that this represents a move by Facebook to challenge Google as the primary filter of resources for people searching and navigating the Web.

However, I also think it is important to distinguish between the data on the one hand and that particular Facebook app on the other. Having this data available, minimal as it may be, also opens up the possibility of other applications by other parties making use of that same data.

And this is true also, of course, for the case of data constructed following the Google and Yahoo guidelines.

Categories: LIS, foreign

The future of UK Dublin Core application profiles

Wed, 05/05/2010 - 15:24

I spent yesterday morning up at UKOLN (at the University of Bath) for a brief meeting about the future of JISC-funded Dublin Core application profile development in the UK.

I don't intend to report on the outcomes of the meeting here since it is not really my place to do so (I was just invited as an interested party and I assume that the outcomes of the meeting will be made public in due course). However, attending the meeting did make me think about some of the issues around the way application profiles have tended to be developed to date and these are perhaps worth sharing here.

By way of background, the JISC have been funding the development of a number of Dublin Core application profiles in areas such as scholarly works, images, time-based media, learning objects, GIS and research data over the last few years.  An application profile provides a model of some subset of the world of interest and an associated set of properties and controlled vocabularies that can be used to describe the entities in that model for the purposes of some application (or service) within a particular domain. The reference to Dublin Core implies conformance with the DCMI Abstract Model (which effectively just means use of the RDF model) and an inherent preference for the use of Dublin Core terms whenever possible.

The meeting was intended to help steer any future UK work in this area.

I think (note that this blog post is very much a personal view) that there are two key aspects of the DC application profile work to date that we need to think about.

Firstly, DC application profiles are often developed by a very small number of interested parties (sometimes just two or three people) and where engagement in the process by the wider community is quite hard to achieve. This isn't just a problem with the UK JISC-funded work on application profiles by the way. Almost all of the work undertaken within the DCMI community on application profiles suffers from the same problem - mailing lists and meetings with very little active engagement beyond a small core set of people.

Secondly, whilst the importance of enumerating the set of functional requirements that the application profile is intended to meet has not been underestimated, it is true to say that DC application profiles are often developed in the absence of an actual 'software application'. Again, this is also true of the application profile work being undertaken by the DCMI. What I mean here is that there is not a software developer actually trying to build something based on the application profile at the time it is being developed. This is somewhat odd (to say the least) given that they are called application profiles!

Taken together, these two issues mean that DC application profiles often take on a rather theoretical status - and an associated "wouldn't it be nice if" approach. The danger is a growth in the complexity of the application profile and a lack of any real business drivers for the work.

Speaking from the perspective of the Scholarly Works Application Profile (SWAP) (the only application profile for which I've been directly responsible), in which we adopted the use of FRBR, there was no question that we were working to a set of perceived functional requirements (e.g. "people need to be able to find the latest version of the current item"). However, we were not driven by the concrete needs of a software developer who was in the process of building something. We were in the situation where we could only assume that an application would be built at some point in the future (a UK repository search engine in our case). I think that the missing link to an actual application, with actual developers working on it, directly contributed to the lack of uptake of the resulting profile. There were other factors as well of course - the conceptual challenge of basing the work on FRBR and the fact that existing repository software was not RDF-ready, for example - but I think that was the single biggest factor overall.

Oddly, I think JISC funding is somewhat to blame here because, in making funding available, JISC helps the community to side-step the part of the business decision-making that says, "what are the costs (in time and money) of developing, implementing and using this profile vs. the benefits (financial or otherwise) that result from its use?".

It is perhaps worth comparing current application profile work and other activities. Firstly, compare the progress of SWAP with the progress of the Common European Research Information Format (CERIF), about which the JISC recently reported:

EXRI-UK reviewed these approaches against higher education needs and recommended that CERIF should be the basis for the exchange of research information in the UK. CERIF is currently better able to encode the rich information required to communicate research information, and has the organisational backing of EuroCRIS, ensuring it is well-managed and sustainable.

I don't want to compare the merits of these two approaches at a technical level here. What is interesting however, is that if CERIF emerges as the mandated way in which research information is shared in the UK then there will be a significant financial driver to its adoption within systems in UK institutions. Research information drives a significant chunk of institutional funding which, in turn, drives compliance in various applications. If the UK research councils say, "thou shalt do CERIF", that is likely what institutions will do.  They'll have no real choice. SWAP has no such driver, financial or otherwise.

Secondly, compare the current development of Linked Data applications within the UK data.gov.uk initiative with the current application profile work. Current government policy in the UK effectively says, 'thou shalt do Linked Data' but isn't really any more prescriptive. It encourages people to expose their data as Linked Data and to develop useful applications based on that data. Ignoring any discussion about whether Linked Data is a good thing or not, what has resulted is largely ground-up. Individual developers are building stuff and, in the process, are effectively developing their own 'application profiles' (though they don't call them that) as part of exposing/using the Linked Data. This approach results in real activity. But it also brings with it the danger of redundancy, in that every application developer may model their Linked Data differently, inventing their own RDF properties and so on as they see fit.

As Paul Walk noted at the meeting yesterday, at some stage there will be a huge clean-up task to make any widespread sense of the UK government-related Linked Data that is out there. Well, yes... there will. Conversely, there will be no clean up necessary with SWAP because nobody will have implemented it.

Which situation is better!? :-)

I think the issue here is partly to do with setting the framework at the right level. In trying to specify a particular set of application profiles, the JISC is setting the framework very tightly - not just saying, "you must use RDF" or "you must use Dublin Core" but saying "you must use Dublin Core in this particular way". On the other hand, the UK government have left the field of play much more open. The danger with the DC application profile route is lack of progress. The danger with the government approach is too little consistency.

So, what are the lessons here? The first, I think, is that it is important to lobby for your preferred technical solution at a policy level as well as at a technical level. If you believe that a Linked Data-compliant Dublin Core application profile is the best technical way of sharing research information in the UK then it is no good just making that argument to software developers and librarians. Decisions made by the research councils (in this case) will be binding irrespective of technical merit and will likely trump any decisions made by people on the ground.

The second is that we have to understand the business drivers for the adoption, or not, of our technical solutions rather better than we do currently. Who makes the decisions? Who has the money? What motivates the different parties? Again, technically beautiful solutions won't get adopted if the costs of adoption are perceived to outweigh the benefits, or if the people who hold the purse strings don't see any value in spending their money in that particular way, or if people simply don't get it.

Finally, I think we need to be careful that centralised, top-down, initiatives (particularly those with associated funding) don't distort the environment to such an extent that the 'real' drivers, both financial and user-demand, can be ignored in the short term, leading to unsustainable situations in the longer term. The trick is to pump-prime those things that the natural drivers will support in the long term - not always an easy thing to pull off.

Categories: LIS, foreign

RDFa 1.1 drafts available from W3C

Tue, 27/04/2010 - 12:51

Last week, the W3C RDFa Working Group announced the availability of two new "First Public Working Drafts" which it is circulating for comment:

Ivan Herman, the W3C Semantic Web Activity Lead and a co-editor of these documents, has provided a very helpful summary of their main features, and particularly of some of the differences they introduce when compared with the current W3C Recommendation for RDFa, RDFa in XHTML: Syntax and Processing: A collection of attributes and processing rules for extending XHTML to support RDF. I think the intent is that the new drafts maintain compatibility with the current recommendation, in the sense that all the features used in XHTML+RDFa 1.0 are also present in RDFa 1.1. I should reiterate what Ivan says at the start of his piece: these are drafts and features may change based on feedback received.

Some of the most interesting features in these drafts, at least for data creators, are those which enable a more concise/compact style of RDFa markup. One of the criticisms of the initial version of RDFa, particularly from communities unfamiliar with RDF syntaxes, was the dependency on the use of prefixed names, in the form of CURIEs, two-part names made up of a "prefix" and a "reference", mapped to URIs by associating the prefix with a "base" URI, and concatenating the reference part of the CURIE with that URI. In XHTML the prefix-URI association was made through an XML Namespace Declaration. In particular, arguments against this approach focused on problems of "copy-and-paste", where a document fragment including RDFa markup was extracted from a source document without also copying the in-scope XML Namespace declarations, and as a result the RDF interpretation of the fragment in the context of a different (paste target) document was changed. More generally, there were some concerns that the use of prefixes was difficult to explain and understand, at least when compared with the "unprefixed name" styles typically adopted in approaches like microformats.
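The copy-and-paste problem follows directly from how a CURIE is expanded. The following Python sketch looks up the prefix in the mappings in scope and concatenates the mapped URI with the reference. It then "pastes" the same property="dc:title" fragment into three hypothetical documents with different (or missing) declarations. Treating an unmapped prefix as yielding no URI is my simplification.

```python
# A minimal sketch of CURIE expansion: look up the prefix, then concatenate
# the mapped URI with the reference part.

def expand_curie(curie, prefixes):
    """Expand a CURIE using the prefix mappings in scope.

    Returns None when the prefix has no mapping; treating that case as
    "no URI" is a simplification of what a processor would do.
    """
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        return None
    return prefixes[prefix] + reference


# The same fragment, property="dc:title", pasted into three documents:
source_doc = {"dc": "http://purl.org/dc/terms/"}
other_doc = {"dc": "http://purl.org/dc/elements/1.1/"}  # same prefix, different URI
bare_doc = {}                                           # declaration not copied

if __name__ == "__main__":
    for scope in (source_doc, other_doc, bare_doc):
        print(expand_curie("dc:title", scope))
    # http://purl.org/dc/terms/title
    # http://purl.org/dc/elements/1.1/title
    # None
```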

The new drafts introduce several mechanisms which can simplify markup for authors.

I should emphasise that my examples below are based on my fairly rapid reading of the drafts, and any errors and misrepresentations are mine!

The @vocab attribute

@vocab is a new RDFa attribute which provides a means for defining a "default" "vocabulary URI" to which the "terms" in attribute values are appended to construct a URI.

Aside: I should note here that I'm using the word "term" in the sense it is used in the RDFa 1.1 draft, where it refers to a datatype for a string used in an attribute value; this differs from usage by e.g. DCMI where "term" typically refers to a property, class, vocabulary encoding scheme or syntax encoding scheme i.e. to the "conceptual resource" identified by a DCMI URI, rather than to a syntactic component. In RDFa 1.1, "terms" have the syntactic constraints of the NCName production in the XML Namespaces specification.

This mechanism provides an alternative to the use of a CURIE (with prefix, reference and namespace declaration) to represent a URI.

Consider an example based on those from my recent post about RDFa (1.0) and document metadata (This is a "hybrid" of the examples 1.1.5, 1.3.5, and 2.1.5 in that post):

XHTML+RDFa 1.0:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:dc="http://purl.org/dc/terms/"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <h1 property="dc:title">My World Cup 2010 Review</h1>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
    <p>Date last modified:
      <span property="dc:modified" datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
</html>
```

This represents the following three triples (in Turtle):

```turtle
@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
@prefix ex: <http://example.org/resource/> .

<> dc:title "My World Cup 2010 Review" .
<> dc:subject ex:2010_FIFA_World_Cup .
<> dc:modified "2010-07-04"^^xsd:date .
```
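In RDFa 1.0 the prefix mappings come from the xmlns declarations in the document itself, so the three triples above can be recovered from the markup alone. The Python sketch below illustrates this with the standard library's ElementTree. It is deliberately naive: it handles only the three patterns used in the example, collects namespace declarations globally without element scoping, and always uses the document itself (written "<>") as the subject. The XML declaration and DOCTYPE are trimmed from the input.

```python
import io
import xml.etree.ElementTree as ET

# The XHTML+RDFa 1.0 example above, trimmed (XML declaration and DOCTYPE omitted).
DOC_10 = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#">
<body>
<h1 property="dc:title">My World Cup 2010 Review</h1>
<p>About: <a rel="dc:subject"
   href="http://example.org/resource/2010_FIFA_World_Cup">The 2010 World Cup</a></p>
<p>Date last modified:
   <span property="dc:modified" datatype="xsd:date">2010-07-04</span></p>
</body>
</html>"""


def extract_rdfa10(xhtml):
    """Naive extraction of the patterns used in the example.

    Prefixes are taken only from xmlns declarations inside the document
    (collected globally, ignoring scoping); the subject is always "<>".
    """
    prefixes, triples = {}, []

    def expand(curie):
        prefix, _, reference = curie.partition(":")
        return prefixes[prefix] + reference

    source = io.BytesIO(xhtml.encode("utf-8"))
    for event, item in ET.iterparse(source, events=("start-ns", "end")):
        if event == "start-ns":
            prefix, uri = item  # e.g. ("dc", "http://purl.org/dc/terms/")
            prefixes[prefix] = uri
        elif item.get("property"):  # literal-valued property
            value = "".join(item.itertext()).strip()
            datatype = item.get("datatype")
            obj = (value, expand(datatype)) if datatype else value
            triples.append(("<>", expand(item.get("property")), obj))
        elif item.get("rel") and item.get("href"):  # URI-valued property
            triples.append(("<>", expand(item.get("rel")), item.get("href")))
    return triples


if __name__ == "__main__":
    for triple in extract_rdfa10(DOC_10):
        print(triple)
```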

XHTML+RDFa 1.1 using @vocab:

Using the @vocab attribute on the body element to set http://purl.org/dc/terms/ as the default vocabulary URI, I could write this as:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
  version="XHTML+RDFa 1.1">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body vocab="http://purl.org/dc/terms/">
    <h1 property="title">My World Cup 2010 Review</h1>
    <p>About:
      <a rel="subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
    <p>Date last modified:
      <span property="modified" datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
</html>
```

In that case, where just three properties are referenced, the reduction in the number of characters is minimal, but if several properties from the same vocabulary were referenced, then the saving could be more substantial.

The @vocab approach provides limited help where, as is often the case, terms from multiple RDF vocabularies are used in combination (e.g. the example above continues to use a CURIE for the URI of the XML Schema date datatype), but other features of RDFa 1.1 are useful in those cases.
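The mix of terms and CURIEs in the @vocab example can be sketched as a small Python resolution function. It rests on a simplifying assumption of mine that does not reproduce the draft's actual rules: a token containing a colon is treated as a CURIE, and a bare token is treated as a term appended to the in-scope vocabulary URI. The real drafts also deal with absolute URIs and other cases.

```python
# Minimal sketch: resolve a token from @property/@rel/@datatype with @vocab.
# Simplification (an assumption): tokens containing ":" are CURIEs,
# bare tokens are terms appended to the in-scope vocabulary URI.

def resolve(token, vocab=None, prefixes=None):
    prefixes = prefixes or {}
    if ":" in token:
        prefix, _, reference = token.partition(":")
        base = prefixes.get(prefix)
        return base + reference if base is not None else None
    if vocab is not None:
        return vocab + token
    return None  # a bare term with no vocabulary in scope


VOCAB = "http://purl.org/dc/terms/"                  # from <body vocab="...">
PREFIXES = {"xsd": "http://www.w3.org/2001/XMLSchema#"}  # from xmlns:xsd

if __name__ == "__main__":
    for token in ("title", "subject", "modified", "xsd:date"):
        print(token, "->", resolve(token, VOCAB, PREFIXES))
```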

RDFa Profiles and the @profile attribute

Perhaps more powerful than the @vocab attribute is the new RDFa 1.1 feature known as the RDFa profile, and the @profile attribute:

RDFa Profiles are optional external documents that define collections of terms and/or prefix mappings. These documents must be defined in an approved RDFa Host Language (currently XHTML+RDFa [XHTML-RDFA]). They may also be defined in other RDF serializations as well (e.g., RDF/XML [RDF-SYNTAX-GRAMMAR] or Turtle [TURTLE]). RDFa Profiles are referenced via @profile, and can be used by document authors to simplify the task of adding semantic markup.

Let's take each of these two functions - defining terms and defining prefix mappings - in turn.

Defining term mappings in an RDFa profile

An RDFa profile can provide mappings between "terms" and URIs. The following example provides four such "term mappings", for the URIs of three properties from the DC Terms RDF vocabulary and for the URI of one XML Schema datatype:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:rdfa="http://www.w3.org/ns/rdfa#"
  version="XHTML+RDFa 1.1">
  <head>
    <title>My RDFa Profile for a few DC and XSD terms</title>
  </head>
  <body>
    <h1>My RDFa Profile for a few DC and XSD terms</h1>
    <ul>
      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">title</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/title</span>
      </li>
      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">about</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/subject</span>
      </li>
      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">modified</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/modified</span>
      </li>
      <li typeof="rdfa:TermMapping">
        <span property="rdfa:term">xsddate</span> :
        <span property="rdfa:uri">http://www.w3.org/2001/XMLSchema#date</span>
      </li>
    </ul>
  </body>
</html>
```

Note that - in contrast to the case of CURIE references - the content of the "term" doesn't have to match the trailing characters of the URI; so for example, here I've mapped the term "about" to the URI http://purl.org/dc/terms/subject. So sets of "terms" corresponding to various community-specific or domain-specific lexicons could be mapped to a single set of URIs.

Also, a single RDFa profile might provide mappings for URIs from different URI owners - the example above references three DCMI-owned URIs for properties and a W3C-owned URI for a datatype. Conversely, different subsets of URIs owned by a single agency may be referenced in different RDFa profiles.

If the URI of this RDFa profile is http://example.org/profile/terms/, then I can reference it in an XHTML+RDFa 1.1 document, and make use of the term mappings it defines. So taking the example above again, and now using @profile to reference the profile and its term mappings:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  version="XHTML+RDFa 1.1">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body profile="http://example.org/profile/terms/">
    <h1 property="title">My World Cup 2010 Review</h1>
    <p>About:
      <a rel="about" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
    <p>Date last modified:
      <span property="modified" datatype="xsddate">2010-07-04</span>
    </p>
  </body>
</html>
```
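To show what a consumer has to do with this document, here is a Python sketch that reads the term mappings out of a trimmed copy of the term-mapping profile and then resolves the tokens the document uses. It is a naive reading of the profile and does not implement the draft's processing model: it compares the literal strings "rdfa:TermMapping", "rdfa:term" and "rdfa:uri" instead of expanding them as CURIEs, and the profile is supplied as a string instead of being retrieved from http://example.org/profile/terms/.

```python
import xml.etree.ElementTree as ET

# The term-mapping profile above, trimmed (XML declaration, DOCTYPE, head omitted).
PROFILE_XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"><body><ul>
<li typeof="rdfa:TermMapping"><span property="rdfa:term">title</span> :
  <span property="rdfa:uri">http://purl.org/dc/terms/title</span></li>
<li typeof="rdfa:TermMapping"><span property="rdfa:term">about</span> :
  <span property="rdfa:uri">http://purl.org/dc/terms/subject</span></li>
<li typeof="rdfa:TermMapping"><span property="rdfa:term">modified</span> :
  <span property="rdfa:uri">http://purl.org/dc/terms/modified</span></li>
<li typeof="rdfa:TermMapping"><span property="rdfa:term">xsddate</span> :
  <span property="rdfa:uri">http://www.w3.org/2001/XMLSchema#date</span></li>
</ul></body></html>"""


def read_term_mappings(xhtml):
    """Return {term: uri} for every element typed rdfa:TermMapping.

    Naive: compares literal strings instead of expanding CURIEs.
    """
    mappings = {}
    for element in ET.fromstring(xhtml).iter():
        if element.get("typeof") != "rdfa:TermMapping":
            continue
        values = {child.get("property"): (child.text or "").strip()
                  for child in element.iter() if child.get("property")}
        mappings[values["rdfa:term"]] = values["rdfa:uri"]
    return mappings


terms = read_term_mappings(PROFILE_XHTML)

if __name__ == "__main__":
    # Tokens used in the document that references the profile:
    for token in ("title", "about", "modified", "xsddate"):
        print(token, "->", terms[token])
```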

The @profile attribute may appear on any XML element, so it is possible that an element with a @profile attribute referencing profile A may contain a child element with a @profile attribute referencing profile B.

```xml
<body profile="http://example.org/profile/a/">
  <h1 property="title">My World Cup 2010 Review</h1>
  <div profile="http://example.org/profile/b/">
    <p>About:
      <a rel="about" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
  </div>
</body>
```

And the value of a single @profile attribute may be a whitespace-separated list of URIs.

```xml
<body profile="http://example.org/profile/a/ http://example.org/profile/b/">
</body>
```

One of the questions I'm not quite sure about is what happens if the same "term" is mapped to different URIs in different profiles. I think (but I'm not 100% sure) that only a single mapping is used and a single triple is generated, but I don't know the precedence rules that determine which mapping applies.
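To make the question concrete, here is a Python sketch of how a processor might combine term mappings from nested and listed profiles. The precedence rule is a hypothetical one chosen purely for illustration, not something taken from the RDFa 1.1 drafts: within one @profile list, later profiles override earlier ones, and an inner element's profiles override those inherited from its ancestors. The profile URIs and contents are also hypothetical.

```python
# Minimal sketch of combining term mappings from several profiles.
# ASSUMED precedence (illustration only, not from the RDFa 1.1 drafts):
# later profiles in a @profile list override earlier ones, and a nested
# element's profiles override those inherited from its ancestors.

PROFILES = {  # hypothetical profile URIs and contents
    "http://example.org/profile/a/": {
        "title": "http://purl.org/dc/terms/title",
        "about": "http://purl.org/dc/terms/subject",
    },
    "http://example.org/profile/b/": {
        "about": "http://xmlns.com/foaf/0.1/topic",
    },
}


def effective_terms(profile_attrs):
    """profile_attrs: @profile values, from the outermost element inwards."""
    terms = {}
    for attr in profile_attrs:
        for uri in attr.split():        # whitespace-separated list of profile URIs
            terms.update(PROFILES[uri])  # later/inner mappings replace earlier ones
    return terms


# <body profile="...a/"> containing <div profile="...b/">
inner = effective_terms(["http://example.org/profile/a/",
                         "http://example.org/profile/b/"])

if __name__ == "__main__":
    print(inner["about"])  # under the assumed rule: http://xmlns.com/foaf/0.1/topic
```

A different rule (for example, earliest mapping wins) would need only a change to the update step. That is exactly why the draft's actual precedence rules matter for consumers.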

As Ivan notes, probably the most common pattern for deploying RDFa profiles will be for the owners/publishers of RDF vocabularies (such as DCMI) to publish profiles for their vocabularies, and for data providers to simply reference those profiles, rather than creating their own.

Defining prefix mappings in an RDFa profile

RDFa 1.1 continues to support the use of XML Namespace Declarations to associate CURIE prefixes with URIs (see my first example above and the use of the XML Schema datatype) but it also introduces other mechanisms for achieving this. One of these is the ability to supply CURIE prefix to URI mappings in RDFa profiles.

The following example provides four such "prefix mappings", for the URIs of three DCMI vocabularies and for the URI of the XML Schema datatype vocabulary:

```xml
<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
  xmlns:rdfa="http://www.w3.org/ns/rdfa#"
  version="XHTML+RDFa 1.1">
  <head>
    <title>My RDFa Profile for DC and XSD prefixes</title>
  </head>
  <body>
    <h1>My RDFa Profile for DC and XSD prefixes</h1>
    <ul>
      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dc</span> :
        <span property="rdfa:uri">http://purl.org/dc/terms/</span>
      </li>
      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dcam</span> :
        <span property="rdfa:uri">http://purl.org/dc/dcam/</span>
      </li>
      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">dcmitype</span> :
        <span property="rdfa:uri">http://purl.org/dc/dcmitype/</span>
      </li>
      <li typeof="rdfa:PrefixMapping">
        <span property="rdfa:prefix">xsd</span> :
        <span property="rdfa:uri">http://www.w3.org/2001/XMLSchema#</span>
      </li>
    </ul>
  </body>
</html>
```

If the URI of this RDFa profile is http://example.org/profile/prefixes/, then I can reference it in an XHTML+RDFa 1.1 document, and make use of the prefix mappings it defines. Taking the example above again, and using @profile to reference this second profile and its prefix mappings:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.1//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-2.dtd">
<html xmlns="http://www.w3.org/1999/xhtml" version="XHTML+RDFa 1.1">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body profile="http://example.org/profile/prefixes/">
    <h1 property="dc:title">My World Cup 2010 Review</h1>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
    <p>Date last modified:
      <span property="dc:modified" datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
</html>

As with term mappings, the question arises of what happens when two profiles provide different prefix-URI mappings for the same prefix. I think the CURIE datatype is based on the notion that, at any given point in a document, a single prefix-URI mapping is in force for a prefix p, so I assume there are precedence rules for establishing which of the profiles' prefix mappings is applied.

Access to profiles and changes to triples?

Although the RDFa 1.1 profile mechanism is powerful, it also introduces a new element of complexity for consumers of RDFa. In RDFa 1.0, an XHTML+RDFa document is "self-contained", by which I mean an RDFa processor can construct an interpretation of the document as a set of RDF triples using only the content of the document itself. In RDFa 1.1, however, the interpretation of terms and prefixes may be determined by term mappings and prefix mappings specified in profiles external to the document containing the RDFa markup.

Consider my last example above. When the processor encounters the @profile attribute, it retrieves the profile and obtains a list of prefix-URI mappings to be applied in subsequent processing; when it then encounters the CURIE "dc:title", it generates the URI http://purl.org/dc/terms/title.

But if, for some reason, the processor is unable to dereference the profile URI, and doesn't have a cached copy of the referenced profile, then it does not have those mappings available. In that case, for my example above, when the processor encounters the CURIE "dc:title" it would have no mapping for the "dc" prefix, and (I think?) would instead, with the new "URI everywhere" rules in force, treat the string "dc:title" as a URI. (See e.g. the section on CURIE and URI Processing.)

Where two profiles are referenced and both provide a mapping for the same prefix, it seems possible that the mapping in force might change depending on which of the profiles the processor is able to access.
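The behaviour described in the last few paragraphs can be sketched in a few lines of Python. This is only an illustration under stated assumptions: that a later profile overrides an earlier one for the same prefix (I don't know what precedence rules the drafts actually specify), that an unreachable profile simply contributes nothing, and that an unmapped CURIE is passed through unchanged as a URI. The second profile URI, http://example.org/profile/other/, and its mapping of "dc" to the DC 1.1 elements namespace are hypothetical, as are all the function names.

```python
from typing import Callable, Dict, List, Optional

Mappings = Dict[str, str]
Fetcher = Callable[[str], Optional[Mappings]]

def collect_prefixes(profile_uris: List[str], fetch: Fetcher) -> Mappings:
    """Merge the prefix mappings of the referenced profiles.

    ASSUMPTION: profiles are applied in the order referenced and a later
    profile overrides an earlier one for the same prefix. A profile that
    cannot be retrieved (fetch returns None) contributes nothing.
    """
    merged: Mappings = {}
    for uri in profile_uris:
        mappings = fetch(uri)
        if mappings is not None:
            merged.update(mappings)
    return merged

def resolve(value: str, mappings: Mappings) -> str:
    """Expand a CURIE whose prefix is mapped.

    ASSUMPTION: with no mapping for the prefix, the string is returned
    unchanged, i.e. treated as a URI (a tentative reading of "URI everywhere").
    """
    prefix, sep, reference = value.partition(":")
    if sep and prefix in mappings:
        return mappings[prefix] + reference
    return value

# Hypothetical profiles: both map "dc", to different namespaces.
P1 = "http://example.org/profile/prefixes/"
P2 = "http://example.org/profile/other/"
PROFILES = {
    P1: {"dc": "http://purl.org/dc/terms/"},
    P2: {"dc": "http://purl.org/dc/elements/1.1/"},
}

def make_fetch(reachable: set) -> Fetcher:
    """Simulate dereferencing: only URIs in `reachable` can be retrieved."""
    return lambda uri: PROFILES.get(uri) if uri in reachable else None

both = collect_prefixes([P1, P2], make_fetch({P1, P2}))
only_first = collect_prefixes([P1, P2], make_fetch({P1}))
neither = collect_prefixes([P1, P2], make_fetch(set()))

print(resolve("dc:title", both))        # http://purl.org/dc/elements/1.1/title
print(resolve("dc:title", only_first))  # http://purl.org/dc/terms/title
print(resolve("dc:title", neither))     # dc:title
```

The three results make the point: the same "dc:title" string yields three different URIs depending only on which profiles the processor happens to be able to retrieve.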

I lurk on the RDFa WG list, and I've seen various discussions of how these sorts of issues should be handled - see, for example, this thread on "What happens when you can't dereference a profile document?", though related issues surface in other discussions too. I suspect the current draft is far from the "last word" in this area, and these are the sorts of issues on which the authors are seeking feedback.

Summary

I've focused here only on a few "highlights" of the RDFa 1.1 drafts, and Ivan's post covers a couple more which I won't discuss here (the use of the @prefix attribute to provide CURIE prefix mappings and the ability to use URIs in contexts where previously CURIEs were required), but I hope they give a flavour of the sort of functionality which is being introduced. The examples here are based on my understanding of the current drafts, but I may have made mistakes, so please do check out the drafts rather than relying on my interpretations.

It seems to me the WG is trying hard to address some of the criticisms made of RDFa 1.0, and to provide mechanisms that make the provision of RDFa markup simpler while retaining the power and flexibility of the syntax and ensuring that RDFa 1.0 data remains compatible. In particular, it seems to me the "term mapping" feature of RDFa profiles may be very useful in "shielding" data providers from some of the complexity of name-URI mappings and prefixed names, especially once the owners of commonly used RDF vocabularies start to make such profiles available.

However, such flexibility doesn't come without its own challenges, and the profile mechanism in particular seems to introduce some complexity which I imagine will become a focus of discussion during the comment period for these drafts. Comments on the drafts themselves should be sent to the RDFa Working Group list.

Categories: LIS, foreign

Document metadata using DC-HTML and using RDFa

Thu, 22/04/2010 - 18:12

In the context of various bits and pieces of work recently (more of which I'll write about in some upcoming posts), I've been finding myself describing how document metadata that can be represented using DCMI's DC-HTML meta data profile, described in Expressing Dublin Core metadata using HTML/XHTML meta and link elements, might also be represented using RDFa. (N.B. Here I'm considering only the current RDFa in XHTML W3C Recommendation, not the newly announced drafts for RDFa 1.1). So I thought I'd quickly list some examples here. Please note: I don't intend this to be a complete tutorial on using RDFa. Far from it; here I focus only on the case of "document metadata" whereas of course RDFa can be used to represent data "about" any resources. And these are really little more than a few rough notes which one day I might reuse somewhere else.

I really just wanted to illustrate that:

  • in terms of its use with the XHTML meta and link elements, RDFa has many similarities to the DC-HTML profile - unsurprisingly, as the RDF model underlies both; and
  • RDFa also provides the power and flexibility to represent data that cannot be expressed using the DC-HTML profile.

The main differences between using RDFa in XHTML and using the DC-HTML profile are:

  • RDFa supports the full RDF model, not just the particular subset supported by DC-HTML
  • RDFa introduces some new XML attributes (@about, @property, @resource, @datatype, @typeof)
  • RDFa uses a datatype called CURIE for the abbreviation of URIs; DC-HTML uses a prefixed name convention which is essentially specific to that profile (though it was also adopted by the Embedded RDF profile)
  • Perhaps most significantly, RDFa can be used anywhere in an XHTML document, so the same syntactic conventions can be used both for document metadata and for data ("about" any resources) embedded in the body of the document
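To illustrate the difference in prefix handling, here is a minimal Python sketch of the DC-HTML convention, applied to a document along the lines of the typed-value example in section 1.3.3 below. A link with rel="schema.DC" declares the prefix, and a meta name="DC.modified" is expanded by splitting at the first full stop and appending the local name to the declared URI; the scheme attribute is expanded the same way. The function names are mine, and the sketch handles only meta elements, matches prefixes exactly (case-sensitively) and does not check the head profile attribute, so it is no substitute for DCMI's GRDDL transformation.

```python
import xml.etree.ElementTree as ET

XHTML = "{http://www.w3.org/1999/xhtml}"

def dc_html_prefixes(head):
    """Collect prefixes declared as <link rel="schema.PREFIX" href="URI" />."""
    prefixes = {}
    for link in head.iter(XHTML + "link"):
        rel = link.get("rel", "")
        if rel.startswith("schema."):
            prefixes[rel[len("schema."):]] = link.get("href")
    return prefixes

def expand_prefixed_name(name, prefixes):
    """Expand a "DC.title" style name; None if the prefix is not declared."""
    prefix, dot, local = name.partition(".")
    if not dot or prefix not in prefixes:
        return None
    return prefixes[prefix] + local

def meta_statements(xhtml_bytes):
    """Yield (property URI, literal, datatype URI or None) per DC-HTML meta."""
    head = ET.fromstring(xhtml_bytes).find(XHTML + "head")
    prefixes = dc_html_prefixes(head)
    for meta in head.iter(XHTML + "meta"):
        prop = expand_prefixed_name(meta.get("name", ""), prefixes)
        if prop is None:
            continue  # e.g. <meta name="keywords">: not a DC-HTML statement
        scheme = meta.get("scheme")
        datatype = expand_prefixed_name(scheme, prefixes) if scheme else None
        yield prop, meta.get("content"), datatype

DOC = b"""<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="schema.XSD" href="http://www.w3.org/2001/XMLSchema#" />
    <meta name="DC.modified" scheme="XSD.date" content="2010-07-04" />
  </head>
</html>"""

print(list(meta_statements(DOC)))
# [('http://purl.org/dc/terms/modified', '2010-07-04', 'http://www.w3.org/2001/XMLSchema#date')]
```

In RDFa the equivalent declaration is an xmlns:dc namespace declaration, and the separator is the colon in the CURIE dc:modified.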

I'm presenting these examples following the description set model of the DCMI Abstract Model, and in more or less the same order that the DC-HTML specification presents the same set of concepts.

For each example, I present the data:

  • using DC-Text
  • using Turtle
  • in XHTML using DC-HTML
  • in XHTML+RDFa, using meta and link elements
  • in XHTML+RDFa, using block and inline elements (to illustrate that the same data could be embedded in the body of an XHTML document, rather than only in the head)

As an aside, it is possible to use the DC-HTML profile alongside RDFa in the same document, but I haven't bothered to show that here.

Footnote: Hmmm. Considering that I said to myself at the start of the year that I was rather tired of thinking/writing about syntax, I still seem to be doing an awful lot of it! Will try to write about other things soon....

1. Literal Value Surrogates

See DC-HTML 4.5.1.2.

1.1 Plain Value String

1.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review" )
    )
  )
)

1.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review" .

1.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" content="My World Cup 2010 Review" />
  </head>
</html>

1.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title" content="My World Cup 2010 Review" />
  </head>
</html>

In this example, it would also be possible to simply add an attribute to the title element, instead of introducing the meta element:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title property="dc:title">My World Cup 2010 Review</title>
  </head>
</html>

1.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <h1 property="dc:title">My World Cup 2010 Review</h1>
  </body>
</html>
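The three RDFa variants in 1.1.4 and 1.1.5 should all produce the same single triple. As a rough check, here's a Python sketch of @property handling in RDFa 1.0. It is deliberately simplified: it assumes every @property describes the document itself (represented by an empty base URI, like <> in the Turtle), collects xmlns prefixes for the whole document instead of scoping them, reads xml:lang only from the element itself, and flattens child markup to text instead of producing XML literals. That is enough for the examples in section 1, but it would give wrong subjects for the nested examples in section 2. The function and variable names are illustrative.

```python
import io
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"

def property_literals(xhtml_bytes, base=""):
    """Yield (subject, property, literal, lang, datatype) for each @property.

    Simplified sketch, NOT the full RDFa 1.0 processing model (see lead-in).
    """
    prefixes, elements = {}, []
    for event, item in ET.iterparse(io.BytesIO(xhtml_bytes),
                                    events=("start-ns", "start")):
        if event == "start-ns":
            prefix, uri = item
            prefixes[prefix] = uri
        else:
            elements.append(item)

    def expand(curie):
        # A CURIE is prefix ":" reference; an unknown prefix raises KeyError.
        prefix, _, reference = curie.partition(":")
        return prefixes[prefix] + reference

    for el in elements:
        prop = el.get("property")
        if prop is None:
            continue
        # @content, when present, takes precedence over the element content.
        value = el.get("content")
        if value is None:
            value = "".join(el.itertext())
        datatype = el.get("datatype")
        yield (base, expand(prop), value, el.get(XML_LANG),
               expand(datatype) if datatype else None)

DC = 'xmlns="http://www.w3.org/1999/xhtml" xmlns:dc="http://purl.org/dc/terms/"'
TITLE = "My World Cup 2010 Review"
VARIANTS = [
    # 1.1.4: meta element in the head
    f'<html {DC}><head><title>{TITLE}</title>'
    f'<meta property="dc:title" content="{TITLE}" /></head></html>',
    # 1.1.4 alternative: @property on the title element
    f'<html {DC}><head><title property="dc:title">{TITLE}</title></head></html>',
    # 1.1.5: @property on an h1 in the body
    f'<html {DC}><head><title>{TITLE}</title></head>'
    f'<body><h1 property="dc:title">{TITLE}</h1></body></html>',
]

results = [list(property_literals(v.encode("utf-8"))) for v in VARIANTS]
print(results[0])
# [('', 'http://purl.org/dc/terms/title', 'My World Cup 2010 Review', None, None)]
```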

1.2 Plain Value String with Language Tag

1.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:title )
      LiteralValueString ( "My World Cup 2010 Review"
        Language ( en )
      )
    )
  )
)

1.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
<> dc:title "My World Cup 2010 Review"@en .

1.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <meta name="DC.title" xml:lang="en" content="My World Cup 2010 Review" />
  </head>
</html>

1.2.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:title" xml:lang="en" content="My World Cup 2010 Review" />
  </head>
</html>

1.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <h1 property="dc:title" xml:lang="en">My World Cup 2010 Review</h1>
  </body>
</html>

1.3 Typed Value String

1.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:modified )
      LiteralValueString ( "2010-07-04"
        SyntaxEncodingSchemeURI ( xsd:date )
      )
    )
  )
)

1.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:modified "2010-07-04"^^xsd:date .

1.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="schema.XSD" href="http://www.w3.org/2001/XMLSchema#" />
    <meta name="DC.modified" scheme="XSD.date" content="2010-07-04" />
  </head>
</html>

1.3.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <meta property="dc:modified" datatype="xsd:date" content="2010-07-04" />
  </head>
</html>

1.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>Date last modified:
      <span property="dc:modified" datatype="xsd:date">2010-07-04</span>
    </p>
  </body>
</html>

2. Non-Literal Value Surrogates

See DC-HTML 4.5.2.2.

2.1 Value URI

2.1.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
    )
  )
)

2.1.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .

2.1.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>
</html>

2.1.4 XHTML+RDFa using meta and link

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
  </head>
</html>

2.1.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        The 2010 World Cup
      </a>
    </p>
  </body>
</html>

2.2 Value URI with Plain Value String

2.2.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup" )
    )
  )
)

2.2.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup" .

2.2.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject" href="http://example.org/resource/2010_FIFA_World_Cup"
          title="2010 FIFA World Cup" />
  </head>
</html>

2.2.4 XHTML+RDFa using meta and link

Here the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]" property="rdf:value"
          content="2010 FIFA World Cup" />
  </head>
</html>

2.2.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
</html>

2.3 Value URI with Plain Value String with Language Tag

2.3.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      ValueString ( "2010 FIFA World Cup"
        Language ( en )
      )
    )
  )
)

2.3.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup rdf:value "2010 FIFA World Cup"@en .

2.3.3 XHTML using DC-HTML:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head profile="http://dublincore.org/documents/2008/08/04/dc-html/">
    <title>My World Cup 2010 Review</title>
    <link rel="schema.DC" href="http://purl.org/dc/terms/" />
    <link rel="DC.subject" href="http://example.org/resource/2010_FIFA_World_Cup"
          xml:lang="en" title="2010 FIFA World Cup" />
  </head>
</html>

2.3.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]" property="rdf:value"
          xml:lang="en" content="2010 FIFA World Cup" />
  </head>
</html>

With RDFa, multiple value strings might be provided, using multiple meta elements (which is not supported in DC-HTML):

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]" property="rdf:value"
          xml:lang="en" content="2010 FIFA World Cup" />
    <meta about="[ex:2010_FIFA_World_Cup]" property="rdf:value"
          xml:lang="es" content="Copa Mundial de Fútbol de 2010" />
  </head>
</html>

2.3.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        <span property="rdf:value" xml:lang="en">2010 FIFA World Cup</span>
      </a>
    </p>
  </body>
</html>

2.4 Value URI with Typed Value String

2.4.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:language )
      ValueURI ( ex:English )
      ValueString ( "en"
        SyntaxEncodingSchemeURI ( xsd:language )
      )
    )
  )
)

2.4.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix xsd: <http://www.w3.org/2001/XMLSchema#> .
<> dc:language ex:English .
ex:English rdf:value "en"^^xsd:language .

2.4.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.4.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in RDFa both a link and a meta element are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:ex="http://example.org/resource/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:language" href="http://example.org/resource/English" />
    <meta about="[ex:English]" property="rdf:value"
          datatype="xsd:language" content="en" />
  </head>
</html>

2.4.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
      xmlns:xsd="http://www.w3.org/2001/XMLSchema#"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>Language:
      <a rel="dc:language" href="http://example.org/resource/English">
        <span property="rdf:value" datatype="xsd:language" content="en">English</span>
      </a>
    </p>
  </body>
</html>

2.5 Value URI with Vocabulary Encoding Scheme URI

2.5.1 DC-Text:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix ex: <http://example.org/resource/> .
DescriptionSet (
  Description (
    Statement (
      PropertyURI ( dc:subject )
      ValueURI ( ex:2010_FIFA_World_Cup )
      VocabularyEncodingSchemeURI ( ex:MyScheme )
    )
  )
)

2.5.2 Turtle:

@prefix dc: <http://purl.org/dc/terms/> .
@prefix dcam: <http://purl.org/dc/dcam/> .
@prefix ex: <http://example.org/resource/> .
<> dc:subject ex:2010_FIFA_World_Cup .
ex:2010_FIFA_World_Cup dcam:memberOf ex:MyScheme .

2.5.3 XHTML using DC-HTML:

Not supported by DC-HTML.

2.5.4 XHTML+RDFa using meta and link

Again, the single DCAM statement is made up of two RDF triples, and in XHTML using RDFa two link elements are used:

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      xmlns:ex="http://example.org/resource/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
    <link rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup" />
    <link about="[ex:2010_FIFA_World_Cup]" rel="dcam:memberOf"
          href="http://example.org/resource/MyScheme" />
  </head>
</html>
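The pattern running through sections 2.1 to 2.5 (one triple linking the document to the value URI, plus one rdf:value triple per value string and one dcam:memberOf triple for a vocabulary encoding scheme) can be summarised as a small Python function. This is a sketch of the pattern in the Turtle examples here, not an implementation of the full DCAM-to-RDF mapping; for instance, it doesn't handle values that lack a value URI, which would need blank nodes. The class and function names are illustrative.

```python
from dataclasses import dataclass, field
from typing import List, Optional

RDF_VALUE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#value"
DCAM_MEMBER_OF = "http://purl.org/dc/dcam/memberOf"

@dataclass
class ValueString:
    text: str
    lang: Optional[str] = None      # e.g. "en" (section 2.3)
    datatype: Optional[str] = None  # e.g. the xsd:language URI (section 2.4)

@dataclass
class Statement:
    property_uri: str
    value_uri: str
    value_strings: List[ValueString] = field(default_factory=list)
    ves_uri: Optional[str] = None   # vocabulary encoding scheme (section 2.5)

def statement_to_triples(subject, statement):
    """Map one DCAM statement that has a value URI to RDF triples."""
    value = statement.value_uri
    triples = [(subject, statement.property_uri, value)]
    for vs in statement.value_strings:
        # One rdf:value triple per value string, keeping language/datatype.
        triples.append((value, RDF_VALUE, vs))
    if statement.ves_uri is not None:
        triples.append((value, DCAM_MEMBER_OF, statement.ves_uri))
    return triples

EX = "http://example.org/resource/"
# The two-language example from 2.3.4; "" stands for the document itself.
subject_statement = Statement(
    property_uri="http://purl.org/dc/terms/subject",
    value_uri=EX + "2010_FIFA_World_Cup",
    value_strings=[ValueString("2010 FIFA World Cup", lang="en"),
                   ValueString("Copa Mundial de Fútbol de 2010", lang="es")],
)
for triple in statement_to_triples("", subject_statement):
    print(triple)
```

Keeping the language tag and datatype on the value string rather than on the statement mirrors DCAM, where they belong to each value string individually.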

2.5.5 XHTML+RDFa in body

<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML+RDFa 1.0//EN"
  "http://www.w3.org/MarkUp/DTD/xhtml-rdfa-1.dtd">
<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/terms/"
      xmlns:dcam="http://purl.org/dc/dcam/"
      version="XHTML+RDFa 1.0">
  <head>
    <title>My World Cup 2010 Review</title>
  </head>
  <body>
    <p>About:
      <a rel="dc:subject" href="http://example.org/resource/2010_FIFA_World_Cup">
        <span rel="dcam:memberOf" resource="http://example.org/resource/MyScheme" />
        The 2010 World Cup
      </a>
    </p>
  </body>
</html>

A small GRDDL (XML, really) gotcha

Tue, 13/04/2010 - 13:25

I've written previously here about DCMI's use of an HTML meta data profile for document metadata, and the use of a GRDDL profile transformation to extract RDF triples from an XHTML document. DCMI has made use of an HTML profile for many years, but providing a "GRDDL-enabled" version is a more recent development - and it is one which I admit I was quietly quite pleased to see put in place, as I felt it illustrated rather neatly how DCMI was trying to implement some of the "follow your nose" principles of Web Architecture.

A little while ago, I noticed that the Web-based tools which I usually use to test GRDDL processing (the W3C GRDDL service and the librdf parser demonstrator) were generating errors when I tried to process documents which reference the profile. I've posted a more detailed account of my investigations to the dc-architecture Jiscmail list, and I won't repeat them all here, but in short it comes down to the use of the entity references (&nbsp; and &copy;) in the profile document, which itself is subject to a GRDDL transformation to extract the pointer to the profile transformation.

The problem arises because XHTML defines those entity references in the XHTML DTD, i.e. externally to the document itself, and a non-validating XML processor is not required to read that DTD when parsing the document, with the consequence that it fails to resolve the references - and there's no guarantee that a GRDDL processor will employ a validating parser. There's a more extended discussion of these issues in a post by Lachlan Hunt from 2005 which concludes:

Character entity references can be used in HTML and in XML; but for XML, other than the 5 predefined entities, need to be defined in a DTD (such as with XHTML and MathML). The 5 predefined entities in XML are: &amp;, &lt;, &gt;, &quot; and &apos;. Of these, you should note that &apos; is not defined in HTML. The use of other entities in XML requires a validating parser, which makes them inherently unsafe for use on the web. It is recommended that you stick with the 5 predefined entity references and numeric character references, or use a Unicode encoding.
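The failure is easy to reproduce with any non-validating XML parser. Python's xml.etree.ElementTree uses expat and doesn't fetch external DTDs, so in the sketch below a made-up XHTML document containing &copy; and &nbsp; is rejected, while the same document with those references rewritten as numeric character references, as Hunt recommends, parses. The document and function names are illustrative; whether a given GRDDL processor behaves the same way depends on the parser it uses.

```python
import html.entities
import re
import xml.etree.ElementTree as ET

# A made-up XHTML document using two entities defined only in the XHTML DTD.
SAMPLE = b"""<?xml version="1.0" encoding="utf-8" ?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
  "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Example</title></head>
  <body><p>&copy;&nbsp;Example</p></body>
</html>"""

# The five entities every XML parser knows without a DTD.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def parses(xml_bytes):
    """True if ElementTree (expat, non-validating) accepts the document."""
    try:
        ET.fromstring(xml_bytes)
        return True
    except ET.ParseError:
        return False

def to_numeric_references(text):
    """Rewrite HTML named entity references as numeric character references,
    leaving XML's five predefined entities and unknown names untouched."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in html.entities.name2codepoint:
            return match.group(0)
        return "&#%d;" % html.entities.name2codepoint[name]
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, text)

print(parses(SAMPLE))  # False: expat does not read the external DTD
fixed = to_numeric_references(SAMPLE.decode("utf-8")).encode("utf-8")
print(parses(fixed))   # True: &#169; and &#160; need no DTD
```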

And the GRDDL specification itself cautions:

Document authors, particularly XHTML document authors, who wish their documents to be unambiguous when used with GRDDL should avoid dependencies on an external DTD subset; specifically:

  • Explicitly include the XHTML namespace declaration in an XHTML document, or an appropriate namespace in an XML document.
  • Avoid use of entity references, except those listed in section 4.6 of the XML specification.
  • And, more generally, follow the rules listed for the standalone document validity constraint.

A note will be added to the DC-HTML profile document to emphasise this point (and the offending references removed).

I guess I was surprised that no-one else had reported the error, particularly as it potentially affects the processing of all instance documents. The fact that they hadn't does rather lend weight to the suspicion I voiced here a few weeks ago that few implementers may actually be making use of the DC-HTML GRDDL profile transformation.
