LIS, stranieri
Library Blog Awards
Omeka in the Cloud
Omeka.net will expand Omeka’s current offerings with a completely web-based service. No server or programming experience required. Similar to services offered by WordPress, the popular open-source blogging software, with the launch of Omeka.net users will be able to sign up for a free hosted Omeka site. Just create a username and password, and your online collection or exhibition is up and running.
This new hosted web service will further the Omeka project’s mission to make collections-based online publishing more accessible to small cultural heritage institutions, individual scholars, enthusiasts, educators, and students.
With Omeka.net, your online exhibit is one click away.
OCLC and OIX
Common Tag
MARC 21 Update No. 11: Full and Concise available online
The changes are indicated in red in Update 11. Update 10 (October 2009) changes have also been kept in red since that update was only recently issued and 10 and 11 are being combined. Each format also has an appendix, "Format Changes for Update No. 10 (October 2009) and Update No. 11 (February 2010)" that lists the changes that comprise the combined update. The Web version of the formats is the official version and is considered the start for implementation planning for MARC 21. Users are not expected to begin using the new features in the format until 60 days from the date of this announcement: May 5, 2010. For more information about format documentation see: http://www.loc.gov/marc/status.html
The printed version of the update will be available through the Cataloging Distribution Service in the future. The print format update will combine Updates 10 and 11 into one update dated 2009/February 2010. The printed publications will be announced when they are ready for distribution.
Ting: collaboratively sourced library infrastructure
Mission of the library redux
The context web
more on weird OCLC business decisions
Originally posted in shorter version as a comment on a post by Karen Coyle on this issue…
The frustrating thing here is that libraries ARE willing to pay a reasonable amount to SUBMIT their holdings to an ILL service, such as OCLC’s, which (unlike their cataloging copy service) really has no competitive peers (yet…?).
Libraries get no DIRECT benefit from this — submitting holdings just means other libraries can more easily request things from YOU, and I don’t think fulfilling ILL requests is usually a profit center. Libraries are willing to do it just to serve the larger community, and out of “generalized reciprocity” where they realize that we all need to submit holdings so we all can request from each other.
Libraries ARE still willing to pay a reasonable fee to fulfill their community responsibilities to resource sharing. They’re just not willing to pay an UNREASONABLE fee, or to be ‘locked in’ to buying cataloging from a service that is not the best quality-to-price point for them, in order to continue resource sharing!
(MSU noted they pay tens of thousands of dollars for the resource sharing/ILL service, and are willing to keep paying that, just not an unreasonable per-record rate for loading:)
Regarding these statements, MSU’s Haka wrote, “The contention has been made that actions such as ours seek to undermine the WorldCat database. I would simply respond that the price currently quoted to upload these records into the database is the factor that should be questioned.” He also notes that the $88,500 MSU pays for resource sharing “does not seem like freeloading.”
So… you think we’ll see a SkyRiver resource sharing network too? (I guess III already has one? Maybe they’ll provide infrastructure to open it up to non-III libraries? Although III’s own reputation/history for promoting ‘lock in’ at all costs… Is SkyRiver itself open and useable by non-III libraries?)
I don’t know if OCLC’s actions are an intentional attempt at forcing ‘lock in’, or due to unfortunate lack of technical flexibility in their back-end systems.
But if the former, it’s just as likely to backfire, and cause them to lose the Resource Sharing business that libraries were perfectly happy to keep with OCLC at a reasonable price!
What I would do if I were king of OCLC
Which I am clearly not.
1) Work with SkyRiver to get OCLC numbers added to as many SkyRiver records as possible. Not share cataloging copy, but merely get a SkyRiver cataloging record to have a MARC field somewhere meaning “this record represents the same manifestation as OCLC record # N.” OCLC already has quite a bit of technical expertise with this kind of record matching, from their ‘reclamation service’. Charge SkyRiver a reasonable rate for this service, which SkyRiver is going to be willing to pay if the price is reasonable, because SkyRiver is worried about not being able to attract cataloging customers if they become locked out of OCLC resource sharing, and having OCLC numbers on the records will lead to…
2) Perhaps some of the apparently unreasonable expense of OCLC’s “just add holdings” quotes to MSU is not just an attempt to punish/enforce lock-in, but is actually because it’s expensive to load holdings from non-OCLC records, because you’ve first got to figure out what OCLC record they correspond to. But if the ‘foreign’ records have an OCLC number equivalency in them, as above, then it becomes technically much more feasible/efficient/cheap. So OCLC can offer a much more reasonable per-record price for loading holdings (not loading the records themselves, with another vendor’s copy; just holdings attachments) from records that have an OCLC equivalency in them.
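To make the point in 1) and 2) concrete, here is a rough sketch of why an embedded OCLC number makes holdings loading cheap: the load becomes a dictionary lookup instead of a record-matching job. Everything specific here is an assumption of mine, not something OCLC or SkyRiver has described: the equivalency is assumed to live in a 035 $a as "(OCoLC)" plus digits (a common convention), records are simplified to plain dicts, and the library symbol "XYZ" is made up.

```python
import re

# Assumption: the equivalency is stored as "(OCoLC)<digits>" in a 035 $a,
# optionally with an ocm/ocn/on prefix and leading zeros.
OCLC_RE = re.compile(r"\(OCoLC\)\s*(?:ocm|ocn|on)?0*(\d+)")


def oclc_number(record):
    """Return the OCLC number found in the record's 035 $a values, or None."""
    for value in record.get("035a", []):
        match = OCLC_RE.search(value)
        if match:
            return match.group(1)
    return None


def attach_holdings(records, library_symbol, holdings):
    """Attach a holding for every record that carries an OCLC number.

    Records without one are returned, since they still need the
    expensive "which OCLC record is this?" matching step.
    """
    unmatched = []
    for record in records:
        number = oclc_number(record)
        if number is None:
            unmatched.append(record)
        else:
            holdings.setdefault(number, set()).add(library_symbol)
    return unmatched


if __name__ == "__main__":
    holdings = {}
    batch = [{"035a": ["(OCoLC)ocm00012345"]}, {"035a": ["(DLC)  12345"]}]
    leftover = attach_holdings(batch, "XYZ", holdings)
    print(holdings, len(leftover))
```

Only the leftover records would need real matching work, which is where a per-record price could reasonably be higher.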
OCLC gets to retain resource sharing record loading income from customers moving to other vendors for cataloging, instead of losing them entirely. OCLC gets a new revenue stream from vendors such as SkyRiver, paying to establish OCLC equivalencies on their records. SkyRiver’s happy, because their customers aren’t being ‘locked in’ to OCLC. Libraries are happy, because services have been “de-coupled” and they can choose the service at the best quality/price point for them. OCLC members who request items via the resource sharing service are happy, because their database of holdings continues to be as comprehensive as possible (and OCLC is happy that their valuable database of holdings maintains and increases in value with as many holdings as possible).
If lots of OCLC members move their cataloging away from OCLC, OCLC is still going to lose net revenue, but I think that’s inevitable at this point, the train’s already left the station. Better to establish revenue streams for their (without peer) resource network from libraries that are cataloging elsewhere, than to lose those too.
Many OCLC members are already purchasing cataloging from other vendors in addition to OCLC. Many of those records do not end up having holdings registered in OCLC. If those vendors could be brought into the fold as above, then it’s more revenue for OCLC, and more holdings in their database making their database more valuable (which is what gives value to their resource sharing service, and other services like WorldCat in the first place).
You can try to stick to the business model of 20 years ago, harming the interests of actual libraries in the process, and probably fighting a losing battle anyway. Or you can adjust to new environments with new business models. OCLC has still got a lot of valuable assets — discouraging people from supplying records to their resource sharing network (with infeasible prices) threatens to reduce the value of their assets.
In fact, even calling that a 20 year old business model might not be accurate. When OCLC still had competitors like RLG, did OCLC allow libraries who purchased cataloging copy from other sources to attach holdings for resource sharing at a reasonable cost? I’m not sure if they did or not. But if they did… what’s changed? If they did, that would make it seem like it’s less of a technical issue, and more of OCLC trying to take advantage of their (possibly short-lived) monopoly position to lock in customers. (Anti-trust issues?).
Filed under: General
MARC: from mark-up to data
The first question to be answered is: What are our data elements? In theory, this should be one of the simpler questions, but it's not. I can create a list of all of the MARC fields, subfields, and fixed field elements (which I have, and they are linked from this page of the futurelib wiki), but that doesn't answer the question. Here's why:
Indicators
The indicators in the MARC fields are like a wild card in poker -- they can be used to utterly transform the play. Some of the indicators are simple and probably can be dismissed: the non-filing indicators and the indicators that control printing. Some are data elements in themselves: "Existence in NAL collection" is essentially a binary data element. Many further refine the meaning of the field, allowing the field to carry any one of a number of related subelements:
Second - Type of ring
# - Not applicable
0 - Outer ring
1 - Exclusion ring
Others name the source of the term, such as LCSH or MeSH. It'll take a fair amount of work to figure out what all of these qualifiers mean in terms of actual data elements.
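As a small illustration of what "refining the meaning of the field" amounts to, the "Type of ring" indicator can be treated as a lookup that turns one field into several candidate data elements. This is only a sketch: the indicator values come from the example above, but the tag "034" and the element names it produces are my assumptions, chosen for illustration.

```python
# Second-indicator values copied from the "Type of ring" example; the tag
# "034" and the element names on the right are assumptions for illustration.
RING_TYPE = {
    "#": None,  # not applicable: the indicator adds no meaning
    "0": "outer ring",
    "1": "exclusion ring",
}


def data_element(tag, ind2):
    """Name the data element that a (tag, second indicator) pair stands for."""
    if tag != "034":
        raise ValueError("only the 034 example is modelled here")
    ring = RING_TYPE[ind2]
    return "coordinates" if ring is None else f"coordinates ({ring})"


if __name__ == "__main__":
    for indicator in ("#", "0", "1"):
        print(indicator, "->", data_element("034", indicator))
```

One field, three possible elements: that is the multiplication the element list has to account for.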
Redundancy
There is non-textual (although not non-string) data in the MARC record, primarily in the fixed fields (00X) but also in some of the number and code fields (0XX). Some of these, actually most of these, are redundant with display information in the body of the record. Should these continue to be separate data elements, or can we remove this redundancy and still have useful user displays? Basically, having the same information entered in two different ways in your data is just begging for trouble and we've all seen fixed field dates and display (260 $c) dates that contradict each other.
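The contradiction between fixed field dates and display dates is easy to test for, which is part of what makes the redundancy so irritating. A minimal sketch, assuming Date 1 sits in positions 07-10 of the 008 and that the display date in 260 $c contains a four-digit year; the sample values are invented:

```python
import re


def fixed_field_year(field_008):
    """Date 1 occupies character positions 07-10 of the 008 field."""
    return field_008[7:11]


def display_year(subfield_260c):
    """Pull the first four-digit year out of a display date such as 'c2006.'."""
    match = re.search(r"\d{4}", subfield_260c)
    return match.group(0) if match else None


def dates_agree(field_008, subfield_260c):
    """True when the coded date and the display date name the same year."""
    return fixed_field_year(field_008) == display_year(subfield_260c)


if __name__ == "__main__":
    sample_008 = "060412s2006    nyu"  # invented, truncated 008
    print(dates_agree(sample_008, "c2006."))
    print(dates_agree(sample_008, "2005."))
```

Real display dates ("[19--?]", "1998, c1997") are messier than this handles, which is rather the point.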
Inconsistency
Primarily due to the constraints of the MARC format, the same information has been coded differently in different fields. A personal author entry in the 100 field uses subfields abcdejqu; in the 760 linking entry field, all of that data is entered into subfield a. It's the same data element, and by that I mean that the same contents are contained in the concatenation of abcdejqu as in a. Bringing together all of these krufty bits into a more rational data definition is something I really long for.
And of course my favorite... data buried in text
So much of our data isn't data, it's text, or it's data buried in text. My favorite example is the ISBN. Everyone knows how important the ISBN is in all kinds of bibliographic linking operations. But there isn't a place in our record for the ISBN as a data element. Instead, there is a subfield that takes the ISBN as well as other information.
020 __ |a 0812976479 (pbk.)
This means that every system that processes MARC records has to have code that separates out the actual ISBN from whatever else might be in the subfield. Other buried information includes things like pagination and size or other extents:
300 __ |a 1 sound disc : |b analog, 33 1/3 rpm, stereo. ; |c 12 in.
300 __ |a 376 p. ; |c 21 cm.
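For a sense of what that separating-out code looks like, here is a minimal sketch using the examples above. The function names and the returned structure are my own invention, and the patterns only cover the simple cases shown here (no hyphenated ISBNs, no leaves or volumes).

```python
import re

# A 13-digit ISBN, or a 10-character one whose last character may be X.
ISBN_RE = re.compile(r"^\s*(\d{13}|\d{9}[\dXx])\b\s*(.*)$")


def split_020a(value):
    """Split '0812976479 (pbk.)' into (isbn, qualifier)."""
    match = ISBN_RE.match(value)
    if not match:
        return None, value.strip()
    return match.group(1).upper(), match.group(2).strip()


def extent_from_300(subfield_a, subfield_c):
    """Dig a page count out of $a and a size with its unit out of $c."""
    pages = re.search(r"(\d+)\s*p\.", subfield_a)
    size = re.search(r"(\d+)\s*(cm|in)\.", subfield_c)
    return {
        "pages": int(pages.group(1)) if pages else None,
        "size": (int(size.group(1)), size.group(2)) if size else None,
    }


if __name__ == "__main__":
    print(split_020a("0812976479 (pbk.)"))
    print(extent_from_300("376 p. ;", "21 cm."))
    print(extent_from_300("1 sound disc :", "12 in."))
```

Every system that reads these records carries some variant of this, each with its own blind spots.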
Once this analysis is done (and I do need help, yes, thank you!), it may be possible to compare MARC to the RDA elements and see where we do and don't have a match. I have a drafty web page where I am putting the lists I'm creating of RDA elements, but I will try to get it all written up on the futurelib wiki so it's all in one place. I encourage others to grab this data and play with it, or to start doing whatever you think you can do with the registered RDA vocabularies. And please post your results somewhere and let me know so that I can gather it all, probably on the wiki.
The Letters Keep Coming In
The letter is addressed to "Link+™ Member Libraries and ILL Partners." The subject line on Kochan's letter reads: Threat to CSULB Library's ILL Participation. He states that faced with budget cuts, not only this year but foreseeable for many years to come, CSULB decided to move to SkyRiver™ as their cataloging utility, with anticipated significant savings.
The next three paragraphs are worth quoting in their entirety:
"We notified OCLC of this decision, while at the same time advising them of the Library's intent to continue membership in OCLC, to continue to make use of OCLC interlibrary loan services, and to contribute records for our current and future acquisitions to OCLC for batch upload. OCLC's charge for batch upload was (until recently) posted on the OCLC website as 23¢ per record. That is the amount I referred to in my letter to the organization. I have subsequently learned that:
- The price schedule for batch downloading [sic, read: uploading] that contained the 23¢ charge has suddenly and mysteriously disappeared from the OCLC website
- Another academic library that chose to displace OCLC with SkyRiver reports that OCLC has quoted a revised charge for downloading their records that amounts to about $2.85 per record; it is a charge that they report would effectively (and one might not think coincidentally) offset the savings accrued from their change to SkyRiver."
Offsetting the cost of having a library move to another vendor may make some economic sense, but this is a matter that will need to get cleared up before other libraries move to SkyRiver thinking that they'll be able to upload their records to OCLC for $.23. MSU and CSULB were caught by surprise, which is very unfortunate.
de-coupling of vendor services
We talk a lot about de-coupling of software, so you are not stuck with a monolithic stack of software, but can mix and match components that meet your needs.
The same thing applies to vendor services. Apparently OCLC will not really let you share your holdings for ILL with them unless you also buy your cataloging copy from them. (How much of your cataloging copy? Few OCLC members get 100% of their copy cataloging from OCLC, despite the policy fiction to the contrary).
Okay, officially they’ll “let” you, but they’ll charge you a prohibitive enough fee to make it infeasible.
Cathy De Rosa, VP for the Americas and Global VP of Marketing for OCLC, sought to clarify the situation in a conversation with LJ….De Rosa framed MSU’s request as essentially a request for the cooperative to perform “data stewardship” duties for MSU’s records, in addition to the services provided by the resource-sharing subscription, but said “that’s not the way the cost-share model works today.”
It’s not entirely clear to me how sharing your holdings for ILL (OCLC’s original Raison d’être, right?) necessarily involves “data stewardship”, but I don’t know the details of how OCLC’s systems work.
OCLC may very well be constrained here by technical limitations of their system (their own lack of ‘decoupling’ in their in-house software), not simply by a business decision. Perhaps there is no efficient way for OCLC to load your holdings if you aren’t doing copy cataloging, so that it ends up costing them just the same either way.
If so, that’s unfortunate, for everyone involved. It seems to me that it’s in the interest of OCLC members, OCLC as an organization, and libraries that may be considering OCLC services — to let you share your holdings for ILL even if you aren’t buying cataloging copy from OCLC. It means that OCLC’s holdings registry for ILL sharing (and for WorldCat API’s) remains competitive, remains the best around. Driving people away from sharing holdings only hurts OCLC, does it not?
Unless they think they can trap people into buying (some large percentage) of their cataloging from OCLC by this “coupling”. But that ship has already left the harbor, I bet almost all OCLC members are buying non-trivial supplementary cataloging records from other vendors, and more and more will consider cheaper alternatives for larger portions of their cataloging. If OCLC can’t compete in quality-per-dollar, they aren’t going to be able to succeed in ‘trapping’ libraries into buying cataloging anyway. If they try, they’re just going to send their OTHER services, that libraries DO want to keep paying for, down the road to ruin too.
“We share the resources, and we share the costs,” she concluded, noting that the cooperative also invests significantly in activities such as: “data stewardship, infrastructure and standards support, FRBR, controlled vocabulary services, WorldCat Identities, crosswalks between metadata formats, record enhancement, automated record delivery from vendor partners, MARC format updates, name authorities management, audience-level data work, mapping and visual data discovery, and a growing number of WorldCat APIs for linking to Web services, to name a few of the shared services.”
That’s right, and those WorldCat API’s are really great and in my opinion significantly increase the value of an OCLC full membership. But OCLC’s going to have to compete on service and cost, they’re not (I predict, I hope) going to succeed by ‘trapping’ their customers into paying for services whether they like them or not, using a ‘lock-in’ model very reminiscent of the same methods we don’t like when our ILS vendors do it.
De-coupling, as with software, so with vendor services, so with cooperatives.
I think OCLC ought to not only allow holdings-sharing without record-buying, but ought to be aggressively getting OCLC numbers on other vendors’ records, to make it as efficient as possible for libraries to share their holdings with OCLC regardless of where they get their cataloging. (And it needs to be said again for repetition, that there are probably few OCLC members left who do no copycataloging from anywhere but OCLC). That will keep OCLC’s holdings database the best there is, and keep their services built on that holdings database (ILL, worldcat, worldcat API’s) without any competitive peer — even if their cataloging service doesn’t remain so, as it already has not.
Filed under: General
Happy Belated Birthday Dublin Core
New Orleans
Facts on File MARC Records
marc-json
We’ve all got a lot of data in MARC (that statement making sense shows that MARC is effectively a data vocabulary, not just a transmission standard, but anyway, moving on), that we need to sling around between applications, including for many of us “next generation” discovery tools that need to index it.
Marc21 binary format is the ‘native’ marc transmission format for our data. It’s got some benefits; it’s a ‘lowest common denominator’ that systems we work with are most likely to produce and consume; it’s fairly fast to de-serialize (I was going to say ‘parse’, but ‘deserialize’ is probably more accurate for a format like Marc21).
However binary Marc21 has got some significant problems too:
- If your programming language of choice doesn’t already have a robust, well-performing, free library for serializing/deserializing Marc21, it’s kind of a bear to write one. It’s a very weird format in some ways (offset data encoded as ascii numerals?), and an overly complex data format for contemporary standards. Just because you think you have a library available doesn’t necessarily mean that open source library is as robust or well-performing as you might hope.
- Just because an existing system (like an ILS) says it outputs Marc21 doesn’t necessarily mean it outputs legal Marc21. If some records are structurally illegal in certain ways, they may not be de-serializable on the other end, or may take more complex and less-well-performing de-serialization code on the other end. The weirdness and complexity of the marc21 format (see above) contributes to this prevalence of non-compliant output.
- Perhaps most significantly, binary Marc21 has a maximum length. A legal binary Marc21 record can’t be any larger than 99999 bytes (just under 100k), because the record length has to fit in five ascii digits. While this must have seemed larger than you’d ever want in the 1960s, currently it’s often not large enough for us — especially when you try to include ‘item’ information in a marc bib record (which isn’t standard, but is often done for various reasons).
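To show where the weirdness and the length limit come from, here is a stripped-down sketch of the binary layout: a 24-byte leader, a directory of 12-byte entries (tag, field length, start offset, all as ascii digits), then the field data. It is a toy under stated assumptions, not a usable MARC library: it treats every field as an opaque byte string, ignores indicators and subfields, hard-codes most leader positions, and does not check the 9999-byte per-field limit. The function names are mine.

```python
FIELD_END = b"\x1e"
RECORD_END = b"\x1d"
MAX_RECORD_LENGTH = 99999  # the leader has only five ascii digits for it


def build_record(fields):
    """Serialize [(tag, data_bytes), ...] into a bare-bones binary record."""
    directory = b""
    body = b""
    for tag, data in fields:
        data = data + FIELD_END
        # Directory entry: 3-char tag, 4-digit length, 5-digit start offset.
        directory += f"{tag}{len(data):04d}{len(body):05d}".encode("ascii")
        body += data
    base = 24 + len(directory) + 1  # leader + directory + its terminator
    total = base + len(body) + 1    # + record terminator
    if total > MAX_RECORD_LENGTH:
        raise ValueError("record length does not fit in five digits")
    leader = f"{total:05d}nam a22{base:05d} a 4500".encode("ascii")
    return leader + directory + FIELD_END + body + RECORD_END


def parse_record(raw):
    """Read the fields back out by walking the directory."""
    if int(raw[0:5]) != len(raw):
        raise ValueError("leader length does not match actual length")
    base = int(raw[12:17])
    fields = []
    pos = 24
    while raw[pos:pos + 1] != FIELD_END:
        entry = raw[pos:pos + 12]
        tag = entry[0:3].decode("ascii")
        size = int(entry[3:7])
        start = int(entry[7:12])
        # Drop the trailing field terminator from the data.
        fields.append((tag, raw[base + start:base + start + size - 1]))
        pos += 12
    return fields


if __name__ == "__main__":
    raw = build_record([("001", b"ocm12345"), ("005", b"20100305")])
    print(raw)
    print(parse_record(raw))
```

Note how much of the parser depends on those digit strings being exactly right; a system that miscounts by a byte produces a record that a strict reader will reject.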
To get around these problems, many people choose to work with MarcXML instead of binary Marc21 when they can. And MarcXML does get around the problems listed above pretty well, but involves a couple trade-offs which in some circumstances don’t matter, but in others do:
- A MarcXML file generally has a much larger file size than its equivalent Marc21.
- A MarcXML file is often significantly slower to deserialize than its equivalent Marc21.
In many cases, those issues don’t matter at all. But in some cases, they are unfortunate. (Like when you are exporting, re-indexing, and re-storing your entire multi-million-record Marc corpus).
So some people came up with the idea of marc in Json. If you can serialize marc in xml, why not do something very similar to serialize marc in Json in a standard way? Json is much more compact than XML, and typically faster to parse. While still being a standard beyond the library world (meaning there are tools to support it and validate it). And without the issues of marc21 binary including length limits.
In fact, I know of a couple people who independently had this idea of marc-json, but Bill Dueber did a little proto- mini- spec for a standard way to do marc in json, so that tools written by different people can be inter-operable.
I encourage anyone dealing with these issues to consider marc-json per Bill’s proto-mini-spec. I plan/hope to!
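To give a feel for the idea, here is a rough sketch of a record as plain JSON. I want to be clear that the layout below is an illustrative guess of my own and should not be taken as Bill’s actual spec (check that for the real key names and structure); the record contents are invented too.

```python
import json

# Illustrative layout only: control fields are [tag, value]; data fields are
# [tag, indicator1, indicator2, [[code, value], ...]].
record = {
    "leader": "00000nam a2200000 a 4500",
    "fields": [
        ["001", "ocm12345"],
        ["020", " ", " ", [["a", "0812976479 (pbk.)"]]],
        ["300", " ", " ", [["a", "376 p. ;"], ["c", "21 cm."]]],
    ],
}


def to_json(rec):
    """Serialize compactly; there is no length limit to run into."""
    return json.dumps(rec, separators=(",", ":"))


def from_json(text):
    """Deserialize with the standard library, no MARC-specific parser."""
    return json.loads(text)


def subfield_values(rec, tag, code):
    """Collect every value of subfield `code` in fields tagged `tag`."""
    values = []
    for field in rec["fields"]:
        if field[0] == tag and len(field) == 4:
            values.extend(v for c, v in field[3] if c == code)
    return values


if __name__ == "__main__":
    text = to_json(record)
    print(text)
    print(subfield_values(from_json(text), "300", "c"))
```

The round trip needs nothing beyond a stock JSON library, which is most of the appeal.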
Filed under: General
Harvard Business School open access policy: Oh, the irony
So Harvard Business School has approved an open access policy. (Thanks to Nicole Engard for the alert).
Under the HBS policy, like the previous policies, faculty agree to provide copies of their scholarly articles for distribution from the university’s DASH repository and grant the university a waivable license to distribute the articles.
This may strike some business librarians as kind of ironic. The popular Harvard Business Review (and their Case Studies) is published by Harvard Business School Publishing, which I assume is associated with the Harvard Business School.
HBR (and especially HBR Case Studies) are known to be some of the strictest publishers around when it comes to controlling their intellectual property. They try not to allow any library to supply a case study via Inter-Library Loan, they try to make sure one copy of a case study is purchased for every student in a class using it (no sharing a hard-copy!), etc.
So it seems a safe guess that for a Harvard Business School faculty member to get an article published in their own school’s Harvard Business Review… they’re going to have to waive their school’s open access policy. No copies of those articles going in the repo, at least not publicly accessible.
Oh, actually, I should have read the actual policy. They did think of that. The policy actually does say:
Since the policy will apply only to articles prepared for peer review, it thus does not apply to Harvard Business School Cases and Notes, or to articles written for the Harvard Business Review or other publications that are not peer-reviewed. The Dean or the Dean’s designate will waive application of the license for a particular article upon express direction by a Faculty member.
Earlier in their policy, they say “The Faculty of the Harvard Business School is committed to disseminating the fruits of its research and scholarship as widely as possible.” Is it just me, or does it seem like there’s an “except when the school, not some other publisher, makes money from it” implicitly tagged on at the end?
I think that’s what’s called “irony”, but I always get that confused with “please remind me what these policies are meant to accomplish?”
[I am curious to hear from a business grad student, faculty member, or librarian... how often is formal peer review used for business scholarship in general? If the HBR isn't peer reviewed, my guess is that most business scholarship isn't?]
Filed under: General
e-book pricing and publisher business models
Interesting article in today’s New York Times on e-book pricing and publisher business models.
While the article doesn’t explicitly mention the recent Amazon/MacMillan kerfuffle, it does supply some context to understand what was going on there.
Amazon had previously been paying a wholesale price that was (probably) 50% off the publisher-set retail price of e-books. This is the standard business relationship between retailers and publishers for print books, with 50% being a typical ‘discount’ a large retailer could get (it’s possible giganto retailers like Amazon got a bigger discount, and small retailers definitely sometimes get a smaller one, can be as small as 30% or even less, or no discount at all for some titles from small publishers, making them effectively unavailable to small retailers).
But then Amazon could actually sell the e-book for whatever price they wanted, the suggested retail price was only a suggestion. They could sell for less and reduce their own margin, or they could even sell for under their wholesale price as a “loss leader”. Again, this is all the way the traditional print book market works too.
MacMillan wanted to switch to an ‘agency’ model like the one the iTunes bookstore uses or will use. MacMillan wanted to be able to set the retail price themselves, and then Amazon as retailing “agent” would get a fixed percentage of the sale, probably less than 50% (I think iTunes is doing 30%).
Now, depending on what prices MacMillan ends up setting for e-books, MacMillan might end up even making less per copy under the model they wanted (and eventually got) than under what Amazon wanted. All the press coverage indicated that this wasn’t really about per-unit profit for the publisher, it was about MacMillan being unhappy that Amazon was selling their e-books at too low a price, they wanted control over the retail pricing.
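The arithmetic behind "might end up even making less per copy" is worth spelling out. This is a back-of-envelope sketch: the 50% discount and the 30% agent share are the guesses from the paragraphs above, and the $25.00 list price and $14.99 agency price are invented purely for illustration. Amounts are in cents to keep the arithmetic exact.

```python
def publisher_take_wholesale(list_cents, discount_pct=50):
    """Wholesale model: the publisher gets list price minus the discount,
    no matter what the retailer then charges the customer."""
    return list_cents * (100 - discount_pct) // 100


def publisher_take_agency(retail_cents, agent_pct=30):
    """Agency model: the publisher sets the retail price and keeps
    whatever is left after the agent's fixed share."""
    return retail_cents * (100 - agent_pct) // 100


if __name__ == "__main__":
    wholesale = publisher_take_wholesale(2500)  # hypothetical $25.00 list
    agency = publisher_take_agency(1499)        # hypothetical $14.99 price
    print(wholesale, agency, agency < wholesale)
```

With these made-up numbers the publisher takes $12.50 per copy under the wholesale model and $10.49 under the agency model, even though the agency model gives them control of the price.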
Why would MacMillan care about the retail price being too low, if MacMillan’s wholesale price remained the same regardless of the retail price Amazon set? Presumably they’re worried about establishing consumer expectations for the emerging e-book market.
But the NYT article cited above quotes another publisher with a slightly different concern, that might be more simpatico to the librarian book-loving set: They’re worried about cannibalizing their traditional print sales:
Another reason publishers want to avoid lower e-book prices is that print booksellers like Barnes & Noble, Borders and independents across the country would be unable to compete. As more consumers buy electronic readers and become comfortable with reading digitally, if the e-books are priced much lower than the print editions, no one but the aficionados and collectors will want to buy paper books.
“If you want bookstores to stay alive, then you want to slow down this movement to e-books,” said Mike Shatzkin, chief executive of the Idea Logical Company, a consultant to publishers. “The simplest way to slow down e-books is not to make them too cheap.”
I don’t know how much this is just meant to sound good to the book-loving public (and to bookstores and libraries used to hating on publishers), and how much this is really anyone’s actual concern. But it certainly could happen. That scenario is predicated on the assumption that consumers will eventually switch en masse to e-books and e-readers for recreational reading, and/or that e-books being cheaper than print books will be sufficient to push them there, if hardware etc keeps going like we expect.
Many in the library and book-loving world are skeptical of this premise, but I suspect it is in fact true. (I don’t love that it’s true; I have an aesthetic relationship with print books too, and also concerns about the social/political effects of abandoning them for e-books. But I suspect it’s true, love it or not.)
Now, interestingly, in the US, wholesalers “fixing” retail prices is in many cases an anti-trust problem. There are some cases where a wholesaler/manufacturer/publisher can get away with this, but in general it’s not allowed for a wholesaler to force a retailer to sell a product for a certain minimum price. Which is pretty much exactly what MacMillan just did to Amazon. I wonder if the FTC is considering it. If e-books really do continue increasing in market share to become a significant portion of the book market, we can expect the FTC will at least look into it. I’m not a lawyer, though, and can’t predict what they’d decide.
Filed under: General