Coyle's InFormation
Comments on the digital age, which, as we all know, is 42. Karen Coyle
SkyRiver Sues OCLC over Anti-Trust
(Full document now here! Thanks Marshall Breeding!)
The newly created competitor to OCLC's cataloging services, SkyRiver, is suing OCLC in federal court in San Francisco. (Press release, PDF) I have only seen the press release, so until someone figures out how to free up the actual legal document, what we know is:
SkyRiver is claiming that OCLC is attempting to "monopolize the markets for cataloging services, interlibrary lending, and bibliographic data, and attempting to monopolize the market for integrated library systems, by anticompetitive and exclusionary practices." The press release refers to OCLC's "tax-free profits" and says that OCLC has used those profits to purchase 14 for-profit companies.
The press release quotes Leslie Straus, President of SkyRiver, as saying:
“In the process OCLC has punished its own members who have tried to seek out lower cost alternatives like SkyRiver.”
This almost certainly refers to the Michigan State University (MSU) issue, which I reported on here. In that case, OCLC appears to have charged MSU an unusually large fee for uploading records to WorldCat after MSU began cataloging on SkyRiver instead of OCLC.
No doubt a good part of the concern here is over OCLC's plans to provide Web services that comprise the full functionality of an integrated library system (ILS), thus competing with current ILS vendors. You probably know that SkyRiver was started by Jerry Kline, owner of Innovative Interfaces. If OCLC successfully launches a full-service option for libraries, Innovative and other ILS vendors will suffer. As the representative of a major ILS company explained to me a few years ago, the library market is a zero-sum game: every time one vendor wins, others must lose, because the number of customers is not growing. The library market is a pie that can be divided into any number of slices, but the pie itself stays the same size. This makes the rise of any one company a threat to all. In the commercial marketplace, vendors compete on functionality and price. With its non-profit status, OCLC has a distinct advantage: it doesn't pay federal income tax on the revenues it brings in. That said, given its size and the depth of its involvement in day-to-day library operations, it is plausible that OCLC would be a formidable competitor for ILS vendors even without its non-profit status.
I cannot comment on the charges of anti-trust because the press release does not give enough information. Hopefully we will get more details about this suit in the near future.
Categories: LIS, foreign
Catching up: OCLC, GBS, LOD
Some short comments on recurring themes:
OCLC Record Use Policy
OCLC has finalized its record use policy. The content is substantially the same as it was in the previous draft, which I commented on. There is one important improvement, however: the text clarifies OCLC's claims to copyright.
While, on behalf of its members, OCLC claims copyright rights in WorldCat as a compilation, it does not claim copyright ownership of individual records.
Of course, claiming copyright and actually having the right are not the same thing, especially with databases. Here's what BitLaw says:
Databases as Compilations: Databases are generally protected by copyright law as compilations. Under the Copyright Act, a compilation is defined as a "collection and assembling of preexisting materials or of data that are selected in such a way that the resulting work as a whole constitutes an original work of authorship." 17 U.S.C. § 101.
Generally, carefully selected compilations may make the "original work of authorship" cut; I'm not convinced that a union catalog of library holdings does.
Google Books
We are still waiting to hear from the judge in the Google Books case. (Every time I write that, I check to see whether the decision has come out in the last hour.) Meanwhile, Google Book Search (GBS) continues to operate in Internet time. Google has so many publishers on board with its partners program that GBS is becoming a serious rival to Amazon. It has even announced that it will begin selling e-books. The opening screen is the exact opposite of the Google Search screen: it loads many dozens of book covers and requires significant scrolling to reach the bottom. Google has added personalization options ("my library") and lets you create multiple "shelves" to organize your materials.
Google was first sued in 2005. Five years is a very long time where technology is concerned. In 2005 the e-book was considered dead; now, with the Kindle and the iPad, e-books are alive and well and everyone is trying to get into that game. In the time since 2005, Google has pretty much shown the publishing industry that it can benefit from the online presence Google is providing. The settlement reads as though it was written in another era, trying to solve problems that may not really be considered problems today. The only remaining issue is that of orphan works, and if we could do a decent analysis of copyright holdings, I suspect that the number of orphan works would not be all that large.
Linked Library Data
At ALA there was a one-day preconference on linked data and a half-day un-conference attended by about 50 people. There are notes from the un-conference, which broke out barcamp-style into six discussion groups.
The World Wide Web Consortium (W3C) has an incubator group on linked library data. This group is tasked with spending one year figuring out how to jump-start the creation of linked data in the library world.
There are ongoing efforts at the Library of Congress to produce vocabularies, and of course the RDA vocabularies are available (and almost finalized). Ross Singer has announced that some of the MARC codes are available (I presume on his own site). FRBR is being defined in linked data form by IFLA.
We've got just about everything but ... linked data. I'm thrilled that things are moving forward, but frustrated that I still can't see usable results. Deep breath; patience.
FRBR and Sharability
One of the possible advantages to using FRBR as a bibliographic model is that it can provide us with sharable bits in the form of the defined entities. I've been working on creating a test set of records to illustrate some linked data concepts, and so I began thinking about how the data would break out into sharable units. It turns out to be... an interesting question.
Work
Let's start with the Work, which I believe many people have high hopes for. I have a book in hand which I will use for this illustration. Because this is a book, there are only a few possible data elements in the Work, and these are:
Title of the work: Mort
Preferred title for the work: Mort
Date of work: 1987
Place of origin of the work: England, UK
As you can see, there isn't a lot of information in the Work entity itself. In many cases, a cataloger will not know the date of the work, and may not know where the work was written, in which case you could have just the title, and the entire Work entity would be:
Title of the work: Mort
What is obviously missing here is the name of the author. That, however, is not an attribute of the Work in FRBR but an entity of its own: Person, Corporate Body, or Family. Without the name of the creator (where appropriate), the Work clearly isn't terribly useful on its own. So I am going to add that creator from FRBR Group 2:
Work:
Title of the work: Mort
Preferred title for the work: Mort
Date of work: 1987
Place of origin of the work: England, UK
Person:
Author: Terry Pratchett
OK, now we are getting somewhere. We have an author and a title. This is a "unit" that someone could grab or link to and make use of. The two aren't really separable, which is what puzzles me a bit about FRBR. You couldn't reuse this Work for another book with the same title (and there are others with this same title); this Work entity can represent only the Work by Terry Pratchett. As far as I am concerned, the creator entity and the Work entity are inseparable in the description of a work. A creator can be associated with many Works, but a Work cannot be reused with different creators. Once the creator(s) of the Work are defined, that relationship is fixed as part of the Work's identity.
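The idea that creator and title together fix a Work's identity can be sketched in Python. This is a minimal illustration, assuming a hypothetical `Work` class (not part of any FRBR or RDA tooling) whose equality includes its creators; "A. N. Other" is a made-up author standing in for the other books titled Mort.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Person:
    name: str

@dataclass(frozen=True)
class Work:
    # Creators are compared and hashed along with the title, so a bare
    # title is not enough to identify a Work.
    title: str
    creators: tuple[Person, ...]

pratchett_mort = Work("Mort", (Person("Terry Pratchett"),))
# Hypothetical second author: another book that happens to share the title.
other_mort = Work("Mort", (Person("A. N. Other"),))

print(pratchett_mort == other_mort)  # False: same title, different Works
print(pratchett_mort == Work("Mort", (Person("Terry Pratchett"),)))  # True
print(len({pratchett_mort, other_mort}))  # 2
```

Making the dataclass frozen keeps the identity fixed once the creators are set, which mirrors the point that the creator relationship is part of what the Work is.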
We could leave Work as it is here, but if you want to include subject headings in your sharing, they need to be included in the shared Work, because subject headings in FRBR are only associated with the Work. Given that, our sharable Work becomes:
Work:
Title of the work: Mort
Preferred title for the work: Mort
Date of work: 1987
Place of origin of the work: England, UK
Person:
Author: Terry Pratchett
Subject:
Topic: Fantasy fiction, English
Topic: Discworld (Imaginary place) -- Fiction
This is the unit that needs to be created so we can share Works.
Expression
Now let's move on to the Expression, the real bugbear of FRBR. For books, Expression has few data elements. In this case we have:
Date of expression: 1987
Language of expression: English
All perfectly fine and well, but clearly not something that can stand alone. As with the Work, this Expression is not usable with just any English-language work written in 1987; it's not sharable in that sense. This Expression must be associated irrevocably with a particular Work, in this case the Work we created above. There will be some link that essentially says:
E:identifier --> expresses --> W:identifier
On second thought: the Expression can also have important creator/agent roles, such as translator, editor, or adaptor (and possibly others related to music that I'm not knowledgeable about), so it, too, should include those for sharing. In fact, probably all of the Group 2 to Group 1 relationships need to be included in a sharing situation. So, for a translation, we get:
Expression
Date of expression: 1987
Language of expression: French
Person
Translator: J-P Sartre
The unit of sharing here must be the expanded Expression plus the expanded Work (with its Group 2 and Group 3 entities). This illustrates something that has bothered me a bit about the FRBR Group 1 entities: the dependency inherent in the WEMI (Work, Expression, Manifestation, Item) hierarchy. WEMI essentially must be created as a single thing with multiple parts. This is true even of the Manifestation.
Manifestation
The Manifestation is seemingly the richest and therefore the most independent of the FRBR Group 1 entities, but as we'll see, without the Work and Expression you do not get a useful set of data elements. Here is what we have for our Manifestation:
Title proper: Mort
Statement of responsibility: Terry Pratchett
Title proper of series: Discworld
Date of publication: 2001
Copyright date: 1987
Place of publication: New York, NY
Publisher's name: HarperTorch
Extent of text: 243 pages
Dimensions: 17 cm
Carrier type: volume
Mode of issuance: single unit
Media type: unmediated
What is lacking here? Well, there's no link to the entity for the author, which would provide an identification of the author and any variant forms of the author's name. There's no language of text, because that's in the Expression. And there are no subject headings, because those are associated with the Work. If this were a translation, there would be no link to the Work under its original title. The Manifestation entity is very readable, but if we are sharing for the purposes of copy cataloging, it has to be bundled with the Work and Expression to be usable.
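What "bundling" might mean for copy cataloging can be sketched as flattening the three entities into one record. The dictionaries, the `bundle` helper and the "Subjects" key are hypothetical, loosely following the element names used above; this is not a MARC or RDA serialization.

```python
# Hypothetical entity records, keyed by the element names used in the post.
work = {
    "Preferred title for the work": "Mort",
    "Author": "Terry Pratchett",  # Group 2 entity attached to the Work
    "Subjects": ["Fantasy fiction, English",
                 "Discworld (Imaginary place) -- Fiction"],  # Group 3
}
expression = {
    "Date of expression": "1987",
    "Language of expression": "English",
}
manifestation = {
    "Title proper": "Mort",
    "Statement of responsibility": "Terry Pratchett",
    "Date of publication": "2001",
    "Publisher's name": "HarperTorch",
    "Extent of text": "243 pages",
}

def bundle(manifestation: dict, expression: dict, work: dict) -> dict:
    """Merge the entities into one flat record for copy cataloging.

    The Manifestation is unpacked last, so if an element name appeared at
    more than one level, the Manifestation's value would win.
    """
    return {**work, **expression, **manifestation}

# On its own, the Manifestation has neither a language nor subjects.
print("Language of expression" in manifestation)  # False
record = bundle(manifestation, expression, work)
print(record["Language of expression"])  # English
print(record["Subjects"][0])  # Fantasy fiction, English
```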
Our Sharable Units
So this is what we get as sharable units:
- Work + Group 2 (creator) + Group 3 (subject)
- Expression + Group 2 (creator) + Work + Group 2 (creator) + Group 3 (subject)
- Manifestation + Expression + Group 2 (creator) + Work + Group 2 (creator) + Group 3 (subject)
Now we just need a system to test this out.
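As a toy way to test the idea, the three sharable units above can be computed by starting at an entity and following links upward. The triples, identifiers (`w1`, `e1`, `m1`, `p1`, `p2`, `s1`, `s2`, `e9`) and relationship names (`embodies`, `expresses`, `realized_by`, `created_by`, `has_subject`) are all hypothetical; this is a sketch, not an established system.

```python
# Toy triples: (subject, predicate, object). All identifiers are hypothetical.
TRIPLES = [
    ("m1", "embodies", "e1"),      # Manifestation -> Expression
    ("e1", "expresses", "w1"),     # Expression -> Work
    ("e1", "realized_by", "p2"),   # Expression -> Group 2 agent (e.g. translator)
    ("w1", "created_by", "p1"),    # Work -> Group 2 creator
    ("w1", "has_subject", "s1"),   # Work -> Group 3 subject
    ("w1", "has_subject", "s2"),
    ("e9", "expresses", "w1"),     # another Expression of the same Work
]

# Only links pointing "up" WEMI or out to Group 2/3 entities are followed,
# so a Work's unit never pulls in the Expressions that point at it.
FOLLOW = {"embodies", "expresses", "realized_by", "created_by", "has_subject"}

def sharable_unit(start: str) -> set[str]:
    """Collect every entity reachable from `start` through the followed links."""
    unit: set[str] = set()
    stack = [start]
    while stack:
        node = stack.pop()
        if node in unit:
            continue
        unit.add(node)
        stack.extend(o for s, p, o in TRIPLES if s == node and p in FOLLOW)
    return unit

print(sorted(sharable_unit("w1")))  # ['p1', 's1', 's2', 'w1']
print(sorted(sharable_unit("e1")))  # ['e1', 'p1', 'p2', 's1', 's2', 'w1']
print(sorted(sharable_unit("m1")))  # ['e1', 'm1', 'p1', 'p2', 's1', 's2', 'w1']
```

The three calls correspond to the three bullets: a Work unit, an Expression unit that carries its Work, and a Manifestation unit that carries both.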
Bib data and the Semantic Web
I know that I've gone on and on about transforming bibliographic data into a semantic web format. And whenever folks have asked me: "What will it look like?" I haven't had a good response. Now there is something to show you: Freebase.
Freebase is a database of interlinked semantic web "statements": essentially what the SemWeb types call "triples." The statements come from a variety of open data sources such as Wikipedia, TVDB.com, a science fiction fan database, and Open Library. By placing a user interface over these data, they now have a searchable, navigable site that can link books to movies to (theoretically) music to science to... well, anything where linked data is available.
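A pool of such statements can be pictured as subject-predicate-object triples gathered from several sources and navigated by following shared identifiers. The identifiers, titles and predicate names below are invented for illustration; they are not Freebase's actual schema or API.

```python
from collections import defaultdict

# Triples pooled from different (hypothetical) sources. A shared identifier
# such as "book:1" is what lets a film record link to a book record.
triples = [
    ("book:1", "title", "Example Novel"),   # e.g. from a library source
    ("book:1", "author", "person:1"),
    ("person:1", "name", "Example Author"),
    ("film:1", "title", "Example Film"),    # e.g. from a film database
    ("film:1", "adapted_from", "book:1"),
]

# Index statements by subject and by object so links can be followed both ways.
by_subject = defaultdict(list)
by_object = defaultdict(list)
for s, p, o in triples:
    by_subject[s].append((p, o))
    by_object[o].append((s, p))

def describe(node):
    """Return statements about a node plus the statements that point to it."""
    return {"out": by_subject[node], "in": by_object[node]}

print(describe("book:1"))
```

Indexing by object as well as subject is what lets a user start at the book and still discover the film that points to it.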
Their book data isn't as strong as it should be, given that they claim to have imported the Open Library file (I suspect it was only partially imported). The Freebase entry for Emily Dickinson lists only two works, whereas Open Library has 137 Works for Dickinson and WorldCat Identities lists 3,388. Also, their approach is more "popular" than rigorous. However, there is no reason why this same technique could not be used with "pure" library data, and library catalogs could make use of any of the data in such a database because it is all available through linking and APIs. A database like Freebase essentially serves as a huge pot of available, reusable information.
In its current form, Freebase would not be sufficient for library data sharing, although it could provide an interesting testing ground. What we need to work out for libraries is a way to version and source content so that you know who provided each statement and when, and to make it easy to contribute new information or improvements to the information in a sensible and automated way. There is no reason why we could not create a "LibBase" that consists solely of what libraries would consider to be authoritative information: a kind of linked data WorldCat. That data would have to be able to interact with other data on the Web, and by doing so libraries would become discoverable on the Web. It would be logical for projects like Freebase to link to the library data. Library users would have a rich, navigable information base that could help them follow (or even make) connections between library resources, connections that are much less evident in today's catalogs. Some technical magic would need to occur to allow users to move seamlessly from the whole world to their local library, but I don't think that's going to take rocket science to solve.
There is a group of interested souls planning to get together on the Friday morning of ALA DC to begin some exploration of how we might make semantic web technology work for libraries. There will be announcements on various lists (I'm guessing NGC4LIB, CODE4LIB, LITA-L and RDA-L, at the very least). If you can get to ALA a little early, please mark that slot on your calendar. It'll be a free-floating, working, barcamp-style meeting, as I understand it.
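One simple way to "version and source" content is to attach the contributor and a date to every statement, keeping superseded statements rather than overwriting them. The `Statement` fields, the `latest` rule, the contributor labels and the earlier "1986" value are all hypothetical, chosen only to show a correction being recorded.

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class Statement:
    subject: str
    predicate: str
    obj: str
    source: str      # who contributed the statement (hypothetical labels)
    asserted: date   # when it was contributed

log = [
    # Hypothetical earlier value, later corrected by another contributor.
    Statement("work:mort", "date_of_work", "1986", "library-A", date(2009, 3, 1)),
    Statement("work:mort", "date_of_work", "1987", "library-B", date(2010, 5, 12)),
    Statement("work:mort", "preferred_title", "Mort", "library-A", date(2009, 3, 1)),
]

def latest(subject, predicate):
    """Return the most recent statement for a subject/predicate pair, or None."""
    matches = [st for st in log if st.subject == subject and st.predicate == predicate]
    return max(matches, key=lambda st: st.asserted, default=None)

current = latest("work:mort", "date_of_work")
print(current.obj, current.source)  # 1987 library-B
# The older statement is still in the log, so the history remains visible.
print([st.source for st in log if st.predicate == "date_of_work"])  # ['library-A', 'library-B']
```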
Social aspects of subject headings
You've probably played the "my favorite subject heading" game when geeking out with librarian friends. Here's some additional fuel in case you've run out of zingers.
The Open Library takes the LC subject headings and breaks them apart at the subfield level into subjects, persons, places, genres, and times. It also includes some BISAC headings retrieved from Amazon, so the subject list is not "pure." The separate subject entries obtained are similar to, but not the same as, OCLC's FAST headings, and look much like some facets that appear in library catalogs.
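Breaking a heading apart at the subfield level can be sketched as mapping MARC subfield codes to facet types. The mapping assumes the usual meaning of the codes in a 650 field ($a topical term, $x general subdivision, $y chronological, $z geographic, $v form) and leaves out persons, which come from other fields such as 600; it is not the Open Library's actual code, and the sample heading is built for illustration.

```python
# Assumed mapping from 650 subfield codes to facet types.
FACET_FOR_CODE = {
    "a": "subjects",
    "x": "subjects",
    "y": "times",
    "z": "places",
    "v": "genres",
}

def split_heading(subfields):
    """Split a list of (code, value) pairs into facet buckets."""
    facets = {"subjects": [], "times": [], "places": [], "genres": []}
    for code, value in subfields:
        facet = FACET_FOR_CODE.get(code)
        if facet:  # codes outside the mapping are ignored
            facets[facet].append(value.strip().rstrip("."))
    return facets

# Sample heading: Women -- Employment -- England -- 19th century -- Statistics.
heading = [("a", "Women"), ("x", "Employment"), ("z", "England"),
           ("y", "19th century"), ("v", "Statistics.")]
print(split_heading(heading))
# {'subjects': ['Women', 'Employment'], 'times': ['19th century'],
#  'places': ['England'], 'genres': ['Statistics']}
```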
The Open Library database currently holds about 24 million records for books (at least partially de-duped). In a recent dump of subjects, the total number of different subjects came out as 1,278,539. Of those, 336,638 were of the "topical" variety, that is either a 650 $a or a 65X $x. The top 25 are as follows:
825168 History
322928 Biography
212822 Politics and government
206519 Congresses
192968 History and criticism
184183 Fiction
123838 Law and legislation
119333 Bibliography
95555 Juvenile literature
93364 Description and travel
90866 Economic conditions
84787 Criticism and interpretation
74878 Claims
71468 Social life and customs
70926 Social conditions
70563 Catalogs
69205 Private Bills
69191 Private bills
66480 Education
63410 Exhibitions
63301 World War, 1939-1945
60235 Foreign relations
60068 Philosophy
56219 Dictionaries
55460 Study and teaching
I find it interesting that with the exception of "World War, 1939-1945" these appear to have the function of qualifiers, and I'm thinking that it would be interesting to contrast the $a and $x terms. My guess is that these are $x, but that not all $x are of this nature.
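A tally like the top-25 list is a frequency count over all topical subject strings. A minimal sketch, assuming the subjects have already been extracted into a Python list (the sample list here is made up):

```python
from collections import Counter

# Made-up sample standing in for every topical subject string in the dump.
subjects = ["History", "Biography", "History", "Private Bills",
            "Private bills", "History", "Deer as pets"]

counts = Counter(subjects)
for subject, n in counts.most_common(3):
    print(n, subject)

# Exact string matching keeps "Private Bills" and "Private bills" apart,
# as in the list above; case-folding merges them (and lowercases everything).
folded = Counter(s.casefold() for s in subjects)
print(folded["private bills"])  # 2
```

Whether to fold case is a design choice: exact matching reports the data as catalogued, while folding gives a truer count of the concept.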
Of the subfields, 164,342 appear only once in the database. These are a great source of interesting and unusual headings, including "Social aspects of adzes" and "Deer as pets." In fact, the "Social aspects..." tail is so amusing that I have made a file of those with a count of 1.
The full file of topical subjects is 8 megabytes, but it can probably yield innumerable hours of library cocktail-hour amusement. (It is plain text, one subject per line, in the format: count, tab, subject.) I will also look into names, organizations, places and times as subjects.
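Since each line of the file holds a count, a tab, and a subject, pulling out the once-only "Social aspects" headings takes only a few lines. This assumes UTF-8 text with one heading per line; the `read_counts` helper and the sample lines are illustrative stand-ins for the real file, and matching on the substring "Social aspects" is an assumption about how those headings are worded.

```python
import io

def read_counts(lines):
    """Parse 'count<TAB>subject' lines into (count, subject) pairs."""
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip blank lines
        count, subject = line.split("\t", 1)
        yield int(count), subject

# Sample lines standing in for the real file, which could instead be opened
# with open(path, encoding="utf-8").
sample = io.StringIO(
    "825168\tHistory\n"
    "1\tSocial aspects of adzes\n"
    "1\tDeer as pets\n"
)

singletons = [s for n, s in read_counts(sample) if n == 1]
social = [s for s in singletons if "Social aspects" in s]
print(social)  # ['Social aspects of adzes']
```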
OCLC record use policy
OCLC has issued a new draft of its record use policy for member comment. As others have remarked, the new draft is better worded and seemingly less draconian than the previous (withdrawn) policy, but its substance has not changed one iota. There are many things wrong with the policy itself, but the primary problem is not its text; it is the way OCLC has chosen to define the problem it is trying to solve. Here are some of the issues I have with the approach:
1. Pushing the river
The central issue is that OCLC wants to limit downstream use of bibliographic data that is stored in WorldCat. This simply cannot be done. The same data is also stored in individual library catalogs, in some union or consortial catalogs, and in bibliographic software used by many hundreds of thousands of researchers around the world. It also often closely resembles data created outside of OCLC's sphere, such as through publisher and retailer channels. Sharing of this data is absolutely necessary for the furtherance of intellectual pursuits and scientific progress, and for the market in new and used items. Ironically, the policy would restrict use of the data by OCLC members without restricting its use by the multitude of non-members. It would be unacceptable even if it were workable, which it isn't.
2. One-sided
The policy has a section on member rights and responsibilities, but no such section on OCLC's rights and responsibilities. (Nope, I was wrong about that. The section does exist; I must have missed it.) The policy carries the assumption that, if anything, members are the problem and OCLC the solution, and it gives no sense of being the result of an agreement between the parties. OCLC can make unilateral decisions about record use, such as its agreement with Google, but members must ask OCLC's permission for many uses. Nothing here acknowledges that the interests of a library and the interests of OCLC could come into conflict, nor how such a conflict would be resolved. All in all, it reads as if the purpose of membership were to sustain OCLC (instead of the purpose of OCLC being to support libraries).
3. Transparency
OCLC, or one of OCLC's governing groups, will make decisions about record use. Yet the policy gives no criteria for making these decisions, no timelines, no reporting back to members, and no mechanism for feedback. Will members know how "their" WorldCat records are being used? Will they have any choice in the matter? Will there be a way to know what requests for use have come in to OCLC, which ones have been accepted, and which have been turned down? If WorldCat is such a "community good," shouldn't the community at least have this information about the use of that good?
4. No options
In most agreements there is some give and take: if you do X, you will get Y. The OCLC record use policy does not give members options. An example of an option would be: if you do your cataloging on OCLC, ILL will cost you $X; if you do not do your cataloging on OCLC, uploading your records will cost you $Y and ILL will cost you $Z. With clear options, libraries can decide what is best for them in their particular situation. Without clear options, libraries have no way to make rational decisions about their participation in OCLC. It's not a religion, it's a business relationship, and it should be treated like one.
5. Avoids facing the problem
The problem that OCLC is trying to fix arises, as far as I can tell, from OCLC's particular mix of revenues and expenses. Most of OCLC's revenue comes from its cataloging service, so members choosing to catalog elsewhere is the problem. Exhorting members to keep their records in their own databases so that others cannot build a large database of bibliographic data is not a solution to this problem. Large bibliographic databases do and will exist. If their existence is a threat to OCLC, then the jig is already up. Rather than stew about what others are doing with bibliographic data, OCLC needs to find a balance of income and expenses that meets the needs of its member libraries, and that might include making some hard decisions about OCLC services.
6. Ignores market forces
If someone can do it better, more cheaply, or more conveniently, why should libraries stick with OCLC as their vendor? When purchasing materials, library systems, or other services, libraries move to new vendors when they see advantages. With the economic downturn, libraries are scrambling to cut costs wherever they can. No amount of loyalty to the "collective" can overcome the economic situation libraries find themselves in today. In a sense, OCLC seems to expect libraries to act irrationally by sticking with its service even when something more economical comes along. Libraries obviously cannot afford to do this.
I cannot tell what steps OCLC's members can take at this point. The web site points to a community forum where people can post comments, but posting comments on the policy doesn't begin to solve the underlying problems described here. If I were a member, I think I would feel like a rowboat hitching a ride behind the Titanic, hoping it will get me through the ice floes. Nothing is unsinkable, as we have unfortunately found out in the past.