Tuesday, December 18, 2007

Definitions in RDA Scope

Originally posted to the RDA list.

The RDA scope document defines some basic concepts that presumably will be used throughout RDA. Some of these concepts it takes from the Dublin Core Abstract Model. In particular, it uses "literal value surrogate" and "non-literal value surrogate." These are defined in footnotes of the scope document as:

The term literal value surrogate is used as defined in the DCMI Abstract Model: “a value surrogate for a literal value, made up of exactly one value string (a literal that encodes the value)”.

The term non-literal value surrogate is used as defined in the DCMI Abstract Model: “a value surrogate for a non-literal value, made up of a property URI (a URI that identifies a property), zero or one value URI (a URI that identifies the non-literal value associated with the property), zero or one vocabulary encoding scheme URI (a URI that identifies the vocabulary encoding scheme of which the value is a member), zero or more value strings (literals that represent the value)”.

I found a more concise definition of this in a PPT by Lutz Maicher, University of Leipzig:

- a resource which is a non-literal value is represented by a proxy
- a resource which is a literal value is represented as literal

In the above, "literal" means a text string. So "Melville, Herman" is a literal, while "http://www.loc.gov/names/#n_79006936" is a non-literal proxy (because it points to the authority record, which is where the actual value is held).
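As a rough illustration, the two footnote definitions quoted above can be written down as data structures. This is only a sketch: the class and field names are my own, and the property URI under `example.org` is a hypothetical placeholder, not an identifier from RDA or DCMI.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class LiteralValueSurrogate:
    # Exactly one value string: the literal that encodes the value.
    value_string: str


@dataclass
class NonLiteralValueSurrogate:
    # Fields follow the footnote as quoted; the names are illustrative.
    property_uri: str
    value_uri: Optional[str] = None  # zero or one
    vocabulary_encoding_scheme_uri: Optional[str] = None  # zero or one
    value_strings: List[str] = field(default_factory=list)  # zero or more


# The author's name as a plain text string.
author_as_literal = LiteralValueSurrogate("Melville, Herman")

# The same author as a proxy pointing to the authority record.
author_as_non_literal = NonLiteralValueSurrogate(
    property_uri="http://example.org/terms/creator",  # hypothetical property URI
    value_uri="http://www.loc.gov/names/#n_79006936",
    value_strings=["Melville, Herman"],
)
```

Note that the non-literal form can still carry the name as a value string; what makes it non-literal is that the value itself is the thing identified, not the string.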

The scope document then states:

- A label is represented by a literal value surrogate.
- A quantity is represented by a non-literal value surrogate.
- A quality is represented by a non-literal value surrogate.
- A type is represented by a non-literal value surrogate.
- A role is represented by a non-literal value surrogate.

However, the element analysis in the scope document shows that quantities can be represented identically to labels (and I suspect that all other data types can as well). That document has (and here there is a diagram that I cannot reproduce in email):

label
[resourceURIref] -> rda:title_proper -> [plain value string]

quantity
[resourceURIref] -> rda:extent -> [typed value string]^^[syntax encoding scheme]
- or -
[resourceURIref] -> rda:non_linear_scale -> [plain value string]

Given that the label example and the second example under quantity are structurally the same, I don't see how one can be a literal and one a non-literal.
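To make the comparison concrete, here is a small sketch that encodes the three diagrams above as records and compares their shapes. The `Statement` type and the `shape` helper are hypothetical names of my own; the bracketed placeholders are copied from the diagrams rather than filled in with real data.

```python
from typing import NamedTuple, Optional


class Statement(NamedTuple):
    subject: str
    prop: str
    value_string: str
    syntax_encoding_scheme: Optional[str] = None


def shape(statement: Statement) -> str:
    # Describe only the form of the value, ignoring which property is used.
    if statement.syntax_encoding_scheme is not None:
        return "typed value string"
    return "plain value string"


title = Statement("[resourceURIref]", "rda:title_proper", "[plain value string]")
extent = Statement(
    "[resourceURIref]",
    "rda:extent",
    "[typed value string]",
    "[syntax encoding scheme]",
)
scale = Statement("[resourceURIref]", "rda:non_linear_scale", "[plain value string]")

# The label and the second quantity example have the same shape.
same_shape = shape(title) == shape(scale)
```

Under this encoding, nothing but the property name separates the label from the second quantity example, which is the point of the objection.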

I see two possibilities here. One is that all of the above has no real effect on the development of RDA, and therefore any errors in interpretation of the DCMI model can be ignored. The other is that the misunderstanding (which I think it is, though I am waiting to be proven wrong) is significant, and therefore needs to be corrected as part of the development of RDA.

My gut feeling is that it is the former -- I don't see references to these definitions in the RDA text itself, and all values are treated as simple value strings. For example, dates are just text:

Record the date of the expression by giving the year or years alone.
1940 (p. 6-47 5rda_sec2349.pdf)

And quantities seem to be just text strings as well:
46 slides
12 cm (from 5rda-parta-ch3rev.pdf)

Thus, at least as far as the RDA text is concerned, there are only literal values.

If this is not the case, would someone please present the argument for a different understanding? Thank you.

6 comments:

Anonymous said...

I'm afraid this is only tangentially related to your question, Karen, but I was interested in reading any replies. I do not, however, subscribe to the RDA-L list (too much other stuff going on to try to digest the whole firehose of commentary), and the only archives I can find are humungous text files on the JSC website. Before I make a fool of myself by commenting about how a committee designing a metadata standard for the next century should probably be fully exploiting the information dissemination tools of the previous century (web-based mailing list archives, in this case -- but I didn't take on the "jester" moniker for nothing), I thought I should ask about the possibility of the existence of a more usable list archive somewhere. Is there such a beast out there?

Karen Coyle said...

Peter, as far as I know those are the only archives. *sigh* In any case, I can tell you that the only comment on the list was someone stating that the language of the definitions was too opaque so there was no way of knowing what they meant. *double sigh* Not another peep. If you have any thoughts on this matter I'd love to hear them. It amazes me that people are not worried that the underlying model could be wrong. Do they think it is irrelevant? Are they willing to ignore it because it is hard to understand?

Anonymous said...

Karen

Let me make my usual disclaimer that I'm no expert on the DCAM but I'm hoping my inaccurate understanding will prompt someone else to correct me. :-) I've thought some more about this since I replied to your post on RDA-L and I think the model RDA presents can be justified.

Firstly, any value that refers to a vocabulary has to be treated as a non-literal, because it refers to a vocab. encoding scheme (VES). I know some of the RDA quality elements do that so they are being modelled by definition as non-literals.

Then, it makes sense to treat anything that is not just a string of characters as a non-literal, because you can make much richer descriptions of it. So, an author (Herman Melville) is an entity with a whole set of attributes that the statement of responsibility label ('by Herman Melville') doesn't have. This also goes for RDA's quantities, which, for example, can be represented in different units.

Now, the point you are making, if I understand you correctly, is that the RDA text -- at this stage anyway -- is actually treating quantity values as if they were literals. However, this is not incompatible with modeling quantities as non-literals, because you can, if you wish, still use a literal value string to represent a non-literal. At some point in the future, RDA could change the rule that said, say, 'enter the height of a book as a string of characters "xx cm."' to 'enter an integer representing the height of the book in centimeters, eg "30"'.

The point is, by deciding up front that we may want to describe something as not just a string of characters we have the flexibility to describe it in a number of different ways: a URI, a vocabulary value, or just a plain string of characters.

-Irvin
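The rule change described in this comment, from a string such as "30 cm" to an integer with a known unit, can be sketched as a small parser. This is an illustration only: `Quantity` and `parse_extent` are hypothetical names, and the pattern assumes the simple "number, space, unit" form seen in the RDA examples quoted in the post.

```python
import re
from typing import NamedTuple


class Quantity(NamedTuple):
    amount: int
    unit: str


def parse_extent(text: str) -> Quantity:
    # Accept a whole number, whitespace, then a unit, e.g. "12 cm".
    match = re.fullmatch(r"\s*(\d+)\s+(\S.*?)\s*", text)
    if match is None:
        raise ValueError(f"not a simple quantity: {text!r}")
    return Quantity(int(match.group(1)), match.group(2))


height = parse_extent("12 cm")
slides = parse_extent("46 slides")
```

A bare year such as "1940" has no unit and is rejected by this pattern, so it stays a plain string.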

Karen Coyle said...

Thanks, Irvin. You are right about the possibility of non-literals. As a matter of fact, I was about to make an adjustment to my post to acknowledge that RDA has many vocabulary lists that, as soon as these are given an identifier, are non-literals. The units of extent could indeed be non-literals, although the actual extent ("30") would be a literal. Notes, of course, are literals, especially once one has codified the standard notes like "Includes bibliographical references."

So what we conclude is that one could define literals and non-literals, which I agree is true. But if you look at the RDA Element Analysis, you see that, for example, title of a series is a literal, although this is actually an authority-controlled element that should have an identifier -- thus, non-literal. All identifiers are defined as literals. (This seems especially odd to me.) Names of persons are considered literals, even though we all assume that these would be authority controlled.

I still think that their definitions are off, as are their examples. Rather than defining literal and non-literal based on whether it is a value string or has a surrogate (usually an identifier), they define literal based on the role of the field within the description. So they have these things called "labels" which they consider the fields that distinguish one "thing" from another, and they say that those labels are literals. In fact, some of the things that they consider to be labels are not necessarily literals. So I think they've hung the definition of literal/non-literal off the wrong part of their model. I also think that the definitions in the Element analysis don't necessarily jibe with the values in the table that follows. This could be because of the flaws in the definition.

Anonymous said...

Thanks Karen, I've checked that document and I see what you mean. I agree, it seems wrong to declare an authority controlled field as a literal value.

The problem seems to be caused by making the non-literal/literal distinction at the indecs category level. More generally, mixing the indecs and DCAM models, in my opinion, is making things unnecessarily complicated (they're complicated enough as it is!).

I'd suggest taking Ockham's Razor to the models: drop the indecs categories and make the DCAM literal/non-literal distinction on an element-by-element basis.

Karen Coyle said...

Great advice, Irvin. I've started going through the element analysis, marking elements that I think are mis-defined in the literal/non-literal sense. I'll try to make a list or something else useful from it, but I agree that the indecs model isn't helping here.