Code4lib Day 1: Lightning Talks Notes

Al Cornish – XTF in 300 seconds (Slides in PDF)

  • technology developed and maintained by California Digital Library
  • supports the search/display of digital collections (images, PDFs, etc.)
  • fully open source platform, based on Apache Lucene search toolkit
  • Java framework, runs in Tomcat or Jetty servlet engine
  • extensive customization possible through XSLT programming
  • user and developer group communication through Google Groups
  • search interface running on Solr with facets
  • can output in RSS
  • has a debug mode

Makoto Okamoto – saveMLAK (English)

  • Aid activities for the Great East Japan Earthquake through collaboration via wiki
  • input from museum, library, archive, kominkan = MLAK
  • 20,000 data records on the damaged area
  • Information about places, damage, and relief support
  • Key Lessons
    • build synergy with Twitter
    • have offline meet ups & training

Andrew Nagy – Vendors Suck

  • vendors aren’t really that bad
  • used to think vendors suck, and that they don’t know how to solve libraries’ problems
  • but working for a vendor allows him to make a greater impact on higher education than he could from one university (he started working for Serials Solutions)
  • libraries’ problems aren’t really that unique
  • together with the vendor, a difference can be made
  • call your vendors and talk to the product managers
  • if they blow you off, you’ve selected the wrong vendor
  • sometimes vendor solutions can provide a better fit

Andreas Orphanides – Heat maps

The library needed grad students to teach instructional sessions, but how do you set a schedule when classes have very inflexible schedules? So, he took two semesters of instructional session data, using the date and start time, but the start times and durations were inconsistent. The question is how best to visualize the data.

  • heatmap package from clickheat
  • time of day – x-dimension
  • day of the week – y-dimension
  • could see patterns in a way that you can’t in a histogram or bar graph
  • heat map needn’t be spatial
  • heat maps can compare histogram-like data along a single dimension or scatter-like plot data to look for high density areas
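As a rough illustration of the idea (not the clickheat package used in the talk), the sketch below bins sessions into a day-of-week by hour-of-day grid in plain Python and prints it as a text heat map. The function names, the shading characters, and the sample sessions are all made up for this example. A session is counted in every hour slot it overlaps, which is one way of coping with inconsistent start times and durations.

```python
from collections import Counter
from datetime import datetime, timedelta

DAYS = ["Mon", "Tue", "Wed", "Thu", "Fri", "Sat", "Sun"]


def bin_sessions(sessions):
    """Count sessions per (weekday, hour) cell.

    `sessions` is a list of (start: datetime, minutes: int) pairs.
    A session is counted once in every hour slot it overlaps.
    """
    grid = Counter()
    for start, minutes in sessions:
        end = start + timedelta(minutes=minutes)
        slot = start.replace(minute=0, second=0, microsecond=0)
        while slot < end:
            grid[(slot.weekday(), slot.hour)] += 1
            slot += timedelta(hours=1)
    return grid


def render(grid, first_hour=8, last_hour=18):
    """Draw the grid as text: hours run left to right, days top to bottom."""
    shades = " .:*#"  # darker character = more sessions
    peak = max(grid.values(), default=1)
    lines = []
    for day in range(7):
        row = ""
        for hour in range(first_hour, last_hour + 1):
            count = grid.get((day, hour), 0)
            # Scale the count to a shade index, rounding up so 1 is never blank.
            row += shades[(count * (len(shades) - 1) + peak - 1) // peak]
        lines.append(f"{DAYS[day]} |{row}|")
    return "\n".join(lines)


# Made-up sample data: (start, duration in minutes)
sessions = [
    (datetime(2012, 2, 6, 9, 30), 50),   # Monday 9:30-10:20
    (datetime(2012, 2, 6, 9, 0), 80),    # Monday 9:00-10:20
    (datetime(2012, 2, 8, 13, 15), 60),  # Wednesday 13:15-14:15
]
grid = bin_sessions(sessions)
print(render(grid))
```

With real data, the dense cells show which day and time combinations are already heavily booked.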

Gabriel Farrell – ElasticSearch

Nettie Lagace from NISO

  • National Information Standards Organization (NISO)
  • work internationally
  • want to know: What environment or conditions are needed to identify and solve interoperability problems?

Eric Larson – Finding images in book page images

A lot of free books exist out there, but you don’t have the time to read them all. What if you just wanted to look at the images? Because a lot of books have great images.

He used curl to pull all those images out, then used ImageMagick to manage the images. The processing steps:

  1. Convert to greyscale
  2. Contrast boost x8
  3. Convert image to 1px wide by full height
  4. Sharpen image
  5. Heavy-handed grayscaling
  6. Convert to text
  7. Look for long continuous line of black to pull pages with images
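The last step can be sketched in plain Python. This is not the code from the talk: it assumes the 1px-wide column has already been turned into a list of greyscale values (0 for black, 255 for white), and the function names, the threshold, and the `min_fraction` cut-off are hypothetical choices for illustration.

```python
def longest_black_run(column, threshold=128):
    """Return the length of the longest run of consecutive dark pixels.

    `column` is a sequence of greyscale values (0 = black, 255 = white);
    anything below `threshold` counts as black.
    """
    best = current = 0
    for value in column:
        if value < threshold:
            current += 1
            best = max(best, current)
        else:
            current = 0
    return best


def looks_like_image_page(column, min_fraction=0.25, threshold=128):
    """Flag a page whose longest dark run spans at least `min_fraction` of its height."""
    if not column:
        return False
    return longest_black_run(column, threshold) >= min_fraction * len(column)


# Made-up columns: lines of text give short dark runs, an illustration a long one.
text_page = [255, 0, 255, 0, 255, 255, 0, 255] * 10
image_page = [255] * 20 + [10] * 40 + [255] * 20

print(looks_like_image_page(text_page))
print(looks_like_image_page(image_page))
```

The cut-off would need tuning against real scans, since dense text or a dark page edge could also produce long runs.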

Code is on GitHub

Adam Wead – Blacklight at the Rock Hall

  • went live, soft launch about a month ago
  • broken down to the item level
  • find bugs he doesn’t know about for a beer!

Kelley McGrath – Finding Movies with FRBR & Facets

  • users are looking for movies, either particular movie or genre/topic
  • libraries describe publications, e.g. the date of the DVD, not of the movie
  • users care about versions e.g. Blu-Ray, language
  • Try the prototyped catalog
  • Hit list provides one result per movie, can filter by different facets

Bohyun Kim – Web Usability in terms of words

  • don’t over-rely on the context
  • but context is still necessary for understanding, e.g. “mobile” means on the go, and what users want on the go
  • sometimes there is no better term, e.g. “Interlibrary Loan”
  • brevity will cost you, e.g. “tour” vs. “online tour”
  • Time ran out, but check out the rest of the slides

Simon Spero – Restriction Classes, Bitches

OWL:

  • lets you define properties
  • control what the property can apply to
  • control the values the property can take
  • provides an easy way to do this
  • provides a really confusing way to do this

The easy way is usually wrong!

When you define what a property can apply to and its range, the definition applies to every use of the property. An alternative is Attempto.

Cynthia Ng – Processing & ProcessingJS

  • Processing: open source visual programming language
  • Processing.js: related project to make Processing available through web browsers without plugins
  • While both tend to focus on data visualizations, digital art, and (in the case of PJS) games, there are education-oriented applications.
  • Examples:
    • Kanji Compositing – allows visual breakdown of Japanese kanji characters, interact with parts, and see children.
    • Primer on Bezier Curves – scroll down to see interactive (i.e. if you move points, replots on the fly) and animated graphs.
  • Obvious use might be instructional materials, but how might we apply it in this context? What other applications might we think of in the information organization world?

Since doing the presentation, I have already gotten one response from Dan Chudnov, who did a quick re-rendering of newspaper data from OCR data. Still thinking on (best) use in libraries and other information organizations.

It’s over for today, but if you’d like more, do remember that there is a livestream and you can follow on Twitter, #c4l12, or IRC.

Evaluating the UBC Catalogue

Disclaimer: This is actually a copy of my assignment for cataloguing class, so I was being as critical as possible within a set page limit. Although it could use a few improvements, there are actually a lot of things I like about the UBC catalogue, which aren’t reflected here.

University of British Columbia (UBC) Library Catalogue Evaluation

  • Type: Academic library
  • Size: 6.1 million volumes
  • Key characteristics: Large research and teaching collection, diverse, multilingual, depository, unique subject descriptions for First Nations materials, large digital collection
  • OPAC: Custom on top of Voyager ILS;  Discovery Layer: Summon

Last year, our library, the University of British Columbia (UBC) Library, completed a five-year strategic plan. As part of the strategic plan, the library has focused on advancing research, learning and teaching excellence by emphasizing certain values, including services excellence and stewardship of collections and institutional resources. The library’s catalogue is the key resource for providing users with access to our physical and e-book collections and to the related services. Nevertheless, the catalogue can be improved in many ways to help achieve the library’s mission.

Searching & Viewing Records

One of the best features of the catalogue is the number of search and browse options, including call number browsing, subject browsing, and the various search options, in addition to the sorting options for results. However, some of the features do not work as expected and could use improvement.

Some issues are fairly small and can be easily changed, which will make it easier for users to find items. For example, the brief view shows a blank space next to ‘Author’ if no ‘main author’ exists, but the first or all of the ‘other authors’ could be listed in this case to help users find specific items. Similarly, in full record view, the statement of responsibility is labelled ‘Title’, which may confuse users because it contains more than just the title. Either the label could be changed or removed, or the subfields and punctuation could be used to separate title from contributors. Title keyword search also includes contents, which makes sense when users are searching chapter titles, but this is unlikely to be what users expect and will return too many results, particularly because it searches the entire contents field, which may include authors. Users should be given the option to include contents.

Item availability also uses unclear terms when there are multiple items, showing ‘multiple holdings available’ even when all or none are available. This feature could be further improved by showing availability in a preferred branch, particularly when filtered for a specific location. Similarly, e-books should be marked ‘online’ or ‘eBook’ (to match the Summon display) instead of ‘no item information, ONLINE’, which seems to imply that the item is not ‘in’ the library, and a separate filter should be available instead of showing both online and physical books when choosing a specific location.

Perhaps one of the most critical issues is the sorting of results by publication date. At times it is incorrect due to the lack of a subfield resulting in incorrect order, such as:

260__  |b Four Worlds International Institute for Human and Community Development  |c Lethbridge, Alberta: 1998 (bid: 2836032)[1]

Another reason records seem to be out of order is that the sorting appears to be based on manufacture date while the display shows the publication date, so that the following is sorted among the 2011 records:

260__  |a [Ottawa, Ont.] :  |b [Interagency Secretariat on Research Ethics],  |c 2010  |e (Saint-Lazare, Quebec :  |f Canadian Electronic Library,  |g 2011). (bid: 5031001)

From the user’s perspective, the edition or publication date is likely to be of more importance, and sorting by subfield $c (instead of $g) would provide a displayed order that would not be confusing. However, even taking these factors into account, there seems to be no consistent order, particularly with records that have no publication date. When in ascending order, the items with no dates are interspersed with those with dates, and some records will show out of order, such as:

260__ |c 2002 (bid: 2822328) above 260__ |b Brock University  |c 1996 (bid: 2835632)

Descending works better, but when users see one sort function not working, they may assume others do not function either, and may be deterred from using the catalogue again in either case.
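To make the suggested behaviour concrete, here is a minimal sketch of sorting on the year in subfield $c, with undated records placed last instead of interspersed. It is only an illustration and says nothing about how Voyager actually sorts. It parses the 260 display strings quoted above (without the tag and indicators) using a simple regular expression, whereas a real system would read the MARC records directly. The function names and the undated sample record are hypothetical.

```python
import re


def parse_subfields(field):
    """Split a display string such as '|a Place : |b Name, |c 2010' into (code, value) pairs."""
    return [(code, value.strip()) for code, value in re.findall(r"\|(\w)\s*([^|]*)", field)]


def publication_year(field):
    """Return the first four-digit year found in subfield c, or None if there is none."""
    for code, value in parse_subfields(field):
        if code == "c":
            match = re.search(r"\b(\d{4})\b", value)
            if match:
                return int(match.group(1))
    return None


def sort_key(field):
    """Sort by the subfield c year; undated records go after all dated ones."""
    year = publication_year(field)
    return (year is None, year or 0)


records = [
    "|a [Ottawa, Ont.] :  |b [Interagency Secretariat on Research Ethics],  |c 2010  "
    "|e (Saint-Lazare, Quebec :  |f Canadian Electronic Library,  |g 2011).",
    "|b Example Press",  # hypothetical record with no date
    "|c 2002",
    "|b Four Worlds International Institute for Human and Community Development  "
    "|c Lethbridge, Alberta: 1998",
    "|b Brock University  |c 1996",
]

ordered = sorted(records, key=sort_key)
years = [publication_year(r) for r in ordered]
print(years)
```

Because the year is searched for anywhere inside $c, the record with the place name miscoded into $c still sorts under 1998, and the record with a $g of 2011 sorts under its 2010 publication date.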

Marginalized Collections Needing Improvement

While catalogue records are overall well formed, some collections (where there may not be full copy catalogue records) are lacking in comparison. For example, English non-fiction records are overall of high quality, particularly new titles, most of which have tables of contents. When the contents are detailed, they need to be well formatted for easier reading, which is one area that could be improved; alternatively, some of the content could be stripped in cases such as:

… Chapter 1 : number relationships / senior author and senior consultant, Marian Small ; student book authors, Jack Hope … [et al.] ; teacher’s resource chapter authors, Jason Chenier, Katherine Pratt ; assessment consultants, Sandra Carl Townsend, Gerry Varty… (bid: 4005625).

There are also some cases where minor errors occur, such as extra punctuation at the end of a note, but none that would significantly impact a user’s experience. The records are also generally up to date, with most records being last updated in 2008 even for older items (e.g. bid: 1651678 from 1902).

In comparison, records for the non-fiction First Nations collection are generally very brief. Although some exceptions exist with electronic or new popular non-fiction books, contents are frequently empty, or contain partial contents (only 1-3 lines), often poorly formatted. Records for First Nations resources are also more likely to have multiple records for the same work, where duplicate records have only minor differences, such as First Nations education policy in Canada  (bid: 4598655 & 4598483) where only contents differ in format. An addition which would greatly improve the use of the catalogue for the collection would be to make the local subject access fields (690), such as:

FIRST NATIONS – BAND GOVERNMENT – HISTORY – ONTARIO
FIRST NATIONS – BANDS – ELECTIONS – HISTORY – ONTARIO (bid: 2833524)

browsable, listing all items for a subject as with other subject access points. It is particularly important for these records to be well maintained as the library promotes unique services and subject descriptors for the First Nations collection in support of the growing First Nations programs at the university.

Similarly, French non-fiction records tend to be somewhat brief with sparse or non-existent contents even for new works with the exception of electronic monographs. Many of the French works only have one or two subject access points, and while rare, some have none at all (e.g. L’évaluation formative des apprentissages en français, langue seconde bid: 2697408). Errors are also frequent, particularly with series entries, such as:

 830 _0 |a Bibliothèque française et romane. Sér. D: Initiation, textes et documents ;  |v 5 (bid: 1430529)

which is missing the part subfield, $p, resulting in a narrower series search. Another example:

 830 _0  |a Bibliothèque française et romane. Sér. A: Manuels et études linguisitiques, 14. (bid: 1879073)

is not subfielded at all, resulting in a series search which would include the volume number.

More problematic is that few records use a uniform title, only variant titles or notes. Although a title or variant title search may not be a problem with multiple editions, the lack of a uniform title is especially a problem with translations, which are more prominent in the French collection as many monographs are translations from other languages. While a French title may have a note specifying it is a translation, such as with Enseigner la lecture : revenir à l’essentiel (bid: 3807358), the reverse is not true, meaning the user cannot search for or even know of translations of a text except possibly by searching or browsing by author. In addition, some records do not have a note of the original work name, particularly in the case that the original work is not in English, such as with Spinoza contre Kant, et la cause de la verité spirituelle (bid: 1656761) which is a translation of Spinoza und sein Kreis : historisch-kritische Studien über holländische Freigeister (bid: 1656978). Furthermore, French titles must be searched with diacritics and will provide incorrect results otherwise, inconveniencing searchers.
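One common way to make a search indifferent to diacritics is to fold both the query and the indexed title to an unaccented form before comparing them. The sketch below shows that idea with Python’s standard `unicodedata` module. It is an assumption about how such matching could work, not a description of the catalogue’s search engine, and the function names are made up.

```python
import unicodedata


def fold(text):
    """Lower-case text and strip combining marks so 'français' and 'francais' compare equal."""
    decomposed = unicodedata.normalize("NFKD", text)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def title_matches(query, title):
    """Return True if the folded query occurs anywhere in the folded title."""
    return fold(query) in fold(title)


title = "L’évaluation formative des apprentissages en français, langue seconde"

print(title_matches("evaluation formative", title))
print(title_matches("Évaluation Formative", title))
print(fold("über holländische Freigeister"))
```

Folding at both index and query time means users get the same results whether or not they type the accents.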

Recommendations

To summarize, the following actions are recommended:

  • Check for and merge duplicate records
  • Check for consistent and correct use of subfields, particularly when copy cataloguing
  • Improve records in currently marginalized collections
  • Use uniform titles when appropriate
  • Make all subject access points browsable, including local First Nations subjects
  • Make display and search more user friendly:
    • Always show author names, not only when it is a principal access point
    • Show simply ‘online’ or ‘eBook’ for location of electronic monographs
    • Change or remove ‘title’ label in full record view
    • Provide option for user to search chapter titles or contents in ‘title keyword’ search
    • Change availability to show preferred or filtered location
    • Fix publication sort and change to publication date (instead of manufacture date)
    • Allow searching with and without diacritics in all languages

While not all of these actions are feasible, particularly in the short term, many of the recommendations can be implemented over time, integrated into the workflow or as part of catalogue maintenance.

Possible Solution

While the catalogue could use many improvements, many have to do with the interface in terms of display and searching. Rather than putting effort into implementing the related recommendations, time and resources could be focused on ameliorating the MARC records for use in the web discovery layer, Summon. The data from our MARC records operate well with Summon, which already properly organizes by date and filters by location without online resources and shows location with available item first. While it does not have all the features of the OPAC, it may be possible to add them. Furthermore, Summon has a mobile version, allowing greater, more flexible access to our records.


[1] Refers to the bib record ID in the permanent URL, http://resolve.library.ubc.ca/cgi-bin/catsearch?bid=

AACR2 and MARC: Rules that Give You Individuality

The last couple of weeks in cataloguing have been on descriptive cataloguing using AACR2 (Anglo-American Cataloguing Rules, 2nd edition) and MARC (MAchine-Readable Cataloging) coding. If ever we think that librarians cannot be decisive, then one area where they can be is cataloguing. Our instructors did not lie about this, and yet, being decisive and being consistent are not entirely the same thing.

Considering the number of rules in AACR2, I was initially under the impression that it would be like APA citation. Essentially, that there is a rule for everything and no matter who does it, it will look the same.  Obviously, the areas left for local use (such as most of the MARC fields with a 9) will differ between libraries, as well as specific code classification, but I thought the descriptive part would be uniform. Then I discovered that I was quite wrong.

Title Information or Not?

Despite the numerous rules, there are many areas that leave room for interpretation. One of the items I had for our assignment was a directory for an auto exhibition. The main title was fairly clear, but then I wondered whether the location (the exhibition hall) which was on the title page should be listed as other title information considering it was written underneath the title almost as if it was a subtitle.

Another issue to consider is the primary language (which would be listed first) in a bilingual book. Would you do it based on the primary language of your library or would you use any other clues you could find? (I used both since the centrefold picture was in the same direction as the primary language of the assignment.)

How much Publisher Information to include?

As publisher information can be from a variety of sources, how much would you include? In the case that there is no (clear) publisher, which is more important? Distributor? Printer? Copyright holder?

Taiwan Directory Verso

In the end, I somewhat made up the publication statement and came up with this:

Taipei : Printed by Wuchou Color Photoengraving for Taiwan External Trade Development Council [organizer], 2008

Notes

Finally, there are notes. The extent to which they are filled out, and exactly how, is up to the cataloguer, which of course means that they will differ. Interestingly, they may not be as different as one might think, as there are a fair number of rules surrounding the order and formatting, and MARC coding will even separate numerous notes into specific fields. A record’s notes may be more or less complete, but having looked at various catalogue entries for the same item, they are fairly consistent.

Right or Wrong?

What I begin to wonder is who’s to say which way is right or wrong? Who might be able to say which way is better? I’m starting to think there must be a listserv of some sort for this sort of thing that maybe we students just don’t know about yet…