Using Regex in MarcEdit to Fix Repeated Subfields in MARC records

Can you tell I’ve been doing a lot of MARC work?

UPDATE: Apparently this is possible as a one-step regex process. Go see Terry’s comment below! Ah well, live and learn.

The Problem

Today’s problem is repeated subfields. I don’t have a particular use case for this, except that I received a set of records where a bunch of fields with the same tag were all mushed into one line. For example,

650 \$aWomen travelers--British Columbia--Biography.$aFrontier and pioneer life--British Columbia.$aBritish Columbia--Description and travel.

Of course, if you try to validate this, MarcEdit will report that subfield $a is not repeatable in the 650. I needed to move the data from each subfield $a into its own 650. If you only have one or two records, it’s obviously easiest to fix them by hand, but that doesn’t work so well when you have lots of records.
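To spot the affected fields, the key test is simply whether a field’s data contains more than one $a. The sketch below is plain Python, not MarcEdit, and works on the subfield data of one field as a string. The function name has_repeated_a is hypothetical.

```python
import re

# Matches a subfield $a delimiter in MarcEdit's mnemonic notation.
SUBFIELD_A = re.compile(r"\$a")


def has_repeated_a(data: str) -> bool:
    """Return True if the field data holds more than one subfield $a."""
    return len(SUBFIELD_A.findall(data)) > 1


print(has_repeated_a("$aWomen travelers--British Columbia--Biography."
                     "$aFrontier and pioneer life--British Columbia."))  # True
```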

The Solution

Unfortunately, even with MarcEdit, I could not find a solution that avoided regular expressions. I tried all sorts of subfield swapping but still couldn’t get it to do what I wanted, so regex it is. I’ll use the example above as a point of reference, but the approach applies to any subfield repeated within a single field.

Changing the Subfields

The first step was to change the repeated $a subfields to subfield codes that don’t exist for the field. Why? Because if you try to move subfield $a, all of them move together, so the easiest way is to make them all different and then move them one at a time. Using the “Edit Field Data” option (under Tools), type in the field (in my case, “650”), then use:

Find: (\$a.+)\$a
Replace: $1$m

The find statement essentially says: look for ‘$a’, followed by one or more characters, followed by another ‘$a’. The match is then replaced with the first group (that is, everything inside the first set of parentheses), followed by ‘$m’. Because .+ is greedy, the match runs to the last $a in the field, so each pass renames only that last one. I specifically chose ‘$m’ because m and most of the subsequent letters are not defined subfields for 650. It took me a little while to discover that regex in MarcEdit is more or less standard, except that the first group is written $1 (instead of \1) in the replacement, which I believe is normal in certain flavours of regex (but not what I’m used to). Repeat as many times as needed, using the next unused letter each time. Since the records I worked with had as many as 4 subfields ‘a’ in one entry, I started with ‘m’ and ended with ‘p’. I actually kept going until I tried ‘q’, when MarcEdit told me no modifications were made.
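Here is a minimal Python sketch of the same repeated Find/Replace, for anyone who wants to see the passes outside MarcEdit. It assumes the field’s subfield data is a plain string. The helper rename_repeated_a and the list of placeholder codes are hypothetical. Python’s re writes the group reference as \g<1> rather than MarcEdit’s $1, and a $ in a Python replacement string is taken literally. The loop stops at the first letter that changes nothing, much like MarcEdit reporting that no modifications were made.

```python
import re

# Greedy ".+" runs to the LAST "$a", so each pass renames exactly one
# repeated $a: the last one still present in the field.
REPEATED_A = re.compile(r"(\$a.+)\$a")


def rename_repeated_a(data: str, codes: str = "mnopqrstu") -> str:
    """Rename every $a after the first to a placeholder code, one pass per code.

    The codes are assumed to be unused in the field (true for m-u in a 650).
    """
    for code in codes:
        # Equivalent of MarcEdit's "Replace: $1$m", with \g<1> in place of $1.
        new_data, count = REPEATED_A.subn(r"\g<1>$" + code, data)
        if count == 0:  # nothing left to rename
            break
        data = new_data
    return data


field = ("$aWomen travelers--British Columbia--Biography."
         "$aFrontier and pioneer life--British Columbia."
         "$aBritish Columbia--Description and travel.")
print(rename_repeated_a(field))
```

With the three-heading example, the first $a stays, the second becomes $n and the third becomes $m. Because the greedy match always hits the last remaining $a, the placeholder letters end up in reverse field order.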

Moving Subfields to New Entries

This is a fairly standard procedure if you’re familiar with MarcEdit. Simply use the “Swap Field Utility”. Choose one of your placeholder (non-existent) subfields and move it to an unused field. For example:

650 m TO 699 a

Depending on which fields you’re editing, you may want to set the indicators on the new fields. Then cycle through all the placeholder subfields you used until you’re done. For example,

650 n TO 698 a
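Conceptually, each swap pulls one placeholder subfield out of the 650 and turns it into a new field. The Python sketch below models a record as a list of (tag, subfield data) pairs and ignores indicators. swap_subfield is a hypothetical stand-in written for illustration, not MarcEdit’s implementation of the swap utility.

```python
import re

# Hypothetical record model: (tag, subfield data); indicators are ignored.
Field = tuple[str, str]


def swap_subfield(record: list[Field], src_tag: str, src_code: str,
                  dst_tag: str, dst_code: str) -> list[Field]:
    """Move each src_tag $src_code into a new dst_tag field as $dst_code."""
    # "$" + code plus everything up to the next "$" (or the end of the field).
    pattern = re.compile(r"\$" + re.escape(src_code) + r"([^$]*)")
    kept: list[Field] = []
    moved: list[Field] = []
    for tag, data in record:
        if tag == src_tag:
            moved += [(dst_tag, "$" + dst_code + value)
                      for value in pattern.findall(data)]
            data = pattern.sub("", data)
        kept.append((tag, data))
    return kept + moved


record = [("650", "$aWomen travelers--British Columbia--Biography."
                  "$nFrontier and pioneer life--British Columbia."
                  "$mBritish Columbia--Description and travel.")]
record = swap_subfield(record, "650", "m", "699", "a")  # 650 m TO 699 a
record = swap_subfield(record, "650", "n", "698", "a")  # 650 n TO 698 a
for tag, data in record:
    print(tag, data)
```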

Changing New Entries to the Correct One

Finally, use the ‘Copy Field Data’ function to move all the fields you used (699, 698, etc. in my example) back to the correct field (650 in my example). Remember to check the “Delete Source Field” option.
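This last step amounts to retagging the temporary fields. Using the same hypothetical Python model (a record as (tag, data) pairs, indicators ignored), copying with “Delete Source Field” checked behaves roughly as follows. copy_and_delete is an illustrative name, and the sketch simply appends the copies at the end. Where MarcEdit actually places them in the record is not modelled here.

```python
# Hypothetical record model: (tag, subfield data); indicators are ignored.
Field = tuple[str, str]


def copy_and_delete(record: list[Field], src_tags: set[str],
                    dst_tag: str) -> list[Field]:
    """Copy each field tagged in src_tags to dst_tag, then drop the sources."""
    copies = [(dst_tag, data) for tag, data in record if tag in src_tags]
    kept = [(tag, data) for tag, data in record if tag not in src_tags]
    return kept + copies


record = [("650", "$aWomen travelers--British Columbia--Biography."),
          ("699", "$aBritish Columbia--Description and travel."),
          ("698", "$aFrontier and pioneer life--British Columbia.")]
for tag, data in copy_and_delete(record, {"699", "698"}, "650"):
    print(tag, data)
```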

Extending It

It gets more complicated if several different subfields repeat and each one needs to stay together with its $a. For example,

650 \$aWomen travelers$zBritish Columbia$aFrontier and pioneer life$zBritish Columbia$aBritish Columbia$xDescription and travel.

This is particularly difficult because the subfields that follow each subfield $a may differ ($z, $x, $y, etc.). I won’t work out all the regex here, but in terms of logical steps, I suggest this:

1. For every $[character] that is not $a, replace the $ with an unused character, such as ^. You would get \$a…^z…$a…^z…$a…^x
2. Now follow the steps above to move each repeated subfield into its own entry.
3. As the last step, replace all ^ with $ in the field you’re working with (650 in the example).
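The masking idea can be sketched in Python as follows. The ^ mask and the function names are assumptions for illustration. Step 2 is collapsed into a single findall instead of the rename-and-swap passes, just to show how the masked $z/$x text travels along with its $a.

```python
import re


def mask_non_a(data: str, mask: str = "^") -> str:
    """Step 1: replace the "$" of every subfield except $a with the mask."""
    if mask in data:
        raise ValueError(f"mask {mask!r} already occurs in the field data")
    # A "$" not followed by "a" is the delimiter of some other subfield.
    return re.sub(r"\$(?!a)", lambda _match: mask, data)


def split_keeping_groups(data: str, mask: str = "^") -> list[str]:
    masked = mask_non_a(data, mask)
    # Step 2 (collapsed): each "$a" plus its masked tail becomes one field.
    groups = re.findall(r"\$a[^$]*", masked)
    # Step 3: turn the mask back into "$" inside every new field.
    return [group.replace(mask, "$") for group in groups]


field = ("$aWomen travelers$zBritish Columbia"
         "$aFrontier and pioneer life$zBritish Columbia"
         "$aBritish Columbia$xDescription and travel.")
for data in split_keeping_groups(field):
    print("650", data)
```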

Last Thought

It took me a while to work everything out, so hopefully this helps someone in the future (even if it’s just me needing to do something like this again).

Published by

Cynthia

A librarian learning the ways of technology, accessibility, metadata, and people

One thought on “Using Regex in MarcEdit to Fix Repeated Subfields in MARC records”

  1. Actually, there is an easier way to do this in one step. You can use the edit field function, and the special mnemonic “/r” to handle recursive processing on a field. This was developed specifically for this type of problem. You can read about it generically here: http://blog.reeset.net/archives/1295.

    In your example, the approach I would have taken to take each subfield $a and move it into its own subject field, would have been as follows:
    1) Open the Edit Field Data Function (under the Tools Menu in the MarcEditor)
    2) Then use the following criteria:
    Field: 650
    Find: (\$a[^$]*)
    Replace: $+/r
    Check the Use Regular Expression Option

    This will allow you to perform this operation in one step.

    –tr
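For comparison, here is a rough Python emulation of what the comment’s pattern does. Every match of (\$a[^$]*), meaning $a plus everything up to the next $, becomes its own field. In .NET regex substitutions (MarcEdit is a .NET application), $+ stands for the last captured group, which here is the only one. /r is MarcEdit’s own mnemonic, and this sketch only approximates it by emitting one field per match. split_each_match is a hypothetical name.

```python
import re

# The Find pattern from the comment: "$a" plus everything up to the next "$".
FIND = re.compile(r"(\$a[^$]*)")


def split_each_match(tag: str, data: str) -> list[tuple[str, str]]:
    """Emit one (tag, data) field per match: a rough stand-in for "$+/r"."""
    # With a single capturing group, group(1) is also the last captured group.
    return [(tag, match.group(1)) for match in FIND.finditer(data)]


field = ("$aWomen travelers--British Columbia--Biography."
         "$aFrontier and pioneer life--British Columbia."
         "$aBritish Columbia--Description and travel.")
for tag, data in split_each_match("650", field):
    print(tag, data)
```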
