Skip to Main Content

Brown University Scholarly Resources

Open Refine


OpenRefine is a great tool for cleaning data, but especially for reconcialing FAST subject headings. The program runs locally on your computer using terminal while using your web browser as the user interface. You can download the program here.


Importing Data

Openrefine supports the upload of multiple files including mrc files. However, using MARCEDit to import mrc files creates an easier to use structure.

  • In MARCEdit under Tools > OpenRefine, select Export.
  • Select your mrc or mrk file and export it as a .json file
  • Upload the json file into OpenRefine
  • In the preview box, parse all the data by clicking next to the first curly bracket
  • Click 'Create Project'
Animated image of selecting how to parse the data

Editing

Facets

Facets are excellent ways to filter and examine the data in your project. It will list each unique column content and keep a count of the number of rows in the column that contain that exact word or phrase. This is a great way to limit the data you are viewing to a specific field, or to keep track of the number of fields that contain a unique phrase or note. The Facet function can be found in the column dropdown menu under Facets.

Transform

Using basic GREL you can batch transform cells. Using facets to limit the visible data to only your selected cells then using transform, is a great way to batch edit specific cells. Learn more about GREL here. The Transform function can be found in the column dropdown menu under Edit Cells > Transform.

Reconcile

Using OpenRefine's reconcile feature can be a powerful way to add linked data or find subject headings based on keywords. Brown has its own FAST reconciliating service you can query using OpenRefine. Below is an example of how to Reconcile a 653 field that is full of comma separated keywords and then use those keywords to create FAST subject headings.

  • Filter “_-rows-_-Tag”
    • Click on the down arrow button next to the column title
    • Facet > Text Facet
    • Click on “651” to limit the display to only 651 fields
  • Split “_-rows-_-Content” into several cells
    • Click the down arrow button next to the column title
    • Edit cells > Split multi-valued cells…
    • Split cells by the comma
  • Select both the 651 and the (blank) facet on the Tag filter
  • Open Reconcile on the Content column
    • Click the down arrow button next to the column title
    • Reconcile > Start reconciling…
    • Add Standard Service
    • https://dlibwwwcit.services.brown.edu/fast-reconcile/reconcile
    • Select the Fast Reconciliation Service
      • Select “All FAST terms”
      • Start Reconciling
    • Use your best judgment in selecting FAST headings. Typically I auto approve anything that has a >90% match, and discard any fields that have a <70% match. This leaves anything between 70-90% to be manually reviewed.
  • Update fields so the second indicator of FAST headings are \7, tags are 652, and the Record number matches the above record’s recordnumber. (the blank fields made during the cell split will all belong to the above 651 row’s recordnumber)
  • Update the fields
    • Click the down arrow button next to the column title
    • Edit cells > Transform…
      • "$a"+cells["keywords"].recon.match.name+"$2fast$0(OCoLC)fst"+cells["keywords"].recon.match.id[28,6000]
    • Ok

 

Example MARC record that has only local 653 keywords for subjects

While the record is open in MARCEdit:

  1. Select OpenRefine from File>Export for>OpenRefine
  2. Save the file as a json

In OpenRefine:

  1. Open the json file as a new project
  2. Use Text facets for the MARC tag fields and restrict to 653
  3. Under the Content column, Select Split multi-valued cells from Edit Cells > Split multi-valued cells
    1. Use "," as a delimiter to break apart multi-valued keywords
    2. Name the new column something simple for easy reference later.
  4. Reconcile the keywords in the new column. Under the column options, Select Reconcile> Start Reconciling.
  5. In the new pop-up, you can add a new reconciliation service.
    1. In the bottom left-hand corner, select 'Add Standard Service'
    2. In the new pop-up, add the url: https://dlibwwwcit.services.brown.edu/fast-reconcile/reconcile
    3. Add Service
  6. Select a FAST topic or select ALL
  7. Reconciled keywords will appear as either full match, partial match, or no match. Partial matches will have suggestions. It is recommended you do not spend very long on this stage and accept that some keywords will not have matches.
  8. Under the Content Column, Select Transform from Edit Cells > Transform
    1. In the Expression box, enter GREL code: "$a"+cells["NEW COLUMN NAME"].recon.match.name+"$2fast $0(OCoLC)fst"+cells["NEW COLUMN NAME"].recon.match.id[28,20000]
    2. Transform!
  9. Delete the New Column and keep the Content Column
  10. Change MARC field tag facet to "(blank)"
    1. Rename "(blank)" in the facet to '650' or whichever subject heading is appropriate based off your FAST topic
    2. Repeat this process for the indicators and set the blank indicators to '\7'
  11. Make sure all columns are in order by: Record Number, MARC field, Indicators, Content
  12. Export as Tab-separated value (.tsv)
  13. In MARCEdit Import the .tsv file via Tools > OpenRefine > Import
    1. Save as .mrk
  14. To avoid any loss of 653 fields, use MARCEdit's merge records tool to add the new 65x fields to the original file.

 


Exporting Data

Unfortunately, while OpenRefine supports importing marc files, it does not support exporting as marc files. To turn your data back into a marc file, you will need to export as a .tsv then use MARCEdit's OpenRefine Import tool.

  • In the top right corner of OpenRefine, select Export
  • Select 'Tab-separated value' and the file should automatically download into your download folder.
  • In MARCEdit under Tools > OpenRefine, select Import.
  • Select your .tsv file from your downloads and save it as a .mrk or .mrc file.