OM allows you to define a “terminology” to ease translation between XML and ruby objects – you can query the xml for Nodes or node values without ever writing a line of XPath.
OM “terms” are ruby symbols you define (in the terminology) that map specific XML content into ruby object attributes.
The API documentation at http://hudson.projecthydra.org/job/om/Documentation/OM.html provides additional, more targeted information. We will provide links to the API as appropriate.
- Install OM and run it in IRB
- Build an OM Terminology
- Use OM XML Document class
- Create XML from the OM XML Document
- Load existing XML into an OM XML Document
- Query OM XML Document to get term values
- Access the Terminology of an OM XML Document
- Retrieve XPath from the terminology
To get started, you will create a new folder, set up a Gemfile to install OM, and then run bundler.
mkdir omtest cd omtest
Using whichever editor you prefer, create a file (in omtest directory) called Gemfile with the following contents:
source 'http://rubygems.org' gem 'om'
Now run bundler to install the gem: (you will need the bundler Gem)
bundle install
You should now be set to use irb to run the following example.
To experiment with abbreviated terminology examples, irb is your friend. If you are working on a persistent terminology and have to experiment to make sure you declare your terminology correctly, we recommend writing test code (e.g. with rspec). You can see examples of this here
irb require "rubygems" => true require "om" => true
Create a simple (simplish?) Terminology Builder (OM::XML::Terminology::Builder") based on a couple of elements from the MODS schema.
terminology_builder = OM::XML::Terminology::Builder.new do |t| t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd") # This is a mods:name. The underscore is purely to avoid namespace conflicts. t.name_ { t.namePart t.role(:ref=>[:role]) t.family_name(:path=>"namePart", :attributes=>{:type=>"family"}) t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name") t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"}) } # Re-use the structure of a :name Term with a different @type attribute t.person(:ref=>:name, :attributes=>{:type=>"personal"}) t.organization(:ref=>:name, :attributes=>{:type=>"corporate"}) # This is a mods:role, which is used within mods:namePart elements t.role { t.text(:path=>"roleTerm",:attributes=>{:type=>"text"}) t.code(:path=>"roleTerm",:attributes=>{:type=>"code"}) } end
Now tell the Terminology Builder to build your Terminology (OM::XML::Terminology"):
terminology = terminology_builder.build
Generally you will use an OM::XML::Document to work with your xml. Here’s how to define a Document class that uses the same Terminology as above.
In a separate window (so you can keep irb running), create the file my_mods_document.rb in the omtest directory, with this content:
class MyModsDocument < ActiveFedora::NokogiriDatastream include OM::XML::Document set_terminology do |t| t.root(:path=>"mods", :xmlns=>"http://www.loc.gov/mods/v3", :schema=>"http://www.loc.gov/standards/mods/v3/mods-3-2.xsd") # This is a mods:name. The underscore is purely to avoid namespace conflicts. t.name_ { t.namePart t.role(:ref=>[:role]) t.family_name(:path=>"namePart", :attributes=>{:type=>"family"}) t.given_name(:path=>"namePart", :attributes=>{:type=>"given"}, :label=>"first name") t.terms_of_address(:path=>"namePart", :attributes=>{:type=>"termsOfAddress"}) } t.person(:ref=>:name, :attributes=>{:type=>"personal"}) t.organization(:ref=>:name, :attributes=>{:type=>"corporate"}) # This is a mods:role, which is used within mods:namePart elements t.role { t.text(:path=>"roleTerm",:attributes=>{:type=>"text"}) t.code(:path=>"roleTerm",:attributes=>{:type=>"code"}) } end # Generates an empty Mods Article (used when you call ModsArticle.new without passing in existing xml) # (overrides default behavior of creating a plain xml document) def self.xml_template # use Nokogiri to build the XML builder = Nokogiri::XML::Builder.new do |xml| xml.mods(:version=>"3.3", "xmlns:xlink"=>"http://www.w3.org/1999/xlink", "xmlns:xsi"=>"http://www.w3.org/2001/XMLSchema-instance", "xmlns"=>"http://www.loc.gov/mods/v3", "xsi:schemaLocation"=>"http://www.loc.gov/mods/v3 http://www.loc.gov/standards/mods/v3/mods-3-3.xsd") { xml.titleInfo(:lang=>"") { xml.title } xml.name(:type=>"personal") { xml.namePart(:type=>"given") xml.namePart(:type=>"family") xml.affiliation xml.computing_id xml.description xml.role { xml.roleTerm("Author", :authority=>"marcrelator", :type=>"text") } } } end # return a Nokogiri::XML::Document, not an OM::XML::Document return builder.doc end end
(Note that we are now also using the ActiveFedora gem.)
OM::XML::Document provides the set_terminology method to handle the details of creating a TerminologyBuilder and building the terminology for you. This allows you to focus on defining the structures of the Terminology itself.
By default, new OM Document instances will create an empty xml document, but if you override self.xml_template to return a different object (e.g. Nokogiri::XML::Document), that will be created instead.
In the example above, we have overridden xml_template to build an empty, relatively simple MODS document as a Nokogiri::XML::Document. We use Nokogiri::XML::Builder and call its .doc method at the end of xml_template in order to return the Nokogiri::XML::Document object. Instead of using Nokogiri::XML::Builder, you could put your template into an actual xml file and have xml_template use Nokogiri::XML::Document.parse to load it. That’s up to you. Create the documents however you want, just return a Nokogiri::XML::Document.
to use Nokogiri::XML::Builder
require "my_mods_document" newdoc = MyModsDocument.new newdoc.to_xml => NoMethodError: undefined method `to_xml' for nil:NilClass
To load existing XML into your OM Document, use #from_xml"
For an example, download hydrangea_article1.xml into your working directory (omtest), then run this in irb:
sample_xml = File.new("hydrangea_article1.xml") doc = MyModsDocument.from_xml(sample_xml)
Take a look at the document object’s xml that you’ve just populated. We will use this document for the next few examples.
doc.to_xml
Using the Terminology associated with your Document, you can query the xml for nodes or node values without ever writing a line of XPath.
You can use OM::XML::Document.find_by_terms to retrieve xml nodes from the datastream. It returns Nokogiri::XML::Node objects:
doc.find_by_terms(:person) doc.find_by_terms(:person).length doc.find_by_terms(:person).each {|n| puts n.to_xml}
You might prefer to use nodes as a way of getting multiple values pertaining to a node, rather than doing more expensive lookups for each desired value.
If you want to get directly to the values within those nodes, use OM::XML::Document.term_values:
doc.term_values(:person, :given_name) doc.term_values(:person, :family_name)
If the xpath points to XML nodes that contain other nodes, the response to term_values will contain Nokogiri::XML::Node objects instead of text values:
doc.term_values(:name)
For more examples of Querying OM Documents, see Querying Documents
For more examples of Updating OM Documents, see Updating Documents
If you have an XML schema defined in your Terminology’s root Term, you can validate any xml document by calling “.validate” on any instance of your Document classes.
doc.validate
Note: this method requires an internet connection, as it will download the XML schema from the URL you have specified in the Terminology’s root term.
Directly accessing the Nokogiri::XML::Document and the OM::XML::Terminology
OM::XML::Document is implemented as a container for a Nokogiri::XML::Document. It uses the associated OM Terminology to provide a bunch of convenience methods that wrap calls to Nokogiri. If you ever need to operate directly on the Nokogiri Document, simply call ng_xml and do what you need to do. OM will not get in your way.
ng_document = doc.ng_xml
If you need to look at the Terminology associated with your Document, call #terminology on the Document’s class.
MyModsDocument.terminology doc.class.terminology
Because the Terminology is essentially a mapping from XPath queries to ruby object attributes, in most cases you won’t need to know the actual XPath queries. Nevertheless, when you do want to know the Xpath (e.g. for ensuring your terminology is correct) for a term, the Terminology can generate xpath queries based on the structures you’ve defined (OM::XML::TermXPathGenerator).
Here are the xpaths for :name and two variants of :name that were created using the :ref argument in the Terminology Builder:
terminology.xpath_for(:name) => "//oxns:name" terminology.xpath_for(:person) => "//oxns:name[@type=\"personal\"]" terminology.xpath_for(:organization) => "//oxns:name[@type=\"corporate\"]"
The solrizer gem provides support for indexing XML documents into Solr based on OM Terminologies. That process is documented in the solrizer documentation