Add "diff" functionality between objects of same class #18

Open
ronaldtse opened this issue Jul 14, 2024 · 2 comments
Labels
enhancement New feature or request

Comments

@ronaldtse

Often I want to compare model objects to see if two objects are identical.

The definition of two model objects being identical is:

  • They are of the same class (same model)
  • Their attributes have identical values (recursive)
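
The two conditions above can be sketched in plain Ruby. This is only an illustration of the definition, not the loc_mods comparator or the lutaml-model API: the `Identifier` and `Record` classes, the `SCALARS` list, and the `identical?` helper are all hypothetical names, and the sketch assumes attributes are stored as instance variables.

```ruby
# Hypothetical model classes used only for illustration.
class Identifier
  attr_reader :type, :content

  def initialize(type, content)
    @type = type
    @content = content
  end
end

class Record
  attr_reader :title, :identifiers

  def initialize(title, identifiers)
    @title = title
    @identifiers = identifiers
  end
end

# Leaf value types that are compared directly with ==.
SCALARS = [String, Numeric, Symbol, NilClass, TrueClass, FalseClass].freeze

# Returns true when a and b are of the same class and every attribute
# value is recursively identical.
def identical?(a, b)
  return false unless a.class == b.class
  return a == b if SCALARS.any? { |klass| a.is_a?(klass) }

  if a.is_a?(Array)
    return a.size == b.size && a.zip(b).all? { |x, y| identical?(x, y) }
  end

  # Model object: compare every instance variable recursively.
  ivars = a.instance_variables
  ivars.sort == b.instance_variables.sort &&
    ivars.all? do |iv|
      identical?(a.instance_variable_get(iv), b.instance_variable_get(iv))
    end
end

r1 = Record.new("JP-8", [Identifier.new("oclc", "994303379")])
r2 = Record.new("JP-8", [Identifier.new("oclc", "994303379")])
r3 = Record.new("JP-8", [Identifier.new("doi", "994303379")])

puts identical?(r1, r2) # => true
puts identical?(r1, r3) # => false
```

A real implementation would more likely walk the attributes declared on the model than raw instance variables, but the recursion has the same shape.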

I have written a comparator for Shale-based classes in the loc_mods gem.

These are the relevant files:

This task is to port the comparison functionality to lutaml-model, so that objects using lutaml-model are comparable by default.

ronaldtse added the enhancement label on Jul 14, 2024
@ronaldtse

Example:

Extract this archive, which gives you the test and test2 directories:
Archive.zip

Run in loc_mods:

$ bundle exec exe/loc-mods detect-duplicates test/
Duplicate set #1 found for URL: https://doi.org/10.6028/NIST.TN.1630
  Comparison 1:
  File 1: test/allrecords-MODS-991000002489708106.xml
  File 2: test/allrecords-MODS-991000091869708106.xml
  Differences:
  └── LocMods::Record
      └── record_info (collection):
          └── [1] (LocMods::RecordInfo)
              └── [1] (LocMods::RecordIdentifier)
                  └── content (Shale::Type::String):
                      ├── - (String) "991000002489708106"
                      └── + (String) "991000091869708106"
  Similarity score: 99.91%
$ bundle exec exe/loc-mods detect-duplicates test2
Duplicate set #1 found for URL: https://doi.org/10.6028/NIST.IR.6659
  Comparison 1:
  File 1: test2/allrecords-MODS-991000009289708106.xml
  File 2: test2/allrecords-MODS-991000179879708106.xml
  Differences:
  └── LocMods::Record
      ├── identifier (collection):
      │   └── - [2] (LocMods::Identifier)
      │       ├── content (Shale::Type::String):
      │       │   └── (String) "994303379"
      │       ├── display_label (Shale::Type::String):
      │       │   └── (nil)
      │       ├── type (Shale::Type::String):
      │       │   └── (String) "oclc"
      │       ├── type_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── invalid (Shale::Type::Value):
      │       │   └── (nil)
      │       └── alt_rep_group (Shale::Type::String):
      │           └── (nil)
      ├── note (collection):
      │   ├── [2] (LocMods::Note)
      │   │   └── content (Shale::Type::String):
      │   │       ├── - (String) "July 1, 2010."
      │   │       └── + (String) "2010."
      │   └── [3] (LocMods::Note)
      │       └── content (Shale::Type::String):
      │           ├── - (String) "Title from PDF title page (viewed June 5, 2017)."
      │           └── + (String) "Title from PDF title page."
      ├── origin_info (collection):
      │   └── [2] (LocMods::OriginInfo)
      │       ├── [1] (LocMods::Place)
      │       │   └── [1] (LocMods::PlaceTerm)
      │       │       └── content (Shale::Type::String):
      │       │           ├── - (String) "Gaithersburg, MD:"
      │       │           └── + (String) "Gaithersburg, MD :"
      │       └── [1] (LocMods::Publisher)
      │           └── content (Shale::Type::String):
      │               ├── - (String) "U.S. Dept. of Commerce, National Institute of Standards and
                Technology;"
      │               └── + (String) "U.S. Dept. of Commerce, National Institute of Standards and
                Technology; "
      ├── record_info (collection):
      │   └── [1] (LocMods::RecordInfo)
      │       ├── [1] (LocMods::Date)
      │       │   └── content (Shale::Type::String):
      │       │       ├── - (String) "190912"
      │       │       └── + (String) "160922"
      │       ├── [1] (LocMods::Date)
      │       │   └── content (Shale::Type::String):
      │       │       ├── - (String) "20200114084208.0"
      │       │       └── + (String) "20160922100247.0"
      │       └── [1] (LocMods::RecordIdentifier)
      │           └── content (Shale::Type::String):
      │               ├── - (String) "991000009289708106"
      │               └── + (String) "991000179879708106"
      ├── subject (collection):
      │   ├── - [1] (LocMods::Subject)
      │   │   ├── id (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── authority (Shale::Type::String):
      │   │   │   └── (String) "lcsh"
      │   │   ├── authority_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── value_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── lang (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── script (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── transliteration (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── display_label (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── alt_rep_group (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── usage (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── topic (Shale::Type::String):
      │   │   │   └── (Array) [(String) "Jet planes", (String) "Fuel", (String) "Thermal properties"]
      │   │   ├── geographic (Shale::Type::String):
      │   │   │   └── (Array) 0 items
      │   │   ├── temporal (LocMods::Temporal):
      │   │   │   └── (Array) 0 items
      │   │   ├── title_info (LocMods::SubjectTitleInfo):
      │   │   │   └── (Array) 0 items
      │   │   ├── name (LocMods::SubjectName):
      │   │   │   └── (Array) 0 items
      │   │   ├── geographic_code (LocMods::GeographicCode):
      │   │   │   └── (Array) 0 items
      │   │   ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │   │   │   └── (Array) 0 items
      │   │   ├── cartographics (LocMods::Cartographics):
      │   │   │   └── (Array) 0 items
      │   │   ├── occupation (LocMods::Occupation):
      │   │   │   └── (Array) 0 items
      │   │   ├── genre (LocMods::Genre):
      │   │   │   └── (Array) 0 items
      │   │   └── href (Shale::Type::String):
      │   │       └── (String) "https://id.loc.gov/authorities/subjects/sh2001009121"
      │   ├── - [2] (LocMods::Subject)
      │   │   ├── id (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── authority (Shale::Type::String):
      │   │   │   └── (String) "fast"
      │   │   ├── authority_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── value_uri (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── lang (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── script (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── transliteration (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── display_label (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── alt_rep_group (Shale::Type::String):
      │   │   │   └── (nil)
      │   │   ├── usage (Shale::Type::Value):
      │   │   │   └── (nil)
      │   │   ├── topic (Shale::Type::String):
      │   │   │   └── (Array) [(String) "Jet planes", (String) "Fuel", (String) "Thermal properties"]
      │   │   ├── geographic (Shale::Type::String):
      │   │   │   └── (Array) 0 items
      │   │   ├── temporal (LocMods::Temporal):
      │   │   │   └── (Array) 0 items
      │   │   ├── title_info (LocMods::SubjectTitleInfo):
      │   │   │   └── (Array) 0 items
      │   │   ├── name (LocMods::SubjectName):
      │   │   │   └── (Array) 0 items
      │   │   ├── geographic_code (LocMods::GeographicCode):
      │   │   │   └── (Array) 0 items
      │   │   ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │   │   │   └── (Array) 0 items
      │   │   ├── cartographics (LocMods::Cartographics):
      │   │   │   └── (Array) 0 items
      │   │   ├── occupation (LocMods::Occupation):
      │   │   │   └── (Array) 0 items
      │   │   ├── genre (LocMods::Genre):
      │   │   │   └── (Array) 0 items
      │   │   └── href (Shale::Type::String):
      │   │       └── (String) "https://id.worldcat.org/fast/982434"
      │   └── - [3] (LocMods::Subject)
      │       ├── id (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── authority (Shale::Type::String):
      │       │   └── (String) "fast"
      │       ├── authority_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── value_uri (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── lang (Shale::Type::String):
      │       │   └── (nil)
      │       ├── script (Shale::Type::String):
      │       │   └── (nil)
      │       ├── transliteration (Shale::Type::String):
      │       │   └── (nil)
      │       ├── display_label (Shale::Type::String):
      │       │   └── (nil)
      │       ├── alt_rep_group (Shale::Type::String):
      │       │   └── (nil)
      │       ├── usage (Shale::Type::Value):
      │       │   └── (nil)
      │       ├── topic (Shale::Type::String):
      │       │   └── (Array) [(String) "Jet planes", (String) "Fuel"]
      │       ├── geographic (Shale::Type::String):
      │       │   └── (Array) 0 items
      │       ├── temporal (LocMods::Temporal):
      │       │   └── (Array) 0 items
      │       ├── title_info (LocMods::SubjectTitleInfo):
      │       │   └── (Array) 0 items
      │       ├── name (LocMods::SubjectName):
      │       │   └── (Array) 0 items
      │       ├── geographic_code (LocMods::GeographicCode):
      │       │   └── (Array) 0 items
      │       ├── hierarchical_geographic (LocMods::HierarchicalGeographic):
      │       │   └── (Array) 0 items
      │       ├── cartographics (LocMods::Cartographics):
      │       │   └── (Array) 0 items
      │       ├── occupation (LocMods::Occupation):
      │       │   └── (Array) 0 items
      │       ├── genre (LocMods::Genre):
      │       │   └── (Array) 0 items
      │       └── href (Shale::Type::String):
      │           └── (String) "https://id.worldcat.org/fast/982425"
      └── title_info (collection):
          └── [1] (LocMods::TitleInfo)
              └── [1] (String)
                  └── :
                      ├── - (String) "Thermodynamic, transport, and chemical properties of reference
                JP-8"
                      └── + (String) "Thermodynamic, transport, and chemical properties of reference JP-8
"
  Similarity score: 92.75%

In Lutaml::Model we don't know the type of the parsed object in advance, so the comparison has to be generalized: the user defines the model class, parses the files into objects of that class, and then runs the comparison to print a tree like the one above.
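
As a rough sketch of what "generalized" could mean here, the walker below knows nothing about the classes it compares and only recurses over whatever attributes the objects expose. It is not the code from loc_mods or lutaml-model: `Note`, `Record`, and `diff_lines` are hypothetical names, and `Struct` stands in for a user-defined model class. Collection indices are 1-based to match the output shown above.

```ruby
# Hypothetical user-defined models; Struct is used only to keep the sketch short.
Note   = Struct.new(:content)
Record = Struct.new(:title, :notes)

# Returns an array of indented lines describing where a and b differ.
# An empty array means no difference was found.
def diff_lines(a, b, label, depth = 0)
  pad = "  " * depth

  if a.is_a?(Struct) && a.class == b.class
    # Same model class: recurse into each attribute.
    children = a.members.flat_map { |m| diff_lines(a[m], b[m], m.to_s, depth + 1) }
    children.empty? ? [] : ["#{pad}#{label} (#{a.class})", *children]
  elsif a.is_a?(Array) && b.is_a?(Array)
    # Collections: compare position by position; a missing item shows up as nil.
    size = [a.size, b.size].max
    children = (0...size).flat_map { |i| diff_lines(a[i], b[i], "[#{i + 1}]", depth + 1) }
    children.empty? ? [] : ["#{pad}#{label} (collection)", *children]
  elsif a == b
    []
  else
    # Leaf values (or mismatched classes): report the old and new value.
    ["#{pad}#{label}:", "#{pad}  - #{a.inspect}", "#{pad}  + #{b.inspect}"]
  end
end

old_record = Record.new("JP-8", [Note.new("July 1, 2010."), Note.new("Title from PDF title page.")])
new_record = Record.new("JP-8", [Note.new("2010."), Note.new("Title from PDF title page.")])

puts diff_lines(old_record, new_record, "record")
```

Unchanged branches are pruned, so only the path down to each differing value is printed. The positional array comparison is a simplification; matching moved or inserted items would need a proper sequence diff.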

@ronaldtse

The comparison code was implemented in #34.

What remains is to document how to use the comparison.
