Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add raw option for attributes #94

Merged
merged 2 commits into from
Oct 7, 2024
Merged

Conversation

HassanAkbar
Copy link
Member

Add raw tag for attributes to keep the raw xml as value of that attribute.

fixes #90

@HassanAkbar HassanAkbar marked this pull request as draft September 26, 2024 12:56
@HassanAkbar HassanAkbar marked this pull request as ready for review September 26, 2024 14:47
@HassanAkbar
Copy link
Member Author

HassanAkbar commented Sep 27, 2024

@ronaldtse by looking at this comment it seems like a case for raw option but in this case we need to define it for the current model.
Do you think it would be beneficial to add raw option for the model as well? may be at the root, something like below (it might need some adjusting)

class RichText  < Lutaml::Model::Serializable
  attribute :raw_string, :string

  xml do
    root 'RichText', raw: true

    map_content to: :raw_string
  end
end

class Docid < Lutaml::Model::Serializable
  attribute :raw_string, RichText
  
  xml do
    root 'docid'

    map_content to: :raw_string
  end
end

@ronaldtse
Copy link
Contributor

@HassanAkbar I think there are two reasons why @opoudjis and @andrew2net want an enhancement here:

  1. Using raw because the underlying XML structure is complex or unknown, and it is time consuming to parse the underlying XML structure using Lutaml::Model. This is handled via raw.
  2. Text models for rich text is complicated, and actually things like attribute :sup... is totally useless for the user, because you won't be "selecting every bold text instance" but you just want to deal with it as XML. This should be supported as commented by @andrew2net . I don't know what is the best API to do so, and we should support it.

child.children.map do |c|
next c.text if c.text?

self.class.new(c).to_xml
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@HassanAkbar let's merge this first, but is it possible to obtain the full raw string instead of serializing it again? I just don't want to modify something that is meant to be "raw".

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronaldtse the conversion in lutaml-model has 2 steps

  1. Data is parsed using the adapter and saved in the adapter document class (at this point we have no idea which classes have which mappings defined)
  2. Mappings are used to convert that data to ruby object

The problem is that when we are parsing the data we do not have access to the mappings so we have no idea which content to read as raw string and which to parse. So that is why we are converting the parsed data back to XML string when applying mappings.

Copy link
Contributor

@ronaldtse ronaldtse Oct 8, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can’t the parser return us the raw XML string at a certain node?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ronaldtse It will require some changes in the adapters but I think it will work something like

value = if child.text?
          child.text
        elsif attr&.raw?
          child.to_xml
        else
          parse_element(child, attr&.type || klass, format)
        end

I'll open a separate PR for this.

@ronaldtse
Copy link
Contributor

@HassanAkbar can we rebase and merge this? Thanks!

@HassanAkbar HassanAkbar merged commit 76e456b into main Oct 7, 2024
13 of 14 checks passed
@HassanAkbar HassanAkbar deleted the add_raw_option_for_attributes branch October 7, 2024 15:02
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

Naive normalisation of XML if it contains unknown markup
2 participants