xml.mod does not set content in children #36

Open
MidimasterSoft opened this issue Dec 9, 2024 · 4 comments

@MidimasterSoft

I have the following XML document, which contains a child element <ChapterString>Chapter 01</ChapterString>:

<?xml version="1.0"?>
<!-- <!DOCTYPE Chapters SYSTEM "matroskachapters.dtd"> -->
<Chapters>
  <EditionEntry>
    <EditionFlagHidden>0</EditionFlagHidden>
    <EditionFlagDefault>1</EditionFlagDefault>
    <EditionUID>9866912279059327637</EditionUID>
    <ChapterAtom>
      <ChapterUID>1286159179066931568</ChapterUID>
      <ChapterTimeStart>00:00:00.000000000</ChapterTimeStart>
      <ChapterFlagHidden>0</ChapterFlagHidden>
      <ChapterFlagEnabled>1</ChapterFlagEnabled>
      <ChapterTimeEnd>00:01:49.234125000</ChapterTimeEnd>
      <ChapterDisplay>
        <ChapterString>Chapter 01</ChapterString>
        <ChapterLanguage>eng</ChapterLanguage>
      </ChapterDisplay>
    </ChapterAtom>
  </EditionEntry>
</Chapters>

I want to change the content of this child with the following runnable code example:

SuperStrict

Framework text.xml
Import BRL.StandardIO

Local doc:TxmlDoc = TxmlDoc.parseFile("chapter.xml")
Local root:TxmlNode = doc.getRootElement()

ParseChildren(root.getChildren())

Function ParseChildren(children:TList)
	For Local child:TxmlNode = EachIn children
		Local name:String = child.GetName()
		Print  "NAME =" + name '+ " CONTENT=" + child.toString() + "+++++" 
		If name = "ChapterString"
			Print "*******************"
			Print "content before: --|" + child.GetContent()  + "|--"
			child.setContent("Happy")
			Print "content after: --|" +child.GetContent()   + "|--"
			Print "--done---------   "
			Print 
		EndIf
		ParseChildren(child.getChildren())
	Next
End Function 

But after calling SetContent(), GetContent() returns an empty string.

This is definitely a bug, since the user expects SetContent() to replace the existing content.

I tested this with the latest version of xml.mod, as of last night.

Build log and output:

Building XMLChange
[ 86%] Processing:XMLChange.bmx
[ 93%] Compiling:XMLChange.bmx.gui.debug.win32.x64.c
[100%] Linking:XMLChange.debug.exe
Executing:XMLChange.debug.exe
NAME =EditionEntry
NAME =EditionFlagHidden
NAME =EditionFlagDefault
NAME =EditionUID
NAME =ChapterAtom
NAME =ChapterUID
NAME =ChapterTimeStart
NAME =ChapterFlagHidden
NAME =ChapterFlagEnabled
NAME =ChapterTimeEnd
NAME =ChapterDisplay
NAME =ChapterString
*******************
content before: --|Chapter 01|--
content after: --||--
--done---------   

GWRon commented Dec 9, 2024

I prepared some (valid) XML that shows where text can be "stored" in a node, and that this text is something which should be "editable".
If mxml does not support this, we might consider adding a "warning" when "setContent" is called on a "mixed content" node.

<?xml version="1.0"?>
<!-- some comment -->
<library>
	<book id="book1">
		<title>Book 1</title>
		<year>1990</year>
		<reviews>
		"text" of node reviews
			<review>
				<author>John Doe</author>
				<rating>5</rating>
			</review>
		"tail" of 1st review node
			<review>
				<author>Jane Doe</author>
				<rating>2</rating>
			</review>
		"tail" of 2nd review node
			<something_else>"text" of something_else node</something_else>
		</reviews>
	</book>
	<book id="book2">
		<title>Book 2</title>
		<year>1992</year>
		<reviews>
			<review>
				<author>John Doe</author>
				<rating>7</rating>
			</review>
		</reviews>
	</book>
</library>

mxml is not Python's "ElementTree" library (which is where the "text" and "tail" terms come from), but it should be possible to use mxmlGetOpaque to retrieve the "text" value of a node.
With mxmlWalkNext we could traverse the children of a node. The "tail" of the 1st review node (and the other tails) will show up there as text nodes (mxml does not differentiate between "text" and "tail").
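
The "warning" idea from above could be sketched roughly as follows. This is only a minimal illustration on a hypothetical toy node struct (`toy_node_t`, `toy_is_mixed_content` and `has_visible_text` are made-up names, not mxml's `mxml_node_t` or any xml.mod function): a node counts as "mixed content" if it has element children and also text children that are not pure whitespace.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical toy node for illustration only, NOT mxml's mxml_node_t. */
typedef enum { TOY_ELEMENT, TOY_TEXT } toy_type_t;

typedef struct toy_node {
    toy_type_t type;
    const char *value;            /* element name or text value */
    struct toy_node *first_child; /* first child, or NULL */
    struct toy_node *next;        /* next sibling, or NULL */
} toy_node_t;

/* Returns 1 if s contains at least one non-whitespace character. */
static int has_visible_text(const char *s) {
    for (; s && *s; s++) {
        if (!isspace((unsigned char)*s)) return 1;
    }
    return 0;
}

/* Returns 1 if node has element children AND non-blank text children. */
int toy_is_mixed_content(const toy_node_t *node) {
    int has_element = 0;
    int has_text = 0;

    if (!node) return 0;

    for (const toy_node_t *c = node->first_child; c; c = c->next) {
        if (c->type == TOY_ELEMENT) {
            has_element = 1;
        } else if (has_visible_text(c->value)) {
            has_text = 1;
        }
    }
    return has_element && has_text;
}
```

A setContent implementation could run a check like this first and print a warning before it touches the node. Whitespace-only text between elements (plain indentation) is deliberately not treated as mixed content here.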


GWRon commented Dec 9, 2024

Just so it does not get lost: I tested extracting the "text" from a mixed-content node with mxml.

For now this is C code that calls the mxml functions directly (I exposed it to BlitzMax as a simple "node.TestCode()").

<?xml version="1.0"?>
<!-- some comment -->
<example>
    This is text.
    <child>Child text 1</child>
    More text here.
    <child>Child text 2</child>
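</example>

#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <mxml.h>

// Helper function to trim leading and trailing whitespace.
// Returns a newly allocated string which the caller must free().
char *trim_whitespace(const char *str) {
    if (!str) return NULL;

    // Find the first non-whitespace character
    // (cast to unsigned char: isspace() is undefined for negative char values)
    while (isspace((unsigned char)*str)) str++;

    // Start one past the last character and step back over trailing whitespace;
    // this never moves "end" before "str", even for an empty string
    const char *end = str + strlen(str);
    while (end > str && isspace((unsigned char)end[-1])) end--;

    // Length of trimmed string
    size_t len = (size_t)(end - str);

    // Allocate and copy the trimmed string
    char *trimmed = malloc(len + 1);
    if (trimmed) {
        memcpy(trimmed, str, len);
        trimmed[len] = '\0';
    }
    return trimmed;
}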



void bmx_testcode(mxml_node_t * node) {
	 // Get the parent node's text (if any)
	const char *parent_text = mxmlGetOpaque(node);
	char *trimmed_parent_text = NULL;
	if (parent_text) {
		trimmed_parent_text = trim_whitespace(parent_text);
		if (trimmed_parent_text && strlen(trimmed_parent_text) > 0) {
			printf("Text in <example>: %s\n", trimmed_parent_text);
		}
	}

	// Iterate through child nodes
	for (mxml_node_t *child = mxmlWalkNext(node, node, MXML_DESCEND);
		 child != NULL;
		 child = mxmlWalkNext(child, node, MXML_DESCEND)) {
		mxml_type_t type = mxmlGetType(child);

		if (type == MXML_OPAQUE) {
			// Get opaque text for child
			const char *child_text = mxmlGetOpaque(child);
			if (child_text) {
				char *trimmed_child_text = trim_whitespace(child_text);
				if (trimmed_child_text && strlen(trimmed_child_text) > 0) {
					// Skip if the text matches the parent's text
					if (!trimmed_parent_text || strcmp(trimmed_child_text, trimmed_parent_text) != 0) {
						printf("Child Text: %s\n", trimmed_child_text);
					}
				}
				free(trimmed_child_text);
			}
		} else if (type == MXML_ELEMENT) {
			// Handle element nodes
			printf("Element: <%s>\n", mxmlGetElement(child));
		}
	}

	// Free trimmed parent text
	free(trimmed_parent_text);
}

Output is:

Text in <example>: This is text.
Element: <child>
Child Text: Child text 1
Child Text: More text here.
Element: <child>
Child Text: Child text 2


GWRon commented Dec 9, 2024

OK, I have now prepared a "bmx_replace_node_content" function:

glue.c:

// Function to replace the content of a node
void bmx_replace_node_content(mxml_node_t *node, BBString *bb_new_content, int remove_children, int remove_subsequent_text) {
	if (!node || !bb_new_content) return;

	const char *new_content = bbStringToUTF8String(bb_new_content);

	if (remove_children) {
		// Remove all child nodes
		mxml_node_t *child;
		while ((child = mxmlGetFirstChild(node)) != NULL) {
			mxmlDelete(child);
		}
		// Add the new content as a single text node after removing all children
		mxmlNewOpaque(node, new_content);
	} else {

		// Retain child elements but replace text parts
		mxml_node_t *child = mxmlGetFirstChild(node);
		int first_text_replaced = 0;

		while (child) {
			mxml_type_t type = mxmlGetType(child);

			if (type == MXML_OPAQUE) {
				// Replace text node
				if (!first_text_replaced) {
					mxmlSetOpaque(child, new_content);
					first_text_replaced = 1; // Mark that the first text node has been replaced
				} else if (remove_subsequent_text) {
					// Remove subsequent text nodes if the flag is set
					mxml_node_t *next_sibling = mxmlGetNextSibling(child); // Save the next sibling
					mxmlDelete(child);  // Delete the current text node
					child = next_sibling; // Continue with the next sibling
					continue;
				}
			}
			child = mxmlGetNextSibling(child);
		}

		// If no text node was found, add the new content
		if (!first_text_replaced) {
			mxmlNewOpaque(node, new_content);
		}
	}

	bbMemFree(new_content);
}

common.bmx:
Function bmx_replace_node_content(handle:Byte Ptr, newContent:String, removeChildren:Int, removeSubsequentText:Int)

xml.bmx - TxmlNode:

	Method replaceContent(newContent:string, removeChildren:Int = False, removeSubsequentText:Int = True)
		bmx_replace_node_content(nodePtr, newContent, removeChildren, removeSubsequentText)
	End Method

test_replace.xml:

<?xml version="1.0"?>
<example>
    This is text.
    <child>Child text 1</child>
    More text here.
    <child>Child text 2</child>
</example>

Test code:

SuperStrict

Framework text.xml
Import BRL.StandardIO


Local doc:TxmlDoc = TxmlDoc.parseFile("test_replace.xml")
Local root:TxmlNode

print "Orig XML:"
print "---------"
doc.saveFile("-", True, True)
print "---------"

root = doc.getRootElement()
root.replaceContent("hello :-)", False, True)
print "Modified XML (keep children, remove subsequent text):"
print "---------"
doc.saveFile("-", True, True)
print "---------"

doc = TxmlDoc.parseFile("test_replace.xml")
root = doc.getRootElement()
root.replaceContent("hello :-)", False, False)
print "Modified XML (keep children, keep subsequent text):"
print "---------"
doc.saveFile("-", True, True)
print "---------"

doc = TxmlDoc.parseFile("test_replace.xml")
root = doc.getRootElement()
root.replaceContent("hello :-)", True, False) 'False not useful here...
print "Modified XML (remove children, keep subsequent text):"
print "---------"
doc.saveFile("-", True, True)
print "---------"


doc = TxmlDoc.parseFile("test_replace.xml")
root = doc.getRootElement()
root.replaceContent("hello :-)", True, True) 'same result as True,False
print "Modified XML (remove children, remove subsequent text):"
print "---------"
doc.saveFile("-", True, True)
print "---------"

Output:

Orig XML:
---------
<?xml version="1.0"?>
<example>
    This is text.
  <child>Child text 1</child>
    More text here.
  <child>Child text 2</child>
</example>
---------
Modified XML (keep children, remove subsequent text):
---------
<?xml version="1.0"?>
<example>hello :-)
  <child>Child text 1</child>
  <child>Child text 2</child>
</example>
---------
Modified XML (keep children, keep subsequent text):
---------
<?xml version="1.0"?>
<example>hello :-)
  <child>Child text 1</child>
    More text here.
  <child>Child text 2</child>
</example>
---------
Modified XML (remove children, keep subsequent text):
---------
<?xml version="1.0"?>
<example>hello :-)</example>
---------
Modified XML (remove children, remove subsequent text):
---------
<?xml version="1.0"?>
<example>hello :-)</example>
---------

So the third parameter (remove subsequent text) only matters if you do not ask to remove the children of a node.
This means that "normally" this should not be two bools but a single "mode", which could then be one of:
remove_children_and_subsequent_text, keep_children_but_remove_subsequent_text, keep_children_and_replace_first_text.
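
A single "mode" like that could be sketched on the C side as an enum that is translated into the two existing flags. This is only a sketch: `replace_mode_t`, the `REPLACE_*` constants and `replace_mode_to_flags` are hypothetical names that do not exist in xml.mod. The mapping follows the test output above, where removing the children makes the second flag irrelevant.

```c
/* Hypothetical mode enum; the names mirror the three modes listed above. */
typedef enum {
    REPLACE_REMOVE_CHILDREN_AND_SUBSEQUENT_TEXT,
    REPLACE_KEEP_CHILDREN_BUT_REMOVE_SUBSEQUENT_TEXT,
    REPLACE_KEEP_CHILDREN_AND_REPLACE_FIRST_TEXT
} replace_mode_t;

/* Translate a mode into the two int flags that
 * bmx_replace_node_content() currently expects. */
void replace_mode_to_flags(replace_mode_t mode,
                           int *remove_children,
                           int *remove_subsequent_text) {
    switch (mode) {
    case REPLACE_REMOVE_CHILDREN_AND_SUBSEQUENT_TEXT:
        /* Removing all children also removes every text node. */
        *remove_children = 1;
        *remove_subsequent_text = 1;
        break;
    case REPLACE_KEEP_CHILDREN_BUT_REMOVE_SUBSEQUENT_TEXT:
        *remove_children = 0;
        *remove_subsequent_text = 1;
        break;
    case REPLACE_KEEP_CHILDREN_AND_REPLACE_FIRST_TEXT:
    default:
        *remove_children = 0;
        *remove_subsequent_text = 0;
        break;
    }
}
```

With something like this, the BlitzMax method would take one Int mode instead of two bools, and the combination "remove children, keep subsequent text" (which has no effect of its own) could no longer be expressed.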


GWRon commented Dec 9, 2024

@MidimasterSoft

Brucey's fix does indeed create an "XML string" containing all the nodes with your "Happy" string in it. But it currently adds the string as a "text" node instead of an "opaque" node. "getContent()" only takes CDATA and "opaque" nodes into consideration.
This is why your "node.GetContent()" does not return "Happy", while "doc.ToString()" would make it visible.

When such an "adjusted" XML file is reloaded (or reparsed), this node becomes an "opaque" node, and node.GetContent() would suddenly return the content.
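
To make the difference concrete, here is a toy model of the behaviour described above. It is not xml.mod's or mxml's actual code: `toy_node_t` and `toy_get_content` are hypothetical, and the only rule modelled is "content is collected from opaque and CDATA children, text children are skipped".

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy node for illustration only, NOT mxml's mxml_node_t. */
typedef enum { TOY_ELEMENT, TOY_TEXT, TOY_OPAQUE, TOY_CDATA } toy_type_t;

typedef struct toy_node {
    toy_type_t type;
    const char *value;            /* element name or string value */
    struct toy_node *first_child; /* first child, or NULL */
    struct toy_node *next;        /* next sibling, or NULL */
} toy_node_t;

/* Copies the values of all OPAQUE and CDATA children of node into buf
 * (truncating if needed) and returns the number of characters written.
 * TEXT children are skipped, as described for getContent() above. */
size_t toy_get_content(const toy_node_t *node, char *buf, size_t size) {
    size_t used = 0;

    if (!node || !buf || size == 0) return 0;
    buf[0] = '\0';

    for (const toy_node_t *c = node->first_child; c; c = c->next) {
        if (c->type != TOY_OPAQUE && c->type != TOY_CDATA) continue;
        if (!c->value) continue;

        size_t len = strlen(c->value);
        if (len > size - 1 - used) len = size - 1 - used; /* truncate */
        memcpy(buf + used, c->value, len);
        used += len;
        buf[used] = '\0';
    }
    return used;
}
```

In this model, a <ChapterString> whose "Happy" child has type TOY_TEXT yields an empty string, while the same child with type TOY_OPAQUE yields "Happy". That matches the empty "content after" line in the original report.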

I described these findings to Brucey via Discord, but I am not sure whether he will read them before fixing it on his own :-)

PS: "text" instead of "opaque" has the benefit of preserving white space if I understood that right (albeit docs confuse me there) ... so maybe this is the reason to use this instead.

According to Mini-XML (mxml) docs there is mxmlGetText to retrieve text values ... but there is also this note there:

Note: Text nodes consist of whitespace-delimited words. You will only get single words of text when reading an XML file with MXML_TYPE_TEXT nodes. If you want the entire string between elements in the XML file, you MUST read the XML file with MXML_TYPE_OPAQUE nodes and get the resulting strings using the mxmlGetOpaque function instead.
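
To illustrate what "whitespace-delimited words" means for a value like "Chapter 01", here is a small sketch that counts the words a string breaks into. It is plain C with a hypothetical helper name (`count_words`) and does not use or reproduce mxml's parser.

```c
#include <ctype.h>
#include <stddef.h>

/* Counts the whitespace-delimited words in s (0 for NULL or blank input). */
size_t count_words(const char *s) {
    size_t count = 0;
    int in_word = 0;

    for (; s && *s; s++) {
        if (isspace((unsigned char)*s)) {
            in_word = 0;          /* a word, if any, has just ended */
        } else if (!in_word) {
            in_word = 1;          /* first character of a new word */
            count++;
        }
    }
    return count;
}
```

For "Chapter 01" this returns 2, so reading the file with text nodes would hand back the content in two pieces, whereas an opaque node keeps it as one string.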
