Checked other resources
I searched the LangChain.js documentation with the integrated search.
I used the GitHub search to find a similar question and didn't find it.
I am sure that this is a bug in LangChain.js rather than my code.
The bug is not resolved by updating to the latest stable version of LangChain (or the specific integration package).
Example Code
import { XMLOutputParser } from "./output_parsers/xml.js";

const xml = `<?xml version="1.0" encoding="UTF-8"?>
<userProfile>
  <userID>12345</userID>
  <email>[email protected]</email>
  <bio><![CDATA[John is a senior developer with <10 years> of experience. He uses <Typescript> at work.]]></bio>
</userProfile>`;

const parser = new XMLOutputParser();

parser.parse(xml).then(console.log).catch(console.error);
Expected Output:
{
  userProfile: [
    { userID: '12345' },
    { email: '[email protected]' },
    {
      bio: 'John is a senior developer with <10 years> of experience. He uses <Typescript> at work.'
    }
  ]
}
As we can see, XMLOutputParser does not parse CDATA text.
Error Message and Stack Trace (if applicable)
No response
Description
The XMLOutputParser in the LangChain library is not correctly parsing XML content that includes CDATA sections when used in a Node.js environment. The parser appears to be ignoring the CDATA content, resulting in incomplete parsing of XML.
System Info
OS: macOS
Node version: 20.14.0
Yarn version: 1.22.22
The XMLOutputParser fails on CDATA sections because the sax.parser instance used in the parseXMLMarkdown function has no event handler for them, so their content is dropped.
To fix this, you need to add an event handler for oncdata in the sax.parser configuration. Here is how you can modify the code:
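A minimal sketch of that idea, not the actual LangChain.js source: `attachXmlHandlers` is a hypothetical helper, and the element-building logic is an assumption made only to reproduce the output shape shown in the issue. The one line that matters is the `oncdata` assignment, which sends CDATA content through the same path as ordinary text.

```javascript
// Hypothetical helper (not part of LangChain.js): wires the event callbacks
// of a sax-style parser so that CDATA content is collected like plain text.
// `parser` is any object that accepts onopentag / ontext / oncdata /
// onclosetag callbacks, as the object returned by sax.parser(...) does.
function attachXmlHandlers(parser) {
  const stack = []; // currently open elements, innermost last
  let result = null; // filled in when the root element closes

  // Append character data to the innermost open element.
  const appendText = (text) => {
    const current = stack[stack.length - 1];
    if (current) current.text += text;
  };

  parser.onopentag = (node) => {
    stack.push({ name: node.name, text: "", children: [] });
  };

  parser.ontext = appendText;

  // The missing piece: sax reports CDATA through `oncdata`, not `ontext`,
  // so without this assignment the CDATA content never reaches the result.
  parser.oncdata = appendText;

  parser.onclosetag = () => {
    const el = stack.pop();
    // Elements with children become arrays; leaf elements become strings.
    const value = el.children.length > 0 ? el.children : el.text.trim();
    const entry = { [el.name]: value };
    const parent = stack[stack.length - 1];
    if (parent) parent.children.push(entry);
    else result = entry;
  };

  // Call after parsing has finished to read the assembled object.
  return () => result;
}
```

With the real `sax` package this would presumably be used as `const parser = sax.parser(true); const getResult = attachXmlHandlers(parser); parser.write(xml).close();`, after which `getResult()` should include the `bio` text from the CDATA section.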