Fails to parse XML comment as stream #14
Comments
That looks like a bug, thanks. I will run a test before the weekend.
I just added comment support in streams, which was missing: 5e55ea2#diff-39fbd46f4381411507cbf464fd4fea02R491, and added some more tests.

But first I want to help you with the task at hand. I think it is better to read the complete SVG file and parse it in one go, even if the file is 20 MB. If you need to do lots of processing, I would advise using worker processes, maybe using piscina.

So, back to the comments-in-stream issue. I found this SVG file currently only works when the SVG file has a trailing comma. You have to know that when parsing the SVG, not skipping the opening svg tag will … The stream was implemented and debugged using the OSM planet file. There is the initial type definition, then a general … The current implementation ignores the comments. I will run some more tests before publishing to NPM.

Thanks for opening this issue, let me know your thoughts.
@TobiasNickel: The current … This is the parsing part of SVGO: https://github.com/svg/svgo/blob/master/lib/svgo/svg2js.js#L26 It would be great when you could drop in/integrate your …
It would be cool if svgo could profit from this library. Still, the streaming works much differently than in other libraries: not every tag, text, or attribute is an event/item, but complete sub-trees.

At the weekend I did two things. First, I forked svgo and re-implemented svg2js.js. Second, an update to the parser is needed: throw an error when some tag is broken, and extend parsing for the doctype. With that, I was currently able to fulfill all of the lib's own tests, but not the plugin tests yet. @strarsis, what do you think?

For now the changes are in a branch, as I think they are breaking changes in txml. I need some more tests. Now that there are a number of users for this lib, we have to be more careful. However, with this work, this parser can become more stable.
@TobiasNickel: Awesome!
Hi, just want to say, this matter is not forgotten. Last week I was working on a blog post about Web-Push. Today I continued a little with debugging the …
Hi, a little update: today I was able to run all tests of svgo!!! Before making a PR, I first have to make an update to txml. Absolutely, some update to the parser was needed. Working through this has led to some very good changes to txml, which I am going to publish soon as version 4.
@TobiasNickel: This is awesome! Finally some solid progress is possible with svgo.
Wow, version 4 of txml is published. Now it does not directly export the parse function, but an object with a parse function. This is for better compatibility between node and TypeScript modules. And here is the PR to svgo: only one file changed, and all its tests passed.
And, as for this issue, there is now the …
I wasn't sure about opening another issue, but processing instructions also cause this:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<a>b</a>
```
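Until processing instructions are handled by the stream parser, one workaround is to strip a leading `<?...?>` instruction from the input yourself. This is only a sketch in plain JavaScript; the `stripProcessingInstruction` helper is hypothetical and not part of txml:

```javascript
// Hypothetical helper: remove a leading <?...?> processing instruction,
// plus surrounding whitespace, before handing the rest to a parser.
function stripProcessingInstruction(xml) {
  return xml.replace(/^\s*<\?[\s\S]*?\?>\s*/, '');
}

console.log(stripProcessingInstruction('<?xml version="1.0" encoding="UTF-8"?>\n<a>b</a>'));
// → <a>b</a>
```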
Hi @santialbo. Sorry for my late reply, I was kind of busy during Chinese New Year. The parsing of the processing instructions was quite a new feature for the main …

If your file is not too big, it is ok to parse the complete thing at once, as long as it is only one file at a time (even up to ~50-100 MB; if you give node.js enough memory, even parsing 2 GB would be possible, but as you can imagine, not optimal). Usually, large XML files (Wikipedia export or Open Street Map planet file) have a structure like this:

```xml
<? processing instructions ?>
<somewrapper>
<item><child>data</child></item>
<item><child>data1</child></item>
<item><child>data2</child></item>
<item><child>data3</child></item>
</somewrapper>
```

When parsing large files with txml, you need to first skip over the preamble; in my example that is everything until after `<somewrapper>`. As for now, I am not sure: are there XML files with a structure without a wrapper?

```xml
<? processing instructions ?>
<item><child>data</child></item>
<item><child>data1</child></item>
<item><child>data2</child></item>
<item><child>data3</child></item>
```

Currently you can still parse it like this:

```js
const processingInstructionStream = fs.createReadStream(files.processingInstructionStream)
  .pipe(tXml.transformStream(`<?xml version="1.0" encoding="UTF-8"?>`));
for await (let element of processingInstructionStream) {
  console.log(element)
}
```

Note: the first argument to `transformStream` is the preamble to skip. I hope this explanation is clear. Can I ask what kind of XML documents you want to process?
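The preamble-skipping described above can be pictured in plain JavaScript. This is an illustration of the idea only, not txml's actual implementation; `skipPreamble` is a hypothetical name:

```javascript
// Hypothetical sketch: find where the real data starts by skipping
// everything up to and including the opening wrapper tag.
function skipPreamble(xml, openTag) {
  const pos = xml.indexOf(openTag);
  if (pos === -1) return xml;          // no wrapper found, keep everything
  return xml.slice(pos + openTag.length);
}

const doc = '<? processing instructions ?>\n<somewrapper>\n<item>data</item>\n</somewrapper>';
console.log(skipPreamble(doc, '<somewrapper>'));
// everything after <somewrapper>, i.e. the <item> elements and the closing tag
```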
Hi @TobiasNickel, first thanks for this awesome lib. I have a pretty large XML file (~200 MB or more) and am trying to use the `transformStream`:

```xml
<?xml version="1.0" encoding="utf8"?>
<note>
<to>Tove</to>
<from>Jani</from>
<heading>Reminder</heading>
<body>Don't forget me this weekend!</body>
</note>
```

```js
const xmlStream = fs
  .createReadStream('note.xml')
  .pipe(txml.transformStream())
for await (let element of xmlStream) {
  console.log(element)
}
```

Also, if the file is formatted without the XML declaration, like with spaces and tabs:

```xml
<note>
  <to>Tove</to>
  <from>Jani</from>
  <heading>Reminder</heading>
  <body>Don't forget me this weekend!</body>
</note>
```

it only works with this:

```js
const xmlStream = fs
  .createReadStream('note.xml')
  .pipe(txml.transformStream('<')) // this works
```

```js
const xmlStream = fs
  .createReadStream('note.xml')
  .pipe(txml.transformStream('0')) // this works
```

```js
const xmlStream = fs
  .createReadStream('note.xml')
  .pipe(txml.transformStream()) // this does not work
```

Could you explain more how to use this parameter, and in what cases?
The idea of the offset (the first argument) is to skip the preamble, which is not data. In your case that would be: … Or does this work for you?
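To make the offset idea concrete for a file like note.xml, here is a plain-JavaScript illustration of where the data would start in each case. The `dataStart` helper is hypothetical, not part of txml's API:

```javascript
// Hypothetical illustration: when the file starts with an XML
// declaration, data begins right after it; when it starts with
// whitespace, data begins at the first '<'.
function dataStart(xml) {
  const decl = xml.match(/^\s*<\?[\s\S]*?\?>/);
  if (decl) return decl[0].length;
  return xml.indexOf('<');
}

const withDecl = '<?xml version="1.0" encoding="utf8"?>\n<note></note>';
const indented = '  \n<note></note>';
console.log(withDecl.slice(dataStart(withDecl)).trim()); // <note></note>
console.log(indented.slice(dataStart(indented)));        // <note></note>
```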
When an SVG XML file begins with an XML comment, the SVG file is not parsed (the stream ends immediately): …
Without the comment, parsing as a stream works fine: …
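Until comments are supported in the stream, a possible stop-gap is to strip leading XML comments before piping the data into the parser. This is a sketch only; `stripLeadingComments` is a hypothetical helper, not part of txml:

```javascript
// Hypothetical helper: drop any XML comments (and whitespace) that
// appear before the root element of an SVG/XML string.
function stripLeadingComments(xml) {
  let out = xml;
  let prev;
  do {
    prev = out;
    out = out.replace(/^\s*<!--[\s\S]*?-->\s*/, '');
  } while (out !== prev); // repeat in case several comments are stacked
  return out;
}

console.log(stripLeadingComments('<!-- generator --><!-- note --><svg></svg>'));
// → <svg></svg>
```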