Parse special entities #44
Comments
This is an idea, but as it is a breaking change it either needs to be implemented behind an option or via a filter function. Not sure what the best approach would be. I will look into it.
Yes, it is "breaking" your existing behavior, but it is actually moving toward correctness: it adheres to the XML standard, and means that your parser is isomorphic to the DOM parser. This just bit me. Yes, an opt-in option at minimum, please.
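To make the opt-in idea concrete, here is a minimal sketch of entity decoding behind an option that defaults to the current behavior. The names `decodeXmlEntities`, `readTextNode` and the `decodeEntities` flag are hypothetical and are not part of tXml's API. The sketch only covers the five predefined XML entities and numeric character references, not entities declared in a DTD.

```typescript
// Hypothetical helper, not part of tXml: decodes the five predefined XML
// entities and numeric character references in a single pass.
const PREDEFINED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

function decodeXmlEntities(text: string): string {
  return text.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[A-Za-z][A-Za-z0-9]*);/g,
    (match: string, body: string): string => {
      if (body[0] === "#") {
        const isHex = body[1] === "x";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched rather than throwing.
        return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : match;
      }
      // Unknown named entities are kept as-is.
      return PREDEFINED_ENTITIES.get(body) ?? match;
    },
  );
}

// Hypothetical option shape: decoding is opt-in, so existing callers keep
// receiving the raw text.
interface TextOptions {
  decodeEntities?: boolean;
}

function readTextNode(raw: string, options: TextOptions = {}): string {
  return options.decodeEntities === true ? decodeXmlEntities(raw) : raw;
}
```

Because the replacement happens in one pass, `&amp;lt;` becomes `&lt;` and is not decoded a second time, which matches how a conforming XML parser treats it.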
Motivation
----------

Currently, we rely on our WebAssembly MPD parser in two different scenarios:

1. Performance reasons (on **HUGE** MPDs of tens of MB, with a lot of `SegmentTimeline` data to parse). Relying on WebAssembly here instead of DOM parsing also led to much less GC pressure, which was a big issue too.

2. Multithread scenarios, as the browser's own fast XML parser, `DOMParser`, is not usable in other threads.

That second scenario only relied on the WebAssembly parser because it had already been written (for the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode; we just cannot use `DOMParser` there.

Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend that applications use our "embedded" versions to make it simpler, though it weighs 400+ KB. Even if it compresses very well, it is still a huge file.

It also turns out that WebAssembly is much more recent than the WebWorker API, and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles.

Even worse, an issue in `v4.0.0-rc.1` made us realize that a device might be compatible with WebAssembly but fail to compile it for various reasons, leading to a fallback to the main thread. It would be better to fall back to a JS parser, like we already do on the main thread today.

It may even make Multithread-only RxPlayer builds (i.e. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone would prefer for their applications](https://caniuse.com/webworkers).

Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature.
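The fallback idea above can be sketched roughly as follows. This is only an illustration under stated assumptions: `MpdParser`, `ParserChoice` and `chooseMpdParser` are hypothetical names and not the RxPlayer's actual code. The WebAssembly initialization is modeled as a synchronous function that throws on failure, whereas real compilation is usually asynchronous.

```typescript
// Hypothetical types: a parser takes the MPD text and returns some parsed form.
type MpdParser = (mpd: string) => unknown;

interface ParserChoice {
  kind: "wasm" | "js";
  parse: MpdParser;
}

// Try to set up the WebAssembly parser; if that throws (for example because
// the device fails to compile the module), use the JavaScript parser instead
// of leaving the worker.
function chooseMpdParser(
  initWasmParser: () => MpdParser,
  jsParser: MpdParser,
): ParserChoice {
  try {
    return { kind: "wasm", parse: initWasmParser() };
  } catch {
    return { kind: "js", parse: jsParser };
  }
}
```

With such a selector, a worker could stay in Multithread mode even when WebAssembly is unavailable, since the JavaScript parser does not depend on `DOMParser`.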
Previous work
-------------

In previous work (I have not made a Pull Request for it yet), I compiled the WebAssembly file down to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintenance so I quickly abandoned it (I may re-explore that way in the future). This could have been nice as it avoided adding yet another MPD parser to the codebase.

I also made quick tests with dependencies like `fast-xml-parser`, but performance appeared poor so I did not continue down this path.

This solution
-------------

To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and, very interestingly, it seems to have been done on their side for performance reasons.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and, quite amazingly, it actually was; sometimes it was even faster (though still slower than our WebAssembly parser).

If this goes very well, we could even imagine doing like Shaka-player and completely removing the DOMParser, even opening the way to things like parsing subtitles in a worker. There is still a lot to do for that, though.

The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons), so I made the same choice as Shaka-player and completely copied its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, which means that code updates on their side will have to be backported to ours. However, the code does not seem to be much maintained anymore, so this is not much of an issue.
Remaining issues
----------------

There are some remaining issues:

- First, I have not yet added parsing for `EventStream` elements nor for `SegmentTimeline` elements. Both seem doable, and the latter will be the real-world test (as it can be incredibly huge on some contents).

- From what I understand from TobiasNickel/tXml#44, it doesn't translate entities (like `&gt;` to `>`). This doesn't seem too hard to implement, though, and is rarely important.

There may be others. There don't seem to be a lot of issues (but it doesn't seem to be a hugely relied-on project either), so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work ------------- In a previous work (I never made the Pull Request for it yet), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase. I also made quick tests with dependencies like `fast-xml-parser` but performances appeared poor so I did not continue in this path. This solution ------------- To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly. So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser). If this goes very well, we could even imagine doing like the Shaka-player and completely remove the DOMParser - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though. The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, this means that code updates on their side will have to be backported on ours. However, the code seems to not be much maintained anymore, so this is not that much of an issue. 
Remaining issues ---------------- There are some remaining issues: - First I did not yet add parsing for `EventStream` elements nor for `SegmentTimeline` elements yet. Both seems doable, and the latter will be the real-world test (as it can be incredibly huge on some contents). - From what I understand from TobiasNickel/tXml#44, It doesn't translate entities (like `>` to `>`). This doesn't seem to hard to implement though and is rarely important. Maybe others. There doesn't seem to be a lot of issues (but it doesn't seem to be a hugely-relied on project either) so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work ------------- In a previous work (I never made the Pull Request for it yet), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase. I also made quick tests with dependencies like `fast-xml-parser` but performances appeared poor so I did not continue in this path. This solution ------------- To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly. So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser). If this goes very well, we could even imagine doing like the Shaka-player and completely remove the DOMParser - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though. The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, this means that code updates on their side will have to be backported on ours. However, the code seems to not be much maintained anymore, so this is not that much of an issue. 
Remaining issues ---------------- There are some remaining issues: - First I did not yet add parsing for `EventStream` elements nor for `SegmentTimeline` elements yet. Both seems doable, and the latter will be the real-world test (as it can be incredibly huge on some contents). - From what I understand from TobiasNickel/tXml#44, It doesn't translate entities (like `>` to `>`). This doesn't seem to hard to implement though and is rarely important. Maybe others. There doesn't seem to be a lot of issues (but it doesn't seem to be a hugely-relied on project either) so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work ------------- In a previous work (I never made the Pull Request for it yet), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase. I also made quick tests with dependencies like `fast-xml-parser` but performances appeared poor so I did not continue in this path. This solution ------------- To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly. So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser). If this goes very well, we could even imagine doing like the Shaka-player and completely remove the DOMParser - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though. The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, this means that code updates on their side will have to be backported on ours. However, the code seems to not be much maintained anymore, so this is not that much of an issue. 
Remaining issues ---------------- There are some remaining issues: - First I did not yet add parsing for `EventStream` elements nor for `SegmentTimeline` elements yet. Both seems doable, and the latter will be the real-world test (as it can be incredibly huge on some contents). - From what I understand from TobiasNickel/tXml#44, It doesn't translate entities (like `>` to `>`). This doesn't seem to hard to implement though and is rarely important. Maybe others. There doesn't seem to be a lot of issues (but it doesn't seem to be a hugely-relied on project either) so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work ------------- In a previous work (I never made the Pull Request for it yet), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase. I also made quick tests with dependencies like `fast-xml-parser` but performances appeared poor so I did not continue in this path. This solution ------------- To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly. So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser). If this goes very well, we could even imagine doing like the Shaka-player and completely remove the DOMParser - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though. The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, this means that code updates on their side will have to be backported on ours. However, the code seems to not be much maintained anymore, so this is not that much of an issue. 
Remaining issues ---------------- There are some remaining issues: - First I did not yet add parsing for `EventStream` elements nor for `SegmentTimeline` elements yet. Both seems doable, and the latter will be the real-world test (as it can be incredibly huge on some contents). - From what I understand from TobiasNickel/tXml#44, It doesn't translate entities (like `>` to `>`). This doesn't seem to hard to implement though and is rarely important. Maybe others. There doesn't seem to be a lot of issues (but it doesn't seem to be a hugely-relied on project either) so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work ------------- In a previous work (I never made the Pull Request for it yet), I compiled down the WebAssembly file to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintainance so I quickly abandonned it (I may re-explore that way in the future). This could have been nice as it prevented adding yet another MPD parser to the codebase. I also made quick tests with dependencies like `fast-xml-parser` but performances appeared poor so I did not continue in this path. This solution ------------- To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing. It's actually very recent: shaka-project/shaka-player@7116a34 (the recent-ness of it made me feel that I may be looking at their codebase a little too much ^^) and it seems to be on their side for performance reason - very interestingly. So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed. It advertises speed competitive with the native DOMParser API and quite amazingly it actually was, sometimes it was even faster (though still slower than our WebAssembly parser). If this goes very well, we could even imagine doing like the Shaka-player and completely remove the DOMParser - even opening the way to also do things like parsing subtitles in a worker. For that, there is still a lot to do though. The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons) so I made the same choice than Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`. I also had to update its code, this means that code updates on their side will have to be backported on ours. However, the code seems to not be much maintained anymore, so this is not that much of an issue. 
Remaining issues ---------------- There are some remaining issues: - First I did not yet add parsing for `EventStream` elements nor for `SegmentTimeline` elements yet. Both seems doable, and the latter will be the real-world test (as it can be incredibly huge on some contents). - From what I understand from TobiasNickel/tXml#44, It doesn't translate entities (like `>` to `>`). This doesn't seem to hard to implement though and is rarely important. Maybe others. There doesn't seem to be a lot of issues (but it doesn't seem to be a hugely-relied on project either) so I'll look at each of them in the future.
Motivation ---------- Currently, we relied on our WebAssembly MPD parser for two different scenarios: 1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of `SegmentTimeline` data to parse). Also Relying on WebAssembly here instead of DOM parsing led us to much less GC pressure which was also a big issue. 2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`, is not usable in other threads. Though that second scenario only relied on the WebAssembly parser because it was already written before (and it was because of the first scenario). Nothing stops us from relying on a JavaScript MPD parser in Multithread mode, we only cannot use `DOMParser` there. Having to provide the WebAssembly code to the `MULTI_THREAD` feature is a little awkward. We recommend to applications that they use our "embedded" versions to make it more simple, though it weighs in the 400+KB. Even if it compresses very very well, it is still a huge file. It also turn out that WebAssembly is much more recent than the WebWorker API and as such we're currently not able to rely on the Multithread mode on very old devices like old smart TV models and old game consoles. Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device might be compatible to WebAssembly but might fail to compile it for various reasons, leading to a fallback to main thread - it would be better to have a fallback to a JS Parser, like we already have on main thread today. It may even make Multithread-only RxPlayer builds (e.g. not having to import both monothread code and multithread code, just in case multithreading is not possible) much more doable, which [I guess almost everyone could prefer for their applications](https://caniuse.com/webworkers). Thus, relying on a JavaScript parser in a Multithread scenario could be a very nice feature. 
Previous work
-------------

In previous work (I have not made a Pull Request for it yet), I compiled the WebAssembly file down to JavaScript (through binaryen's wasm2js util), but it involved a lot of manual maintenance so I quickly abandoned it (I may re-explore that way in the future).

This could have been nice as it avoided adding yet another MPD parser to the codebase.

I also made quick tests with dependencies like `fast-xml-parser`, but performance appeared poor so I did not continue down this path.

This solution
-------------

To be perfectly honest, it was only after looking at some Shaka-player code that I noticed that they now rely on a "txml" dependency for their XML parsing.

It's actually very recent: shaka-project/shaka-player@7116a34 (the recentness of it made me feel that I may be looking at their codebase a little too much ^^) and, interestingly, it seems to have been done for performance reasons on their side.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/). It is a fairly minimal XML DOM parser with a specific focus on speed.

It advertises speed competitive with the native `DOMParser` API and, quite amazingly, it actually was; sometimes it was even faster (though still slower than our WebAssembly parser).

If this goes very well, we could even imagine doing like Shaka-player and completely removing `DOMParser`, which would also open the way to things like parsing subtitles in a worker. For that, there is still a lot to do though.

The code was a little hard to integrate through `npm` in a TypeScript client-side project (for various reasons), so I made the same choice as Shaka-player by completely copying its code (keeping the licence in the file) into `src/utils/xml-parser.ts`.

I also had to update its code, which means that code updates on their side will have to be backported to ours. However, the code does not seem to be much maintained anymore, so this is not that much of an issue.
With `DOMParser`, attributes containing special entities like `&gt;` and `&lt;` are replaced with the corresponding character. Would it be possible for txml to do the same?
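To make the request concrete, here is a minimal TypeScript sketch of what such a decoding step could look like if applied as a filter on attribute or text values after parsing. It is only an illustration under assumptions: `decodeXmlEntities` and `codePointToString` are hypothetical names and are not part of txml's API, and the sketch only handles the five entities predefined by XML plus numeric character references (no DTD-defined entities).

```typescript
// The five entities predefined by the XML 1.0 specification.
const PREDEFINED_ENTITIES: ReadonlyMap<string, string> = new Map([
  ["lt", "<"],
  ["gt", ">"],
  ["amp", "&"],
  ["quot", '"'],
  ["apos", "'"],
]);

/**
 * Convert a code point to a string, returning `fallback` when the value is
 * outside the Unicode range (String.fromCodePoint would throw otherwise).
 */
function codePointToString(codePoint: number, fallback: string): string {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : fallback;
}

/**
 * Hypothetical helper (NOT part of txml): decode predefined entities and
 * numeric character references in an already-extracted value.
 * Unknown entities are left untouched.
 */
function decodeXmlEntities(value: string): string {
  // Fast path: most MPD attribute values contain no entity at all.
  if (value.indexOf("&") === -1) {
    return value;
  }
  // A single pass avoids double-decoding inputs such as "&amp;lt;".
  return value.replace(
    /&(?:#x([0-9a-fA-F]+)|#([0-9]+)|([a-zA-Z]+));/g,
    (match: string, hex?: string, dec?: string, name?: string): string => {
      if (hex !== undefined) {
        return codePointToString(parseInt(hex, 16), match);
      }
      if (dec !== undefined) {
        return codePointToString(parseInt(dec, 10), match);
      }
      if (name !== undefined) {
        return PREDEFINED_ENTITIES.get(name) ?? match;
      }
      return match;
    },
  );
}

// Usage: a URL attribute whose "&" was escaped in the XML source.
const mediaUrl = decodeXmlEntities("video.mp4?token=abc&amp;quality=hd");
// mediaUrl === "video.mp4?token=abc&quality=hd"
```

Keeping this as a separate function applied to values (rather than changing the parser's core loop) would let it sit behind an opt-in option, so existing users who expect raw values are not affected.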