Parse special entities #44

Open
ecarreras opened this issue Sep 20, 2022 · 2 comments

Comments

@ecarreras

With DOMParser, attributes containing special entities like `&gt;` and `&lt;` are replaced with the corresponding character. Would it be possible for txml to do the same?

@TobiasNickel
Owner

This is an idea, but as it is a breaking change, it would need to be implemented either behind an option or via a filter function. Not sure what the best approach would be. I will look into it.

@vorth

vorth commented Jul 22, 2023

Yes, it is "breaking" your existing behavior, but actually moving toward correctness... it is adhering to the XML standard, and means that your parser is isomorphic to the DOM parser. This just bit me. Yes, an opt-in option at minimum, please.

peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 13, 2024
Motivation
----------

Currently, we rely on our WebAssembly MPD parser for two different
scenarios:

  1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of
     `SegmentTimeline` data to parse). Relying on WebAssembly here
     instead of DOM parsing also led to much less GC pressure, which had
     been a big issue as well.

  2. Multithread scenarios, as the browser's own fast XML parser,
     `DOMParser`, is not usable in other threads.

That second scenario, though, only relied on the WebAssembly parser
because it had already been written (for the first scenario).
Nothing stops us from relying on a JavaScript MPD parser in Multithread
mode; we just cannot use `DOMParser` there.

Having to provide the WebAssembly code to the `MULTI_THREAD` feature is
a little awkward. We recommend that applications use our "embedded"
versions to make it simpler, though it weighs 400+ KB.
Even if it compresses very well, it is still a huge file.

It also turns out that WebAssembly is much more recent than the WebWorker
API, and as such we're currently not able to rely on the Multithread
mode on very old devices like old smart TV models and old game consoles.
Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device
might be compatible with WebAssembly but might fail to compile it for
various reasons, leading to a fallback to main thread. It would be
better to fall back to a JS parser, like we already do on main
thread today.

It may even make Multithread-only RxPlayer builds (i.e. not having to import
both monothread code and multithread code, just in case multithreading is
not possible) much more doable, which [I guess almost everyone would prefer
for their applications](https://caniuse.com/webworkers).

Thus, relying on a JavaScript parser in a Multithread scenario could be
a very nice feature.

Previous work
-------------

In previous work (I have not made a Pull Request for it yet), I
compiled the WebAssembly file down to JavaScript (through binaryen's
wasm2js util), but it involved a lot of manual maintenance so I quickly
abandoned it (I may re-explore that way in the future). This could have
been nice as it avoided adding yet another MPD parser to the
codebase.

I also made quick tests with dependencies like `fast-xml-parser` but
performance appeared poor so I did not continue down this path.

This solution
-------------

To be perfectly honest, it was only after looking at some Shaka-player
code that I noticed that they now rely on a "txml" dependency for their
XML parsing.
It's actually very recent: shaka-project/shaka-player@7116a34
(the recentness of it made me feel that I may be looking at their codebase a
little too much ^^) and, very interestingly, it seems to have been done on
their side for performance reasons.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/).
It is a fairly minimal XML DOM parser with a specific focus on speed.
It advertises speed competitive with the native DOMParser API and, quite
amazingly, it actually was; sometimes it was even faster (though still slower
than our WebAssembly parser).

If this goes very well, we could even imagine doing as Shaka-player did
and completely removing the DOMParser, even opening the way
to also do things like parsing subtitles in a worker. There is still a
lot to do for that, though.

The code was a little hard to integrate through `npm` in a TypeScript
client-side project (for various reasons) so I made the same choice as
Shaka-player by completely copying its code (keeping the licence in the
file) into `src/utils/xml-parser.ts`.

I also had to update its code, which means that code updates on their
side will have to be backported on ours.
However, the code does not seem to be much maintained anymore, so this is
not that much of an issue.

Remaining issues
----------------

There are some remaining issues:

  - First, I did not yet add parsing for `EventStream` elements nor for
    `SegmentTimeline` elements. Both seem doable, and the latter
    will be the real-world test (as it can be incredibly huge on some
    contents).

  - From what I understand from TobiasNickel/tXml#44,
    it doesn't translate entities (like `&gt;` to `>`). This doesn't seem too
    hard to implement though, and is rarely important.

Maybe others. There don't seem to be a lot of issues (but it doesn't
seem to be a hugely relied-on project either) so I'll look at each of
them in the future.
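
As a rough illustration of the entity translation mentioned in the remaining
issues, here is a minimal sketch of what a post-processing step on attribute
values or text nodes could look like. `decodeXmlEntities` and
`codePointToString` are hypothetical helpers, not part of txml or of the
RxPlayer; the sketch assumes that only the five predefined XML entities and
numeric character references need handling (no DTD-declared entities).

```typescript
// Hypothetical helper, NOT a txml API: decodes the five predefined XML
// entities plus decimal and hexadecimal character references.
const PREDEFINED_ENTITIES: Record<string, string> = {
  lt: "<",
  gt: ">",
  amp: "&",
  quot: '"',
  apos: "'",
};

// Converts a Unicode code point to a string, building a surrogate pair
// by hand for code points above the Basic Multilingual Plane.
function codePointToString(codePoint: number): string {
  if (codePoint <= 0xffff) {
    return String.fromCharCode(codePoint);
  }
  const offset = codePoint - 0x10000;
  return String.fromCharCode(0xd800 + (offset >> 10), 0xdc00 + (offset & 0x3ff));
}

function decodeXmlEntities(input: string): string {
  // Fast path: most values contain no entity at all.
  if (input.indexOf("&") === -1) {
    return input;
  }
  // Single pass, so "&amp;gt;" becomes "&gt;" and is not decoded twice.
  return input.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body.charAt(0) === "#") {
        const isHex = body.charAt(1) === "x";
        const codePoint = parseInt(body.slice(isHex ? 2 : 1), isHex ? 16 : 10);
        // Leave out-of-range references untouched rather than throwing.
        return codePoint <= 0x10ffff ? codePointToString(codePoint) : match;
      }
      // Own-property check, so names like "constructor" are not resolved
      // through the prototype chain.
      return Object.prototype.hasOwnProperty.call(PREDEFINED_ENTITIES, body)
        ? PREDEFINED_ENTITIES[body]
        : match;
    },
  );
}
```

Unknown entity names are left as-is here, which is more lenient than a
conforming XML parser (which would report an error); whether that leniency is
acceptable depends on the contents being parsed.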
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 13, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 13, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 14, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 15, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 20, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 20, 2024
Motivation
----------

Currently, we relied on our WebAssembly MPD parser for two different
scenarios:

  1. performance reasons (on **HUGE** MPD of tens of MB, with a lot of
     `SegmentTimeline` data to parse). Also Relying on WebAssembly here
     instead of DOM parsing led us to much less GC pressure which was also
     a big issue.

  2. Multithread scenarios as the browser's own fast XML parser, `DOMParser`,
     is not usable in other threads.

Though that second scenario only relied on the WebAssembly parser
because it was already written before (and it was because of the first
scenario).
Nothing stops us from relying on a JavaScript MPD parser in Multithread
mode, we only cannot use `DOMParser` there.

Having to provide the WebAssembly code to the `MULTI_THREAD` feature is
a little awkward. We recommend to applications that they use our
"embedded" versions to make it more simple, though it weighs in the 400+KB.
Even if it compresses very very well, it is still a huge file.

It also turn out that WebAssembly is much more recent than the WebWorker
API and as such we're currently not able to rely on the Multithread
mode on very old devices like old smart TV models and old game consoles.
Even worse, an issue in the `v4.0.0-rc.1` made us realize that a device
might be compatible to WebAssembly but might fail to compile it for
various reasons, leading to a fallback to main thread - it would be
better to have a fallback to a JS Parser, like we already have on main
thread today.

It may even make Multithread-only RxPlayer builds (e.g. not having to import
both monothread code and multithread code, just in case multithreading is
not possible) much more doable, which [I guess almost everyone could prefer
for their applications](https://caniuse.com/webworkers).

Thus, relying on a JavaScript parser in a Multithread scenario could be
a very nice feature.

Previous work
-------------

In a previous work (I never made the Pull Request for it yet), I
compiled down the WebAssembly file to JavaScript (through binaryen's
wasm2js util), but it involved a lot of manual maintainance so I quickly
abandonned it (I may re-explore that way in the future). This could have
been nice as it prevented adding yet another MPD parser to the
codebase.

I also made quick tests with dependencies like `fast-xml-parser` but
performances appeared poor so I did not continue in this path.

This solution
-------------

To be perfectly honest, it was only after looking at some Shaka-player
code that I noticed that they now rely on a "txml" dependency for their
XML parsing.
It's actually very recent: shaka-project/shaka-player@7116a34
(the recent-ness of it made me feel that I may be looking at their codebase a
little too much ^^) and it seems to be on their side for performance
reason - very interestingly.

So I looked up that txml thing (repo available here: https://github.com/TobiasNickel/tXml/).
It is a fairly minimal XML DOM parser with a specific focus on speed.
It advertises speed competitive with the native DOMParser API and, quite
amazingly, it actually was: sometimes it was even faster (though still slower
than our WebAssembly parser).

If this goes very well, we could even imagine doing as Shaka-player did
and completely removing the DOMParser, which would also open the way
to things like parsing subtitles in a worker. There is still a lot to do
for that, though.

The code was a little hard to integrate through `npm` in a TypeScript
client-side project (for various reasons) so I made the same choice as
Shaka-player by completely copying its code (keeping the licence in the
file) into `src/utils/xml-parser.ts`.

I also had to update its code, which means that code updates on their
side will have to be backported to ours.
However, the code does not seem to be maintained much anymore, so this is
not that much of an issue.

Remaining issues
----------------

There are some remaining issues:

  - First, I did not add parsing for `EventStream` elements nor for
    `SegmentTimeline` elements yet. Both seem doable, and the latter
    will be the real-world test (as it can be incredibly huge on some
    contents).

  - From what I understand from TobiasNickel/tXml#44,
    it doesn't translate entities (like `&gt;` to `>`). This doesn't seem too
    hard to implement though and is rarely important.
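
As a rough idea of what the entity translation mentioned above could look
like, here is a minimal sketch. `decodeXmlEntities` is a hypothetical helper
(tXml does not provide it) meant to be applied to attribute values and text
after parsing. It only handles the five predefined XML entities and numeric
character references, and leaves anything else untouched.

```typescript
// The five entities predefined by the XML specification.
const PREDEFINED_ENTITIES = new Map<string, string>([
  ["amp", "&"],
  ["lt", "<"],
  ["gt", ">"],
  ["quot", '"'],
  ["apos", "'"],
]);

/** Convert a numeric code point to a string, or return `fallback` if out of range. */
function fromCodePointOr(codePoint: number, fallback: string): string {
  return codePoint <= 0x10ffff ? String.fromCodePoint(codePoint) : fallback;
}

/**
 * Hypothetical helper: replaces predefined entities (`&gt;`), decimal
 * references (`&#62;`) and hexadecimal references (`&#x3E;`) in one pass.
 * Doing it in a single pass avoids decoding twice (`&amp;lt;` gives `&lt;`).
 * Unknown entities are left as-is.
 */
function decodeXmlEntities(input: string): string {
  return input.replace(
    /&(#x[0-9a-fA-F]+|#[0-9]+|[a-zA-Z]+);/g,
    (match: string, body: string): string => {
      if (body.startsWith("#x")) {
        return fromCodePointOr(parseInt(body.slice(2), 16), match);
      }
      if (body.startsWith("#")) {
        return fromCodePointOr(parseInt(body.slice(1), 10), match);
      }
      const replacement = PREDEFINED_ENTITIES.get(body);
      return replacement !== undefined ? replacement : match;
    },
  );
}
```

Because most MPD attribute values contain no `&` at all, a caller could skip
the call entirely when `value.indexOf("&") === -1` to keep the common path cheap.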

There may be others. The project doesn't seem to have a lot of open issues
(but it doesn't seem to be hugely relied upon either), so I'll look at each
of them in the future.
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 20, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 20, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Feb 23, 2024
peaBerberian added a commit to canalplus/rx-player that referenced this issue Jun 13, 2024