-
Notifications
You must be signed in to change notification settings - Fork 416
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Fix playback of AAC in TS files #722
base: main
Are you sure you want to change the base?
Conversation
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
1141d35
to
9fc4438
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the contribution and fixing the bug in the library.
Please clarify some of the minor comments added to the PR and I should be able to merge once they are resolved.
libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java
Show resolved
Hide resolved
} | ||
|
||
// Calculate total PCE size including initial PCE tag and a zero length comment. | ||
int pceSize = (pceBuffer.getPosition() + extraBits + 7) / 8 + 1; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why do we assume a zero length comment
here? As per the spec, I cannot see to find such an assumption.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Well, we don't need to copy across the comment so there's no point buffering it up. If we had to copy the comment we'd need to increase AAC_PCE_MAX_SIZE
from 49 to 305, which will increase the pceBuffer size.
Having said that, if we increase AAC_PCE_MAX_SIZE to 50 we'd get the byte representing the comment size. We could read that to more robustly check if the PCE fits into the sample in the below conditional. I'll make this change.
libraries/extractor/src/main/java/androidx/media3/extractor/ts/AdtsReader.java
Outdated
Show resolved
Hide resolved
@@ -66,6 +67,9 @@ public final class AdtsReader implements ElementaryStreamReader { | |||
private static final byte[] ID3_IDENTIFIER = {'I', 'D', '3'}; | |||
private static final int VERSION_UNSET = -1; | |||
|
|||
private static final int AAC_PCE_MIN_SIZE = 6; | |||
private static final int AAC_PCE_MAX_SIZE = 49; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This values are not defined in the spec, I'm assuming you're calculating these based on min and max of values/fields present in the spec right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes.
The AAC_PCE_MIN_SIZE is the theoretical minimum size a PCE block can be, based on the possible values of fields in the spec. I'm using that to check against the sample size to determine whether the sample could even hold a PCE block. This is for safety. However, it also has the side effect of making sure we never allocate a pceBuffer of less than 6 bytes in size, which happens to be the exact amount we parse before calculating the actual PCE size which we test against the actual sample size.
The AAC_PCE_MAX_SIZE is the theoretical maximum size, minus the 1 byte comment size and possible 255 bytes of comment that follows. I'm using this to limit the amount of data we're buffering up in pceBuffer.
|
||
System.arraycopy(oldConfig, 0, newConfig, 0, oldConfig.length); | ||
pceBuffer.setPosition(3 /* PCE tag */); | ||
pceBuffer.readBits(newConfig, oldConfig.length, numPceBits); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
While copying the pceBuffer
we are copying only until numPceBits
but when defining the new configSize
we use configSize += (numPceBits + 7) / 8 + 1
, why do we need to add zero length comment
to the size when we don't copy it?
Should this be simply configSize += (numPceBits + 7) / 8
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@lawadr Please see this comment which was added later.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess so that we are appending a valid PCE payload to the config. If we leave out the zero comment size byte then something that parses this as a PCE underneath could fail.
Anyway, a better reason is that if you remux the TS exhibiting this issue to an MP4 or MKV, where this whole config (including the PCE part that this PR appends) is stored somewhere in the container's header/metadata, you'll see that this change matches up 1:1 with it. It results in a TS file submitting the exact same config as its original MP4/MKV file.
If you check the original bug report:
TS: {0x11, 0x80}
MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}
That last byte would coincide with the comment size byte.
With this PR:
TS: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}
MKV: {0x11, 0x80, 0x04, 0xC8, 0x09, 0x00, 0x01, 0x08, 0xC8, 0x00, 0x00}
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This example is for that particular media, and it may be the case that the value of commentLength
for it is zero. We cannot generalise and write code based on one example right?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The last byte there is definitely the comment size. It's in the ISO 14496-3 (2009) spec. This whole series of bytes that makes up the config is described in Table 1.15 – Syntax of AudioSpecificConfig(). For the AAC-LC object type (which parseAdtsHeader
forces the stream to be treated as), the bits after the first 13 are described in Table 4.1 – Syntax of GASpecificConfig(). As you can see, if the channel configuration is 0, the PCE is output, as described in Table 4.2 – Syntax of program_config_element() (identical to the PCE block as defined in the MPEG-2 spec), which includes a pascal string comment at the end.
Are you asking if we should copy across a comment if it does exist and therefore doesn't have a size of 0? There's no harm in doing so, though that will requiring making our pceBuffer over 6x larger to accommodate a potential 255 byte comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
No, but if we are already reading the commentLength
should we at least copy that across?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Or because we don't pass the comments
, we should advertise commentLength
as 0
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Exactly that. If we copy across the comment length and it isn't 0, then a parser of this data may read that many bytes past the end of the config. In other words, if we don't want to bother copying the comment, we must set the size to 0.
Hi @lawadr, Can we also please add a test for this in |
No problem. I've just pushed a couple. It needed a second test because the first would succeed with the old behaviour (pre-this PR) as we can't easily assert the submitted format's initializationData. The second ensures that the format is withheld until the PCE is found, so it's a test for failure. This second test would "succeed" with the old code where as it shouldn't now. |
acdc660
to
70d4eb5
Compare
When the channel config is 0 in the ADTS header, delay the submission of the format to the output until a Program Config Element (PCE) is found and appended to the format's initialization data. This reproduces the initialization data payload that is submitted by the same AAC stream when inside other containers such as MP4 or MKV.
70d4eb5
to
ff515d3
Compare
I'm going to send this for internal review now. You may see some more commits being added as I make changes in response to review feedback. Please refrain from pushing any more substantive changes as it will complicate the internal review - thanks! |
b2c1100
to
9bb0004
Compare
83cc9e2
to
6d1951a
Compare
int offset = data.getPosition(); | ||
int limit = offset + min(sampleSize, AAC_PCE_MAX_SIZE); | ||
ParsableBitArray pceBuffer = | ||
new ParsableBitArray(Arrays.copyOfRange(data.getData(), offset, limit)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
How does this work when data
doesn't contain the entire PCE? If you don't use continueRead
to buffer the PCE up then you need some sort of byte-by-byte state tracking inside readAacProgramConfigElement
, which you don't have.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What do you mean by entire PCE?
Instead of calling continueRead
which moved the data
position further ahead to read pceBuffer
, we are now copying the pceBuffer
from the data
without changing its position. Once PCE data is read and appended to the initializationData
of format
, setReadingState
will be triggered which reads the sample and moves the position in data as we would have done when this method was not in place.
We're now copying the data only once here instead of doing it twice in the previous approach. Please explain if you have something else in mind with this comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Also instead of relying on IllegalStateException
when format
is not set for the case when channelConfig
is 0
and not enough bytes in PCE
, we now throw ParserException
as PCE data should be present in this case.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I understand why you've done that as it does simplify the code. It just doesn't work.
Take this test for example:
@Test
public void aacPceSplit() throws ParserException {
byte[] first = Arrays.copyOf(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1);
byte[] second = Arrays.copyOfRange(AAC_PCE_TEST_DATA, AAC_PCE_ADTS_HEADER.length + 1, AAC_PCE_TEST_DATA.length);
data = new ParsableByteArray(first);
feed();
data = new ParsableByteArray(second);
feed();
assertSampleCounts(0, 1);
adtsOutput.assertSample(0, AAC_PCE_ADTS_CONTENT, 0, C.BUFFER_FLAG_KEY_FRAME, null);
}
This test passes enough data to cover just the ADTS header and the first byte of the PCE, then follows it up with the rest of the PCE. This doesn't work with your change.
Without storing the entire PCE in a buffer (using continueRead
), you've lost the ability to break up the PCE into multiple calls to consume
.
Imagine getting rid of continueRead
from the STATE_READING_ADTS_HEADER
case. It would break every time an ADTS header staddles the boundary of multiple consume
calls. It's the same issue here but with the PCE.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation. Makes sense to me now. I'll also add this test case now.
Do you agree with the change that once readAacProgramConfigElement
is called and not enough bits are present or the PCE tag is not present, it is better to throw ParserException
instead of silently not setting the format and relying on failing later because of this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Firstly, ignore my prior post as it's full of inaccuracies.
Also based on the spec, the channel configuration is part of the fixed adts header and will be same in all the headers, so if one adts header says channel configuration is 0, the rest of the headers would do the same and it should be followed by PCE data in the raw blocks.
This isn't true. Once a PCE is given inside a raw_data_block
, it applies from that point onwards until another PCE is defined inside another raw_data_block
. It says so in the specification:
"The configuration indicated by the PCE takes effect at the raw_data_block() containing the
PCE. The number of front, side and back channels as specified in the PCE must be present
in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a
new PCE is transmitted."
Typically, as streams do not tend to change their channel setups, you'll only see one PCE at the very start. This is what I see in my sample. The first block starts with a PCE, while the following blocks jump straight into the SCEs and CPEs.
Can you please shed more light on how this can be legitimately reached? I think it is safe to throw an exception even in this case. Also I couldn't find anything about PCE just present once per file, I think in that case if we start from the middle we're gonna crash anyways right?
So this brings us to TS files, specifically as part of an HLS playlist. If a PCE is present at the start of the first AAC block in a particular TS file, that TS file can be played in isolation and used as the starting point of the HLS.
You're right that if a PCE has not been encountered then samples cannot be decoded and we should throw. However, the issue is that we are determining whether to enter readAacProgramConfigElement
based on the hasOutputFormat
member of the AdtsReader
class. The problem with that is if you play an HLS from the start and skip far enough ahead to another TS file, a new AdtsReader
will be constructed and hasOutputFormat
will be initialised to false
again, even though we'll be reusing the previously created TrackOutput
that has already received the format from before. In other words, I think we need to base whether to go into readAacProgramConfigElement
on whether the currentOutput
has previously received a format.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thank you for the explanation. Can you please add a commit which makes the changes you suggested above?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I guess in that case we will be entering readAacProgramConfigElement
only when we expect PCE data to be present and can throw an error when non-PCE tag (!= 5) is encountered.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a closer look and discussing it internally, it seems you can't really check if currentOutput
has a set Format
or not.
The configuration indicated by the PCE takes effect at the raw_data_block() containing the PCE. The number of front, side and back channels as specified in the PCE must be present in that block and all subsequent raw_data_block()'s until a raw_data_block() containing a new PCE is transmitted.
The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n
number of raw blocks. Ts
container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE
data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.
To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Taking a closer look and discussing it internally, it seems you can't really check if currentOutput has a set Format or not.
Yes. This is a problem.
The quoted documentation only makes sense within a single adts frame. From what we understand, it seems every adts frame should have a header and n number of raw blocks. Ts container is made for broadcasting and it is supposed to resume playback from the start of any frame in the sample. If the PCE data is only present at the start of first frame, playback would not be possible if you're starting playback from a frame which is not the start.
It does have a header and a number of raw blocks. You can see that in adts_frame
in the spec. There's also something defined above adts_frame
though, and that's adts_sequence
which loosely defines a series of frames. I can see nothing that indicates a PCE only applies to the end of its adts_frame
instead of adts_sequence
.
To reach a conclusion and merge this PR, I would request you to share the HLS stream (if possible) so that we can debug the sample on our end and understand what exactly is happening. The current file sent to us is just a sample file and I cannot reproduce the scenarios as stated above i.e. seeking inside a HLS stream.
I've got a test HLS here that exhibits this behaviour. It's 1.2gb though so I'm not sure how to send it across.
If you have ffmpeg you can create it yourself in no time:
- Download an MP4 of Blender's Sprite Fright short: https://dl.acm.org/doi/10.1145/3550339.3556000
ffmpeg -i <input.mp4> -c:v copy -c:a aac -aac_pce 1 -f hls -hls_list_size 0 <output.m3u8>
This HLS plays fine in VLC, for example. You can play it from the beginning, skip forward and backwards. You can even start playback from a position that has no PCE using the --start-time=
command line argument. I guess it either always checks the first TS file in the playlist to find the PCE or it works out the channel layout implicitly (see 8.5.2.3 Implicit channel mapping in the spec)?
If we throw whenever we can't find a PCE then you can only play the stream from the start and can never skip forward or backwards. If we don't throw (or somehow check whether currentOutput
already has a format) then you can skip forward and backwards as long as the stream was played from the start. You still won't be able to start part way through the stream unless it's reencoded to replicate the PCE in every frame, we parse the stream from the start until the PCE is found, or we implicitly work out the PCE as described in the spec.
6a1649d
to
3adabe8
Compare
3adabe8
to
5a446c2
Compare
5a446c2
to
1d573e0
Compare
Hi @lawadr, I created the HLS file using the command stated above. Here are my observations: when
Found a similar issue reported on Is there a use case where media files are encoded using the command above? I'll try and discuss internally if it is fine to not throw an error when a PCE data is already read before as it is may not be mandatory to send PCE data in each frame. Although that will still mean starting the stream from middle (not start) would still fail the playback. |
Not really, because the example command above is to force a PCE for a standard 2.1 audio stream for testing/reproduction purposes. Typically you would see a PCE when the standard 7 channel types enumerated in the ADTS header aren't sufficient. In the original sample stream I provided, the channel setup was:
I've no idea why the second CPE isn't a side CPE (and therefore standard 7.1) as I didn't author the sample, but nevertheless, the likes of VLC handles it fine, and with the fix in this PR so does ExoPlayer+MediaCodec.
Yes. This would unfortunately be the case. Perhaps this could be for later enhancement. For my use case, being able to play from the beginning of the stream is vital. Being able to jump 10s back when you miss something is almost as vital. Being able to start from the middle isn't the most important thing as you can just start from the beginning and skip ahead to where you were. |
is there any update on this issue? |
When the channel config is 0 in the ADTS header, delay the submission of the format to the output until a Program Config Element (PCE) is found and appended to the format's initialization data.
This reproduces the initialization data payload that is submitted by the same AAC stream when inside other containers such as MP4 or MKV.
Fixes issue #695