Missing hex and decoded data in .ccd output for old broadcast recording #11

Open
micolous opened this issue Sep 20, 2021 · 1 comment

micolous commented Sep 20, 2021

I tried running caption-inspector on an old US TV broadcast recording with CEA-608 captions, taken from the hls.js demo page: https://playertest.longtailvideo.com/adaptive/captions/playlist.m3u8

I downloaded the recording using youtube-dl, and have attached it to this issue (in a ZIP file so GitHub doesn't try to transcode it): cnn-live.mp4.zip

I was able to play back the downloaded file with captions fine in VLC:

[Screenshot: vlcsnap-2021-09-20-11h20m57s499]

I built caption-inspector at commit 476326f on Ubuntu 20.04, with the Makefile patched to build with gcc 9.3 rather than clang (Issue #13).

I then tried to extract the CEA-608 tracks with:

mkdir /tmp/cnn
./caption-inspector -o /tmp/cnn cnn-live.mp4
cd /tmp/cnn
zip -9 cnn.zip cnn-live*

All of the outputs I got are attached: cnn.zip

I got a correct-looking cnn-live-C1.608 with captions from the program:

00:00:00,755 - {RCL} {ENM} {ENM} {R1:C4} {R1:C4} {TO2} {TO2} "BUT HE HAD PURINA CAT CHOW" {R2:C16} {R2:C16} {TO3} {TO3} "INDOOR."
00:00:01,930 - {EOC}

However, cnn-live.ccd has timestamps and the fully-decoded TEXT lines, but appears to be missing the "hex data" and "decoded data" columns:

00:00:01,049  
TEXT: Ch1 - "BU" 

00:00:01,091  
TEXT: Ch1 - "T " 

00:00:01,133  
TEXT: Ch1 - "HE" 

00:00:01,175  
TEXT: Ch1 - " H" 

00:00:01,217  
TEXT: Ch1 - "AD" 

I was able to run caption-inspector against two other kinds of input: a slightly more modern US broadcast capture (720p59.94 with CEA-608 and 708 captions), and files created with libcaption's flv+srt tool (which produces possibly-not-quite-valid CEA-608 captions). In both cases I got proper "hex data" and "decoded data":

00:00:01,936  F1:5468  PS:4322  PD:5468  PD:0000  XD:0000    Ch1: "Th"  <-Srvc:01  G0:T|G0:h  ?00?|?00?  _________    Chan-1:  "T"  "h"  <--Seq:1 P006-B02  G0Svc:01|G0Svc:01  ???-0x00|???-0x00  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  
TEXT: Ch1 - "Th" Svc1 - "Th" 

00:00:01,952  F2:8080  XD:0000  XD:0000  XD:0000  XD:0000    F2 - NULL  _________  _________  _________  _________    608: Field 2 NULL  _________________  _________________  _________________  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  

00:00:01,969  F1:E5F2  PS:8322  PD:6572  PD:0000  XD:0000    Ch1: "er"  <-Srvc:01  G0:e|G0:r  ?00?|?00?  _________    Chan-1:  "e"  "r"  <--Seq:2 P006-B02  G0Svc:01|G0Svc:01  ???-0x00|???-0x00  _________________  
              XD:0000  XD:0000  XD:0000  XD:0000  XD:0000    _________  _________  _________  _________  _________    _________________  _________________  _________________  _________________  _________________  
TEXT: Ch1 - "er" Svc1 - "er" 
00:28:17,000  F1:94AE  F1:9420  F1:9140  F1:C7F2  F1:E561    Ch1 {ENM}  Ch1 {RCL}  Ch1 - PAC  Ch1: "Gr"  Ch1: "ea"    Erase NonDisp Mem  ResumeCaptLoading  _Row:01 -  White_  Chan-1:  "G"  "r"  Chan-1:  "e"  "a"  
              F1:F420  F1:F7EF  F1:F26B  F1:AE80  F1:91E0    Ch1: "t "  Ch1: "wo"  Ch1: "rk"  Ch1 - "."  Ch1 - PAC    Chan-1:  "t"  " "  Chan-1:  "w"  "o"  Chan-1:  "r"  "k"  Channel - 1:  "."  _Row:02 -  White_  
              F1:5B4C  F1:6175  F1:6768  F1:F4E5  F1:F25D    Ch1: "[L"  Ch1: "au"  Ch1: "gh"  Ch1: "te"  Ch1: "r]"    Chan-1:  "["  "L"  Chan-1:  "a"  "u"  Chan-1:  "g"  "h"  Chan-1:  "t"  "e"  Chan-1:  "r"  "]"  
TEXT: Ch1 - "Great work.[Laughter]." 

micolous commented Oct 3, 2021

I'm pretty sure that the issue is triggered by the source file having cc_count < 5. Caption Inspector only tries to print anything if there are at least 5 blocks:

if( lineOut.numElements >= NUM_CC_DATA_ELEMENTS_PER_LINE ) {
    if( printNewline == TRUE ) {
        writeToFile(ctxPtr->fp, "\n ");
    }
    for( int iloop = 0; iloop < lineOut.numElements; iloop++ ) {
        writeToFile(ctxPtr->fp, "%s ", lineOut.element[iloop].hexStr);
    }
    writeToFile(ctxPtr->fp, " ");
    for( int iloop = 0; iloop < lineOut.numElements; iloop++ ) {
        writeToFile(ctxPtr->fp, "%s ", lineOut.element[iloop].tagStr);
    }
    writeToFile(ctxPtr->fp, " ");
    for( int iloop = 0; iloop < lineOut.numElements; iloop++ ) {
        writeToFile(ctxPtr->fp, "%s ", lineOut.element[iloop].decStr);
    }
    lineOut.numElements = 0;
    printNewline = TRUE;
}

This then trips an assert later on:

ASSERT(lineOut.numElements == 0 );
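
The suspected failure mode can be illustrated with a small, self-contained sketch. This is not Caption Inspector's actual code: `LineBuffer`, `line_add`, `emit_line`, `flush_if_full` and `flush_any` are hypothetical names, and `ELEMENTS_PER_LINE` is assumed to be 5 based on the "at least 5 blocks" observation above. `flush_if_full` mirrors the quoted condition, so a line holding fewer than 5 elements is never written and stays buffered. `flush_any` shows one possible way to write out a partial line instead.

```c
#include <assert.h>
#include <stdio.h>

/* Assumption: 5 elements per line, matching "at least 5 blocks" above. */
#define ELEMENTS_PER_LINE 5

/* Hypothetical, simplified stand-in for the tool's line buffer. */
typedef struct {
    int numElements;
    char hexStr[ELEMENTS_PER_LINE][8];  /* e.g. "F1:5468" plus NUL */
} LineBuffer;

/* Queue one hex string; returns 0 if the line is already full. */
static int line_add(LineBuffer *line, const char *hex) {
    if (line->numElements >= ELEMENTS_PER_LINE) {
        return 0;
    }
    snprintf(line->hexStr[line->numElements], sizeof line->hexStr[0], "%s", hex);
    line->numElements++;
    return 1;
}

/* Write all buffered elements to out, then reset the buffer.
 * Returns the number of elements written. */
static int emit_line(LineBuffer *line, char *out, size_t outSize) {
    assert(out != NULL && outSize > 0);
    size_t used = 0;
    int emitted = 0;
    out[0] = '\0';
    for (int i = 0; i < line->numElements; i++) {
        int n = snprintf(out + used, outSize - used, "%s ", line->hexStr[i]);
        if (n < 0 || (size_t)n >= outSize - used) {
            out[used] = '\0';  /* out is too small: drop the partial element */
            break;
        }
        used += (size_t)n;
        emitted++;
    }
    line->numElements = 0;
    return emitted;
}

/* Mirrors the quoted condition: nothing is written until the line is full,
 * so short lines stay in the buffer. */
static int flush_if_full(LineBuffer *line, char *out, size_t outSize) {
    if (line->numElements >= ELEMENTS_PER_LINE) {
        return emit_line(line, out, outSize);
    }
    return 0;
}

/* Hypothetical alternative: write whatever has been buffered so far. */
static int flush_any(LineBuffer *line, char *out, size_t outSize) {
    if (line->numElements > 0) {
        return emit_line(line, out, outSize);
    }
    return 0;
}
```

With two queued elements (for example "F1:5468" and "F2:8080", taken from the working output above), `flush_if_full` writes nothing and leaves `numElements` at 2, which is the state that a later `numElements == 0` check would reject. `flush_any` writes both elements and resets the count to 0. Whether a partial line should be padded to keep the columns aligned is a separate design question that this sketch does not address.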
